Parallel BLAST - help

Ting ting at
Tue Apr 16 18:08:59 EDT 2002

Hello, All,

  I have a three-node Beowulf cluster with an MPI environment up and
  running now, and I have downloaded the FASTA from NCBI onto the master
  node. I successfully wrote code to split the data, but unfortunately
  I do not have runnable code to get the data back from the nodes to
  the host (master). :-(

  Can anyone give me a suggestion, or point me to a web site where I
  can find runnable code to use?  It would help me a lot.

  Thank you very much.
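
  A minimal sketch of the split-and-gather pattern asked about above,
  written in plain Python so it runs without a cluster. Everything in it
  is an assumption made for illustration: the function names
  (parse_fasta, split_records, search_chunk, gather_hits) are
  hypothetical, the "search" is a toy substring count rather than a real
  BLAST or FASTA comparison, and the per-node work is simulated with an
  ordinary loop. Under MPI the chunks would typically be handed out with
  MPI_Scatter (or point-to-point sends) and the per-node hit lists
  collected on the master with MPI_Gather or MPI_Gatherv; the merge step
  on the master would look much the same.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def split_records(records, n_nodes):
    """Deal records round-robin into n_nodes chunks (one chunk per node)."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def search_chunk(chunk, query):
    """Toy stand-in for the per-node search: count exact matches of query."""
    return [(header, seq.count(query)) for header, seq in chunk if query in seq]


def gather_hits(per_node_hits):
    """Master-side merge: flatten the per-node lists, best score first."""
    merged = [hit for hits in per_node_hits for hit in hits]
    return sorted(merged, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    # Hypothetical three-record database and a three-node split.
    fasta = ">a\nACGTACGT\n>b\nTTTT\n>c\nACGA\nCGT\n"
    chunks = split_records(parse_fasta(fasta), 3)
    # In a real run each chunk would be searched on its own node.
    per_node = [search_chunk(chunk, "ACG") for chunk in chunks]
    print(gather_hits(per_node))
```

  The point of the sketch is that each node returns a small list of hits
  rather than its share of the database, so the gather step moves very
  little data compared with the scatter step.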


-----Original Message-----
From: Steve Gaudet
Sent: Monday, April 15, 2002 11:12 AM
To: 'William R. Pearson'; beowulf at
Subject: RE: Parallel BLAST

> -----Original Message-----
> From: William R. Pearson
> Sent: Sunday, April 14, 2002 10:32 PM
> To: beowulf at
> Subject: Parallel BLAST
> > Why is it that BLAST is not available for MPI/PVM?  I would think
> > clusters would be the perfect host for such an application.
> > Is it that there is no need because BLAST is already so fast and
> > no one wants to break the database out onto node-resident disks?
> > Or is it that BLAST is kept running on single-processor or
> > shared-memory machines so that the DB is always in memory, ready to
> > roll without loading, and doing the same for a cluster is not worth
> > it because the same trick is difficult to do on a node given the
> > current way clusters are built?  I assume the same is true for FASTA?
> I suspect that BLAST is not available for MPI/PVM because (1) it is
> too fast, and (2) there is not much demand for it.
> 95% of the time, BLAST is almost an in-memory grep (the other 5% of
> the time it is working on the things it is looking for).  Sequence
> comparison is embarrassingly parallel, and very easily threaded.
> Distributing the sequence databases and collecting results has more
> overhead (there probably aren't many distributed grep programs
> either).  FASTA is 5-10X slower than BLAST, and Smith-Waterman is
> another 5-20X slower than FASTA.  Here, the communications overhead is
> low, and distributed systems work OK for FASTA, and great for
> Smith-Waterman (where the overhead fraction is very small).
> Of course, it is a lot easier to compile a threaded program, and just
> run it, than it is to install and configure the MPI or PVM environment
> and the programs to run in it.  Bioinformatics software is often run
> by computer savvy biologists, not high-performance computing folks,
> and not having to install and configure PVM/MPI is a big advantage.
> The NCBI probably does not make a PVM/MPI parallel BLAST because there
> is very little demand for it, and it does not meet their computational
> needs.
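
The overhead argument in the quoted message can be made concrete with a
little arithmetic. The sketch below is only an illustration: the
baseline numbers (one time unit of compute for BLAST, one time unit of
communication for every method) are assumptions chosen for
readability, not measurements, and the function name overhead_fraction
is hypothetical. Only the relative slowdowns (FASTA 5-10X slower than
BLAST, Smith-Waterman 5-20X slower than FASTA) come from the message.

```python
def overhead_fraction(compute_time, comm_time):
    """Share of total run time spent distributing data and collecting results."""
    return comm_time / (comm_time + compute_time)


COMM_TIME = 1.0      # assumed: same communication cost for every method
BLAST_COMPUTE = 1.0  # assumed baseline compute time for BLAST

# Slowdown relative to BLAST as (low, high).
# Smith-Waterman: 5-20X slower than FASTA, so 5*5 to 10*20 times BLAST.
slowdowns = {
    "BLAST": (1, 1),
    "FASTA": (5, 10),
    "Smith-Waterman": (5 * 5, 10 * 20),
}

if __name__ == "__main__":
    for name, (low, high) in slowdowns.items():
        worst = overhead_fraction(BLAST_COMPUTE * low, COMM_TIME)
        best = overhead_fraction(BLAST_COMPUTE * high, COMM_TIME)
        print(f"{name}: overhead between {best:.1%} and {worst:.1%}")
```

With a fixed communication cost, the overhead share shrinks as the
per-comparison compute cost grows, which is why distribution pays off
more for Smith-Waterman than for BLAST.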

There's also a commercial version from Turbogenomics.


1) Ready to go, plug-n-play solution for parallel BLAST
2) Expertise and 20+ years of experience in parallel computing
3) Dynamic database splitting feature to take advantage of computers that
have less memory than the size of the database
4) Smart load balancing - achieve linear to superlinear speedup
5) No modification made to the NCBI BLAST algorithm to ensure identical
results with the non-parallel version
6) Easy drop-in update whenever NCBI releases newer versions of their
7) Excellent support
8) 30-day money-back guarantee


Steve Gaudet
Linux Solutions Engineer

| Turbotek Computer Corp.    tel:603-666-3062 ext. 21             |
| 8025 South Willow St.      fax:603-666-4519                     |
| Building 2, Unit 105       toll free:800-573-5393               |
| Manchester, NH 03103       e-mail:sgaudet at  |
|                            web: |

Beowulf mailing list, Beowulf at
To change your subscription (digest mode or unsubscribe) visit

