Table of Contents
List of Tables
List of Examples
diagonalsw is a software that implements the Smith-Waterman algorithm with the SIMD instruction sets found in modern CPUs ( SSE4.1 for the x86 platform and AltiVec for the PowerPC platform ). The software is open source software. It is written in the programming languages C and C++ and it is licensed under the MIT license.
The primary URL for this document is http://diagonalsw.sourceforge.net.
Download the software from the sourceforge project page. The latest version of diagonalsw is 0.9.0.
As Intel Intel® Threading Building Blocks (TBB) version 2.2 is rather new and has not been packaged yet for all platforms, building from source code is the recommended way right now ( 2009-11-26 ). Try "make package" if you want to start building your own rpm or deb package. Related to the packaging of diagonalsw is the FAQ ( Is there a version of TBB that provides statically linked libraries? ).
To build diagonalsw on Unix ( e.g. Linux, MacOSX, CygWin ) you need to have this installed
To install TBB run something like this:
$ tar xfz /scratch/tbb22_20091101oss_src.tgz $ cd tbb22_20091101oss/ $ less README $ make
If you have installed Intel Intel® Threading Building Blocks (TBB) into a non-standard location, please first source the tbbvars.sh
, in other words something like this
$ . /home/myuser/tbb22_20091101oss/build/linux_intel64_gcc_cc4.3.3_libc2.9_kernel2.6.28_release/tbbvars.sh
If you have the diagonalsw source code in the directory /tmp/diagonalsw-0.9.0
and you want to install diagonalsw into the directory /tmp/install
, and you have installed gengetopt in the non-default location ~/gengetopt/inst/bin/
, then
first run cmake then make and then make install
$ mkdir /tmp/build $ cd /tmp/build $ cmake -DCMAKE_INSTALL_PREFIX=/tmp/install -DCMAKE_PREFIX_PATH=~/gengetopt/inst/bin/ /tmp/diagonalsw--0.9.0 && make && make install -- The C compiler identification is GNU -- Check for working C compiler: /usr/bin/gcc -- Check for working C compiler: /usr/bin/gcc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Configuring done -- Generating done -- Build files have been written to: /tmp/build [ 5%] Generating smith_waterman_sse_word.2loadEFH.c [ 10%] Generating cmdline_diagonalsw.c, cmdline_diagonalsw.h [ 15%] Generating smith_waterman_sse_byte.2load_HFsave8thnormal.c [ 20%] Generating smith_waterman_sse_word.2loadnormal.c [ 25%] Generating smith_waterman_sse_byte.2load_HFsave8thH.c [ 30%] Generating smith_waterman_sse_word.2loadH.c [ 35%] Generating smith_waterman_sse_byte.2load_HFsave8thEFH.c Scanning dependencies of target diagonalsw [ 40%] Building C object src/c/CMakeFiles/diagonalsw.dir/read_matrix.c.o [ 45%] Building C object src/c/CMakeFiles/diagonalsw.dir/read_sequence.c.o [ 50%] Building C object src/c/CMakeFiles/diagonalsw.dir/diagonalsw.c.o [ 55%] Building C object src/c/CMakeFiles/diagonalsw.dir/cmdline_diagonalsw.c.o [ 60%] Building C object src/c/CMakeFiles/diagonalsw.dir/create_profile.c.o [ 65%] Building C object src/c/CMakeFiles/diagonalsw.dir/smith_waterman_reference_impl.c.o [ 70%] Building C object src/c/CMakeFiles/diagonalsw.dir/smith_waterman_sse_byte.2load_HFsave8thnormal.c.o [ 75%] Building C object src/c/CMakeFiles/diagonalsw.dir/smith_waterman_sse_word.2loadnormal.c.o [ 80%] Building C object src/c/CMakeFiles/diagonalsw.dir/smith_waterman_sse_byte.2load_HFsave8thH.c.o [ 85%] Building C object src/c/CMakeFiles/diagonalsw.dir/smith_waterman_sse_word.2loadH.c.o [ 90%] Building C object src/c/CMakeFiles/diagonalsw.dir/smith_waterman_sse_byte.2load_HFsave8thEFH.c.o [ 95%] Building C object src/c/CMakeFiles/diagonalsw.dir/smith_waterman_sse_word.2loadEFH.c.o [100%] Building C object src/c/CMakeFiles/diagonalsw.dir/sse_funcs.c.o Linking C executable diagonalsw [100%] Built target diagonalsw
If you want to build the html documentation ( i.e. this page ) you need to pass the -DBUILD_DOCUMENTATION=ON option to cmake.
Type diagonalsw --help
to see the command line options
[user@linux ~]$ diagonalsw --help diagonalsw 0.9.0 A high-performance version of the Smith-Waterman alignment algorithm Usage: diagonalsw [OPTIONS]... -h, --help Print help and exit --version Print version and exit -d, --databasefile=filename A FASTA-formatted sequence database file -q, --queryfile=filename A single sequence file in FASTA-format -p, --profilefile=filename A profile file -s, --matrixfile=filename Integer similarity matrix file ( scoring matrix ) -i, --gapopen=INT Penalty for starting alignment gap (non-positive integer) (default=`-10') -e, --gapextend=INT Penalty for extending alignment gap (non-positive integer) (default=`-2') -u, --print-sequence-id In addition to the score, also print the sequence-id (default=off) -V, --verify-algorithm verify vector implementation against scalar implementation (default=off) -D, --disable-threading disable all threading (default=off) -t, --nr-of-threads=INT numbers of threads (normally autodetected) -T, --tokens-per-thread=INT numbers of tokens per thread (normally autodetected) (default=`5') -W, --sequences-per-compute-chunk=INT nr of sequences each thread are given per computation iteration ( doesn't influence the result but may influence the speed ) (default=`10') benchmarking: To get a more fair benchmark you should run a computation that lasts at least some minutes on a machine that is not under load by other processes. -b, --benchmark print computation speed (unit is GigaCells/s) (default=off) -B, --benchmark-verbose print computation speed (unit is GigaCells/s) and additional information such as user time, number of db sequences that needed to use the word sized function, length of querysequence, sum of db sequence lengths, number of db sequences (default=off) limits: the default limits should work fine in most cases. Warning: Increasing these values will increase the memory consumption of the program. -K, --max-sequence-name-length=INT Maximum allowed length of a sequence name (default=`256') -L, --max-query-sequence-length=INT Maximum allowed length of the query sequence (default=`2000') -M, --max-db-sequence-length=INT Maximum allowed length of any of the db sequences (default=`10000')
Table 1. diagonalsw input file formats
file format | short option | description |
---|---|---|
fasta format | -q | query file in fasta format Section 5.1, “Fasta format” |
fasta format | -d | database file in fasta format Section 5.1, “Fasta format” |
profile format | -p | query file in profile format Section 5.2, “Profile file format” |
scoring matrix file format | -s | Substitution matrix ( Scoring matrix ) Section 5.3, “Scoring matrix file format” |
Example 1. diagonalsw with query in fasta format
We use the file described in Example 6, “db.fasta, an example file in fasta format” as database file and we use the file described in Example 8, “blosum50.mat, an example file in the Scoring matrix file format” as scoring matrix.
[user@linux ~]$ cat query.fasta >MyInterestingProtein EHIATYYNDQMLKKPTWYVBZX
[user@linux ~]$ diagonalsw -q query.fasta -d db.fasta -s blosum50.mat 146 135 68
Example 2. diagonalsw with specified gap penalties
This example is the same as Example 1, “diagonalsw with query in fasta format” but with gap penalties specified. The penalty for starting an alignment gap was chosen to be -8 and the penalty for extending an alignment gap was chosen to be -1.
[user@linux ~]$ diagonalsw -q query.fasta -d db.fasta -s blosum50.mat -i -8 -e -1 146 135 74
Example 3. diagonalsw with query in profile file format
Note | |
---|---|
Specifying query in profile format is not implemented yet. The function read_profile() needs to be written. |
We use the file described in Example 7, “1R69.psi, an example file in the profile file format” as the query. We use the file described in Example 6, “db.fasta, an example file in fasta format” as database file and we use the file described in Example 8, “blosum50.mat, an example file in the Scoring matrix file format” as scoring matrix.
[user@linux ~]$ diagonalsw -p 1R69.psi -d db.fasta -s blosum50.mat error: Sorry. The profile parsing function read_profile() is not yet implemented
Example 4. diagonalsw with benchmarking
This example is the same as Example 1, “diagonalsw with query in fasta format” but with benchmarking. The computation speed Gigacells/second is printed.
[user@linux ~]$ diagonalsw -q query.fasta -d db.fasta -s blosum50.mat -b 0.000783
Note, db.fasta is too small ( containing only 3 sequences ) to give a fair result for Gigacells/second. See also Section 7, “Performance - benchmarked computation speed”.
Example 5. diagonalsw with verbose benchmarking
This example is the same as Example 1, “diagonalsw with query in fasta format” but with verbose benchmarking. Benchmarking information is printed.
[user@linux ~]$ diagonalsw -q query.fasta -d db.fasta -s blosum50.mat -B time = 0.00279107 nr of db sequences that needed to use the word sized function due to an overflow in the byte sized function = 0 nr of db sequences = 3 computation speed ( Giga cells/second ) = 0.000489
Note, db.fasta is too small ( containing only 3 sequences ) to give a fair result for Gigacells/second. See also Section 7, “Performance - benchmarked computation speed”.
This software package handles the following file formats
Read more about the Fasta format on Wikipedia.
Example 6. db.fasta, an example file in fasta format
The example file db.fasta contains
>Alpha EHIATYYNDQMLKKPTWYVBZX >Beta AHIATYYNDQMLKKPTWYVB >Gamma AYTMLKKYYNDQPTWYVBYP
Example 7. 1R69.psi, an example file in the profile file format
The example file 1R69.psi contains
Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts A R N D C Q E G H I L K M F P S T W Y V A R N D C Q E G H I L K M F P S T W Y V 1 S 1 -4 3 -3 -4 -3 -3 -4 -4 -4 -5 -3 -4 -5 -4 3 6 -6 -5 -3 13 0 12 0 0 0 0 0 0 0 0 0 0 0 0 25 50 0 0 0 1.12 1.44 2 I -4 -5 -6 -7 -4 -4 -6 -7 -6 5 3 -6 1 4 -4 -5 -4 -4 -2 3 0 0 0 0 0 1 0 0 0 31 30 0 3 16 1 0 0 0 1 17 1.04 1.63 3 S 2 -3 -2 -4 -4 -3 -4 6 -2 -6 -6 -1 -4 -5 -3 1 -3 -6 -4 -4 15 1 1 0 0 1 0 64 1 0 0 3 0 0 1 9 1 0 0 1 1.28 1.67 4 S 0 0 1 2 -3 3 4 -3 1 -4 -3 1 -2 -5 -2 0 -1 -6 -3 -5 8 4 6 11 0 11 28 1 3 1 2 10 1 0 1 8 3 0 1 0 0.51 1.69 5 R -3 7 1 -3 -5 0 -2 -5 -2 -3 -4 3 -3 -4 -5 -3 -2 -3 -3 -3 1 62 6 1 0 3 1 0 1 2 1 15 0 1 0 1 2 0 1 1 1.33 1.68 6 V -4 -6 -6 -6 -4 -6 -6 -7 -6 6 4 -6 0 -1 -6 -5 -3 -5 -4 2 0 0 0 0 0 0 0 0 0 45 38 0 2 2 0 0 0 0 0 11 1.27 1.68 7 K 0 5 -2 -3 -4 0 -1 -5 -2 -3 -4 5 -4 -4 -5 -1 -2 -4 -3 -3 7 35 2 1 0 3 4 0 1 1 2 36 0 0 0 3 2 0 1 2 0.98 1.71 8 S 1 1 -1 -1 -5 3 3 -3 1 -2 -1 1 0 -1 -4 0 0 -3 1 -2 11 8 2 3 0 11 21 2 4 3 5 7 2 3 0 7 6 0 4 2 0.29 1.70 9 K 3 2 -4 -5 -2 -1 -1 -4 -2 1 3 -1 -1 -2 -4 -3 -3 1 -1 0 22 13 0 0 1 3 4 1 1 8 31 3 1 2 0 1 0 2 3 5 0.45 1.69 10 R -3 8 -4 -5 -3 -2 -3 -6 -4 -3 -2 -1 3 -4 -5 -3 -3 -6 -4 -4 1 82 0 0 1 0 0 0 0 2 4 1 7 1 0 1 1 0 0 1 1.84 1.68 11 I -1 1 0 -1 -2 1 3 -3 -2 -1 -1 4 0 -5 -3 -2 -1 0 -5 -3 4 6 4 3 1 7 22 1 1 5 6 28 2 0 1 1 5 1 0 2 0.50 1.68 12 Q 1 2 -1 -1 -3 2 3 -3 1 -2 -1 2 1 -4 -5 -1 -2 -2 -1 -3 14 10 2 3 0 9 19 1 4 3 8 13 3 1 0 4 2 1 2 1 0.32 1.68 13 L 1 3 0 -4 -1 -1 -3 -4 -1 -2 2 3 0 -2 -5 -2 -2 -3 -2 -2 13 18 5 0 1 3 1 1 2 2 24 19 3 2 0 3 2 0 1 2 0.44 1.67 14 G -2 -1 3 0 -3 -1 -2 6 0 -5 -6 1 -4 -6 -5 -1 -4 -6 -4 -5 2 4 14 4 1 3 2 53 2 1 0 11 0 0 0 3 0 0 0 0 1.04 1.64 15 L -4 -3 -6 -6 -3 -4 -4 -6 -3 2 5 -4 5 0 -6 -5 -3 3 1 -1 1 1 0 0 0 0 1 0 1 9 51 0 18 4 0 0 1 5 4 3 1.05 1.63 16 N -3 -2 0 -3 -4 -3 -3 -3 -2 -4 -4 0 -4 -5 -3 4 6 -6 -5 -3 0 1 4 1 0 0 1 1 1 0 1 6 0 0 1 34 50 0 0 1 1.17 1.62 17 Q -1 0 -3 -3 -5 8 -1 -3 -2 -2 0 -1 0 -5 -3 -3 -3 -5 -4 -2 4 4 0 0 0 67 1 2 0 3 9 1 2 0 1 1 1 0 0 2 1.35 1.63 18 A 1 2 -1 1 -5 2 3 -2 -1 -4 -4 2 -1 -5 -2 1 0 -6 -2 -2 14 10 2 6 0 10 19 3 1 1 2 11 1 0 2 9 6 0 1 3 0.36 1.64 19 E 1 -2 -1 3 -6 4 5 -3 -1 -5 -5 0 -2 -5 -4 -1 -2 -6 -2 -3 12 2 2 13 0 16 38 1 1 0 0 5 1 0 0 3 2 0 1 1 0.80 1.64 20 L -3 -5 -6 -7 -4 -5 -6 -7 -6 1 6 -6 1 2 -6 -5 -3 -5 -3 1 2 0 0 0 0 0 0 0 0 4 73 0 3 9 0 0 1 0 0 8 1.30 1.63 21 A 6 -3 -4 -5 -1 -4 -2 1 -5 -5 -5 -3 -4 -5 -4 0 -3 -6 -5 -2 79 1 0 0 1 0 2 10 0 0 0 1 0 0 0 4 0 0 0 2 1.39 1.63 22 Q -1 2 0 2 -6 1 4 -2 0 -3 -3 2 -3 -4 -4 0 -1 -6 -3 -3 5 10 3 13 0 5 28 3 2 1 2 14 1 1 0 5 3 0 1 2 0.51 1.61 23 K 1 4 -3 -3 -3 1 0 -3 -1 0 1 3 1 -3 -4 -2 -3 -3 -2 -2 14 20 1 1 0 5 4 1 2 5 14 20 3 1 0 3 1 0 2 2 0.43 1.71 24 V 2 -5 -5 -4 2 -5 -5 -3 -5 3 3 -5 2 -1 -5 0 1 -5 -4 3 17 0 0 1 4 0 0 2 0 15 24 0 4 2 0 7 7 0 0 17 0.53 1.67 25 G -3 -2 1 1 -3 -3 -2 6 -1 -6 -5 -2 -5 -4 -5 -1 -3 -6 -3 -5 1 2 6 6 1 1 2 71 1 0 1 2 0 1 0 3 1 0 1 1 1.39 1.66 26 T -3 -3 -6 -4 0 -3 -5 -6 -6 4 2 -3 2 -1 -6 -3 -1 1 -3 5 1 1 0 1 2 1 0 0 0 23 15 1 4 2 0 1 4 2 1 40 0.84 1.68 27 T 0 -2 0 1 -2 -2 -2 -1 0 -3 -5 1 -4 -6 -1 5 3 -6 -5 -3 5 2 3 6 1 1 2 4 3 1 1 8 0 0 3 40 21 0 0 1 0.67 1.69 28 Q -1 4 0 -2 -3 4 0 -3 -1 -2 -2 2 -4 0 1 -2 -2 -2 0 -1 5 20 4 1 0 19 5 1 1 2 3 13 0 4 7 2 3 1 4 4 0.43 1.67 29 Q 0 0 1 -2 -5 4 -1 1 -2 -4 -5 -1 -1 -6 1 3 0 -1 -3 -2 7 4 8 1 0 22 3 10 1 1 1 3 2 0 8 21 7 1 1 3 0.47 1.67 30 S 2 -2 -2 -4 -4 2 -2 -3 -1 -2 -3 -2 1 -1 -4 2 4 0 2 -2 18 1 2 0 0 9 2 1 1 1 2 2 4 3 0 15 28 1 7 2 0.50 1.68 31 I -3 -6 -6 -6 -2 -2 -6 -6 -5 6 2 -5 1 -2 -6 -5 -3 -1 1 3 1 0 0 0 1 2 0 0 0 50 15 0 2 1 0 0 1 1 5 19 1.12 1.68 32 E 0 -1 0 -3 -3 2 -2 0 1 -3 -3 -2 -2 -3 -4 5 -1 0 -1 -4 7 2 5 1 0 10 1 8 3 1 2 2 1 1 0 48 2 2 3 1 0.68 1.69 33 Q -2 3 3 0 -6 3 -1 -2 1 -4 -2 4 0 -3 -5 -1 -3 -6 -2 -4 3 15 15 5 0 13 2 3 3 1 5 24 2 1 0 4 1 0 2 1 0.54 1.69 34 L -3 -6 -6 -5 -5 -3 -6 -6 -2 5 1 -6 0 0 -6 -4 -3 8 5 0 2 0 0 1 0 1 0 0 1 30 14 0 2 2 0 1 1 22 19 6 1.26 1.69 35 E -3 -2 -3 -1 -2 -1 7 -5 -3 -4 -2 -1 -2 -3 -4 -3 -3 -6 -5 -4 1 1 0 1 1 1 85 0 0 1 5 2 1 1 0 0 1 0 0 1 1.70 1.68 36 N -1 4 4 -2 -3 0 -3 -3 1 -4 -3 2 -3 -6 -5 2 0 -6 -3 -4 6 23 22 1 1 3 1 1 3 1 2 12 1 0 0 17 6 0 1 1 0.66 1.68 37 G -2 -2 2 0 -5 -2 -2 6 0 -6 -6 -3 -3 -6 -5 -1 -3 -6 -4 -4 3 2 9 6 0 1 2 67 2 0 0 1 1 0 0 4 1 0 1 1 1.31 1.65 38 K -1 3 0 0 -5 1 2 -2 -1 -2 -1 4 -2 -4 -3 -2 0 -6 -3 -2 4 16 3 4 0 6 15 3 2 3 6 25 1 1 1 1 5 0 1 3 0.44 1.67 39 T 0 3 2 -2 -5 -1 -2 -2 -2 0 -3 -1 -1 -3 -3 3 4 -5 -1 -1 7 14 8 2 0 2 3 2 1 5 2 2 2 1 1 20 23 0 2 4 0.42 1.69 40 K -2 2 1 -2 -6 0 -2 -2 -3 -2 -2 1 -2 -3 6 -1 -2 -6 -4 -2 2 12 7 2 0 5 2 3 0 2 5 9 1 1 42 3 2 0 1 2 0.87 1.72 41 R -1 2 2 1 -2 -2 -2 0 -2 -1 -3 0 -2 -4 2 3 0 -6 -3 -2 4 13 10 7 1 2 2 8 1 3 3 5 1 1 10 21 6 0 1 3 0.32 1.76 42 P 1 -2 -3 -2 -3 -2 -2 -1 -3 1 0 -1 0 -2 3 3 0 -3 -2 -1 11 2 1 2 0 2 2 4 1 8 9 3 2 2 15 24 6 0 1 5 0.30 1.77 43 R 0 1 -1 2 -5 -1 3 -1 -1 -3 -3 1 -2 -4 -2 2 0 -1 -3 -2 8 8 3 12 0 2 17 6 1 1 3 7 1 0 2 17 6 1 1 2 0.30 1.77 44 F -2 1 2 0 -4 -1 -1 -4 0 -2 -1 2 0 1 -3 0 3 -2 1 -1 4 8 8 5 0 3 3 1 2 2 7 13 3 6 1 7 19 1 5 4 0.24 1.78 45 L -2 -3 -4 -6 -4 -4 -5 -6 -6 3 5 -4 0 -1 -6 -4 -2 -1 -3 1 3 2 1 0 0 0 1 0 0 13 63 1 2 1 0 1 3 1 1 8 1.04 1.79 46 P 1 -1 -1 -2 -2 1 0 -3 0 2 1 0 1 0 0 -1 -2 0 0 1 12 3 3 2 1 7 7 2 2 11 11 6 3 5 4 5 2 1 3 11 0.11 1.82 47 E 1 2 -1 0 -4 2 2 -3 -3 -4 -3 4 0 -6 -3 0 -2 -6 -3 -3 14 11 3 5 0 9 15 2 0 1 3 22 2 0 1 6 2 0 1 2 0.45 1.82 48 L -3 -4 -6 -7 -1 -6 -5 -7 -6 5 4 -6 1 0 -6 -5 -3 -1 -2 1 2 1 0 0 1 0 0 0 0 39 43 0 3 3 0 0 1 1 1 6 1.16 1.80 49 A 6 -3 -4 -4 2 -3 -3 -2 -3 -3 -1 -2 -3 -3 -4 2 -2 -6 -4 -3 65 1 0 1 4 0 1 2 0 1 5 2 0 1 0 15 1 0 0 1 1.09 1.80 50 S 0 2 2 2 -4 2 3 -1 0 -3 -4 3 -2 -5 -4 -1 -1 -4 -5 -4 8 12 9 10 0 8 16 4 2 2 2 18 1 0 0 5 4 0 0 1 0.42 1.80 51 A 4 -2 -4 -5 0 -2 -2 -2 -3 1 0 -1 -1 1 -5 -2 -1 -5 0 2 41 2 1 0 2 2 2 2 1 8 8 3 1 5 0 2 3 0 4 14 0.52 1.81 52 L -4 -4 -6 -6 1 -4 -5 -6 -3 -1 6 -5 0 4 -6 -5 -2 -4 0 -2 0 1 0 0 3 0 1 0 1 2 68 1 1 17 0 1 2 0 3 1 1.25 1.79 53 G -3 -2 3 2 -6 0 1 5 1 -6 -6 0 -4 -6 -4 -1 -4 -6 -6 -5 1 2 15 13 0 4 8 43 3 0 0 5 0 0 1 4 0 0 0 0 0.90 1.81 54 V -2 -3 -6 -6 3 -5 -5 -5 -4 3 -1 -3 0 -2 -6 -4 0 -2 -2 6 2 2 0 0 6 0 0 1 0 14 2 1 2 1 0 1 6 1 1 59 1.13 1.81 55 S -2 -1 2 3 -4 -2 -1 -3 -1 -4 -5 -2 -5 -6 2 4 2 -6 -5 -4 2 3 9 16 0 1 4 1 1 1 1 2 0 0 12 35 12 0 0 1 0.71 1.80 56 V 1 -3 -4 -4 -4 -4 -2 -6 -3 3 1 -4 0 -3 4 -3 0 -6 -2 3 10 1 0 0 0 0 2 0 1 15 12 1 2 1 22 2 5 0 1 24 0.64 1.80 57 D -1 -3 1 5 -3 -1 2 -2 0 -3 -1 -2 0 -4 -3 1 0 -6 -3 -3 5 1 6 29 1 2 14 3 3 2 7 2 3 1 1 12 5 0 1 2 0.48 1.80 58 W -2 -3 -4 -1 -6 0 1 -4 -1 -3 -4 -3 -2 2 -6 -2 -2 9 6 -2 4 1 0 4 0 4 8 1 1 1 1 1 1 8 0 3 3 30 25 2 1.34 1.80 59 L -5 -3 -6 -7 -5 -5 -5 -7 -6 0 6 -6 0 4 -6 -5 -4 -4 -3 -1 0 1 0 0 0 0 1 0 0 4 72 0 1 16 0 0 1 0 0 3 1.40 1.77 60 L 0 -3 -5 -4 -4 -1 -1 -5 -3 2 4 -1 1 4 -5 -3 -3 -4 -1 0 7 1 0 1 0 3 5 0 0 10 35 3 3 20 0 1 1 0 2 6 0.61 1.75 61 N -1 0 0 -1 -3 -2 2 2 1 -4 -4 -1 -1 0 -2 1 3 -5 1 -4 5 5 5 3 0 1 12 16 3 1 1 4 1 4 2 9 22 0 5 1 0.33 1.73 62 G -2 -1 -2 -1 -5 -3 -1 6 -5 -4 -3 -2 -4 -2 -4 0 -1 -3 -4 -5 2 4 2 3 0 1 5 66 0 1 3 2 0 2 1 5 3 0 1 1 1.17 1.73 63 T -1 0 1 1 -5 0 3 -4 -1 -1 -1 1 -1 -3 -1 1 2 -6 -2 -1 5 5 6 9 0 3 19 1 1 4 6 7 1 1 4 9 14 0 1 5 0.25 1.71 64 S -2 -1 1 3 -3 -1 1 4 -2 -6 -5 -2 -4 -4 0 1 -1 -6 -4 -4 2 3 5 16 0 2 10 36 1 0 1 2 0 1 5 10 4 0 1 1 0.64 1.69 65 D -1 -2 1 4 -6 0 4 -4 -2 -3 -4 -1 -3 -6 0 -1 2 -6 -4 -4 4 2 5 26 0 4 31 1 1 1 1 3 1 0 4 4 11 0 0 1 0.73 1.68 66 S 1 -3 -3 -3 -5 -2 -2 -3 -3 -2 -3 -2 6 -3 5 2 1 -6 -5 -3 10 1 1 1 0 2 3 1 0 2 1 2 20 1 31 13 10 0 0 1 0.85 1.63 67 N 2 -1 5 2 -5 1 0 -3 0 -4 -5 -1 -4 -3 -1 1 -1 -6 -1 -3 21 3 31 10 0 5 4 0 2 0 0 4 0 1 2 9 3 0 3 1 0.61 1.46 68 V 3 -3 -2 -3 -4 -3 -2 -4 -5 3 0 -3 -2 -4 -1 -2 -1 -5 0 4 26 1 2 1 0 1 3 0 0 17 9 1 0 0 4 2 3 0 3 28 0.51 1.33 69 R -4 7 -3 -4 -6 2 -2 -4 -3 -5 -5 4 -4 -5 -4 -3 -3 -5 -4 -5 0 69 0 0 0 9 0 0 0 0 0 22 0 0 0 0 0 0 0 0 1.54 1.06 K Lambda Standard Ungapped 0.1251 0.3089 Standard Gapped 0.0410 0.2670 PSI Ungapped 0.1906 0.3174 PSI Gapped 0.0583 0.2670
Example 8. blosum50.mat, an example file in the Scoring matrix file format
The example file blosum50.mat contains
# Matrix made by matblas from blosum50.iij # BLOSUM Clustered Scoring Matrix in 1/3 Bit Units # Blocks Database = /data/blocks_5.0/blocks.dat # Cluster Percentage: >= 50 # Entropy = 0.4808, Expected = -0.3573 A R N D C Q E G H I L K M F P S T W Y V B Z X A 5 -2 -1 -2 -1 -1 -1 0 -2 -1 -2 -1 -1 -3 -1 1 0 -3 -2 0 -2 -1 -1 R -2 7 -1 -2 -4 1 0 -3 0 -4 -3 3 -2 -3 -3 -1 -1 -3 -1 -3 -1 0 -1 N -1 -1 7 2 -2 0 0 0 1 -3 -4 0 -2 -4 -2 1 0 -4 -2 -3 4 0 -1 D -2 -2 2 8 -4 0 2 -1 -1 -4 -4 -1 -4 -5 -1 0 -1 -5 -3 -4 5 1 -1 C -1 -4 -2 -4 13 -3 -3 -3 -3 -2 -2 -3 -2 -2 -4 -1 -1 -5 -3 -1 -3 -3 -2 Q -1 1 0 0 -3 7 2 -2 1 -3 -2 2 0 -4 -1 0 -1 -1 -1 -3 0 4 -1 E -1 0 0 2 -3 2 6 -3 0 -4 -3 1 -2 -3 -1 -1 -1 -3 -2 -3 1 5 -1 G 0 -3 0 -1 -3 -2 -3 8 -2 -4 -4 -2 -3 -4 -2 0 -2 -3 -3 -4 -1 -2 -2 H -2 0 1 -1 -3 1 0 -2 10 -4 -3 0 -1 -1 -2 -1 -2 -3 2 -4 0 0 -1 I -1 -4 -3 -4 -2 -3 -4 -4 -4 5 2 -3 2 0 -3 -3 -1 -3 -1 4 -4 -3 -1 L -2 -3 -4 -4 -2 -2 -3 -4 -3 2 5 -3 3 1 -4 -3 -1 -2 -1 1 -4 -3 -1 K -1 3 0 -1 -3 2 1 -2 0 -3 -3 6 -2 -4 -1 0 -1 -3 -2 -3 0 1 -1 M -1 -2 -2 -4 -2 0 -2 -3 -1 2 3 -2 7 0 -3 -2 -1 -1 0 1 -3 -1 -1 F -3 -3 -4 -5 -2 -4 -3 -4 -1 0 1 -4 0 8 -4 -3 -2 1 4 -1 -4 -4 -2 P -1 -3 -2 -1 -4 -1 -1 -2 -2 -3 -4 -1 -3 -4 10 -1 -1 -4 -3 -3 -2 -1 -2 S 1 -1 1 0 -1 0 -1 0 -1 -3 -3 0 -2 -3 -1 5 2 -4 -2 -2 0 0 -1 T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 2 5 -3 -2 0 0 -1 0 W -3 -3 -4 -5 -5 -1 -3 -3 -3 -3 -2 -3 -1 1 -4 -4 -3 15 2 -3 -5 -2 -3 Y -2 -1 -2 -3 -3 -1 -2 -3 2 -1 -1 -2 0 4 -3 -2 -2 2 8 -1 -3 -2 -1 V 0 -3 -3 -4 -1 -3 -3 -4 -4 4 1 -3 1 -1 -3 -2 0 -3 -1 5 -4 -3 -1 B -2 -1 4 5 -3 0 1 -1 0 -4 -4 0 -3 -4 -2 0 0 -5 -3 -4 5 2 -1 Z -1 0 0 1 -3 4 5 -2 0 -3 -3 1 -1 -4 -1 0 -1 -2 -2 -3 2 5 -1 X -1 -1 -1 -1 -2 -1 -1 -2 -1 -1 -1 -1 -1 -2 -2 -1 0 -3 -1 -1 -1 -1 -1
The elements in the score matrix are calculated in another order than for instance in the software projects in Section 8.1, “Striped Smith-Waterman by Farrar” and Section 8.2, “swps3”. In the diagonalsw implementaion the elements will be calculatad as a diagonal vector ( of length 8 or 16 ) slides along the db sequence axis. Reaching the end it iterates the whole thing 8 or 16 positions further down the query sequence axis.
The alignment cost is precalculated so that an alignment cost vector for a certain db sequence element can be loaded as a column. These values must later be transferend into a sliding diagonal vector. If k is the length of the vectors, one diagonal vector will contain values from k column vectors. Or a bit more formally:
Let k be the length of the vectors: If the first load is the vector: (0,0),(0,1),(0,2),...,(0,k) and the n:th load (n,0),(n,1),(n,2),...,(n,k) we need to create a vector with (n,0),(n-1,1),...,(n-k,k)
The problem can be solved by loading column vectors and moving the elements around with the _mm_blendv_epi8 instruction. First we initialize some variables ( byte-version, 16 elements per vector )
const __m128i mask1=_mm_set_epi8(0,0,0,0,0,0,0,0,255,255,255,255,255,255,255,255); const __m128i mask2=_mm_set_epi8(0,0,0,0,255,255,255,255,0,0,0,0,255,255,255,255); const __m128i mask3=_mm_set_epi8(0,0,255,255,0,0,255,255,0,0,255,255,0,0,255,255); const __m128i mask4=_mm_set_epi8(0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255); __m128i v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15;
Then for each step the variable v_score_load is loaded with the new column elements.
v15 = v_score_load; v7 = _mm_blendv_epi8(v7,v15,mask1); v3 = _mm_blendv_epi8(v3,v7,mask2); v1 = _mm_blendv_epi8(v1,v3,mask3); v0 = _mm_blendv_epi8(v0,v1,mask4); v_diagonal = v0;
In the next step the variable name numbers in v0,v1,v3,v7,v15 are incremented by 1 and adjusted to be in the range 0 to 15. In other words: newnumber = ( oldnumber + 1 ) modulo 16.
v0 = v_score_load; v8 = _mm_blendv_epi8(v8,v0,mask1); v4 = _mm_blendv_epi8(v4,v8,mask2); v2 = _mm_blendv_epi8(v2,v4,mask3); v1 = _mm_blendv_epi8(v1,v2,mask4); v_diagonal = v1;
After 16 iterations the v_diagonal will start to contain the correct diagonal values. This algorithm has one column load per iteration. It can be sligthly modified to to use two column loads per iteration, leading to one _mm_blendv_epi8 instruction less per iteration.
A different method is used for the Altivec version. Here the instruction vec_perm is used. The values are moving through some queue vectors. First we initialize some variables ( this is the word-version, 8 elements per vector )
vector unsigned short v_score; vector unsigned short v_score_q1; vector unsigned short v_score_q2; vector unsigned short v_score_q3; vector unsigned short v_score_load; vector unsigned char queue1_to_score = (vector unsigned char)(16,17,2,3,4,5,6,7,8,9,10,11,12,13,14,15); vector unsigned char queue2_to_queue1 = (vector unsigned char)(0,1,18,19,4,5,6,7,8,9,10,11,12,13,14,15); vector unsigned char queue3_to_queue2 = (vector unsigned char)(16,16,16,16,16,21,16,0,16,1,16,2,16,3,16,4); vector unsigned char queue3_with_load = (vector unsigned char)(23,5,6,7,8,25,9,10,11,27,12,13,29,14,31,16);
Then for each step the variable v_score_load is loaded with the new column elements.
v_score = vec_perm(v_score_q1, v_score_load, queue1_to_score); v_score_q1 = vec_perm(v_score_q2, v_score_load, queue2_to_queue1); v_score_q2 = vec_perm(v_score_q3, v_score_load, queue3_to_queue2); v_score_q3 = vec_perm(v_score_q3, v_score_load, queue3_with_load);
v_score will contain the diagonal elements.
As the sequences in the db sequence database can be analyzed independently, diagonalsw could be parallized in a straightforward way. The initial attempt was done with a tbb::pipeline from the open source library Intel Intel® Threading Building Blocks (TBB). It doesn't scale linearly to the number of cores yet. It is a bit less. Right now there is roughly 3.4 times speedup on a Intel Core2 Quad. Maybe our threading implementation needs some more fine tuning or we need to implement it in another way.
You could test the parallelization speedup by adding a -D ( --disable-threading ). If it does not scale linearly to the number of cores you will get a better computation speeds if you run multiple processes instead.
This has not been investigated that much. Up to 9 GigaCells/second has been seen on a "Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz" with a binary built with the Intel Compiler. Note that the computation speed depends strongly how often the byte function overflows and also on the length of the query sequence. See also Example 4, “diagonalsw with benchmarking” and Example 5, “diagonalsw with verbose benchmarking”.
In the article published 2006, Striped Smith-Waterman speeds database searches six times over other SIMD implementations [farrar:2006], Michael Farrar describes a new clever way calculate the score matrix. Here SIMD vectors don't contain adjacent matrix elements, instead the matrix is striped. A SSE version was included in ssearch in the fasta software package ( see also Section 8.3, “ssearch in the fasta software package” ).
A SSE version was also released in the software package found at Smith-Waterman for the Cell Broadband Engine. That version also adds support for the Cell Broadband Engine. The software package is not open source but the source code can be downloaded. To get a permission to use the software you need to contact Michael Farrar ( read more in the file release/COPYRIGHT from the striped.tgz ).
swps3 is an open source reimplementation of Farrar's algorithm on IBM Cell/BE and 86/SSE by Adam Szalkowski, Christian Ledergerber, Philipp Krähenbühl and Christophe Dessimoz. The software is presented in the technical note SWPS3 - fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and 86/SSE [szalkowski:2008]. It is licensed under the MIT license. Unfortunately swps3 ( as of 2009-07-26 ) contains a bug making it sometimes calculate the wrong result. The bug has been reported to Adam Szalkowski who acknowledges that the bug is reproducible.
ssearch ( as of 2009-08-01 named ssearch_35 ) in the fasta software package implements the Smith-Waterman algorithm. This software is not open source but the source code can be downloaded. To get a permission to use the software you need to contact David Hudson, Assistant Provost for Research, University of Virginia ( read more in the file fasta-35.4.9/COPYRIGHT ). The SSE version is written by Farrar ( see also Section 8.1, “Striped Smith-Waterman by Farrar” ) and the AltiVec version is written by Erik Lindahl. The Altivec version is the same as the one found in diagonalsw.