FastA



Introduction


The FastA suite of programs provides methods for searching sequence databases and comparing sequences, as well as some more general analysis programs. FastA owes its speed to a word, or tuple, search method: it first looks for short exact word matches shared by the query and each database sequence. The tuple size (ktup) is normally 4 to 6 for nucleic acids and 1 or 2 for amino acids. As a consequence, FastA cannot detect matches shorter than the tuple size. FastA is generally considered more sensitive than Blast, but somewhat slower. The most sensitive search uses the Smith-Waterman algorithm, which is much slower still. Psi-Blast, which iterates the database search using profiles, also offers increased sensitivity.
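
The word search idea can be illustrated with a minimal sketch. It is not the FastA implementation: the function names `build_ktup_index` and `diagonal_hits` are hypothetical, and the real programs go on to rescore and join the best diagonals, which is omitted here. The sketch only shows why a shared stretch shorter than ktup produces no hit at all.

```python
from collections import defaultdict


def build_ktup_index(seq, ktup):
    """Map every word of length ktup in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        index[seq[i:i + ktup]].append(i)
    return index


def diagonal_hits(query, library, ktup):
    """Count exact word matches per diagonal (library offset minus query offset)."""
    index = build_ktup_index(query, ktup)
    diagonals = defaultdict(int)
    for j in range(len(library) - ktup + 1):
        for i in index.get(library[j:j + ktup], ()):
            diagonals[j - i] += 1
    return dict(diagonals)


# Three overlapping 4-letter words match on the same diagonal.
print(diagonal_hits("ACGTAC", "TTACGTACGG", 4))  # {2: 3}

# The two sequences share only "ACG": invisible at ktup=4, found at ktup=3.
print(diagonal_hits("CCACGAA", "TTACGGG", 4))    # {}
print(diagonal_hits("CCACGAA", "TTACGGG", 3))    # {0: 1}
```

Lowering ktup finds shorter shared words at the cost of more lookups and more chance hits, which is the speed versus sensitivity trade-off described above.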

FastA:


Programs and Documentation

FastA through the Web (Documentation)
FastA at the EBI, FastA at Virginia, FastA at DDBJ
Smith-Waterman searches at the EBI
ssearch3 fast, "true" S-W blitz (Compugen)
GCG FastA Documentation
FastA, SSearch, TFastA, TFastX, FastX, FastA parsable output

Programs in the FastA suite (from the FastA documentation)
fasta3 Compare a protein sequence to a protein sequence database or a DNA sequence to a DNA sequence database using the FASTA algorithm. Search speed and selectivity are controlled with the ktup (word size) parameter. For protein comparisons, ktup=2 by default; ktup=1 is more sensitive but slower. For DNA comparisons, ktup=6 by default; ktup=3 or ktup=4 provides higher sensitivity; ktup=1 should be used for oligonucleotides (DNA query lengths < 20).
fastx3, fasty3 Compare a DNA sequence to a protein sequence database, by comparing the translated DNA sequence in three frames and allowing gaps and frame shifts. fastx3 uses a simpler, faster algorithm for alignments that allows frame shifts only between codons; fasty3 is slower but produces better alignments with poor quality sequences because frame shifts are allowed within codons.
tfastx3, tfasty3 Compare a protein sequence to a DNA sequence database, calculating similarities with frame shifts to the forward and reverse orientations.
tfasta3 Compare a protein sequence to a DNA sequence database, calculating similarities (without frame shifts) to the three forward and three reverse reading frames. tfastx3 and tfasty3 are preferred because they calculate similarity over frame shifts.
ssearch3 Compare a protein sequence to a protein sequence database or a DNA sequence to a DNA sequence database using the Smith-Waterman algorithm. ssearch3 is about 10 times slower than fasta3, but is more sensitive for full-length protein sequence comparison.
fastf3, tfastf3 Compare an ordered peptide mixture, as would be obtained by Edman degradation of a CNBr cleavage of a protein, against a protein (fastf) or DNA (tfastf) database.
fasts3, tfasts3 Compare a set of short peptide fragments, as would be obtained from mass spectrometry analysis of a protein, against a protein (fasts) or DNA (tfasts) database.
prss3 Evaluates the significance of pairwise similarity scores using a Monte Carlo analysis. Similarity scores for the two sequences are calculated, and then the second sequence is shuffled 200 to 1000 times and compared with the first sequence. prss3 can use one of two shuffling strategies. One strategy simply keeps the amino acid composition of the entire shuffled sequence identical to the unshuffled sequence. The second, local shuffle, destroys the order but preserves the composition of small (10 - 25 residue) segments of the shuffled sequence.
lfasta Local similarity searches showing local alignments.
lalign A rigorous local sequence alignment program that will display the N-best local alignments. It incorporates the "sim" algorithm described by Huang and Miller (1991) Adv. Appl. Math. 12:337-357.
plalign A version of lalign that plots the local alignments to a PostScript file.
flalign A version of lalign that plots the local alignments to a GCG Figure file.
prdf2 Calculates the probability of a similarity score more accurately by using a fit to an extreme value distribution.
align Optimal global alignment of two sequences.
align0 As align, but does not penalize end gaps.
randseq Produces a randomly shuffled sequence from a query sequence.
crandseq Produces a randomly shuffled DNA sequence keeping the codons intact.
aacomp Calculates the amino acid composition and molecular weight of a sequence.
bestscor Calculates the best self-comparison score.
fromgb Converts from GenBank LOCUS format to Pearson/FASTA format.
grease Kyte-Doolittle hydropathicity profile.
tgrease Graphic plot of the Kyte-Doolittle profile.
garnier A secondary structure prediction program using the method of Garnier, Osguthorpe, and Robson, J. Mol. Biol. (1978) 120:97-120.
zs_exp Takes a z-score (mean 50, s.d. 10) and converts it to an expectation value using the extreme value distribution and the size of the database.
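
The z-score conversion described for zs_exp can be sketched as follows. This is an illustration under stated assumptions, not the zs_exp source: it assumes the standard extreme value (Gumbel) distribution with zero mean and unit variance applied to the standardized score, and the names `zscore_to_expect` and `db_size` are hypothetical.

```python
import math

# Constants of the unit-variance, zero-mean Gumbel distribution (assumed form).
EULER_GAMMA = 0.5772156649015329
SCALE = math.pi / math.sqrt(6.0)


def zscore_to_expect(zscore, db_size):
    """Expected number of database sequences scoring at least zscore by chance.

    zscore  -- score on the mean 50, s.d. 10 scale
    db_size -- number of sequences in the database
    """
    z = (zscore - 50.0) / 10.0                      # standardize to mean 0, s.d. 1
    tail = math.exp(-SCALE * z - EULER_GAMMA)       # Gumbel exponent
    p_value = -math.expm1(-tail)                    # 1 - exp(-tail), stable for small tail
    return db_size * p_value


for zs in (50, 80, 120):
    print(zs, zscore_to_expect(zs, 100000))
```

Under this form the expectation scales linearly with the database size, so the same z-score is less significant when it comes from a search of a larger database.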



Last Modified 23/04/00 by BRUHK support; Copyright © 1999 The University of Hong Kong