Package Biskit :: Package Mod :: Module NCBIStandalone

Module NCBIStandalone

This module provides code to work with the standalone version of BLAST, either blastall or blastpgp, provided by the NCBI. http://www.ncbi.nlm.nih.gov/BLAST/

Classes:
LowQualityBlastError     Exception that indicates low quality query sequences.
BlastParser              Parses output from blast.
BlastErrorParser         Parses output and tries to diagnose possible errors.
PSIBlastParser           Parses output from psi-blast.
Iterator                 Iterates over a file of blast results.

_Scanner                 Scans output from standalone BLAST.
_BlastConsumer           Consumes output from blast.
_PSIBlastConsumer        Consumes output from psi-blast.
_HeaderConsumer          Consumes header information.
_DescriptionConsumer     Consumes description information.
_AlignmentConsumer       Consumes alignment information.
_HSPConsumer             Consumes hsp information.
_DatabaseReportConsumer  Consumes database report information.
_ParametersConsumer      Consumes parameters information.

Functions:
blastall                 Execute blastall.
blastpgp                 Execute blastpgp.
rpsblast                 Execute rpsblast.

Classes
  LowQualityBlastError
Error caused by running a low quality sequence through BLAST.
  ShortQueryBlastError
Error caused by running a short query sequence through BLAST.
  _Scanner
Scan BLAST output from blastall or blastpgp.
  BlastParser
Parses BLAST data into a Record.Blast object.
  PSIBlastParser
Parses BLAST data into a Record.PSIBlast object.
  _HeaderConsumer
  _DescriptionConsumer
  _AlignmentConsumer
  _HSPConsumer
  _DatabaseReportConsumer
  _ParametersConsumer
  _BlastConsumer
  _PSIBlastConsumer
  Iterator
Iterates over a file of multiple BLAST results.
  _BlastErrorConsumer
  BlastErrorParser
Attempt to catch and diagnose BLAST errors while parsing.

Functions
  blastall(blastcmd, program, database, infile, **keywds)
Execute and retrieve data from blastall.
  blastpgp(blastcmd, database, infile, **keywds)
Execute and retrieve data from blastpgp.
  rpsblast(blastcmd, database, infile, **keywds)
Execute and retrieve data from standalone RPS-BLAST.
  _re_search(regex, line, error_msg)
  _get_cols(line, cols_to_get, ncols=None, expected={})
  _safe_int(str)
  _safe_float(str)

Function Details

blastall(blastcmd, program, database, infile, **keywds)

Execute and retrieve data from blastall.  blastcmd is the command
used to launch the 'blastall' executable.  program is the blast program
to use, e.g. 'blastp', 'blastn', etc.  database is the path to the database
to search against.  infile is the path to the file containing
the sequence to search with.

You may pass more parameters to **keywds to change the behavior of
the search.  Otherwise, optional values will be chosen by blastall.

    Scoring
matrix              Matrix to use.
gap_open            Gap open penalty.
gap_extend          Gap extension penalty.
nuc_match           Nucleotide match reward.  (BLASTN)
nuc_mismatch        Nucleotide mismatch penalty.  (BLASTN)
query_genetic_code  Genetic code for Query.
db_genetic_code     Genetic code for database.  (TBLAST[NX])

    Algorithm
gapped              Whether to do a gapped alignment. T/F (not for TBLASTX)
expectation         Expectation value cutoff.
wordsize            Word size.
strands             Query strands to search against database.  ([T]BLAST[NX])
keep_hits           Number of best hits from a region to keep.
xdrop               Dropoff value (bits) for gapped alignments.
hit_extend          Threshold for extending hits.
region_length       Length of region used to judge hits.
db_length           Effective database length.
search_length       Effective length of search space.

    Processing
filter              Filter query sequence?  T/F
believe_query       Believe the query defline.  T/F
restrict_gi         Restrict search to these GI's.
nprocessors         Number of processors to use.

    Formatting
html                Produce HTML output?  T/F
descriptions        Number of one-line descriptions.
alignments          Number of alignments.
align_view          Alignment view.  Integer 0-6.
show_gi             Show GI's in deflines?  T/F
seqalign_file       seqalign file to output.

Returns:
(read, error): two UndoHandle objects, one for reading the blastall output and one for its error messages.
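
The keyword interface above can be pictured as a lookup from keyword names to the switches of the legacy blastall program. The following is a minimal sketch of that idea, not the module's actual implementation: the helper name build_blastall_args and the small subset of keywords it handles are assumptions made for illustration. The switches themselves (-p, -d, -i, -e, -M and so on) are those of the legacy NCBI blastall command line.

```python
# Hypothetical helper, for illustration only; it is not part of NCBIStandalone.
# Maps a few of the keywords documented above to legacy blastall switches.
BLASTALL_SWITCHES = {
    'matrix': '-M',
    'gap_open': '-G',
    'gap_extend': '-E',
    'expectation': '-e',
    'wordsize': '-W',
    'filter': '-F',
    'nprocessors': '-a',
    'descriptions': '-v',
    'alignments': '-b',
    'align_view': '-m',
}


def build_blastall_args(blastcmd, program, database, infile, **keywds):
    """Return the argument list for a blastall run (sketch)."""
    args = [blastcmd, '-p', program, '-d', database, '-i', infile]
    # Sort the keywords so the resulting command line is reproducible.
    for name in sorted(keywds):
        if name not in BLASTALL_SWITCHES:
            raise ValueError('unsupported keyword in this sketch: %s' % name)
        args.extend([BLASTALL_SWITCHES[name], str(keywds[name])])
    return args


args = build_blastall_args('blastall', 'blastp', 'nr', 'query.fasta',
                           expectation=0.001, matrix='BLOSUM62')
print(' '.join(args))
```

Keywords that are not passed never reach the command line, which is why blastall falls back to its own defaults for them.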

blastpgp(blastcmd, database, infile, **keywds)

Execute and retrieve data from blastpgp.  blastcmd is the command
used to launch the 'blastpgp' executable.  database is the path to the
database to search against.  infile is the path to the file containing
the sequence to search with.

You may pass more parameters to **keywds to change the behavior of
the search.  Otherwise, optional values will be chosen by blastpgp.

    Scoring
matrix              Matrix to use.
gap_open            Gap open penalty.
gap_extend          Gap extension penalty.
window_size         Multiple hits window size.
npasses             Number of passes.
passes              Hits/passes.  Integer 0-2.

    Algorithm
gapped              Whether to do a gapped alignment.  T/F
expectation         Expectation value cutoff.
wordsize            Word size.
keep_hits           Number of best hits from a region to keep.
xdrop               Dropoff value (bits) for gapped alignments.
hit_extend          Threshold for extending hits.
region_length       Length of region used to judge hits.
db_length           Effective database length.
search_length       Effective length of search space.
nbits_gapping       Number of bits to trigger gapping.
pseudocounts        Pseudocounts constants for multiple passes.
xdrop_final         X dropoff for final gapped alignment.
xdrop_extension     Dropoff for blast extensions.
model_threshold     E-value threshold to include in multipass model.
required_start      Start of required region in query.
required_end        End of required region in query.

    Processing
XXX should document default values
program             The blast program to use. (PHI-BLAST)
filter              Filter query sequence with SEG?  T/F
believe_query       Believe the query defline?  T/F
nprocessors         Number of processors to use.

    Formatting
html                Produce HTML output?  T/F
descriptions        Number of one-line descriptions.
alignments          Number of alignments.
align_view          Alignment view.  Integer 0-6.
show_gi             Show GI's in deflines?  T/F
seqalign_file       seqalign file to output.
align_outfile       Output file for alignment.
checkpoint_outfile  Output file for PSI-BLAST checkpointing.
restart_infile      Input file for PSI-BLAST restart.
hit_infile          Hit file for PHI-BLAST.
matrix_outfile      Output file for PSI-BLAST matrix in ASCII.
align_infile        Input alignment file for PSI-BLAST restart.

Returns:
(read, error): two UndoHandle objects, one for reading the blastpgp output and one for its error messages.
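
Many of the options above are T/F switches, and an iterated PSI-BLAST search usually combines them with the multipass settings. The sketch below collects such options into a dictionary that could be passed as **keywds. The helper names tf and psiblast_keywords are hypothetical, and passing the literal strings 'T' and 'F' is an assumption based on the T/F notation used in the tables.

```python
def tf(flag):
    """Convert a Python truth value to the 'T'/'F' notation used above."""
    return 'T' if flag else 'F'


def psiblast_keywords(npasses, model_threshold, checkpoint_outfile=None,
                      filter_query=True):
    """Collect keyword arguments for an iterated blastpgp search (sketch)."""
    if npasses < 1:
        raise ValueError('npasses must be at least 1')
    keywds = {
        'npasses': npasses,
        'model_threshold': model_threshold,
        'filter': tf(filter_query),
    }
    # Only request a checkpoint file when the caller names one.
    if checkpoint_outfile is not None:
        keywds['checkpoint_outfile'] = checkpoint_outfile
    return keywds


keywds = psiblast_keywords(3, 0.001, checkpoint_outfile='query.chk')
# The dictionary would then be passed on, for example (not executed here):
#   out, err = NCBIStandalone.blastpgp(blastcmd, database, infile, **keywds)
print(sorted(keywds.items()))
```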

rpsblast(blastcmd, database, infile, **keywds)

Execute and retrieve data from standalone RPS-BLAST.  blastcmd is the
command used to launch the 'rpsblast' executable.  database is the path
to the database to search against.  infile is the path to the file
containing the sequence to search with.

You may pass more parameters to **keywds to change the behavior of
the search.  Otherwise, optional values will be chosen by rpsblast.

Please note that this function will give XML output by default, by
setting align_view to seven (i.e. command line option -m 7).
You should use the NCBIXML.BlastParser() to read the resulting output.
This is because NCBIStandalone.BlastParser() does not understand the
plain text output format from rpsblast.

WARNING - The following text and associated parameter handling has not
received extensive testing.  Please report any errors we might have made...

    Algorithm/Scoring
gapped              Whether to do a gapped alignment.  T/F
multihit            0 for multiple hit (default), 1 for single hit
expectation         Expectation value cutoff.
range_restriction   Range restriction on query sequence (Format: start,stop) blastp only
                    0 in 'start' refers to the beginning of the sequence
                    0 in 'stop' refers to the end of the sequence
                    Default = 0,0
xdrop               Dropoff value (bits) for gapped alignments.
xdrop_final         X dropoff for final gapped alignment (in bits).
xdrop_extension     Dropoff for blast extensions (in bits).
search_length       Effective length of search space.
nbits_gapping       Number of bits to trigger gapping.
protein             Query sequence is protein.  T/F
db_length           Effective database length.

    Processing
filter              Filter query sequence with SEG?  T/F
case_filter         Use lower case filtering of FASTA sequence T/F, default F
believe_query       Believe the query defline.  T/F
nprocessors         Number of processors to use.
logfile             Name of log file to use, default rpsblast.log

    Formatting
html                Produce HTML output?  T/F
descriptions        Number of one-line descriptions.
alignments          Number of alignments.
align_view          Alignment view.  Integer 0-9.
show_gi             Show GI's in deflines?  T/F
seqalign_file       seqalign file to output.
align_outfile       Output file for alignment.

Returns:
(read, error): two UndoHandle objects, one for reading the rpsblast output and one for its error messages.
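
Because rpsblast returns two handles and produces XML by default, a caller will typically read both handles and confirm that the output looks like XML before passing it to an XML parser. The sketch below shows one way to do that. read_xml_output is a hypothetical helper, and in-memory io.StringIO objects stand in for the handles that rpsblast would return.

```python
import io


def read_xml_output(out_handle, err_handle):
    """Return the XML text from a BLAST run, or raise if it looks wrong (sketch)."""
    # Drain the (usually large) output first, then the error stream.
    text = out_handle.read()
    errors = err_handle.read().strip()
    if errors:
        raise RuntimeError('BLAST reported: %s' % errors)
    if not text.lstrip().startswith('<?xml'):
        raise ValueError('expected XML output (align_view 7)')
    return text


# Stand-ins for the handles returned by rpsblast().
out = io.StringIO('<?xml version="1.0"?>\n<BlastOutput></BlastOutput>\n')
err = io.StringIO('')
xml_text = read_xml_output(out, err)
print(xml_text.splitlines()[0])
```

The check on the leading '<?xml' is only a cheap sanity test. It would reject plain-text reports, which is the case the note above warns about.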
