
Class TemplateSearcher

Take a sequence and return a list of files of nonredundant PDB homologues (selecting for best resolution).

Instance Methods
  __init__(self, outFolder='.', clusterLimit=20, verbose=1, log=None, silent=0)
  prepareFolders(self)
Create folders needed by this class.
[ {'pdb':str, 'chain':str } ] getSequenceIDs(self, blast_records)
Extract sequence ids (pdb codes and chain ID) from BlastParser result.
{ str: Bio.Fasta.Record } fastaFromIds(self, db, id_lst, fastaOut=None)
Fetch fasta records for a list of PDB codes and chain IDs.
open file handle getLocalPDB(self, id, db_path=settings.pdb_path)
Get the coordinate file from a local pdb database.
open file handle getRemotePDB(self, id, rcsb_url=settings.rcsb_url)
Get the coordinate file remotely from the RCSB.
{'resolution':float } __extractPDBInfos(self, handle)
Extract extra information from a PDB file.
[str] retrievePDBs(self, outFolder=None, pdbCodes=None)
Get PDB from the local database if it exists; if not, try to download the coordinates from the RCSB.
Bio.Fasta.Record selectFasta(self, ids_in_cluster)
Select one member of a cluster of sequences.
  reportClustering(self, raw=None)
Report the clustering result.
{str: str} saveClustered(self, outFolder=None)
Copy best PDB of each cluster into another folder.

Inherited from SequenceSearcher.SequenceSearcher: clusterFasta, clusterFastaIterative, copyClusterOut, fastaRecordFromId, getClusteredRecords, getRecords, localBlast, localPSIBlast, remoteBlast, writeClusteredBlastResult, writeFasta, writeFastaAll, writeFastaClustered


Class Variables
  NMR_RESOLUTION = 3.5
  F_RESULT_FOLDER = '/templates'
  F_FASTA_ALL = F_RESULT_FOLDER+ '/all.fasta'
  F_FASTA_NR = F_RESULT_FOLDER+ '/nr.fasta'
  F_CLUSTER_RAW = F_RESULT_FOLDER+ '/cluster_raw.out'
  F_CLUSTER_LOG = F_RESULT_FOLDER+ '/cluster_result.out'
  F_BLAST_OUT = F_RESULT_FOLDER+ '/blast.out'
  F_CLUSTER_BLAST_OUT = F_RESULT_FOLDER+ '/cluster_blast.out'
  F_ALL = F_RESULT_FOLDER+ '/all'
  F_NR = F_RESULT_FOLDER+ '/nr'
  F_CHAIN_INDEX = '/chain_index.txt'

Inherited from SequenceSearcher.SequenceSearcher: F_FASTA_TARGET
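
The path constants all begin with a slash, which suggests they are appended to the project folder passed as outFolder. A minimal sketch of that composition, assuming plain string concatenation; the helper result_path is hypothetical and not part of Biskit:

```python
# Values copied from the class variables listed above.
F_RESULT_FOLDER = '/templates'
F_FASTA_NR = F_RESULT_FOLDER + '/nr.fasta'
F_NR = F_RESULT_FOLDER + '/nr'
F_CHAIN_INDEX = '/chain_index.txt'


def result_path(out_folder, *constants):
    """Append one or more path constants to the project folder.

    Hypothetical helper: assumes the constants are joined to the
    folder by plain string concatenation.
    """
    # Drop a trailing slash so the result never contains '//'.
    return out_folder.rstrip('/') + ''.join(constants)


print(result_path('/data/project', F_FASTA_NR))
print(result_path('/data/project', F_NR, F_CHAIN_INDEX))
```

Under this assumption, the index file written next to the nonredundant templates would sit at <outFolder>/templates/nr/chain_index.txt.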


Method Details

__init__(self, outFolder='.', clusterLimit=20, verbose=1, log=None, silent=0)
(Constructor)

Parameters:
  • outFolder (str) - project folder; results are put into a subfolder (default: '.')
  • clusterLimit (int) - maximal number of returned sequence clusters (default: 20)
  • verbose (1|0) - keep temporary files (default: 1)
  • log (LogFile) - log file instance; if None, STDOUT is used (default: None)
  • silent (1|0) - don't print messages to STDOUT (default: 0)
Overrides: SequenceSearcher.SequenceSearcher.__init__

prepareFolders(self)

Create folders needed by this class.
Overrides: SequenceSearcher.SequenceSearcher.prepareFolders

getSequenceIDs(self, blast_records)

Extract sequence ids (pdb codes and chain ID) from BlastParser result.
Parameters:
  • blast_records (Bio.Blast.Record.Blast) - result from BlastParser
Returns: [ {'pdb':str, 'chain':str } ]
list of dictionaries mapping pdb codes and chain IDs
Raises:
Overrides: SequenceSearcher.SequenceSearcher.getSequenceIDs
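
The kind of extraction described here can be sketched on plain hit titles. This is an illustration only: it assumes titles carry NCBI-style identifiers of the form pdb|1ABC|A, and it works on strings rather than on Bio.Blast.Record.Blast objects. The pattern and the function name are hypothetical, not taken from Biskit:

```python
import re

# Assumed identifier layout: 'pdb|<4-character code>|<optional chain ID>'.
EX_PDB_ID = re.compile(r'pdb\|([A-Za-z0-9]{4})\|([A-Za-z0-9]?)')


def ids_from_titles(titles):
    """Collect PDB codes and chain IDs from a list of hit titles.

    Returns a list of dictionaries shaped like the documented result,
    [ {'pdb': str, 'chain': str} ].
    """
    result = []
    for title in titles:
        for pdb, chain in EX_PDB_ID.findall(title):
            # PDB codes are upper-cased so that the same entry found in
            # different hits compares equal.
            result.append({'pdb': pdb.upper(), 'chain': chain})
    return result


hits = ['pdb|1ABC|A some kinase', 'gi|123|pdb|2xyz|B another protein']
print(ids_from_titles(hits))
```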

fastaFromIds(self, db, id_lst, fastaOut=None)

Use:
  fastaFromIds( db, id_lst, fastaOut ) -> { str: Bio.Fasta.Record }
Parameters:
  • db (str) - database name
  • id_lst ([{'pdb':str, 'chain':str}]) - list of dictionaries with pdb codes and chain IDs
Returns: { str: Bio.Fasta.Record }
Dictionary mapping pdb codes to Bio.Fasta.Records. The returned records have an additional field: chain.
Overrides: SequenceSearcher.SequenceSearcher.fastaFromIds

getLocalPDB(self, id, db_path=settings.pdb_path)

Get the coordinate file from a local pdb database.
Parameters:
  • id (str) - pdb code, 4 characters
  • db_path (str) - path to local pdb database (default: settings.pdb_path)
Returns: open file handle
the requested pdb file as a file handle
Raises:

getRemotePDB(self, id, rcsb_url=settings.rcsb_url)

Get the coordinate file remotely from the RCSB.
Parameters:
  • id (str) - pdb code, 4 characters
  • rcsb_url (str) - template url for pdb download (default: settings.rcsb_url)
Returns: open file handle
the requested pdb file as a file handle
Raises:

__extractPDBInfos(self, handle)

Extract extra information from a PDB file. NMR files get resolution 3.5.
Parameters:
  • handle (open file handle OR strings) - open file handle OR string of file to examine
Returns: {'resolution':float }
pdb file as list of strings, dictionary with resolution
Raises:
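
One way to read a resolution out of PDB header lines is sketched below. It assumes the standard "REMARK   2 RESOLUTION." record and detects NMR entries through the EXPDTA record. The function name and the regular expression are illustrative and may differ from what the class does internally:

```python
import re

NMR_RESOLUTION = 3.5  # value of the class variable of the same name

# Matches e.g. 'REMARK   2 RESOLUTION.    2.00 ANGSTROMS.'
EX_RESOLUTION = re.compile(r'^REMARK\s+2\s+RESOLUTION\.\s*([0-9]+\.?[0-9]*)')


def extract_resolution(lines):
    """Return {'resolution': float} if a resolution can be determined.

    NMR entries without a numeric resolution get NMR_RESOLUTION.
    The result is empty if nothing can be determined.
    """
    infos = {}
    is_nmr = False
    for line in lines:
        if line.startswith('EXPDTA') and 'NMR' in line:
            is_nmr = True
        match = EX_RESOLUTION.match(line)
        if match:
            infos['resolution'] = float(match.group(1))
    if is_nmr and 'resolution' not in infos:
        infos['resolution'] = NMR_RESOLUTION
    return infos


xray = ['EXPDTA    X-RAY DIFFRACTION',
        'REMARK   2 RESOLUTION.    2.00 ANGSTROMS.']
nmr = ['EXPDTA    SOLUTION NMR',
       'REMARK   2 RESOLUTION. NOT APPLICABLE.']
print(extract_resolution(xray))
print(extract_resolution(nmr))
```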

retrievePDBs(self, outFolder=None, pdbCodes=None)

Get PDB from the local database if it exists; if not, try to download the coordinates from the RCSB. Write PDBs for given fasta records. Add PDB infos to the internal dictionary of fasta records. NMR structures get resolution 3.5.
Parameters:
  • outFolder (str OR None) - folder to put PDB files into (default: F_ALL)
  • pdbCodes ([str]) - list of PDB codes (default: all previously found templates)
Returns: [str]
list of PDB file names
Raises:
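
The local-first, remote-second lookup can be sketched with two interchangeable fetch functions. The sketch assumes a failed local lookup signals itself with an IOError, which this page does not confirm. The stand-in functions below replace getLocalPDB and getRemotePDB so the example runs without a PDB mirror or network access:

```python
import io


def fetch_pdb(code, get_local, get_remote):
    """Return (handle, source), trying the local database first.

    get_local and get_remote are callables taking a PDB code and
    returning an open file handle. IOError as the failure signal of
    the local lookup is an assumption.
    """
    try:
        return get_local(code), 'local'
    except IOError:
        return get_remote(code), 'remote'


# Hypothetical stand-ins for the two retrieval methods.
def fake_local(code):
    if code != '1ABC':
        raise IOError('%s not found in local database' % code)
    return io.StringIO(u'HEADER    LOCAL COPY\n')


def fake_remote(code):
    return io.StringIO(u'HEADER    REMOTE COPY\n')


for pdb_code in ['1ABC', '2XYZ']:
    handle, source = fetch_pdb(pdb_code, fake_local, fake_remote)
    print(pdb_code, source, handle.readline().strip())
```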

selectFasta(self, ids_in_cluster)

Select one member of a cluster of sequences.
Parameters:
  • ids_in_cluster ([str]) - list of sequence ids defining the cluster
Returns: Bio.Fasta.Record
Bio.Fasta.Record
Overrides: SequenceSearcher.SequenceSearcher.selectFasta
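
Since the class description speaks of selecting for best resolution, the choice within a cluster can be pictured as taking the member with the smallest resolution value. The sketch below makes that assumption explicit. It works on a plain dictionary of resolutions rather than on Bio.Fasta.Record objects, and select_best is a hypothetical name:

```python
def select_best(ids_in_cluster, resolutions):
    """Return the id with the lowest (that is, best) resolution.

    resolutions maps sequence id -> resolution in Angstrom. On ties,
    the id that appears first in ids_in_cluster wins.
    """
    return min(ids_in_cluster, key=lambda seq_id: resolutions[seq_id])


resolutions = {'1ABC': 2.1, '2XYZ': 1.6, '3NMR': 3.5}
print(select_best(['1ABC', '2XYZ', '3NMR'], resolutions))
```

With NMR structures fixed at 3.5, a crystal structure is preferred whenever its resolution is better than that value.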

reportClustering(self, raw=None)

Report the clustering result.

Writes:
Parameters:
  • raw (1|0) - write raw clustering result to disk (default: None)
Overrides: SequenceSearcher.SequenceSearcher.reportClustering

saveClustered(self, outFolder=None)

Copy best PDB of each cluster into another folder. Create index file in same folder. The returned dictionary or index file (F_CHAIN_INDEX) is used as input to TemplateCleaner.
Parameters:
  • outFolder (str OR None) - folder to write files to (default: F_NR)
Returns: {str: str}
{ str_filename : str_chain_id }, file names and chain ids
Raises:
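
To make the shape of the returned dictionary concrete, the sketch below builds a { file name : chain id } mapping and serialises it as text. The file naming scheme (<code>.pdb) and the tab-separated layout are assumptions made for illustration. The actual format of the F_CHAIN_INDEX file is not documented on this page:

```python
def build_chain_index(best_records, out_folder):
    """Map the PDB file name of each cluster representative to its chain.

    best_records: list of {'pdb': str, 'chain': str}, one per cluster.
    The '<folder>/<code>.pdb' naming is hypothetical.
    """
    return {'%s/%s.pdb' % (out_folder, rec['pdb']): rec['chain']
            for rec in best_records}


def format_chain_index(index):
    """Serialise the mapping as 'file name<TAB>chain id' lines (assumed layout)."""
    return ''.join('%s\t%s\n' % (name, index[name]) for name in sorted(index))


best = [{'pdb': '1ABC', 'chain': 'A'}, {'pdb': '2XYZ', 'chain': 'B'}]
index = build_chain_index(best, './templates/nr')
print(format_chain_index(index))
```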

Class Variable Details

NMR_RESOLUTION

Value:
3.5                                                                    
      

F_RESULT_FOLDER

Value:
'/templates'                                                           
      

F_FASTA_ALL

Value:
F_RESULT_FOLDER+ '/all.fasta'                                          
      

F_FASTA_NR

Value:
F_RESULT_FOLDER+ '/nr.fasta'                                           
      

F_CLUSTER_RAW

Value:
F_RESULT_FOLDER+ '/cluster_raw.out'                                    
      

F_CLUSTER_LOG

Value:
F_RESULT_FOLDER+ '/cluster_result.out'                                 
      

F_BLAST_OUT

Value:
F_RESULT_FOLDER+ '/blast.out'                                          
      

F_CLUSTER_BLAST_OUT

Value:
F_RESULT_FOLDER+ '/cluster_blast.out'                                  
      

F_ALL

Value:
F_RESULT_FOLDER+ '/all'                                                
      

F_NR

Value:
F_RESULT_FOLDER+ '/nr'                                                 
      

F_CHAIN_INDEX

Value:
'/chain_index.txt'