
Class Hmmer

Search the Pfam hmm database with Hmmer and retrieve conservation scores for a model.

Instance Methods
  __init__(self, hmmdb=hmmDatabase, verbose=1, log=StdLog())
  checkHmmdbIndex(self)
Checks whether the hmm database has been indexed; if not, indexing will be done.
dict, [list] searchHmmdb(self, target, noSearch=None)
Search the hmm database with a sequence in fasta format.
dict selectMatches(self, matches, hits, score_cutoff=60, eValue_cutoff=1e-8)
Select which hmm profiles to use, based on score and e-value cutoffs.
dict getHmmProfile(self, hmmName)
Get the hmm profile with name hmmName from the hmm database.
str, str, int, [int] align(self, model, hits)
Performs the alignment. If there is more than one hit with the profile, the sequence will be subdivided and the alignment will be performed on each part.
str, str, [int] removeGapInSeq(self, fasta, hmm)
Removes positions corresponding to insertions in the search sequence.
str OR None mergeHmmSeq(self, seq1, seq2)
Merges two sequence files into one.
str, str subAlign(self, file)
Align a fasta-formatted sequence to the hmm profile.
dict castHmmDic(self, hmmDic, repete, hmmGap, key)
Expand hmmDic to the number of repeats of the profile used.
[float] matchScore(self, fastaSeq, hmmSeq, profileDic, key)
Get the match emission score for each residue in the search sequence.
array, dict __score(self, model, key, hmmNames=None)
Get match emission scores for the search sequence.
  scoreAbsSum(self, model, hmmNames=None)
  scoreMaxAll(self, model, hmmNames=None)
  scoreEntropy(self, model, hmmNames=None)
  __list2array(self, lstOrAr)
  mergeProfiles(self, p0, p1, maxOverlap=3)
Merge profile p0 with profile p1, as long as they overlap in at most maxOverlap positions.
  cleanup(self)
Remove temporary files.

Method Details

__init__(self, hmmdb=hmmDatabase, verbose=1, log=StdLog())
(Constructor)

Parameters:
  • hmmdb (str) - Pfam hmm database
  • verbose (1|0) - verbosity level (default: 1)
  • log (Biskit.LogFile) - log file for messages (default: STDOUT)

checkHmmdbIndex(self)

Checks whether the hmm database has been indexed; if not, indexing will be done.
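The index check can be sketched in a few lines. This assumes the HMMER 2 convention that hmmindex writes its SSI index next to the database as `<hmmdb>.ssi`; the helper names index_command and check_hmmdb_index are hypothetical and not part of Biskit, and the actual test Biskit performs may differ.

```python
import os
import subprocess
from typing import List, Optional


def index_command(hmmdb: str) -> Optional[List[str]]:
    """Return the command needed to index hmmdb, or None if it is already indexed.

    Assumption: an existing index is recognised by the HMMER 2 convention
    that hmmindex writes its SSI index to '<hmmdb>.ssi'.
    """
    if os.path.exists(hmmdb + '.ssi'):
        return None
    return ['hmmindex', hmmdb]


def check_hmmdb_index(hmmdb: str) -> None:
    """Index hmmdb only if no index file is found (needs hmmindex on PATH)."""
    cmd = index_command(hmmdb)
    if cmd is not None:
        subprocess.run(cmd, check=True)
```

Separating the decision (index_command) from the side effect keeps the check testable without HMMER installed.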

searchHmmdb(self, target, noSearch=None)

Search the hmm database with a sequence in fasta format. If the profile names have already been provided, skip the search and only write the temporary sequence files.
Parameters:
  • target (PDBModel or fasta file) - sequence
  • noSearch (1 OR None) - don't perform a search
Returns: dict, [list]
dictionary with profile names as keys and a list of lists containing information about the range where the profile matches the sequence

selectMatches(self, matches, hits, score_cutoff=60, eValue_cutoff=1e-8)

Select which hmm profiles to use, based on score and e-value cutoffs.
Parameters:
  • matches (dict) - output from searchHmmdb
  • hits ([list]) - output from searchHmmdb
  • score_cutoff (float) - cutoff value for an acceptable score
  • eValue_cutoff (float) - cutoff value for an acceptable e-value
Returns: dict
{hmm_name : [[start,stop],[..]]}
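A minimal sketch of the cutoff filter, assuming that a profile is kept when at least one of its hits passes both the score and the e-value cutoff. The layout of hits used here (one (name, score, e-value) tuple per hit) and the profile names are hypothetical; the real searchHmmdb output may be structured differently.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical hit record: (profile name, bit score, e-value).
Hit = Tuple[str, float, float]


def select_matches(matches: Dict[str, List[List[int]]],
                   hits: Sequence[Hit],
                   score_cutoff: float = 60,
                   eValue_cutoff: float = 1e-8) -> Dict[str, List[List[int]]]:
    """Keep the profiles with at least one hit passing both cutoffs."""
    accepted = {name for name, score, evalue in hits
                if score >= score_cutoff and evalue <= eValue_cutoff}
    # Result has the documented shape {hmm_name: [[start, stop], ...]}
    return {name: ranges for name, ranges in matches.items() if name in accepted}


matches = {'ProfileA': [[10, 120]], 'ProfileB': [[130, 200]]}
hits = [('ProfileA', 85.0, 1e-20), ('ProfileB', 40.0, 1e-5)]
print(select_matches(matches, hits))  # {'ProfileA': [[10, 120]]}
```

Requiring both cutoffs is the conservative choice; relaxing either default lets weaker profiles through.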

getHmmProfile(self, hmmName)

Get the hmm profile with name hmmName from the hmm database. Extract some information about the profile as well as the match state emission scores. Keys of the returned dictionary:
 'AA', 'name', 'NrSeq', 'emmScore', 'accession',
 'maxAllScale', 'seqNr', 'profLength', 'ent', 'absSum'
Parameters:
  • hmmName (str) - hmm profile name
Returns: dict
dictionary with various information about the profile
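The key names 'absSum' and 'ent' suggest per-position conservation measures derived from the match state emission scores. The sketch below shows one plausible way to compute them and is an assumption throughout: scores are treated as log2-odds against a uniform background (so each probability is proportional to 2**score), which simplifies what real Pfam profiles use. Lower entropy means a more conserved position.

```python
import math
from typing import Dict, List, Sequence


def abs_sum(scores: Sequence[float]) -> float:
    """Sum of absolute match emission scores at one profile position."""
    return sum(abs(s) for s in scores)


def entropy_from_logodds(scores: Sequence[float]) -> float:
    """Shannon entropy (bits) of the emission distribution at one position.

    Assumption: scores are log2-odds against a uniform background, so
    p_i is proportional to 2**s_i; probabilities are renormalised to sum to 1.
    """
    weights = [2.0 ** s for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def profile_scores(emm: Sequence[Sequence[float]]) -> Dict[str, List[float]]:
    """Per-position 'absSum' and 'ent' values for a whole profile."""
    return {'absSum': [abs_sum(pos) for pos in emm],
            'ent': [entropy_from_logodds(pos) for pos in emm]}
```

A flat position (all 20 scores equal) gives the maximum entropy log2(20); a position dominated by one residue gives a value close to 0.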

align(self, model, hits)

Performs the alignment. If there is more than one hit with the profile, the sequence will be subdivided and the alignment will be performed on each part. A final merged profile will be returned.
Parameters:
  • model (PDBModel) - model
  • hits ([[int,int]]) - list with matching sections from searchHmmdb
Returns: str, str, int, [int]
fastaSeq hmmSeq repete hmmGap:
          fastaSeq - sequence
          hmmSeq - matching positions in profile
          repete - number of repeats of the profile
          hmmGap - list with gaps (deletions in search sequence) for
                   each repeat
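The subdivision step can be sketched as cutting the sequence into one part per hit and writing each part as a fasta record for hmmalign. This assumes hit ranges are 1-based and inclusive (the convention HMMER uses when reporting sequence coordinates); split_by_hits and to_fasta are hypothetical helpers, not Biskit methods.

```python
from typing import List, Sequence, Tuple


def split_by_hits(seq: str, hits: Sequence[Sequence[int]]) -> List[Tuple[int, str]]:
    """Cut seq into one sub-sequence per hit.

    Assumes 1-based, inclusive [start, stop] ranges. Returns
    (0-based offset, sub-sequence) pairs sorted by position, so the
    per-part alignments can later be merged back in order.
    """
    parts = []
    for start, stop in sorted(hits):
        if not 1 <= start <= stop <= len(seq):
            raise ValueError(f"hit range {start}-{stop} outside sequence")
        parts.append((start - 1, seq[start - 1:stop]))
    return parts


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a fasta record with fixed line width."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{name}"] + lines) + "\n"
```

The number of parts returned corresponds to the repete value in the return tuple above.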

removeGapInSeq(self, fasta, hmm)

Removes positions corresponding to insertions in the search sequence.
Parameters:
  • fasta (str) - search sequence
  • hmm (str) - sequence, matching hmm positions
Returns: str, str, [int]
search sequence and profile match sequence with insertions and deletions removed and a list with the deleted positions
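A sketch of the gap removal, assuming the two aligned strings have equal length, that '.' in the hmm string marks a residue inserted relative to the profile, and that '-' in the search sequence marks a profile position with no residue (a deletion). These gap characters and the 0-based positions are assumptions; the hmmalign output parsed by subAlign may use other symbols.

```python
from typing import List, Tuple


def remove_gap_in_seq(fasta: str, hmm: str,
                      ins_char: str = '.', del_char: str = '-') -> Tuple[str, str, List[int]]:
    """Drop insertions and deletions; report the deleted profile positions."""
    if len(fasta) != len(hmm):
        raise ValueError("aligned strings must have equal length")
    seq_out, hmm_out, deleted = [], [], []
    profile_pos = 0  # 0-based index of the current match state
    for s, h in zip(fasta, hmm):
        if h == ins_char:
            # Residue inserted relative to the profile: no match state, drop it.
            continue
        if s == del_char:
            # Match state without a residue in the search sequence.
            deleted.append(profile_pos)
        else:
            seq_out.append(s)
            hmm_out.append(h)
        profile_pos += 1
    return ''.join(seq_out), ''.join(hmm_out), deleted
```

After this step every remaining residue lines up with exactly one match state, which is what the later score lookup relies on.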

mergeHmmSeq(self, seq1, seq2)

Merges two sequence files into one. Multiple hits with one profile must not overlap; an overlap is treated as an error.
Parameters:
  • seq1 (str) - sequence
  • seq2 (str) - sequence
Returns: str OR None
merged sequence or None
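One way to picture the merge: two equal-length strings, each filled with gap characters outside its own hit, are combined position by position, and any position claimed by both counts as an overlap. The gap character '-' and the choice to return None on overlap are assumptions consistent with the documented "str OR None" return type, not confirmed Biskit behaviour.

```python
from itertools import zip_longest
from typing import Optional


def merge_hmm_seq(seq1: str, seq2: str, gap: str = '-') -> Optional[str]:
    """Merge two gap-padded sequences; return None if they overlap."""
    merged = []
    # Pad the shorter string with gaps so both are read to the end.
    for a, b in zip_longest(seq1, seq2, fillvalue=gap):
        if a != gap and b != gap:
            return None  # overlap: both hits claim this position
        merged.append(a if a != gap else b)
    return ''.join(merged)
```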

subAlign(self, file)

Align a fasta-formatted sequence to the hmm profile.
Parameters:
  • file (str) - path to hmmalign output file
Returns: str, str
alignment and matching hmm positions with gaps

castHmmDic(self, hmmDic, repete, hmmGap, key)

Expand hmmDic to the number of repeats of the profile used, and correct the scores for possible deletions in the search sequence.
Parameters:
  • hmmDic (dict) - dictionary from getHmmProfile
  • repete (int) - repeat information from align
  • hmmGap ([int]) - information about gaps from align
  • key (str) - name of the scoring method to adjust for gaps and repeats
Returns: dict
dictionary with information about the profile
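The expansion can be sketched as repeating the per-position values stored under key once per repeat, dropping in each copy the profile positions that were deleted in that part of the search sequence. The documented type of hmmGap ([int]) does not say how gaps are grouped; this sketch assumes one list of 0-based deleted positions per repeat, and cast_hmm_dic is a hypothetical stand-in, not Biskit's code.

```python
import copy
from typing import Dict, List, Sequence


def cast_hmm_dic(hmm_dic: Dict, repete: int,
                 hmm_gap: Sequence[Sequence[int]], key: str) -> Dict:
    """Return a copy of hmm_dic with hmm_dic[key] expanded over all repeats."""
    if len(hmm_gap) != repete:
        raise ValueError("expected one gap list per repeat")
    result = copy.deepcopy(hmm_dic)  # leave the input dictionary untouched
    scores = hmm_dic[key]
    cast: List[float] = []
    for gaps in hmm_gap:
        skip = set(gaps)
        # One copy of the profile per repeat, minus the deleted positions.
        cast.extend(s for i, s in enumerate(scores) if i not in skip)
    result[key] = cast
    return result
```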

matchScore(self, fastaSeq, hmmSeq, profileDic, key)

Get the match emission score for each residue in the search sequence.
Parameters:
  • fastaSeq (str) - search sequence
  • hmmSeq (str) - sequence, matching hmm positions
  • profileDic (dict) - from castHmmDic
  • key (str) - name of scoring method
Returns: [float]
list of emission scores for the sequence
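Once insertions and deletions are removed and the profile has been cast over all repeats, residue i corresponds to profile position i, so the lookup reduces to indexing. The sketch assumes that 'emmScore' holds one list of scores per match state ordered like the alphabet under 'AA', and that other keys (for example 'ent') hold one value per position; match_score is hypothetical and uses hmmSeq only to verify the alignment.

```python
from typing import Dict, List


def match_score(fasta_seq: str, hmm_seq: str, profile_dic: Dict, key: str) -> List[float]:
    """Return one score per residue of the gap-free search sequence."""
    if len(fasta_seq) != len(hmm_seq):
        raise ValueError("fastaSeq and hmmSeq must be aligned position by position")
    values = profile_dic[key]
    if len(values) < len(fasta_seq):
        raise ValueError("profile is shorter than the matched sequence")
    if key == 'emmScore':
        # Assumed layout: one list of emission scores per match state,
        # ordered like the alphabet stored under 'AA'.
        alphabet = profile_dic['AA']
        return [values[i][alphabet.index(res.upper())]
                for i, res in enumerate(fasta_seq)]
    # Per-position measures apply to the residue at that position.
    return [values[i] for i in range(len(fasta_seq))]
```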

__score(self, model, key, hmmNames=None)

Get match emission scores for the search sequence. If profile name(s) are provided, no search will be performed.
  • If the names and positions of the hmm profiles are NOT provided, a search is performed -> score (array), hmmNames (dictionary)
  • If the names and positions of the hmm profiles are provided, NO search is performed -> score (array), hmmNames (dictionary, same as input)
Parameters:
  • model (PDBModel) - model
  • key (str) - name of scoring method
  • hmmNames (str OR None) - profile name OR None (default: None)
Returns: array, dict
score, hmmNames

scoreAbsSum(self, model, hmmNames=None)

scoreMaxAll(self, model, hmmNames=None)

scoreEntropy(self, model, hmmNames=None)

__list2array(self, lstOrAr)

mergeProfiles(self, p0, p1, maxOverlap=3)

Merge profile p0 with profile p1, as long as they overlap in at most maxOverlap positions.
Parameters:
  • p0 ([float]) - profile
  • p1 ([float]) - profile
  • maxOverlap (int) - maximal allowed overlap between profiles
Returns: array
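A sketch of the merge rule, assuming both profiles span the full sequence and hold 0 at positions they do not cover. Counting overlap as positions where both are non-zero, keeping the larger value inside the overlap, and raising ValueError when the overlap exceeds maxOverlap are all assumptions; the sketch also returns a plain list where Biskit returns an array.

```python
from typing import List, Sequence


def merge_profiles(p0: Sequence[float], p1: Sequence[float],
                   maxOverlap: int = 3) -> List[float]:
    """Combine two per-residue profiles that may overlap only slightly."""
    if len(p0) != len(p1):
        raise ValueError("profiles must span the same sequence length")
    overlap = sum(1 for a, b in zip(p0, p1) if a != 0 and b != 0)
    if overlap > maxOverlap:
        raise ValueError(f"profiles overlap in {overlap} positions (max {maxOverlap})")
    # Outside the overlap one value is 0, so a + b picks the covered one;
    # inside the overlap keep the larger score.
    return [max(a, b) if a != 0 and b != 0 else a + b for a, b in zip(p0, p1)]
```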

cleanup(self)

Remove temporary files.