Package Biskit :: Package Mod :: Module CheckIdentities :: Class Check_Identities

Class Check_Identities

This class groups the methods used to prevent the use of templates with too high a percentage of sequence identity.

Instance Methods
  __init__(self, outFolder='.', verbose=1)
[str] get_lines(self, aln_file=None)
Retrieve the lines from an aln file
int search_length(self, string_lines)
This function returns the length of an alignment in the file 'final.pir_aln'
dict get_aln_sequences(self, string_lines, aln_length)
Create a dictionary with the name of the target (i.e. 'target') and its sequence, taken from the final T-Coffee output (final.pir_aln).
dict get_aln_templates(self, string_lines, aln_dict, aln_length)
Add information about the name of the template sequences and the sequence that is aligned to the template.
dict identities(self, aln_dictionary)
Create a dictionary that contains information about all the alignments in the aln_dictionary using pairwise comparisons.
  __writeId(self, name, dic, key, description)
Write a sequence identity matrix to file.
  output_identities(self, aln_dictionary, identities_file=None, identities_info_file=None, identities_cov_file=None)
Writes three files to disk with identity info about the current multiple alignment.
  go(self, output_folder=None)
Perform sequence comparison.

Class Variables
  F_INPUT_FOLDER = '/t_coffee'
  F_INPUT_ALNS = F_INPUT_FOLDER+ '/final.pir_aln'
  F_OUTPUT_IDENTITIES = '/identities.out'
  F_OUTPUT_IDENTITIES_INF = '/identities_info.out'
  F_OUTPUT_IDENTITIES_COV = '/identities_cov.out'
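
Each constant begins with a path separator, which suggests that the file names are appended to a base folder by plain string concatenation. A minimal sketch under that assumption follows; the helper `input_alignment_path` and the folder name `my_project` are hypothetical and not part of Biskit.

```python
# Constants copied from the class variables listed above.
F_INPUT_FOLDER = '/t_coffee'
F_INPUT_ALNS = F_INPUT_FOLDER + '/final.pir_aln'


def input_alignment_path(out_folder='.'):
    """Hypothetical helper: build the alignment path by concatenation.

    os.path.join is deliberately avoided here: because the constant
    starts with a separator, join would discard out_folder.
    """
    return out_folder + F_INPUT_ALNS


print(input_alignment_path('my_project'))  # my_project/t_coffee/final.pir_aln
```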

Method Details

__init__(self, outFolder='.', verbose=1)
(Constructor)

Parameters:
  • outFolder (str) - base folder
  • verbose (1|0) - write intermediary files (default: 1)

get_lines(self, aln_file=None)

Retrieve the lines from an aln file
Parameters:
  • aln_file - aln file to read (default: None)
Returns: [str]
file as list of strings

search_length(self, string_lines)

This function returns the length of an alignment in the file 'final.pir_aln'
Parameters:
  • string_lines ([str]) - aln file as list of strings
Returns: int
length of alignment

get_aln_sequences(self, string_lines, aln_length)

Create a dictionary with the name of the target (i.e. 'target') and its sequence, taken from the final T-Coffee output (final.pir_aln).
Parameters:
  • string_lines ([str]) - aln file as list of strings
  • aln_length (int) - length of alignment
Returns: dict
{'name': 'target', 'seq': 'sequence of the target'}

get_aln_templates(self, string_lines, aln_dict, aln_length)

Add information about the name of the template sequences and the sequence that is aligned to the template. Data taken from the T-Coffee alignment (final.pir_aln).
Parameters:
  • string_lines ([str]) - aln file as list of strings
  • aln_dict (dict) - alignment dictionary
  • aln_length (int) - length of alignment
Returns: dict
dictionary of dictionaries, each of the form {'name': str, 'seq': str}
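
The parsing steps described above (read the lines, determine the alignment length, collect the target and template sequences) can be illustrated with a small standalone parser. This is a sketch, not the Biskit implementation: it assumes the standard PIR layout (a `>P1;<name>` header, one description line, then sequence lines terminated by `*`), and the function `parse_pir` and the template name `1abc_A` are hypothetical.

```python
def parse_pir(lines):
    """Return {name: {'name': name, 'seq': aligned_sequence}}.

    Assumed layout per record: '>P1;<name>' header, one description
    line, then one or more sequence lines, the last ending in '*'.
    """
    records = {}
    name = None
    skip_description = False
    chunks = []
    for raw in lines:
        line = raw.strip()
        if line.startswith('>'):
            # Header: keep the part after the ';' as the sequence name.
            name = line.split(';', 1)[1] if ';' in line else line[1:]
            skip_description = True
            chunks = []
        elif name is not None and skip_description:
            # The description line carries no sequence data.
            skip_description = False
        elif name is not None and line:
            chunks.append(line.rstrip('*'))
            if line.endswith('*'):
                records[name] = {'name': name, 'seq': ''.join(chunks)}
                name = None
    return records


example = [
    '>P1;target',
    'sequence',
    'MKV-LA',
    'GT*',
    '>P1;1abc_A',
    'structure',
    'MRVQLA',
    '-T*',
]
alignment = parse_pir(example)
aln_length = len(alignment['target']['seq'])
print(aln_length)  # 8
```

In a multiple alignment every record has the same length once gaps are included, so the length of any one record can serve as the alignment length.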

identities(self, aln_dictionary)

Create a dictionary that contains information about all the alignments in the aln_dictionary using pairwise comparisons.
Parameters:
  • aln_dictionary (dict) - alignment dictionary
Returns: dict
a dictionary of dictionaries with the sequence name as the top key. Each sub dictionary then has the keys:
  • 'name' - str, sequence name
  • 'seq' - str, sequence of
  • 'template_info' - list of the same length as the 'key' sequence excluding deletions; each entry is the number of sequences in the multiple alignment that contain information at that position.
  • 'ID' - dict, sequence identity in percent comparing the 'key' sequence to all other sequences (excluding deletions)
  • 'info_ID' - dict, same as 'ID' but compared to the template sequence length (i.e. excluding deletions and insertions in the 'key' sequence)
  • 'cov_ID' - dict, same as 'info_ID' but insertions are defined by comparison to all template sequences (i.e. where 'template_info' is zero)
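
The difference between the three identity values is easiest to see on a toy alignment. The sketch below is one reading of the definitions above, not the Biskit code: it assumes that '-' marks a gap, that 'ID' divides by all residues of the key sequence, that 'info_ID' divides only by positions where the compared sequence also has a residue, and that 'cov_ID' divides by positions where at least one other sequence has a residue. The function names are hypothetical.

```python
GAP = '-'


def template_info(key_seq, other_seqs):
    """Per key residue: how many other sequences have a residue there."""
    return [sum(1 for s in other_seqs if s[i] != GAP)
            for i, res in enumerate(key_seq) if res != GAP]


def pairwise_identities(key_seq, other_seq, other_seqs):
    """Percent identity of key_seq against other_seq, three ways."""
    matches = n_key = n_both = n_cov = 0
    for i, res in enumerate(key_seq):
        if res == GAP:
            continue                      # deletion in the key sequence
        n_key += 1
        if other_seq[i] != GAP:
            n_both += 1                   # not an insertion vs. other_seq
        if any(s[i] != GAP for s in other_seqs):
            n_cov += 1                    # covered by at least one sequence
        if res == other_seq[i]:
            matches += 1

    def percent(n):
        return 100.0 * matches / n if n else 0.0

    return {'ID': percent(n_key),
            'info_ID': percent(n_both),
            'cov_ID': percent(n_cov)}


target = 'MKV-LAGT'
templates = ['MRVQLA-T', 'MKV-L--T']

info = template_info(target, templates)
result = pairwise_identities(target, templates[1], templates)
print(info)
print(result)
```

Here the target matches the second template at every position the two share, so 'info_ID' is 100 percent, while 'ID' and 'cov_ID' are lower because they also count target residues that this template does not cover.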

__writeId(self, name, dic, key, description)

Write a sequence identity matrix to file.
Parameters:
  • name (str) - file name
  • dic (dict) - alignment dictionary
  • key (key) - key in dictionary to write
  • description (str) - description to go into file (first line)
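
As an illustration of how these four parameters fit together, the following sketch writes one identity matrix with the description on the first line. The tab-separated layout, the one-decimal formatting, the function name `write_identity_matrix`, and the sample values are all assumptions made for the example; the actual file format produced by Biskit is not specified here.

```python
import os
import tempfile


def write_identity_matrix(name, dic, key, description):
    """Hypothetical writer: description line, header row, one row per sequence."""
    names = sorted(dic)
    with open(name, 'w') as out:
        out.write(description + '\n')
        out.write('\t'.join([''] + names) + '\n')
        for row in names:
            # dic[row][key] maps the other sequence names to a percentage.
            cells = ['%.1f' % dic[row][key].get(col, 0.0) for col in names]
            out.write('\t'.join([row] + cells) + '\n')


# Made-up values, only to show the shape of the dictionary.
aln = {
    'target': {'ID': {'target': 100.0, '1abc_A': 71.4}},
    '1abc_A': {'ID': {'target': 83.3, '1abc_A': 100.0}},
}

path = os.path.join(tempfile.mkdtemp(), 'identities.out')
write_identity_matrix(path, aln, 'ID', 'Sequence identity in percent')

with open(path) as handle:
    lines = handle.read().splitlines()
print('\n'.join(lines))
```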

output_identities(self, aln_dictionary, identities_file=None, identities_info_file=None, identities_cov_file=None)

Writes three files to disk with identity info about the current multiple alignment.
Parameters:
  • aln_dictionary (dict) - alignment dictionary
  • identities_file (str) - name for file with sequence identity in percent comparing a sequence to another (excluding deletions in the first sequence) (default: None -> F_OUTPUT_IDENTITIES)
  • identities_info_file (str) - name for file with sequence identity in percent comparing a sequence to another (excluding deletions and insertions in the first sequence) (default: None -> F_OUTPUT_IDENTITIES_INF)
  • identities_cov_file (str) - name for file with sequence identity in percent comparing a sequence to another (excluding deletions and insertions in the first sequence but only when the first sequence doesn't match any other sequence in the multiple alignment) (default: None -> F_OUTPUT_IDENTITIES_COV)

go(self, output_folder=None)

Perform sequence comparison.
Parameters:
  • output_folder (str) - output folder

Class Variable Details

F_INPUT_FOLDER

Value:
'/t_coffee'

F_INPUT_ALNS

Value:
F_INPUT_FOLDER+ '/final.pir_aln'

F_OUTPUT_IDENTITIES

Value:
'/identities.out'

F_OUTPUT_IDENTITIES_INF

Value:
'/identities_info.out'

F_OUTPUT_IDENTITIES_COV

Value:
'/identities_cov.out'