Package Biskit :: Module match2seq

Module match2seq

Matches two sequences against each other, deleting all positions that differ. compareModels() compares the sequences of two structures and returns a residue mask for each of them.

Classes
  Test
Test case

Functions
[str] sequenceList(model)
Extracts a one letter amino acid sequence list
[tuples] getOpCodes(seq_1, seq_2)
Compares two sequences and returns a list with the information needed to convert the first sequence into the second.
[tuple], [tuple] getSkipLists(seqDiff)
Extracts information about which residues have to be removed from sequence 1 (delete code) and sequence 2 (insert code).
[tuple], [tuple] getEqualLists(seqDiff)
Extract information about regions in the sequences that are equal.
str repeateInMatch(deletedRes)
Takes a string and returns a list of strings containing internal repeats.
[str],[int] delete(seqAA, seqNr, delList)
Takes an amino acid list and a position list and deletes positions according to the information given in delList.
[str], [int] getEqual(seqAA, seqNr, equalList)
Gets only the positions of the sequences that are equal according to the OpCodes.
[str], [int], [str], [int] iterate(seqAA_1, seqNr_1, seqAA_2, seqNr_2, del_1, del_2)
Delete residues until no more deletions are indicated by the sequence matcher.
([1|0...],[1|0...]) compareModels(model_1, model_2)
Initiates comparison of the sequences of two structure objects and returns two equal sequence lists (new_seqAA_1 and new_seqAA_2 should be identical) and the corresponding residue position lists.

Function Details

sequenceList(model)

Extracts a one letter amino acid sequence list
Parameters:
  • model (PDBModel) - model
Returns: [str]
sequence ['A','G','R','T',.....]

getOpCodes(seq_1, seq_2)

Compares two sequences and returns a list with the information needed to convert the first sequence into the second.
Parameters:
  • seq_1 ([ str ]) - list of single letters
  • seq_2 ([ str ]) - list of single letters
Returns: [tuples]
Opcodes from difflib, as 5-tuples (tag, i1, i2, j1, j2):
        [('delete', 0, 1, 0, 0), ('equal', 1, 4, 0, 3),
         ('insert', 4, 4, 3, 4), ('equal', 4, 180, 4, 180)]
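
The opcode format above can be reproduced with the standard library alone. The following is a minimal sketch, not the module's own code: the name get_opcodes_sketch and the choice of autojunk=False are assumptions made for illustration.

```python
import difflib


def get_opcodes_sketch(seq_1, seq_2):
    """Return difflib opcodes describing how to turn seq_1 into seq_2.

    Hypothetical helper; each opcode is (tag, i1, i2, j1, j2), where
    seq_1[i1:i2] is replaced by, equal to, or aligned with seq_2[j1:j2].
    """
    # autojunk=False switches off difflib's "popular element" heuristic.
    # This setting is an assumption for the sketch.
    matcher = difflib.SequenceMatcher(None, seq_1, seq_2, autojunk=False)
    return matcher.get_opcodes()


seq_1 = list("MAGRT")   # extra residue at the start
seq_2 = list("AGRKT")   # extra residue before the last position
opcodes = get_opcodes_sketch(seq_1, seq_2)
for code in opcodes:
    print(code)
# ('delete', 0, 1, 0, 0)
# ('equal', 1, 4, 0, 3)
# ('insert', 4, 4, 3, 4)
# ('equal', 4, 5, 4, 5)
```

A 'delete' marks residues present only in the first sequence, an 'insert' marks residues present only in the second.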

getSkipLists(seqDiff)

Extracts information about which residues have to be removed from sequence 1 (delete code) and sequence 2 (insert code). Returns deletion codes in the format (start_pos, length).
Parameters:
  • seqDiff ([tuples]) - opcodes
Returns: [tuple], [tuple]
Lists of tuples containing regions of the sequences that should be deleted. Example:
 strucDel_1 = [(0, 1), (180, 4)]
 strucDel_2 = [(3, 1), (207, 4)]
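
The conversion from opcodes to (start_pos, length) codes might look roughly like this. It is a sketch under assumptions: the name get_skip_lists_sketch is hypothetical, and treating a 'replace' opcode as a deletion on both sides is a guess that the description above does not confirm.

```python
def get_skip_lists_sketch(opcodes):
    """Turn difflib opcodes into (start_pos, length) regions to remove.

    Hypothetical helper. 'delete' regions belong to sequence 1 and
    'insert' regions to sequence 2.
    """
    del_1, del_2 = [], []
    for tag, i1, i2, j1, j2 in opcodes:
        # Assumption: 'replace' is removed from both sequences.
        if tag in ("delete", "replace"):
            del_1.append((i1, i2 - i1))
        if tag in ("insert", "replace"):
            del_2.append((j1, j2 - j1))
    return del_1, del_2


opcodes = [("delete", 0, 1, 0, 0), ("equal", 1, 4, 0, 3),
           ("insert", 4, 4, 3, 4), ("equal", 4, 180, 4, 180)]
del_1, del_2 = get_skip_lists_sketch(opcodes)
print(del_1)  # [(0, 1)]
print(del_2)  # [(3, 1)]
```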

getEqualLists(seqDiff)

Extracts information about regions in the sequences that are equal. Returns the regions in the format (start_pos, length).
Parameters:
  • seqDiff ([tuples]) - opcodes
Returns: [tuple], [tuple]
Lists of tuples containing regions of the sequences that are equal. Example:
 strucEqual_1 = [(0, 216)]
 strucEqual_2 = [(0, 216)]
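
As an illustration, the equal regions can be collected from the 'equal' opcodes alone. The name get_equal_lists_sketch is hypothetical and this is only a sketch of the idea, not the module's source.

```python
def get_equal_lists_sketch(opcodes):
    """Collect (start_pos, length) regions tagged 'equal' by difflib.

    Hypothetical helper; returns one list per sequence.
    """
    equal_1, equal_2 = [], []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            equal_1.append((i1, i2 - i1))
            equal_2.append((j1, j2 - j1))
    return equal_1, equal_2


equal_1, equal_2 = get_equal_lists_sketch([("equal", 0, 216, 0, 216)])
print(equal_1)  # [(0, 216)]
print(equal_2)  # [(0, 216)]
```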

repeateInMatch(deletedRes)

Takes a string and returns a list of strings containing internal repeats.
Parameters:
  • deletedRes (str) - string of one letter amino acid sequence
Returns: str
smallest self-repeat. Example:
           'GSGS'      returns  'GS'
           'ABABABAB'  returns  'ABAB' and 'AB'
           'FFF'       returns  'F'
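
One way to obtain the repeat units shown in the example is to test every prefix that could tile the whole string. This is a sketch under that assumption; the name internal_repeats_sketch is hypothetical and the module may detect repeats differently.

```python
def internal_repeats_sketch(deleted_res):
    """Return all repeat units that tile deleted_res, longest first.

    Hypothetical helper: a unit qualifies if repeating it two or more
    times reproduces the input string exactly.
    """
    repeats = []
    n = len(deleted_res)
    for size in range(n // 2, 0, -1):
        unit = deleted_res[:size]
        if n % size == 0 and unit * (n // size) == deleted_res:
            repeats.append(unit)
    return repeats


print(internal_repeats_sketch("GSGS"))      # ['GS']
print(internal_repeats_sketch("ABABABAB"))  # ['ABAB', 'AB']
print(internal_repeats_sketch("FFF"))       # ['F']
```

A string without internal repeats, such as 'AGRT', yields an empty list in this sketch.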

delete(seqAA, seqNr, delList)

  1) Takes an amino acid list and a position list and deletes positions according to the information given in delList.
  2) Furthermore, compares the deleted sequence with the following and preceding sequence. If they are identical, these residues are also deleted.
  3) If the deleted sequence contains internal repeats (as in the sequence 'GSGS') and 2) does not apply, the preceding and following sequence is also scanned for the repeated unit (here 'GS').
Parameters:
  • seqAA ([str]) - list with the amino acid sequence in one letter code
  • seqNr ([int]) - list with the amino acid positions
  • delList ([tuple]) - list of residues to be deleted (position, length)
Returns: [str],[int]
sequence and positions, example:
          seqAA - ['A','G','R','T',.....]
          seqNr - [ 0,  1,  2,  3 ,.....]
          delList - [(0, 2), (180, 4)]
          ->  seqAA - ['R','T',.....]
              seqNr - [ 2,  3 ,.....]
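
The plain deletion step can be sketched as follows. This covers only the removal of the listed regions, not the comparison with neighbouring sequence or the repeat scan. The name delete_sketch is hypothetical, and the sketch assumes that each position in delList is an index into the current lists.

```python
def delete_sketch(seq_aa, seq_nr, del_list):
    """Remove (position, length) regions from both lists in parallel.

    Hypothetical helper; the inputs are copied, not modified in place.
    """
    seq_aa, seq_nr = list(seq_aa), list(seq_nr)
    # Work from the back so that earlier indices stay valid.
    for start, length in sorted(del_list, reverse=True):
        del seq_aa[start:start + length]
        del seq_nr[start:start + length]
    return seq_aa, seq_nr


new_aa, new_nr = delete_sketch(list("AGRTKL"), [0, 1, 2, 3, 4, 5],
                               [(0, 2), (4, 1)])
print(new_aa)  # ['R', 'T', 'L']
print(new_nr)  # [2, 3, 5]
```

Keeping the position list in step with the sequence list is what later allows the surviving residues to be mapped back onto the original structure.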
    

getEqual(seqAA, seqNr, equalList)

Gets only the positions of the sequences that are equal according to the OpCodes. This should not be necessary but might be useful to skip 'replace' OpCodes.
Parameters:
  • seqAA ([str]) - list with the amino acid sequence in one letter code
  • seqNr ([int]) - list with the amino acid positions
  • equalList ([tuple], [tuple]) - lists of tuples containing regions of the sequences that are equal
Returns: [str], [int]
lists of amino acids and positions where equal
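
A minimal sketch of the selection, assuming a single list of (start_pos, length) regions for one sequence is passed in. The name get_equal_sketch is hypothetical.

```python
def get_equal_sketch(seq_aa, seq_nr, equal_list):
    """Keep only the residues that fall inside the given equal regions.

    Hypothetical helper; equal_list holds (start_pos, length) tuples
    that index into seq_aa and seq_nr.
    """
    kept_aa, kept_nr = [], []
    for start, length in equal_list:
        kept_aa.extend(seq_aa[start:start + length])
        kept_nr.extend(seq_nr[start:start + length])
    return kept_aa, kept_nr


kept_aa, kept_nr = get_equal_sketch(list("AGRTKL"), [0, 1, 2, 3, 4, 5],
                                    [(1, 3), (5, 1)])
print(kept_aa)  # ['G', 'R', 'T', 'L']
print(kept_nr)  # [1, 2, 3, 5]
```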

iterate(seqAA_1, seqNr_1, seqAA_2, seqNr_2, del_1, del_2)

Deletes residues until no more deletions are indicated by the sequence matcher. Returns the final sequences.
Parameters:
  • seqAA_1 ([str]) - list with the amino acid sequence in one letter code
  • seqNr_1 ([int]) - list with the amino acid positions
  • seqAA_2 ([str]) - list with the amino acid sequence in one letter code
  • seqNr_2 ([int]) - list with the amino acid positions
  • del_1 ([tuple]) - list of tuples containing regions of the sequence that should be deleted
  • del_2 ([tuple]) - list of tuples containing regions of the sequence that should be deleted
Returns: [str], [int], [str], [int]
the final sequence and position lists
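
The delete-and-rematch loop can be sketched with difflib as shown here. All names (skip_lists, drop, iterate_sketch) are hypothetical, 'replace' opcodes are assumed to be deleted on both sides, and the neighbour and repeat handling of delete() is left out.

```python
import difflib


def skip_lists(seq_1, seq_2):
    """Return (start_pos, length) regions to drop from each sequence."""
    matcher = difflib.SequenceMatcher(None, seq_1, seq_2, autojunk=False)
    del_1, del_2 = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            del_1.append((i1, i2 - i1))
        if tag in ("insert", "replace"):
            del_2.append((j1, j2 - j1))
    return del_1, del_2


def drop(seq_aa, seq_nr, del_list):
    """Remove the listed regions from copies of both lists, back to front."""
    seq_aa, seq_nr = list(seq_aa), list(seq_nr)
    for start, length in sorted(del_list, reverse=True):
        del seq_aa[start:start + length]
        del seq_nr[start:start + length]
    return seq_aa, seq_nr


def iterate_sketch(aa_1, nr_1, aa_2, nr_2, del_1, del_2):
    """Delete and re-match until the matcher reports nothing to delete."""
    while del_1 or del_2:
        aa_1, nr_1 = drop(aa_1, nr_1, del_1)
        aa_2, nr_2 = drop(aa_2, nr_2, del_2)
        del_1, del_2 = skip_lists(aa_1, aa_2)
    return aa_1, nr_1, aa_2, nr_2


aa_1, aa_2 = list("MAGRT"), list("AGRKT")
nr_1, nr_2 = list(range(5)), list(range(5))
del_1, del_2 = skip_lists(aa_1, aa_2)
out_aa_1, out_nr_1, out_aa_2, out_nr_2 = iterate_sketch(
    aa_1, nr_1, aa_2, nr_2, del_1, del_2)
print(out_aa_1, out_nr_1)  # ['A', 'G', 'R', 'T'] [1, 2, 3, 4]
print(out_aa_2, out_nr_2)  # ['A', 'G', 'R', 'T'] [0, 1, 2, 4]
```

Every pass with a non-empty deletion list removes at least one residue, so the loop ends once both sequences are identical.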

compareModels(model_1, model_2)

Initiates comparison of the sequences of two structure objects and returns two equal sequence lists (new_seqAA_1 and new_seqAA_2 should be identical) and the corresponding residue position lists.
Parameters:
  • model_1 (PDBModel) - model
  • model_2 (PDBModel) - model
Returns: ([1|0...],[1|0...])
tuple of atom masks for model_1 and model_2:
         e.g. ( [0001011101111111], [1110000111110111] )
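
How a list of surviving positions turns into a 1|0 mask can be sketched as below. The name positions_to_mask is hypothetical, and the sketch builds one mask entry per original position, which is an assumption about how the returned masks are laid out.

```python
def positions_to_mask(kept_positions, n_positions):
    """Build a 1|0 mask with 1 at every position that was kept.

    Hypothetical helper; n_positions is the length of the original
    sequence before any deletions.
    """
    kept = set(kept_positions)
    return [1 if i in kept else 0 for i in range(n_positions)]


# Positions that survived matching 'MAGRT' against 'AGRKT'
mask_1 = positions_to_mask([1, 2, 3, 4], 5)
mask_2 = positions_to_mask([0, 1, 2, 4], 5)
print(mask_1)  # [0, 1, 1, 1, 1]
print(mask_2)  # [1, 1, 1, 0, 1]
```

Both masks select the same number of positions, so applying each to its own model leaves two sequences of equal length.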