Class PDBModel

Store and manipulate coordinates and atom infos stemming from a PDB file. Coordinates are stored in the Numeric array 'xyz'; the additional atom infos from the PDB are stored in the list of dictionaries 'atoms'. Methods are provided to remove items from both atoms and xyz simultaneously, to restore the PDB file, to get masks for certain atom types/names/residues, to iterate over residues, to sort atoms, etc.

Atom- or residue-related values can be put into 'profile' arrays. See setAtomProfile() and setResProfile(). Any reordering or removal of atoms is also applied to the profiles so that they should always match the current atom/residue order.
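
How profiles stay aligned with the atoms can be pictured with a minimal, Biskit-free sketch. The helper `take_with_profiles` and the profile name `charge` are hypothetical; they only illustrate that one index list is applied to the atoms and to every profile alike, not how the class implements it.

```python
def take_with_profiles(atoms, profiles, indices):
    """Apply one index list to the atoms and to every atom profile (illustration only)."""
    new_atoms = [atoms[i] for i in indices]
    new_profiles = {name: [values[i] for i in indices]
                    for name, values in profiles.items()}
    return new_atoms, new_profiles


atoms = ["N", "CA", "C"]                   # stand-ins for atom dictionaries
profiles = {"charge": [-0.4, 0.1, 0.6]}    # hypothetical atom profile

# Reorder and drop atoms; the profile follows the same indices.
new_atoms, new_profiles = take_with_profiles(atoms, profiles, [2, 0])
```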

The object remembers its source (a PDB file, a PDBModel in memory, or a pickled PDBModel on disc) and keeps track of whether atoms, xyz, or (which) profiles have been changed with respect to the source. The slim() method is called before pickling a PDBModel. It sets the atoms and/or xyz array and any profiles to None if they have not been changed since being read from a source on disc. The change status of xyz and atoms is reported by isChanged() (with respect to the direct source) and isChangedFromDisc() (with respect to the source of the source... just in case). You can trick this mechanism by setting atomsChanged or xyzChanged back to 0 if you want to make only temporary changes that are lost after a call to slim().
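
The bookkeeping described above can be pictured with a small stand-alone sketch. It does not use Biskit: the class `TrackedModel` is hypothetical and only mimics the idea that unchanged data are dropped because they can be reloaded from the source, including the trick of resetting the change flag.

```python
class TrackedModel:
    """Hypothetical stand-in for the change-tracking idea; not Biskit code."""

    def __init__(self, source, xyz):
        self.source = source    # where unchanged data could be reloaded from
        self.xyz = xyz
        self.xyzChanged = 0     # 0: identical to source, 1: modified

    def setXyz(self, xyz):
        """Replace coordinates, mark them as changed, return the old ones."""
        old, self.xyz = self.xyz, xyz
        self.xyzChanged = 1
        return old

    def slim(self):
        """Drop coordinates that are unchanged and recoverable from a source."""
        if self.source is not None and not self.xyzChanged:
            self.xyz = None


dropped = TrackedModel("model.pdb", [[0.0, 0.0, 0.0]])
dropped.slim()                      # unchanged, so xyz is dropped

kept = TrackedModel("model.pdb", [[0.0, 0.0, 0.0]])
kept.setXyz([[1.0, 2.0, 3.0]])
kept.slim()                         # changed, so xyz survives

temporary = TrackedModel("model.pdb", [[0.0, 0.0, 0.0]])
temporary.setXyz([[9.0, 9.0, 9.0]])
temporary.xyzChanged = 0            # declare the change temporary
temporary.slim()                    # the temporary coordinates are lost
```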

Additional infos about the model can be put into a dictionary 'info'.


Instance Methods
  __init__(self, source=None, pdbCode=None, noxyz=0, skipRes=None)
PDBModel() creates an empty Model to which coordinates (field xyz) and PDB infos (field atoms) have still to be added.
  version(self)
  __getstate__(self)
called before pickling the object.
  __setstate__(self, state)
called for unpickling the object.
  __len__(self)
  __defaults(self)
backwards compatibility to earlier pickled models
  update(self, skipRes=None, lookHarder=0, force=0)
Read coordinates, atoms, fileName, etc.
list of int __pdbTer(self, rmOld=0)
Returns list of atom indices that are followed by a TER record (marked with 'after_ter' flag of the next atom by __collectAll).
array setXyz(self, xyz)
Replace coordinates.
list of dict setAtoms(self, atoms)
Replace atoms list of dictionaries.
  setSource(self, source)
array getXyz(self, mask=None)
Get coordinates, fetch from source PDB or pickled PDBModel, if necessary.
list of dict getAtoms(self, mask=None)
Get atom dictionaries, fetch from source PDB or pickled PDBModel, if necessary.
  setResProfile(self, name, prof, mask=None, default=None, asarray=1, comment=None, **moreInfo)
Add/override residue-based profile.
  setAtomProfile(self, name, prof, mask=None, default=None, asarray=1, comment=None, **moreInfo)
Add/override atom-based profile.
array resProfile(self, name, default=None)
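Use: resProfile( profile_name ) -> array 1 x N_res with residue values
array atomProfile(self, name, default=None)
Use: atomProfile( profile_name ) -> array 1 x N_atoms with atom values
  profile(self, name, default=None, lookHarder=0)
Use: profile( name, lookHarder=0) -> atom or residue profile
  profileInfo(self, name, lookHarder=0)
Use: profileInfo( name ) -> dict with infos about profile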
  setProfileInfo(self, name, **args)
Add/override infos about a profile; e.g. ... {'bin':'whatif'})
int removeProfile(self, *names)
Remove residue or atom profile(s)
(1||0, 1||0) isChanged(self)
Tell if xyz or atoms have been changed compared to source file or source object (which can be still in memory).
(1||0, 1||0) isChangedFromDisc(self)
Tell whether xyz and atoms can currently be reconstructed from a source on disc.
int profileChangedFromDisc(self, pname)
Check if profile has changed compared to source.
  __slimProfiles(self)
Remove profiles, that haven't been changed from a direct or indirect source on disc AUTOMATICALLY CALLED BEFORE PICKLING
  slim(self)
Remove xyz array and list of atoms if they haven't been changed and could hence be loaded from the source file (only if there is a source file...).
str or PDBModel or None validSource(self)
Check for a valid source on disk.
str sourceFile(self)
Name of pickled source or PDB file.
  disconnect(self)
Disconnect this model from its source (if any).
str getPdbCode(self)
Return pdb code of model.
  setPdbCode(self, code)
Set model pdb code.
str sequence(self, mask=None, xtable=molUtils.xxDic)
Amino acid sequence in one letter code.
list of atom dictionaries xplor2amber(self, change=1, aatm=1)
Rename atoms so that tleap from Amber can read the PDB.
  writePdb(self, fname, ter=1, amber=0, original=0, left=0, wrap=0, headlines=None, taillines=None)
Save model as PDB file.
list of strings returnPdb(self, out=None, ter=1, headlines=None, taillines=None)
Restore PDB file from (possibly transformed) coordinates and pdb line dictionaries in self.atoms.
  saveAs(self, path)
Pickle this PDBModel to a file, set the 'source' field to this file name and mark atoms, xyz, and profiles as unchanged.
array or list maskF(self, atomFunction, numpy=1)
Create list with result of atomFunction( atom ) for each atom.
array maskCA(self, force=0)
Short cut for mask of all CA atoms.
array maskBB(self, force=0)
Short cut for mask of all backbone atoms.
array maskHeavy(self, force=0)
Short cut for mask of all heavy atoms.
array maskH(self)
Short cut for mask of hydrogens.
array maskCB(self)
Short cut for mask of all CB and CA of GLY.
array maskH2O(self)
Short cut for mask of all atoms in residues named TIP3, HOH and WAT
array maskSolvent(self)
Short cut for mask of all atoms in residues named TIP3, HOH, WAT, Na+, Cl-
array maskHetatm(self)
Short cut for mask of all HETATM
array maskProtein(self, standard=0)
Short cut for mask containing all atoms of amino acids.
Numeric array indices(self, what)
Get atom indices conforming to a condition.
Numeric array mask(self, what, numpy=1)
Get atom mask.
array of int atom2resMask(self, atomMask)
Mask (0) residues for which all atoms are masked (0) in atomMask.
list of int atom2resIndices(self, indices)
Get list of indices of residue for which any atom is in indices.
array of int res2atomMask(self, resMask)
convert residue mask to atom mask.
list of int res2atomIndices(self, indices)
Convert residue indices to atom indices.
list or array res2atomProfile(self, p)
Get an atom profile where each atom has the value its residue has in the residue profile.
list of int atom2chainIndices(self, indices, breaks=0)
Convert atom indices to chain indices.
list of int chain2atomIndices(self, indices, breaks=0)
Convert chain indices into atom indices.
array profile2mask(self, profName, cutoff_min=None, cutoff_max=None)
profile2mask( str_profname, [cutoff_min, cutoff_max=None])
array profile2atomMask(self, profName, cutoff_min=None, cutoff_max=None)
profile2atomMask( str_profname, [cutoff_min, cutoff_max=None]) Same as profile2mask, but converts residue mask to atom mask.
list of int __takeAtomIndices(self, oldI, takeI)
Translate atom positions so that they point to the same atoms after a call to N.take() (if they are kept at all).
  __takeResIndices(self, oldI, takeI)
  concat(self, *models)
Concatenate atoms, coordinates and profiles.
PDBModel take(self, i, deepcopy=0)
All fields of the result model are shallow copies of this model's fields.
  keep(self, i)
Replace atoms, coordinates and profiles of this(!) model with a sub-set.
PDBModel clone(self, deepcopy=0)
Clone PDBModel.
PDBModel compress(self, mask, deepcopy=0)
Compress PDBModel using mask.
array remove(self, what)
Convenience access to the 3 different remove methods.
PDBModel takeChains(self, chainLst, deepcopy=0, breaks=0, maxDist=None)
Get copy of this model with only the given chains.
  addChainFromSegid(self, verbose=1)
Takes the last letter of the segment ID and adds it as chain ID.
  addChainId(self, first_id=None, keep_old=0, breaks=0)
Assign consecutive chain identifiers A - Z to all atoms.
  renumberResidues(self, mask=None, start=1, addChainId=1)
Make all residue numbers consecutive and remove any insertion code letters.
int lenAtoms(self)
Number of atoms in model.
int lenResidues(self)
Number of residues in model.
int lenChains(self, breaks=0, maxDist=None)
Number of chains in model.
list of dictionaries resList(self, mask=None)
Return list of lists of atom dictionaries per residue, which allows iterating over residues and the atoms of each residue.
list of PDBModels resModels(self)
Creates a PDBModel per residue in PDBModel.
list of int resMapOriginal(self, mask=None)
Generate list to map from any atom to its ORIGINAL(!) PDB residue number.
list of int __calcResMap(self, mask=None)
Create a map from each atom in the model to its residue.
list of int resMap(self, mask=None, force=0, cache=1)
Get list to map from any atom to a continuous residue numbering (starting with 0).
list of int resIndex(self, mask=None, force=0, cache=1)
Get the position of each residue's first atom.
list of int resEndIndex(self, mask=None)
Get the position of each residue's last atom.
list of int chainMap(self, breaks=0, maxDist=None)
Get chain index of each atom.
list of int chainIndex(self, breaks=0, maxDist=None)
Get indices of first atom of each chain.
list of int chainBreaks(self, breaks_only=1, maxDist=None)
Identify discontinuities in the molecule's backbone.
  removeRes(self, resname)
Remove all atoms with a certain residue name.
float rms(self, other, mask=None, mask_fit=None, fit=1, n_it=1)
Rmsd between two PDBModels.
array, array transformation(self, refModel, mask=None, n_it=1, z=2, eps_rmsd=0.5, eps_stdv=0.05, profname='rms_outlier')
Get the transformation matrix which least-square fits this model onto the other model.
PDBModel transform(self, *rt)
Transform coordinates of PDBModel.
PDBModel fit(self, refModel, mask=None, n_it=1, z=2, eps_rmsd=0.5, eps_stdv=0.05, profname='rms_outlier')
Least-square fit this model onto refModel.
PDBModel magicFit(self, refModel, mask=None)
Superimpose this model onto a ref.
PDBModel centered(self, mask=None)
Get model with centered coordinates.
(float, float, float) center(self, mask=None)
Geometric center of model.
(float, float, float) centerOfMass(self)
Center of mass of PDBModel.
array of floats masses(self)
Collect the molecular weight of all atoms in PDBModel.
float mass(self)
Molecular weight of PDBModel.
array of float residusMaximus(self, atomValues, mask=None)
Take list of value per atom, return list where all atoms of any residue are set to the highest value of any atom in that residue.
list of int argsort(self, cmpfunc=None)
Prepare sorting atoms within residues according to comparison function.
PDBModel sort(self, sortArg=None, deepcopy=0)
Apply a given sort list to the atoms of this model.
list of int unsort(self, sortList)
Undo a previous sorting on the model itself (no copy).
list of str atomNames(self, start=None, stop=None)
Return a list of atom names from start to stop RESIDUE index
1|0 __testDict_and(self, dic, condition)
Test if all key-value pairs of condition are matched in dic
1|0 __testDict_or(self, dic, condition)
Test if any key-value pairs of condition are matched in dic
list of int filterIndex(self, mode=0, **kw)
Get atom positions that match a combination of key=values.
PDBModel filter(self, mode=0, **kw)
Extract atoms that match a combination of key=values.
list of int equals(self, ref, start=None, stop=None)
Compares the residue and atom sequence in the given range.
(array, array) equalAtoms(self, ref)
Apply to SORTED models without HETATOMS.
([int], [int]) compareAtoms(self, ref)
Get list of atom indices for this and reference model that converts both into 2 models with identical residue and atom content.
float __chainFraction(self, chain, ref)
Look how well a given chain matches a continuous stretch of residues in ref.
([int], [int]) compareChains(self, ref, breaks=0, fractLimit=0.2)
Get list of corresponding chain indices for this and reference model.

Method Details

__init__(self, source=None, pdbCode=None, noxyz=0, skipRes=None)
(Constructor)

  • PDBModel() creates an empty Model to which coordinates (field xyz) and PDB infos (field atoms) have still to be added.
  • PDBModel( file_name ) creates a complete model with coordinates and PDB infos taken from file_name (pdb, pdb.gz, pickled PDBModel)
  • PDBModel( PDBModel ) creates a copy of the given model
  • PDBModel( PDBModel, noxyz=1 ) creates a copy without coordinates
Parameters:
  • source (str or PDBModel) - str, file name of pdb/pdb.gz file OR pickled PDBModel OR PDBModel, template structure to copy atoms/xyz field from
  • pdbCode (str or None) - PDB code, is extracted from file name otherwise
  • noxyz (0||1) - 0 (default) || 1, create without coordinates
Raises:
  • PDBError - if file exists but can't be read

version(self)

__getstate__(self)

called before pickling the object.

__setstate__(self, state)

called for unpickling the object.

__len__(self)
(Length operator)

__defaults(self)

backwards compatibility to earlier pickled models

update(self, skipRes=None, lookHarder=0, force=0)

Read coordinates, atoms, fileName, etc. from PDB or pickled PDBModel - but only if they are currently empty. The atomsChanged and xyzChanged flags are not changed.
Parameters:
  • skipRes (list of str) - names of residues to skip if updating from PDB
  • lookHarder (0|1) - 0(default): update only existing profiles
  • force (0|1) - ignore invalid source (0) or report error (1)
Raises:
  • PDBError - if file can't be unpickled or read

__pdbTer(self, rmOld=0)

Parameters:
  • rmOld (1||0) - 1, remove after_ter=0 flags from all atoms
Returns: list of int
list of atom indices that are followed by a TER record (marked with 'after_ter' flag of the next atom by __collectAll).

setXyz(self, xyz)

Replace coordinates.
Parameters:
  • xyz (array) - Numpy array ( 3 x N_atoms ) of float
Returns: array
N.array( 3 x N_atoms ) or None, old coordinates

setAtoms(self, atoms)

Replace atoms list of dictionaries. self.__terAtoms is re-created from the 'after_ter' records in atoms
Parameters:
  • atoms (list of dictionaries) - list of dictionaries as returned by PDBFile.readLine() (nearly, for differences see __collectAll() )
Returns: list of dict
[ dict ], old atom dictionaries

setSource(self, source)

Parameters:
  • source - LocalPath OR PDBModel OR str

getXyz(self, mask=None)

Get coordinates, fetch from source PDB or pickled PDBModel, if necessary.
Parameters:
  • mask (list of int OR array of 1||0) - atom mask
Returns: array
xyz-coordinates, N.array( 3 x N_atoms, 'f' )

getAtoms(self, mask=None)

Get atom dictionaries, fetch from source PDB or pickled PDBModel, if necessary. See also __collectAll().
Parameters:
  • mask (list of int OR array of 1||0) - atom mask
Returns: list of dict
atom dictionary, list of dictionaries from PDBFile.readline()

setResProfile(self, name, prof, mask=None, default=None, asarray=1, comment=None, **moreInfo)

Add/override residue-based profile.
Parameters:
  • name (str) - name to access profile
  • prof (list OR array) - list/array of values
  • mask (list OR array) - list/array 1 x N_residues of 0|1, N.sum(mask)==len(prof)
  • default (any) - value for masked residues default: None for lists, 0 for arrays
  • asarray (0|1|2) - store as list (0), as array (2) or store numbers as array but everything else as list (1) default: 1
  • comment (str) - goes into aProfiles_info[name]['comment']
  • moreInfo ((key, value)) - additional key-value pairs for aProfiles_info[name]
Raises:

setAtomProfile(self, name, prof, mask=None, default=None, asarray=1, comment=None, **moreInfo)

Add/override atom-based profile.
Parameters:
  • name (str) - name to access profile
  • prof (list OR array) - list/array of values
  • mask (list OR array) - list/array 1 x N_atoms of 0 || 1, profile items
  • default (any) - value for masked atoms default: None for lists, 0 for arrays
  • asarray (0|1|2) - store as list (0), as array (2) or store numbers as array but everything else as list (1) default: 1
  • comment (str) - goes into aProfiles_info[name]['comment']
  • moreInfo ((key, value)) - additional key-value pairs for aProfiles_info[name]
Raises:

resProfile(self, name, default=None)

Use:
  resProfile( profile_name ) -> array 1 x N_res with residue values
Parameters:
  • name (str) - name to access profile
Returns: array
residue profile array 1 x N_res with residue values
Raises:

atomProfile(self, name, default=None)

Use:
  atomProfile( profile_name ) -> array 1 x N_atoms with atom values
Parameters:
  • name (str) - name to access profile
Returns: array
atom profile array 1 x N_atoms with atom values
Raises:

profile(self, name, default=None, lookHarder=0)

Use:
  profile( name, lookHarder=0) -> atom or residue profile
Parameters:
  • name (str) - name to access profile
  • default (any) - default result if no profile is found, if None, raise exception
  • lookHarder (0||1) - update from source before reporting missing profile
Raises:
  • ProfileError - if neither atom- nor rProfiles contains |name|

profileInfo(self, name, lookHarder=0)

Use:
  profileInfo( name ) -> dict with infos about profile
Parameters:
  • name (str) - name to access profile
  • lookHarder (0|1) - update from source before reporting missing profile
Returns: dict
infos about the profile; guaranteed infos: 'version'->str, 'comment'->str, 'changed'->1||0
Raises:
  • ProfileError - if neither atom- nor rProfiles contains |name|

setProfileInfo(self, name, **args)

Add/override infos about a profile; e.g. ... {'bin':'whatif'})
Parameters:
  • name (str) - name to access profile
Raises:

removeProfile(self, *names)

Remove residue or atom profile(s)

Use:
  removeProfile( str_name [,name2, name3] ) -> 1|0,
Parameters:
  • names (str OR list of str) - name or list of residue or atom profiles
Returns: int
1 if at least 1 profile has been deleted, 0 if none has been found

isChanged(self)

Tell if xyz or atoms have been changed compared to source file or source object (which can be still in memory).
Returns: (1||0, 1||0)
(1,1)..both xyz and atoms field have been changed

isChangedFromDisc(self)

Tell whether xyz and atoms can currently be reconstructed from a source on disc. Same as isChanged() unless source is another not yet saved PDBModel instance that made changes relative to its own source ...
Returns: (1||0, 1||0)
(1,1)..both xyz and atoms field have been changed

profileChangedFromDisc(self, pname)

Check if profile has changed compared to source.
Returns: int
1, if profile |pname| can currently not be reconstructed from a source on disc.
Raises:
  • ProfileError - if there is no atom or res profile with pname

__slimProfiles(self)

Remove profiles, that haven't been changed from a direct or indirect source on disc AUTOMATICALLY CALLED BEFORE PICKLING

slim(self)

Remove xyz array and list of atoms if they haven't been changed and could hence be loaded from the source file (only if there is a source file...). Remove any unchanged profiles. AUTOMATICALLY CALLED BEFORE PICKLING

validSource(self)

Check for a valid source on disk.
Returns: str or PDBModel or None
str or PDBModel, None if this model has no valid source

sourceFile(self)

Name of pickled source or PDB file.
Returns: str
file name of pickled source or PDB file
Raises:
  • PDBError - if there is no valid source

disconnect(self)

Disconnect this model from its source (if any).

Note: If this model has an (in-memory) PDBModel instance as source, the entries of 'atoms' could still reference the same dictionaries.

getPdbCode(self)

Return pdb code of model.
Returns: str
pdb code

setPdbCode(self, code)

Set model pdb code.
Parameters:
  • code (str) - new pdb code

sequence(self, mask=None, xtable=molUtils.xxDic)

Amino acid sequence in one letter code.
Parameters:
  • mask (list or array) - atom mask, to apply before (default None)
  • xtable (dict) - {str:str}, additional residue:single_letter mapping for non-standard residues (default molUtils.xxDic)
Returns: str
1-letter-code AA sequence (based on first atom of each res).
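
A minimal sketch of the idea "one letter per residue, taken from its first atom". It is not the Biskit implementation: the function `sequence`, the reduced table `THREE_TO_ONE`, the `res_map` argument and the fallback letter 'X' for unknown residues are all assumptions made for illustration.

```python
# Reduced three-letter to one-letter table (assumption; the real table is larger).
THREE_TO_ONE = {"ALA": "A", "GLY": "G", "SER": "S"}


def sequence(atoms, res_map, xtable=None):
    """One-letter sequence from the first atom of each residue (illustration only)."""
    table = dict(THREE_TO_ONE)
    table.update(xtable or {})      # extra mapping for non-standard residues
    letters = []
    last_res = None
    for atom, res in zip(atoms, res_map):
        if res != last_res:         # first atom of a new residue
            letters.append(table.get(atom["residue_name"], "X"))
            last_res = res
    return "".join(letters)


atoms = [{"residue_name": "GLY"}, {"residue_name": "GLY"},
         {"residue_name": "ALA"}, {"residue_name": "MSE"}]
res_map = [0, 0, 1, 2]              # residue index of each atom

plain = sequence(atoms, res_map)
extended = sequence(atoms, res_map, xtable={"MSE": "M"})
```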

xplor2amber(self, change=1, aatm=1)

Rename atoms so that tleap from Amber can read the PDB. If HIS residues contain atoms named HE2 or/and HD2, the residue name is changed to HIE or HID or HIP, respectively. Disulfide bonds are not yet identified - CYS -> CYX renaming must be done manually (see AmberParmBuilder for an example). Internally, Amber uses H atom names like HD21 while standard PDB files use 1HD2. By default, ambpdb produces 'standard' PDB atom names, but it gives the less ambiguous Amber names with the switch -aatm.
Parameters:
  • change (1|0) - change this model's atoms directly (default:1)
  • aatm (1|0) - use, for example, HG23 instead of 3HG2 (default:1)
Returns: list of atom dictionaries
[ {..} ], list of atom dictionaries
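
The two hydrogen naming styles mentioned above differ only in where the digit sits. The sketch below shows that conversion in both directions; the helper names `to_amber_name` and `to_pdb_name` are hypothetical, and the rule that only four-character names are wrapped back is an assumption, not something taken from the method.

```python
def to_amber_name(name):
    """'1HD2' -> 'HD21': move a leading digit to the end of the atom name."""
    if len(name) > 1 and name[0].isdigit():
        return name[1:] + name[0]
    return name


def to_pdb_name(name):
    """'HG23' -> '3HG2': move a trailing digit to the front (4-character names only)."""
    if len(name) == 4 and name[-1].isdigit() and not name[0].isdigit():
        return name[-1] + name[:-1]
    return name


amber_names = [to_amber_name(n) for n in ["1HD2", "3HG2", "CA"]]
pdb_names = [to_pdb_name(n) for n in ["HD21", "HG23", "CA"]]
```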

writePdb(self, fname, ter=1, amber=0, original=0, left=0, wrap=0, headlines=None, taillines=None)

Save model as PDB file.
Parameters:
  • fname (str) - name of new file
  • ter (0, 1, 2 or 3) - Option of how to treat the terminal record:
               0, don't write any TER statements
               1, restore original TER statements (doesn't work,
                    if preceding atom has been deleted) [default]
               2, put TER between all detected chains
               3, as 2 but also detect and split discontinuous chains
    
  • amber (1||0) - amber formatted atom names (ter=3, left=1, wrap=0) (default 0)
  • original (1||0) - revert atom names to the ones parsed in from PDB (default 0)
  • left (1||0) - left-align atom names (as in amber pdbs)(default 0)
  • wrap (1||0) - write e.g. 'NH12' as '2NH1' (default 0)
  • headlines (list of tuples) - [( str, dict or str)], list of record / data tuples:
                     e.g. [ ('SEQRES', '  1 A 22  ALA GLY ALA'), ]
    
  • taillines (list of tuples) - same as headlines but appended at the end of file

returnPdb(self, out=None, ter=1, headlines=None, taillines=None)

Restore PDB file from (possibly transformed) coordinates and pdb line dictionaries in self.atoms. This is an older version of writePdb that returns a list of PDB lines instead of writing to a file.
Parameters:
  • out (stdout or None) - stdout or None, if None a list is returned
  • ter (0, 1 or 2) - Option of how to treat the terminal record:
               0, don't write any TER statements
               1, restore original TER statements (doesn't work,
                    if preceding atom has been deleted)
               2, put TER between all detected chains
    
  • headlines (list of tuples) - [( str, dict or str)], list of record / data tuples:
                     e.g. [ ('SEQRES', '  1 A 22  ALA GLY ALA'), ]
    
  • taillines (list of tuples) - same as headlines but appended at the end of file
Returns: list of strings
[ str ], lines of a PDB file

saveAs(self, path)

Pickle this PDBModel to a file, set the 'source' field to this file name and mark atoms, xyz, and profiles as unchanged. Normal pickling of the object will only dump those data that can not be reconstructed from the source of this model (if any). saveAs creates a 'new source' without further dependencies.
Parameters:
  • path (str OR LocalPath instance) - target file name

maskF(self, atomFunction, numpy=1)

Create list with result of atomFunction( atom ) for each atom.
Parameters:
  • atomFunction (function) - function( dict_from_PDBFile.readline() ) -> true || false (condition)
  • numpy (int) - 1(default)||0, convert result to Numpy array of int
Returns: array or list
Numpy N.array( [0,1,1,0,0,0,1,0,..], 'i') or list
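
The principle can be sketched without Biskit: a condition is evaluated on every atom dictionary and the results are collected as 0/1 values. The function `mask_f` is a hypothetical stand-in, and the dictionary key 'name' is an assumption ('residue_name' appears elsewhere in this documentation).

```python
def mask_f(atoms, atom_function):
    """Return a 0/1 list with the result of atom_function(atom) for each atom."""
    return [1 if atom_function(atom) else 0 for atom in atoms]


atoms = [
    {"name": "N", "residue_name": "GLY"},
    {"name": "CA", "residue_name": "GLY"},
    {"name": "CA", "residue_name": "ALA"},
]

ca_mask = mask_f(atoms, lambda a: a["name"] == "CA")
gly_mask = mask_f(atoms, lambda a: a["residue_name"] == "GLY")
```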

maskCA(self, force=0)

Short cut for mask of all CA atoms.
Parameters:
  • force (0||1) - force calculation even if cached mask is available
Returns: array
N.array( 1 x N_atoms ) of 0||1

maskBB(self, force=0)

Short cut for mask of all backbone atoms.
Parameters:
  • force (0||1) - force calculation even if cached mask is available
Returns: array
N.array( 1 x N_atoms ) of 0||1

maskHeavy(self, force=0)

Short cut for mask of all heavy atoms. ('element' <> H)
Parameters:
  • force (0||1) - force calculation even if cached mask is available
Returns: array
N.array( 1 x N_atoms ) of 0||1

maskH(self)

Short cut for mask of hydrogens. ('element' == H)
Returns: array
N.array( 1 x N_atoms ) of 0||1

maskCB(self)

Short cut for mask of all CB and CA of GLY.
Returns: array
mask of all CB and CA of GLY

maskH2O(self)

Short cut for mask of all atoms in residues named TIP3, HOH and WAT
Returns: array
N.array( 1 x N_atoms ) of 0||1

maskSolvent(self)

Short cut for mask of all atoms in residues named TIP3, HOH, WAT, Na+, Cl-
Returns: array
N.array( 1 x N_atoms ) of 0||1

maskHetatm(self)

Short cut for mask of all HETATM
Returns: array
N.array( 1 x N_atoms ) of 0||1

maskProtein(self, standard=0)

Short cut for mask containing all atoms of amino acids.
Parameters:
  • standard (0|1) - only standard residue names (not CYX, NME,..) (default 0)
Returns: array
N.array( 1 x N_atoms ) of 0||1, mask of all protein atoms (based on residue name)

indices(self, what)

Get atom indices conforming to a condition.
Parameters:
  • what (function OR list of str or int OR int) - Selection:
        - function applied to each atom entry,
           e.g. lambda a: a['residue_name']=='GLY'
        - list of str, allowed atom names
        - list of int, allowed atom indices OR mask with only 1 and 0
        - int, single allowed atom index
    
Returns: Numeric array
N_atoms x 1 (0||1 )
Raises:
  • PDBError - if what is neither of above

mask(self, what, numpy=1)

Get atom mask.
Parameters:
  • what (function OR list of str or int OR int) - Selection:
                - function applied to each atom entry,
                   e.g. lambda a: a['residue_name']=='GLY'
                - list of str, allowed atom names
                - list of int, allowed atom indices OR mask with
                  only 1 and 0
                - int, single allowed atom index
    
Returns: Numeric array
N_atoms x 1 (0||1 )
Raises:
  • PDBError - if what is neither of above
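
The different kinds of selection accepted for `what` can be illustrated with a stand-alone dispatcher. This is a sketch under assumptions, not the method itself: the function `atom_mask` is hypothetical, atom names are assumed to live under the key 'name', a full-length list of 0/1 values is assumed to be a mask rather than an index list, and a plain ValueError stands in for PDBError.

```python
def atom_mask(atoms, what):
    """Build a 0/1 atom mask from a function, atom names, indices, a mask or one index."""
    n_atoms = len(atoms)
    if callable(what):
        return [1 if what(a) else 0 for a in atoms]
    if isinstance(what, int):
        return [1 if i == what else 0 for i in range(n_atoms)]
    if isinstance(what, (list, tuple)):
        if what and all(isinstance(w, str) for w in what):
            return [1 if a["name"] in what else 0 for a in atoms]
        if all(isinstance(w, int) for w in what):
            # Assumption: a 0/1 list of full length is already a mask.
            if len(what) == n_atoms and all(w in (0, 1) for w in what):
                return list(what)
            wanted = set(what)
            return [1 if i in wanted else 0 for i in range(n_atoms)]
    raise ValueError("unsupported selection: %r" % (what,))


atoms = [
    {"name": "N", "residue_name": "GLY"},
    {"name": "CA", "residue_name": "GLY"},
    {"name": "C", "residue_name": "ALA"},
    {"name": "O", "residue_name": "ALA"},
]

by_function = atom_mask(atoms, lambda a: a["residue_name"] == "GLY")
by_names = atom_mask(atoms, ["CA", "C"])
by_indices = atom_mask(atoms, [0, 3])
by_mask = atom_mask(atoms, [0, 1, 1, 0])
by_single_index = atom_mask(atoms, 2)
```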

atom2resMask(self, atomMask)

Mask (0) residues for which all atoms are masked (0) in atomMask.
Parameters:
  • atomMask (list/array of int) - list/array of int, 1 x N_atoms
Returns: array of int
1 x N_residues (0||1 )
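
As an illustration of the rule "a residue is masked only if all of its atoms are masked", the sketch below works on plain lists. The function `atom2res_mask` is hypothetical, and `res_map` is assumed to hold, for each atom, a residue index starting at 0.

```python
def atom2res_mask(atom_mask, res_map):
    """Residue mask that is 1 wherever at least one atom of the residue is 1."""
    n_res = max(res_map) + 1 if res_map else 0
    res_mask = [0] * n_res
    for keep, res in zip(atom_mask, res_map):
        if keep:
            res_mask[res] = 1
    return res_mask


res_map = [0, 0, 0, 1, 1, 2]      # residue index of each atom
atom_mask = [0, 0, 0, 1, 0, 0]    # only one atom of residue 1 is selected

res_mask = atom2res_mask(atom_mask, res_map)
```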

atom2resIndices(self, indices)

Get list of indices of residue for which any atom is in indices.
Parameters:
  • indices (list of int) - list of atom indices
Returns: list of int
indices of residues

res2atomMask(self, resMask)

convert residue mask to atom mask.
Parameters:
  • resMask (list/array of int) - list/array of int, 1 x N_residues
Returns: array of int
1 x N_atoms

res2atomIndices(self, indices)

Convert residue indices to atom indices.
Parameters:
  • indices (list/array of int) - list/array of residue indices
Returns: list of int
list of atom indices

res2atomProfile(self, p)

Get an atom profile where each atom has the value its residue has in the residue profile.
Parameters:
  • p (str) - name of existing residue profile OR ... [ any ], list of lenResidues() length
Returns: list or array
[ any ] OR array, atom profile
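
Expanding a residue profile to atoms amounts to a lookup through an atom-to-residue map. The sketch below is Biskit-free; the function `res2atom_profile` and the explicit `res_map` argument are assumptions made for illustration.

```python
def res2atom_profile(res_profile, res_map):
    """Give every atom the value its residue has in the residue profile."""
    return [res_profile[res] for res in res_map]


res_profile = [0.1, 0.9]    # one value per residue
res_map = [0, 0, 1]         # residue index of each atom

atom_profile = res2atom_profile(res_profile, res_map)
```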

atom2chainIndices(self, indices, breaks=0)

Convert atom indices to chain indices. Each chain is only returned once.
Parameters:
  • indices (list of int) - list of atom indices
  • breaks (0||1) - look for chain breaks in backbone coordinates (def. 0)
Returns: list of int
indices of all chains that contain any atom which is in indices

chain2atomIndices(self, indices, breaks=0)

Convert chain indices into atom indices.
Parameters:
  • indices (list of int) - list of chain indices
  • breaks (0||1) - look for chain breaks in backbone coordinates (def. 0)
Returns: list of int
all atoms belonging to the given chains

profile2mask(self, profName, cutoff_min=None, cutoff_max=None)

profile2mask( str_profname, [cutoff_min, cutoff_max=None])
Parameters:
  • cutoff_min (float) - low value cutoff
  • cutoff_max (float) - high value cutoff
Returns: array
mask len( profile(profName) ) x 1||0
Raises:

profile2atomMask(self, profName, cutoff_min=None, cutoff_max=None)

profile2atomMask( str_profname, [cutoff_min, cutoff_max=None]) Same as profile2mask, but converts residue mask to atom mask.
Parameters:
  • cutoff_min (float) - low value cutoff
  • cutoff_max (float) - high value cutoff
Returns: array
mask N_atoms x 1|0
Raises:

__takeAtomIndices(self, oldI, takeI)

Translate atom positions so that they point to the same atoms after a call to N.take() (if they are kept at all).
Parameters:
  • oldI (list of int) - indices to translate
  • takeI (list of int) - indices for N.take()
Returns: list of int
indices of current atoms after take, resorted
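
The index translation can be sketched as follows. The function `translate_indices` is hypothetical and assumes that every atom is taken at most once; old positions that are not taken simply drop out of the result.

```python
def translate_indices(old_i, take_i):
    """Map old atom positions to their positions after a take with take_i."""
    new_position = {old: new for new, old in enumerate(take_i)}
    return sorted(new_position[o] for o in old_i if o in new_position)


# Atom 3 moves to position 0 and atom 1 to position 2.
moved = translate_indices([1, 3], [3, 2, 1])

# Atom 2 is not taken and is therefore lost; atom 5 ends up at position 0.
partly_lost = translate_indices([2, 5], [5, 0, 1])
```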

__takeResIndices(self, oldI, takeI)

concat(self, *models)

Concatenate atoms, coordinates and profiles. source and fileName are lost, so are profiles that are not available in all models. model0.concat( model1 [, model2, ..]) -> single PDBModel.
Parameters:
  • models (model OR list of models) - models to concatenate

Note: info records of given models are lost.
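
The rule that only profiles present in all models survive can be pictured with plain dictionaries. The function `concat_profiles` and the profile names are hypothetical; the sketch shows the intersection of profile names, nothing more.

```python
def concat_profiles(*profile_dicts):
    """Concatenate only those profiles that exist in every input dictionary."""
    common = set(profile_dicts[0])
    for profiles in profile_dicts[1:]:
        common &= set(profiles)
    return {name: [value for profiles in profile_dicts for value in profiles[name]]
            for name in sorted(common)}


model0_profiles = {"mass": [12.0, 14.0], "charge": [0.0, 1.0]}
model1_profiles = {"mass": [16.0]}

merged = concat_profiles(model0_profiles, model1_profiles)
```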

take(self, i, deepcopy=0)

All fields of the result model are shallow copies of this model's fields. I.e. removing or reordering of atoms does not affect the original model but changes to entries in the atoms dictionaries would also change the atom dictionaries of this model.

take( atomIndices, [deepcopy=0] ) -> PDBModel / sub-class.
Parameters:
  • i (list/array of int) - atomIndices, positions to take in the order to take
  • deepcopy (0||1) - deepcopy atom dictionaries (default 0)
Returns: PDBModel
PDBModel / sub-class

Note: the position of TER records is translated to the new atoms. Chain boundaries can hence be lost (if the terminal atoms are not taken) or move into the middle of a residue (if atoms are resorted).
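
The difference between the shallow default and deepcopy=1 can be demonstrated on a plain list of dictionaries. The function `take` below is a hypothetical stand-in that only mirrors the copy semantics described here; the key 'name' is an assumption.

```python
import copy


def take(atoms, indices, deepcopy=0):
    """Pick atom dictionaries in the given order, optionally as deep copies."""
    picked = [atoms[i] for i in indices]
    return copy.deepcopy(picked) if deepcopy else picked


atoms = [{"name": "N"}, {"name": "CA"}, {"name": "C"}]

shallow = take(atoms, [2, 0])
shallow[0]["name"] = "CX"         # the dictionary is shared with atoms[2]

deep = take(atoms, [1], deepcopy=1)
deep[0]["name"] = "CB"            # atoms[1] is not affected
```

Reordering or dropping entries of `shallow` leaves `atoms` untouched; only edits inside the shared dictionaries show up in both.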

keep(self, i)

Replace atoms, coordinates and profiles of this(!) model with a sub-set (in-place version of N.take() ).
Parameters:
  • i (list/array of int) - positions of the atoms to keep

clone(self, deepcopy=0)

Clone PDBModel.
Returns: PDBModel
PDBModel / subclass, copy of this model, see comments to Numeric.take()

compress(self, mask, deepcopy=0)

Compress PDBModel using mask. compress( mask, [deepcopy=0] ) -> PDBModel
Parameters:
  • mask (array) - N.array( 1 x N_atoms of 1 or 0 ) 1 .. keep this atom
  • deepcopy (1||0) - deepcopy atom dictionaries of this model and result (default 0 )
Returns: PDBModel
compressed PDBModel using mask

remove(self, what)

Convenience access to the 3 different remove methods. The mask used to remove atoms is returned. This mask can be used to apply the same change to another array of same dimension as the old(!) xyz and atoms.
Parameters:
  • what (list of int or int) - Description of what to remove:
         - function( atom_dict ) -> 1 || 0    (1..remove) OR
         - list of int [4, 5, 6, 200, 201..], indices of atoms to remove
         - list of int [11111100001101011100..N_atoms], mask (1..remove)
         - int, remove atom with this index
    
Returns: array
N.array(1 x N_atoms_old) of 0||1, mask used to compress the atoms and xyz arrays.
Raises:
  • PDBError - if what is neither of above
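
Why the returned mask is useful can be shown with plain lists: the same mask compresses any other per-atom data of the old length. The helpers `removal_mask` and `compress` are hypothetical and cover only the "list of indices" case.

```python
def removal_mask(n_atoms, remove_indices):
    """Mask of length n_atoms with 0 for atoms to remove and 1 for atoms to keep."""
    drop = set(remove_indices)
    return [0 if i in drop else 1 for i in range(n_atoms)]


def compress(items, mask):
    """Keep the items whose mask entry is 1."""
    if len(items) != len(mask):
        raise ValueError("mask and items differ in length")
    return [item for item, keep in zip(items, mask) if keep]


names = ["N", "CA", "C", "O"]
bfactors = [10.0, 12.0, 11.0, 15.0]     # another per-atom list of the old length

keep = removal_mask(len(names), [1, 3])
names_kept = compress(names, keep)
bfactors_kept = compress(bfactors, keep)
```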

takeChains(self, chainLst, deepcopy=0, breaks=0, maxDist=None)

Get copy of this model with only the given chains.
Parameters:
  • chainLst (list of int) - list of chains
  • deepcopy (0||1) - deepcopy atom dictionaries (default 0)
  • breaks (0|1) - split chains at chain breaks (default 0)
  • maxDist (float) - (if breaks=1) chain break threshold in Angstrom
Returns: PDBModel
PDBModel (by default, all fields are shallow copies, see Numeric.take() )

addChainFromSegid(self, verbose=1)

Takes the last letter of the segment ID and adds it as chain ID.

addChainId(self, first_id=None, keep_old=0, breaks=0)

Assign consecutive chain identifiers A - Z to all atoms.
Parameters:
  • first_id (str) - str (A - Z), first letter instead of 'A'
  • keep_old (1|0) - don't override existing chain IDs (default 0)
  • breaks (1|0) - consider chain break as start of new chain (default 0)

renumberResidues(self, mask=None, start=1, addChainId=1)

Make all residue numbers consecutive and remove any insertion code letters. Note that a backward jump in residue numbering is interpreted as end of chain by chainMap() and chainIndex(). Chain borders might hence get lost if there is no change in chain label or segid.
Parameters:
  • mask (list of int) - [ 0||1 x N_atoms ] atom mask to apply BEFORE
  • start (int) - starting number (default 1)
  • addChainId (1|0) - add chain IDs if they are missing

lenAtoms(self)

Number of atoms in model.
Returns: int
number of atoms

lenResidues(self)

Number of residues in model.
Returns: int
total number of residues

lenChains(self, breaks=0, maxDist=None)

Number of chains in model.
Parameters:
  • breaks (0||1) - detect chain breaks from backbone atom distances (def 0)
  • maxDist (float) - maximal distance between consecutive residues [ None ] .. defaults to twice the average distance
Returns: int
total number of chains

resList(self, mask=None)

Return a list of lists of atom dictionaries, one inner list per residue, which allows iterating over residues and over the atoms of each residue.
Parameters:
  • mask (list of int) - [ 0||1 x N_atoms ] atom mask to apply BEFORE
Returns: list of lists of dictionaries
A list of lists of atom dictionaries, one inner list per residue:
 [ [ {'name':'N', 'residue_name':'LEU', ..},
     {'name':'CA','residue_name':'LEU', ..} ],
   [ {'name':'CA', 'residue_name':'GLY', ..}, .. ] ]

resModels(self)

Creates a PDBModel per residue in PDBModel.
Returns: list of PDBModels
list of PDBModels, one for each residue

resMapOriginal(self, mask=None)

Generate list to map from any atom to its ORIGINAL(!) PDB residue number.
Parameters:
  • mask (list of int (1||0)) - [00111101011100111...] consider atom: yes or no; len(mask) == N_atoms
Returns: list of int
list [000111111333344444..] with the original residue number of each atom

__calcResMap(self, mask=None)

Create a map of residue indices for the atoms in the model.
Parameters:
  • mask (list of int (1||0)) - atom mask
Returns: list of int
array [00011111122223333..], residue index for each unmasked atom

resMap(self, mask=None, force=0, cache=1)

Get list to map from any atom to a continuous residue numbering (starting with 0). A new residue is assumed to start whenever the 'residue_number' or the 'residue_name' record changes between 2 atoms. The mask is applied BEFORE looking for residue borders, i.e. it can change the residue numbering.

See resList() for an example of how to use the residue map.
Parameters:
  • mask (list of int (1||0)) - [0000011111001111...] include atom: yes or no; len(mask) == number of atoms in self.atoms/self.xyz
  • force (0||1) - recalculate map even if cached one is available (def 0)
  • cache (0||1) - cache new map (def 1)
Returns: list of int
array [00011111122223333..], residue index for each unmasked atom
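
The residue-border rule described above can be sketched on a plain list of atom dictionaries. This is a minimal illustration, not Biskit's implementation; the function name residue_map is hypothetical, and only the keys 'residue_number' and 'residue_name' named in the description are used.

```python
def residue_map(atoms, mask=None):
    """Continuous residue index (starting with 0) for each unmasked atom."""
    if mask is not None:
        # The mask is applied BEFORE looking for residue borders.
        atoms = [a for a, keep in zip(atoms, mask) if keep]
    res_map, current, previous = [], -1, None
    for atom in atoms:
        key = (atom["residue_number"], atom["residue_name"])
        if key != previous:  # number or name changed: a new residue starts
            current += 1
            previous = key
        res_map.append(current)
    return res_map


atoms = [
    {"name": "N", "residue_name": "LEU", "residue_number": 10},
    {"name": "CA", "residue_name": "LEU", "residue_number": 10},
    {"name": "N", "residue_name": "GLY", "residue_number": 11},
    {"name": "CA", "residue_name": "GLY", "residue_number": 11},
    {"name": "CA", "residue_name": "GLY", "residue_number": 12},
]

full_map = residue_map(atoms)
# Masking out residue 11 shifts the index of the last residue from 2 to 1.
masked_map = residue_map(atoms, mask=[1, 1, 0, 0, 1])
```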

resIndex(self, mask=None, force=0, cache=1)

Get the position of each residue's first atom. The result is cached by default, although this is not strictly necessary because the calculation is fast.
Parameters:
  • force (1||0) - re-calculate even if cached result is available (def 0)
  • cache (1||0) - cache the result (def 1)
  • mask (list of int (1||0)) - atom mask to apply before (i.e. result indices refer to compressed model)
Returns: list of int
index of the first atom of each residue

resEndIndex(self, mask=None)

Get the position of each residue's last atom.
Parameters:
  • mask (list of int (1||0)) - atom mask to apply before (i.e. result indices refer to compressed model)
Returns: list of int
index of the last atom of each residue
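
Both the first-atom and the last-atom index can be derived from a residue map such as the one returned by resMap(). The following plain-Python sketch shows one way to do this; the function names are hypothetical and the code is not taken from Biskit.

```python
def first_atom_indices(res_map):
    """Index of the first atom of each residue."""
    return [i for i, r in enumerate(res_map)
            if i == 0 or r != res_map[i - 1]]


def last_atom_indices(res_map):
    """Index of the last atom of each residue."""
    n = len(res_map)
    return [i for i, r in enumerate(res_map)
            if i == n - 1 or r != res_map[i + 1]]


res_map = [0, 0, 0, 1, 1, 2, 2, 2, 2]

starts = first_atom_indices(res_map)
ends = last_atom_indices(res_map)
# Atoms of residue k are res_map positions starts[k] .. ends[k] inclusive.
sizes = [e - s + 1 for s, e in zip(starts, ends)]
```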

chainMap(self, breaks=0, maxDist=None)

Get chain index of each atom. A new chain is started between 2 atoms if the chain_id or segment_id changes, the residue numbering jumps back or a TER record was found.
Parameters:
  • breaks (1||0) - split chains at chain breaks (def 0)
  • maxDist (float) - (if breaks=1) chain break threshold in Angstrom
Returns: list of int
array 1 x N_atoms of int, e.g. [000000011111111111122222...]
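
The chain-border rules listed above can be sketched as follows. This is only an illustration on plain dictionaries: the function name chain_map is hypothetical, and the 'after_ter' key used to mark an atom that follows a TER record is an assumption of this sketch, as are the exact key names 'chain_id', 'segment_id' and 'residue_number'.

```python
def chain_map(atoms):
    """Chain index of each atom, starting with 0."""
    result, chain = [], 0
    for i, atom in enumerate(atoms):
        if i > 0:
            prev = atoms[i - 1]
            new_chain = (
                atom["chain_id"] != prev["chain_id"]
                or atom["segment_id"] != prev["segment_id"]
                # residue numbering jumps back
                or atom["residue_number"] < prev["residue_number"]
                # assumed flag: a TER record precedes this atom
                or atom.get("after_ter", 0)
            )
            if new_chain:
                chain += 1
        result.append(chain)
    return result


atoms = [
    {"chain_id": "A", "segment_id": "", "residue_number": 1},
    {"chain_id": "A", "segment_id": "", "residue_number": 2},
    {"chain_id": "B", "segment_id": "", "residue_number": 3},
    {"chain_id": "B", "segment_id": "", "residue_number": 1},
    {"chain_id": "B", "segment_id": "", "residue_number": 2, "after_ter": 1},
]

chains = chain_map(atoms)
# First atom of each chain, analogous to what chainIndex() describes.
chain_starts = [i for i, c in enumerate(chains)
                if i == 0 or c != chains[i - 1]]
```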

chainIndex(self, breaks=0, maxDist=None)

Get indices of first atom of each chain.
Parameters:
  • breaks (1||0) - split chains at chain breaks (def 0)
  • maxDist (float) - (if breaks=1) chain break threshold in Angstrom
Returns: list of int
array (1 x N_chains) of int

chainBreaks(self, breaks_only=1, maxDist=None)

Identify discontinuities in the molecule's backbone.
Parameters:
  • breaks_only (1|0) - don't report ends of regular chains (def 1)
  • maxDist (float) - maximal distance between consecutive residues [ None ] .. defaults to twice the average distance
Returns: list of int
atom indices of last atom before a probable chain break
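
A distance-based break detection with the default threshold of twice the average distance could look like the sketch below. It works on one representative position per residue and returns residue positions rather than atom indices; which backbone atoms Biskit actually measures is not stated here, so the input, the function name probable_breaks and the averaging over all consecutive steps are assumptions.

```python
import math


def probable_breaks(positions, max_dist=None):
    """Indices i where the step from positions[i] to positions[i+1] is too long."""
    steps = [math.dist(a, b) for a, b in zip(positions, positions[1:])]
    if not steps:
        return []
    if max_dist is None:
        # Default threshold: twice the average consecutive distance.
        max_dist = 2.0 * sum(steps) / len(steps)
    return [i for i, d in enumerate(steps) if d > max_dist]


# Hypothetical positions in Angstrom with a gap after the third residue.
ca_positions = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (20.0, 0.0, 0.0),
    (23.8, 0.0, 0.0),
]

breaks = probable_breaks(ca_positions)
strict = probable_breaks(ca_positions, max_dist=3.0)
```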

removeRes(self, resname)

Remove all atoms with a certain residue name.
Parameters:
  • resname (str OR list of str) - name of residue to be removed

rms(self, other, mask=None, mask_fit=None, fit=1, n_it=1)

Rmsd between two PDBModels.
Parameters:
  • other (PDBModel) - other model to compare this one with
  • mask (list of int) - atom mask for rmsd calculation
  • mask_fit (list of int) - atom mask for superposition (default: same as mask)
  • fit (1||0) - superimpose first (default 1)
  • n_it (int) - number of fit iterations:
                  1 - classic single fit (default)
                  0 - until convergence, kicking out outliers on the way
    
Returns: float
rms in Angstrom
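
For reference, the quantity computed when fit=0 is the plain root-mean-square deviation over the masked atom pairs. The sketch below computes it in pure Python without any superposition; the function name rmsd is hypothetical and the code does not reproduce Biskit's fitting or outlier handling.

```python
import math


def rmsd(xyz_a, xyz_b, mask=None):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(xyz_a) != len(xyz_b):
        raise ValueError("both models need the same number of atoms")
    if mask is None:
        mask = [1] * len(xyz_a)
    squared = [
        sum((p - q) ** 2 for p, q in zip(a, b))
        for a, b, keep in zip(xyz_a, xyz_b, mask)
        if keep
    ]
    if not squared:
        raise ValueError("mask selects no atoms")
    return math.sqrt(sum(squared) / len(squared))


model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]

rms_all = rmsd(model_a, model_b)                  # sqrt((0 + 4) / 2)
rms_first = rmsd(model_a, model_b, mask=[1, 0])   # only the identical atom
```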

transformation(self, refModel, mask=None, n_it=1, z=2, eps_rmsd=0.5, eps_stdv=0.05, profname='rms_outlier')

Get the transformation matrix which least-squares fits this model onto the reference model.
Parameters:
  • refModel (PDBModel) - reference PDBModel
  • mask (list of int) - atom mask for superposition
  • n_it (int) - number of fit iterations:
                  1 - classic single fit (default)
                  0 - until convergence
    
  • z (float) - number of standard deviations for outlier definition (default 2)
  • eps_rmsd (float) - tolerance in rmsd (default 0.5)
  • eps_stdv (float) - tolerance in standard deviations (default 0.05)
  • profname (str) - name of new atom profile getting outlier flag
Returns: array, array
array(3 x 3), array(3 x 1) - rotation and translation matrices

transform(self, *rt)

Transform coordinates of PDBModel.
Parameters:
  • rt (array OR array, array) - rotational and translation array: array( 4 x 4 ) OR array(3 x 3), array(3 x 1)
Returns: PDBModel
PDBModel with transformed coordinates
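
What applying a rotation and a translation to a set of coordinates means can be sketched in pure Python. The convention x' = R x + t and the layout of the 4 x 4 matrix (rotation in the upper-left 3 x 3 block, translation in the last column) are assumptions of this sketch and may differ from the convention Biskit uses; the function names are hypothetical.

```python
def transform_points(xyz, rotation, translation):
    """Apply x' = R x + t to every coordinate triple."""
    return [
        tuple(
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)
        )
        for p in xyz
    ]


def split_4x4(matrix):
    """Split a homogeneous 4 x 4 matrix into (3 x 3 rotation, translation)."""
    rotation = [row[:3] for row in matrix[:3]]
    translation = [row[3] for row in matrix[:3]]
    return rotation, translation


# 90 degree rotation about the z axis followed by a shift of +1 along x.
rot_z90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
shift = [1, 0, 0]
points = [(1, 0, 0), (0, 1, 0)]

moved = transform_points(points, rot_z90, shift)

combined = [[0, -1, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
moved_4x4 = transform_points(points, *split_4x4(combined))
```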

fit(self, refModel, mask=None, n_it=1, z=2, eps_rmsd=0.5, eps_stdv=0.05, profname='rms_outlier')

Least-squares fit this model onto refModel.
Parameters:
  • refModel (PDBModel) - reference PDBModel
  • mask (list of int (1||0)) - atom mask for superposition
  • n_it (int) - number of fit iterations:
                  1 - classic single fit (default)
                  0 - until convergence
    
  • z (float) - number of standard deviations for outlier definition (default 2)
  • eps_rmsd (float) - tolerance in rmsd (default 0.5)
  • eps_stdv (float) - tolerance in standard deviations (default 0.05)
  • profname (str) - name of new atom profile containing outlier flag
Returns: PDBModel
PDBModel with transformed coordinates

magicFit(self, refModel, mask=None)

Superimpose this model onto a ref. model with similar atom content. magicFit( refModel [, mask ] ) -> PDBModel (or subclass )
Parameters:
  • refModel (PDBModel) - reference PDBModel
  • mask (list of int (1||0)) - atom mask to use for the fit
Returns: PDBModel
fitted PDBModel or sub-class

centered(self, mask=None)

Get model with centered coordinates.
Parameters:
  • mask (list of int (1||0)) - atom mask applied before calculating the center
Returns: PDBModel
model with centered coordinates

center(self, mask=None)

Geometric center of model.
Parameters:
  • mask (list of int (1||0)) - atom mask applied before calculating the center
Returns: (float, float, float)
xyz coordinates of center

centerOfMass(self)

Center of mass of PDBModel.
Returns: (float, float, float)
array('f')
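
The difference between the geometric center and the center of mass is only the weighting of each atom. The pure-Python sketch below illustrates both; the function names and the example masses are hypothetical and the code is not taken from Biskit.

```python
def geometric_center(xyz, mask=None):
    """Unweighted mean position of all (unmasked) atoms."""
    if mask is not None:
        xyz = [p for p, keep in zip(xyz, mask) if keep]
    n = len(xyz)
    return tuple(sum(p[k] for p in xyz) / n for k in range(3))


def center_of_mass(xyz, masses):
    """Mean position with every atom weighted by its mass."""
    total = sum(masses)
    return tuple(
        sum(m * p[k] for p, m in zip(xyz, masses)) / total for k in range(3)
    )


xyz = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
masses = [12.0, 4.0]  # example values in 1/12 of C12 mass

center = geometric_center(xyz)
com = center_of_mass(xyz, masses)  # pulled towards the heavier first atom
first_only = geometric_center(xyz, mask=[1, 0])
```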

masses(self)

Collect the molecular weight of all atoms in PDBModel.
Returns: array of floats
1-D array with mass of every atom in 1/12 of C12 mass.
Raises:
  • PDBError - if the model contains elements of unknown mass

mass(self)

Molecular weight of PDBModel.
Returns: float
total mass in 1/12 of C12 mass
Raises:
  • PDBError - if the model contains elements of unknown mass

residusMaximus(self, atomValues, mask=None)

Take a list of values per atom and return a list in which all atoms of a residue are set to the highest value of any atom in that residue (after applying the mask).
Parameters:
  • atomValues (list) - values per atom
  • mask (list of int (1||0)) - atom mask
Returns: array of float
array with values set to the maximal intra-residue value
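
The per-residue maximum can be sketched with a residue map that assigns a residue index to each atom. This is a plain-Python illustration; the function name residue_maximum is hypothetical and mask handling is left out for brevity.

```python
def residue_maximum(values, res_map):
    """Set each atom's value to the highest value found in its residue."""
    best = {}
    for value, res in zip(values, res_map):
        if res not in best or value > best[res]:
            best[res] = value
    return [best[res] for res in res_map]


atom_values = [0.1, 0.7, 0.3, 0.2, 0.9]
res_map = [0, 0, 0, 1, 1]  # residue index of each atom

spread = residue_maximum(atom_values, res_map)
```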

argsort(self, cmpfunc=None)

Prepare sorting atoms within residues according to comparison function.
Parameters:
  • cmpfunc (function) - function( self.atoms[i], self.atoms[j] ) -> -1, 0, +1
Returns: list of int
suggested position of each atom in re-sorted model ( e.g. [2,1,4,6,5,0,..] )

sort(self, sortArg=None, deepcopy=0)

Apply a given sort list to the atoms of this model.
Parameters:
  • sortArg (function) - comparison function
  • deepcopy (0||1) - deepcopy atom dictionaries (default 0)
Returns: PDBModel
copy of this model with re-sorted atoms (see Numeric.take() )

unsort(self, sortList)

Undo a previous sorting on the model itself (no copy).
Parameters:
  • sortList (list of int) - sort list used for previous sorting.
Returns: list of int
the (back)sort list used ( to undo the undo...)
Raises:
  • PDBError - if sorting changed atom number
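
The relation between a sort list and the back-sort list that undoes it can be sketched on a plain list. The sketch assumes take() semantics, i.e. that the re-sorted model holds atom sort_list[k] of the old model at position k; the helper names take and invert are hypothetical.

```python
def take(items, indices):
    """Pick items at the given positions, in the given order."""
    return [items[i] for i in indices]


def invert(sort_list):
    """Back-sort list: the permutation that undoes sort_list."""
    back = [0] * len(sort_list)
    for new_pos, old_pos in enumerate(sort_list):
        back[old_pos] = new_pos
    return back


names = ["CB", "N", "CA"]

# Sort list for alphabetical order of the atom names.
sort_list = sorted(range(len(names)), key=lambda i: names[i])
sorted_names = take(names, sort_list)

back_list = invert(sort_list)
restored = take(sorted_names, back_list)
```

Inverting only works if the sort list is a true permutation, which matches the error raised when sorting changed the number of atoms.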

atomNames(self, start=None, stop=None)

Return a list of atom names from start to stop RESIDUE index
Parameters:
  • start (int) - index of first residue
  • stop (int) - index of last residue
Returns: list of str
['C','CA','CB' .... ]

__testDict_and(self, dic, condition)

Test if all key-value pairs of condition are matched in dic
Parameters:
  • condition (dictionary) - {..}, key-value pairs to be matched
  • dic (dictionary) - {..}, dictionary to be tested
Returns: 1|0
1|0, 1 if all key-value pairs of condition are matched in dic

__testDict_or(self, dic, condition)

Test if any key-value pairs of condition are matched in dic
Parameters:
  • condition (dictionary) - {..}, key-value pairs to be matched
  • dic (dictionary) - {..}, dictionary to be tested
Returns: 1|0
1|0, 1 if any key-value pairs of condition are matched in dic

filterIndex(self, mode=0, **kw)

Get atom positions that match a combination of key=values. E.g. filterIndex( chain_id='A', name=['CA','CB'] ) -> index
Parameters:
  • mode (0||1) - 0 combine with AND (default), 1 combine with OR
  • kw (filter options, see example) - combination of atom dictionary keys and values/list of values that will be used to filter
Returns: list of int
positions of the matching atoms (usable as a sort list)
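
The AND / OR combination of key=value conditions can be sketched on a list of atom dictionaries. The code below is an illustration only, with hypothetical function names; it treats a list of values as "any of these values", which is how the example in the description reads.

```python
def matches(atom, mode, conditions):
    """Test one atom dictionary against all key=value(s) conditions."""
    hits = []
    for key, wanted in conditions.items():
        if not isinstance(wanted, (list, tuple)):
            wanted = [wanted]  # a single value is a one-element list
        hits.append(atom.get(key) in wanted)
    # mode 0 combines the conditions with AND, mode 1 with OR
    return any(hits) if mode == 1 else all(hits)


def filter_index(atoms, mode=0, **kw):
    """Positions of all atoms that match the conditions."""
    return [i for i, atom in enumerate(atoms) if matches(atom, mode, kw)]


atoms = [
    {"chain_id": "A", "name": "N"},
    {"chain_id": "A", "name": "CA"},
    {"chain_id": "A", "name": "CB"},
    {"chain_id": "B", "name": "CA"},
]

and_hits = filter_index(atoms, chain_id="A", name=["CA", "CB"])
or_hits = filter_index(atoms, mode=1, chain_id="A", name=["CA", "CB"])
```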

filter(self, mode=0, **kw)

Extract atoms that match a combination of key=values. E.g. filter( chain_id='A', name=['CA','CB'] ) -> PDBModel
Parameters:
  • mode (0||1) - 0 combine with AND (default), 1 combine with OR
  • kw (filter options, see example) - combination of atom dictionary keys and values/list of values that will be used to filter
Returns: PDBModel
filtered PDBModel

equals(self, ref, start=None, stop=None)

Compares the residue and atom sequence in the given range. Coordinates are not checked, profiles are not checked.
Parameters:
  • start (int) - index of first residue
  • stop (int) - index of last residue
Returns: list of int
[ 1||0, 1||0 ], first position: sequence identity (0|1), second position: atom identity (0|1)

equalAtoms(self, ref)

Apply to SORTED models without HETATOMS. Coordinates are not checked.
Parameters:
  • ref (PDBModel) - reference PDBModel
Returns: (array, array)
(mask, mask_ref), two atom masks for all equal (1) atoms in models

Note: in some rare cases m1.equalAtoms( m2 ) gives a different result than m2.equalAtoms( m1 ). This is due to the used SequenceMatcher class.
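
How two masks can be derived from matching blocks is sketched below, assuming that the SequenceMatcher mentioned in the note is the one from Python's difflib module. The sketch compares residue names only, whereas equalAtoms() also compares atoms; the function name equal_masks is hypothetical.

```python
from difflib import SequenceMatcher


def equal_masks(seq, seq_ref):
    """Return (mask, mask_ref) with 1 for every item that has a match."""
    mask = [0] * len(seq)
    mask_ref = [0] * len(seq_ref)
    matcher = SequenceMatcher(None, seq, seq_ref)
    for block in matcher.get_matching_blocks():
        # block.a / block.b are the start positions, block.size the length
        for k in range(block.size):
            mask[block.a + k] = 1
            mask_ref[block.b + k] = 1
    return mask, mask_ref


residues = ["ALA", "GLY", "LEU", "SER"]
residues_ref = ["GLY", "LEU", "SER", "THR"]

mask, mask_ref = equal_masks(residues, residues_ref)
```

Because the matcher searches for the longest matching blocks starting from its first argument, swapping the two sequences can select different blocks when several equally good alignments exist.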

To Do: option to make sure atoms are also in same order

compareAtoms(self, ref)

Get list of atom indices for this and reference model that converts both into 2 models with identical residue and atom content.

E.g.
>>> m2 = m1.sort()    ## m2 has now different atom order
>>> i2, i1 = m2.compareAtoms( m1 )
>>> m1 = m1.take( i1 ); m2 = m2.take( i2 )
>>> m1.atomNames() == m2.atomNames()  ## m2 has again same atom order
Returns: ([int], [int])
indices, indices_ref

__chainFraction(self, chain, ref)

Look how well a given chain matches a continuous stretch of residues in ref.
Parameters:
  • chain (int) - chain index
  • ref (PDBModel) - reference PDBModel
Returns: float
average relative length of matching chain fragments

compareChains(self, ref, breaks=0, fractLimit=0.2)

Get list of corresponding chain indices for this and reference model. Use takeChains() to create two models with identical chain content and order from the result of this function.
Parameters:
  • ref (PDBModel) - reference PDBModel
  • breaks (1||0) - look for chain breaks in backbone coordinates
  • fractLimit (float) -
Returns: ([int], [int])
chainIndices, chainIndices_ref