Package Biskit :: Package PVM :: Module TrackingJobMaster :: Class TrackingJobMaster
[hide private]
[frames] | no frames]

Class TrackingJobMaster

source code


TrackingJobMaster

This class extends JobMaster with the following extras:

The calculation is performed non-blocking in a thread after a call to master.start(). The end of calculation is signalled on master.lock / master.lockMsg. The result can then be obtained with getResult().

Alternatively, a callback method can be registered that is called after the calculation finished (master.setCallback()).

The perhaps easiest (but also least flexible) way is to instead use the calculateResult() method. This starts the calculation and blocks execution until the result is returned.

Consider overriding cleanup(), done() and getResult().

An interrupted calculation can be restarted from a restart file: Manual restart is possible as follows:
  • pickle master.data, master.result, master.status.objects
  • master.exit() / Exception / kill, etc.
  • initialize master with same parameters as before
  • unpickle and re-assign master.data, master.result, master.status.objects
  • master.start()



  • Note: The master sends out an exit signal to all slaves but doesn't wait for a response (there isn't any) and continues in the finish() method. Since, at the end, the same job is distributed to several slaves, some of them might still be running when cleanup() or done() are executed. The slave script must tolerate errors that, e.g., happen if cleanup() is called while it is running.

    To Do:

    Instance Methods [hide private]
      __init__(self, data={}, chunk_size=5, hosts=[], niceness={'default': 20}, slave_script='', verbose=1, show_output=0, add_hosts=1, redistribute=1)
      hostnameFromTID(self, slave_tid)
    Get nickname of host from TaskID.
      is_valid_slave(self, slave_tid)
    Override JobMaster method to disable slow nodes on the fly
      mark_slow_slaves(self, host_list, slow_factor)
      start_job(self, slave_tid)
    Overriding JobMaster method
      job_done(self, slave_tid, result)
    Overriding JobMaster method
      reportProgress(self)
    Report how many jobs were processed in what time per host.
      setCallback(self, funct)
    Register function to be called after calculation is finished.
      cleanup(self)
    Called after exit.
      done(self)
    Called by finish() after exit(), cleanup(), and reportProgress(), but before thread notification (notifyAll() ) and before executing the callBack method.
      notifyAll(self)
    Notify thread waiting on self.lockMsg that master has finished.
      finish(self)
    Called one time, after last job result has been received.
    {any:any} getResult(self, **arg)
    Return result dict, if it is available.
    array calculateResult(self, **arg)
    Convenience function that is starting the parallel calculation and blocks execution until it is finished.
    dict getRst(self)
    Get data necessary for a restart of the running calculation.
      saveRst(self, fname)
    Pickle data necessary for a restart of the running calculation.
    dict setRst(self, rst_data)
    Prepare this master for restart, called by restart().

    Inherited from dispatcher.JobMaster: bindMessages, getInitParameters, get_slave_chunk, initializationDone, spawn, spawnAll, start

    Inherited from PVMThread.PVMMasterSlave: exit, initialize, messageLoopIsUp, startMessageLoop

    Inherited from PVMThread.PVMThread: bind, getBindings, getMessageLoopDelay, getParent, getPingTimeout, getTID, getTasks, isStopped, log, messageLoopIsStopped, nicknameFromTID, ping, post, post_execute_method, post_message_received, post_message_sent, rm_log, run, send, sendToAll, send_primitive, setMessageLoopDelay, setPingTimeout, stop, stopMessageLoop, unbind

    Inherited from PVMThread.PVMThread (private): _ping

    Inherited from threading.Thread: __repr__, getName, isAlive, isDaemon, join, setDaemon, setName

    Inherited from threading.Thread (private): _set_daemon

    Inherited from threading._Verbose (private): _note

    Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__


    Properties [hide private]

    Inherited from object: __class__


    Method Details [hide private]

    __init__(self, data={}, chunk_size=5, hosts=[], niceness={'default': 20}, slave_script='', verbose=1, show_output=0, add_hosts=1, redistribute=1)
    (Constructor)

    source code 
    Parameters:
    • data ({str_id:any}) - dict of items to be processed
    • chunk_size (int) - number of items that are processed per job
    • hosts ([str]) - list of host-names
    • niceness ({str:int}) - {str_host-name: int_niceness}
    • slave_script (str) - absolute path to slave-script
    • verbose (1|0) - verbosity level (default: 1)
    • show_output (1|0) - display one xterm per slave (default: 0)
    • add_hosts (1|0) - add hosts to PVM before starting (default: 1)
    • redistribute (1|0) - at the end, send same job out several times (default: 1)
    Overrides: dispatcher.JobMaster.__init__

    hostnameFromTID(self, slave_tid)

    source code 

    Get nickname of host from TaskID.
    Parameters:
    • slave_tid (int) - slave task tid

    is_valid_slave(self, slave_tid)

    source code 

    Override JobMaster method to disable slow nodes on the fly
    Parameters:
    • slave_tid (int) - slave task tid
    Overrides: dispatcher.JobMaster.is_valid_slave

    mark_slow_slaves(self, host_list, slow_factor)

    source code 
    Parameters:
    • host_list ([str]) - list of hosts
    • slow_factor (float) - factor describing the calculation speed of a node

    start_job(self, slave_tid)

    source code 

    Overriding JobMaster method
    Parameters:
    • slave_tid (int) - slave task tid
    Overrides: dispatcher.JobMaster.start_job

    job_done(self, slave_tid, result)

    source code 

    Overriding JobMaster method
    Parameters:
    • slave_tid (int) - slave task tid
    • result (dict) - slave result dictionary
    Overrides: dispatcher.JobMaster.job_done

    reportProgress(self)

    source code 

    Report how many jobs were processed in what time per host.

    setCallback(self, funct)

    source code 

    Register function to be called after calculation is finished.
    Parameters:
    • funct (function) - will be called with an instance of the master as single argument

    cleanup(self)

    source code 

    Called after exit. Override.

    done(self)

    source code 

    Called by finish() after exit(), cleanup(), and reportProgress(), but before thread notification (notifyAll() ) and before executing the callBack method. Override.
    Overrides: dispatcher.JobMaster.done

    notifyAll(self)

    source code 

    Notify thread waiting on self.lockMsg that master has finished.

    finish(self)

    source code 

    Called one time, after last job result has been received. It should not be necessary to override this further. Override done() instead.
    Overrides: dispatcher.JobMaster.finish

    getResult(self, **arg)

    source code 

    Return result dict, if it is available. Override to return something else - which will also be the return value of calculateResult().
    Parameters:
    • arg ({key:value}) - keyword-value pairs, for subclass implementations
    Returns: {any:any}
    {any:any}

    calculateResult(self, **arg)

    source code 

    Convenience function that is starting the parallel calculation and blocks execution until it is finished.
    Parameters:
    • arg ({key:value}) - keyword-value pairs, for subclass implementations
    Returns: array
    array( (n_frames, n_frames), 'f'), matrix of pairwise rms

    getRst(self)

    source code 

    Get data necessary for a restart of the running calculation. Locks, file handles and private data are *NOT* saved. Override if necessary but call this method in child method.
    Returns: dict
    {..}, dict with 'pickleable' fields of master

    saveRst(self, fname)

    source code 

    Pickle data necessary for a restart of the running calculation.
    Parameters:
    • fname (str) - file name

    setRst(self, rst_data)

    source code 

    Prepare this master for restart, called by restart(). Override if necessary but call in child.
    Parameters:
    • rst_data (dict) - {..}, parameters for master.__dict__ + some special fields
    Returns: dict
    {..}, parameters for master.__dict__ without special fields