Biskit.Executor

    """
    All calls of external programs should be done via this class or subclasses.

    Executor gets the necessary information about a program (binary,
    environment variables, etc) from ExeConfigCache, creates an input file
    or pipe from a template (if available) or an existing file, wraps the
    program call into ssh and nice (if necessary), spawns an external process
    via subprocess.Popen, communicates the input file or string, waits for
    completion and collects the output file or string, and cleans up
    temporary files.

    There are two ways of using Executor
    ====================================

      1. (recommended) Create a subclass of Executor for a certain program
         call. Methods to override would be:

           - __init__  ... to set your own default values
                           (call parent __init__!)
           - prepare   ... called BEFORE program execution
           - cleanup   ... called AFTER program execution
                           (call parent cleanup!)
           - finish    ... called AFTER successful program execution
           - isFailed  ... to detect the success status after program execution
           - fail      ... called if execution fails

         Additionally, you should provide a simple program configuration file
         in biskit/external/defaults/. See L{Biskit.ExeConfig} for
         details and examples!

      2. Use Executor directly.
         An example is given in the __main__ section of this module.
         You first have to create an Executor instance with all the
         parameters, then call its run() method and collect the result.

         In the simplest cases this can be combined into one line:

         >>> out, error, returncode = Executor('ls', strict=0).run()

         strict=0 means that ExeConfig does not insist on an existing
         exe_ls.dat file and instead looks for a program called 'ls' in the
         search path.


    Templates
    =========
      Templates are files or strings that contain place holders like,
      for example::

          file_in=%(f_in)s
          file_out=%(f_out)s

      At run time, Executor will create an input file or pipe from the
      template by replacing all place holders with values from its own
      fields. Let's assume the above example is put into a file 'in.template'.

      >>> x = Executor( 'ls', template='in.template', f_in='in.dat')

      ... will then pass the following input to the ls program::

          file_in=in.dat
          file_out=/tmp/tmp1HYOvO

      However, the following input template will raise an error::

          file_in=%(f_in)s
          seed=%(seed)i

      ...because Executor doesn't have a 'seed' field. You could provide
      one by overriding Executor.__init__. Alternatively, you can
      provide seed as a keyword to the original Executor.__init__:

      >>> x = Executor('ls', template='in.template',f_in='in.dat', seed=1.5)

      This works because Executor.__init__ puts all unknown key=value pairs
      into the object's name space and passes them on to the template.


    Communicating Input
    ===================

      Programs often expect scripts, commands or additional parameters
      from StdIn or from input files. Executor tries to support many
      scenarios -- which one is chosen mainly depends on the
      L{ExeConfig} `pipes` setting in exe_<program>.dat and on the
      `template` parameter given to Executor.__init__. (Note: Executor
      loads the ExeConfig instance for the given program `name` into its
      `self.exe` field.)

      Here is an overview of the different scenarios and how to
      activate them:

        1. B{ no input (default behaviour)}

           The program only needs command line parameters

           Condition:

             - template == None

        2. B{ input pipe from STDIN
           (== ``echo 'some input string' | myprogram``) }

           Condition:

             - exe.pipes == 1 / True
             - template != None (or f_in points to an existing file)

           Setup:

             1. `template` points to an existing file:

                Executor reads the template file, completes it in
                memory, and pushes it directly to the program.

             2. `template` points to a string that doesn't look like a
                file name:

                Executor completes the string in memory (using
                `self.template % self.__dict__`) and pushes it
                directly to the program. This is the fastest option
                as it avoids file access altogether.

             3. `template` == None but f_in points to an *existing* file:

                Executor will read this file and push it unmodified to
                the program via StdIn. (This is something of an exception:
                if used at all, f_in usually points to a *non-existing*
                file that will receive the completed input.)

        3. B{ input from file
           (== ``myprogram < input_file``) }

           Condition:

             - exe.pipes == 0 / False
             - template != None
             - push_inp == 1 / True (default)

           Setup:

             1. `template` points to an existing file:

                Executor reads the template file, completes it in
                memory, saves the completed file to disk (creating or
                overwriting self.f_in), opens the file and passes the
                file handle to the program (instead of STDIN).

             2. `template` points to a string that doesn't look like a
                file name:

                Same as 3.1, except that the template is not read
                from disk but directly taken from memory (see 2.2).

        4. B{ input from file passed as argument to the program
           (== ``myprogram input_file``) }

           Condition:

             - exe.pipes == 0 / False

           For this it is up to you to provide the correct program
           argument.

           Setup:

             1. Use template completion:

                The best option would be to set an explicit file name
                for `f_in` and include this file name in `args`.
                Example::

                  exe = ExeConfigCache.get('myprogram')
                  assert not exe.pipes

                  x = Executor( 'myprogram', args='input.in', f_in='input.in',
                            template='/somewhere/input.template', cwd='/tmp' )

                Executor creates your input file on the fly, which is then
                passed as first argument.

             2. Without template completion:

                Similar, just that you don't give a template::

                  x = Executor( 'myprogram', args='input.in', f_in='input.in',
                            cwd='/tmp' )

                It would then be up to you to provide the correct
                input file in `/tmp/input.in`. You could override the
                L{prepare()} hook method for creating it.

           There are other ways of doing the same thing.


      Look at L{generateInp()} to see what is actually going on.


    References
    ==========

      - See also L{Biskit.IcmCad} for an example of how to override and
        use Executor.

      - See also L{Biskit.ExeConfig} for a description of program
        configuration.
    """

    def __init__( self, name, args='', template=None, f_in=None, f_out=None,
                  f_err=None, strict=1, catch_out=1, push_inp=1, catch_err=0,
                  node=None, nice=0, cwd=None, log=None, debug=0,
                  verbose=None, **kw ):
        """
        Create Executor. *name* must point to an existing program configuration
        unless *strict*=0. Executor will create a program input from
        the template and its own fields and put it into f_in. If f_in but
        no template is given, the unchanged f_in is used as input. If neither
        is given, the program is called without input. If a node is given,
        the process is wrapped in an ssh call. If *nice* != 0, the process
        is preceded by nice. *cwd* specifies the working directory. By
        default, this setting is taken from the configuration file which
        defaults to the current working directory.

        @param name: program name (configured in .biskit/exe_name.dat)
        @type  name: str
        @param args: command line arguments
        @type  args: str
        @param template: template for input file -- this can be the template
                         itself or the path to a file containing it
                         (default: None)
        @type  template: str
        @param f_in: target for completed input file (default: None, discard)
        @type  f_in: str
        @param f_out: target file for program output (default: None, discard)
        @type  f_out: str
        @param f_err: target file for error messages (default: None, discard)
        @type  f_err: str
        @param strict: strict check of environment and configuration file
                       (default: 1)
        @type  strict: 1|0
        @param catch_out: catch output in file (f_out or temporary)
                          (default: 1)
        @type  catch_out: 1|0
        @param catch_err: catch errors in file (f_err or temporary)
                          (default: 0)
        @type  catch_err: 1|0
        @param push_inp: push input file to process via stdin ('< f_in') [1]
        @type  push_inp: 1|0
        @param node: host for calculation (None->no ssh) (default: None)
        @type  node: str
        @param nice: nice level (default: 0)
        @type  nice: int
        @param cwd: working directory, overwrites ExeConfig.cwd (default: None)
        @type  cwd: str
        @param log: execution log (None->STDOUT) (default: None)
        @type  log: Biskit.LogFile
        @param debug: keep all temporary files (default: 0)
        @type  debug: 0|1
        @param verbose: print progress messages to log (default: log != STDOUT)
        @type  verbose: 0|1
        @param kw: key=value pairs with values for template file
        @type  kw: key=value

        @raise ExeConfigError: if environment is not fit for running
                               the program
        """
        self.exe = ExeConfigCache.get( name, strict=strict )
        self.exe.validate()

        ## fall back to temporary files if output should be caught
        self.f_out = t.absfile( f_out )
        if not f_out and catch_out:
            self.f_out = tempfile.mktemp( '.out' )

        self.f_err = t.absfile( f_err )
        if not f_err and catch_err:
            self.f_err = tempfile.mktemp( '.err' )

        self.keep_out = f_out is not None
        self.catch_out = catch_out
        self.catch_err = catch_err

        self.f_in = f_in  #: will be overridden by self.convertInput()
        self.keep_inp = f_in is not None
        self.push_inp = push_inp

        self.args = args
        self.template = template

        self.node  = node ## or os.uname()[1]
        self.nice  = nice
        self.debug = debug

        self.cwd = cwd or self.exe.cwd

        #: Log object for own program messages
        self.log = log or StdLog()
        self.verbose = verbose
        if self.verbose is None:
            self.verbose = (log is not None)

        ## these are set by self.run():
        self.runTime = 0        #: time needed for last run
        self.output = None      #: STDOUT returned by process
        self.error = None       #: STDERR returned by process
        self.returncode = None  #: int status returned by process
        self.pid = None         #: process ID

        self.result = None      #: set by self.finish()

        self.initVersion = self.version()

        ## unknown key=value pairs become fields available to the template
        self.__dict__.update( kw )

    def version( self ):
        """Version of class (at creation).
        @return: version
        @rtype: str
        """
        return 'Executor $Revision: 2.17 $'

    def communicate( self, cmd, inp, bufsize=-1, executable=None,
                     stdin=None, stdout=None, stderr=None,
                     shell=0, env=None, cwd=None ):
        """
        Start and communicate with the new process. Called by execute().
        See subprocess.Popen() for a detailed description of the parameters!
        This method should work for pretty much any purpose but may fail for
        very long pipes (more than 100000 lines).

        @param inp: (for pipes) input sequence
        @type  inp: str
        @param cmd: command
        @type  cmd: str
        @param bufsize: see subprocess.Popen() (default: -1)
        @type  bufsize: int
        @param executable: see subprocess.Popen() (default: None)
        @type  executable: str
        @param stdin: subprocess.PIPE or file handle or None (default: None)
        @type  stdin: int|file|None
        @param stdout: subprocess.PIPE or file handle or None (default: None)
        @type  stdout: int|file|None
        @param stderr: subprocess.PIPE or file handle or None (default: None)
        @type  stderr: int|file|None
        @param shell: wrap process in shell; see subprocess.Popen()
                      (default: 0, use exe_*.dat configuration)
        @type  shell: 1|0
        @param env: environment variables (default: None, use exe_*.dat config)
        @type  env: {str:str}
        @param cwd: working directory (default: None, means self.cwd)
        @type  cwd: str

        @return: output and error output
        @rtype: str, str

        @raise RunError: if OSError occurs during Popen or Popen.communicate
        """
        try:
            p = subprocess.Popen( cmd.split(),
                                  bufsize=bufsize, executable=executable,
                                  stdin=stdin, stdout=stdout, stderr=stderr,
                                  shell=shell or self.exe.shell,
                                  env=env or self.environment(),
                                  cwd=cwd or self.cwd )

            self.pid = p.pid

            ## send input, then block until the process has finished
            output, error = p.communicate( inp )

            self.returncode = p.returncode

        except OSError as e:
            raise RunError(
                "Couldn't run or communicate with external program: %r"
                % e.strerror )

        return output, error
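The Popen/communicate pattern used by communicate() can be tried in isolation. The following is only a minimal sketch, assuming Python 3 and no Biskit installation; `run_with_pipe` and the `child` command are hypothetical names introduced for illustration and are not part of Executor.

```python
import subprocess
import sys


def run_with_pipe(cmd, inp=None):
    """Start cmd (an argument list), send inp to STDIN, collect the output."""
    p = subprocess.Popen(cmd,
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    # communicate() writes inp, closes STDIN and reads both streams to EOF
    output, error = p.communicate(inp)
    return output, error, p.returncode


# hypothetical child process: echoes its STDIN in upper case
child = [sys.executable, '-c',
         'import sys; sys.stdout.write(sys.stdin.read().upper())']

out, err, code = run_with_pipe(child, b'file_in=in.dat\n')
```

Because all three streams are pipes here, the whole input and output are held in memory, which is one plausible reason for the warning about very long pipes in the docstring above.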

    def execute( self, inp=None ):
        """
        Run external command and block until it is finished.
        Called by L{ run() }.

        @param inp: input to be communicated via STDIN pipe (default: None)
        @type  inp: str

        @return: execution time in seconds
        @rtype: float

        @raise RunError: see communicate()
        """
        start_time = time.time()

        cmd = self.command()

        shellexe = None
        if self.exe.shell and self.exe.shellexe:
            shellexe = self.exe.shellexe

        stdin = stdout = stderr = None

        if self.exe.pipes:
            ## exchange input and output with the process in memory
            stdin = subprocess.PIPE
            stdout= subprocess.PIPE
            stderr= subprocess.PIPE
        else:
            ## redirect STDIN / STDOUT / STDERR to files instead
            inp = None
            if self.f_in and self.push_inp:
                stdin = open( self.f_in )
            if self.f_out and self.catch_out:
                stdout= open( self.f_out, 'w' )
            if self.f_err and self.catch_err:
                stderr= open( self.f_err, 'w' )

        if self.verbose:
            self.log.add('executing: %s' % cmd)
            self.log.add('in folder: %s' % self.cwd )
            self.log.add('input: %r' % stdin )
            self.log.add('output: %r' % stdout )
            self.log.add('errors: %r' % stderr )
            self.log.add('wrapped: %r'% self.exe.shell )
            self.log.add('shell: %r' % shellexe )
            self.log.add('environment: %r' % self.environment() )
            if self.exe.pipes and inp:
                self.log.add('%i bytes of input pipe' % len(str(inp)))

        self.output, self.error = self.communicate( cmd, inp,
                  bufsize=-1, executable=shellexe, stdin=stdin,
                  stdout=stdout, stderr=stderr,
                  shell=self.exe.shell,
                  env=self.environment(), cwd=self.cwd )

        ## piped output only exists in memory; copy it to f_out if requested
        if self.exe.pipes and self.f_out:
            open( self.f_out, 'w').writelines( self.output )

        if self.verbose: self.log.add(".. finished.")

        return time.time() - start_time

    def run( self, inp_mirror=None ):
        """
        Run the calculation. This calls (in that order):
          - L{ prepare() },
          - L{ generateInp() },
          - L{ execute() },
          - L{ postProcess() },
          - L{ finish() } OR L{ fail() },
          - L{ cleanup() }

        @param inp_mirror: file name for formatted copy of inp file
                           (default: None) [not implemented]
        @type  inp_mirror: str

        @return: calculation result
        @rtype: any
        """
        try:
            self.prepare()

            self.inp = self.generateInp()

            self.runTime = self.execute( inp=self.inp )

            self.postProcess()

        except IOError as why:
            try:
                self.fail()
            finally:
                self.cleanup()
            raise RunError( why )

        try:
            if self.isFailed():
                self.fail()
            else:
                self.finish()
        finally:
            ## temporary files are removed whether or not the run succeeded
            self.cleanup()

        return self.result
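The order in which run() calls its hook methods can be illustrated without Biskit. This is a sketch under stated assumptions: `MiniRunner` is a hypothetical stand-in that only records the hook names, skips input generation, and raises a plain RuntimeError where Executor raises RunError.

```python
class MiniRunner(object):
    """Hypothetical stand-in that mimics only the hook order of run()."""

    def __init__(self, fail_input=False, failed=False):
        self.calls = []               # names of hooks in call order
        self.fail_input = fail_input  # simulate an IOError during execution
        self.failed = failed          # value reported by isFailed()
        self.result = None

    def prepare(self):
        self.calls.append('prepare')

    def execute(self):
        self.calls.append('execute')
        if self.fail_input:
            raise IOError('cannot open input')

    def postProcess(self):
        self.calls.append('postProcess')

    def isFailed(self):
        return self.failed

    def fail(self):
        self.calls.append('fail')

    def finish(self):
        self.calls.append('finish')
        self.result = 'done'

    def cleanup(self):
        self.calls.append('cleanup')

    def run(self):
        try:
            self.prepare()
            self.execute()
            self.postProcess()
        except IOError as why:
            try:
                self.fail()
            finally:
                self.cleanup()
            raise RuntimeError(why)

        try:
            if self.isFailed():
                self.fail()
            else:
                self.finish()
        finally:
            self.cleanup()

        return self.result


ok = MiniRunner()
ok_result = ok.run()

bad = MiniRunner(failed=True)
bad_result = bad.run()

broken = MiniRunner(fail_input=True)
try:
    broken.run()
except RuntimeError:
    pass
```

In all three cases cleanup is the last hook called, which is why a subclass that overrides cleanup() is asked to call the parent method.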

    def command( self ):
        """
        Compose command string from binary, arguments, nice, and node.
        Override (perhaps).

        @return: the command to execute
        @rtype: str
        """
        exe = t.absbinary( self.exe.bin )

        if self.args:
            exe = exe + ' ' + self.args

        str_nice = str_ssh = ''

        if self.nice != 0:
            str_nice = "%s -%i" % (s.nice_bin, self.nice)

        if self.node is not None:
            str_ssh = "%s %s" % (s.ssh_bin, self.node )

        cmd = "%s %s %s" % (str_ssh, str_nice, exe )
        cmd = cmd.strip()

        return cmd
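The string composition in command() can be reproduced as a free function. This is a sketch only: `compose_command` is a hypothetical name, and the defaults 'nice' and 'ssh' are assumptions standing in for the `s.nice_bin` and `s.ssh_bin` settings, whose actual values are not shown in this module.

```python
def compose_command(binary, args='', nice=0, node=None,
                    nice_bin='nice', ssh_bin='ssh'):
    """Compose '<ssh node> <nice -level> <binary args>' as one string."""
    exe = binary
    if args:
        exe = exe + ' ' + args

    str_nice = str_ssh = ''

    if nice != 0:
        str_nice = "%s -%i" % (nice_bin, nice)

    if node is not None:
        str_ssh = "%s %s" % (ssh_bin, node)

    # unused parts leave extra blanks; strip() removes the outer ones
    return ("%s %s %s" % (str_ssh, str_nice, exe)).strip()


plain = compose_command('/bin/ls', '-l')
remote = compose_command('/bin/ls', '-l', nice=10, node='node01')
ssh_only = compose_command('/bin/ls', node='node01')
```

When only a node is given, the result contains a double blank between the ssh part and the binary. That is harmless for a caller that splits the command on whitespace, as communicate() does with `cmd.split()`.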

    def environment( self ):
        """
        Setup the environment for the process. Override if needed.

        @return: environment dictionary
        @rtype: dict OR None
        """
        if not self.exe.replaceEnv:
            return None

        return self.exe.environment()

    def prepare( self ):
        """
        Called before running the external program, override!
        """
        pass

    def postProcess( self ):
        """
        Called directly after running the external program, override!
        """
        pass

    def cleanup( self ):
        """
        Clean up after external program has finished (failed or not).
        Override, but call in child method!
        """
        if not self.keep_out and not self.debug and self.f_out:
            t.tryRemove( self.f_out )

        if not self.keep_inp and not self.debug:
            t.tryRemove( self.f_in )

        if self.f_err and not self.debug:
            t.tryRemove( self.f_err )

    def fail( self ):
        """
        Called if external program failed, override!
        """
        pass

    def finish( self ):
        """
        Called if external program finished successfully, override!
        """
        self.result = self.output, self.error, self.returncode

    def isFailed( self ):
        """
        Detect whether external program failed, override!
        """
        return 0

    def fillTemplate( self ):
        """
        Create complete input string from template with place holders.

        @return: input
        @rtype: str

        @raise TemplateError: if unknown option/place holder in template file
        """
        inp = self.template

        try:
            ## template may be a file name or the template string itself
            if os.path.isfile( inp ):
                inp = open( inp, 'r' ).read()
            return inp % self.__dict__

        except KeyError as why:
            s =  "Unknown option/place holder in template file."
            s += "\n  template file: " + str( self.template )
            s += "\n  Template asked for an option called " + str( why.args[0] )
            raise TemplateError( s )
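Template completion rests on Python's `%` formatting with a dictionary of fields. The sketch below uses a plain dict where fillTemplate() uses `self.__dict__`; the field values are made up for illustration.

```python
# template with named place holders, as in the class documentation
template = 'file_in=%(f_in)s\nfile_out=%(f_out)s\n'

# stands in for the Executor instance's __dict__ (illustrative values)
fields = {'f_in': 'in.dat', 'f_out': '/tmp/out.dat'}

completed = template % fields

# a place holder without a matching field raises KeyError
bad_template = 'file_in=%(f_in)s\nseed=%(seed)i\n'
try:
    bad_template % fields
    missing = None
except KeyError as why:
    missing = why.args[0]   # name of the unknown place holder

# supplying the extra field (as **kw would do) makes the template valid;
# note that %i truncates the float 1.5 to the integer 1
with_seed = bad_template % dict(fields, seed=1.5)
```

The truncation in the last line is worth knowing when passing a non-integer value to a `%(...)i` place holder, as the `seed=1.5` example in the class documentation does.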

    def convertInput( self, inp):
        """
        Convert the input to a format used by the selected execution method.

        @param inp: path to existing input file or string with input
        @type  inp: str

        @return: input string if self.exe.pipes; file name otherwise
        @rtype: str
        """
        if self.exe.pipes:

            ## convert file to string
            if not inp and os.path.exists( self.f_in or '' ):

                return open( self.f_in, 'r' ).read()

            return inp

        ## no pipes and no input string
        if inp is None:

            return inp

        ## else put input string into file
        self.f_in = self.f_in or tempfile.mktemp('_exec.inp')

        f = open( self.f_in, 'w')
        f.write(inp)
        f.close()
        return self.f_in
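The two outcomes of convertInput() (a string for pipes, a file name otherwise) can be sketched as a free function. `convert_input` and its `pipes` argument are hypothetical names, and the sketch swaps in `tempfile.mkstemp` for the `tempfile.mktemp` call used above, so it is not a copy of the method.

```python
import os
import tempfile


def convert_input(inp, pipes, f_in=None):
    """Return the input string if pipes is set, else an input file name."""
    if pipes:
        # no input string but an existing input file: read the file
        if not inp and os.path.exists(f_in or ''):
            with open(f_in, 'r') as f:
                return f.read()
        return inp

    # no pipes and no input string: nothing to write
    if inp is None:
        return inp

    # otherwise write the input string to f_in or to a temporary file
    if not f_in:
        handle, f_in = tempfile.mkstemp('_exec.inp')
        os.close(handle)

    with open(f_in, 'w') as f:
        f.write(inp)
    return f_in


as_string = convert_input('seed=1\n', pipes=True)

as_file = convert_input('seed=1\n', pipes=False)
with open(as_file) as f:
    file_content = f.read()
os.remove(as_file)

nothing = convert_input(None, pipes=False)
```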

    def generateInp(self):
        """
        Prepare the program input (file or string) from a template (if
        present, file or string).

        @return: input file name OR (if pipes=1) content of input file
        @rtype: str

        @raise TemplateError: if error while creating template file
        """
        try:
            inp = None

            if self.template:
                inp = self.fillTemplate()

            return self.convertInput( inp )

        except Exception as why:
            s =  "Error while creating template file."
            s += "\n  template file: " + str( self.template )
            s += "\n  why: " + str( why )
            s += "\n  Error:\n  " + t.lastError()
            raise TemplateError( s )

Source Code for Module Biskit.Executor