On the integration of command line tools
Many workflows involve compiled programs or scripts that cannot be directly integrated with signac's Python interface.
There are essentially four ways to do so:
- use signac's native command line interface,
- generate a shell script for execution,
- use subprocess forking, or
- use signac-flow.
We will use the standard ideal gas example, but assume that we need to interface with an idg program instead of a Python script.
The idg program expects the system size N, the thermal energy kT, and the pressure p as command line arguments and calculates the volume V according to the ideal gas law, V = N * kT / p.
For example:
$ idg 1000 2.0 1.0
2000.0
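If the idg program is not available, a minimal Python stand-in with the same interface could look like the following sketch (the argument order N, kT, p matches the invocations used throughout this section):

#!/usr/bin/env python
# idg: minimal stand-in for the idg program (illustration only).
import sys

def main():
    # Read N, kT, and p from the command line.
    N, kT, p = (float(arg) for arg in sys.argv[1:4])
    # Ideal gas law: V = N * kT / p
    print(N * kT / p)

if __name__ == '__main__':
    main()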
The following demonstrations all implement essentially the same workflow:
signac’s CLI
N=1000
kT=1.0

for p in 0.1 1.0 10.0; do
  WS=$(cat << EOF | signac job -cw
{"p": ${p}, "N": ${N}, "kT": ${kT}}
EOF
)
  ./idg ${N} ${kT} ${p} > ${WS}/V.txt
done
Here we use heredoc syntax to specify the state point in place, which avoids the awkward quote escaping that would otherwise be needed. The -c flag initializes the job's workspace directory if necessary, and the -w flag prints its path, which we capture in the WS variable.
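Once the loop has finished, we can quickly check that all jobs were initialized, for example by listing their ids with signac's CLI:

$ signac find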
Generate a shell script
Alternatively, we can use a mixed Python-shell approach, in which a Python script (here saved as run.py) uses signac's Python interface to generate a shell script:
import signac

IDG = './idg {job.sp.N} {job.sp.kT} {job.sp.p} > {job.ws}/V.txt'

project = signac.get_project()
for p in 0.1, 1.0, 10.0:
    sp = {'N': 1000, 'kT': 1.0, 'p': p}
    job = project.open_job(sp)
    job.init()
    print(IDG.format(job=job))
Executing this script will generate the necessary commands:
./idg 1000 1.0 0.1 > /home/johndoe/my_project/workspace/5a6c687f7655319db24de59a2336eff8/V.txt
./idg 1000 1.0 1.0 > /home/johndoe/my_project/workspace/ee617ad585a90809947709a7a45dda9a/V.txt
./idg 1000 1.0 10.0 > /home/johndoe/my_project/workspace/5a456c131b0c5897804a4af8e77df5aa/V.txt
We can execute these commands by piping them into a shell of our choosing, e.g., bash:
$ python run.py | /bin/bash
or redirect them into a script, which we then submit to an HPC cluster scheduler:
$ python run.py > submit.sh
$ qsub submit.sh
In the latter case, we would need to add the necessary PBS instructions to the script’s header.
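For example, assuming a PBS/Torque scheduler, the header of submit.sh could start with directives along these lines (the job name and resource requests are placeholders that must be adapted to the specific cluster):

#!/bin/bash
#PBS -N ideal-gas-study
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00

# Change to the directory from which the job was submitted:
cd ${PBS_O_WORKDIR}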
Use process forking
This approach is very similar to the previous example, but instead of printing the commands, we fork the required processes directly with the subprocess package:
import signac
from subprocess import run

IDG = './idg {job.sp.N} {job.sp.kT} {job.sp.p} > {job.ws}/V.txt'

project = signac.get_project()
for p in 0.1, 1.0, 10.0:
    sp = {'N': 1000, 'kT': 1.0, 'p': p}
    job = project.open_job(sp)
    job.init()
    run(IDG.format(job=job), shell=True)
The subprocess.run() function was introduced in Python 3.5; with earlier versions you would use subprocess.call() or a similar function.
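For example, an equivalent script for older Python versions could use subprocess.call() like this:

import signac
from subprocess import call

IDG = './idg {job.sp.N} {job.sp.kT} {job.sp.p} > {job.ws}/V.txt'

project = signac.get_project()
for p in 0.1, 1.0, 10.0:
    job = project.open_job({'N': 1000, 'kT': 1.0, 'p': p})
    job.init()
    # call() blocks until the forked process has finished.
    call(IDG.format(job=job), shell=True)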
Use signac-flow
Finally, if we already use signac-flow for our workflow implementation, we just add the command as a regular operation:
# project.py
from flow import FlowProject
# import flow.environments  # uncomment to use default environments


class Project(FlowProject):

    def __init__(self, *args, **kwargs):
        super(Project, self).__init__(*args, **kwargs)
        self.add_operation(
            name='calc-volume',
            cmd='idg {job.sp.N} {job.sp.kT} {job.sp.p} > {job.ws}/V.txt')


if __name__ == '__main__':
    Project().main()
This workflow can then be executed with
$ python project.py run
or submitted to an HPC cluster scheduler with
$ python project.py submit
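Independently of how it is executed, the current status of the workflow can be inspected at any time with

$ python project.py status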