Setting up Applications

The ApplicationDefinition

Each task submitted to Balsam corresponds to the execution of a single ApplicationDefinition. Before adding runs, we have to tell Balsam what the application is. An ApplicationDefinition object comprises the following fields:

Field	Description	Required?
`name`	A unique identifier for the application	yes
`executable`	First half of the command to execute	yes
`description`	Descriptive text for your reference	optional
`preprocess`	A script that runs prior to execution	optional
`postprocess`	A script that runs after execution	optional
`envscript`	Script for loading modules and setting environment variables	optional

Fully-qualified paths should be used in defining Applications. The executable can be a simple path to an executable file, or a more complex, multi-argument command line. Each task's args field is appended to the app's executable field to form the full command line. In other words, Balsam executes the following shell command for each task:

{application.executable} {task.args}
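As a rough illustration of the template above, the following sketch joins an executable string and an args string with a single space and then splits the result the way a POSIX shell would. The helper name build_command and the sample arguments are hypothetical and are not part of Balsam; this only mimics the string composition described in the text.

```python
import shlex


def build_command(executable, args):
    """Join an app's executable and a task's args with a single space.

    Hypothetical helper that mirrors the
    "{application.executable} {task.args}" template; not Balsam code.
    """
    return f"{executable} {args}".strip()


# A multi-argument executable plus made-up task arguments
cmd = build_command(
    "singularity run /path/to/myImage.sif /bin/app",
    "--input data.in",
)

# shlex.split shows how a POSIX shell would tokenize the full command line
argv = shlex.split(cmd)
print(cmd)
print(argv)
```

Because the two fields are simply concatenated, anything that must appear on every invocation belongs in executable, while per-task values belong in args.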

preprocess and postprocess may be used to attach scripts that run before and after an application's main executable.
The preprocess stage runs only once before a task is executed; it does not run again when a task is restarted due to timeout or failure. The postprocess stage normally runs only after successful execution of a task (the application exits with return code 0). A task can also be configured to have errors handled by the postprocess script by setting post_error_handler=True (see below). These scripts run in the working directory of each task and have access to the task state via the Balsam Python API:

from balsam.launcher.dag import current_job

def timeout_handler():
    """Run this code before restarting a job that ran out of time"""
    pass

if current_job.state == "RUN_TIMEOUT":
    timeout_handler()
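When a single postprocess script serves as both the error handler and the timeout handler (post_error_handler and post_timeout_handler set to True), it can branch on the task state. The sketch below shows one possible dispatch pattern. The handler functions and their return values are hypothetical placeholders, and the state is passed in as a plain string so the example runs without Balsam; in a real script it would presumably come from current_job.state as in the snippet above.

```python
def handle_timeout():
    """Hypothetical: e.g. arrange a restart from the last checkpoint."""
    return "timeout handled"


def handle_error():
    """Hypothetical: e.g. collect logs for later inspection."""
    return "error handled"


def handle_success():
    """Hypothetical: normal postprocessing after a successful run."""
    return "postprocessed"


# Only the two state names that appear in this document are special-cased
HANDLERS = {
    "RUN_TIMEOUT": handle_timeout,
    "RUN_ERROR": handle_error,
}


def run_postprocess(state):
    """Pick a handler by state; any other state gets normal postprocessing."""
    handler = HANDLERS.get(state, handle_success)
    return handler()


# In a real postprocess script: run_postprocess(current_job.state)
print(run_postprocess("RUN_TIMEOUT"))
```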

Creating ApplicationDefinitions

You can add Balsam Apps quickly from the command line:

balsam app --name MyApp --executable '/path/to/app' --preprocess 'python /path/to/preproc.py'

Or via the Python API:

from balsam.core.models import ApplicationDefinition

myApp = ApplicationDefinition(
    name="myname",
    executable="singularity run /path/to/myImage.sif /bin/app",
    envscript="/path/to/setup-envs.sh",
    postprocess="python /path/to/post.py"
)
myApp.save()

Creating Tasks with BalsamJob

Once your Applications are defined, you can start composing workflows by adding tasks to the database. Tasks can be added with the balsam job command-line tool:

$ balsam job --help # see help menu with listing of fields
$ balsam job --name hello --workflow Test --app sayHi --args "world!" --ranks-per-node 2

Or, equivalently, using the balsam.launcher.dag.BalsamJob() constructor and Django model save method:

from balsam.launcher.dag import BalsamJob
job = BalsamJob(
    name="hello",
    workflow="Test",
    application="sayHi",
    args="world!",
    ranks_per_node=2,
)
job.save()

A powerful concept in Balsam is that you can add tasks from anywhere at any time:

From a login shell, even in the middle of a running job
From inside a pre- or post-processing stage of a task
During the execution of an Application itself (either a system call to balsam job or direct use of Python API)
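For the system-call route, a running application could assemble a balsam job command line and hand it to a subprocess. The sketch below only builds and prints the argument list, using the flags shown in the command-line example earlier; the helper name balsam_job_argv is hypothetical, and whether further flags are needed for non-interactive use is an assumption to check against balsam job --help.

```python
import shlex


def balsam_job_argv(name, workflow, app, args, ranks_per_node=1):
    """Build an argument list for the `balsam job` command.

    Hypothetical helper; only flags that appear in this document are used.
    """
    return [
        "balsam", "job",
        "--name", name,
        "--workflow", workflow,
        "--app", app,
        "--args", args,
        "--ranks-per-node", str(ranks_per_node),
    ]


argv = balsam_job_argv("hello", "Test", "sayHi", "world!", ranks_per_node=2)

# Print a shell-safe rendering of the command for inspection
print(" ".join(shlex.quote(part) for part in argv))

# To actually submit the task from inside an application, one would pass
# the list to subprocess.run(argv, check=True) on a system with Balsam.
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when args contains spaces or special characters.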

Jobs can be modified and removed from the command line (see balsam rm --help and balsam modify --help) or via the Python API.

Balsam uses the Django ORM, and the BalsamJob and ApplicationDefinition classes are just ordinary Django models. Users are strongly encouraged to read up on writing queries with Django. The API is intuitive and provides flexible methods to query and manipulate the BalsamJob table.

See the FAQs for some neat examples and links to further reading.

Balsam State Flow

As the Balsam components process your workflow, each task advances through a series of states according to the flow chart below.

Because Balsam processes each BalsamJob as a state machine, you can steer a task by changing its state. For instance, to re-run a task, set its state to RESTART_READY.

BalsamJob Fields

Field	Description
`name`	Determines working directory. Should form a unique pair with `workflow`
`workflow`	Determines working directory. Should form a unique pair with `name`
`application`	Name of the `ApplicationDefinition` to run with
`args`	Command line arguments to the application executable
`data`	Arbitrary JSON data storage. Useful for storing results together with BalsamJob data
`user_workdir`	Override default directory naming scheme with a fully-qualified path
`description`	Arbitrary text description to associate with a task
`parents`	IDs of parent jobs (task will not start until dependencies satisfied)
`input_files`	glob (wildcard) patterns for files to copy from parent to child job
`wall_time_minutes`	Estimated task duration. Useful to set priority: longer tasks run first.
`num_nodes`	Number of nodes on which this task should run (usually 1 unless using MPI)
`ranks_per_node`	Number of MPI ranks per node (leave at 1 unless using MPI)
`cpu_affinity`	CPU-thread affinity option (on ALCF Theta, use either `depth` or `none`)
`threads_per_rank`	Number of threads per MPI rank (on Theta, the aprun `-d` flag)
`threads_per_core`	Number of threads per hardware core (on Theta, the aprun `-j` flag)
`node_packing_count`	For non-MPI tasks and serial job mode only: how many tasks to pack per node
`environ_vars`	Colon-separated list (`ENV1=VALUE1:ENV2=VALUE2`)
`post_error_handler`	Boolean: whether or not `postprocess` should be invoked to handle `RUN_ERROR` jobs
`post_timeout_handler`	Boolean: whether or not `postprocess` should be invoked to handle `RUN_TIMEOUT` jobs
`auto_timeout_retry`	Boolean: whether or not `RUN_TIMEOUT` jobs should automatically advance to `RESTART_READY`
`state`	Current job state
`state_history`	History of job states with timestamps for each transition
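The environ_vars field takes a colon-separated string, which is easy to get wrong when assembled by hand. The following sketch converts between a Python dict and that string, assuming the format is exactly as shown in the table (ENV1=VALUE1:ENV2=VALUE2). Both helper names are hypothetical, and how Balsam itself treats values that contain a colon is not covered here, so the formatter simply rejects them.

```python
def format_environ_vars(env):
    """Render a dict as 'KEY1=VAL1:KEY2=VAL2' (hypothetical helper)."""
    for key, value in env.items():
        # A colon would be indistinguishable from the pair separator,
        # and '=' in a key would make the pair ambiguous.
        if ":" in key or "=" in key or ":" in value:
            raise ValueError(f"cannot encode {key!r}={value!r}")
    return ":".join(f"{key}={value}" for key, value in env.items())


def parse_environ_vars(text):
    """Parse 'KEY1=VAL1:KEY2=VAL2' back into a dict (hypothetical helper)."""
    if not text:
        return {}
    # Split each pair on the first '=' only, so values may contain '='
    return dict(item.split("=", 1) for item in text.split(":"))


encoded = format_environ_vars({"ENV1": "VALUE1", "ENV2": "VALUE2"})
print(encoded)
print(parse_environ_vars(encoded))
```

The resulting string could then be supplied as the environ_vars value when creating a BalsamJob.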