    Job States
    As a job moves from submission to execution to completion, it goes through a variety of states, and at any given moment every job is in exactly one of several possible states. Various commands, issued either from the command line or through a Qube! GUI, instruct the Supervisor to generate an event that changes the state of the job; this is called a transition. The description of all possible states and their transitions is called a state machine.

    The key to using Qube! effectively to manage jobs is understanding how different commands change the state of a job. Normally, submitting a job places it in an initial state called pending. The Supervisor takes over from there and, without any other intervention, places the job in running when it executes, and in either failed or complete when the job finishes.

    Some of the commands have fairly straightforward effects. Killing, suspending, or blocking will change the state of a job to the corresponding state.

    Initial Job States
    Qube! jobs can be submitted in one of two initial states: pending and blocked. Pending is the default and signals the Supervisor that the job may be started at any time. Blocked tells the system to hold the job until it is unblocked. The starting state of a job can be specified by the user or developer through the job structure in the API, or through the command line:

    Example:
    % qbsub --state blocked ls
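
    When submissions are scripted, the command line above can be assembled programmatically. The following is a minimal sketch rather than part of Qube!: the helper name `build_qbsub_argv` is hypothetical, and the only flag it relies on is the `--state blocked` form shown in the example. Because pending is the default, the sketch omits the flag for pending jobs instead of assuming that `--state pending` is accepted.

```python
import shlex

# The two initial states described above.
INITIAL_STATES = ("pending", "blocked")


def build_qbsub_argv(command, state="pending"):
    """Return the argv list for submitting `command` in an initial state.

    Hypothetical helper: it only builds the argument list and does not
    run qbsub or talk to the Supervisor.
    """
    if state not in INITIAL_STATES:
        raise ValueError(
            f"initial state must be one of {INITIAL_STATES}, got {state!r}"
        )
    argv = ["qbsub"]
    if state != "pending":
        # pending is the default, so the flag is only added for blocked.
        argv += ["--state", state]
    return argv + list(command)


if __name__ == "__main__":
    # Reproduces the command from the example above.
    print(shlex.join(build_qbsub_argv(["ls"], state="blocked")))
```

    The resulting list can be handed to `subprocess.run` on a machine where `qbsub` is installed.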

    State     Meaning
    pending   Default state for submitted jobs. Signals to the Supervisor that the job may be started at any time. Jobs which have been suspended will also be marked as pending.
    blocked   Alternate state for submitted jobs. Tells the system to hold the job until it is unblocked by something, usually another job that this one depends on.
    running   Job that is doing work, with no failures.
    failing   Job that has not finished, but has at least one frame or instance that has failed.
    retrying  Jobs that have retry counts greater than zero, and have been retried (automatically) at least once, are marked as retrying.
    killed    Job that has been killed by a user. Killed jobs must be manually retried or resubmitted.
    complete  Job is no longer running, and all frames have succeeded.
    failed    Job is no longer running, and at least one frame or instance has failed.
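
    The distinction between running, failing, complete, and failed depends on only two facts: whether the job still has work in progress, and whether any frame or instance has failed. A minimal sketch of that rule follows. The names `JobState` and `classify` are illustrative and are not part of the Qube! API, and the sketch deliberately ignores pending, blocked, retrying, and killed, which depend on queueing and user actions rather than on frame outcomes.

```python
from enum import Enum


class JobState(Enum):
    """The job states listed in the table above."""
    PENDING = "pending"
    BLOCKED = "blocked"
    RUNNING = "running"
    FAILING = "failing"
    RETRYING = "retrying"
    KILLED = "killed"
    COMPLETE = "complete"
    FAILED = "failed"


def classify(frame_failed, still_running):
    """Choose among running/failing/complete/failed.

    frame_failed:  iterable of bools, True for each failed frame or instance
    still_running: True while the job has not finished
    """
    any_failed = any(frame_failed)
    if still_running:
        # Unfinished job: failing as soon as one frame has failed.
        return JobState.FAILING if any_failed else JobState.RUNNING
    # Finished job: complete only if no frame failed.
    return JobState.FAILED if any_failed else JobState.COMPLETE
```

    For example, `classify([False, True], still_running=True)` yields `JobState.FAILING`, while the same frame results on a finished job yield `JobState.FAILED`.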

    Actions

    States can be changed due to various actions taken by users or the Supervisor.

    Action     Meaning
    block      Typically done by users, but auto-wrangling will also block instances and jobs.
    interrupt  Kill the current frame and put the job into a pending state, where it can be picked up and rerun.
    kill       End the current frame and don't restart the job. A user must retry or resubmit this job.
    resubmit   Bring up the submission UI and possibly modify the job's parameters before sending it back to the Supervisor.
    retry      Put the job back onto the queue as-is, without modifying any of the submission parameters.
    suspend    Like "interrupt" except that it allows the current frame to finish first.
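
    The actions and states can be combined into a small state machine. The sketch below is one possible reading of the two tables, not the Supervisor's actual implementation. The resulting state of each action follows the descriptions above, with "back onto the queue" for retry read as pending. The set of states each action is allowed from is an assumption made for illustration, and resubmit is left out because it sends a possibly modified job back to the Supervisor rather than changing the state of the existing job.

```python
# States in which the job has work in progress.
ACTIVE = {"running", "failing", "retrying"}

# action -> (states the action is accepted from, resulting state)
# Resulting states follow the tables above; the source sets are
# assumptions for this sketch only.
TRANSITIONS = {
    "block": ({"pending"} | ACTIVE, "blocked"),
    "interrupt": (ACTIVE, "pending"),
    "suspend": (ACTIVE, "pending"),
    "kill": ({"pending", "blocked"} | ACTIVE, "killed"),
    "retry": ({"killed", "failed"}, "pending"),
}


def apply_action(state, action):
    """Return the state a job moves to when `action` is applied."""
    if action not in TRANSITIONS:
        raise ValueError(f"unknown action: {action!r}")
    sources, target = TRANSITIONS[action]
    if state not in sources:
        raise ValueError(f"cannot {action} a job that is {state}")
    return target


if __name__ == "__main__":
    state = "running"
    for action in ("interrupt", "kill", "retry"):
        state = apply_action(state, action)
        print(f"{action} -> {state}")
```

    Keeping the transitions in a single table makes it easy to adjust the assumed source states without touching the lookup logic.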