Cluster Guide

Cluster

The cluster setup is a custom cresis-toolbox interface to several different cluster interfaces. The supported cluster interfaces are:

Cluster Type | ctrl.cluster.type | Requires Compiler | Description
Torque/PBS | 'torque' | Yes | PBS or Torque scheduler using qsub, qstat, and qdel
Slurm | 'slurm' | Yes | Slurm scheduler using sbatch, squeue, scancel, and scontrol
Matlab Distributed Computing Server | | No | Not tested
Matlab parallel processing toolbox | 'matlab' | No | Uses Matlab's parallel processing toolbox
Debug | 'debug' | Optional | Runs tasks in the current matlab session. Tasks can be run in several modes.

When running the "Debug" cluster type, tasks can be executed in one of three modes. The mode is specified in ctrl.cluster.run_mode.

ctrl.cluster.run_mode | Description
1 | Runs cluster_job.m directly
2 | Runs a Matlab compiled version of cluster_job.m
3 | Runs the cluster_job.sh bash command (only works if bash is available from Matlab's system() function)

Description

Tasks to be run are grouped into jobs. The jobs are what are actually sent to the clusters so that a single job may do several tasks. Jobs are grouped into batches. The whole process does not go forward until all the jobs in the batch are run.

The commands for the cluster are all located in the cluster directory in the cresis-toolbox repository and named with cluster_*.

Quick Start Guide

Run "cluster_compile" from Matlab

>> cluster_compile
Start Compiling 17-Mar-2018 15:43:18

  mcc -m -d /work/ollie/jpaden/scripts/cresis-toolbox/cresis-toolbox/cluster -R '-singleCompThread,-nodisplay' /work/ollie/jpaden/scripts/cresis-toolbox/cresis-toolbox/cluster/cluster_job.m tomo_collate_task.m create_records_accum2_task.m create_records_acords_task.m create_records_mcords_task.m basic_load_fmcw.m basic_load_fmcw2.m rx_chan_equal_sar_task.m rx_chan_equal_raw_task.m coh_noise_tracker_task.m analysis_task.m radiometric_calibration_task.m get_heights_task.m get_heights_combine_task.m csarp_task.m combine_wf_chan_task.m nsidc_delivery_script_task.m hanning.m hamming.m blackman.m tukeywin.m tukeywin_trim.m chebwin.m kaiser.m boxcar.m butter.m array_proc_sv.m lever_arm.m doa_nonlcon.m
Startup Script Running
  Resetting path
  Adding cresis path: /work/ollie/jpaden/scripts/cresis-toolbox/cresis-toolbox/
  Adding personal path: /work/ollie/jpaden/scripts/matlab/
  Setting global preferences in global variable gRadar
Warning: Your deployed application may error out because file or folder paths
not present in the deployed environment may be included in your MATLAB startup
file. Use the MATLAB function "isdeployed" in your MATLAB startup file to
determine the appropriate execution environment when including file and folder
paths, and recompile your application.


Done Compiling 17-Mar-2018 15:44:11

Run "out = cluster_submit_batch('hanning',true,{10},1,60)". This is a very basic example of how to use the cresis-toolbox cluster interface and tests the core functions.

>> out = cluster_submit_batch('hanning',true,{10},1,60)
Submitting /work/ollie/jpaden/Scratch/mdce_tmp/cluster-temp/batch_3_tp05f451d4_4151_4f6c_b1bb_6c98c3135228
Submitted 1 tasks in cluster job 3/1739336:
  1
 QJob 3:1/1739336 status changed to P (17-Mar-2018 15:46:27)
 QJob 3:1/1739336 status changed to R (17-Mar-2018 15:46:28)
Chain 1 succeeded (17-Mar-2018 15:46:54)
  cluster_cleanup: Removing /work/ollie/jpaden/Scratch/mdce_tmp/cluster-temp/batch_3_tp05f451d4_4151_4f6c_b1bb_6c98c3135228
out =
  struct with fields:

        errorstruct: []
            argsout: {[10x1 double]}
    cpu_time_actual: 0.0376

For manual polling (nonblocking) cluster usage, also run the nonblocking example described by "help cluster_submit_batch".

>>   ctrl = cluster_submit_batch('hanning',false,{10},1,60);
Submitting /work/ollie/jpaden/Scratch/mdce_tmp/cluster-temp/batch_3_tp21b8dc80_173a_42b5_987f_ee0d741b60ee
>>   ctrl_chain = {{ctrl}};
>>   [~,chain_id] = cluster_save_chain(ctrl_chain);
Saving chain 1. Files/directories required:
/work/ollie/jpaden/Scratch/mdce_tmp/cluster-temp/chain_1_tp0258394f_2cde_4306_acef_b6c54a91ee4a /work/ollie/jpaden/Scratch/mdce_tmp/cluster-temp/batch_3_tp21b8dc80_173a_42b5_987f_ee0d741b60ee

Commands to load and run:
[ctrl_chain,chain_fn] = cluster_load_chain([],1);
ctrl_chain = cluster_run(ctrl_chain);

>>
>>   while true
    % These are the three lines of code that should be run to poll the job:
    fprintf('%s\nPoll the chain:\n',repmat('=',[1 40]));
    ctrl_chain = cluster_run(chain_id,false);
    cluster_save_chain(ctrl_chain,chain_id,false);

    % Check to see if any tasks are not complete and did not fail
    if ~any(isfinite(cluster_chain_stage(ctrl_chain)))
      % Done with all tasks (either completed or failed)
      break;
    end
    pause(10);
  end
========================================
Poll the chain:
Submitted 1 tasks in cluster job 3/1739636:
  1
Chain 1 not finished or failed (17-Mar-2018 16:16:21)
  Stage 1 not finished
========================================
Poll the chain:
 QJob 3:1/1739636 status changed to R (17-Mar-2018 16:16:31)
Chain 1 not finished or failed (17-Mar-2018 16:16:31)
  Stage 1 not finished
========================================
Poll the chain:
 QJob 3:1/1739636 status changed to R (17-Mar-2018 16:16:42)
Chain 1 succeeded (17-Mar-2018 16:16:42)
>>   [in,out] = cluster_print(ctrl_chain{1}{1}.batch_id,1,0);
>>   cluster_cleanup(ctrl_chain{1}{1}.batch_id);
  Deleting jobs in batch 3
  cluster_cleanup: Removing /work/ollie/jpaden/Scratch/mdce_tmp/cluster-temp/batch_3_tp21b8dc80_173a_42b5_987f_ee0d741b60ee

Step 1. Create a new batch

Batch status information is stored in a batch directory indicated by ctrl.cluster.data_location. The batch directory name is in the form "batch_1_UNIQUESTRING".

The function call to create a new batch is cluster_new_batch.

Step 2. Compile the code

Some of the cluster interfaces use the Matlab compiler. To compile the code for the cluster, run cluster_compile.

Note that the gRadar.cluster.hidden_depend_funs must be populated to ensure that the compiled code contains all the dependent functions. This is critical: your function will not work on the cluster if it or any of its dependencies are missing from the list that is passed to "mcc" during cluster_compile.m. gRadar.cluster.hidden_depend_funs is how these dependent functions are passed in.

When compiling, the matlab working directory should not be in a package directory (a directory starting with "+").

Step 3. Create the tasks

Tasks are created by calling cluster_new_task.m. Creation requires specifying a function handle for the task to run, the input arguments, the number of output arguments, a string containing notes about the task, the maximum cpu time in seconds for the task to run, the maximum number of bytes of memory required by the task, and a success condition. Note that package functions (Matlab functions residing in a directory starting with "+") can be called from a task, but the main task function that cluster_job.m calls cannot be a package function.

All the task information is stored in the files "BATCH_DIR/in/static.mat" and "BATCH_DIR/in/dynamic.mat". The files use the "-v6" Matlab format by default. The cluster.file_version parameter can be used to choose a different Matlab file version. "-v7.3" may be useful for interoperation with other languages because it is based on the HDF5 format. A single struct variable called "static_param" is stored in static.mat. A separate structure for each task, called "dparam{TASK_ID}", is stored in dynamic.mat. Each dparam structure contains the fields of param that differ for that task; these override any common fields in static_param when the two are merged with the merge_structs command. When creating the tasks, you will need to separate static and dynamic parameters.

The tasks are run by a function called "cluster_job.m". cluster_job.m loads the input data created by cluster_new_task.m to determine which task to run and what its inputs are. cluster_job.m will load the static.mat file and the task specific structure in dynamic.mat and then merge these two structures param = merge_structs(sparam,dparam{TASK_ID}); followed by merging each of the individual input arguments argsin{idx}.
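
The merge of static and dynamic parameters can be illustrated with a small standalone sketch. The function merge_sketch below is a hypothetical stand-in written for this example; it is not the toolbox's merge_structs and may differ from it in edge cases (for instance, it assumes scalar structures). It only shows the idea that dynamic fields are added to, or override, the static fields, including inside cell arrays such as argsin.

```matlab
% Sketch only: merge_sketch is a hypothetical stand-in for merge_structs.
sparam = struct();
sparam.task_function = 'my_task';          % common to all tasks
sparam.argsin{1}.my_task.sfield = 1;       % static part of the first argument

dparam = struct();
dparam.cpu_time = 100;                     % differs per task
dparam.argsin{1}.my_task.dfield = 2;       % dynamic part of the first argument

param = merge_sketch(sparam, dparam);
disp(param.argsin{1}.my_task);             % shows both sfield and dfield

function out = merge_sketch(base, override)
% Recursively merge override into base. Fields present in override win.
if isstruct(base) && isstruct(override)
  out = base;
  names = fieldnames(override);
  for idx = 1:numel(names)
    name = names{idx};
    if isfield(out, name)
      out.(name) = merge_sketch(out.(name), override.(name));
    else
      out.(name) = override.(name);
    end
  end
elseif iscell(base) && iscell(override)
  % Merge cell arrays element by element (e.g. argsin{1}, argsin{2}, ...)
  out = base;
  for idx = 1:numel(override)
    if idx <= numel(out)
      out{idx} = merge_sketch(out{idx}, override{idx});
    else
      out{idx} = override{idx};
    end
  end
else
  % Non-struct, non-cell values: the dynamic value replaces the static one
  out = override;
end
end
```

Local functions inside a script require a reasonably recent Matlab release; on older releases, merge_sketch can be placed in its own file instead.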


If you do not want to have to force_compile cluster_job.m every time, or if you have a functional dependency that is not obvious (e.g. passing in functions by string will not be detected by the compiler), then add your task to the gRadar.cluster.hidden_depend_funs list (found in your startup.m). Note that you need to re-run startup.m after adding your task.

Compiling and hidden_depend_funs

With a proper setup, changes to dependent functions will be detected automatically and the cluster_job.m function will be recompiled before it runs.

For this to happen, all "_task" functions called from cluster_job.m AND all hidden dependencies (e.g. functions that are passed in as function pointers or strings) need to be listed in the gRadar.cluster.hidden_depend_funs list. This is set in startup.m. Each cell entry is a 2 element cell vector where the first entry is the filename string and the second entry is a date-check-level integer. Currently there are three levels of date checking:

  • 0: Do not check this file for changes (this is for Matlab functions that never change)
  • 1: Always check this file and its dependencies for changes (this is for cresis-toolbox functions that are called through function pointer passing so that Matlab will not be able to automatically detect dependency on the function)
  • 2: Only check this file for changes when the "fun" parameter is not specifically set when calling cluster_compile (i.e. if a specific function is not specified during compile time, then this forces all functions to be date checked)

Technically every file could be set to level 1, but that would slow the date checking down a lot. It is better to specify 0 or 2 when it is appropriate.
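
As an illustration of the list format described above, a startup.m fragment might look like the following sketch. The file names are taken from the compile output shown earlier, but the date-check level assigned to each one here is an assumption made for the example, not the toolbox's shipped configuration.

```matlab
% Sketch of a startup.m fragment. The levels chosen below are
% illustrative assumptions, not the toolbox defaults.
global gRadar;
gRadar.cluster.hidden_depend_funs = {};
% Each entry is {filename string, date-check level}
gRadar.cluster.hidden_depend_funs{end+1} = {'hanning.m', 0};       % level 0: never checked for changes
gRadar.cluster.hidden_depend_funs{end+1} = {'array_proc_sv.m', 1}; % level 1: always checked
gRadar.cluster.hidden_depend_funs{end+1} = {'csarp_task.m', 2};    % level 2: checked only when no specific function is given

% Print the list to confirm its structure
for idx = 1:numel(gRadar.cluster.hidden_depend_funs)
  entry = gRadar.cluster.hidden_depend_funs{idx};
  fprintf('%-20s level %d\n', entry{1}, entry{2});
end
```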

Occasionally, you will need to force a recompile. If things are working properly, this should only need to be done when changes are made to the hidden dependency list and the function you want to call is going to use those new functions. Three examples are given below (note the time to execute each):

% Force a check on just the function to be called: csarp_task function
% and compile if there were changes (best)
>> tic; cluster_compile('csarp_task.m',[],0); toc
Elapsed time is 6.715058 seconds.

% Force a check on EVERY function and compile if there were changes
>> tic; cluster_compile([],[],0); toc
Elapsed time is 18.371072 seconds.

% Force a compile (skips the check and just runs the compile, necessary when new function dependencies are added)
>> cluster_compile
Start Compiling 07-Dec-2012 07:11:57

  mcc -m -d /users/paden/scripts/cresis-toolbox/cresis-toolbox/cluster -R '-singleCompThread,-nodisplay' -C /users/paden/scripts/cresis-toolbox/cluster/cluster_job.m create_records_accum2_task.m create_records_acords_task.m create_records_mcords_task.m create_records_mcrds_task.m basic_load_fmcw.m basic_load_fmcw2.m analysis_task.m radiometric_calibration_task.m get_heights_task.m csarp_task.m combine_wf_chan_task.m hanning.m blackman.m tukeywin.m tukeywin_trim.m chebwin.m kaiser.m boxcar.m array_proc_sv.m lever_arm_snow_2010_antarctica_DC8_ATM.m lever_arm_kuband_2010_antarctica_DC8_ATM.m lever_arm_snow_2011_antarctica_DC8_ATM.m lever_arm_kuband_2011_antarctica_DC8_ATM.m lever_arm_mcords2_2011_greenland_P3_ATM.m lever_arm_mcords2_2011_antarctica_TO.m lever_arm_mcords_2012_antarctica_DC8_ATM.m lever_arm_snow_2012_greenland_P3_ATM.m lever_arm_kuband_2012_greenland_P3_ATM.m lever_arm_kuband_2012_antarctica_DC8_ATM.m lever_arm_snow_2012_antarctica_DC8_ATM.m
Startup Script Running
  Resetting path
  Adding cresis path: /users/paden/scripts/cresis-toolbox/
  Adding personal path: /users/paden/scripts/matlab/
  Setting global preferences in global variable gRadar
Warning: Adding path
"/users/paden/scripts/cresis-toolbox/cresis-toolbox/cluster" to Compiler path
instance.

Done Compiling 07-Dec-2012 07:12:42
Elapsed time is 45 seconds.

ct_filename and gRadar paths

Note that all filepaths in the toolbox are supposed to be generated with ct_filename. Since the ct_filename functions rely on the global variable gRadar which is not available to your tasks when running on the cluster, it is important to pass the gRadar structure into your function. The standard way to do this is to use the parameter spreadsheet structure to pass these in. For example:

param; % This is the parameter spreadsheet structure
param.my_function; % This is a substructure with your function's parameters
global gRadar;
param = merge_structs(gRadar,param); % This will add any missing parameters (i.e. the file paths) from gRadar

Inside your task function, calls to ct_filename functions can just pass this "param" structure in.

Step 4. Monitor jobs until completed

For all the batches that are to be run together, situate the control structures in a ctrl_chain cell array which describes the job chaining required.

ctrl_chain = cluster_run(ctrl_chain,mode);

To understand the ctrl_chain structure, consider this example of control structures:

{{ctrl1a,ctrl1b,ctrl1c},{ctrl2a,ctrl2b},{ctrl3a},{ctrl4a,ctrl4b}}

The following rules will be observed.

  1. Tasks will be grouped into jobs until the ctrl.cluster.max_time_per_job time limit is exceeded and then a new job will be created so that the time limit is never exceeded. If a single task exceeds the time limit, then an error will be generated. There is one exception to this: setting the time limit to zero will put one task in every job and ignore the time limit.
  2. Tasks from two different batches will never be combined into a single job.
  3. Each cell array on the first level will execute in parallel (e.g. tasks in ctrl1a and ctrl2b can be run simultaneously)
  4. Each sub-cell array will execute in the order specified (e.g. ctrl1a must complete before ctrl1b starts)
  5. The maximum number of jobs submitted which are not complete will never be more than ctrl.cluster.max_jobs_active
  6. Failed tasks (ones that do not meet the success condition) whose error_flag matches the rerun flag mask will be submitted up to ctrl.cluster.cluster_run_attempts times. These will be submitted with cluster_new_task and will use the same logic to distribute tasks across jobs.
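
Rule 1 can be illustrated with a small standalone sketch of greedy grouping. This is not the toolbox's implementation; the variable names and the task times are hypothetical, and the sketch only shows how consecutive tasks might be packed into jobs without exceeding a time limit.

```matlab
% Sketch only: greedy grouping of tasks into jobs under a time limit.
cpu_times = [30 20 40 10 50];   % hypothetical per-task cpu times (seconds)
max_time_per_job = 60;          % hypothetical limit; 0 means one task per job

job_id = zeros(size(cpu_times));
cur_job = 1;
cur_time = 0;
for idx = 1:numel(cpu_times)
  if max_time_per_job == 0
    % Exception: a zero limit puts every task in its own job
    job_id(idx) = idx;
    continue;
  end
  if cpu_times(idx) > max_time_per_job
    error('Task %d needs %g s, which exceeds the %g s limit.', ...
      idx, cpu_times(idx), max_time_per_job);
  end
  if cur_time + cpu_times(idx) > max_time_per_job
    % Adding this task would exceed the limit, so start a new job
    cur_job = cur_job + 1;
    cur_time = 0;
  end
  job_id(idx) = cur_job;
  cur_time = cur_time + cpu_times(idx);
end
disp(job_id);   % job index assigned to each task
```

With these inputs, tasks 1 and 2 share the first job, tasks 3 and 4 share the second, and task 5 is placed in a third job.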

To ensure that the rules are obeyed across all submissions, do not run more than one instance of cluster_run for a particular cluster. Apart from this, there is no reason not to run multiple instances, so multiple instances are supported as long as the rules in question do not matter for your use. If you need to stop and add or remove batches, use cluster_stop and then run cluster_run again with the modified control chain. cluster_run will pick up from where it left off.

The mode input is a string that specifies 'block' or 'nonblock'. Blocking will sit and monitor/submit jobs in batches until all batches are complete. Nonblocking will update the status information for each task, submit new jobs if there is room in the cluster to do so, and then exit.

Step 5. Cleanup batch

Once all outputs have been checked or if a batch should be aborted, the cleanup function can be run to delete the contents of the batch directory and to delete all jobs in the cluster.

cluster_cleanup(ctrl_chain);

Cluster Debug Tools

Query Commands

Function | Description
cluster_get_chain_list | Prints a list of all stored chain lists.
cluster_print_chain(CHAIN_ID) | Prints information for the chain specified by integer CHAIN_ID.
cluster_print(BATCH_ID, TASK_ID) | Prints information for the task specified by integers BATCH_ID and TASK_ID.
task_info = cluster_print(BATCH_ID, TASK_IDS, 2) | Prints information in a table for all the tasks listed (e.g. TASK_IDS could be 1:10000 to print all tasks for a batch). print_struct(task_info,1) can be used to copy and paste results into Excel.

Manipulate Chains

Function | Description
cluster_set_chain(ctrl_chain,parameter_string,value) | Sets variables in every batch in a chain list. For example: cluster_set_chain(ctrl_chain,'cluster.type','debug');. Note that if cluster.type is changed to a cluster mode such as torque or slurm, then cluster_compile may need to be run manually to ensure the most recent code changes are compiled.
cluster_save_dparam | Sets dynamic parameters
cluster_save_sparam | Sets static parameters

Execute and Debug Commands

Function | Description
cluster_hold | Pauses or unpauses all chains (or specific chains/batches if specified).
cluster_reset | Resets the state of unfinished jobs so that they can be run again. Typical usage is after cluster_run has completed, but some tasks failed. Then you could run ctrl_chain = cluster_reset(ctrl_chain); followed by ctrl_chain = cluster_run(ctrl_chain) to try to finish the remaining jobs.
cluster_stop | Cancels submitted jobs in all chains (or specific chains/batches if specified).
cluster_cleanup | Cancels submitted jobs AND removes files in all chains (or specific chains/batches if specified).
cluster_exec_task(BATCH_ID,TASK_IDS) | Runs the specified tasks locally. Useful for debugging tasks that have failed.

Issues and Solutions

  • CPU time or WALL time exceeded: Increase the CPU time request multiplication factor by running ctrl_chain = cluster_set_chain(ctrl_chain,'cluster.cpu_time_mult',2);. If the control chain ran out of retries, you will need to also reset the control chain before running it again: ctrl_chain = cluster_reset(ctrl_chain);. Then rerun the job chain: ctrl_chain = cluster_run(ctrl_chain);. You can also run the job locally and monitor the memory usage with a script. The toolbox has a bash/linux script for doing this called bash/watch_cpu_mem.sh.
  • The "_task" function cannot be inside a package, but all other functions (including functions called by the _task function) may be in a package. (A package is a namespace device in Matlab where all package functions are put into a directory starting with a "+".)
  • The Matlab compiler should not be run from a package directory. For example, you may not be inside the package "+tomo" and run the matlab compiler. Function dependencies can end up incorrect in the compiled file if the compiler is run in a package directory. cluster_compile tries to check for this condition to prevent it from happening.
  • A directory permission problem in the ~/.matlab directory was fixed by running "rm -rf ~/.matlab/mcr*".
  • To force a recompile (necessary if you change matlab versions or add hidden dependencies to startup since this cannot automatically be detected): run cluster_compile.m with no input arguments
  • Occasionally a temporary file permission problem will occur (i.e. just typing dbcont at the cluster_run.m keyboard prompt will work). Use cluster_print(batch_id,ctrl_id) to print out stdout from the failed job. An example of this type of failure in the stdout file:
    stdin: is not a tty
    terminate called after throwing an instance of 'dsFileBasedLockError'
      what():  File system error occurred: Exclusive create failed on /users/paden/.matlab/mcr_v716/.deploy_lock.1 (reason: Stale NFS file handle)
    ./run_worker_task.sh: line 36:  4052 Aborted                 (core dumped) ./worker_task
    

    If needed, verify that the .deploy_lock directory is gone before running dbcont to resubmit the job(s) that failed:

    ls -la /users/paden/.matlab/mcr_v716/
    

Cluster Functions

cluster_chain_stage

active_stage = cluster_chain_stage(ctrl_chain)

Returns the active stage for each chain in ctrl_chain.

cluster_cleanup

ctrl = cluster_cleanup(...)

Cleans up (deletes temporary files and jobs) a batch, list of batches, or a batch chain.


cluster_compile

cluster_compile(...)

Compiles the cluster job function cluster_job.m.


cluster_error_mask

cluster_error_mask(error_mask)

If error_mask is passed in, interprets the error mask and prints out information about each error that occurred. If called with no input arguments, this function sets all the error mask variables in the caller's workspace.

cluster_exec_task

cluster_exec_task(ctrl,task_ids,run_mode)

Executes task(s) in the current matlab session. This is useful for debugging or sometimes to avoid long queue wait times if only a few jobs have failed and need to rerun. For example, to run task TASK_ID from batch BATCH_ID:

cluster_exec_task(BATCH_ID, TASK_ID)

For debugging purposes, there are 3 run modes (default is run_mode == 1) for testing the compiler and job script:

  1. Run task through uncompiled cluster_job.m function (default)
  2. Run task through compiled cluster_job.m function
  3. Run task through cluster_job.sh function

cluster_get_batch

ctrl = cluster_get_batch(ctrl,force_check,update_mode)

Gets a batch and populates a ctrl structure for that batch.

update_mode should be set to 0 if this is not the process running the batch, because errors can occur in the status files if two processes are accessing them.

cluster_get_batch_list

ctrls = cluster_get_batch_list(param)

Get a list of all batches. Leave param undefined to use gRadar.cluster.data_location to search for batches.

cluster_get_chain_list

cluster_get_chain_list(param)

Prints a list of all chains. Leave param undefined to use gRadar.cluster.data_location to search for chains.

cluster_hold

ctrls = cluster_hold(batch_id,hold_state)

Sends a hold command to cluster_run and causes it to enter debug mode.


cluster_job

cluster_job

Matlab function that runs the job on the cluster. This is the function which is compiled and calls the desired task functions.


cluster_job.sh

cluster_job.sh

Bash script which is used by torque and slurm cluster types. This is the job which is called by torque or slurm. This bash script calls cluster_job.m (which when compiled creates run_cluster_job.sh).

cluster_load_chain

ctrl = cluster_load_chain([],chain_id)

Loads a batch chain from a chain file. chain_id is a unique positive integer.



cluster_new_batch

cluster_new_batch

Creates a new batch.

This creates a new batch directory and support files in the batch directory. The default parameters are loaded from gRadar.cluster.


cluster_new_task

ctrl = cluster_new_task(ctrl,sparam,dparam,varargin)

Create a new task in the batch specified by ctrl. The dparam_save option can be set to zero to speed up task creation if many tasks will be created. If this option is used, after all the tasks are created, the cluster_save_dparam function needs to be called or the dynamic parameters will not be saved to disk (the tasks will therefore not have access to the dynamic parameters when they run).

Input arguments:

Name | Description
ctrl | batch structure array (may not be a batch ID)
sparam | static param structure (only has an effect for the first task that is created for the batch)
dparam | dynamic param structure that will be merged with the sparam structure when the task is run
varargin | name-value pairs (e.g. 'dparam_save',0 could be passed in as the 4th and 5th arguments to turn off dynamic parameter saving during task creation)

sparam and dparam each have the same fields. Each field only needs to be set in one or the other structure. For example, one could create the task like this:

sparam.task_function = @qlook;
sparam.num_args_out = 1;
dparam.cpu_time = 100;
dparam.mem = 100;
dparam.notes = 'Test';
param = struct('qlook',struct('some_static_field',3));
sparam.argsin{1} = param;
dparam.argsin{1}.qlook.some_dynamic_field = 2;

Note that sparam is only written out for the very first task. Any changes to sparam in subsequent tasks will have no effect.

Field Name | Description
task_function | function handle of job, this function handle tells cluster_job.m what to run
argsin | cell vector of input arguments (default is {})
num_args_out | number of output arguments to expect (default is 0)
notes | optional note to print after successful submission of job (default is '', nothing is written out)
cpu_time | maximum cpu time of this task in seconds
mem | maximum memory usage of this task in bytes (default is 0)
file_success | cell array of files that must exist for the task to be considered a success. If the file is a .mat file, then it must have the file_version variable and not be marked for deletion.

sparam and dparam will be merged when the task runs. The merging works across cell arrays too. Therefore, setting:

sparam.argsin{1}.my_task.sfield = 1;
dparam.argsin{1}.my_task.dfield= 2;

will cause the first argument to the task, (argsin{1}), to be a structure array with the field "my_task" which is itself a structure array with two fields: sfield and dfield. For example, if your task is a function:

function success = my_task(param)

then inside your function you will have these fields:

param.my_task.sfield = 1;
param.my_task.dfield = 2;

cluster_print

cluster_print(ctrl,ids,print_mode,ids_type)

Prints or gets information about a particular task or set of tasks in a batch specified by ctrl. The type of ID defaults to a task ID (ids_type equal to 0). If ids_type is set to 1, then it looks up by the torque job ID.

cluster_print(ctrl,task_id,1)
Prints full information about one task.

[in,out] = cluster_print(ctrl,task_id,0)
Returns input and output information for one task.

cluster_print(ctrl,task_ids,2)
Prints a table and returns a struct with these fields for each task ID specified:

Field | Description
task_id | The task ID is the index into the task. It can be from 1 to N where N is the number of tasks in the batch.
job_id | The ID of the job. This is the ID used by the cluster interface.
job_status | Job status
error_flag | Job error state
cpu_time_req | CPU time requested in minutes
cpu_time | CPU time currently used in minutes
memory_req | Memory requested in MB
memory | Memory currently being used in MB
schedule | Scheduled time to run as date string.

cluster_print_chain

cluster_print_chain(ctrl_chain)

Prints and gets job state, cpu, and memory information about all jobs in a control chain. cluster_print_chain uses the saved chain information unless a chain cell array is passed in. This occasionally causes erroneous reporting. For example, if the chain is saved in the debug state, but is actually running on the cluster, the running and queued task totals will be zero and these jobs will all be marked as pending.

Number of tasks: 771, 198/315/255/3 C/R/Q/T, 0 error, 0 retries

The task output information stands for: C/R/Q/T = Complete/Running/Queued/Pending.

cluster_reset

ctrl_chain = cluster_reset(ctrl_chain)

Resets fields in a control chain so that the control chain may be run again with cluster_run. Usually this is done after some tasks failed in the control chain and another attempt is being made to run those tasks.

cluster_run

ctrl_chain = cluster_run(ctrl_chain, cluster_run_mode)

Runs a list of chains (each chain is a list of batches that must be run in serial) or a batch. This function has blocking/polling and non-blocking modes. If the ctrl_chain has been run before, but the ctrl_chain cell array does not have the most recent state information in it, use cluster_run_mode 2 (non-blocking) or 3 (blocking) to tell cluster_run to first query the state of every batch before running it. This query is very slow for large batches and so it is much more efficient to make sure you keep track of the ctrl_chain with all the state information intact. If you have done this, then use cluster_run_mode 0 (non-blocking) or 1 (blocking) to skip the query step.

A typical example of where cluster_run_mode 2 or 3 is required would be if cluster_run crashes and you need to start the process over. Then you would load the latest version of the chain that you have stored on disk and run with cluster_run_mode 2 or 3.
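
The four cluster_run_mode values described above follow a simple pattern: odd values block, and values of 2 or more query the batch state first. The sketch below only tabulates that pattern as read from this description; the variable names are hypothetical and it does not call any toolbox function.

```matlab
% Sketch only: tabulates the cluster_run_mode values as described in the text.
modes = 0:3;
blocking = mod(modes, 2) == 1;   % modes 1 and 3 block until all batches finish
query_first = modes >= 2;        % modes 2 and 3 query every batch before running

for idx = 1:numel(modes)
  fprintf('cluster_run_mode %d: blocking=%d, query_first=%d\n', ...
    modes(idx), blocking(idx), query_first(idx));
end
```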

Typically, if you are just polling the status of your jobs rather than blocking until they are all done, you would run:

ctrl_chain = cluster_load_chain([],CHAIN_NUMBER); % CHAIN_NUMBER should be your most recent saved version of this chain
ctrl_chain = cluster_run(ctrl_chain,0);
cluster_save_chain(ctrl_chain); % Note the CHAIN_NUMBER since this will be what you load the next time you poll.

Then you would repeat these commands each time you want to poll the status of the job chains.

cluster_save_chain

[chain_fn,chain_id] = cluster_save_chain(ctrl_chain)

Saves a batch chain to a chain file. chain_id is a unique positive integer. This function also prints out all the batch files and directories that are required to run the chain. This is useful if the batch files need to be generated on one computer system and then moved to another computer system to be run.


cluster_save_dparam

ctrl = cluster_save_dparam(ctrl);

Saves the dparam structure to the dynamic inputs file. This is needed when the dparam_save option is set to 0 when cluster_new_task is called.


cluster_set_chain

ctrl_chain = cluster_set_chain(ctrl_chain,varargin)

Sets a parameter in every batch in a control chain. Note that some parameters, notably cluster.type, require additional manual functions to be run if changed with cluster_set_chain. If cluster.type is set to matlab, then the "ctrl.cluster.jm = parcluster" may need to be run manually for each batch. If the cluster.type is set to torque or slurm, then cluster_compile may need to be run manually if there have been code changes since the last compile.

For example, to change the type to debug mode:

ctrl_chain = cluster_set_chain(ctrl_chain,'cluster.type','debug')

cluster_stop

cluster_stop(ctrl_chain,mode)

Stops/cancels/deletes jobs that are running in a cluster for the batches identified in ctrl_chain. The batches are also put on hold. To specify a batch or batches, the second argument can be set to 'batch' (default is 'chain'). Only useful for 'torque', 'slurm', and 'matlab' modes of operation.

cluster_submit_batch

ctrl = cluster_submit_batch(fun,block,argsin,num_argsout,cpu_time)

Convenience function which creates a batch, adds a task to it, runs the batch, and then cleans up the batch. This can be used as a simple example of how to use the cluster interface.

cluster_submit_job

ctrl = cluster_submit_job(ctrl,job_tasks,job_cpu,job_mem)

Submits a job to the cluster. For the debug type cluster, this function also executes the tasks in the job.

cluster_update_batch

ctrl = cluster_update_batch(ctrl,force_success_check)

Updates the status of the batch

Returns the status of all tasks. First, the usual cluster specific status methods are used (e.g. qstat on torque). If force_success_check is set to true, then the success condition is also checked to see what the status of each job is. This is useful when:

  1. A job has been lost by the cluster.
  2. Another Matlab session is accessing the state of a batch and the cluster specific status methods may be incomplete because completed jobs that have left the cluster job cache will not be reported.

If a job is completed, the error flags for the tasks executed in that job will be updated in the ctrl structure depending on the success condition and the contents of the task's out file.


cluster_update_task

ctrl = cluster_update_task(ctrl,task_id)

Updates the status of the task

A task's "error_mask" uses a binary mask to separate different types of errors. The Matlab commands bitor, bitand, and dec2bin are useful for interpreting this mask.

Task error_mask
Error bit Error decimal Description
b0000 0000 0000 0001 1 Output file out_TASKID does not exist
b0000 0000 0000 0010 2 Output file out_TASKID does not load properly
b0000 0000 0000 0100 4 argsout variable does not exist in output file out_TASKID
b0000 0000 0000 1000 8 argsout variable has wrong length
b0000 0000 0001 0000 16 errorstruct variable does not exist in output file out_TASKID
b0000 0000 0010 0000 32 errorstruct variable contains error
b0000 0000 0100 0000 64 success criteria error
b0000 0000 1000 0000 128 cluster killed error (torque only)
b0000 0001 0000 0000 256 wall time exceeded (torque only). This error may be fixed by adjusting the scripts which estimate the cpu time usage, or by increasing ctrl.cluster.cpu_time_mult.
b0000 0010 0000 0000 512 Task success criteria failed to evaluate
b0000 0100 0000 0000 1024 Output files that are used to check task success do not exist.
b0000 1000 0000 0000 2048 Output files that are used to check task success are corrupt and cannot be read.
b0001 0000 0000 0000 4096 Maximum memory exceeded error (job used more memory than was requested). This error can be fixed by adjusting the scripts which estimate the memory usage, increasing ctrl.cluster.mem_mult, or by setting ctrl.cluster.max_mem_mode to 'auto'.
b0010 0000 0000 0000 8192 Insufficient space for the Matlab Compiler Runtime (MCR) cache, so the job could not be started. The primary and secondary caches are set by the environment variables MCR_CACHE_ROOT and MCR_CACHE_ROOT2.
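
As an illustration of reading the mask with bitand and dec2bin, here is a minimal sketch. The error_mask value is made up, and the short descriptions are abbreviations of the table rows above.

```matlab
% Hypothetical error_mask: 32 (errorstruct contains error) plus
% 4096 (maximum memory exceeded)
error_mask = 4128;

% Bit values and abbreviated descriptions from the table above
bit_values = 2.^(0:13);
bit_names = {'out file missing', 'out file does not load', ...
  'argsout missing', 'argsout wrong length', 'errorstruct missing', ...
  'errorstruct contains error', 'success criteria error', ...
  'cluster killed', 'wall time exceeded', ...
  'success criteria failed to evaluate', 'success files missing', ...
  'success files corrupt', 'max memory exceeded', 'MCR cache space'};

% bitand isolates each bit; a nonzero result means that error is flagged
is_set = bitand(error_mask, bit_values) ~= 0;
set_values = bit_values(is_set);
set_names = bit_names(is_set);

fprintf('error_mask %d = b%s\n', error_mask, dec2bin(error_mask, 16));
for idx = 1:numel(set_values)
  fprintf('  %5d: %s\n', set_values(idx), set_names{idx});
end

% bitor sets an additional flag, here wall time exceeded (256)
new_mask = bitor(error_mask, 256);
```

Testing a single condition only needs one call, for example bitand(error_mask,4096) ~= 0 for the memory error.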

Cluster Settings

User settings used by cluster programs:

Cluster Setting Default Description
cluster.cluster_job_fn fullfile(gRadar.path,'cluster','cluster_job.sh') Sets the location of the cluster_job.sh bash program.
cluster.cpu_time_mult 1 This is a multiplication factor that will be applied to the supplied cpu time required for a task.
cluster.data_location gRadar.cluster.data_location Sets the location of the batch directories.
cluster.dbstop_if_error true Turns on "dbstop if error" when running tasks and using cluster.type debug.
cluster.desired_time_per_job 0 Sets the desired time per job. This controls how tasks will be divided into jobs. Setting to zero means one task per job.
cluster.file_check_pause 4 Sets the number of seconds to wait for the output file when it is expected.
cluster.file_version '-v7' Sets the version of the static (static.mat) and dynamic (dynamic.mat) input files and of the output files (out_TASKID.mat) used by the cluster. NOTE: -v6 does not support unicode characters. -v7 supports unicode characters. -v7.3 is HDF, but is generally not as efficient for this task.
cluster.force_compile false If true, cluster_compile will always recompile even if no files have changed.
cluster.hidden_depend_funs {} Cell array of hidden dependency functions needed by cluster_compile.
cluster.job_complete_pause {} For Matlab compiled jobs, this is the pause in seconds after the job completes. Useful on large file systems/clusters that may have long delay times for output files to show up to all computers.
cluster.matlab_mcr_path matlabroot Sets the location of the Matlab Compile Runtime library installation. If you have Matlab installed, this is just your Matlab installation directory. Only used for torque and slurm.
cluster.max_jobs_active 1 Sets the maximum number of active (queued or running) jobs.
cluster.max_mem_mode 'debug' String that controls how cluster_update_task.m will behave when the max memory is exceeded. Options are 'debug' which drops the session into debug mode when memory is exceeded and 'auto' which doubles the memory request and resubmits the job.
cluster.max_retries 1 Sets the maximum number of retries per task before it gives up running that task.
cluster.max_time_per_job 86400 Sets the maximum time per job in seconds.
cluster.mcc 'system' If set to 'system' it runs mcc from the command line using the system() function. This is preferable because the Matlab Compiler license gets released when finished. However, mcc from the command line may not work. If so, set to 'eval' and mcc will be run from within Matlab. The downside is that the license will not be released until this Matlab session ends.
cluster.mcr_cache_root '/tmp' Sets the location of the Matlab Compile Runtime temporary folder. Only used for torque and slurm.
cluster.mem_mult 1 This is a multiplication factor that will be applied to the supplied memory required for a task.
cluster.mem_to_ppn [] If not empty, this causes memory requirements to be converted into processor requirements. This is useful when Torque ignores the memory requirement. For example with 46 processors per node and 120e9 bytes of memory per node available to the cluster, one would set cluster.mem_to_ppn = 120e9/46. The max_ppn parameter must be set as well.
cluster.max_ppn [] To be used when mem_to_ppn is set. This should usually be set equal to the number of cores on a processor. The valid range is 1 to the number of cores on a processor. Since there is no way with torque to ensure that a task requesting multiple nodes will have all the nodes on one machine, we are restricted to requesting one node. We cannot request more cores than this node has or the job will never run.
cluster.qsub_submit_arguments '-m n -l nodes=1:ppn=%p,pmem=%m,walltime=%t' Submit argument string for torque. Note memory and time requirements are inserted with regexprep inside cluster_submit_job.m.
cluster.ssh_hostname Optional cluster hostname. If empty or undefined, cluster commands are run on the local machine. If specified and not empty, cluster commands are submitted using 'ssh -p %d -o LogLevel=QUIET -t %s@%s "COMMAND"' where port=cluster.ssh_port, username=cluster.ssh_username, and hostname=cluster.ssh_hostname. This functionality requires that the user's login does not print any text to the screen, because extra text will confuse the cluster software's interpretation of the output (i.e. this remote server implementation is not robust). Commands that print to the screen should have " &>/dev/null" and/or " >/dev/null" added to the end of each of them. Occasionally, the "ssh" command hangs, which causes the Matlab process to hang. Run "ps -ef | grep USERNAME | grep ssh" several times over the course of one minute from the terminal; if the same command shows up in the list every time, it has probably hung and should be killed. To kill the process, run "kill PROCESS_ID". For example, "jpaden 9814 47277 0 01:00 pts/21 00:00:00 ssh -p 22 -o LogLevel=QUIET -t jpaden@karst.uits.iu.edu qstat -u jpaden </dev/null" is killed by "kill 9814".

Also, no password should be required, so an ssh key will usually need to be set up. Reminder steps:

  1. ssh-keygen -t rsa
  2. ssh username@hostname mkdir -p .ssh
  3. cat .ssh/id_rsa.pub | ssh username@hostname 'cat >> .ssh/authorized_keys'
cluster.ssh_port 22 If ssh_hostname is specified, this ssh_port will be used.
cluster.ssh_username whoami If ssh_hostname is specified, this ssh_username will be used. The default username is the response from the whoami command on the local machine.
cluster.slurm_submit_arguments '-N 1 -n 1 --mem=%m --time=%t' Submit argument string for slurm. Note memory and time requirements are inserted with regexprep inside cluster_submit_job.m. QOS and partition requests can be added here. For example "-N 1 -n 1 --mem=%m --time=%t -p fat --qos=short" would submit to the "fat" partition and set the quality of service to "short".
cluster.stat_pause 1 Sets the number of seconds to pause between each status check.
cluster.stop_on_error 1 If a Matlab error occurs in the execution of a task, then the submission script goes into debug mode in cluster_update_task.m.
cluster.submit_pause 0 Sets the number of seconds to pause between each submission.
cluster.type 'debug' Sets the cluster type to run jobs (torque, matlab, slurm, debug).
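
To make the interaction between mem_mult, cpu_time_mult, mem_to_ppn, max_ppn, and qsub_submit_arguments concrete, here is a minimal sketch. The rounding rule for converting memory to processors and the text formats substituted for %m and %t are assumptions chosen for illustration; the actual conversion is done inside cluster_submit_job.m and may differ.

```matlab
% Hypothetical cluster settings using field names from the table above
cluster = struct();
cluster.type = 'torque';
cluster.mem_mult = 1.5;
cluster.cpu_time_mult = 2;
cluster.mem_to_ppn = 120e9/46; % bytes of memory per processor
cluster.max_ppn = 46;
cluster.qsub_submit_arguments = '-m n -l nodes=1:ppn=%p,pmem=%m,walltime=%t';

% Hypothetical task estimates
task_mem = 8e9;      % bytes
task_cpu_time = 600; % seconds

% Apply the multiplication factors to the supplied estimates
job_mem = task_mem * cluster.mem_mult;
job_cpu_time = task_cpu_time * cluster.cpu_time_mult;

% ASSUMPTION: round the processor count up and cap it at max_ppn
ppn = min(cluster.max_ppn, ceil(job_mem / cluster.mem_to_ppn));

% ASSUMPTION: memory is written in megabytes and time as MM:SS
args = cluster.qsub_submit_arguments;
args = regexprep(args, '%p', sprintf('%d', ppn));
args = regexprep(args, '%m', sprintf('%dmb', ceil(job_mem/1e6)));
args = regexprep(args, '%t', sprintf('%d:00', ceil(job_cpu_time/60)));

fprintf('qsub %s\n', args);
```

Here the 8e9 byte estimate becomes a 12e9 byte request, which maps to 5 processors at 120e9/46 bytes per processor.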

User settings used by user programs:

Cluster Setting Default Description
cluster.rerun_only false This is a property used by the user. Typically it means a task will not be created if its output already exists.

Settings that should not be modified by the user. "BATCH" in the filenames below represents the directory name where the batch files are stored. BATCH is formatted like "batch_BB_tp20f818ed_5a79_4002_b41a_bf91ab030392", where "BB" is the batch number (a positive integer starting at 1) and "tp20f818ed_5a79_4002_b41a_bf91ab030392" is a random string generated by the Matlab tempname.m command.

Cluster Setting Default Description
batch_dir ctrl.cluster.data_location Directory where batch files are stored.
job_id_fn BATCH/job_id_file File with all the job IDs for each task. One line per task. 20 characters. -1 means the task is waiting to be submitted.
batch_id Lowest positive ID that is not used An integer containing the batch ID for this batch.
in_fn_dir BATCH/in Inputs for each task: static.mat contains inputs that do not change, dynamic.mat contains inputs that do change in variables called dparam_TASKID.
out_fn_dir BATCH/out Outputs and errors for each task: out_TASKID.mat file
stdout_fn_dir BATCH/stdout Standard output for each task (slurm and torque only). Stored in stdout_TASKID.txt files and only one task (the one with the maximum task ID) per job will have the file.
error_fn_dir BATCH/error Standard error for each task (slurm and torque only). Stored in error_TASKID.txt files and only one task (the one with the maximum task ID) per job will have the file.
hold_fn BATCH/hold_fn Empty file that if present causes cluster_run to enter debug mode. Placed by cluster_hold.m
job_id_list NA Cached contents of job_id_fn
task_id NA Last task ID used.
submission_queue NA Vector representing queue of task IDs waiting to be submitted. Index 1 is front of queue.
job_status NA String containing the job status for each task. 'T': waiting to be submitted, 'Q/R': active, 'C': complete.
error_mask NA Vector containing the error status for each task. 0 is no error.
retries NA Vector containing the number of retries used for each task
active_jobs NA Number of active jobs.
notes NA Cache of the same field stored in sparam/dparam. This field is not always available. The values are stored in a cell array. Each cell contains the value of this field for the corresponding task (e.g. the first cell contains the "notes" for task_id 1).
cpu_time NA Cache of the same field stored in sparam/dparam. This field is not always available. The values are stored in a vector. Each element contains the value of this field for the corresponding task (e.g. the first element contains the "cpu_time" for task_id 1).
mem NA Cache of the same field stored in sparam/dparam. This field is not always available. The values are stored in a vector. Each element contains the value of this field for the corresponding task (e.g. the first element contains the "mem" for task_id 1).
success NA Cache of the same field stored in sparam/dparam. This field is not always available. The values are stored in a cell array. Each cell contains the value of this field for the corresponding task (e.g. the first cell contains the "success" for task_id 1).
file_success NA Cache of the same field stored in sparam/dparam. This field is not always available. The values are stored in a cell array. Each cell contains the value of this field for the corresponding task (e.g. the first cell contains the "file_success" for task_id 1).
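
The cached job_status, error_mask, and retries fields can be combined to summarize a batch. The sketch below uses a made-up six-task ctrl structure. The rule that a failed task is retried while its retry count is below cluster.max_retries is an assumption about how these fields relate, not a description of cluster_update_task.m.

```matlab
% Hypothetical cached state for a batch with six tasks
ctrl = struct();
ctrl.job_status = 'TQRCCC';           % one character per task
ctrl.error_mask = [0 0 0 0 32 4096];  % 0 is no error
ctrl.retries    = [0 0 0 0 1 0];
ctrl.cluster.max_retries = 1;

% Count tasks in each state ('Q' and 'R' are both active)
num_waiting  = sum(ctrl.job_status == 'T');
num_active   = sum(ctrl.job_status == 'Q' | ctrl.job_status == 'R');
num_complete = sum(ctrl.job_status == 'C');

% Completed tasks with a nonzero error_mask
failed_task_ids = find(ctrl.job_status == 'C' & ctrl.error_mask ~= 0);

% ASSUMPTION: a failed task may be retried while retries < max_retries
can_retry = ctrl.retries(failed_task_ids) < ctrl.cluster.max_retries;
retry_task_ids = failed_task_ids(can_retry);

fprintf('%d waiting, %d active, %d complete\n', ...
  num_waiting, num_active, num_complete);
fprintf('Failed tasks: %s\n', mat2str(failed_task_ids));
fprintf('Eligible for retry: %s\n', mat2str(retry_task_ids));
```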

The settings for each task are stored in the "in" directory within each batch directory. These are stored in the static.mat and dynamic.mat files in the "in" directory. The parameter structure for a given task is the result of calling merge_structs.m on static_param (stored in static.mat) and dparam{task_id} (stored in dynamic.mat). The combined result should contain all the input parameters for the task.

NOTE: static fields that do not change for each task should be stored in "static_param" to save space on disk. It is the job of the script that is creating the tasks to determine where to store each parameter (in sparam or in dparam).
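
The split between static and dynamic parameters can be sketched as follows. The loop is a simplified, non-recursive stand-in for merge_structs.m written only for illustration (the real function may behave differently for nested structures), and the task function name and parameter values are hypothetical.

```matlab
% Static parameters shared by every task in the batch
static_param = struct();
static_param.taskfunction = 'hypothetical_task'; % made-up function name
static_param.num_args_out = 1;
static_param.cpu_time = 60; % seconds
static_param.mem = 1e9;     % bytes
static_param.notes = '';

% Dynamic parameters: only the fields that differ per task are stored
dparam = cell(1,2);
dparam{1} = struct('notes', 'block 1 of 2');
dparam{2} = struct('notes', 'block 2 of 2', 'mem', 4e9);

% Simplified merge for one task: dynamic fields override static fields
task_id = 2;
param = static_param;
fields = fieldnames(dparam{task_id});
for idx = 1:numel(fields)
  param.(fields{idx}) = dparam{task_id}.(fields{idx});
end

disp(param);
```

Task 2 keeps the shared taskfunction, num_args_out, and cpu_time values and takes its own notes and mem values from dparam{2}.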

Input Files static.mat and dynamic.mat

The input files (in/static.mat and in/dynamic.mat) are created by cluster_new_task. Their contents are set by the sparam and dparam input arguments to cluster_new_task. Usually, the function call looks like this:

ctrl = cluster_new_task(ctrl,sparam,dparam,'dparam_save',0); % Creating multiple tasks in a batch
ctrl = cluster_new_task(ctrl,sparam,[]); % Creating a single task in a batch

The sparam input argument maps to the in/static.mat "static_param" variable.

The dparam input argument for task TASK_ID maps to the in/dynamic.mat "dparam{TASK_ID}" variable.

The contents of the in/static.mat and in/dynamic.mat files are very similar and will be merged when each task executes. The fields in the in/static.mat input file are:

Cluster Setting Default Description
static_param NA Structure with static parameters. These are parameters that do not change from task to task within one batch.
static_param.cpu_time 0 The CPU time in seconds to be requested for this task.
static_param.file_success {} Used by cluster_update_task.m to determine if a task has completed successfully. Each task's file_success field is a cell array of files. A file is considered successfully created if it contains a file_version field without a 'D' in it. If all of the files in the cell array exist and meet this condition, then the task is considered to have run successfully.
static_param.file_version ctrl.cluster.file_version The version argument passed to the save function in Matlab (e.g. '-v7' or '-v7.3'). This field is set to the value stored in ctrl.cluster.file_version when cluster_new_task is called.
static_param.mem 0 The memory in bytes to be requested for this task.
static_param.notes '' The notes field is used for debugging and is printed out by cluster_print and other cluster functions to provide information to the operator about the tasks.
static_param.num_args_out 0 The number of output arguments to expect for this task.
static_param.success '' Success criteria evaluation string. Used by cluster_update_task.m to determine if task was successful. If the command in the string evaluates with a logical true, then the task is successful.
static_param.taskfunction NA String containing the function to be run
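
The file_success condition can be sketched with temporary files. This is a minimal illustration that assumes file_version is a character array saved as a variable in each .mat file. The file names and version strings are made up, and the real check lives in cluster_update_task.m.

```matlab
% Create two hypothetical output files and name a third that is never made
good_fn = [tempname '.mat'];
bad_fn = [tempname '.mat'];
missing_fn = [tempname '.mat'];
file_version = '1';
save(good_fn, 'file_version');
file_version = '1D';
save(bad_fn, 'file_version');

file_success = {good_fn, bad_fn, missing_fn};
file_ok = false(size(file_success));
for idx = 1:numel(file_success)
  fn = file_success{idx};
  if exist(fn, 'file') ~= 2
    continue; % file does not exist (error bit 1024 in the error_mask table)
  end
  try
    tmp = load(fn, 'file_version');
  catch
    continue; % file cannot be read (error bit 2048 in the error_mask table)
  end
  % The file passes only if file_version exists and contains no 'D'
  file_ok(idx) = isfield(tmp, 'file_version') && ~any(tmp.file_version == 'D');
end

% The task counts as successful only if every listed file passes
task_success = all(file_ok);

% Remove the temporary files
delete(good_fn);
delete(bad_fn);
```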

The fields in the in/dynamic.mat are:

Cluster Setting Description
dparam Cell array. Each cell contains a structure with the dynamic parameters for the corresponding task. The fields inside each cell array are the same as for the in/static.mat file's static_param structure.
dparam{TASKID} Structure with the same fields as the in/static.mat file's static_param structure. A field does not need to be defined in both in/dynamic.mat and in/static.mat, but it must be defined in at least one of them. If a field is defined in both files, the dparam{TASKID} value overrides the static parameter.
dparam{TASKID}.file_success See in/static.mat description.
dparam{TASKID}.file_version See in/static.mat description.
dparam{TASKID}.mem See in/static.mat description.
dparam{TASKID}.notes See in/static.mat description.
dparam{TASKID}.num_args_out See in/static.mat description.
dparam{TASKID}.success See in/static.mat description.
dparam{TASKID}.taskfunction See in/static.mat description.

Output Files out_TASKID.mat

The output files (out/out_TASKID.mat) are created by cluster_job.m in regular operation and by cluster_exec_task.m in debug run_mode 1. The fields in the out/out_TASKID.mat output files are:

Cluster Setting Description
argsout Cell array containing all of the output arguments of the task function. The number of output arguments must be specified in the input files' num_args_out field.
cpu_time_actual Double scalar containing the number of seconds it took to execute the task.
errorstruct If no error occurs, this field will be empty. If an error is thrown, the Matlab exception will be contained in this field.
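
The first six error_mask bits relate directly to the contents of the output file. The sketch below writes a made-up output file to a temporary location and derives those bits from it. It illustrates the relationship only and is not the code in cluster_update_task.m. The temporary file name and the num_args_out value are hypothetical.

```matlab
% Write a hypothetical output file with one output argument and no error
out_fn = [tempname '.mat'];
argsout = {42};
cpu_time_actual = 1.5;
errorstruct = [];
save(out_fn, 'argsout', 'cpu_time_actual', 'errorstruct');

num_args_out = 2; % hypothetical: the task was expected to return two outputs

error_mask = 0;
if exist(out_fn, 'file') ~= 2
  error_mask = bitor(error_mask, 1);      % output file does not exist
else
  loaded = true;
  try
    out = load(out_fn);
  catch
    loaded = false;
    error_mask = bitor(error_mask, 2);    % output file does not load
  end
  if loaded
    if ~isfield(out, 'argsout')
      error_mask = bitor(error_mask, 4);  % argsout missing
    elseif numel(out.argsout) ~= num_args_out
      error_mask = bitor(error_mask, 8);  % argsout has wrong length
    end
    if ~isfield(out, 'errorstruct')
      error_mask = bitor(error_mask, 16); % errorstruct missing
    elseif ~isempty(out.errorstruct)
      error_mask = bitor(error_mask, 32); % errorstruct contains an error
    end
  end
end

delete(out_fn); % remove the temporary file
fprintf('error_mask = %d\n', error_mask);
```

In this made-up case only the argsout length check fails, so the mask has the value 8.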

Torque/PBS

Common Commands

Always available

qstat
List all jobs
qstat -q -u USERNAME and qstat -r -u USERNAME
List all queued or running jobs owned by USERNAME
qdel -all
Deletes all of your jobs
pbsnodes -l
Prints out node reservations

Only available on PBS/torque

showstart JOBID
Shows scheduled time for job.
checkjob JOBID
Prints out complete information about the job including scheduled time

Other Commands

qalter
Change the hold state of a job
qdel JOBID
Deletes a specific job
qsub
Submit a job

Interactive

Interactive mode lets you log in to the cluster node with the environment set up in the same way as when the job actually runs. This is helpful for debugging problems related to running your code on the cluster nodes themselves.

To run interactively, set cluster.interactive to true and then run your batch like you normally would. For each job, the torque_submit_job function will print out the steps to run the job interactively. It is not critical, but to keep the torque submission routines in sync you need to assign the new_job_id as one of these steps. Look for the output from qsub like this:

qsub: waiting for job 2466505.m2 to start
  ... INTERACTIVE SESSION ...
qsub: job 2466505.m2 completed

In this case, run "new_job_id = 2466505" in matlab. m2 is the queue and is not part of the job id.
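
If the qsub output is captured as text, the job ID can be pulled out with a regular expression rather than typed by hand. A minimal sketch, using the example line above as a hard-coded string:

```matlab
% Example qsub output line (copied from the text above)
qsub_line = 'qsub: waiting for job 2466505.m2 to start';

% Capture the digits between "job " and the first period
tokens = regexp(qsub_line, 'job (\d+)\.', 'tokens', 'once');
new_job_id = str2double(tokens{1});

fprintf('new_job_id = %d\n', new_job_id);
```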

When debugging at IU, it is a good idea to add '-q debug' (short jobs) or '-q interactive' (long jobs) to your cluster.qsub_submit_arguments because your job will be processed much faster than in the default queue.

Slurm

Common Commands

scontrol show job JOBID
List complete information for JOBID
sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j JOBID --allsteps
List statistics about JOBID
squeue --job JOBID
Show status information for JOBID

Other Commands

sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed
List information about all jobs

sbatch
Submit a job to the cluster to be run in the background

scancel JOBID
Cancel or delete JOBID

scontrol show jobid -dd <jobid>
List status info for a currently running job

squeue -u <username>
List all jobs

srun
Run a job on the cluster in interactive mode. For example: srun -N 1 -n 1 --mem=20000 --time=480 --pty bash

sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <jobid> --allsteps
List detailed information about a job

On ollie at AWI some sacct commands do not run, but these commands can be used:

sudo get_my_jobs.sh
to get information about today's jobs
sudo get_my_jobs.sh -t
to get information about running jobs

Matlab Compiler

The Matlab compiler is only used with the Torque and Slurm cluster interfaces, and recompiling is only required when the code changes. The compiled code is cross platform (i.e. it is possible to compile on a Windows computer and then run the compiled code on a Linux computer). Once cluster_job.m is compiled, it can be used without requiring any Matlab license. For the compiled code to run, the Matlab Compiler Runtime (MCR) libraries (freely distributable) must be installed. If Matlab is installed on a machine, a copy of the MCR files for that version of Matlab is already available, and the Matlab installation directory can be used for the MCR path as long as the same version of Matlab was used to compile the files. Inside cluster_new_batch, the matlabroot command is used to set ctrl.cluster.matlab_mcr_path.

If Matlab is not installed on the cluster or the compiled version is different than the Matlab version installed, then the gRadar.cluster.matlab_mcr_path should be set to the correct MCR path. Instructions to install the MCR libraries are available on Matlab's website (Matlab Runtime).

Currently, Matlab is required to submit and track jobs using Slurm and Torque even if the compiler is not needed. A feature needs to be added to support secure shell commands that would allow submission on cluster interfaces (i.e. head nodes) without a Matlab license. The task generation commands should also be compiled so that the dependent files (frames, records, gps, raw data, layer data, etc.) only need to be accessible from the cluster. A useful setup to do this would be to write a generic single job submission function that can run arbitrary functions through ssh; it would 1. compile the function, 2. copy the compiled function, inputs and outputs via scp, and 3. call the compiled function via ssh. This would be useful if a user has a personal Matlab license, but the cluster interface does not have a license and the personal computer does not have direct access to the cluster's file system. The Matlab Runtime/MCR libraries would still need to be installed on the cluster to run the compiled code.
