stat-cl man page
stat-cl - invoke the Stack Trace Analysis Tool.
SYNOPSIS
stat-cl [ OPTIONS ] PID
stat-cl [ OPTIONS ] -C COMMAND…
where
[ OPTIONS ] represents zero or more stat-cl options.
PID is the process ID of the parallel job launcher of the target application to attach to.
COMMAND… is the command to launch the application.
DESCRIPTION
STAT (the Stack Trace Analysis Tool) is a highly scalable, lightweight tool that gathers and merges stack traces from all of the processes of a parallel application. After running the stat-cl command, STAT will create a stat_results directory in your current working directory. This directory will contain a subdirectory, based on your parallel application’s executable name, with the merged stack traces in DOT format.
OPTIONS
-a, --autotopo
Let STAT automatically create the topology.
-f, --fanout width
Sets the maximum tree topology fanout to width. Use --nodes to specify the nodes on which to launch communication processes.
-d, --depth depth
Sets the tree topology depth to depth. Use --nodes to specify the nodes on which to launch communication processes.
-z, --daemonspernode num
Sets the number of daemons per node to num.
-u, --usertopology topology
Specify the number of communication nodes per layer in the tree topology with topology, a dash-separated list. Use --nodes to specify the nodes on which to launch communication processes. Example topologies: 4, 4-16, 5-20-75.
-n, --nodes nodelist
Use the nodes specified in nodelist. To be used with --fanout, --depth, or --usertopology. Example node lists: host1; host1,host2; host[1,5-7,9].
-N, --nodesfile filename
Use the file filename, which should contain the list of nodes for communication processes.
-A, --appnodes
Allow tool communication processes to be co-located on nodes running application processes.
-x, --exclusive
Do not use the front-end or back-end nodes for communication processes.
-p, --procs processes
Sets the maximum number of communication processes to be spawned per node to processes. This should typically be set to a number less than or equal to the number of CPU cores per node.
-j, --jobid id
Append id to the output directory and file prefixes. This is useful for associating STAT results with a batch job.
-r, --retries count
Attempt count retries per sample to try to get a complete stack trace.
-R, --retryfreq frequency
Wait frequency microseconds between sample retries. To be used with the --retries option.
-P, --withpc
Sample program counter values in addition to function names.
-m, --withmoduleoffset
Sample module offset only.
-i, --withline
Sample source line number in addition to function names.
-o, --withopenmp
Translate OpenMP stacks to the logical application view.
-c, --comprehensive
Gather 5 traces: function only; module offset; function + PC; function + line; and 3D function only.
-U, --countrep
Only gather edge labels with the task count and a single representative. This will improve performance at extreme (i.e., over 1 million tasks) scales.
-w, --withthreads
Sample stack traces from helper threads in addition to the main thread.
-H, --maxdaemonthreads count
Allow sampling of up to count threads per daemon.
-y, --withpython
Where applicable, gather Python script level stack traces, rather than show the Python interpreter stack traces. This requires the Python interpreter being debugged to be built with -g and preferably -O0.
-t, --traces count
Gather count traces per process.
-T, --tracefreq frequency
Wait frequency milliseconds between samples. To be used with the --traces option.
-S, --sampleindividual
Save all individual samples in addition to the 3D trace when using the --traces option.
-C, --create arg_list
Launch the application under STAT’s control. All arguments after -C are used to launch the app. Namely, arg_list is the command that you would normally use to launch your application.
-I, --serial arg_list
Attach to a list of serial processes. All arguments after -I are interpreted as processes. Namely, arg_list is a white-space-separated list of processes to attach to, where each process is of the form [exe@][hostname:]PID.
-D, --daemon path
Specify the full path path to the STAT daemon executable. Use this only if you wish to override the default.
-F, --filter path
Specify the full path path to the STAT filter shared object. Use this only if you wish to override the default.
-s, --sleep time
Sleep for time seconds before attaching and gathering traces. This gives the application time to get to a hung state.
-l, --log
Enable debug logging for the named component: FE (front end), BE (back end), CP (communication process), SW (Stackwalker), or SWERR (Stackwalker on error). Multiple log options may be specified (e.g., -l FE -l BE).
-L, --logdir log_directory
Dump logging output into log_directory. To be used with the --log option.
-M, --mrnetprintf
Use MRNet’s printf for STAT debug logging.
-X, --dysectapi session
Run the specified DySectAPI session.
-b, --dysectapi_batch secs
Run the specified DySectAPI session in batch mode. The session stops after secs seconds or on a detach action.
-G, --gdb
Use (cuda-)gdb to drive the daemons. If you are using cuda-gdb and want stack traces from CUDA threads, you must also explicitly specify -w.
-Q, --cudaquick
When using cuda-gdb as the BE, gather less comprehensive, but faster CUDA traces. CUDA frames will only show the top of the stack, not the full call path. This mode also displays the file name and line number by default and does not resolve the function name.
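Note that the sampling options above use different units: --tracefreq is given in milliseconds, while --retryfreq is given in microseconds. The following is a minimal sketch of how the two combine into a rough upper bound on waiting time. The helper name estimate_wait_ms is hypothetical and not part of STAT, and it assumes that the waits simply add up and that every retry is used; the time spent sampling itself is ignored.

```shell
# Hypothetical helper, not part of STAT: estimate an upper bound on the time
# spent waiting between samples and retries, in milliseconds.
# Assumption: one --tracefreq wait per trace and all retries used per trace.
#   $1 = --traces count       $2 = --tracefreq value (milliseconds)
#   $3 = --retries count      $4 = --retryfreq value (microseconds)
estimate_wait_ms() {
    traces=$1 tracefreq_ms=$2 retries=$3 retryfreq_us=$4
    sample_wait_ms=$(( traces * tracefreq_ms ))
    # Convert the retry waits from microseconds to milliseconds.
    retry_wait_ms=$(( traces * retries * retryfreq_us / 1000 ))
    echo $(( sample_wait_ms + retry_wait_ms ))
}

# 10 traces 100 ms apart, with up to 5 retries 1000 us apart:
estimate_wait_ms 10 100 5 1000   # prints 1050
```

These values would correspond to an invocation with --traces 10 --tracefreq 100 --retries 5 --retryfreq 1000.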
EXAMPLE
The most typical usage is to invoke STAT on the job launcher’s PID:
  % srun mpi_application arg1 arg2 &
  [1] 16842
  
  % ps
    PID TTY          TIME CMD
  16755 pts/0    00:00:00 bash
  16842 pts/0    00:00:00 srun
  16871 pts/0    00:00:00 ps
  
  % stat-cl 16842
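The PID lookup above can also be scripted. The following is a minimal sketch that assumes the default four-column ps layout shown above (PID TTY TIME CMD) and that the launcher is named srun; the helper name launcher_pid is hypothetical and not part of STAT.

```shell
# Hypothetical helper, not part of STAT: read `ps` output on standard input
# and print the PID of the first process whose CMD column equals "$1".
# Assumes the default layout: PID TTY TIME CMD, with one header line.
launcher_pid() {
    awk -v name="$1" 'NR > 1 && $4 == name { print $1; exit }'
}

# Usage sketch (attaches only if a launcher was found):
#   pid=$(ps | launcher_pid srun)
#   [ -n "$pid" ] && stat-cl "$pid"
```

If several launchers run in the same session, only the first match is printed, so check the PID before attaching.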
You can also launch your application under STAT’s control with the -C option. All arguments after -C are used for job launch:
  % stat-cl -C srun mpi_application arg1 arg2
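Serial processes can be attached with the -I option instead, where each process is given in the form [exe@][hostname:]PID. The following is a minimal sketch of composing such specifications; the helper name proc_spec is hypothetical and not part of STAT, and the host names, executable name, and PIDs in the usage comment are made up.

```shell
# Hypothetical helper, not part of STAT: compose one process specification
# of the form [exe@][hostname:]PID, as described for the -I option.
#   $1 = PID (required)   $2 = hostname (optional)   $3 = executable (optional)
proc_spec() {
    spec=$1
    if [ -n "${2:-}" ]; then spec="$2:$spec"; fi
    if [ -n "${3:-}" ]; then spec="$3@$spec"; fi
    echo "$spec"
}

# Usage sketch with made-up values:
#   stat-cl -I "$(proc_spec 1234 node1 a.out)" "$(proc_spec 5678 node2)"
# which expands to:
#   stat-cl -I a.out@node1:1234 node2:5678
```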
With the -a option (or when automatic topology is set as default), STAT will try to automatically create a scalable topology for large scale jobs. You may instead specify a topology manually at larger scales. For example, if you’re running on 1024 nodes, you may want to try a fanout of sqrt(1024) = 32. With the --nodes option, you will need to specify a list of nodes that contains enough processors to accommodate the ceil(1024/32) = 32 communication processes being launched. Be sure that you have login permissions to the specified nodes and that they contain the mrnet_commnode executable and the STAT_FilterDefinitions.so library.
  % stat-cl --fanout 32 --nodes atlas[1-4] --procs 8 16842
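The sizing arithmetic for this example can be reproduced for other job sizes. The following is a minimal sketch; the helper name topology_plan is hypothetical and not part of STAT, and the square-root fanout is only the rule of thumb suggested above, not a requirement.

```shell
# Hypothetical helper, not part of STAT: reproduce the sizing arithmetic above.
#   $1 = number of application nodes
#   $2 = communication processes per node (the value passed to --procs)
# Prints three numbers: fanout, communication processes, nodes for --nodes.
topology_plan() {
    app_nodes=$1 procs_per_node=$2
    # Smallest integer fanout with fanout^2 >= app_nodes, i.e. ceil(sqrt(n)).
    fanout=1
    while [ $(( fanout * fanout )) -lt "$app_nodes" ]; do
        fanout=$(( fanout + 1 ))
    done
    # ceil(a / b) in integer arithmetic is (a + b - 1) / b.
    comm_procs=$(( (app_nodes + fanout - 1) / fanout ))
    comm_nodes=$(( (comm_procs + procs_per_node - 1) / procs_per_node ))
    echo "$fanout $comm_procs $comm_nodes"
}

topology_plan 1024 8   # prints "32 32 4"
```

For 1024 nodes and 8 communication processes per node this gives a fanout of 32, 32 communication processes, and 4 nodes, which matches the atlas[1-4] node list in the example.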
Upon successful completion, STAT will write its output to a stat_results directory within the current working directory. Each run creates a subdirectory named after the application with a unique integer ID. STAT’s output indicates the directory created with a message such as:
  Results written to /home/user/bin/stat_results/mpi_application.6
Within that directory will be one or more files with a .dot extension. These .dot files can be viewed with stat-view.
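To locate the output files from a script, the directory can be taken from the message above. The following is a minimal sketch; the helper names results_dir and list_dot_files are hypothetical and not part of STAT, and the exact wording of the message is assumed to match the example shown.

```shell
# Hypothetical helpers, not part of STAT.

# Read STAT's output on standard input and print the directory named in the
# last "Results written to ..." line (message wording assumed from above).
results_dir() {
    sed -n 's/^[[:space:]]*Results written to //p' | tail -n 1
}

# Print the .dot files in directory "$1", one per line.
list_dot_files() {
    for f in "$1"/*.dot; do
        # Skip the unexpanded pattern when the directory has no .dot file.
        if [ -e "$f" ]; then echo "$f"; fi
    done
}

# Usage sketch, assuming the message is written to standard output:
#   dir=$(stat-cl 16842 | results_dir)
#   list_dot_files "$dir"
```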
COPYRIGHT
Copyright 2007-2018 Lawrence Livermore National Laboratory
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
SEE ALSO
stat-gui(1), stat-view(1)