| perf-stat(1) |
| ============ |
| |
| NAME |
| ---- |
| perf-stat - Run a command and gather performance counter statistics |
| |
| SYNOPSIS |
| -------- |
| [verse] |
| 'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command> |
| 'perf stat' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>] |
| |
| DESCRIPTION |
| ----------- |
| This command runs the given command and gathers performance counter |
| statistics from it. |
| |
| |
| OPTIONS |
| ------- |
| <command>...:: |
| Any command you can specify in a shell. |
| |
| |
| -e:: |
| --event=:: |
| Select the PMU event. Selection can be: |
| |
| - a symbolic event name (use 'perf list' to list all events) |
| |
| - a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a |
| hexadecimal event descriptor. |
| |
| - a symbolically formed event like 'pmu/param1=0x3,param2/' where |
| param1 and param2 are defined as formats for the PMU in |
| /sys/bus/event_source/devices/<pmu>/format/* |
| |
| - a symbolically formed event like 'pmu/config=M,config1=N,config2=K/' |
| where M, N, K are numbers (in decimal, hexadecimal or octal format). |
| Acceptable values for each of 'config', 'config1' and 'config2' |
| parameters are defined by corresponding entries in |
| /sys/bus/event_source/devices/<pmu>/format/* |
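| |
| As an illustration of the raw form above, the following sketch builds an rNNN string from an event select and a unit mask. It assumes the x86 layout (event select in bits 0-7, unit mask in bits 8-15); other architectures encode raw events differently. The helper name raw_event is hypothetical, and the values 0x2e/0x41 are the codes commonly documented for the Intel architectural last-level cache miss event, used here only as an example. |

```shell
# Hypothetical helper: format an x86-style raw event descriptor.
# $1 = event select (bits 0-7), $2 = unit mask (bits 8-15).
raw_event() {
  # The unit mask is the high byte, so it is printed first.
  printf 'r%02x%02x\n' "$2" "$1"
}

raw_event 0x2e 0x41   # prints r412e
```

| The resulting string can then be passed to -e, e.g. 'perf stat -e r412e -- sleep 1'. |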
| |
| -i:: |
| --no-inherit:: |
| child tasks do not inherit counters |
| |
| -p:: |
| --pid=<pid>:: |
| stat events on existing process id (comma separated list) |
| |
| -t:: |
| --tid=<tid>:: |
| stat events on existing thread id (comma separated list) |
| |
| |
| -a:: |
| --all-cpus:: |
| system-wide collection from all CPUs |
| |
| -c:: |
| --scale:: |
| scale/normalize counter values |
| |
| -d:: |
| --detailed:: |
| print more detailed statistics, can be specified up to 3 times |
| |
| -d: detailed events, L1 and LLC data cache |
| -d -d: more detailed events, dTLB and iTLB events |
| -d -d -d: very detailed events, adding prefetch events |
| |
| -r:: |
| --repeat=<n>:: |
| repeat command and print average + stddev (max: 100). 0 means forever. |
| |
| -B:: |
| --big-num:: |
| print large numbers with thousands' separators according to locale |
| |
| -C:: |
| --cpu=:: |
| Count only on the list of CPUs provided. Multiple CPUs can be provided as a |
| comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2. |
| In per-thread mode, this option is ignored. The -a option is still necessary |
| to activate system-wide monitoring. Default is to count on all CPUs. |
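| |
| The list syntax accepted by -C can be illustrated with a small shell function that expands a list into individual CPU numbers. This is a sketch of the syntax only, not perf's own parser, and expand_cpu_list is a hypothetical helper name. |

```shell
# Hypothetical helper: expand a -C style CPU list such as "0,1" or "0-2"
# into a space-separated list of CPU numbers.
expand_cpu_list() {
  list=$1
  out=
  while [ -n "$list" ]; do
    # Take the first comma-separated item, then drop it from the list.
    item=${list%%,*}
    case $list in *,*) list=${list#*,} ;; *) list= ;; esac
    # An item is either a single CPU or a first-last range.
    case $item in
      *-*) first=${item%-*}; last=${item#*-} ;;
      *)   first=$item; last=$item ;;
    esac
    cpu=$first
    while [ "$cpu" -le "$last" ]; do
      out="$out${out:+ }$cpu"
      cpu=$((cpu + 1))
    done
  done
  printf '%s\n' "$out"
}

expand_cpu_list 0,1   # prints 0 1
expand_cpu_list 0-2   # prints 0 1 2
```

| A matching invocation would be, e.g., 'perf stat -a -C 0-2 -e cycles -- sleep 1'. |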
| |
| -A:: |
| --no-aggr:: |
| Do not aggregate counts across all monitored CPUs in system-wide mode (-a). |
| This option is only valid in system-wide mode. |
| |
| -n:: |
| --null:: |
| null run - don't start any counters |
| |
| -v:: |
| --verbose:: |
| be more verbose (show counter open errors, etc) |
| |
| -x SEP:: |
| --field-separator SEP:: |
| print counts using a CSV-style output to make it easy to import directly into |
| spreadsheets. Columns are separated by the string specified in SEP. |
| |
| -G name:: |
| --cgroup name:: |
| monitor only in the container (cgroup) called "name". This option is available only |
| in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to |
| container "name" are monitored when they run on the monitored CPUs. Multiple cgroups |
| can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup |
| to first event, second cgroup to second event and so on. It is possible to provide |
| an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have |
| corresponding events, i.e., they always refer to events defined earlier on the command |
| line. |
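| |
| The positional pairing of events and cgroups described above can be sketched as follows. The function only prints which cgroup would apply to which event; it does not invoke perf. The name pair_events_cgroups and the cgroup names foo and bar are hypothetical. |

```shell
# Hypothetical helper: show how a -G list lines up with a -e list.
# $1 = comma-separated events, $2 = comma-separated cgroups.
pair_events_cgroups() {
  events=$1
  cgroups=$2
  while [ -n "$events" ]; do
    ev=${events%%,*}
    cg=${cgroups%%,*}
    # An empty cgroup entry means the event is not restricted to a cgroup.
    printf '%s -> %s\n' "$ev" "${cg:-<no cgroup filter>}"
    case $events in *,*) events=${events#*,} ;; *) events= ;; esac
    case $cgroups in *,*) cgroups=${cgroups#*,} ;; *) cgroups= ;; esac
  done
}

pair_events_cgroups cycles,instructions,cache-misses foo,,bar
# prints:
#   cycles -> foo
#   instructions -> <no cgroup filter>
#   cache-misses -> bar
```

| The corresponding command line would be, e.g., 'perf stat -a -e cycles,instructions,cache-misses -G foo,,bar -- sleep 1'. |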
| |
| -o file:: |
| --output file:: |
| Print the output into the designated file. |
| |
| --append:: |
| Append to the output file designated with the -o option. Ignored if -o is not specified. |
| |
| --log-fd:: |
| |
| Log output to fd, instead of stderr. Complementary to --output, and mutually exclusive |
| with it. --append may be used here. Examples: |
| 3>results perf stat --log-fd 3 -- $cmd |
| 3>>results perf stat --log-fd 3 --append -- $cmd |
| |
| --pre:: |
| --post:: |
| Pre and post measurement hooks, e.g.: |
| |
| perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- make -s -j64 O=defconfig-build/ bzImage |
| |
| -I msecs:: |
| --interval-print msecs:: |
| Print count deltas every N milliseconds (minimum: 10ms). |
| The overhead percentage can be high in some cases, for instance with small, sub-100ms intervals. Use with caution. |
| Example: 'perf stat -I 1000 -e cycles -a sleep 5' |
| |
| --per-socket:: |
| Aggregate counts per processor socket for system-wide mode measurements. This |
| is a useful mode to detect imbalance between sockets. To enable this mode, |
| use --per-socket in addition to -a (system-wide). The output includes the |
| socket number and the number of online processors on that socket. This is |
| useful to gauge the amount of aggregation. |
| |
| --per-core:: |
| Aggregate counts per physical processor for system-wide mode measurements. This |
| is a useful mode to detect imbalance between physical cores. To enable this mode, |
| use --per-core in addition to -a (system-wide). The output includes the |
| core number and the number of online logical processors on that physical processor. |
| |
| --per-thread:: |
| Aggregate counts per monitored thread, when monitoring threads (-t option) |
| or processes (-p option). |
| |
| -D msecs:: |
| --delay msecs:: |
| After starting the program, wait msecs before measuring. This is useful to |
| filter out the startup phase of the program, which is often very different. |
| |
| -T:: |
| --transaction:: |
| |
| Print statistics of transactional execution if supported. |
| |
| EXAMPLES |
| -------- |
| |
| $ perf stat -- make -j |
| |
| Performance counter stats for 'make -j': |
| |
| 8117.370256 task clock ticks # 11.281 CPU utilization factor |
| 678 context switches # 0.000 M/sec |
| 133 CPU migrations # 0.000 M/sec |
| 235724 pagefaults # 0.029 M/sec |
| 24821162526 CPU cycles # 3057.784 M/sec |
| 18687303457 instructions # 2302.138 M/sec |
| 172158895 cache references # 21.209 M/sec |
| 27075259 cache misses # 3.335 M/sec |
| |
| Wall-clock time elapsed: 719.554352 msecs |
| |
| SEE ALSO |
| -------- |
| linkperf:perf-top[1], linkperf:perf-list[1] |