Man page - pmie(1)
PMIE
NAME
SYNOPSIS
DESCRIPTION
OPTIONS
EXAMPLES
QUICK START
EXPRESSION SYNTAX
BOOLEAN EXPRESSIONS
RULESETS
SCALE FACTORS
MACROS
AUTOMATIC RESTART
EVENT MONITORING
DIFFERENCES IN HOST AND ARCHIVE MODES
SIGNALS
HOSTNAME CHANGES
BUGS
FILES
PCP ENVIRONMENT
UNIX SEE ALSO
WINDOWS SEE ALSO
SEE ALSO
USER GUIDE
NAME
pmie - inference engine for performance metrics
SYNOPSIS
pmie [ -bCdeFfPqvVWxXz? ] [ -a archive ] [ -A align ] [ -c filename ] [ -h host ] [ -l logfile ] [ -m note ] [ -j stompfile ] [ -n pmnsfile ] [ -o format ] [ -O offset ] [ -S starttime ] [ -t interval ] [ -T endtime ] [ -U username ] [ -Z timezone ] [ filename ... ]
DESCRIPTION
pmie accepts a collection of arithmetic, logical, and rule expressions to be evaluated at specified frequencies. The base data for the expressions consists of performance metrics values delivered in real-time from any host running the Performance Metrics Collection Daemon (PMCD), or using historical data from Performance Co-Pilot (PCP) archives.
As well as computing arithmetic and logical values, pmie can execute actions (popup alarms, write system log messages, and launch programs) in response to specified conditions. Such actions are extremely useful in detecting, monitoring and correcting performance-related problems.
The expressions to be evaluated are read from configuration files specified by one or more filename arguments. In the absence of any filename , expressions are read from standard input.
Output from pmie is directed to standard output and standard error as follows:
stdout
Expression values printed in the verbose -v mode and the output of print actions.
stderr
Error and warning messages for any syntactic or semantic problems during expression parsing, and any semantic or performance metrics availability problems during expression evaluation.
OPTIONS
The available command line options are:
-a archive , --archive = archive
archive is a comma-separated list of names, each of which may be the base name of an archive or the name of a directory containing one or more archives written by pmlogger (1). Multiple instances of the -a flag may appear on the command line to specify a list of sets of archives. In this case, it is required that only one set of archives be present for any one host. Also, any explicit host names occurring in a pmie expression must match the host name recorded in one of the archive labels. In the case of multiple sets of archives, timestamps recorded in the archives are used to ensure temporal consistency.
-A align , --align = align
Force the initial time window to be aligned on the boundary of a natural time unit align . Refer to PCPIntro (1) for a complete description of the syntax for align .
-b , --buffer
Output will be line buffered and standard output is attached to standard error. This is most useful for background execution in conjunction with the -l option. The -b option is always used for pmie instances launched from pmie_check (1).
-c config , --config = config
An alternative to specifying filename at the end of the command line.
-C , --check
Parse the configuration file(s) and exit before performing any evaluations. Any errors in the configuration file are reported.
-d , --interact
Normally pmie would be launched as a non-interactive process to monitor and manage the performance of one or more hosts. Given the -d flag however, execution is interactive and the user is presented with a menu of options. Interactive mode is useful mainly for debugging new expressions.
-e , --timestamp
When used with -V , -v or -W , this option forces timestamps to be reported with each expression. The timestamps are in ctime (3) format, enclosed in parentheses, and appear after the expression name and before the expression value, e.g.
    expr_1 (Tue Feb 6 19:55:10 2001): 12
-f , --foreground
If the -l option is specified and there is no -a option (i.e. real-time monitoring) then pmie is run as a daemon in the background (in all other cases foreground is the default). The -f (and -F , see below) options force pmie to be run in the foreground, independent of any other options.
-F , --systemd
Like -f , the -F option runs pmie in the foreground, but also does some housekeeping (such as creating a pid file, changing user id and notifying systemd (1) when pmie has started or is shutting down). This is intended for use when pmie is launched from systemd (1) and the daemonising has already been done. The -f and -F options are mutually exclusive.
-h host , --host = host
By default performance data is fetched from the local host (in real-time mode) or the host for the first named set of archives on the command line (in archive mode). The host argument overrides this default. It does not override hosts explicitly named in the expressions being evaluated. The host argument is interpreted as a connection specification for pmNewContext , and is later mapped to the remote pmcd’s self-reported host name for reporting purposes. See also the %h vs. %c substitutions in rule action strings below.
-j stompfile
An alternative STOMP protocol configuration is loaded from stompfile . If this option is not used, and the stomp action is used in any rule, the default location $PCP_SYSCONF_DIR/pmie/config/stomp will be used.
-l logfile , --logfile = logfile
Standard error is sent to logfile .
-m note , --note = note
Used to indicate where pmie has been launched from, e.g. pmie_check (1) and pmie_daily (1) use -m pmie_check and this is used by pmie to determine if it needs to be restarted should the PMCD hostname change, as described in the HOSTNAME CHANGES section below.
-n pmnsfile , --namespace = pmnsfile
An alternative Performance Metrics Name Space (PMNS) is loaded from the file pmnsfile .
-o format , --format = format
When processing performance data from an archive, the -o option may be used to specify an alternate output format when a rule action is executed. See the DIFFERENCES IN HOST AND ARCHIVE MODES section for a description of how the output format may be constructed.
-O origin , --origin = origin
Specify the origin of the time window. See PCPIntro (1) for a complete description of this option.
-P , --primary
Identifies this as the primary pmie instance for a host. See the ‘‘AUTOMATIC RESTART’’ section below for further details.
-q , --quiet
Suppresses diagnostic messages that would be printed to standard output by default, especially the "evaluator exiting" message as this can confuse scripts.
-S starttime , --start = starttime
Specify the starttime of the time window. See PCPIntro (1) for a complete description of this option.
-t interval , --interval = interval
The interval argument follows the syntax described in PCPIntro (1), and in the simplest form may be an unsigned integer (the implied units in this case are seconds). The value is used to determine the sample interval for expressions that do not explicitly set their sample interval using the pmie variable delta described below. The default is 10.0 seconds.
-T endtime , --finish = endtime
Specify the endtime of the time window. See PCPIntro (1) for a complete description of this option.
-U username , --username = username
User account under which to run pmie . The default is the current user account for interactive use. When run as a daemon, the unprivileged "pcp" account is used in current versions of PCP, but in older versions the superuser account ("root") was used by default.
-v
Unless one of the verbose options -V , -v or -W appears on the command line, expressions are evaluated silently; the only output is the result of any actions being executed. In the verbose mode, specified using the -v flag, the value of each expression is printed as it is evaluated. The values are in canonical units; bytes in the dimension of ‘‘space’’, seconds in the dimension of ‘‘time’’ and events in the dimension of ‘‘count’’. See pmLookupDesc (3) for details of the supported dimension and scaling mechanisms for performance metrics. The verbose mode is useful in monitoring the value of given expressions, evaluating derived performance metrics, passing these values on to other tools for further processing and in debugging new expressions.
-V , --verbose
This option has the same effect as the -v option, except that the name of the host and instance (if applicable) are printed as well as expression values.
-W
This option has the same effect as the -V option described above, except that for boolean expressions, only those names and values that make the expression true are printed. These are the same names and values accessible to rule actions as the %h, %i, %c and %v bindings, as described below.
-x , --secret-agent
Execute in domain agent mode. This mode is used within the Performance Co-Pilot product to derive values for summary metrics, see pmdasummary (1). Only restricted functionality is available in this mode (expressions with actions may not be used).
-X , --secret-applet
Run in secret applet mode (thin client).
-z , --hostzone
Change the reporting timezone to the timezone of the host that is the source of the performance metrics, as identified via either the -h option or the first named set of archives (as described above for the -a option).
-Z timezone , --timezone = timezone
Change the reporting timezone to timezone in the format of the environment variable TZ as described in environ (7).
-? , --help
Display usage message and exit.
EXAMPLES
The following example expressions demonstrate some of the capabilities of the inference engine.
The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated examples of pmie expressions.
The variable delta controls expression evaluation frequency. Specify that subsequent expressions be evaluated once a second, until further notice:
delta = 1 sec;
If the total context switch rate exceeds 10000 per second per CPU, then display an alarm notifier:
    kernel.all.pswitch / hinv.ncpu > 10000 count/sec
    -> alarm "high context switch rate %v";
If the high context switch rate is sustained for 10 consecutive samples, then launch top (1) in an xterm (1) window to monitor processes, but do this at most once every 5 minutes:
    all_sample (
        kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu
    ) -> shell 5 min "xterm -e 'top'";
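The "at most once every 5 minutes" behaviour comes from the time given between the action keyword and its arguments. The following Python sketch is only a model of that suppression, not pmie's implementation; the class name `Holdoff` and the assumption that the condition holds at every 1-second evaluation are illustrative.

```python
class Holdoff:
    """Suppress repeat firings of an action within `period` seconds."""

    def __init__(self, period):
        self.period = period
        self.last_fired = None  # time of the most recent firing, if any

    def should_fire(self, now):
        # Refuse to fire while still inside the holdoff window.
        if self.last_fired is not None and now - self.last_fired < self.period:
            return False
        self.last_fired = now
        return True


holdoff = Holdoff(5 * 60)  # 5 minutes, expressed in seconds
# Assume the condition is true at every 1-second evaluation for 10 minutes.
fired_at = [t for t in range(0, 600) if holdoff.should_fire(t)]
print(fired_at)
```

Under these assumptions the action runs at the first evaluation and again 300 seconds later, even though the condition is true throughout.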
The following rules are evaluated once every 20 seconds:
delta = 20 sec;
If any disk is performing more than 60 I/Os per second, then print a message identifying the busy disk to standard output and launch dkvis (1):
some_inst (
disk.dev.total > 60 count/sec
) -> print "busy disks:" " %i" &
shell 5 min "dkvis";
Refine the preceding rule to apply only between the hours of 9am and 5pm, and to require 3 of 4 consecutive samples to exceed the threshold before executing the action:
    $hour >= 9 && $hour <= 17 &&
    some_inst (
        75 %_sample (
            disk.dev.total @0..3 > 60 count/sec
        )
    ) -> print "disks busy for 20 sec:" " [%h]%i";
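The "3 of 4" requirement follows from combining four samples (@0..3) with a 75 percent quantifier. A small Python model of that arithmetic, assuming the quantifier means "at least N percent of the samples are true"; the function name and the sample values are hypothetical.

```python
def percent_sample(flags, percent):
    """True when at least `percent` percent of the boolean samples are true."""
    flags = list(flags)
    return 100 * sum(flags) >= percent * len(flags)


THRESHOLD = 60               # I/Os per second, as in the rule above
samples = [72, 65, 40, 90]   # hypothetical rates for one disk at @0..@3
busy = percent_sample((s > THRESHOLD for s in samples), 75)
print(busy)
```

Here three of the four samples exceed the threshold, which is exactly 75 percent, so the condition holds for this disk.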
The following two rules are evaluated once every 10 minutes:
delta = 10 min;
If either the / or the /usr filesystem is more than 95% full, display an alarm popup, but not if it has already been displayed during the last 4 hours:
    filesys.free #'/dev/root' /
        filesys.capacity #'/dev/root' < 0.05
    -> alarm 4 hour "root filesystem (almost) full";

    filesys.free #'/dev/usr' /
        filesys.capacity #'/dev/usr' < 0.05
    -> alarm 4 hour "/usr filesystem (almost) full";
The following rule requires a machine that supports the lmsensors metrics. If the machine environment temperature rises more than 2 degrees over a 10 minute interval, write an entry in the system log:
    lmsensors.coretemp_isa.temp1 @0 - lmsensors.coretemp_isa.temp1 @1 > 2
    -> alarm "temperature rising fast" &
       syslog "machine room temperature rise alarm";
And something interesting if you have performance problems with your Oracle database:
    // back to 30sec evaluations
    delta = 30 sec;
    sid = "ptg1";       # $ORACLE_SID setting
    lid = "223";        # latch ID from v$latch
    lru = "#'$sid/$lid cache buffers lru chain'";
    host = ":moomba.melbourne.sgi.com";
    gets = "oracle.latch.gets $host $lru";
    total = "oracle.latch.gets $host $lru +
             oracle.latch.misses $host $lru +
             oracle.latch.immisses $host $lru";

    $total > 100 && $gets / $total < 0.2
    -> alarm "high lru latch contention in database $sid";
The following ruleset will emit exactly one message depending on the availability and value of the 1-minute load average.
    delta = 1 minute;
    ruleset
         kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
             print "extreme load average %v"
    else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
             print "moderate load average %v"
    unknown ->
             print "load average unavailable"
    otherwise ->
             print "load average OK"
    ;
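The "exactly one message" property can be modelled in a few lines of Python. This sketch assumes the ordering described in the RULESETS section: the first rule whose predicate is true wins, the unknown clause applies when no predicate is true and at least one could not be evaluated, and the otherwise clause applies when every predicate is false. The function names are hypothetical and None stands in for an unavailable metric value.

```python
def evaluate_ruleset(rules, unknown_action, otherwise_action):
    """rules: list of (predicate, action); predicate is True, False or None."""
    saw_unknown = False
    for value, action in rules:
        if value is True:
            return action          # first true predicate wins
        if value is None:
            saw_unknown = True     # remember that something was unavailable
    return unknown_action if saw_unknown else otherwise_action


def load_message(load, ncpu):
    """Pick the one message for a 1-minute load average (None = unavailable)."""
    rules = [
        (None if load is None else load > 10 * ncpu, "extreme load average"),
        (None if load is None else load > 2 * ncpu, "moderate load average"),
    ]
    return evaluate_ruleset(rules, "load average unavailable", "load average OK")


print(load_message(10, 4))
```

With a load of 10 on a 4-CPU machine the first predicate is false and the second is true, so only the moderate message is chosen.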
The following rule will emit a message when some filesystem is more than 75% full and is filling at a rate that if sustained would fill the filesystem to 100% in less than 30 minutes.
    some_inst (
        100 * filesys.used / filesys.capacity > 75 &&
        filesys.used + 30min * (rate filesys.used) >
            filesys.capacity
    ) -> print "filesystem will be full within 30 mins:" " %i";
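The second half of that condition is a linear projection: current usage plus 30 minutes of growth at the current rate, compared against capacity. A Python sketch of the same arithmetic, with a hypothetical function name and made-up numbers, assuming usage and capacity share one unit and the fill rate is in that unit per second:

```python
def will_fill_soon(used, capacity, fill_rate, horizon_sec=30 * 60):
    """Mirror the rule: over 75% full and projected to exceed capacity."""
    over_threshold = 100 * used / capacity > 75
    projected = used + horizon_sec * fill_rate
    return over_threshold and projected > capacity


# Hypothetical filesystem: 80 of 100 units used, growing 0.02 units/sec.
# Projection: 80 + 1800 * 0.02 = 116, which is above the capacity of 100.
print(will_fill_soon(80, 100, 0.02))
```

A filesystem that is 80% full but growing at only 0.001 units per second projects to 81.8 and would not trigger the rule.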
If the metric mypmda.errors counts errors then the following rule will emit a message if the rate of errors exceeds 1 per second provided the error count is less than 100.
    mypmda.errors > 1 && instant mypmda.errors < 100
    -> print "high error rate: %v";
QUICK START
The pmie specification language is powerful and large.
To expedite rapid development of pmie rules, the pmieconf (1) tool provides a facility for generating a pmie configuration file from a set of generalized pmie rules. The supplied set of rules covers a wide range of performance scenarios.
The Performance Co-Pilot User’s and Administrator’s Guide provides a detailed tutorial-style chapter covering pmie .
EXPRESSION SYNTAX
This description is terse and informal. For a more comprehensive description see the Performance Co-Pilot User’s and Administrator’s Guide .
A pmie specification is a sequence of semicolon terminated expressions.
Basic operators are modeled on the arithmetic, relational and Boolean operators of the C programming language. Precedence rules are as expected, although the use of parentheses is encouraged to enhance readability and remove ambiguity.
Operands are performance metric names (see PMNS (5)) and the normal literal constants.
Operands involving performance metrics may produce sets of values, as a result of enumeration in the dimensions of hosts , instances and time . Special qualifiers may appear after a performance metric name to define the enumeration in each dimension. For example,
kernel.percpu.cpu.user :foo :bar #cpu0 @0..2
defines 6 values corresponding to the time spent executing in user mode on CPU 0 on the hosts ‘‘foo’’ and ‘‘bar’’ over the last 3 consecutive samples. The default interpretation in the absence of : (host), # (instance) and @ (time) qualifiers is all instances at the most recent sample time for the default source of PCP performance metrics.
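The count of six is the product of the qualifiers: two hosts, one instance and three sample times. A short Python illustration of that enumeration; the function name is hypothetical and nothing here is part of pmie itself.

```python
from itertools import product


def enumerate_values(hosts, instances, samples):
    """Return every (host, instance, sample) combination an operand covers."""
    return list(product(hosts, instances, samples))


# :foo :bar -> two hosts; #cpu0 -> one instance; @0..2 -> samples 0, 1 and 2
combos = enumerate_values(["foo", "bar"], ["cpu0"], range(0, 3))
print(len(combos))
```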
Host and instance names that do not follow the rules for variables in programming languages, i.e. alphabetic optionally followed by alphanumerics, should be enclosed in single quotes.
Expression evaluation follows the law of ‘‘least surprises’’. Where performance metrics have the semantics of a counter, pmie will automatically convert to a rate based upon consecutive samples and the time interval between these samples. All numeric expressions are evaluated in double precision, and where appropriate, automatically scaled into canonical units of ‘‘bytes’’, ‘‘seconds’’ and ‘‘counts’’.
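The automatic rate conversion for counters amounts to dividing the change in value by the elapsed time between two consecutive samples. A minimal Python sketch of that calculation, with hypothetical names and values; it ignores details such as counter wrap that a real implementation would need to consider.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Rate of change of a counter between two consecutive samples."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr_value - prev_value) / elapsed


# Hypothetical counter samples taken 10 seconds apart.
rate = counter_rate(prev_value=120_000, prev_time=0.0,
                    curr_value=250_000, curr_time=10.0)
print(rate)  # 13000.0 per second
```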
A rule is a special form of expression that specifies a condition or logical expression, a special operator ( -> ) and actions to be performed when the condition is found to be true.
The following table summarizes the basic pmie operators:
All operators are supported for numeric-valued operands and expressions. For string-valued operands, namely literal string constants enclosed in double quotes or metrics with a data type of string ( PM_TYPE_STRING ), only the operators == and != are supported.
The rate and instant operators are the logical inverse of one another, so an arithmetic expression expr is equal to rate instant expr . The more useful cases involve using rate with a metric that is not a counter to determine the rate of change over time or instant with a metric that is a counter to determine if the current value is above or below some threshold.
Aggregate operators may be used to aggregate or summarize along one dimension of a set-valued expression. The following aggregate operators map from a logical expression to a logical expression of lower dimension.
The following instantial operators may be used to filter or limit a set-valued logical expression, based on regular expression matching of instance names. The logical expression must be a set involving the dimension of instances, and the regular expression is of the form used by egrep (1) or the Extended Regular Expressions of regcomp (3).
For example, the expression below will be ‘‘true’’ for disks attached to controllers 2 or 3 performing more than 20 operations per second:
    match_inst "^dks[23]d" disk.dev.total > 20;
The following aggregate operators map from an arithmetic expression to an arithmetic expression of lower dimension.