User and system time per process
Long time no blog post - so let's change that :-)
Today I've come across a system with a pretty high load, but couldn't quite make out which processes were responsible for the load. The ps command showed:
$ uptime 20:16pm up 1 day 2:40, 12 users, load average: 132.81, 132.79, 133.43 $ ps -eo user,pid,rsz,vsz,pmem,pcpu,time,args --sort -pcpu USER PID RSZ VSZ %MEM %CPU TIME COMMAND user0 18428 3124 58236 0.0 1.1 00:00:00 LLAWP /www/user0/eam/WebAgent.conf -A user0 8387 14976 154220 0.0 4.2 01:07:44 LLAWP /www/user0/eam/WebAgent.conf -A user0 4508 14828 152864 0.0 4.2 01:07:21 LLAWP /www/user0/eam/WebAgent.conf -A user0 8045 15000 154220 0.0 4.2 01:07:33 LLAWP /www/user0/eam/WebAgent.conf -A user0 23814 14892 152868 0.0 4.2 01:06:47 LLAWP /www/user0/eam/WebAgent.conf -A user0 18384 3124 58236 0.0 0.8 00:00:00 LLAWP /www/user0/eam/WebAgent.conf -A user0 17224 14932 152952 0.0 4.1 01:05:39 LLAWP /www/user0/eam/WebAgent.conf -ASo, while each of these processes is using some CPU time, it didn't quite explain the high load.
top(1)
shows:
Tasks: 1439 total, 9 running, 1430 sleeping, 0 stopped, 0 zombie Cpu0 : 14.1%us, 49.5%sy, 0.0%ni, 34.3%id, 0.0%wa, 0.0%hi, 2.0%si, 0.0%st Cpu1 : 19.0%us, 51.4%sy, 0.0%ni, 28.6%id, 1.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 15.2%us, 47.6%sy, 0.0%ni, 35.2%id, 0.0%wa, 0.0%hi, 1.9%si, 0.0%st Cpu3 : 16.8%us, 45.8%sy, 0.0%ni, 37.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 17.4%us, 39.4%sy, 0.0%ni, 41.3%id, 0.9%wa, 0.0%hi, 0.9%si, 0.0%st Cpu5 : 14.7%us, 39.4%sy, 0.0%ni, 14.7%id, 0.0%wa, 0.0%hi, 31.2%si, 0.0%st Cpu6 : 14.5%us, 30.0%sy, 0.0%ni, 52.7%id, 0.0%wa, 0.0%hi, 2.7%si, 0.0%st Cpu7 : 9.4%us, 34.9%sy, 0.0%ni, 55.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%stSo we have 1439 processes, but only 9 are running and 1430 are sleeping - and there's a quite a lot system time being used. But there's no ps option to display
user
oder system
time. However, this being a Linux system, we still have the /proc filesystem. The /proc/PID/stat
file of a 2.6.30 kernel contains 52 fields, among them are:
2: tcomm filename of the executable 3: state state (R=running, S=sleeping, D=uninterruptible wait, Z=zombie, T=traced/stopped) 14: utime user mode jiffies 15: stime kernel mode jiffiesSo let's try this:
$ for PID in `ps -eo pid`; do awk '{print "user: " $14 " kernel: " $15 " state: " $3 " process: " $2}' /proc/$PID/stat done > foo $ sort -t: -nk3 foo | tail -5 user: 71232 kernel: 335302 state: D process: (LLAWP) user: 71105 kernel: 342684 state: D process: (LLAWP) user: 71290 kernel: 346119 state: D process: (LLAWP) user: 71009 kernel: 347570 state: D process: (LLAWP) user: 71278 kernel: 348388 state: D process: (LLAWP)This is only the top-5, but LLAWP was using the most system on this system while really doing nothing - it was in uninterruptible sleep ("D"). Restarting those processes helped and the system load returned to normal :-)