Networking and Linux concepts: Performance tools in Linux

top

The top command in Linux displays the running processes on the system. It is used extensively for monitoring the load on a server.

Uptime and Load averages:
top - 20:55:50 up 176 days, 7:38, 1 user, load average: 1.39, 0.95, 0.76

The fields display:

current time
the time the system has been up
number of users logged in
load averages over the last 1 minute, 5 minutes and 15 minutes respectively

This uptime display can be toggled with the 'l' command.
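
If you want to pull the three load averages out of this header line in a script, a minimal sketch could look like the following. The function name parse_load_averages is made up for this example, and the regular expression only assumes the "load average: a, b, c" layout shown above.

```python
import re

def parse_load_averages(header):
    """Extract the 1-, 5- and 15-minute load averages from a top/uptime header line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", header
    )
    if match is None:
        raise ValueError(f"no load average found in: {header!r}")
    one, five, fifteen = (float(value) for value in match.groups())
    return one, five, fifteen

# Header line in the same layout as the top output shown above.
line = "top - 20:55:50 up 176 days, 7:38, 1 user, load average: 1.39, 0.95, 0.76"
print(parse_load_averages(line))
```

The same function also works on the output of the uptime command, since it uses the same "load average:" layout.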

Tasks:
Tasks: 288 total, 1 running, 287 sleeping, 0 stopped, 0 zombie

Shows a summary of the tasks (processes) on the system: the total number of processes, and how many of them are running, sleeping, stopped or in the zombie state. This summary display can be toggled with the 't' command.

CPU states:
Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

The next line shows the CPU state, as the percentage of CPU time spent in each mode:

us, user: CPU time in user processes
sy, system: CPU time running kernel code
ni, niced: CPU time running niced user processes
id, idle: CPU time spent idle
wa, I/O wait: CPU time waiting for I/O completion
hi: CPU time serving hardware interrupts
si: CPU time serving software interrupts
st: CPU time stolen from this VM by the Hypervisor
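
These percentages come from cumulative counters that the kernel keeps per mode; on Linux they are exposed on the aggregate "cpu" line of /proc/stat, in the order user, nice, system, idle, iowait, irq, softirq, steal. The sketch below shows how two samples of that line can be turned into percentages. The helper names and the sample numbers are made up for illustration, not taken from a real system.

```python
# top's labels, in the order the counters appear on the /proc/stat "cpu" line.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")

def parse_cpu_line(line):
    """Return the first eight counters of a /proc/stat 'cpu' line, keyed by top's labels."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(value) for value in parts[1:9])))

def cpu_percentages(before, after):
    """Percentage of elapsed CPU time spent in each mode between two samples."""
    deltas = {key: after[key] - before[key] for key in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {key: 0.0 for key in FIELDS}
    return {key: round(100.0 * delta / total, 1) for key, delta in deltas.items()}

# Illustrative samples (made-up numbers), imagined as taken a few seconds apart.
first = parse_cpu_line("cpu 1000 20 300 98000 50 0 10 0 0 0")
second = parse_cpu_line("cpu 1010 20 310 98970 60 0 10 0 0 0")
print(cpu_percentages(first, second))
```

On a real Linux system you could read the first line of /proc/stat twice, a few seconds apart, and pass both lines through parse_cpu_line.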

Memory usage:
Mem: 164615148k total, 5679640k used, 10785508k free, 261452k buffers
Swap: 18481148k total, 0k used, 18481148k free, 1254932k cached

The memory usage section resembles the output of the free command. The first line shows details for physical memory. The second line shows the swap space. Note that the cached figure at the end of the Swap line refers to page cache held in physical memory, not to swap.

Fields/Columns:

The processes are shown in columns.

PID:
The Process IDs, to uniquely identify processes.

USER:
The effective username of the owner of the processes.

PR:
The scheduling priority of the process.

NI:
The nice value of the process. Lower value means higher priority.

VIRT:
The amount of virtual memory used by the process.

RES:
The resident memory size. Resident memory is the amount of non-swapped physical memory a task is using.

SHR:
SHR is the shared memory used by the process.

S:
This is the process status. It can have one of the following values:
D - uninterruptible sleep
R - running
S - sleeping
T - traced or stopped
Z - zombie

%CPU:
It is the percentage of CPU time the task has used since last update.

%MEM:
Percentage of available physical memory used by the process.

TIME+:
The total CPU time the task has used since it started, with precision up to a hundredth of a second.

COMMAND:
The command which was used to start the process.

iostat

The iostat command is used for monitoring the load on system input/output devices by observing the time the devices are active in relation to their average transfer rates. Its reports can be used to change the system configuration so that input/output is better balanced between physical disks.

server.us.company.com: / >
server.us.company.com: / > iostat
Linux 2.6.39-400.109.5.el5uek (server.us.company.com) 09/27/2014

avg-cpu: %user %nice %system %iowait %steal %idle
0.11 0.02 0.03 0.03 0.00 99.81

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.30 0.11 14.27 1628194 218384238
sda1 0.00 0.00 0.00 3146 30
sda2 1.30 0.11 14.27 1624648 218384208
dm-0 1.79 0.11 14.27 1622738 218384208
dm-1 0.00 0.00 0.00 1472 0

server.us.company.com: / >
server.us.company.com: / >

The first section contains the CPU report:

%user: shows the percentage of CPU utilization that occurs while executing at the user (application) level
%nice: shows the percentage of CPU utilization that occurs while executing at the user level with nice priority
%system: shows the percentage of CPU utilization that occurs while executing at the system (kernel) level
%iowait: shows the percentage of the time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request
%steal: shows the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor
%idle: shows the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request

The second section contains device utilization report:

Device: device/partition name as listed in /dev directory
tps: shows the number of transfers (I/O requests) per second that were issued to the device. Higher tps means the device is busier
Blk_read/s: shows the amount of data read from the device, expressed in blocks per second (kilobytes or megabytes per second with the -k or -m option)
Blk_wrtn/s: shows the amount of data written to the device, expressed in blocks per second (kilobytes or megabytes per second with the -k or -m option)
Blk_read: shows the total number of blocks read
Blk_wrtn: shows the total number of blocks written

The figures above are in blocks, not bytes (a block is a 512-byte sector on 2.4 and later kernels). You can use the -k option to display the information in kilobytes, for ease of readability. Combined with an interval and a count, iostat -k 3 4 prints four reports at three-second intervals. The first report shows averages since boot, and each later report covers the preceding three seconds:

server.us.company.com: / >
server.us.company.com: / >
server.us.company.com: / > iostat -k 3 4
Linux 2.6.39-400.109.5.el5uek (server.us.company.com) 09/27/2014

avg-cpu: %user %nice %system %iowait %steal %idle
0.11 0.02 0.03 0.03 0.00 99.81

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 1.30 0.05 7.14 814097 109311967
sda1 0.00 0.00 0.00 1573 15
sda2 1.30 0.05 7.14 812324 109311952
dm-0 1.79 0.05 7.14 811369 109311952
dm-1 0.00 0.00 0.00 736 0

avg-cpu: %user %nice %system %iowait %steal %idle
0.08 0.00 0.08 0.17 0.00 99.67

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 6.67 0.00 32.00 0 96
sda1 0.00 0.00 0.00 0 0
sda2 6.67 0.00 32.00 0 96
dm-0 8.00 0.00 32.00 0 96
dm-1 0.00 0.00 0.00 0 0

avg-cpu: %user %nice %system %iowait %steal %idle
0.08 0.00 0.13 0.00 0.00 99.79

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 2.67 0.00 10.67 0 32
sda1 0.00 0.00 0.00 0 0
sda2 2.67 0.00 10.67 0 32
dm-0 2.67 0.00 10.67 0 32
dm-1 0.00 0.00 0.00 0 0

avg-cpu: %user %nice %system %iowait %steal %idle
0.04 0.00 0.08 0.04 0.00 99.83

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.67 0.00 5.33 0 16
sda1 0.00 0.00 0.00 0 0
sda2 0.67 0.00 5.33 0 16
dm-0 1.33 0.00 5.33 0 16
dm-1 0.00 0.00 0.00 0 0

server.us.company.com: / >
server.us.company.com: / >
server.us.company.com: / >
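
The two iostat runs above can be cross-checked against each other. The sketch below assumes a block is a 512-byte sector and converts the Blk_read totals from the first run into kilobytes, then compares them with the kB_read totals from the first report of the -k run. The name blocks_to_kb is made up for this example.

```python
SECTOR_BYTES = 512  # assumption: one iostat "block" is a 512-byte sector

def blocks_to_kb(blocks):
    """Convert a block count (or a blocks/s rate) to kilobytes (or kB/s)."""
    return blocks * SECTOR_BYTES / 1024

# Blk_read totals from the plain iostat run, and kB_read totals from the -k run.
blk_read = {"sda": 1628194, "sda1": 3146, "sda2": 1624648, "dm-0": 1622738, "dm-1": 1472}
kb_read = {"sda": 814097, "sda1": 1573, "sda2": 812324, "dm-0": 811369, "dm-1": 736}

for device, blocks in blk_read.items():
    converted = blocks_to_kb(blocks)
    print(device, converted, converted == kb_read[device])
```

The read totals convert exactly. The write totals in the two runs do not, presumably because the devices kept writing between the two commands.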

uptime

server.us.company.com: / >
server.us.company.com: / > uptime
19:20:13 up 178 days, 6:03, 1 user, load average: 3.73, 7.98, 0.50
server.us.company.com: / >

The uptime command displays how long the server/system has been up and running since the last reboot.
The 19:20:13 shows the current time in 24-hour format.
The 178 days and 6:03 says that the system has been running for 178 days, 6 hours and 3 minutes.
The total number of users logged in is 1.
What is the load average, and what do the three numbers in the uptime output represent?
Each number is an exponentially damped (weighted) moving average of the number of processes in the run queue, taken over the last 1, 5 and 15 minutes respectively.
On single-CPU machines that are CPU-bound, one can think of load average as a percentage of system utilization during the respective time period. For systems with multiple CPUs, the number needs to be divided by the number of processors in order to get a percentage.
For example, a load average of "3.73 7.98 0.50" on a single-CPU system can be interpreted as:
During the last minute, the CPU was overloaded by 273% on average (1 CPU with 3.73 runnable processes, so that 2.73 processes were waiting for their turn). During the last 5 minutes, it was overloaded by 698% on average (7.98 runnable processes, so that 6.98 were waiting). During the last 15 minutes, the CPU was idle about half of the time on average. This means that this CPU could have handled all of the work scheduled for the last minute if it were 3.73 times as fast, or if there were 4 (3.73 rounded up) times as many CPUs, but that over the last 15 minutes it was twice as fast as necessary to prevent runnable processes from waiting for their turn.

Conversely, in a system with four CPUs, a load average of 3.73 would indicate that there were, on average, 3.73 processes ready to run, and each one could be scheduled into a CPU.
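
The arithmetic above can be written down as a small sketch. Both function names are made up for this example. The 5-second sampling interval in next_load is an assumption about how the Linux kernel updates the averages, so treat the second function as an illustration of an exponentially damped average rather than the kernel's exact code.

```python
import math

def interpret_load(load, cpus=1):
    """Describe a load average relative to the number of CPUs."""
    per_cpu = load / cpus
    if per_cpu > 1:
        return f"overloaded by {round((per_cpu - 1) * 100)}%"
    return f"idle {round((1 - per_cpu) * 100)}% of the time"

def next_load(previous, runnable, window_seconds, interval=5.0):
    """One update step of an exponentially damped moving average.

    interval is the assumed time between samples, window_seconds is
    60, 300 or 900 for the 1-, 5- and 15-minute averages.
    """
    decay = math.exp(-interval / window_seconds)
    return previous * decay + runnable * (1 - decay)

for load in (3.73, 7.98, 0.50):
    print(load, "on 1 CPU:", interpret_load(load))
print(3.73, "on 4 CPUs:", interpret_load(3.73, cpus=4))
```

Calling next_load repeatedly with a constant number of runnable processes moves the average towards that number, quickly for the 1-minute window and slowly for the 15-minute one.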

mpstat

mpstat is used for monitoring CPU utilization. This tool is most useful when there are multiple CPUs.

server.us.company.com: / >
server.us.company.com: / > mpstat
Linux 2.6.39-400.109.5.el5uek (server.us.company.com) 09/28/2014

07:30:28 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
07:30:28 PM all 0.11 0.02 0.03 0.03 0.00 0.00 0.00 99.81 232.23
server.us.company.com: / >

In this output,

07:30:28 PM: the time that mpstat was run
all: means all CPUs
%user: shows the percentage of CPU utilization that occurs while executing at the user level (application)
%nice: shows the percentage of CPU utilization that occurs while executing at the user level with nice priority
%sys: shows the percentage of CPU utilization that occurs while executing at the system level (kernel)
%iowait: shows the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request
%irq: shows the percentage of time spent by the CPU(s) to service hardware interrupts
%soft: shows the percentage of time spent by the CPU(s) to service software interrupts
%steal: shows the percentage of time spent in involuntary wait by the virtual CPU(s) while the hypervisor was servicing another virtual processor
%idle: shows the percentage of time that the CPU(s) were idle and the system did not have an outstanding disk I/O request
intr/s: shows the total number of interrupts received per second by the CPU(s)

A useful way of checking all the CPUs for their utilization:
server.us.company.com: / >
server.us.company.com: / > mpstat -P ALL
Linux 2.6.39-400.109.5.el5uek (server.us.company.com) 09/28/2014

08:00:23 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
08:00:23 PM all 0.11 0.02 0.03 0.03 0.00 0.00 0.00 99.81 232.31
08:00:23 PM 0 0.07 0.01 0.03 0.03 0.00 0.00 0.00 99.87 0.00
08:00:23 PM 1 0.08 0.01 0.03 0.08 0.00 0.00 0.00 99.80 0.00
08:00:23 PM 2 0.12 0.00 0.03 0.04 0.00 0.00 0.00 99.81 0.00
08:00:23 PM 3 0.32 0.12 0.04 0.00 0.00 0.00 0.00 99.51 0.00
08:00:23 PM 4 0.08 0.01 0.02 0.08 0.00 0.00 0.00 99.82 0.00
08:00:23 PM 5 0.07 0.01 0.02 0.02 0.00 0.00 0.00 99.89 0.00
08:00:23 PM 6 0.07 0.00 0.02 0.00 0.00 0.00 0.00 99.90 0.00
08:00:23 PM 7 0.07 0.00 0.02 0.00 0.00 0.00 0.00 99.91 0.00
server.us.company.com: / >
server.us.company.com: / >
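
With many CPUs it can help to pick out the busiest one automatically. The sketch below is one way to do that, assuming the 12-column layout shown above (time, AM/PM, CPU, then the percentage columns and intr/s). Other sysstat versions print different columns, so the field positions are an assumption, and busiest_cpu is a name made up for this example.

```python
def busiest_cpu(mpstat_rows):
    """Return (cpu_id, busy_percent) for the per-CPU row with the lowest %idle."""
    best = None
    for row in mpstat_rows:
        fields = row.split()
        # Assumed layout: time, AM/PM, CPU, %user, %nice, %sys, %iowait,
        # %irq, %soft, %steal, %idle, intr/s (12 fields in total).
        if len(fields) != 12 or not fields[2].isdigit():
            continue  # skip the column header and the "all" summary row
        busy = round(100.0 - float(fields[10]), 2)
        if best is None or busy > best[1]:
            best = (int(fields[2]), busy)
    return best

# A few rows copied from the mpstat -P ALL output above.
rows = [
    "08:00:23 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s",
    "08:00:23 PM all 0.11 0.02 0.03 0.03 0.00 0.00 0.00 99.81 232.31",
    "08:00:23 PM 0 0.07 0.01 0.03 0.03 0.00 0.00 0.00 99.87 0.00",
    "08:00:23 PM 3 0.32 0.12 0.04 0.00 0.00 0.00 0.00 99.51 0.00",
    "08:00:23 PM 7 0.07 0.00 0.02 0.00 0.00 0.00 0.00 99.91 0.00",
]
print(busiest_cpu(rows))
```

Here "busy" is simply 100 minus %idle, so it includes I/O wait as well as user and system time.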

vmstat

The amount of memory (RAM) is finite, so only a certain number of applications fit in it at once. Without some remedy, trying to load too many applications would end with the computer telling you: "Sorry, you cannot run any more applications. You need to close some of the applications already running".

To avoid this situation, the operating system uses a concept called virtual memory. It looks for areas of memory that have not been used recently by an application and copies them to the hard disk, thereby freeing up some memory and giving you the opportunity to run more applications.

vmstat reports virtual memory statistics. It covers the system's memory, swap, I/O and processor utilization in real time.

server.us.company.com: / >
server.us.company.com: / > vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 10785136 261548 1255252 0 0 0 1 0 0 0 0 100 0 0
0 0 0 10785136 261548 1255252 0 0 0 3 1042 2206 0 0 100 0 0
0 0 0 10785136 261548 1255252 0 0 0 17 1033 2209 0 0 100 0 0
0 0 0 10785128 261548 1255252 0 0 0 25 981 2162 0 0 100 0 0
0 0 0 10785128 261548 1255252 0 0 0 11 954 2127 0 0 100 0 0
server.us.company.com: / >

Procs:
r: the number of runnable processes (running or waiting for access to the processor)
b: the number of processes in uninterruptible sleep (typically blocked waiting for I/O)

Memory:
swpd: shows how much memory has been swapped out to a swap file or disk
free: shows the unallocated (idle) memory available
buff: shows how much memory is used as buffers
cache: shows how much memory is used as cache for file data; this memory can be reclaimed if an application needs it

Swap:
Swap shows how much memory is sent to or retrieved from swap space per second.
si: how much memory is moved from swap to real memory per second
so: how much memory is moved from real memory to swap per second

I/O:
The I/O shows the amount of input and output activity per second in terms of blocks read and blocks written.
bi: the number of blocks received from a block device (reads)
bo: the number of blocks sent to a block device (writes)

System:
Shows the number of system operations per second.
in: the number of interrupts per second
cs: the number of context switches per second that the system makes in order to process all tasks

CPU:
Shows how the CPU time is divided, as percentages of total CPU time.

us: time the processor spends on non-kernel (user) processes
sy: time the processor spends on kernel related tasks
id: time the processor has been idle
wa: time the processor has been waiting for I/O operations to complete
st: time stolen from this virtual machine by the hypervisor
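
To work with these columns in a script, the output can be parsed into one dictionary per sample. The sketch below uses the vmstat 5 5 output shown above as its input. The function name parse_vmstat is made up for this example, and it assumes every data column is an integer, as in that output.

```python
def parse_vmstat(text):
    """Turn vmstat output into a list of dicts, one per sample row."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    # The column names are on the line starting with "r b"; the banner
    # line above it ("procs ---memory--- ...") is skipped.
    header_index = next(
        index for index, line in enumerate(lines) if line.split()[:2] == ["r", "b"]
    )
    names = lines[header_index].split()
    return [
        dict(zip(names, (int(value) for value in line.split())))
        for line in lines[header_index + 1:]
    ]

SAMPLE = """
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 10785136 261548 1255252 0 0 0 1 0 0 0 0 100 0 0
0 0 0 10785136 261548 1255252 0 0 0 3 1042 2206 0 0 100 0 0
0 0 0 10785136 261548 1255252 0 0 0 17 1033 2209 0 0 100 0 0
0 0 0 10785128 261548 1255252 0 0 0 25 981 2162 0 0 100 0 0
0 0 0 10785128 261548 1255252 0 0 0 11 954 2127 0 0 100 0 0
"""

samples = parse_vmstat(SAMPLE)
# The first row reports averages since boot, so only the later rows
# describe the 5-second intervals that were actually sampled.
interval_rows = samples[1:]
average_cs = sum(row["cs"] for row in interval_rows) / len(interval_rows)
print("average context switches per second:", average_cs)
```

Skipping the first row matters here: its in and cs values are long-term averages, which is why they look so different from the rows that follow.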

free

ping

nicstat

dstat

sar

netstat

pidstat

strace

tcpdump

blktrace

iotop

slabtop

sysctl

/proc

btrace

perf

dtrace

SystemTap

lsof

pcstat

ftrace

stap

ktap

ebpf

lttng

tiptop

swapon

ltrace

ss

iptraf

snmpget

lldptool

sysdig

rdmsr

Tuesday, March 31, 2020
