January 2013
Abstract
The OOM killer becomes active on an NFS4 server while there seems to be plenty of memory left
We've got an NFS4 server, and it's read-only, as you can see in /etc/exports:
/exports         192.152.0.0/16(ro,async,root_squash,no_subtree_check,fsid=0,crossmnt)
/exports/netapps 192.152.0.0/16(ro,async,root_squash,no_subtree_check) 10.0.0.0/8(ro,async,root_squash,no_subtree_check) 172.16.0.0/12(ro,async,root_squash,no_subtree_check) 192.168.0.0/16(ro,async,root_squash,no_subtree_check)
The exported filesystem is ext4 on top of LVM, and the machine itself is an ESX VM with one CPU, 2GB of RAM and 3GB of swap.
One Monday morning, the OOM killer lashes out:
Jan 14 08:15:17 nfssrv kernel: [2151974.493200] nfsd invoked oom-killer: gfp_mask=0x40d0, order=1, oom_adj=0
Jan 14 08:15:18 nfssrv kernel: [2151974.493208] nfsd cpuset=/ mems_allowed=0
Jan 14 08:15:18 nfssrv kernel: [2151974.493214] Pid: 1849, comm: nfsd Not tainted 2.6.32-5-amd64 #1
Jan 14 08:15:18 nfssrv kernel: [2151974.493217] Call Trace:
Jan 14 08:15:18 nfssrv kernel: [2151974.493229] [<ffffffff810b643c>] ? oom_kill_process+0x7f/0x23f
Jan 14 08:15:18 nfssrv kernel: [2151974.493237] [<ffffffff8106bb5e>] ? timekeeping_get_ns+0xe/0x2e
Jan 14 08:15:18 nfssrv kernel: [2151974.493242] [<ffffffff810b6960>] ? __out_of_memory+0x12a/0x141
Jan 14 08:15:18 nfssrv kernel: [2151974.493247] [<ffffffff810b6ab7>] ? out_of_memory+0x140/0x172
Jan 14 08:15:18 nfssrv kernel: [2151974.493253] [<ffffffff810ba81c>] ? __alloc_pages_nodemask+0x4ec/0x5fc
Jan 14 08:15:18 nfssrv kernel: [2151974.493264] [<ffffffff810e6e7e>] ? new_slab+0x5b/0x1ca
Jan 14 08:15:18 nfssrv kernel: [2151974.493271] [<ffffffff810e71dd>] ? __slab_alloc+0x1f0/0x39b
Jan 14 08:15:18 nfssrv kernel: [2151974.493285] [<ffffffffa03e6170>] ? xs_setup_xprt+0xd7/0x151 [sunrpc]
Jan 14 08:15:18 nfssrv kernel: [2151974.493290] [<ffffffff810e84bf>] ? __kmalloc+0xf1/0x141
Jan 14 08:15:18 nfssrv kernel: [2151974.493300] [<ffffffffa03e6170>] ? xs_setup_xprt+0xd7/0x151 [sunrpc]
Jan 14 08:15:18 nfssrv kernel: [2151974.493309] [<ffffffffa03e6170>] ? xs_setup_xprt+0xd7/0x151 [sunrpc]
Jan 14 08:15:18 nfssrv kernel: [2151974.493317] [<ffffffff812432ba>] ? sock_common_recvmsg+0x30/0x45
Jan 14 08:15:18 nfssrv kernel: [2151974.493327] [<ffffffffa03e65e2>] ? xs_setup_tcp+0x15/0x1f4 [sunrpc]
Jan 14 08:15:18 nfssrv kernel: [2151974.493337] [<ffffffffa03e42e2>] ? xprt_create_transport+0x32/0x23a [sunrpc]
Jan 14 08:15:18 nfssrv kernel: [2151974.493346] [<ffffffffa03e3cf3>] ? rpc_create+0xbc/0x4f1 [sunrpc]
Jan 14 08:15:18 nfssrv kernel: [2151974.493352] [<ffffffff81065046>] ? autoremove_wake_function+0x0/0x2e
Jan 14 08:15:18 nfssrv kernel: [2151974.493383] [<ffffffffa04cc5c9>] ? setup_callback_client+0x101/0x13d [nfsd]
Jan 14 08:15:18 nfssrv kernel: [2151974.493394] [<ffffffffa04cc61c>] ? nfsd4_probe_callback+0x17/0x57 [nfsd]
Jan 14 08:15:18 nfssrv kernel: [2151974.493405] [<ffffffffa04c8eab>] ? nfsd4_setclientid_confirm+0x2b0/0x33e [nfsd]
Jan 14 08:15:18 nfssrv kernel: [2151974.493416] [<ffffffffa04bc729>] ? nfsd4_proc_compound+0x1fc/0x3b2 [nfsd]
Jan 14 08:15:18 nfssrv kernel: [2151974.493426] [<ffffffffa04ae329>] ? nfsd_dispatch+0xdd/0x1b9 [nfsd]
Jan 14 08:15:18 nfssrv kernel: [2151974.493439] [<ffffffffa03eb563>] ? svc_process+0x403/0x627 [sunrpc]
Jan 14 08:15:18 nfssrv kernel: [2151974.493448] [<ffffffffa04ae772>] ? nfsd+0x0/0x12e [nfsd]
Jan 14 08:15:18 nfssrv kernel: [2151974.493457] [<ffffffffa04ae857>] ? nfsd+0xe5/0x12e [nfsd]
Jan 14 08:15:18 nfssrv kernel: [2151974.493462] [<ffffffff81064d79>] ? kthread+0x79/0x81
Jan 14 08:15:18 nfssrv kernel: [2151974.493469] [<ffffffff81011baa>] ? child_rip+0xa/0x20
Jan 14 08:15:18 nfssrv kernel: [2151974.493473] [<ffffffff81064d00>] ? kthread+0x0/0x81
Jan 14 08:15:18 nfssrv kernel: [2151974.493478] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
Jan 14 08:15:18 nfssrv kernel: [2151974.493480] Mem-Info:
Jan 14 08:15:18 nfssrv kernel: [2151974.493483] Node 0 DMA per-cpu:
Jan 14 08:15:18 nfssrv kernel: [2151974.493488] CPU 0: hi: 0, btch: 1 usd: 0
Jan 14 08:15:18 nfssrv kernel: [2151974.493490] Node 0 DMA32 per-cpu:
Jan 14 08:15:18 nfssrv kernel: [2151974.493494] CPU 0: hi: 186, btch: 31 usd: 64
Jan 14 08:15:18 nfssrv kernel: [2151974.493502] active_anon:0 inactive_anon:34 isolated_anon:0
Jan 14 08:15:18 nfssrv kernel: [2151974.493504] active_file:65 inactive_file:274 isolated_file:0
Jan 14 08:15:18 nfssrv kernel: [2151974.493506] unevictable:0 dirty:32 writeback:0 unstable:0
Jan 14 08:15:18 nfssrv kernel: [2151974.493507] free:37141 slab_reclaimable:62795 slab_unreclaimable:409427
Jan 14 08:15:18 nfssrv kernel: [2151974.493509] mapped:19 shmem:0 pagetables:668 bounce:0
Jan 14 08:15:18 nfssrv kernel: [2151974.493512] Node 0 DMA free:8036kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15304kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:32kB slab_unreclaimable:7804kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 14 08:15:18 nfssrv kernel: [2151974.493529] lowmem_reserve[]: 0 2004 2004 2004
Jan 14 08:15:18 nfssrv kernel: [2151974.493535] Node 0 DMA32 free:140528kB min:5704kB low:7128kB high:8556kB active_anon:0kB inactive_anon:136kB active_file:260kB inactive_file:1096kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052256kB mlocked:0kB dirty:128kB writeback:0kB mapped:76kB shmem:0kB slab_reclaimable:251148kB slab_unreclaimable:1629904kB kernel_stack:904kB pagetables:2672kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:98 all_unreclaimable? yes
Jan 14 08:15:18 nfssrv kernel: [2151974.493552] lowmem_reserve[]: 0 0 0 0
Jan 14 08:15:18 nfssrv kernel: [2151974.493557] Node 0 DMA: 3*4kB 1*8kB 1*16kB 2*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8036kB
Jan 14 08:15:18 nfssrv kernel: [2151974.493570] Node 0 DMA32: 34452*4kB 300*8kB 4*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 140528kB
Jan 14 08:15:18 nfssrv kernel: [2151974.493582] 380 total pagecache pages
Jan 14 08:15:18 nfssrv kernel: [2151974.493584] 25 pages in swap cache
Jan 14 08:15:18 nfssrv kernel: [2151974.493587] Swap cache stats: add 2099460, delete 2099435, find 543315/1176734
Jan 14 08:15:18 nfssrv kernel: [2151974.493590] Free swap = 3465396kB
Jan 14 08:15:18 nfssrv kernel: [2151974.493592] Total swap = 3489784kB
Jan 14 08:15:18 nfssrv kernel: [2151974.507001] 524272 pages RAM
Jan 14 08:15:18 nfssrv kernel: [2151974.507005] 9054 pages reserved
Jan 14 08:15:18 nfssrv kernel: [2151974.507007] 319 pages shared
Jan 14 08:15:18 nfssrv kernel: [2151974.507009] 477519 pages non-shared
Jan 14 08:15:18 nfssrv kernel: [2151974.507014] Out of memory: kill process 11440 (pickup) score 8346 or a child
Jan 14 08:15:18 nfssrv kernel: [2151974.507225] Killed process 11440 (pickup)
In the half hour that follows, the OOM killer is provoked again and again, until things get so bad I have to restart the server by pushing the (virtual) button.
I did not expect this. /proc/sys/vm/overcommit_memory has a value of 2, and /proc/sys/vm/overcommit_ratio is 80. So I'd expect the OOM killer to wake up only once memory usage reached swap + 80% of RAM. As you can see, hardly any swap was in use.
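For reference, with overcommit_memory=2 the commit limit the kernel enforces can be checked against that formula directly, since /proc/meminfo exposes the kernel's own figure. A minimal sketch (all values in kB; the exact kernel number may differ slightly if huge pages are reserved):

```shell
# Sketch: with vm.overcommit_memory=2, the kernel refuses new address-space
# commitments above roughly SwapTotal + overcommit_ratio% of MemTotal.
# Compare that formula with the kernel's own CommitLimit figure.
ratio=$(cat /proc/sys/vm/overcommit_ratio)
awk -v r="$ratio" '
    /^MemTotal:/    { mem  = $2 }
    /^SwapTotal:/   { swap = $2 }
    /^CommitLimit:/ { lim  = $2 }
    END { printf "formula: %d kB, kernel CommitLimit: %d kB\n",
                 swap + mem * r / 100, lim }
' /proc/meminfo
```

Note that this limit only constrains address space committed by processes (Committed_AS); kernel slab allocations are not accounted against it, which is presumably why it offered no protection here.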
After reboot, ext4_inode_cache in the slabtop output soon grows to 700MB, and dentry to 160MB:
nfssrv:/home/apprentice# slabtop -o|egrep 'dentry|ext4_inode'
851781 851775 99% 0.19K 40561 21 162244K dentry
707920 707920 100% 0.96K 88490 8 707920K ext4_inode_cache
nfssrv:/home/apprentice# sync && for (( i = 1 ; i < 4 ; i++ )) ; do echo $i > /proc/sys/vm/drop_caches ; sleep .1 ; done
nfssrv:/home/apprentice# slabtop -o|egrep 'dentry|ext4_inode'
23352 10041 42% 0.19K 1112 21 4448K dentry
2888 1298 44% 0.96K 361 8 2888K ext4_inode_cache
That's OK: this is a file server, so it should cache the filesystem. Judging from the logs above, an NFS kernel thread provoked the OOM killer while trying to get more slab.
Prior to the OOM event, the NFS server had more traffic than during the night:
Later on, I see in my root mail that between 08:15:21 and 08:15:31, fifty NFS clients requested the same file. But the rise in traffic had already started as early as 07:55:03.

nfssrv:/home/apprentice# sar -n NFSD
Linux 2.6.32-5-amd64 (nfssrv) 01/14/2013 _x86_64_ (1 CPU)
12:00:02 AM scall/s badcall/s packet/s udp/s tcp/s hit/s miss/s sread/s swrite/s saccess/s sgetatt/s
<snip>
05:35:02 AM 131.74 0.00 131.74 0.00 131.74 0.00 0.00 129.83 0.00 0.48 1.81
05:45:02 AM 131.71 0.00 131.71 0.00 131.71 0.00 0.00 131.46 0.00 0.03 0.08
05:55:02 AM 134.57 0.00 134.57 0.00 134.57 0.00 0.00 132.71 0.00 0.47 1.78
06:05:02 AM 143.58 0.00 143.59 0.00 143.58 0.00 0.00 141.70 0.00 0.47 1.80
06:15:03 AM 125.96 0.00 125.96 0.00 125.96 0.00 0.00 125.71 0.00 0.03 0.08
06:25:03 AM 133.80 0.00 133.80 0.00 133.80 0.00 0.00 131.94 0.00 0.46 1.78
06:35:02 AM 127.72 0.00 127.72 0.00 127.72 0.00 0.00 125.84 0.00 0.47 1.80
06:45:02 AM 124.29 0.00 124.29 0.00 124.29 0.00 0.00 124.03 0.00 0.03 0.10
06:55:02 AM 119.01 0.00 119.01 0.00 119.01 0.00 0.00 117.13 0.00 0.47 1.79
07:05:02 AM 40.60 0.00 40.60 0.00 40.60 0.00 0.00 38.68 0.00 0.49 1.84
07:15:03 AM 6.95 0.00 6.95 0.00 6.95 0.00 0.00 6.69 0.00 0.03 0.09
07:25:02 AM 41.31 0.00 41.31 0.00 41.31 0.00 0.00 39.40 0.00 0.48 1.83
07:35:05 AM 28.62 0.00 28.62 0.00 28.62 0.00 0.00 26.70 0.00 0.49 1.84
07:45:02 AM 31.13 0.00 31.13 0.00 31.13 0.00 0.00 30.85 0.00 0.04 0.12
07:55:03 AM 600.85 0.00 600.85 0.00 600.85 0.00 0.00 32.10 0.00 82.86 477.65
08:05:03 AM 1330.81 0.00 1330.84 0.00 1330.80 0.00 0.00 24.82 0.00 219.06 1281.26
08:15:03 AM 1764.18 0.00 1764.24 0.00 1764.17 0.00 0.00 24.69 0.00 294.29 1701.70
08:25:07 AM 1675.89 0.00 1675.90 0.00 1675.89 0.00 0.00 97.18 0.00 242.91 1422.20
08:35:04 AM 2139.22 0.00 2139.22 0.00 2139.22 0.00 0.00 23.80 0.00 353.34 2051.31
08:45:13 AM 1517.72 0.00 1517.73 0.00 1517.72 0.00 0.00 22.08 0.00 237.56 1399.89
08:55:19 AM 1754.02 0.00 1754.02 0.00 1754.02 0.00 0.00 12.05 0.00 286.85 1653.07
Average: 293.62 0.00 293.63 0.00 293.62 0.00 0.00 94.70 0.00 32.47 188.32
09:27:24 AM LINUX RESTART
09:35:01 AM scall/s badcall/s packet/s udp/s tcp/s hit/s miss/s sread/s swrite/s saccess/s sgetatt/s
09:45:02 AM 583.65 0.00 583.65 0.00 583.65 0.00 0.00 62.80 0.00 75.15 438.14
09:55:01 AM 147.19 0.00 147.19 0.00 147.19 0.00 0.00 70.28 0.00 12.16 63.67
10:05:01 AM 671.96 0.00 671.96 0.00 671.96 0.00 0.00 95.46 0.00 86.02 483.80
10:15:01 AM 752.61 0.00 752.61 0.00 752.61 0.00 0.00 100.95 0.00 92.39 550.58
10:25:01 AM 208.10 0.00 208.10 0.00 208.10 0.00 0.00 106.95 0.00 15.65 86.23
I/O goes up too:
nfssrv:/home/apprentice# sar -f ~/sa14 -b -s 06:00:00 -e 10:00:00
Linux 2.6.32-5-amd64 (nfssrv) 01/14/2013 _x86_64_ (1 CPU)
06:05:02 AM tps rtps wtps bread/s bwrtn/s
06:15:03 AM 115.00 111.68 3.32 8204.66 42.36
06:25:03 AM 129.68 123.30 6.38 9123.76 76.53
06:35:02 AM 266.88 250.55 16.33 10785.91 218.79
06:45:02 AM 117.14 113.92 3.22 8755.04 40.89
06:55:02 AM 131.45 124.63 6.83 8834.95 81.90
07:05:02 AM 63.23 57.17 6.06 3694.20 72.38
07:15:03 AM 13.74 12.29 1.45 683.47 17.95
07:25:02 AM 80.82 73.73 7.09 4092.92 85.30
07:35:05 AM 70.50 64.70 5.80 3065.11 69.16
07:45:02 AM 58.16 54.82 3.35 3138.49 42.71
07:55:03 AM 558.41 550.25 8.16 9987.06 96.02
08:05:03 AM 1074.79 1066.87 7.91 15100.50 88.75
08:15:03 AM 1605.11 1596.17 8.93 21859.32 97.98
08:25:07 AM 1520.93 1510.66 10.27 28636.57 111.14
08:35:04 AM 2037.42 2023.51 13.91 27100.50 152.89
08:45:13 AM 1270.55 1263.13 7.42 17039.51 87.77
08:55:19 AM 1566.63 1544.13 22.50 19718.10 259.13
Average: 629.25 621.07 8.18 11764.74 96.62
09:27:24 AM LINUX RESTART
09:35:01 AM tps rtps wtps bread/s bwrtn/s
09:45:02 AM 1.65 0.35 1.30 28.69 14.05
09:55:01 AM 23.47 8.70 14.77 380.06 197.41
Average: 12.55 4.52 8.03 204.08 105.58
but wait time stays reasonable:
nfssrv:/home/apprentice# sar -f ~/sa14 -s 06:00:00 -e 10:00:00 -d |egrep 'DEV|254-2|RESTART|^$'
06:05:02 AM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
06:15:03 AM dev254-2 46.73 3710.40 0.00 79.40 0.10 2.12 1.15 5.39
06:25:03 AM dev254-2 49.73 4040.60 0.00 81.26 0.13 2.55 1.40 6.95
06:35:02 AM dev254-2 118.68 4484.30 0.00 37.79 0.49 4.10 0.93 11.05
06:45:02 AM dev254-2 48.03 3971.01 0.00 82.69 0.12 2.52 1.31 6.30
06:55:02 AM dev254-2 45.69 3823.96 0.00 83.69 0.13 2.76 1.44 6.60
07:05:02 AM dev254-2 17.08 1336.11 0.00 78.23 0.06 3.76 2.14 3.66
07:15:03 AM dev254-2 2.94 227.87 0.00 77.51 0.02 5.67 2.57 0.76
07:25:02 AM dev254-2 23.61 1449.37 0.00 61.39 0.07 3.00 1.66 3.92
07:35:05 AM dev254-2 16.33 986.10 0.00 60.38 0.07 4.49 2.60 4.25
07:45:02 AM dev254-2 18.77 1173.61 0.00 62.52 0.05 2.83 1.49 2.80
07:55:03 AM dev254-2 380.33 4155.44 0.05 10.93 1.20 3.15 0.45 17.07
08:05:03 AM dev254-2 743.06 6853.74 0.05 9.22 3.24 4.36 0.55 41.02
08:15:03 AM dev254-2 1145.24 10039.19 0.05 8.77 4.60 4.01 0.34 38.97
08:25:07 AM dev254-2 1097.08 13148.82 0.37 11.99 5.53 5.04 0.37 40.51
08:35:04 AM dev254-2 1466.55 12545.66 1.54 8.56 8.42 5.74 0.41 60.71
08:45:13 AM dev254-2 910.94 8008.72 14.38 8.81 3.27 3.59 0.39 35.69
08:55:19 AM dev254-2 1105.53 9200.08 78.08 8.39 5.99 5.42 0.44 48.74
Average: dev254-2 426.41 5249.78 5.61 12.32 1.97 4.62 0.46 19.69
09:27:24 AM LINUX RESTART
09:35:01 AM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
09:45:02 AM dev254-2 0.12 9.60 0.04 80.33 0.00 17.44 9.56 0.11
09:55:01 AM dev254-2 0.00 0.13 0.01 44.00 0.00 60.00 60.00 0.02
Average: dev254-2 0.06 4.87 0.03 79.35 0.00 18.59 10.92 0.07
There is hardly any movement in paging:
nfssrv:/home/apprentice# sar -f ~/sa14 -s 06:00:00 -e 10:00:00 -R
Linux 2.6.32-5-amd64 (nfssrv) 01/14/2013 _x86_64_ (1 CPU)
06:05:02 AM frmpg/s bufpg/s campg/s
06:15:03 AM 0.84 -0.01 -0.67
06:25:03 AM -0.45 -0.00 -0.27
06:35:02 AM -0.20 0.01 -0.01
06:45:02 AM -0.09 0.00 0.35
06:55:02 AM -0.17 -0.01 -0.21
07:05:02 AM -0.79 0.03 0.54
07:15:03 AM 0.72 -0.02 -0.95
07:25:02 AM -1.58 0.01 1.55
07:35:05 AM -1.29 0.01 0.82
07:45:02 AM 2.19 -0.01 -2.18
07:55:03 AM 0.44 0.41 -0.84
08:05:03 AM -1.72 -0.34 1.43
08:15:03 AM 0.99 0.05 -1.36
08:25:07 AM 0.55 0.23 -0.46
08:35:04 AM -1.08 0.36 0.50
08:45:13 AM 1.03 -0.72 -0.41
08:55:19 AM -0.98 0.74 0.42
Average: -0.09 0.04 -0.10
09:27:24 AM LINUX RESTART
09:35:01 AM frmpg/s bufpg/s campg/s
09:45:02 AM -2.33 0.23 1.79
09:55:01 AM -33.36 2.46 25.36
Average: -17.82 1.34 13.56
CPU usage shows more and more time spent by the kernel from 7:55 onwards:
nfssrv:/home/apprentice# sar -f ~/sa14 -s 06:00:00 -e 10:00:00 -u
Linux 2.6.32-5-amd64 (nfssrv) 01/14/2013 _x86_64_ (1 CPU)
06:05:02 AM CPU %user %nice %system %iowait %steal %idle
06:15:03 AM all 0.01 0.00 3.54 77.09 0.00 19.36
06:25:03 AM all 0.01 0.00 3.83 80.60 0.00 15.57
06:35:02 AM all 0.22 0.00 3.94 80.94 0.00 14.90
06:45:02 AM all 0.01 0.00 3.48 81.13 0.00 15.39
06:55:02 AM all 0.05 0.00 3.40 76.04 0.00 20.51
07:05:02 AM all 0.01 0.00 1.20 36.21 0.00 62.59
07:15:03 AM all 0.00 0.00 0.20 7.30 0.00 92.50
07:25:02 AM all 0.01 0.00 1.15 38.09 0.00 60.75
07:35:05 AM all 0.08 0.00 0.83 30.92 0.00 68.17
07:45:02 AM all 0.01 0.00 0.88 31.76 0.00 67.35
07:55:03 AM all 0.01 0.00 10.08 57.73 0.00 32.17
08:05:03 AM all 0.02 0.00 19.03 68.92 0.00 12.04
08:15:03 AM all 0.01 0.00 23.93 60.30 0.00 15.75
08:25:07 AM all 0.09 0.00 23.15 54.33 0.00 22.43
08:35:04 AM all 0.05 0.00 28.71 71.20 0.00 0.04
08:45:13 AM all 0.08 0.00 21.22 66.21 0.00 12.49
08:55:19 AM all 0.14 0.00 23.54 70.15 0.00 6.17
Average: all 0.05 0.00 10.14 58.18 0.00 31.64
09:27:24 AM LINUX RESTART
09:35:01 AM CPU %user %nice %system %iowait %steal %idle
09:45:02 AM all 0.09 0.00 6.86 0.20 0.00 92.85
09:55:01 AM all 0.68 0.03 7.30 2.16 0.00 89.84
Average: all 0.38 0.01 7.08 1.18 0.00 91.35
Context switching goes rather crazy:
nfssrv:/home/apprentice# sar -f ~/sa14 -s 06:00:00 -e 10:00:00 -w
Linux 2.6.32-5-amd64 (nfssrv) 01/14/2013 _x86_64_ (1 CPU)
06:05:02 AM proc/s cswch/s
06:15:03 AM 0.02 369.07
06:25:03 AM 0.03 401.97
06:35:02 AM 0.24 486.13
06:45:02 AM 0.02 366.99
06:55:02 AM 0.02 376.08
07:05:02 AM 0.02 172.42
07:15:03 AM 0.01 50.93
07:25:02 AM 0.02 216.06
07:35:05 AM 0.02 171.07
07:45:02 AM 0.02 164.47
07:55:03 AM 0.02 2690.00
08:05:03 AM 0.02 5684.55
08:15:03 AM 0.02 6854.36
08:25:07 AM 0.03 6157.04
08:35:04 AM 0.02 8164.11
08:45:13 AM 0.03 6409.18
08:55:19 AM 0.02 7112.54
Average: 0.03 2702.19
09:27:24 AM LINUX RESTART
09:35:01 AM proc/s cswch/s
09:45:02 AM 0.03 2101.04
09:55:01 AM 0.60 433.35
Average: 0.31 1268.57
... with a slight increase in things that need to be swapped back in:
nfssrv:/home/apprentice# sar -f ~/sa14 -s 06:00:00 -e 10:00:00 -W
Linux 2.6.32-5-amd64 (nfssrv) 01/14/2013 _x86_64_ (1 CPU)
06:05:02 AM pswpin/s pswpout/s
06:15:03 AM 5.76 2.16
06:25:03 AM 7.05 2.67
06:35:02 AM 8.06 3.45
06:45:02 AM 5.58 2.06
06:55:02 AM 7.80 3.00
07:05:02 AM 7.39 2.74
07:15:03 AM 2.19 0.70
07:25:02 AM 8.23 3.17
07:35:05 AM 7.15 2.51
07:45:02 AM 5.63 2.21
07:55:03 AM 9.60 3.76
08:05:03 AM 8.28 3.13
08:15:03 AM 10.11 3.77
08:25:07 AM 9.90 3.18
08:35:04 AM 10.22 4.49
08:45:13 AM 6.79 2.68
08:55:19 AM 4.53 2.18
Average: 7.31 2.81
09:27:24 AM LINUX RESTART
09:35:01 AM pswpin/s pswpout/s
09:45:02 AM 0.00 0.00
09:55:01 AM 0.00 0.00
Average: 0.00 0.00
And the number of files/inodes in use doesn't change much:
nfssrv:/home/apprentice# sar -f ~/sa14 -s 06:00:00 -e 10:00:00 -v
Linux 2.6.32-5-amd64 (nfssrv) 01/14/2013 _x86_64_ (1 CPU)
06:05:02 AM dentunusd file-nr inode-nr pty-nr
06:15:03 AM 105 3360 230579 276
06:25:03 AM 121 3392 230688 276
06:35:02 AM 128 3360 230824 276
06:45:02 AM 106 3360 230759 276
06:55:02 AM 123 3360 230870 276
07:05:02 AM 124 3360 231000 276
07:15:03 AM 105 3360 230939 276
07:25:02 AM 118 3360 231041 276
07:35:05 AM 129 3360 231191 276
07:45:02 AM 105 3360 231132 276
07:55:03 AM 268 3360 231470 276
08:05:03 AM 136 3360 231428 276
08:15:03 AM 96 3360 231449 276
08:25:07 AM 103 3200 231576 276
08:35:04 AM 235 3072 231926 276
08:45:13 AM 111 3072 231709 276
08:55:19 AM 451 2848 232090 276
Average: 151 3288 231216 276
09:27:24 AM LINUX RESTART
09:35:01 AM dentunusd file-nr inode-nr pty-nr
09:45:02 AM 784621 2784 714086 242
09:55:01 AM 789455 2880 718061 245
Average: 787038 2832 716074 244
RAM usage deserves closer inspection:
nfssrv:/home/apprentice# sar -f ~/sa14 -s 06:00:00 -e 10:00:00 -r
Linux 2.6.32-5-amd64 (nfssrv) 01/14/2013 _x86_64_ (1 CPU)
06:05:02 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
06:15:03 AM 150348 1910780 92.71 268 4956 82072 1.48
06:25:03 AM 149260 1911868 92.76 260 4316 83556 1.51
06:35:02 AM 148772 1912356 92.78 276 4300 82952 1.49
06:45:02 AM 148556 1912572 92.79 276 5136 81420 1.47
06:55:02 AM 148152 1912976 92.81 252 4640 82952 1.49
07:05:02 AM 146252 1914876 92.90 316 5936 82072 1.48
07:15:03 AM 147972 1913156 92.82 264 3640 82072 1.48
07:25:02 AM 144184 1916944 93.00 276 7356 82072 1.48
07:35:05 AM 141080 1920048 93.16 296 9328 82204 1.48
07:45:02 AM 146304 1914824 92.90 272 4124 83084 1.50
07:55:03 AM 147356 1913772 92.85 1248 2104 82204 1.48
08:05:03 AM 143232 1917896 93.05 444 5524 82208 1.48
08:15:03 AM 145616 1915512 92.94 552 2260 82204 1.48
08:25:07 AM 146936 1914192 92.87 1100 1156 53528 0.96
08:35:04 AM 144368 1916760 93.00 1956 2360 51280 0.92
08:45:13 AM 146868 1914260 92.87 208 1368 50400 0.91
08:55:19 AM 144492 1916636 92.99 2004 2388 6680 0.12
Average: 146456 1914672 92.89 604 4170 72527 1.31
09:27:24 AM LINUX RESTART
09:35:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
09:45:02 AM 691976 1369152 66.43 388012 49216 63264 1.14
09:55:01 AM 612160 1448968 70.30 393908 109892 77664 1.40
Average: 652068 1409060 68.36 390960 79554 70464 1.27
That is to say: memory usage is nearly 2GB, but there is still 140MB free. That free space, however, sits almost entirely in single 4kB pages (as the buddy lists in the OOM killer log above show). Only about 80MB is used as cache.
The fact that the OOM killer is triggered without memory being full is easily explained: the NFS server runs in kernel threads. Judging from the logs, it asks for slab, which is never paged out (see Chapter 10 of those excellent Kernel Memory docs: slab is not part of any LRU). Apparently the NFS server needs something larger than 4kB blocks, and when it cannot get that, the kernel invokes the OOM killer.
It isn't completely clear to me what is using all the memory. The log says that about 1.6GB is unreclaimable slab. Let's break the zone DMA32 usage out:
free 140528kB
min 5704kB
low 7128kB
high 8556kB
active_anon 0kB
inactive_anon 136kB
active_file 260kB
inactive_file 1096kB
unevictable 0kB
isolated(anon) 0kB
isolated(file) 0kB
present 2052256kB
mlocked 0kB
dirty 128kB
writeback 0kB
mapped 76kB
shmem 0kB
slab_reclaimable 251148kB
slab_unreclaimable 1629904kB
kernel_stack 904kB
pagetables 2672kB
unstable 0kB
bounce 0kB
writeback_tmp 0kB
pages_scanned 98
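On a running system, the same reclaimable/unreclaimable split can be read straight from /proc/meminfo without waiting for an OOM report. A small sketch (values in kB):

```shell
# Print total, reclaimable and unreclaimable slab from /proc/meminfo.
# On the broken server, SUnreclaim would have shown the ~1.6GB seen above.
awk '/^(Slab|SReclaimable|SUnreclaim):/ { printf "%-14s %10d kB\n", $1, $2 }' /proc/meminfo
```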
Without fully understanding it, my guess is that slab fragmentation is what kills us here. NFS needs chunks larger than 4kB (see /proc/slabinfo).
The system can only reclaim a page of slab if every object in that page is unused. If in-use objects are scattered across the pages, that may easily not be the case.
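The failed allocation in the log above was order=1, i.e. two physically contiguous pages. How fragmented free memory is can be seen in /proc/buddyinfo, which counts free blocks per zone by order; an order-n block is 2^n pages of 4kB. A sketch that annotates the raw counts with their block sizes:

```shell
# /proc/buddyinfo lines look like "Node 0, zone DMA32 34452 300 4 0 ...";
# the counts start at field 5, and the block size at order n is 4kB * 2^n.
awk '{
    line = $1 " " $2 " " $3 " " $4
    for (i = 5; i <= NF; i++)
        line = line sprintf("  %dx%dkB", $i, 4 * 2 ^ (i - 5))
    print line
}' /proc/buddyinfo
```

A zone with plenty free at 4kB but next to nothing at 8kB and up is exactly the state in which an order-1 request can fail while "free" memory still looks healthy.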
This is bad enough if caused by the system caching inodes and dentries alone. To make things worse, our NFS version is also too old, so we are likely also suffering from bug 1047566.