Troubleshooting

If pam_krb5_migrate outputs Unknown code krb5 156 creating principal "joeuser@DOMAIN.COM", the cause is a kadmind that is not responding.

nscd can cause files to show up as owned by nobody. In one case, this was resolved by restarting nscd. (Thanks, Stefan.)

The clientaddr parameter of mount.nfs is important. For a while, I didn't specify it in /etc/fstab. Most clients detected their address automatically, while others used 0.0.0.0, at first without visible consequences.

Then one of the NFS servers, which does not use Kerberos, suddenly saw its load jump to 50 when 50 clients fetched a file at the same time, prompted by a cron job. The load was caused by processes waiting for I/O, not by a shortage of CPU, and although the file to be fetched was only 3 kB or so, the processes kept waiting for minutes. On some clients, no files could be read from the share in question, although directory listings and stat info could still be obtained.

It turned out that all affected clients were using clientaddr=0.0.0.0. Once they used their proper address, the problem went away.
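
Affected clients can be spotted by looking for the bad value among the mounted filesystems. The following is a minimal sketch, assuming a Linux client on which /proc/mounts lists the clientaddr option of NFS mounts; the function name find_bad_clientaddr is made up for illustration.

```shell
# Print device and mount point of NFS mounts whose clientaddr is 0.0.0.0.
# Reads /proc/mounts by default; another file can be passed for testing.
find_bad_clientaddr() {
    awk '$3 ~ /^nfs/ && $4 ~ /(^|,)clientaddr=0\.0\.0\.0(,|$)/ { print $1, $2 }' \
        "${1:-/proc/mounts}"
}
```

Running find_bad_clientaddr without arguments on a client prints nothing if all NFS mounts use a real address.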

If the server runs short of nfsd threads, their number can be increased at run time like this:


apprentice@nfs-server:~$ sudo rpc.nfsd 16
    

The setting is made persistent in /etc/default/nfs-kernel-server:

# Number of servers to start up
RPCNFSDCOUNT=8
<snip>
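
The same change can be made from the shell. This is only a sketch, assuming GNU sed (for the -i option) and the Debian-style file shown above; set_nfsd_count is a hypothetical helper name, not part of any NFS package.

```shell
# Set RPCNFSDCOUNT in a Debian-style defaults file (hypothetical helper).
# Usage: set_nfsd_count COUNT [FILE]
set_nfsd_count() {
    count=$1
    file=${2:-/etc/default/nfs-kernel-server}
    # Refuse anything that is not a plain sequence of digits.
    case $count in
        ''|*[!0-9]*) echo "count must be a number" >&2; return 1 ;;
    esac
    # Rewrite the existing RPCNFSDCOUNT line in place (GNU sed assumed).
    sed -i "s/^RPCNFSDCOUNT=.*/RPCNFSDCOUNT=$count/" "$file"
}
```

From a root shell, set_nfsd_count 16 would match the rpc.nfsd call above; the new value is picked up the next time the NFS server is started.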
	  

Whether this is necessary can be judged from the th line of /proc/net/rpc/nfsd:


apprentice@nfs-server:~$ grep ^th /proc/net/rpc/nfsd
th 32 340905478 475057.300 442289.504 184302.020 1.312 95448.776 60302.696 43222.208 42993.392 0.000 224481.668

The last ten numbers form a histogram: each gives the number of seconds during which a given percentage of the threads was in use, from 0-10% up to 90-100%. In this case, 224481.668 seconds were spent at 90-100% full, so increasing the number of threads seems justified. (About twice as much time was spent in each of the 0-10% and 10-20% buckets, but it is the peak load per thread that we want to reduce.)
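
The share of time spent in the top bucket can be computed instead of read off by eye. The sketch below assumes the th line has the layout shown above, with the thread count as the first number and the ten histogram buckets as the last ten; nfsd_thread_usage is a made-up name.

```shell
# Summarise the "th" line of /proc/net/rpc/nfsd.
# Assumed layout: th <threads> <counter> <ten histogram buckets, in seconds>
nfsd_thread_usage() {
    awk '$1 == "th" {
        total = 0
        # Fields 4..NF are the ten histogram buckets.
        for (i = 4; i <= NF; i++) total += $i
        printf "threads=%d\n", $2
        # Skip the percentage when the histogram is empty.
        if (total > 0)
            printf "top_bucket_share=%.1f%%\n", 100 * $NF / total
    }' "${1:-/proc/net/rpc/nfsd}"
}
```

For the line shown above, this reports 32 threads and a top bucket share of about 14%, which is the fraction of the recorded time during which nearly all threads were busy.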