Home / How to debug and troubleshoot NFS in Linux

How to debug and troubleshoot NFS in Linux

There are a number of metrics and data that can be collected when an NFS problem is identified. This article shows some of these common steps that can find the root cause, by breaking NFS issues down into four major categories as follows and describing a troubleshooting process for each.

  • Reading or Writing data performance issues
  • Protocol Issues
  • Connectivity Issues
  • Kernel Crash

Reading or Writing data performance issues

Define a repeatable benchmark that runs for at least 15 minutes. Try using a subset of the actual NFS I/O workload as a benchmark. Capture the following statistics on the NFS client as follows.

nfsiostat

Run the command nfsiostat to monitor the performance of NFS reads and writes.

date > nfsiostat.txt; nfsiostat 5 NFS-MOUNT-POINT >> nfsiostat.txt

The output will show the ops/s, kB/s and latencies for reads/writes. It shows the data travelling via the network to the NFS Server and how long it takes. Lets say, NFS write performance is poor, it looks like this is due to a network or NFS server latency because avg RTT should be much lower.

write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
14.000 11410.775 815.055 0 (0.0%) 1107.043 80432.143

Where avg RTT and avg exe are;

"avg RTT": "the network + server latency of the request"
"avg exe": "the total time the request spent from init to release"

/proc/meminfo

When the NFS I/O involves WRITE operations, look at Dirty, Writeback and NFS Unstable values. You could capture something like this every 5 seconds to correspond to the nfsiostat capture times.

while true; do date >> nfs_meminfo.txt; grep -E "(Dirty|Writeback|NFS_Unstable):" /proc/meminfo >> nfs_meminfo.txt; sleep 5; done

Where;

Dirty: Memory which is waiting to get written back to the disk
Writeback: Memory which is actively being written back to the disk
NFS_Unstable: NFS pages sent to the server, but not yet committed to stable storage

The Dirty count will increase when there is data to be written to a device. When a threshold is met it will start writing the dirty pages to a device and Writeback will increase accordingly. If write I/O ceases, then Dirty count will keep dropping as Writeback occurs. Afterall the Dirty count can drop to zero while there is still some Writeback left to be completed.

iostat

Capture CPU and IO device usage;

iostat -cxt 5 >> nfs_util.txt

tcpdump

On the NFS client, capture a representative sample of traffic for approx. 5 minutes between the NFS client and NFS server.

tcpdump -n -C 250M -s 300 -i INTERFACE -w OUTPUTFILE host NFS-SERVER-IP

The above tcpdump command only collects the first 300 bytes of every packet (-s 300).

The above tcpdump command "rolls over" into separate 250Mb files (-C 250M). To save uploading every one of the "rolling" output files, the timespan which each file covers can be find out with the capinfos tool from the wireshark package.

$ capinfos -ae OUTPUTFILE.pcap.10
Start time: Wed Jul 16 17:25:21 2014
End time: Wed Jul 16 17:26:28 2014

mountstats

This command is useful in obtaining NFS mount options, buffered versus direct IO, RTT latency per RPC procedure, backlog latency per RPC procedure.

  • Copy the raw mountstats before.

cp /proc/self/mountstats mountstats_raw_before_$(date +%s).txt

  • Run the reproducer
  • Copy the raw mountstats after;

cp /proc/self/mountstats mountstats_raw_after_$(date +%s).txt

The mountstats command can provide you a summary of NFS activity between the two captures.

mountstats -f mountstats_raw_after_* -S mountstats_raw_before_*

SysRq-t

It is better to capture the stack traces of the kernel processes to identify if and where processes are getting blocked.

Then do the following a few times, perhaps 3 times in 5 second intervals.

echo t > /proc/sysrq-trigger

Install iperf3 from EPEL,

  • On the NFS server execute the following command;

iperf3 -i 10 -s

  • On the NFS client execute the following command;

iperf3 -i 10 -w 4M -t 60 -c NFS-SERVER-IP

Repeatable Benchmark

If there is no repeatable benchmark, dd can be used to artificially generate a load as follows;

Write

Remember to use conv=fsync for write benchmarks. e.g.

dd if=/dev/zero of=/mnt/nfs/file conv=fsync bs=1M count=1000

You can also use oflag=direct to compare cached I/O and direct I/O

Read

Make sure to drop caches or unmount and mount the filesystem to ensure data is being read from the NFS server and not the NFS client's cache. As an example;

echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/nfs/file of=/dev/null bs=1M count=1000

Protocol Issues

It is important to identify the time/date when an error occurs. That helps us focus on a smaller period within the data provided to us. Do not expect someone to look through hundreds of MBs of data if a time/date is not given.

If applicable, make sure you capture data both when the problem can be reproduced and when the problem can not be reproduced. This allows comparing the two sets of data to identify the differences.

Capture the following data on the NFS client.

tcpdump

When it is not a performance issue involving the transfer of a lot of Read or Write data, first capture a tcpdump.

tcpdump -n -C 250M -s 0 -i INTERFACE -w OUTPUTFILE host NFS-SERVER-IP
gzip OUTPUTFILE

strace

Capture and strace as follows;
strace -T -tt -f -v -s 4096 [-o OUTPUTFILE] <command>

If using CentOS 6.7 or 7.0 or later, add the -yy flags.

mount

If you are having difficulties mounting an NFS export on the NFS client. Then run mount with the verbose option.

mount -vvv [USUAL-OPTIONS] NFS-SERVER:/EXPORT NFS-MOUNT-POINT

NFS user space daemons (ps -e | grep -E 'rpc\.')

Debugging/verbosity can be enabled by editing /etc/sysconfig/nfs as follows;

RPCMOUNTDOPTS="-d all"
RPCIDMAPDARGS="-vvv"
RPCGSSDARGS="-vvv"
RPCSVCGSSDARGS="-vvv"

RPCMOUNTDOPTS and RPCSVCGSSDARGS are applicable only to the NFS Server. RPCGSSDARGS is applicable only to the NFS Client.

To get debug output between rpc.nfsd and kernel nfsd;

RPCNFSDARGS="-d"

Ports

  • Use rpcinfo -p to dump the list of RPC programs registered with the portmapper.
  • Use 'rpcinfo -t/-u' to actually test connectivity with an RPC program running on a remote host. For example to test connectivity with mountd via UDP (which is the default for the MOUNT protocol) running on 'server.computeman.com':

# rpcinfo -u server.computeman.com 100009 3
program 100009 version 3 ready and waiting

  • Inspect /etc/sysconfig/nfs to see if ports have been modified.
  • Ensure portmapper is running on Port 111. Including with NFSv4.

rpcdebug

The nfs/nfsd/sunrpc debugging output can be useful after looking at a tcpdump and having some idea what is going wrong.
Note: This debugging dumps a lot of data to syslog (/var/log/messages) and can slow down older systems. Therefore, requesting this data is used only if you are confident in what you are looking for.
  • Enable debug as follows;

rpcdebug -m nfs -s all
rpcdebug -m nfsd -s all
rpcdebug -m rpc -s all
rpcdebug -m nlm -s all/code>

  • Disable debug as follows;

rpcdebug -m nfs -c all
rpcdebug -m nfsd -c all
rpcdebug -m rpc -c all
rpcdebug -m nlm -c all

On an NFS server, the relevant modules are nfsd, rpc and nlm. On an NFS client, the relevant modules are nfs, rpc and nlm.

  • To get a list of modules and valid flags, run.

rpcdebug -vh

Connectivity Issues

Check for dropped packets

It is very important to first check if there are problems with the NIC and the NIC driver.

tcpdump

If the NFS Client is reporting the error, "nfs: server [...] not responding", then use the tcpdump-watch.sh script attached below.

Collect tcpdump on both the NFS client and NFS server while reproducing the problem.

On the NFS client run;

tcpdump -n -C 250M -s 300 -i INTERFACE -w OUTPUTFILE host NFS-SERVER-IP

On the NFS server run;

tcpdump -n -C 250M -s 300 -i INTERFACE -w OUTPUTFILE host NFS-CLIENT-IP

The above tcpdump command only collects the first 300 bytes of every packet (-s 300).

The exception to this is when troubleshooting either NFSv4 issues, or READDIR/READDIRPLUS issues, or issues with batched NFS commands. In those cases, capture the full packet length with the -s 0 option instead of -s 300.

The above tcpdump command "rolls over" into separate 250Mb files (-C 250M). To save uploading every one of the "rolling" output files, the timespan which each file covers can be determined with the capinfos tool from the wireshark package.

$ capinfos -ae OUTPUTFILE.pcap.10
Start time: Wed Jul 16 17:25:21 2015
End time: Wed Jul 16 17:26:28 2015

Map warnings/errors in syslog to packets in the tcpdump according to their timestamps.

SysRq-t

It is better to capture the stack traces of the kernel processes to identify if and where processes are getting blocked.
Then do the following a few times, perhaps 4 times in 6 second intervals:

echo t > /proc/sysrq-trigger

Kernel Crash

If the NFS Client or NFS Server Linux kernel has crashed / panicked / oopsed, then check the vmcore file. If a vmcore file was not generated, check the oops message from syslog.

Leave a Reply