ZIL (the ZFS Intent Log) is designed to log all system calls that modify the file system. The logged data can be used to replay those calls against the file system on disk in the event of a system crash or power outage. By design, ZFS is always consistent on disk, with every operation done as a transaction. ZIL works within this model to ensure that data is always consistent. Think of other journaled file systems like UFS and ext3.
This is all fine and dandy, except when you need to write synchronously to disk, such as a write to a file opened with O_DSYNC or, more importantly in our case, an NFS COMMIT. In both cases the filesystem needs to guarantee data integrity to the application or to NFS. It is generally known (though not necessarily accepted) that on a local file system a power outage or system crash can result in the loss of data that has not yet reached disk. Opening a file with O_DSYNC or calling fsync guarantees that data is committed and flushed to disk before the call returns.
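To make the synchronous case concrete, here is a minimal sketch of what such a write looks like from an application's point of view. The helper name write_durably is made up for illustration, and the O_DSYNC flag is looked up defensively because not every platform exposes it through Python.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data and return only after it has been flushed to stable storage.

    write_durably is a hypothetical helper name, not part of any library.
    """
    # O_DSYNC makes each write() synchronous; fall back to plain flags
    # on platforms where Python does not expose it.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_DSYNC", 0)
    fd = os.open(path, flags, 0o644)
    try:
        written = 0
        while written < len(data):
            written += os.write(fd, data[written:])
        # fsync() blocks until the file's data is on stable storage; it also
        # covers the case where O_DSYNC was not available above.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "small.dat")
        print(write_durably(target, b"\0" * 4096), "bytes written durably")
```

On ZFS, each of these blocking calls is what ends up waiting on ZIL. An NFS COMMIT puts the server in the same position on behalf of a remote client.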
As Roch of ZFS fame puts it:
"On the other hand, the nature of the NFS protocol is such that the client _must_ at some specific point request to the server to place previously sent data onto stable storage. This is done through an NFSv3 or NFSv4 COMMIT operation. The COMMIT operation is a contract between clients and servers that allows the client to forget about its previous historical interaction with the file. In the event of a server crash/reboot, the client is guaranteed that previously commited data will be returned by the server. Operations since the last COMMIT can be replayed after a server crash in a way that insures a coherent view between everybody involved.
But this all topples over if the COMMIT contract is not honored. If a local filesystem does not properly commit data when requested to do so, there is no more guarantee that the client's view of files will be what it would otherwise normally expect."
ZIL performance is critical for these synchronous operations. ZIL needs to commit the transactions to stable storage before the call can return. In the case of small files, this results in an excessive amount of blocking.
In my case I have users who are writing millions of small files (under 5KB). NFS over ZFS in this circumstance offers unacceptable performance.
Testing
To replicate the type of operations seen in my environment I created a tar file containing 4096 4KB files. The files were filled with random data and the archive was extracted to each filesystem. All tests were done with caches flushed and empty, and filesystems were unmounted and re-mounted after each run. Tests were done against a Sun X4540 with 48 500GB SATA HDs in a single zpool comprised of 8 raidz vdevs. Tests were also run against a NetApp FAS3160 with 126 144GB FC HDs (ok, maybe not a fair comparison, but the benchmark results speak for themselves). Tests were then run locally against single 72GB SATA (UFS) and 72GB SAS (ext3) HDs.
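For anyone who wants to reproduce something similar, a rough sketch of the test setup follows. It is an approximation and not the exact procedure used for the numbers below: the function names and the smallfiles/ path prefix are invented for illustration, and it does not flush caches or remount the filesystem between runs.

```python
import io
import os
import tarfile
import tempfile
import time

FILE_COUNT = 4096      # number of files in the archive, as in the test above
FILE_SIZE = 4 * 1024   # 4KB of random data per file


def build_archive(tar_path, count=FILE_COUNT, size=FILE_SIZE):
    """Create an uncompressed tar file holding `count` random files."""
    with tarfile.open(tar_path, "w") as tar:
        for i in range(count):
            payload = os.urandom(size)
            info = tarfile.TarInfo(name=f"smallfiles/file_{i:04d}.bin")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def time_extraction(tar_path, dest_dir):
    """Extract the archive into dest_dir and return elapsed seconds."""
    # Newer Python versions want an explicit extraction filter; older ones
    # do not accept the argument, so only pass it when it is supported.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    start = time.perf_counter()
    with tarfile.open(tar_path, "r") as tar:
        tar.extractall(dest_dir, **kwargs)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        archive = os.path.join(work, "smallfiles.tar")
        build_archive(archive)
        target = os.path.join(work, "extract")
        os.mkdir(target)
        print(f"extracted in {time_extraction(archive, target):.3f}s")
```

To test an NFS mount, point dest_dir at a directory on that mount instead of a temporary directory.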
Results
ZFS over NFS: 165.42s
NetApp WAFL over NFS: 6.006s
ext3: 0.318s
UFS: 2.703s
ZFS over NFS w/ZIL disabled: 4.987s
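The ratios between these timings are easy to work out. The short sketch below only does arithmetic on the figures listed above. The function names are made up for illustration.

```python
# Wall-clock seconds to extract 4096 4KB files, copied from the results above.
RESULTS = {
    "ZFS over NFS": 165.42,
    "NetApp WAFL over NFS": 6.006,
    "ext3": 0.318,
    "UFS": 2.703,
    "ZFS over NFS w/ZIL disabled": 4.987,
}
FILE_COUNT = 4096


def speedup_over(baseline, results=RESULTS):
    """Return how many times faster each run was than the baseline run."""
    base = results[baseline]
    return {name: base / secs for name, secs in results.items()}


def ms_per_file(results=RESULTS, count=FILE_COUNT):
    """Return the average time spent per file, in milliseconds."""
    return {name: secs / count * 1000.0 for name, secs in results.items()}


if __name__ == "__main__":
    speedups = speedup_over("ZFS over NFS")
    per_file = ms_per_file()
    for name in RESULTS:
        print(f"{name:30s} {speedups[name]:8.1f}x  {per_file[name]:7.2f} ms/file")
```

Put another way, ZFS over NFS with ZIL enabled spends about 40 ms on each 4KB file, against a little over 1 ms with ZIL disabled.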
Disabling ZIL
By disabling ZIL we can see a 33x performance improvement over NFS with small files! But what does disabling ZIL actually do? It ignores all synchronous operations and instead treats them as asynchronous. Much like exporting an NFS filesystem as async in Linux, ZFS will reply to NFS that the operation has been committed to disk even though it may not have been yet. This breaks every NFS guarantee that a file has been safely written.
ZIL Slogging
There is the ability to place ZIL on a separate log device (a "slog"), such as an SSD or NVRAM. Unfortunately this doesn't offer many options for those who can't accommodate additional disks in the form of SSDs. PCI Express NVRAM boards are also hard to come by, with uMem/VMETRO producing boards only for OEMs. The MM-5453CN looks like a good fit for a Thumper but it does not appear to be supported in Solaris (the old PCI-only MM-5435CN is). Not that it matters, I can't find one for sale anywhere.
Sun themselves use slogging on their 7000 series storage appliances to offload ZIL, which really raises the question of why they're not offering an accelerator board that one can use in a Thumper. It would make the X4500/X4540 an even better storage box.
Conclusion
While Solaris and ZFS adhere completely to the NFS protocol, the result is unacceptable performance in certain circumstances. The Thumper offers an incredible bang-for-buck storage server solution and could really benefit from an NVRAM card and slogging. I just wish I could find a card to install in the system. Other, costlier alternatives might be to use an FC HBA and a RAMDISK device. Chris Greer has tested the idea of using a ramdisk slog mirrored over iSCSI.
Disabling ZIL may or may not be a solution for your environment, so tread with care. As for me, I'll be leaving ZIL disabled until I come up with a better solution for my Thumpers, which only serve data over NFS.

