The Last Word in File Systems
When developers make a claim like this, it takes serious proof to back it up. I’ve been interested in ZFS ever since the first rumors about it appearing in Mac OS X Leopard, where it was reportedly included, then pulled, then included, then pulled for a Sun developer stealing Steve’s thunder from the keynote, but actually pulled because it wasn’t ready.
Now that Apple is officially claiming ZFS read/write support in Snow Leopard, I decided it was time to test it myself. While source code and binaries are available for the testing code in OS X, I wanted something I could test without going over USB, and something I could use for an extended time to provide some practical benchmarks. With a new spare hard drive in my Linux server, I looked into the ZFS-FUSE project.
Right now there’s no native support for ZFS in the Linux kernel, and while Sun is “looking into it,” there may never be support due to fundamental incompatibilities between the CDDL and the GPL. Someone could conceivably work on a native Linux port, but until the licensing problems are resolved it will never be included in the mainline kernel. So for now, the only working option is to go through the Filesystem in Userspace (or FUSE) project, which adds more CPU and RAM overhead. I have no other ZFS usage to compare it against, but as I’ve been using it, I can’t help but think it should be a little faster.
ZFS is a new approach to filesystems. It seeks to bring just about every important feature together and become the end-all filesystem. It offers end-to-end data integrity (no fscking, ever), built-in RAID and RAID-like features, the arbitrary creation and destruction of volumes, arbitrary addition and removal of disks, and a mathematical data allocation limit so unfathomably huge that it exceeds the quantum limits of earth-based storage: populating it would take more energy than could be gained from boiling the oceans just to create the bits on which the data would be stored. It very well could be the last word in filesystems.
For working with ZFS volumes, there are only two commands to learn: ‘zpool’ and ‘zfs’.
To understand what’s happening, it’s important to first learn how ZFS approaches the idea of a disk, a volume, and a filesystem. It’s simple, but it may take a little unlearning of what you already think about storage.
First there is the physical disk itself, as it is represented in your operating system, along with its partitions, if any exist.
Linux
/dev/sda First disk
/dev/sda1 First disk, first partition
Mac OS X
/dev/disk0 First disk
/dev/disk0s1 First disk, first partition
Solaris
/dev/dsk/c0t0d0 First disk
/dev/dsk/c0t0d0s0 First disk, first slice (the Solaris term for a partition)
Typically all you would think about is the partition and the mount point, but ZFS adds another layer to the stack: pools, which are made up of any number or combination of whole disks or partitions, in any RAID configuration you like. And from the pools come your mountable filesystems. Since it’s a popular concept now, think of the pool like a cloud. You can put anything (in terms of disks) into it. And you can get anything (in terms of filesystems) out of it.
Here’s how a typical, one disk, one partition ZFS filesystem is created. From here on, I’m using device names as they are in Linux. Italicized text is my input.
zpool create pool0 sda
That’s it. No other command is necessary to start using your filesystem.
Note the absence of a partition designation. That’s because ZFS understands how to use partitions as well as whole disks, and it doesn’t care how they go into the pool. “pool0” is just the name of the pool. It could be “tank” or “array” or “box” or any other thing that helps you remember this is both a container of your disks, and a grab-bag for your filesystems. It’s a virtual entity that connects your disks (which could be any number of anything) to your filesystems and their mountpoints (which could be any number of anything). It’s sort of like a logical volume in RAID, but much, much more.
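If you would rather not hand a real disk to ZFS while experimenting, a plain file can stand in for one. The sketch below assumes your ZFS build accepts file-backed devices given as absolute paths; the pool name “scratch”, the image path, and the two filesystem names are made up for this example. It uses <code>zfs create</code> to carve filesystems out of the pool. The script only prints the commands unless you set the (hypothetical) <code>ZFS_APPLY=1</code> switch and run it as root.

```shell
# Dry-run helper: print each command instead of running it,
# unless ZFS_APPLY=1 is set in the environment.
run() {
    if [ "${ZFS_APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Hypothetical names, chosen for this example only.
POOL=scratch
IMG=/tmp/zfs-scratch.img

make_scratch_pool() {
    run truncate -s 256M "$IMG"       # sparse 256 MB file standing in for a disk
    run zpool create "$POOL" "$IMG"   # file-backed devices need an absolute path
    run zfs create "$POOL/docs"       # first filesystem carved out of the pool
    run zfs create "$POOL/media"      # second one, drawing on the same free space
    run zfs list -r "$POOL"           # list the pool and its filesystems
}

make_scratch_pool
```

When you are done playing, <code>zpool destroy scratch</code> followed by deleting the image file should clean everything up.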
By default, ZFS creates a mount point in / named after the pool. To change this mount point from <code>/pool0</code> to something else, you can do:
<code>
zfs set mountpoint=<i>/mnt/zfs</i> pool0
</code>
Mount points are created automatically if they do not exist. There are a bunch of other properties you can change, including built-in compression.
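Here is a rough sketch of changing a couple of properties and reading them back. It assumes the pool is named “pool0” as above and that you want it mounted at /mnt/zfs. The <code>run</code> helper and its <code>ZFS_APPLY</code> switch are my own scaffolding so the script prints the commands by default rather than touching a live pool.

```shell
# Dry-run helper: print each command instead of running it,
# unless ZFS_APPLY=1 is set in the environment.
run() {
    if [ "${ZFS_APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

FS=pool0   # assumed filesystem name, matching the pool created earlier

tune_filesystem() {
    run zfs set mountpoint=/mnt/zfs "$FS"   # property=value comes first, then the target
    run zfs set compression=on "$FS"        # compress data written from now on
    run zfs get mountpoint,compression,compressratio "$FS"   # read the settings back
}

tune_filesystem
```

Compression only applies to data written after the property is set, so <code>compressratio</code> will not move for files that were already on the filesystem.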
<code>zfs get all <i>[filesystem]</i></code>
See a list of all properties on a ZFS filesystem.
<code>zpool status</code>
See the status of your pools, whether they are single drives, striped RAID arrays, or mirrors, and what disks comprise each pool.
<code>zfs list</code>
See total, free, and used space for each ZFS filesystem.
<code>zpool iostat</code>
See read/write statistics for your pools. Add an interval in seconds (for example, <code>zpool iostat 5</code>) to get a continuously updating view.
<code>zpool scrub <i>[pool]</i></code>
Force an integrity check of the entire pool.
<code>ztest</code>
Simulate all sorts of crashes and problems and see how ZFS copes with them.
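The monitoring commands above can be strung together into a quick routine check. This is only a sketch: the function name <code>check_pool</code>, the <code>run</code> helper, and the <code>ZFS_APPLY</code> switch are invented for the example, and it prints the commands rather than running them unless you opt in.

```shell
# Dry-run helper: print each command instead of running it,
# unless ZFS_APPLY=1 is set in the environment.
run() {
    if [ "${ZFS_APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Run the usual health checks against one pool, then start a scrub.
check_pool() {
    pool=$1
    run zpool status -x "$pool"    # -x: only report the pool if something is wrong
    run zfs list -r "$pool"        # space used and available per filesystem
    run zpool iostat "$pool" 5 3   # three I/O samples, five seconds apart
    run zpool scrub "$pool"        # kick off a full integrity check
}

check_pool pool0
```

A scrub runs in the background, so a later <code>zpool status</code> is the way to see how far along it is.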
In the next part, I’ll go through some of the more complex tasks such as creating a RAID-Z, mirroring, and importing/exporting volumes.
