Friday, April 24, 2009

ZFSing around (or, using ZFS under Solaris)

ZFS is a different kind of file system than the old UFS. The best way to describe it is to think of it as a file system with LVM built into it. If you know LVM, you will understand what I am trying to say and recognize a lot of things here. If not, I promise that soon I will try to talk about it and compare ZFS, Linux LVM, and AIX LVM. For now, accept my simplistic explanation:

Logical Volume Manager, or LVM, is a way to separate the physical storage (think hard drives, network drives, and such things) from the storage seen by the user. In other words, what the user sees is a pseudo disk of sorts (or logical volume, to use its terminology) which can be made of a collection of other disks, RAID arrays, or just partitions of those disks. Now, while Linux LVM provides a logical disk which is then formatted and partitioned as if it were a normal disk, in ZFS these two steps become just one.

I think this will make sense with an example: in this Solaris 10 machine, we have two 73GB SCA hard drives. One of them, c0t0d0 (yes, Sun likes to name its drives differently than everybody else. We will talk about that in some other episode. For now, it suffices to say that c0t0d0 means disk 0 behind SCSI target ID 0 on controller 0), was formatted as ZFS during the installation of the operating system and currently looks like this:

# df -h
Filesystem             size   used  avail capacity  Mounted on
boot/ROOT/root          67G   6.8G    40G    15%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   1.9G   1.4M   1.9G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
fd                       0K     0K     0K     0%    /dev/fd
boot/ROOT/root/var      67G   727M    40G     2%    /var
swap                   1.9G    32K   1.9G     1%    /tmp
swap                   1.9G    40K   1.9G     1%    /var/run
boot                    67G   176K    40G     1%    /boot
boot/export             67G    20K    40G     1%    /export
boot/export/home        67G    15G    40G    28%    /export/home
# 

Then we have c0t1d0, also a 73GB drive, which we plan to add to the machine's storage. For this machine, I would like to have the following layout:

/
/var
/tmp
/export/home   (accounts)
/export/hosts  (virtual hosts, in case we have them)
/export/images (as this may end up being a netboot/jumpstart server)

Each of those would reside in a separate partition. Some of that has already been taken care of by Solaris' default install, but we are left with /export/hosts and /export/images, which will run off c0t1d0. Before ZFS, we would divide the hard drive into two partitions using format and would edit /etc/vfstab to mount the two partitions at boot time:

# cat /etc/vfstab                      
#device         device          mount           FS      fsck    mount   mount
#to mount       to fsck         point           type    pass    at boot options
#
fd      -       /dev/fd fd      -       no      -
/proc   -       /proc   proc    -       no      -
/dev/zvol/dsk/boot/swap -       -       swap    -       no      -
/devices        -       /devices        devfs   -       no      -
sharefs -       /etc/dfs/sharetab       sharefs -       no      -
ctfs    -       /system/contract        ctfs    -       no      -
objfs   -       /system/object  objfs   -       no      -
swap    -       /tmp    tmpfs   -       yes     -
/dev/dsk/c0t1d0s4     /dev/rdsk/c0t1d0s4      /export/hosts        ufs     2       yes     -
/dev/dsk/c0t1d0s6     /dev/rdsk/c0t1d0s6      /export/images       ufs     2       yes     -
# 
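
For the record, once format had been used to carve the two slices, the rest of the old UFS routine would go roughly like this (just a sketch, assuming the s4 and s6 slices from the vfstab above already exist):

# newfs /dev/rdsk/c0t1d0s4
# newfs /dev/rdsk/c0t1d0s6
# mkdir -p /export/hosts /export/images
# mount /dev/dsk/c0t1d0s4 /export/hosts
# mount /dev/dsk/c0t1d0s6 /export/images
# 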

But in ZFS things are a bit different. First we need to create the pseudo disk mentioned above. In ZFS terminology that is called a pool; I guess they want to remind us that you usually create it by adding disks to it. So we create the pool, which shall be named storagepool. By now you may have realized that boot is the name of the pool created during the installation of the operating system.

# zpool create  storagepool c0t1d0
# df -h
Filesystem             size   used  avail capacity  Mounted on
boot/ROOT/root          67G   6.8G    40G    15%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   1.9G   1.4M   1.9G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
fd                       0K     0K     0K     0%    /dev/fd
boot/ROOT/root/var      67G   727M    40G     2%    /var
swap                   1.9G    32K   1.9G     1%    /tmp
swap                   1.9G    40K   1.9G     1%    /var/run
boot                    67G   176K    40G     1%    /boot
boot/export             67G    20K    40G     1%    /export
boot/export/home        67G    15G    40G    28%    /export/home
storagepool             67G    19K    67G     1%    /storagepool
#

You may have noticed that the pool we created, storagepool, is also a filesystem: as soon as we created the pool, it became a filesystem mounted under /storagepool. That is different from what is done under Linux LVM, where you would first create the pool, then partition and format that pool into the volumes you are going to use, and then mount them using /etc/fstab (the Linux equivalent of /etc/vfstab) and mount -a. In Solaris and ZFS, on the other hand, that is done with just one command. I mean, we did not even have to edit /etc/vfstab. The list of currently defined ZFS mount points can be seen with zfs list:

# zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
boot                26.5G  40.5G   176K  /boot
boot/ROOT           7.46G  40.5G    18K  legacy
boot/ROOT/root      7.46G  40.5G  6.75G  /
boot/ROOT/root/var   727M  40.5G   727M  /var
boot/dump           2.00G  40.5G  2.00G  -
boot/export         15.0G  40.5G    20K  /export
boot/export/home    15.0G  40.5G  15.0G  /export/home
boot/swap              2G  42.5G    16K  -
storagepool         89.5K  66.9G     1K  /storagepool
# 
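
Just for comparison, getting to the same place under Linux LVM takes several steps (a rough sketch; the device and volume names /dev/sdb, vg0, and hosts are made up for the sake of the example):

# pvcreate /dev/sdb
# vgcreate vg0 /dev/sdb
# lvcreate -n hosts -L 60G vg0
# mkfs.ext3 /dev/vg0/hosts
# mount /dev/vg0/hosts /export/hosts
# 

and even then you would still have to add a line to /etc/fstab so it comes back after a reboot. ZFS rolls all of that into the pool and filesystem commands we are using here.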

If you remember, we will be carving /export/hosts out of storagepool. That is done as follows:

# zfs create storagepool/hosts
# df -h
Filesystem             size   used  avail capacity  Mounted on
boot/ROOT/root          67G   6.8G    40G    15%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   1.9G   1.4M   1.9G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
fd                       0K     0K     0K     0%    /dev/fd
boot/ROOT/root/var      67G   727M    40G     2%    /var
swap                   1.9G    32K   1.9G     1%    /tmp
swap                   1.9G    40K   1.9G     1%    /var/run
boot                    67G   176K    40G     1%    /boot
boot/export             67G    20K    40G     1%    /export
boot/export/home        67G    15G    40G    28%    /export/home
storagepool             67G    19K    67G     1%    /storagepool
storagepool/hosts       67G    18K    67G     1%    /storagepool/hosts
#

If we just leave it at that, a mount point, /storagepool/hosts, is created. But we really do not want that. We want to mount storagepool/hosts on /export/hosts, and mount it we shall.

# zfs set mountpoint=/export/hosts storagepool/hosts
# 

Now it seems to be mounted where we want it:

# df -h
Filesystem             size   used  avail capacity  Mounted on
boot/ROOT/root          67G   6.8G    40G    15%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   1.9G   1.4M   1.9G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
fd                       0K     0K     0K     0%    /dev/fd
boot/ROOT/root/var      67G   727M    40G     2%    /var
swap                   1.9G    32K   1.9G     1%    /tmp
swap                   1.9G    40K   1.9G     1%    /var/run
boot                    67G   176K    40G     1%    /boot
boot/export             67G    20K    40G     1%    /export
boot/export/home        67G    15G    40G    28%    /export/home
storagepool             67G    19K    67G     1%    /storagepool
storagepool/hosts       67G    18K    67G     1%    /export/hosts
#
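
The /export/images filesystem from our wish list follows the exact same two steps, so I will just sketch them here instead of pasting yet another df:

# zfs create storagepool/images
# zfs set mountpoint=/export/images storagepool/images
# 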

Neat, huh? There is more we can do, like establishing quotas and such. But that will be for another episode.
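
Just as a teaser, a quota is nothing more than another property you set on a filesystem (the 10GB figure below is arbitrary):

# zfs set quota=10G storagepool/hosts
# 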

Some parting thoughts

Remember that Sun calls boot and storagepool pools. Even though we've only used one disk per pool, we could have used more. For instance, we could have grouped the disks together as in Linux LVM,

# zpool create  anotherpool c0t0d0 c0t1d0
# 

or we could have made a pool where one of the disks mirrors the other, as in a RAID 1,

# zpool create  anotherpool mirror c0t0d0 c0t1d0
# 

Another option, instead of doing a mirror, is to create a RAID 5 of sorts (Sun claims raidz is a variation of RAID 5: bigger, faster, better) using the raidz option and at least 3 drives (the minimum for such a RAID),

# zpool create  anotherpool raidz c0t0d0 c0t1d0 c0t2d0
# 

We could also create hot spares for those arrays; check the zpool man page for more info on that. I hope all this makes you think about what ZFS can do and why I find it rather neat.
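
One last sketch before I go: a spare can be named right when the pool is created, or added to an existing pool later:

# zpool create  anotherpool mirror c0t0d0 c0t1d0 spare c0t2d0
# zpool add  anotherpool spare c0t3d0
# 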
