Improving Directory Import Rate Through ZFS Caching

January 27, 2010

Importing data into the directory database is the first step in building a directory service. Importing is equally important when recovering from a directory disaster, such as an inadvertent corruption of the database due to a hardware failure or a buggy application; in that scenario, a nightly binary backup of the database or an archived LDIF file can save the day. Furthermore, if your directory has a large number of entries (tens of millions), the import process can be time consuming. It is therefore very important to fine-tune the import process in order to reduce initialization and recovery time.

Most import tuning recommendations have focused on the write capabilities of the disk subsystem, and that is undeniably the most important ingredient of the import process. However, the input to the import process is an LDIF file, which is used to initialize and (re)build the directory database, and as demonstrated by our recent performance testing effort, the location of that LDIF file is also very important. I'll concentrate mainly on ZFS in this post, as time and again it has proven to be an ideal filesystem for the Directory. Note that in some cases even the smallest gain in the import rate can save hours, especially if your LDIF file has tens of millions of entries.

Generally speaking, a few gotchas need to be kept in mind for the import process. The first is to ensure that you have separate partitions for your database, logs, and transaction logs (this is actually true for any filesystem); for ZFS this translates into separate pools. Similarly, it is recommended to place the LDIF file on a pool that is not used for any other purpose during the import. This maximizes the read I/O for that pool, since it does not have to be shared with any other process. In ZFS, the Adaptive Replacement Cache (ARC) plays an important role in the import process, as seen in the table below. The ZFS caches can be controlled via the primarycache and secondarycache properties, which are set with the zfs set command. This excellent blog explains these caches in detail. To understand and prove the effectiveness of these caches, we ran a series of imports on a SunFire X4150 system with LDIF files of 3 million and 10 million entries. The LDIF files were generated from the telco.template via make-ldif. Details about the hardware, OS, and ZFS configuration, along with other useful commands, are listed in the Appendix.
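As a sketch, the pool layout and cache settings described above can be set up along these lines (the device names here are hypothetical placeholders; substitute your own):

```shell
# Separate pools for the database and for the LDIF file (hypothetical devices)
zpool create db c0t1d0 c0t2d0 c0t3d0     # directory database, striped
zpool create ldifpool c0t7d0             # dedicated to the LDIF file

# Control what ZFS caches for the LDIF pool
zfs set primarycache=all ldifpool        # cache data and metadata in the ARC
zfs set secondarycache=all ldifpool      # allow spill-over to the L2ARC
```

The primarycache/secondarycache values used in the test matrix below (all, metadata, none) are set the same way.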

Dataset      primarycache   secondarycache   Time taken (sec)   Import rate (entries/sec)
3 Million    all            all                887              3382.19
3 Million    metadata       metadata          1144              2622.38
3 Million    metadata       none              1140              2631.58
3 Million    none           none              1877              1598.30
3 Million    all            none               909              3300.33
10 Million   all            all               3026              3304.69
10 Million   metadata       metadata          3724              2685.29
10 Million   metadata       none              3710              2695.42
10 Million   none           none              7945              1258.65
10 Million   all            none              3016              3315.65

The table shows the results of various combinations of primarycache and secondarycache on the ldifpool only. The db pool, where the directory database is created, always had both primarycache and secondarycache set to all. The astute reader will notice from the Appendix that the ZFS Intent Log (ZIL) is actually configured on flash memory. This did not skew our results, since we are concerned with the ldifpool (where the LDIF file resides).

So, going back to the table: as expected, the primarycache (the ARC, in DRAM) is the key catalyst in read performance. Disabling it causes a catastrophic drop in the import rate, primarily because prefetching is also disabled and many more reads have to go directly to disk. The charts below (data obtained via iostat -xc) depict this very clearly: the disks are much busier reading when primarycache is set to none for the 3 million entry import.
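For reference, disk activity during an import can be captured with iostat in the background, along these lines (Solaris syntax; the log file name is arbitrary):

```shell
# Sample extended device and CPU statistics every 10 seconds during the import
iostat -xc 10 > import-iostat.log &
IOSTAT_PID=$!

# ... run the import here ...

kill $IOSTAT_PID
```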

So far, I have concentrated on the primarycache (ARC). What about the secondarycache (L2ARC)? Typically the secondarycache is utilized optimally when backed by a flash memory device, and we did have one (a Sun Flash F20) added to the ldifpool. However, our reads were sequential, and by design the L2ARC does not cache sequential data. So for this particular use case the secondarycache did not come into play, as is evident from the results in the table. Had we limited the ARC to just 1 GB or less, the prefetches might have "spilled" over into the L2ARC, and it would have contributed more.
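On Solaris, one way to check whether the ARC and L2ARC are actually being exercised during an import is to snapshot the arcstats kstats before and after (a sketch; the egrep pattern just narrows the output to the interesting counters):

```shell
# Snapshot ARC/L2ARC counters; compare before and after the import
kstat -m zfs -n arcstats | egrep 'size|hits|misses'
```

Little or no movement in the l2_hits/l2_size counters is consistent with the L2ARC sitting out sequential reads, as seen in our tests.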

Finally, a disclaimer: since the intent of this exercise is to show the effect of the ZFS caches, the import rates in the table are for comparison only and are not a benchmark. I would also like to thank my colleagues who helped me with this post: Brad Diggs, Pedro Vazquez, Ludovic Poitou, Arnaud Lacour, Mark Craig, Fabio Pistolesi, Nick Wooler and Jerome Arnou.


	zm1 # uname -a
	SunOS zm1 5.10 Generic_141445-09 i86pc i386 i86pc

	zm1 # cat /etc/release
	                       Solaris 10 10/09 s10x_u8wos_08a X86
	           Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
	                        Use is subject to license terms.
	                           Assembled 16 September 2009

	zm1 # cat /etc/system | grep -i zfs
	* Limit ZFS ARC to 6 GB
	set zfs:zfs_arc_max = 0x180000000
	set zfs:zfs_mdcomp_disable = 1
	set zfs:zfs_nocacheflush = 1

	zm1 # zfs set primarycache=all ldifpool
	zm1 # zfs set secondarycache=all ldifpool

	zm1 # echo "::memstat" | mdb -k
	Page Summary                Pages                MB  %Tot
	------------     ----------------  ----------------  ----
	Kernel                     189405               739    2%
	ZFS File Data               52657               205    1%
	Anon                       184176               719    2%
	Exec and libs                4624                18    0%
	Page cache                   7575                29    0%
	Free (cachelist)             3068                11    0%
	Free (freelist)           7944877             31034   95%

	Total                     8386382             32759
	Physical                  8177488             31943

NOTE: The system had three ZFS pools. The "db" pool stored the directory database and was striped across six SATA disks, with the ZIL on flash memory. The "ldifpool" pool was where the LDIF file, transaction logs, and access logs were located. The import process does not use the transaction and access logs, so that pool was effectively dedicated to the LDIF file.

	zm1 # zfs get all ldifpool | grep cache
	ldifpool  primarycache          none                   local
	ldifpool  secondarycache        none                   local

	zm1 # zpool list
	db       816G  2.25G  814G     0%   ONLINE  -
	ldifpool 136G  93.0G  43.0G    68%  ONLINE  -
	rpool    136G  75.6G  60.4G    55%  ONLINE  -

	zm1 # zpool status -v
	  pool: db
	 state: ONLINE
	 scrub: none requested

	        NAME        STATE     READ WRITE CKSUM
	        db          ONLINE       0     0     0
	          c0t1d0    ONLINE       0     0     0
	          c0t2d0    ONLINE       0     0     0
	          c0t3d0    ONLINE       0     0     0
	          c0t4d0    ONLINE       0     0     0
	          c0t5d0    ONLINE       0     0     0
	          c0t6d0    ONLINE       0     0     0
	          c2t0d0    ONLINE       0     0     0

	errors: No known data errors

	  pool: ldifpool
	 state: ONLINE
	 scrub: none requested

	        NAME        STATE     READ WRITE CKSUM
	        ldifpool    ONLINE       0     0     0
	          c0t7d0    ONLINE       0     0     0
	          c2t3d0    ONLINE       0     0     0

	errors: No known data errors

	  pool: rpool
	 state: ONLINE
	 scrub: none requested

	        NAME        STATE     READ WRITE CKSUM
	        rpool       ONLINE       0     0     0
	          c0t0d0s0  ONLINE       0     0     0

	errors: No known data errors

	ds@dsee1$ du -h telco_*
	  48G   telco_10M.ldif
	  14G   telco_3M.ldif

	ds@dsee1$ grep cache dse.ldif | grep size
	nsslapd-dn-cachememsize: 104857600
	nsslapd-dbcachesize: 104857600
	nsslapd-import-cachesize: 2147483648
	nsslapd-cachesize: -1
	nsslapd-cachememsize: 1073741824


Shrinking Windows Disk: A huge challenge!

September 8, 2009

Usually I don't blog about Windows issues since I don't use it. However, I recently bought a laptop for my father. It had Windows Vista Ultimate Home on it. The hard disk is 250 GB, so I wanted to split it into smaller partitions: a "D" drive for data, and possibly room to slap Ubuntu on it too. The problem was that there was one cluster of data right at the end of the partition that could not be moved by defrag or any of the commercially available defragmenters out there.

So first I did the usual tricks: set the page file to zero, disabled System Restore, disabled crash dumps, disabled hibernation, and deleted all the related files.

Then I ran defrag. Still no joy. Then I ran defrag.exe from the command line with the -w switch. Still no joy.

Then I downloaded a commercial utility called O&O Defrag. This utility still did not move the file(s), but it did help identify the name, which was "$Extend/$UsnJrnl…".

Further research revealed that this journal file was actually being used by the Windows indexing service, so naturally I disabled the indexing service. This did release/delete some of the journal files, but a small cluster of them still remained, and I could not figure out what application was using them.

Then I attempted to use the "fsutil usn deletejournal /D C:" command from an Administrator command prompt. I would always get "Access Denied".

So I downloaded PEbuilder and created a Windows XP SP3 BartPE disc. I booted from the disc and ran "fsutil usn deletejournal /D C:" again. This time the command worked, since the journal was not open in any process.

I rebooted and ran a free defrag utility called Auslogics Disk Defrag. Everything was now consolidated to my liking, and I was able to resize the partition to my heart's content!

OpenSSO 8 & SAML v2 AttributeStatement

September 7, 2009

A very useful and essential feature of OpenSSO is attribute mapping. This enables you to send additional attributes in the SAMLv2 assertion/response to the Service Provider. Once the attribute mapping is defined (it can be done either from the GUI, under the entity's "Assertion Processing" tab, or in the metadata itself), the map is sent as a name-value pair to the Service Provider. Also keep in mind that the mapping can, and should, be defined on the remote service provider, so that if your hosted IdP is shared amongst multiple SPs, each can have its own mapping. For example, here the map was defined from the GUI as USERID=employeeNumber for one of the remote SPs.
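When defined in the metadata instead of the GUI, the same mapping appears roughly like this in the SP's extended metadata (a sketch; the surrounding SPSSOConfig and entity configuration are omitted, and the attribute names are from the example above):

```xml
<Attribute name="attributeMap">
    <Value>USERID=employeeNumber</Value>
</Attribute>
```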

<saml:AttributeStatement>
  <saml:Attribute Name="USERID">
    <saml:AttributeValue xmlns:xs="http://www.w3.org/2001/XMLSchema"
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                         xsi:type="xs:string">121898</saml:AttributeValue>
    <saml:AttributeValue xmlns:xs="http://www.w3.org/2001/XMLSchema"
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                         xsi:type="xs:string">007</saml:AttributeValue>
  </saml:Attribute>
</saml:AttributeStatement>

Once the Service Provider receives the assertion and has been configured to look for the attribute name USERID, it will grab the value and do whatever it needs to with it. One such real-life example is CRM. In OpenSSO 8 Express Build 8, there is a wizard to support easy configuration of federation, which results in a map definition automatically.

One problem that I ran into (not related to the product, phew…) was that however many maps I defined, I could not see them in the assertion. As a matter of fact, I could not even see the <saml:AttributeStatement> tag. It turns out that I had earlier changed the Authentication->Core setting from Profile=required to Profile=ignored. Reverting to Profile=required fixed the issue, and the assertion started to pass the attributes.