AIX Flash Cache Statistics

Flash Cache

Flash Cache (also known as server-side caching) has been around for some time. The bigger challenge has been obtaining and interpreting its performance statistics, because some of the output is not documented in the man pages. In fact, the best documentation on the fields is in the DS8800 Easy Tier documentation.

Flash cache allows an LPAR to use SSDs or flash storage as a read-only cache to improve read performance for spinning disks. In order to benefit from flash cache the workload must be a primarily read workload that re-reads data once it is cached. AIX will decide which data should be cached based on access patterns.

Once the system is up with flash cache installed and configured, it is time to start the cache. This is done as follows:

cache_mgt cache start -t all

The command above will start caching for all the hdisks identified as sources for the cmpool. I have 8 x 700GB SSDs in my pool and 88 SAN disks (20TB) as my sources, and it takes quite a while before they are all started. When they are all active you will see the following message:

All caches have been started.

Alternatively, you can start the disks one at a time by typing in the following, replacing ?? with each hdisk name:

cache_mgt cache start -t hdisk??
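With many source disks, starting them one at a time is easier with a loop. The following is a minimal sketch, not a tested production script: the function name start_caches and the RUN switch are hypothetical, and the disk names are simply the ones that appear in the error log entries in this article. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: start caching for one source disk at a time.
# Dry run by default; set RUN=1 on the AIX LPAR to execute for real.
# start_caches and RUN are illustrative names, not part of cache_mgt.
start_caches() {
    for disk in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            cache_mgt cache start -t "$disk" || echo "start failed for $disk" >&2
        else
            echo "cache_mgt cache start -t $disk"
        fi
    done
}

# Example disk names taken from the error log entries in this article.
start_caches hdisk12 hdisk13 hdisk14
```

Running it once in dry-run mode lets you check the disk list before committing to a start that may take a long time.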

When the cache is started you will see error log entries similar to the following:

C459CBDD   0528191417 I O hdisk12        AIX DISKDD RD/WR/STRAT SWITCHED TO ETCDD

D5BC7A29   0528191417 I O hdisk12        SAN DISK RD_CACHE IS ENABLED
D5BC7A29   0528191417 I O hdisk13        SAN DISK RD_CACHE IS ENABLED
D5BC7A29   0528191417 I O hdisk14        SAN DISK RD_CACHE IS ENABLED

Finally, you should see a message that the cache has warmed up.

61EC73EF   0529130617 I O ETCACHE        THE CACHE IS NOW WARM

As you can see from the timestamps (MMDDhhmmyy), in this case it took almost 18 hours for the cache to warm up. May 28 was a Sunday, so the real workload did not start running until Monday and there was nothing to warm up the cache until then. But even on a regular day, it can take several hours for the cache to warm up in such a large environment.
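The warm-up time can be worked out from the two error log timestamps, which are in MMDDhhmmyy format. The sketch below is one way to do the arithmetic in a script. The function names are hypothetical, and it assumes both timestamps fall in the same non-leap year.

```shell
#!/bin/sh
# Convert an errpt timestamp (MMDDhhmmyy) to minutes since January 1.
# Assumption: non-leap year; the two-digit year is ignored.
errpt_to_minutes() {
    echo "$1" | awk '{
        split("31 28 31 30 31 30 31 31 30 31 30 31", dim, " ")
        mm = substr($0, 1, 2) + 0
        dd = substr($0, 3, 2) + 0
        hh = substr($0, 5, 2) + 0
        mi = substr($0, 7, 2) + 0
        days = dd - 1
        for (m = 1; m < mm; m++) days += dim[m]
        print (days * 24 + hh) * 60 + mi
    }'
}

# Print the gap between two timestamps as hours and minutes.
elapsed() {
    start=$(errpt_to_minutes "$1")
    end=$(errpt_to_minutes "$2")
    diff=$((end - start))
    echo "$((diff / 60))h $((diff % 60))m"
}

# RD_CACHE enabled entry versus the CACHE IS NOW WARM entry above.
elapsed 0528191417 0529130617
```

For the two entries shown above this prints 17h 52m.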

When caching is enabled, read requests for the target devices are sent to the caching software, which checks whether the block is in the cache. If it is, then the disk block is provided from the cache and this is noted as a read hit. If some of the data is in the cache then it is a partial read hit. All other reads and all writes will be sent through to the original disk.
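The routing just described can be pictured with a toy model. This is only an illustration of the decision, not how the AIX caching driver is implemented; the function name classify_io and its arguments (operation, blocks requested, blocks found in cache) are made up for the example.

```shell
#!/bin/sh
# Toy model of the routing described above, not the AIX implementation.
# Arguments: operation (read|write), blocks requested, blocks found in cache.
classify_io() {
    op=$1
    requested=$2
    cached=$3
    if [ "$op" = "write" ]; then
        # The cache is read-only, so every write goes to the source disk.
        echo "write: sent to source disk"
    elif [ "$cached" -ge "$requested" ]; then
        echo "read hit: served from cache"
    elif [ "$cached" -gt 0 ]; then
        echo "partial read hit"
    else
        echo "read miss: sent to source disk"
    fi
}

classify_io read 8 8
classify_io read 8 3
classify_io read 8 0
classify_io write 8 8
```

The last call shows the point of a read-only cache: a write bypasses it even when the blocks are already cached.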

Once caching is up and running you can get statistics using the cache_mgt monitor command.

cache_mgt monitor get -h -s

The issue with the “monitor get” command is that it reports by source hdisk. This is fine if you have three or four source hdisks, but when you have 88 of them it is a lot of data to look at. It also reports since boot time or since the monitor was last started, which makes it difficult to get point-in-time statistics. One way around this is to use the undocumented pfcras command, which provides shorter-term statistics as well as an average over all the hdisks.

I set up a cron job to run at 11:59 p.m. each night that grabs the monitor cache stats and stores them. The command I run is:

cache_mgt monitor get -h -s >>$logit.cachestats.txt
($logit is set up earlier in the script to be a name with the date and time in it)

Additionally, I run the following command hourly:

pfcras -a dump_stats >>$logit.pfcras.txt
($logit is set up earlier in the script to be a name with the date and time in it)
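Putting the two collection commands together, a wrapper script might look like the sketch below. It is not the author's script: the LOGDIR location, the date format used for $logit, the function names, the daily and hourly arguments, and the script path in the crontab comments are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a collection script. LOGDIR and the name format are assumptions.
LOGDIR=${LOGDIR:-/tmp/flashcache}

# Build a prefix with date and time in it, e.g. /tmp/flashcache/20170529_2359
make_logit() {
    echo "$LOGDIR/$(date +%Y%m%d_%H%M)"
}

# Append one set of statistics to a file named after the current date and time.
collect() {
    logit=$(make_logit)
    case $1 in
        daily)
            mkdir -p "$LOGDIR" &&
                cache_mgt monitor get -h -s >> "$logit.cachestats.txt" ;;
        hourly)
            mkdir -p "$LOGDIR" &&
                pfcras -a dump_stats >> "$logit.pfcras.txt" ;;
        *)
            echo "usage: collect daily|hourly" >&2
            return 2 ;;
    esac
}

# Hypothetical crontab entries (the script path is an assumption):
# 59 23 * * * /usr/local/bin/fcstats.sh daily
# 0 * * * *   /usr/local/bin/fcstats.sh hourly
if [ $# -gt 0 ]; then
    collect "$1"
fi
```

Because cron runs jobs with a minimal environment, it is worth confirming that both commands are found on the PATH cron uses, or calling them by full path.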

Jaqui Lynch is an independent consultant, focusing on enterprise architecture, performance and delivery on Power Systems with AIX and Linux.


