Storage Recommendations for AIX: How to Stress and Test Your Storage Server

Stress and Test Your Servers
Storage resources tend to be the biggest bottleneck for many high-end applications and large databases, and many of the most common SAN performance problems stem from bottlenecks in the storage subsystem. To identify problems, it is crucial to have reference points and thresholds; many people ask for best practices for I/O thresholds. However, the best thresholds depend on your storage settings and your workload environment. One goal of stressing the storage server is to learn what the system can do for your workload environment and to establish reference points. That’s why testing the performance of the storage server is crucial.
 
Generally speaking, there are two ways to generate workload for your storage server. The first is to create a test environment that resembles the production environment as closely as possible and then simulate user workflows; this is the best way, but it can be complex. The second is to use I/O workload generators. These tools produce different kinds of I/O load on the storage, and you can tune and configure them to match your application workload. In this article, I’ll discuss some of these I/O generator tools and the main considerations for using them.
 
Typifying Your I/O Workload
I/O generator tools are useful because they can simulate workloads that resemble your production environment. In general, I/O activity can be classified into two types:
  • Random: Smaller blocks (8KB to 32KB), sensitive to latency. The random workload is more demanding for the storage server because it isn’t cache-friendly. Typically, storage vendors use the random workload for their benchmarks.
  • Sequential: Large I/O requests (64KB to 128KB or even more), with the data read in order. Normally, throughput is what matters and latency is not an issue, since latency increases with larger I/O sizes. You may want to run sequential workloads to test the throughput of new HBAs or SAN switch implementations.
Source: IBM System Storage DS8000 Performance Monitoring and Tuning Redbook
 
The challenge of typifying your workload is that applications normally have a mix of sequential and random workloads with different read and write ratios. For instance, some batch processes may have a mix of sequential and random workloads with a 50/50 R/W ratio. On the other hand, some OLTP databases may have a 70/30 R/W ratio, while others have 50/50. To determine the nature of your I/O, run filemon during a representative production workload and check the seek statistics it reports; a high proportion of seeks points to random I/O.
 
Tools to Test and Stress the Storage Server
The following are some common tools to test and stress the storage subsystem:
 
ndisk: The ndisk tool tests throughput and stresses the disk subsystem to see what it can handle. It is included in the nstress package, which is available on the developerWorks website. With ndisk you can simulate sequential or random workloads and define the I/O request size, the R/W ratio and the number of parallel processes. It can also generate I/O to multiple raw LVs, hdisks or files. It’s suggested to run the tool with many processes (start -M at 32 or more). Running write tests will destroy the data on the target device.
 
Example: This test simulates an 8K block size (-b), a random workload (-R), 100% reads (-r) and 64 processes (-M) against hdisk0, which is 400G in size, for 300 seconds (-t):
[Image: Ortega1.png]
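
The options named in this example can be assembled into a command line. The following is a minimal sketch, not a copy of the screenshot: the binary name ndisk64 and the -f (target) and -s (size) options are assumptions recalled from the tool's usage text, so verify them against the help output of your nstress version. The helper print_ndisk_read_test is hypothetical and only prints the command, so it can be reviewed before anything touches a disk.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an ndisk64 command line for a
# random, read-only test. -R, -r, -b, -M and -t come from the article;
# -f and -s are assumptions to verify against your ndisk64 usage text.
print_ndisk_read_test() {
    device=$1    # raw device to read, e.g. /dev/rhdisk0
    size=$2      # size of the device, e.g. 400G
    echo "ndisk64 -R -r 100 -b 8K -M 64 -t 300 -f $device -s $size"
}

print_ndisk_read_test /dev/rhdisk0 400G
```

Printing the command instead of running it is a deliberate choice here: with a lower read percentage the same test would write to the device and destroy its data.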

dd and time: The dd command generates a sequential workload and tests throughput on hdisks, LVs and files. The time command reports how long each operation takes; calculate the throughput by dividing the amount of data by the real time.
 
Examples: These tests read and write 1G of data with a 128K block size (bs=128K); the 1G of data is made up of 8192 blocks of 128K (count=8192):
 
Read test: time dd if=/dev/rhdisk0 of=/dev/null bs=128K count=8192
Write test: time dd if=/dev/zero of=/dev/rhdisk0 bs=128K count=8192
 
It’s easy to distinguish I/O bottlenecks with dd, because you can compare the before and after results; the difference between the time results allows you to identify problems. Again, the write tests will destroy the data on the target device.
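
The throughput calculation described for dd and time, the amount of data divided by the real time, can be sketched as a small shell function. The name throughput_mb_s is hypothetical, and the 8.5-second timing is an invented value used only to illustrate the arithmetic.

```shell
#!/bin/sh
# Throughput in MB/s = bytes transferred / real time / (1024 * 1024).
# throughput_mb_s is a hypothetical helper; awk does the floating-point math.
throughput_mb_s() {
    bytes=$1      # total bytes moved by dd
    seconds=$2    # "real" time reported by the time command
    awk -v b="$bytes" -v s="$seconds" 'BEGIN { printf "%.2f\n", b / s / 1048576 }'
}

# 8192 blocks of 128K = 1073741824 bytes; 8.5 s is an invented timing.
throughput_mb_s $((8192 * 128 * 1024)) 8.5
```

Keeping the result for each run in a log makes the before-and-after comparison straightforward.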
 
DBMS_RESOURCE_MANAGER.CALIBRATE_IO for Oracle: CALIBRATE_IO is another tool for obtaining reference points and stressing the storage server when you use an Oracle database.
 
It’s not necessary to have deep Oracle knowledge to use this tool. This stored procedure stresses the storage subsystem by issuing an I/O-intensive, read-only workload, both sequential and random. The random workload is made up of 8 KB I/Os to determine the maximum IOPS, and the sequential workload is made up of 1 MB I/Os to determine the MBPS (megabytes of I/O per second) that the storage subsystem can sustain.
 
To run this tool, execute the following stored procedure from a session started with sqlplus "/ as sysdba":
 
DBMS_RESOURCE_MANAGER.CALIBRATE_IO (<DISKS>, <MAX_LATENCY>, iops, mbps, lat);
 
Provide the number of physical disks in the DISKS input parameter, since the tool runs more efficiently with that information. The MAX_LATENCY input parameter specifies the maximum tolerable latency, in milliseconds, for the test period. The iops, mbps and lat arguments are output variables that receive the results.
 
The tool will provide the following output:
  • MAX_IOPS: the maximum IOPS that the database can sustain
  • ACTUAL_LATENCY: the average latency, in milliseconds, for the random workload
  • MAX_MBPS: the maximum MBPS of I/O that the database can sustain in sequential workload
Example:
[Image: Ortega2.png]
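
The CALIBRATE_IO call needs declared output variables before it can run. As a minimal sketch, the hypothetical shell helper below prints a PL/SQL block that declares them, makes the call and displays the results; the variable names match the call shown earlier, while the 16 disks and 10 ms values are assumptions chosen only for illustration. Oracle also sets prerequisites for calibration, such as asynchronous I/O being enabled, so check the database documentation before running it.

```shell
#!/bin/sh
# Hypothetical helper: print a PL/SQL block that calls CALIBRATE_IO and
# displays its three output values. Pipe the result to sqlplus to run it.
calibrate_io_sql() {
    disks=$1         # number of physical disks (DISKS)
    max_latency=$2   # maximum tolerable latency in ms (MAX_LATENCY)
    cat <<EOF
SET SERVEROUTPUT ON
DECLARE
  iops INTEGER;
  mbps INTEGER;
  lat  INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO($disks, $max_latency, iops, mbps, lat);
  DBMS_OUTPUT.PUT_LINE('max_iops = ' || iops);
  DBMS_OUTPUT.PUT_LINE('max_mbps = ' || mbps);
  DBMS_OUTPUT.PUT_LINE('latency  = ' || lat);
END;
/
EOF
}

# Review the generated script first; 16 disks and 10 ms are example values.
calibrate_io_sql 16 10
# To run it: calibrate_io_sql 16 10 | sqlplus -s "/ as sysdba"
```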

General Considerations for Testing
Before testing the storage server, it is useful to define the different scenarios you want to simulate. Document configurations such as I/O size, LVM layout, R/W ratios and type of I/O workload. For instance, you’ll get more IOPS with a 100/0 R/W ratio than with 70/30, and performance is usually lower when sequential and random workloads are mixed on the same physical disk.
 
With regard to block size, consider testing with a 4K block size. Although this I/O size isn’t common in the real world, it is useful for demonstrating the maximum capabilities of the storage server.
 
Monitor your environment and gather at least the following metrics from iostat -D: IOPS (tps), response times (avgserv) and throughput (which can be calculated as IOPS × I/O size). The idea is to establish thresholds for your environment by associating a latency with each type of workload. Run each scenario several times to get reliable results, and run each ndisk test for at least 5 minutes. When running tests on file systems, consider the mount -o cio option to bypass file caching and send more of the workload to the storage.
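
The throughput calculation mentioned above, IOPS multiplied by I/O size, can be sketched as a small shell function. The name iops_to_mb_s is hypothetical, and the IOPS figures are invented values used only to show the arithmetic.

```shell
#!/bin/sh
# Throughput in MB/s = IOPS x I/O size in KB / 1024.
# iops_to_mb_s is a hypothetical helper; tps comes from iostat -D.
iops_to_mb_s() {
    iops=$1      # transfers per second (tps)
    io_kb=$2     # I/O size used in the test, in KB
    awk -v i="$iops" -v k="$io_kb" 'BEGIN { printf "%.2f\n", i * k / 1024 }'
}

iops_to_mb_s 12000 8     # random test with 8 KB blocks
iops_to_mb_s 1500 128    # sequential test with 128 KB blocks
```

The two sample calls show why both numbers are worth recording: a random test can report far more IOPS than a sequential one while moving less data per second.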
 
Finally, when running performance tests on a new storage server, avoid expecting a 10X performance improvement in your applications just because the results from these generator tools show 10X better IOPS than the previous storage server. Other factors matter too; for example, the application must be sufficiently parallelized to exploit the new IOPS capacity.
 
Conclusion
Using an I/O generator tool to stress the storage server is a useful way to establish reference points and thresholds for your storage equipment. The challenge of using these tools is to replicate the I/O characteristics of your production environment. Sequential I/O tends to use larger I/O sizes, and random I/O uses smaller I/O sizes. Consider using filemon to determine the nature of the I/O in your production workload. When running the performance tests, collect performance metrics such as IOPS and latency along with the configuration used. Finally, avoid expecting 10X improvements just because your test results showed 10X compared with the older storage server; the application has to be capable of exploiting the new IOPS capacity.
 
References:
AIX 7.2 Performance Management Guide
IBM System Storage DS8000 Performance Monitoring and Tuning, SG24-8318-00
IBM FlashSystem V9000 AC3 and AE3 Performance – RedPaper
Best Practices Guide for Databases on IBM FlashSystem
https://www.ibm.com/developerworks/community/blogs/aixpert/entry/using_ndisk64_to_test_new_disk_set_ups_like_flash_systems?lang=en
 


