How to Exploit the Vector Packed Decimal Facility in IBM z14
With the announcement of the IBM z14 platform, new features became available, including a new instruction set for decimal numbers that uses the Vector Facility.
By Flavio C. Buccianti, 01/01/2018
With the announcement of the IBM z14 platform, some new features became available. Among them is a new instruction set for decimal numbers that uses the Vector Facility.
The original Vector Facility was used until ESA/390, but a redesigned Vector Facility was introduced with the IBM z13, or ARCH(11). The new Vector Facility implements 32 new 128-bit registers, 139 new instructions and the ability to handle data in a Single Instruction Multiple Data (SIMD) model. (Read more in "IBM z13 Vector Facility Works to Accelerate Features.")
The IBM z14 architecture, ARCH(12), implements the Vector Packed Decimal Facility and introduces support for a new instruction set: the vector decimal instructions. Those instructions use the 32 128-bit registers to perform native packed decimal arithmetic.
Previously, that kind of data was handled by a set of instructions, still available today, that perform their calculations on in-memory data. The new instructions work on data held in registers instead, which eliminates the overhead of fetching the data from the different levels of cache or from real memory, an overhead that slows the execution of the older instructions. Register operations are also usually faster than operations that require memory manipulation.
Format of the Signed Packed Decimal Numbers in Vector Registers
Signed packed decimal data is stored in binary-coded decimal format, with two 4-bit nibbles per byte. Each digit nibble ranges from 0000 to 1001 (0 to 9), so a byte holds two digits, except for the rightmost byte, whose rightmost nibble (S) is reserved for the sign (Figure 1).
The preferred representation for the positive sign is hexadecimal C, or 1100 in binary. The negative sign is represented by hexadecimal D, or 1101 in binary.
Each Vector Register can hold a number of up to 31 digits (D) plus a sign (S).
If the number is shorter than 31 digits, binary zeros fill the unused digits on the left, which means that the number is always right-aligned in the Vector Register.
The Vector Packed Decimal instructions can’t do SIMD operations with the decimal data.
Some examples of how signed packed decimal numbers are represented:
Value    Comp-3, hex    In the vector register
+0       0C             0000000000000000000000000000000C
+1       1C             0000000000000000000000000000001C
+32      03 2C          0000000000000000000000000000032C
+473     47 3C          0000000000000000000000000000473C
+6189    06 18 9C       0000000000000000000000000006189C
-1       1D             0000000000000000000000000000001D
-6189    06 18 9D       0000000000000000000000000006189D
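The layout described above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names `to_packed_decimal` and `to_vector_register` are made up for this example and are not part of any IBM product, and the code models the byte layout only, not the hardware.

```python
def to_packed_decimal(value: int) -> bytes:
    """Encode an integer as signed packed decimal (COMP-3) in the fewest whole bytes."""
    sign = 0xD if value < 0 else 0xC          # preferred sign codes: C = plus, D = minus
    digits = str(abs(value))
    if len(digits) % 2 == 0:                  # digits plus the sign nibble must fill whole bytes
        digits = "0" + digits
    nibbles = [int(d) for d in digits] + [sign]
    # Pair the nibbles up: high nibble first, low nibble second.
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def to_vector_register(value: int) -> bytes:
    """Right-align the packed value in a 16-byte image, padded on the left with binary zeros."""
    packed = to_packed_decimal(value)
    if len(packed) > 16:
        raise OverflowError("more than 31 digits do not fit in one 128-bit register")
    return packed.rjust(16, b"\x00")


if __name__ == "__main__":
    for number in (0, 1, 32, 473, 6189, -1, -6189):
        comp3 = to_packed_decimal(number).hex().upper()
        register = to_vector_register(number).hex().upper()
        print(f"{number:+6d}  {comp3:<8}  {register}")
```

Running the script prints one line per value with the short COMP-3 form and the 32-hex-digit register image, which makes it easy to see that only the left padding differs between the two.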
With this new set of instructions, existing legacy programs can get a performance boost when handling native decimal numbers.
Together with the z14 announcement, IBM also announced new releases of four widely used languages on Z: COBOL, PL/I, XL C/C++ and Java.
New Vector Decimal Instructions
The new Vector Decimal Assembler Instructions provided by the Vector Packed Decimal Facility in z14 are fully described in the IBM z/Architecture Principles of Operation (POP) manual, SA22-7832-11. The new instructions are:
- VECTOR ADD DECIMAL
- VECTOR COMPARE DECIMAL
- VECTOR CONVERT TO BINARY
- VECTOR CONVERT TO DECIMAL
- VECTOR DIVIDE DECIMAL
- VECTOR LOAD IMMEDIATE DECIMAL
- VECTOR MULTIPLY DECIMAL
- VECTOR MULTIPLY AND SHIFT DECIMAL
- VECTOR PACK ZONED
- VECTOR PERFORM SIGN OPERATION DECIMAL
- VECTOR REMAINDER DECIMAL
- VECTOR SHIFT AND DIVIDE DECIMAL
- VECTOR SHIFT AND ROUND DECIMAL
- VECTOR SUBTRACT DECIMAL
- VECTOR TEST DECIMAL
- VECTOR UNPACK ZONED
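To give a feel for what a register-to-register decimal operation does, here is a small software model of an add on two 16-byte register images. It is a sketch under simplifying assumptions, not a description of the real VECTOR ADD DECIMAL instruction: the helper names `decode`, `encode` and `add_decimal` are hypothetical, and the model ignores the control fields and condition code that the POP defines for the actual instruction. Overflow past 31 digits is reported here as a Python exception purely for illustration.

```python
MAX_DIGITS = 31                      # one 128-bit register: 31 digit nibbles plus 1 sign nibble
MINUS_CODES = (0xB, 0xD)             # sign codes B and D mean minus; A, C, E and F mean plus


def decode(register: bytes) -> int:
    """Turn a 16-byte signed packed decimal image into a Python int."""
    if len(register) != 16:
        raise ValueError("a vector register image is exactly 16 bytes")
    nibbles = [n for byte in register for n in (byte >> 4, byte & 0x0F)]
    *digits, sign = nibbles
    if any(d > 9 for d in digits) or sign < 0xA:
        raise ValueError("invalid digit or sign code")
    magnitude = int("".join(str(d) for d in digits))
    return -magnitude if sign in MINUS_CODES else magnitude


def encode(value: int) -> bytes:
    """Build a right-aligned 16-byte image using the preferred sign codes C and D."""
    if abs(value) >= 10 ** MAX_DIGITS:
        raise OverflowError("result needs more than 31 digits")
    digits = str(abs(value)).rjust(MAX_DIGITS, "0")
    return bytes.fromhex(digits + ("D" if value < 0 else "C"))


def add_decimal(a: bytes, b: bytes) -> bytes:
    """Software stand-in for a register-to-register packed decimal add."""
    return encode(decode(a) + decode(b))


if __name__ == "__main__":
    result = add_decimal(encode(473), encode(-6189))
    print(result.hex().upper())
```

The point of the model is that both operands and the result are complete 16-byte values, so no intermediate storage operand is involved in the arithmetic itself.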
In previous COBOL versions that supported ARCH(11) and below, packed decimal arithmetic was performed using data in memory or, in some cases, by converting the data to Decimal Floating Point (DFP). Both approaches are time consuming and hurt the performance of the final application.
COBOL V6.2 for z/OS introduced support for ARCH(12), which can be specified as a compiler option so that programs can take advantage of the z14 instructions of the Vector Packed Decimal Facility. Programs compiled with this option in effect perform native instructions using registers instead of memory, which eliminates the overhead of converting data back and forth between packed decimal and DFP.
A recent presentation from Tom M. Ross of IBM at SHARE after COBOL V6.2 was announced shows some impressive performance numbers:
- A COBOL V6.2 program compiled with ARCH(12), executing a computation 100 million times in a loop, was 4.85x faster than the COBOL V4 version, with 80 percent less CPU usage. It was 2.91x faster than the same program compiled with ARCH(11).
- In a zoned decimal computation, a program compiled with ARCH(12) was 3.05x faster than COBOL V4 and 1.74x faster than the same program compiled with ARCH(11).
The good news for COBOL is that no changes to the source program are needed. To take advantage of this improvement, recompile with ARCH(12) and run the program on a z14 CPU.
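The speedup factors quoted above can be turned into time savings with simple arithmetic. The sketch below assumes that "N x faster" means the old run time divided by the new run time, which is the usual reading but is not stated explicitly in the presentation; the function name `time_reduction` and the labels are made up for this example.

```python
def time_reduction(speedup: float) -> float:
    """Fraction of the original run time saved when a program runs `speedup` times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Speedup factors as quoted in the bullets above.
quoted = {
    "loop test, ARCH(12) vs COBOL V4": 4.85,
    "loop test, ARCH(12) vs ARCH(11)": 2.91,
    "zoned decimal, ARCH(12) vs COBOL V4": 3.05,
    "zoned decimal, ARCH(12) vs ARCH(11)": 1.74,
}

if __name__ == "__main__":
    for label, factor in quoted.items():
        print(f"{label}: {factor:.2f}x faster -> {time_reduction(factor):.1%} less time")
```

Under that assumption, a 4.85x speedup corresponds to roughly 79 percent less time, which is consistent with the "80 percent less CPU usage" figure quoted for the same test.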
The PL/I language can also take advantage of the Vector Packed Decimal Facility.
With the recent announcement of Enterprise PL/I for z/OS V5.2, ARCH(12) can be exploited by new and existing programs. The advantage shows up in particular in some PICTURE and FIXED DECIMAL calculations.
Here, too, there is no need to change the source program. The new routines are incorporated into the program when it is compiled or recompiled with the new ARCH(12) option.
An article on IBM developerWorks mentions that CPU-intensive PL/I applications could see a performance improvement of up to 40 percent, and 10 percent on average, just by using the new compiler version with the ARCH(12) option during compilation and running on an IBM z14 machine.
V2.3 of XL C/C++ also supports the Vector Packed Decimal Facility, generating code that uses the new instructions available on z14. To do so, both the ARCH(12) and the VECTOR compiler options must be in effect. A new TUNE(12) option was also added to generate code that is optimized for z14 processors.
In IBM Java V8.5, the Java VM can detect when it is running on a z14 machine and transparently exploit the new instructions provided by the Vector Packed Decimal Facility, improving application performance thanks to the register-to-register data handling of these instructions.
You can also use the Data Access Accelerator (DAA) library when your code manipulates native data. Because the Java language doesn’t directly support operations on native data such as primitive types and binary-coded decimals stored in byte arrays, these structures are converted into Java objects before the operations are executed. For example, binary-coded decimals are converted to BigDecimal or BigInteger Java objects. This transformation takes time and resources and also puts pressure on the Java heap.
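The conversion path described above can be sketched as follows. This is a Python stand-in for the Java flow, not DAA code: `packed_to_decimal` and `decimal_to_packed` are hypothetical helpers, Python's `Decimal` plays the role of BigDecimal, and the `scale` argument is the implied number of fraction digits, which the program has to supply because packed decimal data stores no decimal point.

```python
from decimal import Decimal


def packed_to_decimal(data: bytes, scale: int) -> Decimal:
    """Build a Decimal object from packed decimal bytes with an implied scale."""
    hex_text = data.hex()
    digits, sign = hex_text[:-1], hex_text[-1]
    if not digits.isdigit() or sign not in "abcdef":
        raise ValueError("not a valid signed packed decimal")
    # Build the object exactly from sign, digit tuple and exponent (no context rounding).
    digit_tuple = tuple(int(d) for d in str(int(digits)))
    return Decimal((1 if sign in "bd" else 0, digit_tuple, -scale))


def decimal_to_packed(value: Decimal, length: int, scale: int) -> bytes:
    """Write a Decimal back as `length` bytes of packed decimal at the given scale."""
    sign, digits, exponent = value.as_tuple()
    shift = exponent + scale
    if shift < 0:
        raise ValueError("value has more fraction digits than the target scale")
    text = ("".join(str(d) for d in digits) + "0" * shift).lstrip("0")
    width = 2 * length - 1               # digit nibbles available next to the sign nibble
    if len(text) > width:
        raise OverflowError("value does not fit in the target field")
    return bytes.fromhex(text.rjust(width, "0") + ("d" if sign else "c"))


if __name__ == "__main__":
    a = bytes.fromhex("12345c")          # +123.45 at scale 2
    b = bytes.fromhex("00055d")          # -0.55 at scale 2
    # Three steps per operation: bytes -> objects, arithmetic on objects, objects -> bytes.
    total = packed_to_decimal(a, 2) + packed_to_decimal(b, 2)
    print(total, decimal_to_packed(total, 3, 2).hex())
```

Every arithmetic step here allocates intermediate objects and walks the digits twice, which is the kind of overhead that operating directly on the byte arrays is meant to avoid.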
By using the DAA APIs, applications can avoid unnecessary object creation and intermediate processing of the data, use available hardware acceleration and remain platform independent.
When your application is running in an IBM Z environment, the SDK recognizes the com.ibm.dataaccess.DecimalData.addPackedDecimal method and can exploit the z14’s Vector Packed Decimal Facility instructions to improve application execution.
Better Performance With No Application Changes
With the Vector Packed Decimal Facility on z14, almost all existing applications that handle native decimal data, as well as new ones, can get a substantial performance advantage with no need to change the source code. All that’s needed is to use the newest compilers with the ARCH(12) option active during compilation and to run on a z14. Just keep in mind that programs compiled with ARCH(12) cannot run on machines at architecture level 11 or below.
Flavio C. Buccianti retired from IBM where he worked for 38 years mainly with IBM Z, AS/400, Power and OS/2 Internals support for large customers.