PL/I and C/C++ Compilers Step Up to Improve Performance on zEC12
Even as hardware moves toward ever-increasing parallelism to keep strong performance gains flowing, the other partner in application performance, the programming languages and their compilers, has to step up to make those gains even stronger.
The latest Enterprise PL/I and XL C/C++ compilers have stepped up to the plate to leverage the new instructions in the zEnterprise EC12 (zEC12). With the upgraded compilers, applications can take full advantage of the zEC12 hardware and gain performance both automatically and through recoding to exploit new built-in functions.
PL/I on zEC12
The Enterprise PL/I for z/OS V4R3 compiler exploits zEC12 hardware through the addition of the ARCHITECTURE(10) option, abbreviated ARCH(10). This option enables the use of new instructions, such as those in the Decimal-Floating-Point Zoned-Conversion Facility, and tunes the compiler-generated code for the new hardware.
For many years, instructions have existed to convert zoned decimal to packed decimal and from packed decimal to zoned decimal. However, some of these instructions are slow. In addition, because packed decimals cannot be held in registers, optimization is hindered.
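To make the two layouts concrete, here is a minimal C sketch of what a zoned-to-packed conversion does to the bytes. It assumes the usual z/Architecture encodings: each zoned byte holds a zone nibble and a digit nibble, with the sign carried in the zone of the last byte, while packed decimal holds two digits per byte with the sign in the final nibble. The function name zoned_to_packed is hypothetical, and the code only models the data movement; it is not what the compiler emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative model only (hypothetical helper, not a compiler or
 * library routine): pack n zoned-decimal bytes, n odd, into
 * (n + 1) / 2 packed-decimal bytes.
 */
void zoned_to_packed(const uint8_t *zoned, size_t n, uint8_t *packed)
{
    size_t out = 0;

    /* An odd digit count fills the packed field exactly. */
    assert(n % 2 == 1);

    /* Every pair of leading digits shares one packed byte. */
    for (size_t i = 0; i + 1 < n; i += 2) {
        packed[out++] = (uint8_t)(((zoned[i] & 0x0F) << 4) |
                                  (zoned[i + 1] & 0x0F));
    }

    /* Last byte: final digit in the high nibble, sign in the low nibble.
       The sign is the zone nibble of the last zoned byte. */
    packed[out] = (uint8_t)(((zoned[n - 1] & 0x0F) << 4) |
                            (zoned[n - 1] >> 4));
}
```

For the nine zoned bytes F1 F2 F3 F4 F5 F6 F7 F8 F9 (the value 123456789), the sketch produces the five packed bytes 12 34 56 78 9F. The result lives in storage rather than in a register, which is the property that limits optimization.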
The Decimal-Floating-Point Zoned-Conversion Facility adds a new set of instructions that convert between decimal floating-point (DFP) and zoned decimal. The usefulness of these new instructions might seem limited because only a few organizations currently use DFP. However, the compiler can exploit these instructions even in programs that don’t use floating-point data.
In PL/I, zoned decimal is represented by the PICTURE data type; floating-point data is represented by the FLOAT data type.
Converting PICTURE to DFP
The new instructions in the Decimal-Floating-Point Zoned-Conversion Facility help in programs that use both PICTURE and DFP data. For instance, the code the compiler generates to convert PICTURE to DFP in the following example differs under the ARCH(9) and ARCH(10) options.
pic2dfp: proc( ein, aus ) options(nodescriptor);
  dcl ein(0:100_000) pic'(9)9' connected;
  dcl aus(0:hbound(ein)) float dec(16) connected;
  dcl jx fixed bin(31);
  do jx = lbound(ein) to hbound(ein);
    aus(jx) = ein(jx);
  end;
end pic2dfp;
Under ARCH(9), the loop consists of the following 17 instructions:
0060 F248 D0F0 F000 PACK #pd580_1(5,r13,240),_shadow4(9,r15,0)
0066 C050 0000 0035 LARL r5,F'53'
006C D204 D0F8 D0F0 MVC #pd581_1(5,r13,248),#pd580_1(r13,240)
0072 41F0 F009 LA r15,#AMNESIA(,r15,9)
0076 D100 D0FC 500C MVN #pd581_1(1,r13,252),+CONSTANT_AREA(r5,12)
007C D204 D0E0 D0F8 MVC _temp2(5,r13,224),#pd581_1(r13,248)
0082 F874 D100 2000 ZAP #pd586_1(8,r13,256),_shadow3(5,r2,0)
0088 D207 D0E8 D100 MVC _temp1(8,r13,232),#pd586_1(r13,256)
008E 5800 4000 L r0,_shadow2(,r4,0)
0092 5850 4004 L r5,_shadow2(,r4,4)
0096 EB00 0020 000D SLLG r0,r0,32
009C 1605 OR r0,r5
009E B3F3 0000 CDSTR f0,r0
00A2 EB00 0020 000C SRLG r0,r0,32
00A8 B914 0011 LGFR r1,r1
00AC B3F6 0001 IEDTR f0,f0,r1
00B0 6000 E000 STD f0,_shadow1(,r14,0)
Under ARCH(10), the loop consists of only the following eight instructions and runs more than four times faster:
0060 EB2F 0003 00DF SLLK r2,r15,3
0066 B9FA 202F ALRK r2,r15,r2
006A A7FA 0001 AHI r15,H'1'
006E B9FA 2023 ALRK r2,r3,r2
0072 ED08 2000 00AA CDZT f0,#AddressShadow(9,r2,0),b'0000'
0078 B914 0000 LGFR r0,r0
007C B3F6 0000 IEDTR f0,f0,r0
0080 6001 E000 STD f0,_shadow1(r1,r14,0)
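The first four instructions of the ARCH(10) loop are address arithmetic rather than conversion work. The C sketch below models them under one reading of the listing: r15 is taken to be the loop index and r3 the base address of ein, which are assumptions drawn from the operands rather than anything the listing states. Because each pic'(9)9' element occupies nine bytes, the index is scaled by nine with a shift and an add instead of a multiply. The function name zoned_element_offset is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative model only (hypothetical helper): byte offset of
 * element `index` in an array of 9-byte zoned-decimal elements,
 * computed the way the SLLK/ALRK pair appears to compute it.
 */
uint32_t zoned_element_offset(uint32_t index)
{
    uint32_t offset;

    /* Guard against 32-bit overflow in this sketch. */
    assert(index <= UINT32_MAX / 9u);

    offset = index << 3;      /* SLLK r2,r15,3  : index * 8           */
    offset = index + offset;  /* ALRK r2,r15,r2 : index * 8 + index   */
    return offset;            /* a later ALRK adds the assumed base   */
}
```

With the offset added to the base address, a single CDZT converts the nine zoned bytes at that address directly into a DFP register, which replaces the PACK, MVC, MVN and ZAP traffic through storage seen in the ARCH(9) loop.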