A Traditional Approach to a Modern Technology

V5R4 simplifies the use of XML with RPG programs

One of RPG's traditional strengths has been that it provides a simple interface to complex technologies. For example, those who've programmed interactive applications using CICS* or CCP will appreciate the relative simplicity of the interface provided by EXFMT. More recently, RPG has perhaps lost a little of this complex-made-simple focus. Instead, the power of the language has been increased through features such as expressions and subprocedures. The most recent enhancements have moved RPG IV into the realm of free-form languages (V5R1), added powerful facilities for interfacing with Java* (V5R1) and strengthened its data-definition capabilities (V5R2). Certainly the Java interface was welcomed by many of us and has opened the door to some exciting technologies, such as the capability to create spreadsheets using the Jakarta POI classes. However, the word "simple" wouldn't come to mind when describing this interface.

With V5R4, RPG IV returns to its roots and steps up to the challenge of providing a simple interface to a modern technology: XML. Generating XML is rarely a problem, or at least not to the degree that it requires direct language support. XML can be generated within an RPG program by stringing the various delimiters and text values together, or by using the template approach offered by products such as CGIDEV2, which was originally designed for the production of Web pages. And that's to name but two of the available options. Parsing and processing XML documents is a different matter, however. It can be a complex process and one to which RPG isn't ideally suited. It's this latter area that the new features address.

Those who have yet to be hit by the requirement to process XML documents will really appreciate these new features when their time comes, and come it will as XML usage becomes more widespread. Those who have already had to deal with XML documents have adopted many approaches, from writing their own XML parsers to using Java or other parsers such as the C-based Expat. There's no need to throw this work away, but you may still find the new RPG IV features easier to use for new tasks.

While assistance with XML processing is undoubtedly the major feature of the V5R4 release, there are many other interesting features, which we'll also cover here. Since XML is the hot topic of the day, we'll start there.

XML Support

XML support is provided by two new opcodes, XML-SAX and XML-INTO, which work in conjunction with two new built-in functions (BIFs), %XML and %HANDLER. Although these new opcodes, like the traditional EVAL, are available for use in both Extended Factor-2 and /Free syntax, the full functionality of XML-INTO is only available in /Free because of those pesky column limits again. The problem arises because XML-INTO is eight characters in length, so within the 10-position opcode area there's no room for an operation extender, and, as you'll see later, XML-INTO supports both the E(rror) and H(alf-adjust) extenders.

XML-INTO: In its simplest form, XML-INTO parses an XML document, extracting a single element directly into a variable. It can be a single variable or an array. Most often, however, it's used to extract several elements and use the values extracted to populate a data structure (DS) or DS array. This is the approach to use when the number of items to be retrieved is known. Later we'll examine a variant of XML-INTO that can be used to process XML documents of indeterminate length.

Code Sample 1 shows a small subset of an XML document containing customer information. For the purposes of demonstrating the basic functionality of XML-INTO, we'll assume that this entire document is contained in the variable XML_Input.
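Code Sample 1 itself isn't reproduced here; purely for orientation, a minimal fragment consistent with the descriptions that follow might look like this (the element and attribute names are our invention, not the actual sample):

```xml
<customers>
   <recordCount>25</recordCount>
   <customer type="retail">
      <company>Acme Widgets</company>
      <contact>
         <name>J. Doe</name>
      </contact>
   </customer>
   <!-- ...more customer elements... -->
</customers>
```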

The syntax of the basic variant of XML-INTO is:

XML-INTO{ (E H) } variable %XML( xmlDoc { : options } );

Note that both the E and H operation extenders may be utilized. Using the E extender causes the %ERROR BIF to be set on if any errors are encountered while parsing the XML document. The need for the H extender is a little less obvious. It's used when we require any numeric values that are extracted to be subject to rounding during the character-to-numeric conversion process.
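As a sketch (the field, document and path names are our assumptions, since the code samples aren't reproduced here), the basic form with both extenders might be coded as:

```rpgle
D XML_Input       S          32767a   varying
D recordCount     S              5i 0
 /free
   // E: parse errors set %ERROR rather than raising an escape message
   // H: extracted numeric values are half-adjusted (rounded)
   xml-into(eh) recordCount
                %xml(XML_Input : 'path=customers/recordcount');
   if %error;
      // recover from the parse failure here
   endif;
 /end-free
```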

As noted previously, the possible values for the "variable" parameter range from a simple variable to an array to a DS to DS arrays to a nested DS ... well you get the picture. We'll examine examples of some of these options. The important thing to note is that the parser is looking to match the names and nesting levels of the field(s) in the receiver with the element and attribute names encountered in the XML document.

%XML: The %XML BIF is used by both XML-INTO and XML-SAX. It identifies the XML document's source and specifies any options to be applied to the processing. The options that can be specified include how white space within an element is to be treated and whether the names of elements and attributes within the document will be in upper-, lower- or mixed-case. Some of the other uses of the options parameter will be explained as we work through the examples. (Note: Although all of our examples show the options parameter as a literal, it can be a variable or any character expression.)

Example (D) in Code Sample 2 demonstrates the simplest usage of XML-INTO. The first parameter identifies the field to receive the parsed data, and we use the %XML BIF as the second parameter to identify the XML document's source. In this case, that's the field XML_Input. Execution of this code would result in the value 25 being placed in the field recordCount.

The second example (E) demonstrates the extraction of many fields from the XML data. All of the fields within the DS will be populated with the corresponding data from the first Customer element in the XML document. Notice that the nesting structure of the DS matches that of the XML document itself. In fact, for the process to function correctly, it must match the structure of the XML document. Notice that we've also specified the 'path=...' option. This is required to direct the parser to the correct starting point (i.e., to bypass the <RecordCount> element).

The more observant among you may have noticed that the type field in the customer DS represents an attribute rather than an element. From the parser's perspective, there's no difference between the two. Any attribute of an element is treated as being at the same level as the elements contained within it; type, therefore, appears at the same level within the customer DS as company.

Example (F) demonstrates how to handle repeating "records" within XML. It will process each of the Customer elements in turn and assign the values to consecutive elements in the customer array. No count of the number of elements used is provided; the programmer must determine this from the field content. In this example, you could test the company field, which would be blank for all unused entries. To ensure that this kind of testing is possible, issue a RESET or CLEAR operation against the DS prior to executing the XML-INTO.

In this example, we don't need to use the 'path=...' option because the structure of the XML document matches the hierarchy of the customers DS. Two new options are used in this example: 'allowextra=yes' and 'allowmissing=yes'. The first informs the parser that it's acceptable to have XML elements or attributes that have no corresponding field in the receiver DS. In this case, it accommodates the recordCount element. The second signals that it's acceptable if some fields in the DS aren't represented by XML elements or attributes. This could occur simply because the data isn't present in the XML stream or because the element is optional and therefore may not be present in all entities within the document. In our example, it would also allow the XML document to contain fewer than the 99 elements specified in the customer array.
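A sketch of how example (F) might be coded (the definitions are our reconstruction of Code Sample 2, not the actual sample):

```rpgle
D XML_Input       S          32767a   varying
D customer_t      DS                  qualified
D  type                         10a
D  company                      30a
D customers       DS                  qualified
D  customer                           likeds(customer_t) dim(99)
D count           S             10i 0
 /free
   clear customers;    // blank the array so unused entries can be tested
   xml-into customers
            %xml(XML_Input : 'allowextra=yes allowmissing=yes');
   // derive the count from the content, as described above
   count = 0;
   dow count < %elem(customers.customer)
       and customers.customer(count + 1).company <> *blanks;
      count += 1;
   enddo;
 /end-free
```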

The last example (G) performs the same task as the previous one except that the receiver variable is defined as customers.customer. We have also specified the 'path=...' option to steer the parser into the correct portion of the XML document, although in this example it isn't strictly required, as the parser would "find its own way." However, we think that including it makes things a little more obvious for those who follow us. As in example (F), the parameter 'allowmissing=yes' is required in this example because the customer array has capacity for 99 elements but there may be fewer than 99 present in the XML document.

So far, all of our examples assume that the XML data is contained within the single field XML_Input. Sometimes this will indeed be the case, particularly if the data was obtained from a Web service. For example, we may have received the data as a result of querying a shipping company's Web site to obtain tracking information about a package. But of course there will also be many occasions when we receive the XML data in a file. To process such files we must supply the file's location to the parser. To do this we specify the option parameter 'doc=file'. This tells the compiler that the first parameter to the %XML BIF identifies an IFS filename. If we were to copy our sample XML document into an IFS file named Customers.xml in the directory Partner400, we could code the %XML BIF as:

%XML( '/Partner400/Customers.xml': 'doc=file')

Alternatively, if we want to process several different files, we could load the file's path information into a variable and then specify that variable as the first parameter to the BIF.
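For instance (the variable name and its contents here are purely illustrative):

```rpgle
D xmlPath         S            256a   varying
D recordCount     S              5i 0
 /free
   xmlPath = '/Partner400/Customers.xml';
   xml-into recordCount
            %xml(xmlPath : 'doc=file path=customers/recordcount');
 /end-free
```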

%HANDLER: All of our examples so far assume that the number of elements in the XML stream is known, or at least that it falls within the maximum number of elements an RPG IV array can define (32K). What if that number isn't big enough? That's where the %HANDLER BIF comes into play.

When handling an undefined number of elements with XML-INTO, we must specify the %HANDLER BIF in place of the receiver variable. The BIF identifies a user-defined procedure that will be called every time the receiver variable is full. The syntax for %HANDLER is:

%HANDLER (handlingProcedure : communicationArea )

The first parameter identifies your handling procedure by specifying the name of its prototype. The second parameter identifies a communications area, which is provided so you can indirectly pass a parameter to your procedure. This communications area can be of any data type and size. It will be passed on to your handler procedure as the first parameter.

The actual receiver variable type is specified by the second parameter of your handling procedure's prototype. The receiver variable must be an array and must be a read-only parameter (i.e., it must be specified with the DIM and CONST keywords in the prototype). A third parameter is also passed to the handler in the form of a 4-byte integer (10I) that represents a count of the number of elements passed in the second (i.e., the receiver) parameter.

Because the receiver variable must be an array, there's one other notable difference between using %HANDLER and specifying a receiver variable directly: the 'path=...' option must be specified to direct the parser to the correct starting point within the XML document.

Now that we know the basics of using %HANDLER, let's apply it to our example in Code Sample 2. As you see at (H), the customer array contains 99 elements. Let's suppose that we need to code for the possibility that there are more than 99 customer elements in the XML document. We would do this by modifying example (E) as shown in Code Sample 3.

ProcessCustomers is the name of our user-defined handling procedure. The second parameter, custCount, is a simple 10I field that we'll pass to the handler so the handler can keep a count of the total number of customer elements processed. If we weren't interested in passing any such additional parameter, we could simply define a variable with the name null (or something similar) and pass that. Remember that, as we noted earlier, the 'path=...' option is essential to direct the parser to the correct starting point in the XML document. The prototype for our handling procedure might look like Code Sample 4. Notice that the second parameter to our subprocedure, custs, is defined as LIKEDS(customer). The definition of this parameter is used by the compiler to determine the structure of the receiver variable.
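Putting the pieces together, the call and a skeleton handler might look like this sketch (customer here is the qualified DS from Code Sample 2; as we understand the interface, returning zero from the handler asks the parser to continue):

```rpgle
D ProcessCustomers...
D                 PR            10i 0
D  count                        10i 0
D  custs                              likeds(customer) dim(99) const
D  numElems                     10i 0 value
D custCount       S             10i 0
 /free
   custCount = 0;
   xml-into %handler(ProcessCustomers : custCount)
            %xml(XML_Input : 'path=customers/customer');
 /end-free

P ProcessCustomers...
P                 B
D                 PI            10i 0
D  count                        10i 0
D  custs                              likeds(customer) dim(99) const
D  numElems                     10i 0 value
 /free
   count += numElems;           // running total across calls
   // process custs(1) through custs(numElems) here
   return 0;                    // zero tells the parser to keep going
 /end-free
P ProcessCustomers...
P                 E
```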

Since this article is intended simply to provide an overview of the new XML support, details of the possible return values to be supplied by the handling routine, possible exceptions during processing, etc., will have to wait for another day. As we get more exposure to these new features, we intend to provide detailed articles in the magazine and future editions of i5 EXTRA, the magazine's monthly newsletter. For now, we'll simply say that it's the responsibility of the user procedure to process the data in the receiver variable and then return control to the parser, which will continue processing until the array is filled again or the XML document has been completely processed.

XML-SAX: The XML-SAX parser requires an entire article of its own to do it justice. We won't be supplying a full example here, since there's no such thing as a simple example of using XML-SAX. While the opcode is simple in name and essence, the actual programming task is far more complex than for XML-INTO. The basic syntax of XML-SAX is:

XML-SAX{ (E) } %HANDLER( eventHandler : commArea )
               %XML( xmlDocument { : saxOptions } );

As you can see, like XML-INTO, it utilizes the %XML BIF to specify the document and processing options. It also uses the %HANDLER BIF to identify the user-defined subprocedure that will process the events. The difference is that XML-SAX will pass control to the user procedure each time it identifies an event in the XML document. By "event" we mean when the parser locates the beginning or end of an element, an attribute, or an item of element or attribute data, among other things. In each case, the user's event handler is informed of the type of event (e.g., begin, end, etc.) and the value of the associated item (e.g., the name of the element involved). It's up to the user program to determine how and when to process the data. That means that your code must do a lot more work than it does with XML-INTO. The payoff is that you get far more granular control and can access several attributes within the XML headers that XML-INTO is unable to process.

Just a few examples of the events notified (and the new RPG reserved words that identify them) are:

    • Beginning of element (*XML_START_ELEMENT)
    • Element data (*XML_CHARS)
    • End of element (*XML_END_ELEMENT)
    • Attribute name found (*XML_ATTR_NAME)
    • Attribute data (*XML_ATTR_CHARS)
    • End of attribute (*XML_END_ATTR)
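
A skeleton event handler gives a feel for the model. This sketch assumes the V5R4 handler interface as we understand it (communication area, event code, a pointer to the event's text, the text length and an exception ID); the based character field is one common idiom for extracting the text, not the only option:

```rpgle
P mySaxHandler    B
D                 PI            10i 0
D  commArea                      1a
D  event                        10i 0 value
D  string                         *   value
D  stringLen                    10i 0 value
D  exceptionId                  10i 0 value
D chars           S          65535a   based(string)
 /free
   select;
   when event = *XML_START_ELEMENT;
      // %subst(chars : 1 : stringLen) is the element name
   when event = *XML_CHARS;
      // %subst(chars : 1 : stringLen) is the element data
   when event = *XML_ATTR_NAME;
      // ...and so on for the other events
   endsl;
   return 0;    // continue parsing
 /end-free
P mySaxHandler    E
```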

Those who have experience working with other XML parsers, such as Scott Klement's port of the Expat XML parser (www.ScottKlement.com), will find this model familiar and should be able to adapt their existing programs to the new RPG support if they so choose.

Other V5R4 RPG Features

This release includes several other new features, and we'll touch on them briefly later. Before we do, we want to introduce another new "traditional" RPG enhancement.

Evaluate corresponding fields (EVAL-CORR): It feels as if we've been waiting a long time for this capability. In fact, the wait has been relatively short, since it was only with V5R1's introduction of qualified DSs that such a feature became truly useful. Until V5R1, RPG lived in a world where a field name couldn't appear in more than one DS. If we wanted to have copies of data in two DSs, we had to come up with a naming convention to differentiate the two versions of the field.

That limitation was removed with the introduction of the keyword QUALIFIED in V5R1. Specifying this against the DS name allows us to use the same field name in multiple structures. In Code Sample 5, you'll see that the field custName appears in both the custData and printData1 structures. This is possible because printData1 is defined with the keyword QUALIFIED. The field custName in the custData structure is simply referred to by its name. To reference the version in printData1, we use the qualified name printData1.custName. This is a useful feature and certainly preferable to making up new names for the fields in printData1. But, when typing code to copy the data, such as that shown at (A), it's hard to escape the feeling that there must be a better way. With V5R4 there is.

The "better way" is the new EVAL-CORR opcode shown at (B). It allows us to replace the series of individual EVALs shown with a single statement. The compiler works out which fields to move by matching the field names (i.e., it works out which ones CORRespond), and sets up the EVALs automatically for us. In cases where the field name matches but the data type doesn't match exactly, the compiler handles the conversion, provided that the base data type matches (e.g., the two fields are both numeric). For example, the field custNo in the custData DS is a packed field, but in the printData1 DS it appears as a zoned numeric. The compiler will make the adjustment for us. The simple way to think of it is: if it worked when we hand-coded the EVALs ourselves, then it will work with EVAL-CORR.
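Modeled on the description of Code Sample 5 (these definitions are our reconstruction, not the actual sample):

```rpgle
D custData        DS
D  custNo                        7p 0
D  custName                     30a
D printData1      DS                  qualified
D  custNo                        7s 0
D  custName                     30a
 /free
   // replaces the hand-coded equivalents:
   //   printData1.custNo   = custNo;   (packed-to-zoned handled for us)
   //   printData1.custName = custName;
   eval-corr printData1 = custData;
 /end-free
```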

Any fields present in the source DS that don't appear in the target are simply ignored. Similarly, a field is ignored when its name matches, but the data type isn't compatible. This is illustrated at (C). The field zip won't be copied since the source field is packed and the target field is character.

Fields such as division, which appear in the target DS but aren't present in the source, are ignored and their content remains unchanged.

As you might expect from an opcode named EVAL-CORR, the basic rules for data movement are the same as they would be for a simple EVAL (i.e., if the corresponding variables are multiple-occurrence DSs, then only the current occurrence of each is used; if they're arrays, only the number of elements in the smaller of the two will be copied). Similarly, if the target of a character field assignment is too small, truncation will occur, and if the target of a numeric assignment is too small, overflow may occur depending on the variable's content. If you have any doubts as to which fields are affected by the EVAL-CORR operation, check the compiler listing, which identifies all of the fields involved.

Full syntax checking: One new feature that many RPGers will embrace with open arms is the provision of full syntax checking of /free-form code within SEU, CODE and WebSphere* Development Studio Client (WDSc). Since the lack of it has been a regular source of complaint on many of the iSeries* mailing lists recently, this should make many people happy.

New keywords: Greater granularity is available in the debugging aids generated for the program through the use of new keyword options. The most interesting of these are *INPUT and *XMLSAX. When *INPUT is specified, fields that appear only on the input specifications are still read into the program's fields, allowing their content to be examined during the debug session even when the fields are otherwise unused. The *XMLSAX keyword causes an array of SAX event names to be generated into the module. These can be used to assist in the debugging of SAX event handlers.

Null-field indicators: Those who use null-capable fields in their databases will be happy to see additional support for null-field indicators. One aspect of this is the ability to specify OPTIONS(*NULLIND) on a prototype parameter. When used, the called procedure is given access to the passed DS's null-field map in addition to the data. New support is also now available for null fields when debugging. Debug support for null-capable subfields in DSs has been problematic. The new support allows null indicators to be referenced by prefixing the appropriate variable name with the string _QRNU_NULL_ (i.e., as _QRNU_NULL_variableName). For example:

EVAL _QRNU_NULL_DS1

would list the status of all null flags associated with the data structure DS1. Whereas,

EVAL _QRNU_NULL_DS1.FIELD1

would simply display the status of the null flag associated with field FIELD1 in the qualified DS DS1.

Scratching the Surface

While the number of new operation codes and BIFs added to the RPG IV language in V5R4 appears relatively small, they're powerful in function and scope, so much so that we've barely been able to scratch the surface of their usage in this article. We'll return for more explorations in future pieces.

Authors' note: We wish to thank George Farr and Barbara Morris of IBM's Toronto Laboratory for their assistance in the preparation of this article. In particular, we thank Barbara for her endless patience in providing examples and explaining the nuances of the XML support. Any errors remaining in this article are entirely ours.