Processing XML in VA Smalltalk
This FAQ describes three methods of processing XML files in Smalltalk: SAX, DOM, and mapped objects. A sample invoice illustrates the basics of XML processing: an XML invoice checker calculates the total cost of the items ordered and compares that total with the total on the invoice. After describing the processing methods, this FAQ sets up a web page where the user can enter (or paste) an XML invoice string into a window and select one of the above processing methods to check the total.
It is assumed that you are familiar with VA Smalltalk, know how to use the environment, and have had some exposure to XML. If you are unfamiliar with XML, there are many good books on the subject; XML in a Nutshell, Third Edition, by Elliotte Rusty Harold and W. Scott Means, is one good place to begin. Understanding XML is important because many other technologies, such as SOAP and WSDL, are based on it.
The version of VA Smalltalk used for this FAQ runs on Windows XP but the examples should run on any of the supported VA Smalltalk platforms.
This FAQ mirrors the chapter on processing XML in the book Building Web Services with Java: Making Sense of XML, SOAP, WSDL and UDDI by Steve Graham (Author), et al. except the samples in this FAQ are Smalltalk rather than Java based. Two of the processing methods, DOM and SAX, are handled similarly in Java and Smalltalk. With the last processing method, mapped objects, the book used JAXB (Java Architecture for XML Binding). JAXB uses a schema compiler to generate Java classes from the schema. VA Smalltalk has a set of XML goodies to generate Smalltalk classes from the schema.
The Environment
In order to run the Smalltalk examples which illustrate basic XML processing, it is necessary to load the VA: XML Support feature into the development environment.
The XML Basic Tools are used to generate Smalltalk code or XML artifacts; these tools are found in the configuration map AbxXmlBasicTools. This map depends on the feature ST: Server Smalltalk – Web Services.
In order to run the web service interface provided at the end of the FAQ, it is necessary to load the following features into VA Smalltalk:
For information on setting up and using the Web Server Interface, see the VA Smalltalk Web Connection Guide.
Bill of materials
This FAQ relies on several files which are found in the processing_xml.zip file. The basic files are invoice.xml and invoice.xsd. Smalltalk tools can generate other files or code. Specifically, the zip file contains
In order to run the Smalltalk examples which illustrate basic XML processing, the invoice XML, schema and map files must be in the current working directory or in <vas>/xml of your Smalltalk environment.
In order to run the web service interface provided at the end of the FAQ, it is necessary to place the invoice XML, schema and map files in the default resource directory, usually <vas>/xml, where <vas> is the VA Smalltalk installation directory. The invoice checker.html file should be placed in the current working directory of your Smalltalk environment.
The Smalltalk code in the processing_xml.dat file will be discussed in the context where it is needed.
The XML File
Before discussing how VA Smalltalk processes XML, this FAQ will examine the XML in more detail.
The following XML expression specifies an invoice with a three-item order, taxes of $89.89, shipping of $200.00 and a total cost of $2087.64.
<?xml version="1.0" encoding="UTF-8"?>
<item sku="318-BP" quantity="5" unitPrice="49.95">
<description>Skateboard backpack; five pockets</description>
<item sku="947-TI" quantity="12" unitPrice="129.00">
<description>Street-style titanium skateboard.</description>
<item sku="008-PR" quantity="1000" unitPrice="0.00">
<description>Promotional: SkatesTown stickers</description>
The XML data is found in invoice.xml in the processing_xml.zip file.
The Data Template
In order to ensure that the XML conforms to a structure, some sort of data template must be applied. The invoice XML refers to a schema, invoice.xsd. However, it is also possible to use a mapping file as a template.
The XML data expression has a corresponding schema (invoice.xsd in the processing_xml.zip file) which enforces the structure of the above invoice. The schema is shown below.
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns="http://www.skatestown.com/ns/invoice"
<xsd:element name="invoice" type="InvoiceType"/>
<xsd:complexType name="InvoiceType">
<xsd:element name="order">
<xsd:element name="item" type="ItemType"
<xsd:element name="tax" type="PriceType"/>
<xsd:element name="shippingAndHandling" type="PriceType"/>
<xsd:element name="totalCost" type="PriceType"/>
<xsd:complexType name="ItemType">
<xsd:element name="description" type="xsd:string"
<xsd:attribute name="sku" use="required">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d{3}-[A-Z]{2}"/>
<xsd:attribute name="quantity" use="required"
<xsd:attribute name="unitPrice" use="required"
<xsd:simpleType name="PriceType">
<xsd:restriction base="xsd:decimal">
<xsd:minInclusive value="0"/>
Looking at the schema, an invoice consists of one or more orders. Every item in an order must have sku, quantity and unitPrice attributes. The tax, shippingAndHandling, and totalCost elements must come after all the items in the order, and in the sequence shown above.
The example uses a schema rather than a DTD to specify the structure of the data. The industry tends to use schemas because schemas support namespaces whereas DTDs do not. Namespaces are important for two reasons:
XML Notepad, downloadable from Microsoft, can verify the invoice XML against the schema. Simply load the XML into XML Notepad and note the lack of errors. Moving totalCost above shippingAndHandling will generate an error.
Mapping File
Another way of specifying the structure of XML input is as a mapping in a MAP file. The map was originally generated from the schema using the XML Basic Tools as shown below.
The following are the contents of the map. You can copy and paste them into a file named invoice.map and place it in the current working directory or in <vas>/xml.
<?xml version="1.0"?>
<!DOCTYPE XmlMappingSpec SYSTEM "abtxmap.dtd" >
<!-- Generated by VisualAge Smalltalk goodie on 2007-12-07- -->
<XmlMappingSpec Name="invoice.map">
<!-- Mapping for element 'InvoiceType' -->
<ClassTypeMapping TypeName="invoice" ClassName="InvoiceType">
<AttributeMapping ClassAttribute="totalCost">
<AttributeMapping ClassAttribute="shippingAndHandling">
<AttributeMapping ClassAttribute="tax">
<AttributeMapping ClassAttribute="order">
<!-- Mapping for element 'ItemType' -->
<ClassTypeMapping TypeName="item" ClassName="ItemType">
<AttributeMapping ClassAttribute="sku">
<AttributeMapping ClassAttribute="quantity">
<AttributeMapping ClassAttribute="description">
<AttributeMapping ClassAttribute="unitPrice">
This map describes a structure in which an Invoice is comprised of one or more orders. The invoice holds the tax, shippingAndHandling, and totalCost. Every item in an order has description, sku, quantity and unitPrice attributes.
The map is used in mapped XML processing and is generated by the Smalltalk tools. During parsing, the XML is actually converted into Smalltalk objects.
For convenience, the mapping is included in the processing_xml.zip file as invoice.map.
Adding Data Checking
A schema or mapping file can only verify that the XML conforms to a particular structure. A specialized XML handler can be used to verify that the data values within the XML have the proper relationship. The handler must respond to some API initiating the data checking process and answer success or failure.
The processing of the invoice involves totaling the items ordered, adding in the tax and shipping and handling, and comparing that total with the total on the invoice. If the totals do not match, an error message (string) is returned. There are more sophisticated ways to handle errors such as creating your own ExError error handler but that is beyond the scope of this FAQ.
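The arithmetic behind the check can be sketched in a workspace, independently of any XML parsing. The following is only an illustration using the values from the sample invoice; the variable names are hypothetical, and the amounts are held as whole cents (an assumption made here to sidestep Float rounding, not necessarily what the invoice checkers do).

```smalltalk
| items itemsTotal tax shipping statedTotal computedTotal |
"Each entry holds a quantity followed by a unit price in cents."
items := #(#(5 4995) #(12 12900) #(1000 0)).
itemsTotal := items
    inject: 0
    into: [:sum :each | sum + ((each at: 1) * (each at: 2))].
tax := 8989.            "$89.89"
shipping := 20000.      "$200.00"
statedTotal := 208764.  "$2087.64, the total recorded on the invoice"
computedTotal := itemsTotal + tax + shipping.
"Answer a message string, in the spirit of the checkers' error strings."
computedTotal = statedTotal
    ifTrue: ['Invoice total is correct']
    ifFalse: ['Invoice total should be ', computedTotal printString, ' cents']
```

Displaying the last expression answers 'Invoice total is correct', since 24975 + 154800 + 0 + 8989 + 20000 = 208764.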
The processing of the XML invoice is the responsibility of one of three Smalltalk classes written for this FAQ: InvoiceCheckerDOM, InvoiceCheckerSAX, or InvoiceCheckerMappedObject. Depending on the flavor of XML processing, the proper invoice checker is selected to initiate the checking process via the common API method, #checkInvoice.
All three XML handlers can be found in MyAbtXmlParserApp in the processing_xml.dat file. If you want to exercise the handlers you must import MyAbtXmlParserApp and MyAbtXmlSchemaClassesApp from processing_xml.dat and load them into the image. Alternatively, you may wish to recreate MyAbtXmlSchemaClassesApp from first principles and use it rather than the version provided.
JAXB in Java is equivalent to mapped objects in VA Smalltalk. The InvoiceCheckerMappedObject XML handler is equivalent to JAXB.
XML Processing
Processing XML can be done with only a well formed XML expression. Processing is greatly facilitated by
This FAQ considers three kinds of XML processing: SAX, DOM and mapped object (JAXB) processing. The first two use a schema to apply structure; the last uses a map file.
DOM Processing
DOM is the easiest to use method of XML processing, but DOM is memory inefficient because the parser reads the entire XML file into memory and creates a tree. According to the book, XML in a Nutshell, some companies process XML files that are at least a gigabyte in size. Such large XML files would not be good candidates for DOM processing. For this small example though, DOM works perfectly.
The Data Template
Since DOM processing involves reading in an entire XML file, it is possible to look at the structure. The following workspace refers to invoice.xml which holds the XML data described in XML File. The expression generates an AbtDOMDocument from the XML input. As the AbtDOMDocument represents the XML in tree form, it will be called a DOM tree for the rest of the FAQ.
| parser |
parser := AbtXmlDOMParser newNonValidatingParser.
^parser parseURI: 'invoice.xml'
Copy the Smalltalk expression to a workspace and inspect it. You should see a DOM tree whose contents reflect the XML input: a single three-item order with tax, shipping and handling and a total cost. At this point, the total cost has not been verified.
To examine the nodes, execute the expression self getElementsByTagName: '*' in the inspector on the DOM tree. What is returned is a list of nodes in the DOM tree in the order dictated by the schema file.
<item sku="318-BP" quantity="5" unitPrice="49.95">
<item sku="947-TI" quantity="12" unitPrice="129.00">
<item sku="008-PR" quantity="1000" unitPrice="0.00">
<totalCost> )
The above is the string representing the list of nodes. The tree structure is not maintained, but the order does match the order of the tags in the XML input string.
Data Checking - InvoiceCheckerDOM
The invoice checker first parses the XML data into a DOM tree and then retrieves nodes from the DOM tree in order to check the validity of the XML data passed to it. The method getElementsByTagName: is used by the DOM invoice checker to find nodes with known names. For the DOM invoice checker, the nodes representing the tax, shipping-and-handling and total cost of the invoice contain the required values to confirm the validity of the total cost recorded in the invoice.
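As a rough illustration of retrieving nodes by known names, the workspace sketch below counts the elements found for each tag of interest. It uses only the parser expressions shown earlier in this FAQ; it assumes that getElementsByTagName: answers a collection that responds to size, and that invoice.xml can be found by parseURI: as described in the bill of materials.

```smalltalk
| parser dom |
parser := AbtXmlDOMParser newNonValidatingParser.
dom := parser parseURI: 'invoice.xml'.
"Report how many nodes carry each of the names the checker looks for."
#('item' 'tax' 'shippingAndHandling' 'totalCost') do: [:tagName |
    Transcript cr;
        show: tagName;
        show: ': ';
        show: (dom getElementsByTagName: tagName) size printString]
```

For the sample invoice one would expect three item nodes and one node for each of the other names; how the values are then read out of those nodes is left to InvoiceCheckerDOM itself.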
You can exercise the data checker with the following workspace:
InvoiceCheckerDOM example1 inspect. "good data"
InvoiceCheckerDOM example2 inspect. "bad data"
InvoiceCheckerDOM example3 inspect. "really bad data"
Looking forward to using this in a web page, the invoice checker must be able to detect and handle parsing errors. The following Smalltalk expression illustrates how to handle errors in the DOM invoice checker.
[ self domTree: (parser parse: source) ]
when: SgmlExceptions::SgmlException | ExError
do: [ :aSignal | self invoiceError: true ].
SAX Processing
SAX is a little harder to program because of the methods involved but it is very memory efficient because elements are read into memory one at a time, not the entire file.
SAX processing involves the parser making callbacks into your code. SAX is an API description, and developers can use the SAX API callbacks in a subclass as needed. The content handler is one such callback; it is called when the SAX parser has a token such as an element.
The Data Template
Unlike DOM processing, which returns a DOM tree of the entire invoice, SAX parsing does not return any comparable object. Hence, the technique of traversing the DOM tree to retrieve the tax, shipping and handling and total cost of the invoice is not applicable for the SAX invoice checker. Instead, the invoice checker, InvoiceCheckerSAX, employs a specialized content handler, InvoiceCheckerSAXHandler.
InvoiceCheckerSAXHandler customizes behavior in the default SAX XML handler, which is responsible for interpreting the elements within an XML string. Here, it collects data needed to validate, i.e. the tax, shipping-and-handling and total cost. In addition to this function, InvoiceCheckerSAXHandler maintains a running total of the items in the invoice.
The workspace below illustrates how the XML data is parsed and how the specialized SAX handler is used to extract the items required to validate the XML data. The file invoice.xml must be in the <vas>/xml directory.
| file parser handler resolver total runningTotal |
handler := InvoiceCheckerSAXHandler new.
resolver := AbtXmlSaxDOMHandler new.
parser := AbtXmlSaxParser newNonValidatingParser
errorHandler: handler;
contentHandler: handler;
entityResolver: resolver.
[ parser parseURI: 'invoice.xml' ]
when: SgmlExceptions::SgmlException | ExError
do: [ :aSignal |
self invoiceError: true.
Transcript cr; show: aSignal argument printString ].
Copy the Smalltalk expression to a workspace and inspect it. You should see an InvoiceCheckerSAXHandler which holds the actual total recorded in the XML data and the calculated total of all items inclusive of tax and shipping-and-handling total cost. At this point the information is in place to verify the XML data.
Data Checking - InvoiceCheckerSAX
While the SAX invoice checker, InvoiceCheckerSAX, is parsing the XML data, it checks for parsing errors and collects the pertinent information in the specialized SAX handler, InvoiceCheckerSAXHandler. Once this is done, the invoice checker only needs to validate the total invoice amount.
Use the following workspace to exercise the SAX invoice checker.
InvoiceCheckerSAX example1. "Good data "
InvoiceCheckerSAX example4. "Bad data "
InvoiceCheckerSAX example2. "Really bad data "
The SAX invoice checker handles errors by simply returning a string. To get SAX to use a custom error handler, do the following:
| handler errorHandler parser |
handler := InvoiceCheckerSAXHandler new.
errorHandler := MyAbtXmlErrorHandler new.
parser := AbtXmlSaxParser newNonValidatingParser
errorHandler: errorHandler;
contentHandler: handler.
Looking forward to using this in a web page, the invoice checker detects and handles parsing errors rather than generating a walkback. The following Smalltalk expression illustrates how errors are handled in the SAX invoice checker.
[ parser parse: source ]
when: SgmlExceptions::SgmlException | ExError
do: [ :aSignal | self invoiceError: true ].
Mapped Objects
JAXB processing involves converting from XML to Java objects. Smalltalk does not have an exact counterpart. However, the techniques outlined in this section have the same effect of converting an XML string to a true object as JAXB processing has in Java.
The Data Template
In developing a moral equivalent to JAXB, it was necessary to construct a MAP file rather than a schema to describe the data template.
The Smalltalk classes corresponding to complex data types in the invoice schema are InvoiceType, ItemType and PriceType. They belong to MyAbtXmlSchemaClassesApp. Prefabricated classes are found in MyAbtXmlSchemaClassesApp in the processing_xml.dat file in processing_xml.zip. To use them, simply import and load. To recreate them using the XML Basic Tools, refer to the instructions below.
The two important Smalltalk classes are ItemType and InvoiceType. Looking at the invoice XML at the beginning of this FAQ, an ItemType object is created for each order item element in the invoice. ItemType holds the sku, quantity, and unitPrice. The InvoiceType class, which corresponds to the invoice, i.e. the root element, holds the order, the tax, shipping and handling, and total.
The following Smalltalk expression converts the XML invoice into a DOM tree. To explore the tree, inspect the expression, then execute the expression self getElementsByTagName: '*' in the inspector.
AbtXmlDOMParser newNonValidatingParser parseURI: 'invoice.xml'.
Data Checking - InvoiceCheckerMappedObject
The mapped object invoice checker uses a DOM tree to represent both the mapping and the XML data. The method getElementsByTagName: is used by the mapped object invoice checker to retrieve all DOM elements in the DOM tree generated from the XML data. The wild card parameter facilitates retrieval of all DOM elements. The DOM element names are used in the invoice checker to extract the corresponding Smalltalk class. Two classes are of interest: InvoiceType and ItemType; the former holds the tax, shipping-and-handling and total cost of the invoice, the latter contains the unit price and number of units purchased. Together they contain the required values to confirm the validity of the total cost recorded in the invoice.
The workspace below captures the activity of the mapped object invoice checker as it converts the data from XML to Smalltalk objects. Here, dom is a DOM tree constructed from parsing invoice.xml. The workspace assumes that the invoice map and XML files are in the xml directory. The DOM elements are converted into Smalltalk objects whenever a mapping is specified. There is no relationship between the converted Smalltalk objects even though there might have been between portions of the XML data.
| mappingDOM mappingSpec dom root nodeNames coll |
"Create a mapping spec from the invoice map"
mappingDOM := AbtXmlDOMParser newValidatingParser
    parseURI: 'invoice.map'.
mappingSpec := AbtXmlMappingSpec fromMappingDOM: mappingDOM.
"Parse the data"
dom := AbtXmlDOMParser newNonValidatingParser
    parseURI: 'invoice.xml'.
root := dom rootTag.
"Get unique node names from the DOM document"
nodeNames := ((root getElementsByTagName: '*')
    collect: [:e | e name]) asSet.
nodeNames add: dom rootTag name.
"Convert DOM elements to Smalltalk objects,
 i.e. unmarshal the data."
coll := OrderedCollection new.
nodeNames do: [ :name |
    | objColl |
    objColl := dom mapElements: name using: mappingSpec.
    objColl do: [ :obj |
        obj notNil ifTrue: [ coll add: obj ] ] ].
coll inspect.
The do: loop iterates over the DOM tree elements and collects the objects corresponding to DOM elements with a mapping specification. In the case of the invoice.xml, coll contains an ItemType for each item in the order and an InvoiceType which contains the shipping charges, taxes and total.
Checking the data involves getting the total cost for each item in the order, adding in the tax and shipping and handling from the invoice, and comparing the result to the total held by the invoice.
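The check just described might be sketched as follows, evaluated in the same workspace after coll has been filled. This is an illustration under stated assumptions, not the code of InvoiceCheckerMappedObject: it assumes that ItemType implements totalPrice (described under Generating the Mapping Target Classes), that the generated accessors tax, shippingAndHandling and totalCost answer numbers, and it compares rounded cents, which is a choice made here to avoid a false mismatch from Float rounding.

```smalltalk
"Sketch only: relies on coll from the workspace above."
| items invoice computed |
items := coll select: [:each | each isKindOf: ItemType].
invoice := coll detect: [:each | each isKindOf: InvoiceType].
"Sum the line items, then add the tax and the shipping and handling."
computed := (items inject: 0 into: [:sum :item | sum + item totalPrice])
    + invoice tax
    + invoice shippingAndHandling.
"Answers true when the computed and recorded totals agree to the cent."
(computed * 100) rounded = (invoice totalCost * 100) rounded
```

If the mapped attributes arrive as strings rather than numbers, they would need converting before the arithmetic.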
You can exercise the data checker with the following workspace:
InvoiceCheckerDOM example1 inspect. "good data"
InvoiceCheckerDOM example2 inspect. "bad data"
InvoiceCheckerDOM example3 inspect. "really bad data"
For more information, review the ‘mapping specification classes’ section in the VA Smalltalk User’s Guide.
The next step is to get the web page to talk to the Smalltalk objects via the WSI interface.
Invoice Checker Web Interface
With the underpinnings of XML processing explained, it only remains to put a face on the invoice checker.
The interface is built using
In order to run the invoice checker web interface, it is necessary to first
You can import and load MyAbtXmlWebSamplesApp from processing_xml.dat, or you can generate InvoiceCheckerForm and InvoiceCheckerWrapper with the instructions below. For more information on WSI, see the Web Connection Guide.
Testing the WSI Interface
The next step is to start the WSI interface. This is done from the Transcript menu Tools -> Open Web Server Interface Monitor. Press the "Start a WSI Server" button and start a WSI server with the sst-http transport on port 8081.
Go to the menu System Transcript -> Tools -> Open Web Browser Launch Options and change Default Prefix from to Specify the internet browser you would like to use. Dismiss the dialog with ‘OK’.
Now it is time to launch the InvoiceCheckerForm. Do this from the Organizer. Select InvoiceCheckerForm and use the context menu item ‘Test’. Testing the InvoiceCheckerForm should open a web browser with the web page pictured at the beginning of this section. If the initial screen is blank, cut and paste the invoice.xml into it to start.
The web interface is below:
VA Smalltalk has two parsers, SAX and DOM. Either can be instantiated as a validating or a non-validating parser.
The third way to process XML is JAXB (Java Architecture for XML Binding). JAXB compiles a schema and creates Java classes. In VA Smalltalk, the XML Basic Tools, as shown above, use a schema file to generate VA Smalltalk classes and a map file. This map file is used for mapping XML to VA Smalltalk objects.
Building the WSI
Generate HTML
The first task is to create the HTML (invoice checker.html in the processing_xml.zip file) and load it into a web browser to see how it looks. The HTML is below:
<HEAD><TITLE>Invoice Checker</TITLE></HEAD>
<h1>Invoice Checker</h1>
<p>This example implements a web form driver for the SkatesTown's invoice
checker. You can modify the invoice on the form if you wish (the default
is modified from Chapter 2), select a SAX, DOM, or Mapped parser and
click the 'Check Invoice' button to perform a check on the invoice
<FORM action="InvoiceCheckerForm" method="POST">
<TEXTAREA NAME="xml" ROWS="20" COLS="90"><% xmlString %></TEXTAREA>
Select parser type:
<INPUT NAME="parserType" type="RADIO" value="SAX" CHECKED> SAX
<INPUT NAME="parserType" type="RADIO" value="DOM"> DOM
<INPUT NAME="parserType" type="RADIO" value="Mapped"> Mapped
<INPUT NAME="SubmitButton" type="SUBMIT" value="Check Invoice ">
<% sessionTagString %>
Generate the wrapper
The next step is to get the web page to talk to the Smalltalk objects. This sample uses the WSI interface, described in the Web Connection User's Guide.
To generate the wrapper part, use VA Organizer, Parts >> generate >> HTML File wrapper. You need only specify the HTML file name (here, invoice checker.html) and the name for the part. Clicking ‘Generate’ will generate the HTML wrapper part.
Generate the form
To create an InvoiceCheckerForm part, create an HTML form part of type ‘web connection’ by
adding a new Web Connection part named InvoiceCheckerForm inheriting from AbtHtmlFileWrapper in the Organizer and then
Generating the mapping file
XML Basic Tools can generate a mapping from a schema. The Smalltalk expression is as follows:
"Translate the schema into an equivalent mapping file"
AbxXmlSchemaToMappingFile new
createMappingFileNamed: 'invoice.map'
from: 'invoice.xsd'.
The tool generates a first approximation of the mapping. Examine the file. You will see:
<XmlMappingSpec Name="invoice.map" NameSpaceURI="http://www.skatestown.com/ns/invoice">
<!-- Mapping for element 'InvoiceType' -->
<ClassTypeMapping TypeName="InvoiceType" ClassName="InvoiceType">
<!-- Mapping for element 'ItemType' -->
<ClassTypeMapping TypeName="ItemType" ClassName="ItemType">
You must remove the NameSpaceURI attribute and change the type names to match the names which appear in the XML data: InvoiceType to invoice and ItemType to item. Once created, this file should be placed in the <vas>/xml directory or the current working directory.
Generating the Mapping Target Classes
It is often helpful to construct Smalltalk equivalents to complex data types specified in a schema. XML Basic Tools has the capability to construct the bare bones of such classes from a schema. The class names are chosen to be as close as possible to the data types in the schema while adhering to the requirements for naming a Smalltalk class.
Starting with the invoice schema, the XML Tools creates a set of Smalltalk classes, one for each complex type. In the invoice schema file, the type attribute had names like ‘itemType’, ‘invoiceType’ and ‘priceType’. Since the type attribute is used to name the corresponding Smalltalk classes, these are changed to ‘ItemType’, ‘InvoiceType’ and ‘PriceType’.
The classes must reside in an Application or SubApplication. In order to create or modify the classes, the Application must be open. For this example, an application named MyAbtXmlSchemaClassesApp was created.
The following code creates classes from the schema, replacing any classes it generates.
"First create or open a new edition of MyAbtXmlSchemaClassesApp
and make it the default application
Make sure the schema file is located in the <vas>/xml directory. "
AbxXmlSchemaToClass new
createClassesFrom: 'invoice.xsd'
in: MyAbtXmlSchemaClassesApp
replace: true
The above code only creates a class definition subclassed off Object and functioning accessor methods. In general there may be many additions to this class which would benefit the application.
For this example, the following method should be added to the generated classes as well, to facilitate communication with the data checker:
totalPrice (on ItemType), which answers the quantity * unitPrice.
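A minimal sketch of such a method on ItemType is given here. It assumes that the generated accessors quantity and unitPrice answer numbers; if the mapping leaves them as strings, they would have to be converted first.

```smalltalk
totalPrice
    "Answer the extended price of this line item.
     Assumes quantity and unitPrice answer numbers."
    ^self quantity * self unitPrice
```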
Finally, before using mapped processing to validate invoice data, open the composition editor on each of the generated classes and make the instance variables attributes using the public interface editor. (Adding each instance variable with the defaults is adequate.) When updating the published interface, do not forget to signal in the set selector when an attribute has changed. Save the part. If this is not done, instances created from an XML file will not have any values in them.