Saturday, November 17, 2007

Extreme XML Programming

I am a proponent of agile development methodologies such as Extreme Programming and Scrum. These methodologies are based on practices such as user stories, iteration (sprint) planning, pair programming, test-first unit testing, refactoring, continuous integration, and acceptance testing. Agile programming helps create better software that is also easier to maintain.

Test-Driven Development (TDD) is a well-known practice in the Java EE world, supported by unit testing and mocking frameworks such as JUnit, EasyMock, and JMock. With TDD, you always write the test code before you write the functional code itself. This simple principle can also be applied when developing applications with XML-related languages such as XSLT 2.0, XSL FO, and XQuery. Both XSLT 2.0 and XQuery 1.0 are strongly typed languages with a rich built-in function library. They can also be used to create libraries of custom functions that perform very complex business logic. Adopting a TDD methodology can increase the quality of your XSLT 2.0 or XQuery code and make the code easier to maintain as you refactor and implement new requirements.

The following are my favorite unit testing frameworks:

  1. Tennison-test is an XSLT unit test framework that allows you to write your unit tests in XML and to automate their execution as part of a build and continuous integration process based on Ant.
  2. XMLUnit 1.1 for Java allows you to make assertions about the differences between two XML documents, the result of an XSLT transformation, the evaluation of an XPath expression, and the validity of an XML document.
  3. Apache FOP provides a LayoutEngineTestSuite which can be used to check against the "Area Tree XML" generated by FOP's XMLRenderer.
  4. Schema-aware XSLT 2.0 and XQuery can help root out errors early. With XML Schema 1.1's assertions and conditional type assignments, this will get even more interesting.
  5. ISO Schematron can be used to write XPath 2.0-based assertions about the output of your XSLT or XQuery program.
  6. xchecker is an interesting testing framework based on XML Schema, XPath 2.0, XSLT 2.0, XQuery, and Relax NG.
  7. xSpec is a Behavior Driven Development (BDD) framework by Jeni Tennison for XSLT testing.
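To make the test-first idea concrete, here is a minimal sketch in Python rather than in any of the frameworks listed above. The `to_html_list` function, the `steps`/`step` element names, and the `same_xml` helper are all hypothetical and only serve the illustration. The test states the expected output first and compares canonical XML, which is similar in spirit to the document comparisons XMLUnit offers.

```python
import unittest
import xml.etree.ElementTree as ET


def to_html_list(source_xml: str) -> str:
    """Hypothetical transformation under test: <steps><step/></steps> becomes <ol><li/></ol>."""
    source = ET.fromstring(source_xml)
    ol = ET.Element("ol")
    for step in source.findall("step"):
        li = ET.SubElement(ol, "li")
        li.text = (step.text or "").strip()
    return ET.tostring(ol, encoding="unicode")


def same_xml(left: str, right: str) -> bool:
    """Compare two XML documents after canonicalization, ignoring surrounding whitespace in text."""
    return ET.canonicalize(left, strip_text=True) == ET.canonicalize(right, strip_text=True)


class ToHtmlListTest(unittest.TestCase):
    # In a test-first workflow, this test is written before to_html_list exists.
    def test_steps_become_list_items(self):
        actual = to_html_list("<steps><step> Open panel </step><step>Remove cover</step></steps>")
        expected = "<ol>\n  <li>Open panel</li>\n  <li>Remove cover</li>\n</ol>"
        self.assertTrue(same_xml(actual, expected))
```

The test can be run with `python -m unittest`. `ET.canonicalize` requires Python 3.8 or later.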

XML IDEs today let you run XSLT and XQuery code in debug mode by setting breakpoints, stepping through the code one line at a time, and inspecting the values of variables. It would be nice if they could also integrate an XSLT/XQuery unit test framework, the way the Eclipse IDE integrates JUnit.

Saturday, November 3, 2007

Aviation Data Management in the Web 2.0 Era

I will make a presentation at the XML 2007 Conference in Boston on December 5. The title of my presentation is: RESTful IDEAS. IDEAS stands for Integrated Documentation Environment for Aircraft Support. Based on AtomPub and OpenSearch, the IDEAS Framework enables federated searches of technical content and updates via web feeds. This presentation is essentially about how Web 2.0 innovations can be leveraged to create a cost-effective, efficient, and massively scalable environment for aggregating and publishing up-to-date technical documentation to end users in the aviation industry.

So what exactly is Web 2.0? Does it mean anything at all? For me Web 2.0 represents new web functionalities such as:

  • Content aggregation and syndication using technologies like RSS and Atom
  • Social networking (e.g. LinkedIn and Facebook)
  • User generated content with videos, pictures, blogs, wikis, and podcasts (e.g. YouTube and Flickr)
  • Mashups or the ability to merge data from different sources (e.g. Google Maps and Yahoo Pipes)
  • Rich Internet Applications (RIA) using user interface technologies such as Flex and AJAX
  • Web Services and web-based APIs particularly those using RESTful protocols such as AtomPub

Those are the types of functionalities that today's web users expect from web applications, and the consumers of aviation technical documents are no exception.

Atom and AtomPub are already playing a fundamental role in the Web 2.0 world. The Google Data API, which also covers Google Documents, is based on AtomPub and OpenSearch. This week, a group of social networking sites including hi5, LinkedIn, MySpace, Ning, Orkut, and XING released OpenSocial, a common set of APIs for developing social applications across multiple websites. OpenSocial is also based on AtomPub.

So how can the aviation industry embrace and extend Web 2.0 innovations to solve the challenge of aggregating and presenting up-to-date technical data from multiple content sources? I will be sharing my thoughts.

See you in Boston!

Saturday, October 20, 2007

S1000D and SCORM Integration

I gave a presentation yesterday on S1000D and SCORM integration at the Doctrain 2007 conference in Lowell, MA. The main goal of integrating these two specifications is to reduce product life cycle costs and eliminate redundancies by streamlining business processes across the documentation and training functions. I noted that there are many opportunities for data reuse at every phase of the product life cycle including: concept, design, manufacturing, assembly, testing, delivery, and support. Documentation and training belong to the support phase and are often the only departments where content is captured in XML. This is due to the complexity and cost of current specialist XML authoring tools. XForms will allow knowledge workers to contribute knowledge assets in XML at every phase of the product lifecycle with a simple web form. I also believe that Office Open XML (OOXML) offers the opportunity to extract some value out of MS Office documents by exposing their contents to XML processing languages and tools such as XSLT 2.0 and XQuery.

Engineering data should be the trusted source of data for both publications and training. For example, product model data and engineering drawings can be used to create simulations for training, and manufacturing assembly instructions can be used to create installation procedures for publications. Any data reuse strategy should look beyond training and publications to identify ways to reuse data and streamline processes across the entire product lifecycle.

The integration of SCORM and S1000D presents both management and technical challenges. Since training and documentation are often two separate functions within the enterprise, the integration can have an impact on budget, processes, roles, and the organizational structure. A cross-functional integrated project team is a good way to address these issues. The success of the integration also requires commitment and support from top leadership. It is also important to address technical challenges such as the integration of existing content management systems (CMS) and learning management systems (LMS).

The first technical approach is to create dual-purpose data modules (DMs). I highlighted the importance of clearly defining and documenting business rules, particularly for dual-purpose data modules. The business rules should specify, among other things, the appropriate level of granularity and the language style (e.g. Simplified English). These business rules should be validated with technologies such as ISO Schematron and Simplified English checkers. Since S1000D is weak on learning content metadata, the IEEE Learning Object Metadata (LOM) specification should be used to add learning object metadata to the dual-purpose S1000D DMs. It is also possible to package all training data modules as a training publication module (PM). XSLT is then used to transform the S1000D DMs into SCORM sharable content objects (SCOs) and learning assets. The S1000D metadata (IDSTATUS) should be retained in the resulting SCOs to facilitate product applicability filtering when the SCOs are presented to learners by an LMS. The S1000D PM can be used to generate the SCORM manifest as well. However, the dual-purpose S1000D DM approach does not always support complex learning interactions and good instructional design principles.

The second approach is to give the instructional designer complete freedom to design an effective learning experience. All elements in the S1000D data modules that are reusable in SCOs are assigned a unique ID. Examples are paragraphs, steps, warnings, cautions, notes, and tables. This can be done automatically using the XSLT generate-id() function. The instructional designer then searches the CSDB to find and display relevant DMs. She can then use XInclude/XPointer to include reusable elements from the DM in the SCO. When this is done, the SCO is automatically updated when the DM is updated.
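The ID assignment step can be sketched in a few lines. The example below is written in Python instead of XSLT and is only an illustration: the element names in `REUSABLE`, the `assign_ids` function, and the `dm1` prefix are assumptions, not part of S1000D. It uses a deterministic counter so that the IDs stay the same across runs as long as the document order does not change, which matters when SCOs point at these IDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of reusable element names; a real project would derive these from its business rules.
REUSABLE = {"para", "step", "warning", "caution", "note", "table"}


def assign_ids(dm_xml: str, prefix: str = "dm1") -> str:
    """Add an id attribute to every reusable element that does not already have one."""
    root = ET.fromstring(dm_xml)
    counter = 0
    for element in root.iter():
        if element.tag in REUSABLE:
            counter += 1
            # Existing ids are kept so that links already pointing at them do not break.
            element.attrib.setdefault("id", f"{prefix}-{element.tag}-{counter:04d}")
    return ET.tostring(root, encoding="unicode")
```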

Future versions of the S1000D specification will incorporate change proposal forms (CPFs) that will facilitate the integration of SCORM and S1000D content.

A copy of my presentation is available here.

Sunday, September 23, 2007

Guidance for the Paperless Cockpit

One of the interesting applications of the Electronic Flight Bag (EFB) is electronic documents. Electronic documents allow aircraft operators to amend the manufacturer's flight operations manuals based on the operator's policies and procedures and to publish these manuals in electronic formats such as Adobe® Portable Document Format (PDF) and XML. Examples of these manuals are:

  • Flight Crew Operating Manual (FCOM)

  • Quick Reference Handbook (QRH)

  • Flight Crew Training Manual (FCTM)

  • Minimum Equipment List (MEL)

  • Fault Reporting Manuals (FRM)

  • Weight and Balance Manual

  • Dispatch Deviations Guide


US Federal Aviation Administration (FAA) Advisory Circular (AC) 120-76A, “Guidelines for the Certification, Airworthiness, and Operational Approval of Electronic Flight Bag Computing Devices”, specifies the design and technical criteria for the approval of the human/machine interface of EFB systems. The following is an excerpt from the EFB Operational Evaluation and Approval Job Aid used by FAA inspectors to evaluate electronic document functionality:

  • Is there a training program on how to display and interact with electronic documents? Is it adequate?
  • Can the crews find the material they are looking for?
  • Is the information organized in a way that makes sense to the crews?
  • Is the information arranged in a consistent way on the screen so that the crews know where to look for specific types of information?
  • Is it obvious when text is out of view? Is it easy to bring that text into view?
  • Can the crew tell where they are in relation to the full document?
  • Can the crew tell where they are in relation to the section of the document they are currently viewing?
  • Is the text of the document easy to read on the screen?
  • Is white space used to separate short main sections of text?
  • Is high priority information especially easy to read?
  • Are tables readable and usable?
  • How are especially long and complex tables handled?
  • Are figures readable and usable?
  • Can the entire figure be viewed at one time?
  • Can the crew zoom in to read details on the figure?
  • Is it easy to move quickly to specific locations (e.g., to the beginning of a section, or to recently visited locations)?
  • Are active regions (e.g., hyperlinks) clearly indicated?
  • Is it easy to move between documents quickly?
  • Is it easy to tell what document is currently in view?
  • Is there a list of available documents to choose from?
  • Can crews search the document electronically?
  • Is the search technique adequate?
  • If animation is supported, does the crew have adequate control over it?
  • Can the crew start and stop the animation as needed?
  • Is there a text description of the animation that describes its contents (so the crews know its contents without running the segment)?
  • Is printing supported? If so, is it adequate?
  • Can crews select a portion of a document to be printed?
  • Is the hard copy usable?
  • Can the crew terminate a print job immediately, if necessary?

These criteria were developed as the result of research into human factors in the use of electronic documents in EFBs by the Human Factors Division of the Office of Aviation Programs at the Volpe National Transportation Systems Center. Knowing these criteria in advance can help an aircraft operator prepare for approval. However, I believe that operators can benefit from a more detailed set of specifications for the interface to electronic documents. Section 6.3.1 of the S1000D standard provides rules and guidance for the look and feel of, and the printed output from, an Interactive Electronic Technical Publication (IETP). Section 6.4.1 defines a functionality matrix for IETPs to be used as an aid for defining requirements for S1000D projects. The functionality matrix leverages the US Department of Defense's (DoD) long experience in defining class 1 to 5 IETMs with military specifications MIL-PRF-87268 and MIL-PRF-87269. For example, in the area of searching, the S1000D functionality matrix provides very detailed guidelines that go beyond the simple criteria "Can crews search the document electronically?" and "Is the search technique adequate?". The matrix breaks down search functionality into:

  • Full-text search
  • User-defined Boolean search
  • Search across multiple databases and files
  • Context search
  • Keyword search

Publishing EFB electronic documents in XML provides many benefits over the Adobe® PDF format. Key enabling technologies for XML-based EFB electronic documents are: ISO Schematron, XSLT, XSL FO, XLink, XPointer, XInclude, and XQuery. For quality assurance, the electronic documents application should be subjected to rigorous unit testing and functional testing before its release to flight crews. A content management system can help an operator by providing features such as workflow routing, versioning, document locking, access control, and full audit trail of modifications made to documents.

The Air Transport Association (ATA) has adopted S1000D as the next generation aircraft digital data standard and there is already a very close collaboration between the ATA and the S1000D TPSMG to harmonize commercial aviation technical data requirements with S1000D. That collaboration should be extended to electronic documents for EFBs to allow aircraft operators to leverage and influence the development of the S1000D IETP functionality matrix for better guidance on creating the paperless cockpit.

Sunday, September 2, 2007

The Business Value of XML

What exactly is the business value of XML and its related technologies such as XML Schema, ISO Schematron, XForms, XSLT, and XQuery? Let's review three use cases where XML adds real value.

The first use case is software configuration and metadata. Java EE frameworks such as Hibernate, Spring, Struts, and JSF use XML for configuration and metadata. Unfortunately, not all Java EE developers see the value of configuring their applications with XML as opposed to Java annotations. There is currently a backlash against XML, with some developers complaining about what they call "XML hell". These developers prefer to keep configuration and metadata closer to the code itself using Java annotations. Newer Java EE frameworks such as JPA (the Java Persistence API) and Struts 2 provide annotation capabilities that are very popular with developers. I personally believe that this is a trend in the wrong direction. Annotations are certainly convenient for developers, but not necessarily for the end users of your application. With XML configuration, end users of your software who are not programmers can achieve a certain level of customization on their own by simply editing an XML configuration file in a text editor, without importing your SDK into an IDE and compiling Java code, or hiring a Java EE developer to do it for them. As a software buyer deciding between two competing products, I would choose (everything else being equal) the one that allows me to do some customization with simple XML files. Developers can reduce the verbosity of their XML configuration files with techniques such as inheritance, overridable default values, and preferring attributes over child elements. A compromise could be to make the annotations overridable with XML configuration.
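The "overridable defaults" idea can be sketched as follows. This is a minimal illustration in Python rather than a Java EE framework feature; the setting names, the `settings` root element, and the `load_settings` function are all hypothetical. Defaults live in code (where annotations would otherwise carry them), and a non-programmer overrides only what differs by editing attributes in a small XML file.

```python
import xml.etree.ElementTree as ET

# Defaults that would otherwise live in code or annotations; all names are hypothetical.
DEFAULTS = {"page-size": "A4", "language": "en", "show-watermark": "false"}


def load_settings(config_xml: str) -> dict:
    """Return DEFAULTS overridden by any attributes found on the <settings> root element."""
    settings = dict(DEFAULTS)
    root = ET.fromstring(config_xml)
    for name, value in root.attrib.items():
        if name not in DEFAULTS:
            # Reject typos early instead of silently ignoring them.
            raise ValueError(f"unknown setting: {name}")
        settings[name] = value
    return settings
```

With this sketch, a configuration file as short as `<settings language="fr"/>` changes the language and leaves every other default in place, which keeps the XML terse.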

The next use case is data exchange across organizations. Two example applications that are currently delivering real value are UBL and NIEM. XML vocabularies such as UBL and NIEM define common semantics and data structures through data dictionaries and XML schemas respectively. In addition, they can specify certain business rules that can be enforced with an assertion-based schema language such as ISO Schematron.

Developed by OASIS, UBL (Universal Business Language) is an XML vocabulary for the exchange of business documents such as invoices, purchase orders, and receipts. In Denmark, the government has mandated the use of UBL invoices for all public-sector billing. The result is over 100 million euros in savings every year. The Swedish government estimates that it can save 440 million euros with the adoption of UBL for electronic commerce. Please note that these initiatives involve not only big government and Fortune 500 companies, but hundreds of thousands of SMEs (small and medium-sized enterprises) as well.

NIEM (National Information Exchange Model), developed by the U.S. Department of Justice and the Department of Homeland Security, is an XML vocabulary for the exchange of information between government agencies. For example, it allows law enforcement agencies to quickly exchange information. Law enforcement agencies use heterogeneous applications called RMS (Record Management Systems), and XML data is the bridge between them because it is vendor neutral, cross-platform, and supports structured data of arbitrary complexity. XSLT 2.0, as a generic XML transformation language, can play an important role here as well. As an example, an RMS system can export raw XML data, which can then be mapped to a NIEM-compliant XML Schema by performing an XSLT transformation. If a legacy RMS system can only export CSV (comma-separated values) text files, XSLT 2.0 can up-convert the CSV into a NIEM-compliant XML document. It is possible to process XML with a traditional programming language such as C# or Java. However, the problem is the "impedance mismatch" between the type system of these programming languages and a type system based on XML (such as the XML Schema type system). Some developers will find XQuery easier to use than XSLT for processing XML data (probably because of its SQL-like syntax). In addition, XSLT 2.0 and XQuery are declarative processing languages (they describe the "what" as opposed to the "how") and are therefore accessible to many non-programmers.
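The CSV up-conversion step mentioned above can be illustrated with a short sketch. It is written in Python rather than XSLT 2.0, and it only produces generic XML: the `records` and `record` element names and the `csv_to_xml` function are hypothetical, and the output is not NIEM-conformant. Mapping the result to an actual NIEM schema would be a separate transformation.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str, root_tag: str = "records", row_tag: str = "record") -> str:
    """Turn each CSV row into an element whose children are named after the column headers."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # Assumes every column header is a valid XML element name.
            ET.SubElement(record, column).text = value
    return ET.tostring(root, encoding="unicode")
```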

The last use case is knowledge management in general and content management in particular. In the new global knowledge economy, the most important asset of an organization is its intellectual capital which is acquired and developed by its knowledge workers. That intellectual capital is often captured in documents such as blogs, wikis, emails, PowerPoint presentations, podcasts, engineering drawings, architecture diagrams, ISO 9000 quality manuals, installation and troubleshooting procedures, Microsoft Word documents containing requirements and design specifications, various corporate forms, etc. These mission-critical documents are often dumped into shared network drives. They are not managed with the same rigor and cannot be queried as the data contained in your CRM and ERP systems. The main reason is that these documents represent unstructured data as opposed to the well-structured relational data stored by the RDBMS on which your ERP and CRM systems sit. In some industries, the need to bring content under control is driven by regulatory compliance. In any case, organizations shouldn't wait until their most valuable employees leave before they start thinking about managing their knowledge assets. That's where an enterprise content management system (CMS) and an enterprise portal come into play.

XML goes beyond tags, taxonomy, and content categorization to provide fine-grained content discovery, query, and processing capabilities. With XML, the document becomes the database. First, using XML Schema, you can constrain and validate the structure and data types of the content of your business documents, just as you do with a relational database schema. Using XForms, you can provide a user-friendly interface for your end users to contribute XML content by presenting them with a regular HTML form. Once the content is captured as XML, it can be stored in a native XML database. With XQuery, the native XML database allows you to perform structured queries on the content (as opposed to just full-text or metadata search). XQuery also allows you to assemble content dynamically (for example, to build two distinct training manuals for two different configurations of your product from a single source). You can use XQuery to query both relational data sources (your ERP and CRM systems) and XML data sources and aggregate the results. With XSLT, you enable content adaptation for cross-media publishing (print, web, and wireless) from a single source.
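As a small illustration of dynamic assembly from a single source, the sketch below selects the topics that apply to one product configuration. It uses Python's standard library in place of XQuery and a native XML database, and the `manual`/`topic` vocabulary, the `config` attribute, and the `assemble` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-source manual; the "config" attribute lists the configurations a topic applies to.
SOURCE = """
<manual>
  <topic config="A B"><title>Safety</title></topic>
  <topic config="A"><title>Pump model A</title></topic>
  <topic config="B"><title>Pump model B</title></topic>
</manual>
"""


def assemble(source_xml: str, config: str) -> list:
    """Return the titles of the topics applicable to one product configuration."""
    root = ET.fromstring(source_xml)
    return [
        topic.findtext("title")
        for topic in root.findall("topic")
        if config in topic.get("config", "").split()
    ]
```

Calling `assemble(SOURCE, "A")` and `assemble(SOURCE, "B")` yields two different tables of contents from the same document, which is the structured equivalent of what a full-text search cannot do.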

If you decide to manage your product technical documentation with XML, there are standards that can help. The DITA (Darwin Information Typing Architecture) specification is very popular for computer software and hardware documentation. The S1000D standard is designed to support mission-critical maintenance and operation documentation in industries such as aerospace, defense, automotive, oil and gas, heavy equipment and machinery, and power generation.

Organizations that have a strategy for managing their knowledge assets using XML and related technologies will have a definite competitive advantage in today’s economy.

Friday, August 10, 2007

Canada welcomes the C-17

This weekend, the Canadian Air Force will receive the first of four strategic lift aircraft, the C-17 Globemaster III.

This is a special occasion for the Canadian Air Force for the following reasons:

  • The men and women of our armed forces who are deployed in dangerous peacekeeping operations worldwide deserve the best equipment and logistics support we can afford, and they need them now.

  • Canada needs its own airlift capability to support rapid humanitarian relief and peacekeeping operations anywhere in the world when needed.


The C-17 was competing against the EADS A400M. The C-17 was a good choice for Canada because:


  • The C-17 is a proven airlifter and Canada can leverage that experience immediately.

  • EADS revealed last week that the A400M's first flight has been delayed until "the summer of 2008", and that "the consequence on deliveries and cost is under assessment".

  • Finally, I must confess that I've always been in love with the C-17 for its design and record-breaking capabilities.


Saturday, August 4, 2007

My Russian aviation adventures

Fifteen years ago, in the summer of 1992, I had the opportunity to fly the Tupolev 154 as a flight engineer. I was flying with an Aeroflot crew based in St. Petersburg, Russia. This was part of a flight training program that also included theoretical studies at the Russian Academy of Civil Aviation in Aviagorodok (near St. Petersburg) and practical exercises on a flight simulator.



The Tu-154 did not have a glass cockpit like modern aircraft, so the flight crew had to be skilled in aerodynamics as well as aircraft systems and engines. For someone who wanted to learn about airplanes, this was the right airplane to fly. The Tu-154 was powered by three Kuznetsov NK-8-2 turbofans and was a very reliable airplane at the time. It seats about 160 passengers and carries a flight crew of three or four. I was seated at the flight engineer's station. My role was to control and monitor the aircraft's engines and systems (hydraulic, electrical, pneumatic, air conditioning, and APU) during normal, abnormal, and emergency situations. Each flight started with a visual inspection of the exterior of the airplane followed by systems and engine checks inside the aircraft.

The Tu-154 flight manual was a good guide and I was fortunate to have a very professional and experienced instructor during the simulator training. The flights took me to the following cities:

  • Anapa
  • Murmansk
  • Dnepr
  • Kiev
  • Moscow
  • Simferopol
  • Volgograd
  • Krasnodar
  • Sochi


All these flights were uneventful and gave me the chance to discover the Russian countryside. It was a pleasure to fly with the rest of the crew. They were interested in learning a few English and French words from me. My first six months in Russia were spent learning to speak, read, and write Russian. The total immersion helped me learn the language very quickly, and I was able to take all the aviation courses and read the aircraft technical documentation in Russian.

I wanted to revive my memories of flying the Tu-154 with a PC-based flight simulator. While surfing the web, I found a web site for Project Tupolev Tu-154B2, which is a flight simulator for the Tu-154. In the simulator, the flight engineer panel is very detailed and realistic. It is divided into sub-panels for engine parameters, the hydraulic system, the electrical system, etc.

Instructions are provided on how to get the engines and systems started, but my previous knowledge of the aircraft was clearly very helpful. The simulator comes with an English user guide, but the flight deck is completely in Russian.

The simulator can be downloaded from the Project Tupolev Support Site.

Sunday, July 22, 2007

RESTful IDEAS

About a year ago, I published a white paper entitled: "Beyond S1000D: an SOA Enabled Interoperability Framework for the Aerospace Industry".

The white paper proposed a framework called "Integrated Documentation Environment for Aircraft Support (IDEAS)" for the interoperability of enterprise content management and publishing systems within the aerospace industry. The goal was to allow new capabilities such as the remote access to library services, cross-repository exchange, cross-repository aggregation, and cross-repository observation.

Global aerospace organizations acquire technical publications from multiple suppliers and business partners. They must address the following challenges:

  • The elimination of the high costs associated with paper libraries and the shipping of physical products such as paper, CDs, and DVDs.

  • The safety and regulatory compliance concerns related to the slow distribution of supplements to field sites.

  • The need for a single point of access to the multitude of technical documentation needed to maintain and operate aerospace equipment.


The IDEAS concept was created to address current inefficiencies in technical data management processes within the industry by taking advantage of Service-Oriented Architecture (SOA) and emerging content management standards such as the JSR 170 Content Repository for Java Technology API.

On the Java EE platform, JSR 170 is enjoying a lot of success in terms of adoption and implementation. In the open source world, the Apache Jackrabbit project continues to evolve, and there is now a Spring JSR 170 Module to simplify development with the very popular Spring Framework.

For cross-platform interoperability, SOA based solutions have traditionally relied on web services standards such as SOAP, WSDL, and UDDI. However, in today's Web 2.0 world, alternative approaches such as the Representational State Transfer (REST) architectural style and the OpenSearch specification (for federated searches) are getting a lot of attention for their simplicity and scalability.

REST is based on the notion that resources on the web are URI-addressable and that all CRUD (Create, Retrieve, Update and Delete) operations on those resources can be implemented through a generic interface (e.g., HTTP GET, POST, PUT, DELETE). In contrast, RPC-based mechanisms such as SOAP use many custom methods and expose a single or few endpoint URIs. It turned out that the requirements for interoperable enterprise content management systems are more amenable to the REST architectural style.
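The generic interface can be sketched as a single dispatcher over URI-addressable resources. The example below is a minimal in-memory illustration in Python, not a web server and not part of the IDEAS framework; the `handle` function, the `_store` dictionary, and the status codes chosen for each case are assumptions made for the sketch.

```python
import itertools
from typing import Dict, Optional, Tuple

# In-memory stand-in for a content repository, keyed by resource URI (hypothetical).
_store: Dict[str, str] = {}
_next_id = itertools.count(1)


def handle(method: str, uri: str, body: Optional[str] = None) -> Tuple[int, Optional[str]]:
    """Dispatch a request on the generic HTTP verbs and return (status code, payload)."""
    if method == "GET":  # retrieve
        return (200, _store[uri]) if uri in _store else (404, None)
    if method == "PUT":  # create or replace the resource at a known URI
        status = 200 if uri in _store else 201
        _store[uri] = body or ""
        return status, None
    if method == "POST":  # create a new member under a collection URI
        member = f"{uri.rstrip('/')}/{next(_next_id)}"
        _store[member] = body or ""
        return 201, member  # the payload is the URI of the new resource
    if method == "DELETE":
        return (204, None) if _store.pop(uri, None) is not None else (404, None)
    return 405, None  # any other method is not part of the interface
```

Every resource type is handled by the same four verbs; what varies is only the URI. An RPC-style design would instead add a new custom method for each operation.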

The resurgence of REST can be felt across the application development landscape. Struts 2 introduced a REST-style improvement to action mapping called Restful2ActionMapper (itself inspired by the REST support in Ruby on Rails). Support for RESTful web applications is being added to JSF through the RestFaces project. REST APIs are also easy to implement with scripting and templating languages such as JavaScript and FreeMarker.

The technical documentation needed to operate and maintain an airline's fleet is supplied by several manufacturers, including aircraft, engine, and component manufacturers. Regulatory agencies (the FAA and the NTSB) also publish documents such as Advisory Circulars (ACs), Airworthiness Directives (ADs), and various forms and regulations. If all these organizations expose their content repositories via OpenSearch, then an airline technician will be able to perform a federated search across all those repositories to obtain technical information about a particular piece of equipment. The results could be formatted in Atom to allow the technician to receive updates via a web feed.
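A federated search of this kind has two parts: filling each repository's OpenSearch URL template with the query, and merging the per-repository results into a single feed. The sketch below shows both in Python under simplifying assumptions: the `search_url` and `merge_to_atom` functions are hypothetical, the results are plain dictionaries rather than responses fetched over HTTP, and the feed omits elements such as `id` and the feed-level `updated` that a valid Atom feed requires. Timestamps are assumed to share one ISO 8601 format so that they sort as strings.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

ATOM = "http://www.w3.org/2005/Atom"


def search_url(template: str, terms: str) -> str:
    """Fill the {searchTerms} placeholder of an OpenSearch URL template."""
    return template.replace("{searchTerms}", quote(terms))


def merge_to_atom(title: str, result_sets: list) -> str:
    """Merge per-repository results (dicts with title, link, updated) into one feed, newest first."""
    ET.register_namespace("", ATOM)  # serialize Atom as the default namespace
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    entries = [entry for results in result_sets for entry in results]
    for entry in sorted(entries, key=lambda e: e["updated"], reverse=True):
        node = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(node, f"{{{ATOM}}}title").text = entry["title"]
        ET.SubElement(node, f"{{{ATOM}}}link", href=entry["link"])
        ET.SubElement(node, f"{{{ATOM}}}updated").text = entry["updated"]
    return ET.tostring(feed, encoding="unicode")
```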

To expose a library service with a REST-style API, a content management system would typically need to provide the following:

  1. A description of the service including URI templates, HTTP method binding, authentication, transaction, response content types, and response status

  2. The specification of the code (script or Java) that is executed on the invocation of the URI

  3. Response templates
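The three items above can be pictured as one small data structure. The following Python sketch is only an illustration of the idea, not the description format of any particular content management system: the `Service` class, its field names, the `/library/{manual}/{section}` template, and the `render_section` function are hypothetical. The template matching is deliberately simple and assumes the template contains no regular expression metacharacters other than the `{name}` placeholders.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Service:
    method: str                  # HTTP method binding
    uri_template: str            # e.g. "/library/{manual}/{section}"
    content_type: str            # response content type
    script: Callable[..., str]   # code executed when the URI is invoked

    def match(self, method: str, path: str) -> Optional[Dict[str, str]]:
        """Return the template variables if the request matches this service, else None."""
        if method != self.method:
            return None
        # Turn each {name} placeholder into a named group matching one path segment.
        pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", self.uri_template)
        found = re.fullmatch(pattern, path)
        return found.groupdict() if found else None


def render_section(manual: str, section: str) -> str:
    """Stand-in for a response template."""
    return f"<section manual='{manual}' id='{section}'/>"


service = Service("GET", "/library/{manual}/{section}", "application/xml", render_section)
```

A request for `GET /library/amm/29-10` matches the service and yields the variables `manual` and `section`, which are then passed to the script.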


JSR 311, the Java API for RESTful Web Services, will define a set of Java APIs for the development of Web services built according to the REST architectural style.

Sunday, July 15, 2007

XProc: The "Maven" of XML Developers

XProc, the XML Pipeline Language, is currently a W3C working draft that could become for XML developers what Apache Maven currently is for Java developers.

According to the specification:

"An XML Pipeline specifies a sequence of operations to be performed on a collection of XML input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output."

Let's see how XProc can be used to generate an S1000D IETP from a collection of data modules. The process typically involves the following steps:

  1. Get the collection of applicable data modules to be included in the IETP by issuing a query (hopefully XQuery based) against the CSDB
  2. Include XML fragments using XInclude
  3. Validate data modules against S1000D XML Schemas
  4. Validate data modules against Schematron schemas
  5. Generate identifiers for elements that will be the targets for links
  6. Generate XLink attributes on elements that will be the sources of links
  7. Generate RDF or Dublin Core Metadata
  8. Transform the data modules from XML into HTML with XSLT
  9. Transform the data modules from XML into PDF using XSLT and XSL FO.

XProc can be used to wire all these steps together, so that they can be executed from a single command. XProc provides the following interesting steps:

  • p:xquery
  • p:xinclude
  • p:validate-relax-ng
  • p:validate-xml-schema
  • p:label-elements (with a unique xml:id)
  • p:rename (renames elements, attributes, or processing-instruction targets)
  • p:string-replace
  • p:xslt2
  • p:xsl-formatter

XProc also allows you to add logic with elements such as:

  • p:for-each
  • p:viewport
  • p:choose
  • p:group
  • p:try/p:catch

Norman Walsh has released an implementation called the XML Pipeline Processor and there is another implementation called Yax. One feature of Apache Maven POM files that I would like to see in XProc is dependency management.
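The pipeline idea itself, where the documents produced by one step become the input of the next, can be sketched outside of XProc. The example below is a Python analogue and not XProc syntax: `label_elements` and `to_html` are hypothetical stand-ins for steps such as p:label-elements and an XSLT transformation, and the `para` element name is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List

Step = Callable[[List[ET.Element]], List[ET.Element]]


def label_elements(docs: List[ET.Element]) -> List[ET.Element]:
    """Rough analogue of p:label-elements: give every <para> a unique id."""
    for number, doc in enumerate(docs, start=1):
        for index, para in enumerate(doc.iter("para"), start=1):
            para.set("id", f"d{number}-p{index}")
    return docs


def to_html(docs: List[ET.Element]) -> List[ET.Element]:
    """Rough analogue of an XSLT step: turn each document into an HTML <div>."""
    result = []
    for doc in docs:
        div = ET.Element("div")
        for para in doc.iter("para"):
            ET.SubElement(div, "p", id=para.get("id", "")).text = para.text
        result.append(div)
    return result


def run_pipeline(docs: List[ET.Element], steps: Iterable[Step]) -> List[ET.Element]:
    """Feed the output documents of each step into the next one."""
    for step in steps:
        docs = step(docs)
    return docs
```

Running `run_pipeline(docs, [label_elements, to_html])` executes both steps with a single call, which is the convenience XProc aims to provide declaratively.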

Friday, July 13, 2007

S1000D Business Rules

Having been involved in the exchange and use of digital publications in the aerospace industry for the last ten years, I realize the importance of specifying well-defined business rules and, most importantly, of validating XML documents against those rules.

The S1000D TPSMG is currently reviewing two Change Proposal Forms (CPFs) that will help S1000D implementers in the area of business rules:
  • CPF-2007-048DE: Business Rules (BR) Categories and Layers
  • CPF-2006-033CA: Schematron for Business Rules

CPF-2007-048DE (written by Victoria Ichizli-Bartels and Mike Day) proposes breaking business rules down into the following 10 categories:
  1. General business rules
  2. Product definition business rules
  3. Maintenance philosophy and concepts of operations business rules
  4. Security business rules
  5. Business process business rules
  6. Data creation business rules
  7. Data exchange business rules
  8. Data integrity and management business rules
  9. Legacy data conversion, management, and handling business rules
  10. Data output business rules

CPF-2007-048DE also proposes the layering of S1000D business rules, which will help implementers create a comprehensive and well-organized set of business rules for their projects.

The second CPF, CPF-2006-033CA (which I proposed and which was accepted for inclusion in S1000D 3.x), suggests ISO Schematron as the mechanism for exchanging business rules and validating S1000D documents against them. While ISO Schematron cannot validate every project-specific S1000D business rule (e.g. verifying that a paragraph is written according to the rules of Simplified English), it does an excellent job of providing valuable reports and diagnostic information about the content of an XML document.

ISO Schematron declares assertions about arbitrary patterns in XML documents and then reports on the presence or absence of these patterns. Schematron schemas use XPath for specifying the node that is the subject of the assertion and for testing the assertion itself.

Very complex assertions can be expressed by using new XPath 2.0 constructs such as regular expressions, conditional expressions, sequence expressions, type expressions, and the extensive function library.
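
The rule/assert structure described above can be sketched in a few lines of Python. This is an illustration of the idea only, not Schematron itself: the context is selected with the limited XPath subset supported by `xml.etree.ElementTree`, and a Python predicate stands in for the XPath test (the regular expression plays the role that `matches()` plays in XPath 2.0). The element names `dataModule`, `warning`, and `figure`, the id pattern, and both rules are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Each rule: (context path, test applied to each context node, failure message).
# In Schematron the test would be an XPath expression; a Python predicate
# stands in for it here.
RULES = [
    (
        ".//warning",
        lambda node: bool((node.text or "").strip()),
        "A warning must contain text.",
    ),
    (
        ".//figure",
        lambda node: re.fullmatch(r"fig-\d{4}", node.get("id", "")) is not None,
        "A figure id must look like fig-0001.",
    ),
]


def validate(doc, rules=RULES):
    """Return one message for every context node whose assertion fails."""
    failures = []
    for context, test, message in rules:
        for node in doc.findall(context):
            if not test(node):
                failures.append(message)
    return failures


# Illustrative document: an empty warning and one badly formed figure id.
doc = ET.fromstring(
    '<dataModule><warning/><figure id="fig-0001"/><figure id="f2"/></dataModule>'
)
for message in validate(doc):
    print(message)
```

Keeping the rules as data, separate from the code that evaluates them, is what makes them easy to exchange between partners; a Schematron schema serves that purpose in a standard, declarative form.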

Today, the combined validation power of XML Schema and ISO Schematron, together with the query and data manipulation capabilities of XQuery, has made the maxim "The document is the database" a reality.

Thursday, July 12, 2007

S1000D Core

The lack of extensibility in S1000D is cited as one of its main drawbacks and could be a deterrent to potential adopters. The main strength of the Darwin Information Typing Architecture (DITA) as compared to S1000D is its extensibility mechanism referred to as specialization.

One of the benefits of DITA specialization is that it not only allows users to extend the vocabulary to satisfy their unique needs, but also enables the reuse of processing code (e.g. XSLT stylesheets) across specializations through a fallback mechanism to base types. The DITA specialization mechanism uses an elaborate scheme based on DTDs and XSLT 1.0.
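
The fallback idea can be sketched as follows. In DITA, each element carries a class attribute that lists its ancestry from the base type to the most specialized type (for example `- topic/li task/step `), and stylesheets match on those tokens. The Python below is only an illustration of the lookup order, assuming a hypothetical `HANDLERS` table and `find_handler` function; actual DITA processing expresses this with XSLT match patterns rather than a dictionary.

```python
def ancestry(class_value):
    """Split a DITA-style class value into tokens, most specialized first."""
    tokens = class_value.split()[1:]  # drop the leading "-" or "+" marker
    return list(reversed(tokens))


def find_handler(class_value, handlers):
    """Return the handler for the most specialized type that has one."""
    for token in ancestry(class_value):
        if token in handlers:
            return handlers[token]
    raise LookupError(f"no handler for {class_value!r}")


# Hypothetical processing code written only for the base list item type.
HANDLERS = {"topic/li": lambda text: f"<li>{text}</li>"}

# A specialized task step has no handler of its own, so the lookup
# falls back to the handler registered for its base type.
render = find_handler("- topic/li task/step ", HANDLERS)
print(render("Open the valve."))  # prints "<li>Open the valve.</li>"
```

Adding a `task/step` entry to the table would override the base behavior for steps only, which is the reuse-with-override property that makes specialization attractive.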

S1000D should learn from DITA’s experience and success by providing an extensibility framework that allows any party to add extensions that are needed to satisfy their unique requirements. An S1000D extensibility framework will also reduce the number of Change Proposal Forms (CPFs) submitted to the TPSMG by allowing organizations and communities of interest to adopt S1000D without "polluting" the S1000D core specification.

XML Schema’s element substitution and type inheritance, combined with XSLT 2.0’s schema-aware processing, can provide a more robust extensibility mechanism for S1000D.

Efasoft has submitted a CPF (CPF_2007-006CA) to the TPSMG to evaluate and implement such a framework.

Information quality in NIEM exchanges

The National Information Exchange Model (NIEM) is emerging as the standard for information sharing between government agencies. At the same time, the issue of information quality has been receiving significant attention. Data exchange initiatives based on the NIEM standard will increase the need for information quality in XML data exchanges. The government has the obligation to ensure information quality in XML exchanges not only for fulfilling its mission of protecting its citizens, but also for protecting citizens’ rights.

Ensuring information quality requires a multidimensional approach based on policy, process, technology, and governance. Standards-based user interface and data validation technologies such as XForms and ISO Schematron can help improve the quality of the data at the point where users enter it into the systems participating in the exchange.

XForms is an XML application for next-generation Web forms. It implements the Model-View-Controller (MVC) pattern by splitting forms into three parts: XForms model, instance data, and user interface. The benefits are strong data typing and validation, less client-side scripting, and device independence. Compared to other modern MVC frameworks such as Struts 2 and JavaServer Faces (JSF), XForms is a declarative solution: a complete application can be created without a single line of Java code.

ISO Schematron declares assertions about arbitrary patterns in XML documents and then reports on the presence or absence of these patterns. Schematron schemas are very useful for expressing and validating data exchange business rules since they can define constraints that are beyond the validation capabilities of XML Schemas.

XForms can be generated automatically from an exchange schema or from a WSDL file. The XForms application enforces the constraints expressed in the exchange schema by showing an error message to the user when the value entered in a field is incorrect. In addition, Schematron rules can be applied to the XForms to enforce business rules on the data entered by the user. When the user submits the form, the application generates XML data that is valid against not only the exchange schema, but also the business rules defined by the exchange. For example, in the context of law enforcement, the interface can validate that the activity date and the subject's birth date are valid dates based on the XSD definition. In addition, it can ensure that the birth date comes at least 18 years before the activity date if the business rules prohibit entering juvenile data into the system. While Struts 2 and JSF have validation features, the combination of XForms and ISO Schematron offers much more powerful validation capabilities.
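
The two levels of validation in the law enforcement example can be sketched in Python: first the type check (both values must be valid dates, which is the job of the exchange schema), then the business rule (the subject must be at least 18 on the activity date, which is the job of a Schematron rule). This is only a stand-in for what XForms and Schematron would express declaratively. The function names, the error messages, and the choice to treat a 29 February birth date as 28 February in non-leap years are assumptions of this sketch.

```python
from datetime import date


def add_years(day, years):
    """Return the same calendar day `years` later."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return day.replace(year=day.year + years, day=28)


def check_exchange(birth_text, activity_text, minimum_age=18):
    """Return a list of error messages; an empty list means the data is valid."""
    # Level 1: type check, the part a schema date type would enforce.
    try:
        birth = date.fromisoformat(birth_text)
        activity = date.fromisoformat(activity_text)
    except ValueError:
        return ["Birth date and activity date must be valid dates (YYYY-MM-DD)."]

    # Level 2: business rule, the part a Schematron assertion would enforce.
    if add_years(birth, minimum_age) > activity:
        return [f"The subject must be at least {minimum_age} on the activity date."]
    return []


print(check_exchange("1989-07-14", "2007-07-13"))
```

Running the type check first matters: the business rule is only meaningful once both values are known to be dates, which is also why schema validation usually precedes Schematron validation in an exchange.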

XForms and ISO Schematron are declarative languages. This allows non-programmers to contribute to the specification of the user interface and business rules for the project. An additional benefit is that XForms can generate valid XML data directly upon form submission. This eliminates the challenges traditionally associated with mapping relational data to a NIEM-compliant format. Today, most implementations use an XML binding framework such as Castor to unmarshal the XML data into Java objects and an Object-Relational Mapping (ORM) framework such as Hibernate to persist the Java objects into relational tables. Such a mapping can get unwieldy.

Rather than shredding the XML data into relational tables, the XForms can be integrated with an XML database to store the data natively in XML. This will happen as native XML databases become more transactional and scalable.