Wednesday, 18 March 2009

Testing Times, Part 2

As a result of not fully understanding the nature of the t:document element in the XProc Test Suite tests, I've made a few changes: results containing document-node sequences are now treated as sequences of document nodes rather than as one document with more than one document element. This is all well and good, but more interesting will be whether I can implement an eval step that takes an XProc pipeline as an input, compiles it and applies it to the source input. The next step would be to re-cast all the Test Suite tests as XProc pipelines (I haven't checked them all to see whether that's possible), which I think would be a much better solution for the test suite.

Monday, 16 March 2009

Propagating Errors

Error propagation is always an issue with XSLT. Transformations will fail, and when they do they give you either a rubbish result or nothing at all. For one-off manual transformations you see the problem immediately, but when the transform is buried deep within some service or application it's another matter entirely.

Trapping some classes of error is just a case of adding some transform logic, but getting that error up to the top (into the result) requires some additional thinking about what your various transformation modes will match against.

If an error is trapped low down, generate an error element carrying the message and some metadata, and ensure that the higher-level templates can pass that error through, perhaps augmenting it along the way, so that when it appears in (or as) the result you can see where it originated and how it got there.
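As a sketch of the pattern (the err namespace, element names and modes here are all illustrative, not any particular project's mark-up):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:err="http://example.org/ns/error">

  <!-- Low down: something goes wrong, so emit an error element
       instead of the expected result. -->
  <xsl:template match="section[not(@id)]">
    <err:error origin="section-handler">
      <err:message>Section is missing its id attribute.</err:message>
    </err:error>
  </xsl:template>

  <!-- Higher up: let any error pass through, augmenting it with the
       route it took so the final result shows its provenance. -->
  <xsl:template match="err:error">
    <err:error origin="{@origin}" via="document-handler">
      <xsl:copy-of select="node()"/>
    </err:error>
  </xsl:template>

</xsl:stylesheet>
```

The key point is that every template up the chain must have a rule for the error element, otherwise the default templates will flatten it to text and the metadata is lost.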

Half-pipe's compiler now supports an hp:error element, which is generated when the compiler encounters an XProc step that it has not implemented. The generated templates insert the error mark-up into the pipeline, and any subsequent steps allow the error message to pass through to the end of the pipeline.

This is of most benefit in the XProc Test Suite report, where the test suite driver transform matches hp:error elements and generates a fail result for that test, including the error's message in the report.
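In the driver that might look something like the following (the hp namespace URI, the shape of hp:error and the fail element are my guesses at the mark-up, not Half-pipe's actual code):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:hp="http://example.org/ns/half-pipe">

  <!-- Any hp:error surfacing in a test's result means the test
       failed; carry its message into the report. -->
  <xsl:template match="hp:error">
    <fail>
      <xsl:value-of select="hp:message"/>
    </fail>
  </xsl:template>

</xsl:stylesheet>
```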

Friday, 13 March 2009

Testing Times

Half-pipe can now run either individual tests or the whole suite from the XProc Test Suite. There are currently two transforms, xproc-tester.xsl and xproc-test-suite.xsl, the latter of which imports the former.

One important point to note: if the compiler implementation doesn't understand the pipeline it is processing, it can generate invalid XSLT. This, by itself, is not the end of the world, but when Saxon attempts to compile and execute that transform it will throw an exception, and without the ability to catch exceptions (I use Saxon-B) there is no way to fail gracefully. Therefore, I read from the xproc-compiler.xsl transform which steps have been implemented and only test those; all other step tests return a fail element with the message 'Not Supported'.
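One way to harvest that list, sketched below, assumes the compiler has one template per step matching in a dedicated mode, so the match patterns double as the list of supported steps. The mode name, the t: test element and the overall shape are my assumptions; the real xproc-compiler.xsl may record its capabilities differently.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:t="http://example.org/ns/xproc-test">

  <!-- Read the compiler stylesheet itself as an XML document and
       collect the match patterns of its step-compiling templates. -->
  <xsl:variable name="compiler" select="doc('xproc-compiler.xsl')"/>
  <xsl:variable name="supported"
      select="$compiler/xsl:stylesheet/xsl:template[@mode eq 'compile']/@match"/>

  <xsl:template match="t:test">
    <xsl:choose>
      <xsl:when test="every $step in .//p:pipeline/p:*
                      satisfies name($step) = $supported">
        <!-- safe to compile and run this test's pipeline -->
      </xsl:when>
      <xsl:otherwise>
        <fail>Not Supported</fail>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```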

The end result of running the whole test suite is an XProc Test Suite report document that can then be uploaded to the website. Just for a bit more fun, and to keep any hard-coded data out of the transforms, I think I'll put the project details into a Description of a Project (DOAP) document and pull the data from there when running the tests.

Thursday, 12 March 2009

Interpretation or Compilation?

My first pass at the XProc processor uses XSLT to compile an 'executable' XSLT transform that applies the steps declared in the pipeline to the source document(s). But whilst thinking about the structure of the compiled pipeline, I began to see how I might also implement an XProc interpreter.

The interesting thing is that the two strategies end up applying more or less the same kinds of transforms to the same source content. Where the compiled version executes the steps on versions of the source document held in auto-generated variables, the interpreting version processes the whole pipeline recursively, embedding the results of each step back into the pipeline before moving on to the next step. I've even been wondering whether I can build a common set of step functions that can be called from either the compiler or the interpreter, so that I don't get drawn into supporting two completely separate implementations.
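A shared step library might look something like this sketch, where each step is an XSLT 2.0 function from source documents to result documents; the function names and signatures are my invention, not Half-pipe's actual API:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:step="http://example.org/ns/steps"
    xmlns:c="http://www.w3.org/ns/xproc-step">

  <!-- p:identity: the result is simply the source sequence. -->
  <xsl:function name="step:identity" as="document-node()*">
    <xsl:param name="sources" as="document-node()*"/>
    <xsl:sequence select="$sources"/>
  </xsl:function>

  <!-- p:count: wrap the number of source documents in a c:result. -->
  <xsl:function name="step:count" as="document-node()">
    <xsl:param name="sources" as="document-node()*"/>
    <xsl:document>
      <c:result>
        <xsl:value-of select="count($sources)"/>
      </c:result>
    </xsl:document>
  </xsl:function>

</xsl:stylesheet>
```

The compiler would then generate calls to these functions against its auto-generated variables, while the interpreter would call the same functions on documents it lifts out of the pipeline mark-up.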

Both have their pros and cons, but the really interesting part will be discovering which strategy provides the better performance, whether in speed, memory usage, or both. I've already imagined how I might keep memory usage under control (more on that later), but as for speed of execution, well, we'll have to wait and see...

First steps

After some preliminary experiments, I've settled upon a two-pass parser that expands the pipeline definition, embedding all the imports and explicitly defining all the otherwise implied port connections. It also adds some other handy metadata to help the next stage: the compiler. The compiler takes the expanded pipeline and generates an XSLT transform that, when applied to the source document(s), will generate the desired result document(s). The parser and compiler are separate entities; the parser can run independently of the compiler, just as the compiler can run independently of its 'driver' transform.
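To make the expansion concrete, here is a sketch of what it might do; the hp-* step names and the exact expanded form are illustrative, but the p:input/p:pipe bindings are standard XProc:

```xml
<!-- An author writes the connections implicitly: -->
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" name="tidy">
  <p:delete match="comment()"/>
  <p:identity/>
</p:pipeline>

<!-- What the parser might expand that to, naming each step and
     spelling out which port reads from which: -->
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" name="tidy">
  <p:delete match="comment()" name="hp-step-1">
    <p:input port="source">
      <p:pipe step="tidy" port="source"/>
    </p:input>
  </p:delete>
  <p:identity name="hp-step-2">
    <p:input port="source">
      <p:pipe step="hp-step-1" port="result"/>
    </p:input>
  </p:identity>
</p:pipeline>
```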

I'm also expecting to make use of the parser stage in my XProc pipeline viewer, which will generate an SVG representation of the pipeline.

I really do enjoy writing XSLT that generates XSLT, especially when it is done by interpreting some declarative XML language. There's something uniquely satisfying about taking XML and processing it with an XML programming language (XSLT) to generate some new XSLT that will execute the instructions in the original XML document. Many people call these meta-stylesheets, but because we are not styling anything I'll refer to them as meta-transforms.
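The standard idiom for a meta-transform is xsl:namespace-alias: elements written in a stand-in namespace come out in the result as real xsl:* elements. A minimal example (the pipeline/step vocabulary it reads is made up for illustration):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:out="http://example.org/ns/xslt-alias">

  <!-- Elements in the 'out' namespace are emitted as xsl:* elements,
       so the output of this transform is itself a stylesheet. -->
  <xsl:namespace-alias stylesheet-prefix="out" result-prefix="xsl"/>

  <xsl:template match="/pipeline">
    <out:stylesheet version="2.0">
      <out:template match="/">
        <!-- one generated instruction per declared step, say -->
        <xsl:for-each select="step">
          <out:comment>step: <xsl:value-of select="@name"/></out:comment>
        </xsl:for-each>
      </out:template>
    </out:stylesheet>
  </xsl:template>

</xsl:stylesheet>
```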

I have also implemented a 'driver' transform that can take a test document from the XProc Test Suite, compile the embedded pipeline, apply it to the input document and then compare its result with the embedded output document to see if the test has passed or failed. The next step is to make an additional transform that can work across the entire test suite and generate an appropriate report document for uploading to the Results page of the XProc Test Suite website.
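The comparison at the heart of such a driver can be done with XPath 2.0's deep-equal() function. A sketch, in which the t: element names are guesses at the test suite's mark-up and $result is assumed to hold the output of running the compiled pipeline:

```xml
<xsl:template match="t:test">
  <!-- $expected is the output document embedded in the test;
       $result (bound elsewhere) is what the pipeline produced. -->
  <xsl:variable name="expected" select="t:output/*"/>
  <xsl:choose>
    <xsl:when test="deep-equal($result/*, $expected)">
      <pass/>
    </xsl:when>
    <xsl:otherwise>
      <fail/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

Note that deep-equal() is strict about text and attribute nodes, so whitespace normalisation may be needed before comparing.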

Tuesday, 10 March 2009

Introduction

Although I have been aware of the W3C's XML Pipeline Language (XProc) for some time now, I have only recently been able to experiment with it, and I must say I rather like it. I've been using Norman Walsh's XProc implementation, XML Calabash. But, like so many things, you only really get to understand something if you make one yourself. Although, saying that, I don't think that logic extends to people; after all, look what happened to poor old Professor Frankenstein!

Anyway, of late my opportunities for XSLT work have been few and far between, so I decided I'd have a go at implementing an XProc processor purely in XSLT 2.0. It won't escape anyone familiar with XProc that you can't implement all of the XProc steps purely in XSLT - hence the project's name, 'Half-pipe': it's only a partial implementation!

I have so far done some proof-of-concept work around the p:insert, p:delete, p:add-attribute and p:identity steps, and I'm rapidly moving towards an initial alpha release once I've got the XProc Test Suite feeding in to the project. For now, you can follow my random thoughts on Twitter.
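For a flavour of what those steps cover, here is a small pipeline using three of them; the document shapes it assumes (a root element to stamp, comments to strip) are of course hypothetical:

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Remove all comments from the source document. -->
  <p:delete match="comment()"/>
  <!-- Stamp the root element with a status attribute. -->
  <p:add-attribute match="/*" attribute-name="status"
                   attribute-value="processed"/>
  <!-- Pass the result through unchanged. -->
  <p:identity/>
</p:pipeline>
```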