Thursday, September 6, 2007

augur

There is a very active thread on muse-dev right now about how to fix or work around the lack of thread safety in Apache Xerces, which is the XML parser used by Muse. In XmlUtils.

Yes, that XmlUtils.

Ruh-roh, Shaggy.

All of the concerns raised over this issue are valid, and the people who are working to understand and solve the problem are all users who have deployed Muse in real world projects. I trust them to get it right; it's just a bit nerve-racking to watch people consider an API change that will touch the code in over six dozen places. Muse 2.1 has already shipped as part of a WebSphere product, and I can only imagine the number of conference calls that will ensue if their Muse-based applications break when they upgrade to 2.3.

That said, I'm really stoked to see such disparate parties working together to solve the problem - this is why we wanted to have a community in the first place! I wish I could invest the same amount of time towards resolving this issue, but I can't do two jobs at once.

Definitely nerve-racking.


overlooked

The Apache Muse code base includes a collection of DOM-based convenience methods that help us parse and construct XML fragments without a lot of DOM API boilerplate. These methods are included in an incredibly large class named XmlUtils and represent the collective knowledge, mistakes, and advice of a half dozen developers who have worked on WS-* technology for over three years. Changing any of the methods in XmlUtils could leave the Muse build totally broken; despite the fact that it's been extracted from the core engine into a small library, XmlUtils is very much a core piece of code because of how heavily dependent on it the rest of the project is.

Any piece of code that is so central to a project is bound to have one or two (or ten) ugly hacks to handle edge cases, bugs in dependencies, and backwards compatibility. XmlUtils is fairly hack-free, but it does have some code that I consider... unsavory. In my opinion, the most cringe-worthy code snippets are the ones that need to accept an XML element and iterate over its child elements. The only way to get all of the direct children of an element is to call Node.getChildNodes() and save those Node objects that are actually Elements; such code involves one or more if blocks that check the concrete type of the Node objects and ignore the undesired ones:
if (nextNode.getNodeType() != Node.ELEMENT_NODE)
    continue;
Today these checks are common, but when I first wrote the code I allowed myself to make a number of assumptions because it was only used for traversing schema-validated SOAP messages. Then, one day, we added a configuration file[1], and all of a sudden the input was much less predictable. One of the first bugs I had to fix was the one caused by comments in the configuration file; XML comments become their own DOM nodes - different from Element or Text - and my code failed when comments were added in places where I expected a child element. My final fix eliminated comments from the DOM tree altogether, but I kept the conditionals mentioned above on the off chance that we encountered XML processing instruction nodes or CDATA nodes. I would not be fooled again!
DocumentBuilderFactory factory =
    DocumentBuilderFactory.newInstance();

//
// we don't need comment nodes - they'll only
// slow us down
//
factory.setIgnoringComments(true);
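The two fragments above can be combined into one runnable sketch. This is only an illustration: the class name ChildElements and the helpers getChildElements and parse are hypothetical and are not the actual XmlUtils methods; only the node-type check and the setIgnoringComments(true) call come from the snippets above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildElements {

    // Hypothetical helper (not the real XmlUtils code): returns only the
    // direct children of 'parent' that are elements.
    public static List<Element> getChildElements(Element parent) {
        List<Element> elements = new ArrayList<>();
        NodeList children = parent.getChildNodes();

        for (int i = 0; i < children.getLength(); ++i) {
            Node nextNode = children.item(i);

            // skip text, CDATA, processing instructions, and anything
            // else that is not an element
            if (nextNode.getNodeType() != Node.ELEMENT_NODE)
                continue;

            elements.add((Element) nextNode);
        }

        return elements;
    }

    // Hypothetical helper: parses a string with comments stripped from
    // the DOM tree and returns the root element.
    public static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(true);

            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception error) {
            throw new IllegalStateException(error);
        }
    }

    public static void main(String[] args) {
        // made-up configuration fragment with a comment, a processing
        // instruction, and stray text mixed in with the child elements
        String xml =
            "<config><!-- a comment -->\n  <a/><?pi data?><b/>text</config>";

        for (Element child : getChildElements(parse(xml)))
            System.out.println(child.getTagName());
    }
}
```

Run against the made-up fragment in main, the loop prints only a and b. The parser drops the comment, and the type check skips the whitespace, the processing instruction, and the trailing text.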
This bug, and others like it, were part of my education into the more pedantic corners of XML Land. Over the course of three years I learned many things about XML, and while many of them were logged in my brain and never used again, all of them helped to drive home the message that XML-related issues were never as simple as they first seemed. When reading or writing XML documents or schemas, great care must be taken to ensure that edge cases and ambiguity are not hiding in the bushes. And as much as I malign the DOM API, it does a pretty good job of alerting you to the fact that XML processing in a real world system is never as easy as more user-friendly APIs would have you believe.

Now, despite four excruciating paragraphs on the internals of Muse, this post is actually about frustration I've had with some of my Zero-related work. I had to create some custom Dojo widgets this week, and after a few hours of searching the Internet for proper instructions[2], I started to make some progress: I had a widget "class", a widget template, and an HTML page that was loading the widget class and calling its initialization routines. The only problem was that nothing was showing up in the page - I had picked off all of the initialization errors, and yet there was no Dojo-inspired HTML to be seen.

Just as I was about to break down and open a DOM inspector to try and read through the eighty-seven levels of HTML to find the answer, it dawned on me: comments! My widget template is encapsulated in a single <div/> tag, but the HTML file that contains that <div/> has two comment tags: one for the IBM copyright notice and one with my comments explaining the content of the file. I had a hunch that Dojo was assuming the template file had only one node (an element) and wasn't checking for other, irrelevant nodes.

I was right.

I took the copyright notice and comments out of my template file and everything worked beautifully. It's good to know that my WS-* skill set isn't completely wasted here in RESTtopia.
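The same failure mode is easy to reproduce with the Java DOM API. The sketch below is not Dojo's code (Dojo is JavaScript, and its template handling is only guessed at here); the TemplateRoot class, the wrapper element, and both helper methods are hypothetical names used to contrast "take the first node" with "take the first element".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TemplateRoot {

    // Parses a template fragment inside a hypothetical <wrapper> element.
    // Comments are kept, which is the parser's default behavior.
    static Element wrap(String template) {
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            String xml = "<wrapper>" + template + "</wrapper>";
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception error) {
            throw new IllegalStateException(error);
        }
    }

    // Naive: assumes the first node in the template is its root element.
    static Node naiveRoot(Element wrapper) {
        return wrapper.getFirstChild();
    }

    // Defensive: walks the siblings until it finds an actual element.
    static Element firstElement(Element wrapper) {
        for (Node next = wrapper.getFirstChild();
             next != null;
             next = next.getNextSibling()) {
            if (next.getNodeType() == Node.ELEMENT_NODE)
                return (Element) next;
        }

        return null;
    }

    public static void main(String[] args) {
        Element wrapper = wrap("<!-- copyright notice --><div/>");

        // the naive lookup lands on the comment, not the <div/>
        System.out.println(
            naiveRoot(wrapper).getNodeType() == Node.COMMENT_NODE);

        // the defensive lookup skips the comment and finds the <div/>
        System.out.println(firstElement(wrapper).getTagName());
    }
}
```

With a leading comment in the template, the naive lookup returns the comment node while the defensive one returns the div element. Remove the comment and both return the same node, which matches what happened when the copyright notice came out of the template file.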

[1] The beginning of the end for any project.

[2] The Dojo team isn't big on documentation, so I had to piece things together using half-baked suggestions from mailing lists and forums, some of which were no longer operating. I love reading documentation through Google cache!


Tuesday, August 21, 2007

dims

Davanum Srinivas: It's been a wild ride since WS PMC inception in 2003 as the PMC chair. I'd like to step down from this role now.

Dims was one of the people that helped get me up and running in the Apache Web Services community, and it's unfortunate that he can't continue to be one of its official ambassadors forever. He answered my calls for help on a broad range of issues - from ASF rules and regulations to SVN administrivia - and he never gave me any grief for bugging him with stuff that wasn't in his job description. Despite the fact that we were eleven time zones apart, emails to Dims always received an immediate reply, which means he is either an ultra-productive engineering machine or severely overworked; my guess is the former, but WSO2 is still in startup mode, so you never know. Either way, a lot of IBM's open source work has flowed through him on its way to release, and I'd like to thank him for all of his help.


Thursday, August 9, 2007

anguish

Pat Mueller: I suppose I must not have gotten around to telling Dan my horror stories of using WSDL in the early, early days of Jazz. The low point was when it once took me two working days to get the code working again, after we made some slight changes to the WSDL.

I'll see your "WSDL ruined my week" and raise you a "WSDL caused me extreme pain that could only be soothed by setting my skin on fire and never reading a WSDL document again." I understand the pain associated with WSDL-oriented tools and code generation, but my experience creating those tools was just as difficult. During the first six months of working on the project that would become Muse 2.0, the other IBMers that I was working with started asking for client code generation features; I was working well over sixty hours a week at this point, and as important as code generation is for WS-* programmers, I had other things to worry about, like implementing all 9,087 pages of WS-RF. I finally managed to throw something together one weekend, and while it wasn't pretty, it got the job done (for a while).

Eventually, Muse grew and tooling was added, and we needed more code generation support than my weekend side project could provide. Andrew Eberbach started working on a Muse-oriented version of WSDL2Java[1] that actually had a design behind it and, as a result, was much more flexible for both Muse developers and Muse users. During the creation of WSDL2Java, I found myself spending a lot of time trying to transfer my knowledge of the nuances (nuisances?) of WSDL 1.1 and their effect on Java-based service implementations. When I first started at IBM, I had only a cursory knowledge of WSDL, but after two years, I knew what I was doing and had the scars to show for it; unlike many skills (which seem easy once you have them), reading and writing WSDL documents was something that I never got over, and that made it all the more difficult to encourage new students. By the time Muse 2.0 was released, I could read WSDL documents that were thousands (thousands) of lines long and find obscure syntactical or semantic problems in a minute or two... but it never seemed easy.

Never.

The worst part was, even as Andrew picked up my unwritten, unofficial WSDL knowledge and took ownership of our command line tools, it didn't free me from the tyranny of port types, bindings, and <xsd:any/>. As author of Muse's WS-resource deployment and request-processing engines, I had to read WSDL documents at runtime to determine how SOAP requests and WS-Addressing data should be mapped to WS-resource instances and Java method calls[2]. Keeping the WSDL assumptions consistent between tools and engines was tough when I was the only author, so you can imagine what it was like when it was split between two people, one of whom had not yet come to terms with the unstoppable, day-ruining force that was WSDL 1.1.

I really thought that I had a point when I started this rant, but now that I'm nearing the end of it, I can see that I don't. Pat's comment just triggered a flashback and it was either vent through my blog or sit in the corner with spiders crawling over my skin. Whew. Crisis averted.

[1] I believe the first instance of a tool named WSDL2Java was released as part of Apache Axis, but every web services framework I've ever seen has its own implementation of this concept. Muse was no different.

[2] This requirement was put in place in order to avoid another JAX-RPC mapping file disaster.
