Archive for January, 2006

Hurry up, Steve!

Given the recent issues with my PowerBook, my employer gave me a replacement notebook “just in case”, and this weekend I started fiddling with it just to see what it was like.

The box is nothing fancy: it’s part of a notebook stock we keep aside for cases like mine, so it’s “just” a quite bulky IBM R50 with Windows XP. I didn’t even bother installing my favourite Linux distro since I’m due to return it in a week or so, but it has been enough to make me consider a temporary switch until Apple comes out with decent stuff again. True enough, OS X “just works”, but it’s also a fact that it works slowly. And I mean SLOWLY.

The ThinkPad is snappy stuff: Eclipse fires up in seconds and doesn’t suffer from any delays when switching editor windows, completing method names, compiling and firing up tests. Firefox is a breeze. Cygwin (well, you wouldn’t expect me to survive without some kind of Unixish environment, would you?) makes the Windows experience less painful. OpenOffice doesn’t spin any beach ball and opens pretty much everything without a noticeable delay.

It’s definitely true that the Apple experience is addictive stuff, but I didn’t expect to be so mesmerized as to forget my expectations about decent performance. This Intel box in front of me is a wake-up call: daily work shouldn’t involve watching a display doing nothing but fancy rainbow circles for long stretches of time, and I’m definitely thinking about convincing our hardware guy to let me stick to the ThinkPad until something decent comes up on the Mac front. So, Steve, you’d better hurry up with your keynote and tell me some good stuff about the next generation of PowerBooks: I intend to stick with Apple for a number of reasons, but I’m starting to feel fed up with G4 sluggishness. Meanwhile Sanjiva is tempting me quite a bit to delve into some serious P2P and ThinkPad hacks…

More PowerBook woes and hacks

In an earlier post I described how I had to jump through hoops to have a running Powerbook again. Well, that’s just part of the whole story: in the following days I found out that problems weren’t quite over yet.

The very same evening, relieved at having a working computer again, I happily unplugged the power supply from the office socket, ready to go home. While rolling the cable up, I noticed a burning smell and realized that the cable was softer than usual, hot and rapidly turning brown. I finally had something to blame: it was clear that the power supply was responsible for the whole slew of software issues I had had so far, so I planned to shell out some money and buy a new one (while, of course, cursing Apple and wishing I lived in the USA to start a class action).

Days went by, though: a packed week, computer shops closed for the after-Christmas break and the fact that the power supply seemed to behave after all, if left untouched in a certain position, made me postpone buying the new gear. I definitely shouldn’t have been this lazy: this afternoon my PowerBook started to hiccup again as the power supply started to overheat and, eventually, blow nice white smoke rings. Now, today is a national holiday over here (which means shops are closed) and tomorrow pollution laws will prohibit every kind of traffic, so the weekend plan, which included quite a bit of work, looked pretty much screwed up.

A bit of web browsing revealed, though, that I wasn’t the only one hit by bad luck and fried power supplies. The hacker in me went to the toolbox, got a screwdriver and a hammer, and started hacking his way inside the supply unit.

Eventually the soapbox cracked open, and I started messing with the wires using sharp knives and scissors: actually I was planning to do a nice job with my soldering gun, but I found out too late that I didn’t have any solder wire around (and again, may I remind you that today is a holiday and tomorrow we’re stuck with a traffic block?). Luckily I had a plastic connector in my toolbox left over from some electrical work, so eventually I was able to get back to something running without setting my house on fire (oh well, hopefully at least):

[Photo: hacked power supply]

This isn’t exactly the ideal setup for a road warrior like me, but hopefully it should be able to last until next Monday, when I guess I’ll have to spend no less than 80€ for a new, shiny and white soapbox. Oh well, at least if this hack keeps working I might consider finishing it up with a soldering gun and having a second power supply for home or office use, while hoping Intel PowerBooks will a) show up soon and b) be compatible with existing stuff including AC adaptors.

Wild pipeline API thoughts

(Note: this is a long post, and most certainly the syntax highlighter will make it look funny on your aggregator. You might want to visit the web page to get a better grasp of it)

In 2006 it will be roughly 6 years since I started juggling XML pipelines. As my few readers might remember, I’m starting to hate XML languages with a passion, but once again this doesn’t mean I don’t like XML anymore and, even more, this doesn’t mean my love for Cocoon is fading. I’m still convinced that pipeline-based processing is the way to go: the road to complex yet maintainable results clearly goes through decomposing the problem into a set of easy steps to be performed sequentially and incrementally.

I’m also still convinced that XML is here to stay, for a number of valid reasons, yet I think that the overall scenario has changed since the original Cocoon vision. XML is possibly the most important player out there, but it didn’t manage to fulfil its Borgish ambition to assimilate everything else: there is a growing party of people who are realizing how the idea that everything could (and should!) be represented as XML is pretentious at best, and stupid at worst.

This leaves us, however, with two important concepts: we need pipelines, and we need to steer clear of XML when it doesn’t make sense. To achieve the first goal what we need is a generic, easy and intuitive pipeline API. And it should be a programmatic API, because we need pipelines everywhere, and we need them to be easy enough to grasp for the average programmer (think Facade on steroids): what bugs me about the currently available pipeline APIs is how they tend to be clumsy and counter-intuitive. Think of SAX as the perfect example of why we need a more generic and easier pipeline API and machinery: in the SAX world if you want to pipe events from foo to bar you just do this:

[java]
foo.setContentHandler(bar);
[/java]

Things however get complicated when baz and boo enter the picture. Now you have to:

[java]
baz.setContentHandler(boo);
bar.setContentHandler(baz);
foo.setContentHandler(bar);
[/java]

Which, counter-intuitively, means building the pipeline starting from the last component and going all the way to the first one. In addition to that, those statements are usually interspersed with code that contains other statements such as creation and configuration of the various pieces. Moreover, from a functional point of view the above code could be rewritten as:

[java]
foo.setContentHandler(bar);
bar.setContentHandler(baz);
baz.setContentHandler(boo);
[/java]

Even if this looks better from the user’s point of view (the pipeline steps are now ordered), it bears no relation to the underlying model. In fact, you could actually obfuscate stuff when considering switching jobs:

[java]
foo.setContentHandler(bar);
baz.setContentHandler(boo);
bar.setContentHandler(baz);
[/java]

The above lines will still work as expected, but I dare anyone to understand who sends events to whom in a real life scenario where those statements might be ten lines away from each other. To me, this just doesn’t sound right.
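One way to hide the ordering problem while staying in plain SAX is a tiny helper that wires the stages in the order they are listed. The sketch below is only an illustration: `chain` and `stage` are hypothetical names, and it assumes every stage extends the standard `XMLFilterImpl` (which is both a `ContentHandler` and a forwarder to the next one).

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

import java.util.ArrayList;
import java.util.List;

class SaxChain {

    /** Hypothetical helper: wires each stage to the one listed after it. */
    static void chain(XMLFilterImpl... stages) {
        for (int i = 0; i + 1 < stages.length; i++) {
            stages[i].setContentHandler(stages[i + 1]);
        }
    }

    /** A stage that logs its name, then forwards the event downstream. */
    static XMLFilterImpl stage(final String name, final List<String> log) {
        return new XMLFilterImpl() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) throws SAXException {
                log.add(name);
                super.startElement(uri, localName, qName, atts);
            }
        };
    }

    /** Fires one event into foo and returns the order in which stages saw it. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        XMLFilterImpl foo = stage("foo", log);
        XMLFilterImpl bar = stage("bar", log);
        XMLFilterImpl baz = stage("baz", log);
        XMLFilterImpl boo = stage("boo", log);

        // Declared once, in reading order, instead of three scattered calls.
        chain(foo, bar, baz, boo);

        try {
            foo.startElement("", "doc", "doc", new AttributesImpl());
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The wiring is still the same three `setContentHandler` calls underneath; the helper only makes sure they are written in one place and in one direction.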

Now enter Cocoon and see how, in its declarative sitemap, it shines from the user’s point of view:

[xml]

[/xml]

This just sounds right: pipeline steps are listed sequentially, as they should be, and everyone now understands who’s first and who’s next. But, heck, this is a domain-specific language by all means, moreover written in XML. No way this stuff can be reused in a different context, and the “strong typing” nature of the Cocoon pipeline (where everything starts with a Generator and ends with a Serializer, assuming not only that the whole world will talk XML but actually that the whole world will talk SAX) makes things even more difficult.

Finally, consider what the Unix geniuses have brought us:

[code]
$ grep index.html access.log | awk '{ print $1 }' | sort | uniq | wc -l
[/code]

I think there’s no better comment for the above solution than this:

“A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.” (Antoine de Saint-Exupery)

Expressing the pipeline concept with just one character (the | sign) is a clear indicator of what could be achieved when thinking about simplicity: the concept is powerful, yet the user space view of it is as simple as it can get (admittedly, a bit opaque, but it doesn’t take much to get used to it).
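For comparison, the same staged shape can be written in Java with the standard `java.util.stream` API. This is only a rough equivalent, not part of any pipeline API discussed here: `countUniqueClients` is a made-up name, the match is a literal substring rather than a grep regular expression, and the sample log lines are invented.

```java
import java.util.Arrays;
import java.util.List;

class UniqueVisitors {

    /** Counts the distinct first fields of the lines mentioning index.html. */
    static long countUniqueClients(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains("index.html"))  // grep index.html
                .map(line -> line.trim().split("\\s+")[0])     // awk '{ print $1 }'
                .distinct()                                    // sort | uniq
                .count();                                      // wc -l
    }

    public static void main(String[] args) {
        // Invented sample data standing in for access.log.
        List<String> sample = Arrays.asList(
                "10.0.0.1 - - \"GET /index.html HTTP/1.1\" 200",
                "10.0.0.2 - - \"GET /index.html HTTP/1.1\" 200",
                "10.0.0.1 - - \"GET /index.html HTTP/1.1\" 200",
                "10.0.0.3 - - \"GET /about.html HTTP/1.1\" 200");
        System.out.println(countUniqueClients(sample));
    }
}
```

Each stage reads top to bottom in the order it runs, which is the property the `|` sign gives the shell version.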

So, what do the above snippets bring to us? In this quest for a simple pipeline API we learn that simplicity is key and that the principle of least surprise suggests that the pipeline declaration should happen at once and in an ordered way. Sticking to Java, this leaves us with something like

[java]
pipeline.setupPipeline(List components);
[/java]

or (uglier, but sometimes just effective enough):

[java]
pipeline.setupPipeline(PipelineComponent[] components);
[/java]

Actually I’d much rather see the setup happening during the construction phase, but for the sake of interface design I’ll leave the convenience method for now. This means that our Pipeline interface becomes something like:

[java]
interface Pipeline extends PipelineComponent {

void setupPipeline(List components);

void start();

}
[/java]

Easy and effective: whoever knows the pipeline concept should be able to grasp how this API works in minutes. We need to talk about the PipelineComponent interface though, and this is related to the next wild idea: pipeline machinery.
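To make the interface above a bit more concrete, here is a minimal sketch of a list-based implementation. Everything beyond `setupPipeline` and `start` is an assumption made for illustration: the `setNext`/`process` contract on `PipelineComponent`, the `String` payload, the convention that `start()` pokes the first component with a `null` item, and the `SimplePipeline`, `MapStage` and `Sink` classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

class ListPipelineSketch {

    /** Assumed minimal contract; the post deliberately leaves this open. */
    interface PipelineComponent {
        void setNext(PipelineComponent next);
        void process(String item);
    }

    interface Pipeline extends PipelineComponent {
        void setupPipeline(List<PipelineComponent> components);
        void start();
    }

    static class SimplePipeline implements Pipeline {
        private final List<PipelineComponent> components = new ArrayList<>();
        private PipelineComponent next;

        public void setupPipeline(List<PipelineComponent> stages) {
            components.clear();
            components.addAll(stages);
            // Wire the stages in exactly the order they were declared.
            for (int i = 0; i + 1 < components.size(); i++) {
                components.get(i).setNext(components.get(i + 1));
            }
            wireTail();
        }

        // As a component, a pipeline forwards its output through its last stage.
        public void setNext(PipelineComponent next) {
            this.next = next;
            wireTail();
        }

        private void wireTail() {
            if (next != null && !components.isEmpty()) {
                components.get(components.size() - 1).setNext(next);
            }
        }

        public void process(String item) {
            if (!components.isEmpty()) {
                components.get(0).process(item);
            }
        }

        // Assumption: the first stage treats a null item as "start producing".
        public void start() {
            process(null);
        }
    }

    /** Applies a function to each item and hands the result downstream. */
    static class MapStage implements PipelineComponent {
        private final UnaryOperator<String> fn;
        private PipelineComponent next;

        MapStage(UnaryOperator<String> fn) { this.fn = fn; }

        public void setNext(PipelineComponent next) { this.next = next; }

        public void process(String item) {
            String out = fn.apply(item);
            if (next != null) {
                next.process(out);
            }
        }
    }

    /** Terminal stage that simply remembers what reached it. */
    static class Sink implements PipelineComponent {
        final List<String> items = new ArrayList<>();
        public void setNext(PipelineComponent next) { /* nothing downstream */ }
        public void process(String item) { items.add(item); }
    }

    static List<String> demo() {
        Sink sink = new Sink();
        Pipeline inner = new SimplePipeline();
        inner.setupPipeline(Arrays.asList(
                new MapStage(String::toUpperCase),
                new MapStage(s -> s + "!")));

        Pipeline outer = new SimplePipeline();
        outer.setupPipeline(Arrays.asList(
                new MapStage(ignored -> "hello"), inner, sink));
        outer.start();
        return sink.items;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The demo nests one pipeline inside another, which only works because `Pipeline` is itself a `PipelineComponent`.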

(warning: shaky ground ahead, this is the part which needs *much* more thinking)

In the OO world things aren’t quite as simple as in a CLI environment, where all you have are basic pipeline contracts such as “whatever byte stream comes from the left side is pumped to the right side”. We have objects here, and I don’t want this generic pipeline API to be strongly typed as in being able to work just with - say - SAX events or other XML gibberish. What I want is an API which is able to work with many formats in a way that’s transparent to the user: this means that the various pipeline stages should be able to express their contracts in terms of required input and output format. It’s up to the pipeline machinery to provide adapters and bridges so that, say, a pipeline component working with SAX events might be able to cooperate with another component working with streams. This could be accomplished either through some kind of PipelineDescriptor, with annotations or just through different interfaces: whatever keeps things simple makes me happy.

Another nice solution comes (again!) from a chat with Sylvain, who reminded me of the IAdaptable approach in Eclipse. This solution fits a world of interchangeable and heterogeneous pipeline stages like a glove, even though I have a few concerns about the added complexity for pipeline component writers in implementing an Adaptable strategy: if the first and foremost objective of this API is simplicity, then writing components should be as easy as possible.

Anyway, the final outcome of all this would be something like this during the pipeline assembly phase:

[java]
PipelineComponent current;
PipelineComponent next;

if (next.accepts(current)) {

current.setNext(next);

} else {

PipelineComponent adapter = next.getAdapter(current.getClass());

current.setNext(adapter);
adapter.setNext(next);

}
[/java]

With this mechanism in place, in theory, the pipeline is much more versatile: sticking to the XML world it would be possible to build pipelines using whatever mix of SAX, DOM, StAX, AXIOM and YouNameWhat. Moreover, it would be easy enough to provide adapters to the stream world, tees and nested pipelines (there is a reason for Pipeline to extend PipelineComponent after all).
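The assembly logic can be sketched end to end under a few loud assumptions. In this toy version each component declares an output type, `accepts` compares Java classes, and `getAdapter` is keyed by the type the upstream stage produces rather than by its component class. The `Stage` class, the `outputType` method and the adapter registry are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class AdaptingAssemblySketch {

    /** Assumed contract: type-aware stages that can supply their own adapters. */
    interface PipelineComponent {
        Class<?> outputType();
        boolean accepts(PipelineComponent previous);
        PipelineComponent getAdapter(Class<?> producedType);
        void setNext(PipelineComponent next);
        void process(Object item);
    }

    static class Stage implements PipelineComponent {
        private final Class<?> in;
        private final Class<?> out;
        private final Function<Object, Object> fn;
        private PipelineComponent next;
        // Conversions this stage knows how to apply to foreign input.
        final Map<Class<?>, Function<Object, Object>> adapters = new HashMap<>();

        Stage(Class<?> in, Class<?> out, Function<Object, Object> fn) {
            this.in = in;
            this.out = out;
            this.fn = fn;
        }

        public Class<?> outputType() { return out; }

        public boolean accepts(PipelineComponent previous) {
            return in.isAssignableFrom(previous.outputType());
        }

        public PipelineComponent getAdapter(Class<?> producedType) {
            Function<Object, Object> conversion = adapters.get(producedType);
            if (conversion == null) {
                throw new IllegalArgumentException(
                        "no adapter from " + producedType.getName());
            }
            return new Stage(producedType, in, conversion);
        }

        public void setNext(PipelineComponent next) { this.next = next; }

        public void process(Object item) {
            Object result = fn.apply(item);
            if (next != null) {
                next.process(result);
            }
        }
    }

    /** Wires stages in order, slipping an adapter in wherever types disagree. */
    static void assemble(List<PipelineComponent> components) {
        for (int i = 0; i + 1 < components.size(); i++) {
            PipelineComponent current = components.get(i);
            PipelineComponent next = components.get(i + 1);
            if (next.accepts(current)) {
                current.setNext(next);
            } else {
                PipelineComponent adapter = next.getAdapter(current.outputType());
                current.setNext(adapter);
                adapter.setNext(next);
            }
        }
    }

    static List<Object> demo() {
        List<Object> results = new ArrayList<>();
        Stage parse = new Stage(String.class, Integer.class,
                o -> Integer.parseInt((String) o));
        Stage doubler = new Stage(Integer.class, Integer.class,
                o -> (Integer) o * 2);
        Stage label = new Stage(String.class, String.class,
                o -> "result=" + o);
        label.adapters.put(Integer.class, o -> o.toString());
        Stage sink = new Stage(Object.class, Object.class,
                o -> { results.add(o); return o; });

        assemble(Arrays.asList(parse, doubler, label, sink));
        parse.process("21");
        return results;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Here `label` only understands strings, so the assembler places an Integer-to-String adapter between `doubler` and `label` without the caller ever mentioning it.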

Of course I expect the pipeline machinery to do much more than just adaptation: caching, logging, monitoring and management are vital to the pipeline deployer. But the real point of this effort is, once again, simplicity. Once I’m able to do this:

[java]
// Get the default pipeline implementation
Pipeline pipeline = PipelineFactory.getPipeline();

// Set up the pipeline with an array of PipelineComponents
pipeline.setupPipeline(new PipelineComponent[] {reader, transform1, transform2, streamAdapter});

// grab the InputStream from the last component
InputStream result = streamAdapter.getInputStream();

// start processing
pipeline.start();

// enjoy results
result.read()…
[/java]

or this:

[java]
// This time we use SAX events straight away
pipeline.setupPipeline(new PipelineComponent[] {reader, transform1, transform2});

// connect to the pipeline result
transform2.setContentHandler(myContentHandler);

// start processing and handle events coming in
pipeline.start();
[/java]

or, why not:

[java]
pipeline.setupPipeline(new PipelineComponent[] {reader, transform1, transform2});

anotherPipeline.setupPipeline(new PipelineComponent[] {something, pipeline, somethingElse});

anotherPipeline.start();
[/java]

Then I could do this from within Cocoon:

[javascript]
function handlePage() {

var pipeline = cocoon.newPipeline([ file("something.xml"), xslt("foo.xsl"), forms(), i18n() ]);

cocoon.sendPipelineAndWait(pipeline);
}
[/javascript]

but also, when Cocoon is not an option:

[xml]
<%@ taglib uri="http://jakarta.apache.org/taglibs/pipeline" prefix="pipeline" %>

[/xml]

Conclusion: if you managed to survive this far, well, congrats and thanks for sticking around: it’s been a bumpy ride and there are certainly a ton of rough edges, but the more I think about it, the more I’m convinced that a simple, painless and easy to use Pipeline API could be an invaluable tool. I’d love to use the incredible experience of Cocoon in building solid pipelines to factor out a new and fresh approach that allows anyone to enjoy the power of pipeline-based processing: it’s not going to be easy, but the goal is indeed worth the effort. Now, finding the time to make it happen is a totally different question…

(Unusual) hacking fun

These post-Christmas days are a bit less hectic (but hey, just a tad) than what I normally have to survive, so I’m having some good old hacking fun (you know, the kind of stuff you don’t know how much you’ve been missing until you return to it).

The excuses for firing up that IDE again were multiple: I wanted to take Maven 2 for a spin and, after my recent rant about pipeline languages vs. APIs, I wanted to try some concepts out and see where they might lead. I’m still far away from a solution, but so far I have a few points to make:

  • Maven 2 is really, really nice. I’m still reluctant to say that it rocks since I need to see how it behaves with complex stuff, but so far it has way exceeded my expectations. After a few years spent juggling megabytes of jars even for the simplest stuff, seeing that my (automagic, just mvn assembly:assembly) distribution of what I’ve got so far is *just* 4K despite having 15 different library dependencies makes me so happy that I might burst into tears any moment now.
  • E4X looks terrific. It’s so immensely powerful that it sparks a lot of weird ideas about exploiting every single bit of it in my quest to reduce XML-induced overtyping and messy stuff. Now, if only I could convince Rhino to use my (Java) DOM trees directly in E4X instead of having to go through string serialization every time (yuck!) I’d be a very, very happy puppy (suggestions are welcome, of course).
  • JSON is another neat piece of technology worth visiting. A suggestion from Sylvain revealed how my quest for a comfy pipeline API might soon be over if I manage to bend it a little bit to my needs; so far it looks promising indeed.

I’m almost positive work stuff will drown me any moment now, so that I’ll be forced to quit these nice experiments, but I definitely want to pursue the above technologies, see if and how they might help in the OSS stuff I’m involved in and, last but not least, bring them to our projects where it makes sense. Moreover, as my formal New Year’s resolution, I want to commit myself to some hacking on a regular basis: yes, old lawyer farts can code!

A scary morning

For a few days now my PowerBook has been behaving strangely, with weird hisses and whistles coming from below the keyboard and an “ozonish” smell that I never noticed before. This morning, my faithful companion started to misbehave: the screen started flickering after a couple of hours of work and eventually the whole thing froze. With each successive reboot things just got worse, and the time my computer stayed alive dropped from ten minutes to one.

This is not exactly what you’d call an ideal situation: first of all I couldn’t picture myself buying a new G4 PowerBook when in a matter of days we should be getting news from Steve about the long-awaited Intel shift. Then, when I started calling Apple shops around here to see if I could get it fixed, maybe getting a spare in the meantime, I noticed that January 2nd isn’t exactly a working day for Christmas-exhausted stores, with no one answering my frantic calls.

I decided to switch to friends, among others managing to bother Pier all the way to Japan (thanks mate! I owe you a Sashimi) and eventually finding a friend of a friend who gave me the right hint (yes Enrico, that’s you: just name the restaurant and be my guest): what seemed to be a quite typical hardware issue turned out - well, hopefully at least - to be a software glitch.

Apparently the overheating symptom can be caused by the OS misreading sensors and mismanaging fans: he suggested booting from the Tiger DVD, keeping the computer running for 30 minutes or so and, if all was well, proceeding with an OS reinstall. Well, it took me the best part of my morning and a good deal of bravery to watch the progress bar while fearing everything could break any second, but in the end all went just fine. My PowerBook is back, even though it’s definitely backup & update time, and I feel much relieved overall: I could imagine a whole slew of better ways to start 2006, though.

(Yet another) XML pipeline language

Sylvain has noticed the recent W3C activity in establishing a standard XML pipeline language. My first reaction was “uhm?”, then I went to check what that group was up to and the result was more like “ouch”, “yikes” and “OMG” (add despair to the last one).

It strikes me how, in early 2006, people are still thinking that another XML domain-specific language is the way to go. We are all learning the hard way how the XML verbiage has been useless and, to some extent, detrimental: from Jelly onwards (and yes, I deserve some blame as well) it became crystal clear how programming in XML leads to unmaintainable, opaque and unreadable stuff. The myth that XML can be written by grandmas, coupled with the low entry barrier in creating new languages (no compiler-compiler needed, yay!) has produced a plethora of half-baked solutions that just don’t get how grandmas aren’t going to code anyway, while angle brackets get heavily in the way of anyone who understands even just the basics of programming.

Now, don’t get me wrong: XML is great when properly used. That is, data (some grandmas might even write data at a certain point), information interchange and tool-oriented stuff. But please, pretty please, when talking about programming (that is, data processing and component connections), take those angle brackets out of the picture and give us the power of effective syntaxes. There might be some exceptions: transformation languages such as XSLT, having to deal with XML all the way, are consistently expressed in XML, but that’s not the case for XML pipelines.

Pipelines are about connecting, chaining, concatenating: there’s nothing there that needs XML to be expressed. It’s meta-XML, in a way, a side order to the main XML dish. What we (well, I at least) need are APIs: a standard and effective way to tie XML processing components together so that data manipulation can work in a multistage environment. We then need some machinery around it that provides transparent adapters between the different XML processing worlds (SAX, DOM, StAX) and we could definitely use some services (logging, management, security) on top of it. But we don’t need XML for that. We might want to use it at a later point, as a sort of wrapper around the pipeline language if, and only if, there is a clear need for tooling that could use a well-established and easy to parse data format, but please save our aging eyes and our carpal tunnels from angle brackets whenever possible.

Personal rule of thumb after a few years spent writing way too many XML languages: if you’re outside the context of an XML document (that is you’re not writing a template processor, basically) and you’re thinking about expressing control flow (if, switch, etc…) in XML, well, think again. Chances are you’ll end up with something like:

<my:test xmlns:my="http://keep/this/short">
      <my:when condition="whatever">
         ...
     </my:when>
     <my:otherwise>
         ....
     </my:otherwise>
</my:test>

for a whopping 177 characters compared to the following 43 (a 4x factor!):

if (whatever) {
    ...
} else {
    ...
}

Need to say more?