XTech 99 : XML Technical conference

San Jose Convention Center, Mar 7-11, 1999

Cedric Beust
 

General remarks

XML is taking off. Of course, all the participants in this conference have an interest in XML, but it was pretty clear from all the talks that everyone agrees on its soundness. Not everything is rosy, though.

One of XML's main strengths is that it is extremely bare-bones and simple. This is also its drawback : in order to be really useful, XML must be complemented with a host of additional standards that take over where XML leaves off. Hence the need for the DOM, RDF, CSS, XSL, XPointer, XLink, and ECMAScript. One of the dangers is that XML might collapse under the weight of all these additional standards, which are far from simple (check out CSS2 or Schemas).

XSL and Java are definitely hot. A good property of XSL is that it can be used both on the server (in the short term, to output HTML) and on the client, in next-generation browsers (IE5, Gecko). XSL is powerful but it also has obvious gaps. It's hard to predict where it is headed since the standard is still in progress, but let's hope that it will address all the issues. Meanwhile, vendors are using subsets of the standard and mixing those with proprietary solutions (see the talk on the "Coolest XML client-side application").

Text in italics and enclosed in brackets reflects personal opinions and is not part of the talk I am currently attending.

I assume in this article that you are familiar with the current new Web standards, like XSL, DOM, CSS, etc. If I am being a little too specific in the following texts, please let me know by email and I will try to add a glossary to this article with broad explanations of how all these acronyms fit into the Big Picture.

 


Monday, Mar 8, 1999

 

Tutorial : RDF and Metadata, Ora Lassila (Nokia), Bob Schloss (IBM Watson)

[Note : RDF is the Resource Description Framework, a standard put forth by the W3C to express metadata about any kind of document. See their Web site for more details]

What is wrong with the Web ?

The Web is suited for machine-to-machine interaction but its content is formatted for human interpretation. It is machine readable but not machine understandable, and therefore hard to automate. What is needed is metadata : data with structure and semantics.

Web metadata is a machine-understandable description of Web resources (a Web resource being any object addressable by a URI).

There are many possible applications of metadata :

[First question to speakers : is Metadata going to replace DTD's ? Answer : the W3C Schema Working group is working on that. It's true that there are a lot of commonalities] 

[Question : what happens when we have 32 different "standard" DTD's ? Answer : There is ongoing work to register DTD's in a central repository. And even though DTD's may differ superficially, a lot of doctypes are actually compatible. Use XSL to make up for the differences.]

In brief, RDF is machine understandable, I18N compatible, domain neutral and application neutral. It provides interoperability with no or little ambiguity. It came from META and PICS.

As of today, there are three proposals [i.e., not Recommendations (in W3C terminology).] : PICS-NG (W3C), MCF (Netscape) and XML Data (Microsoft).

Model and Syntax

RDF in a few words : [At this point, I can't resist asking a question that has been nagging me for a while. The speakers keep mentioning this "machine understandable" thing but XML, as a text format, is much more human readable than machine readable. Why isn't RDF using a binary format, which would be much more appropriate for machine interpretation ? Answer : the speaker (Ora) agrees and actually, his first proposal to the W3C was not XML. But he was turned down and XML was eventually chosen. It doesn't make much sense in that respect, except maybe that XML is trendy].
[I still find that there is a greater issue at stake here. XML has very good properties but we can very much predict that it will be abused. I have the feeling that RDF as XML is such an abuse. This could be reasonably addressed by compression or tokenization, though, but my point is that nobody is writing XML or HTML documents by hand (except for me, currently writing HTML in emacs ;-)) and that the "readability" point for XML is not always relevant. ] 

As a conclusion to this debate, RDF is independent of XML. Other syntaxes are possible.

RDF Model

There are three different types : Values are not only strings : they can be arbitrarily complex tree structures and can also be embedded or referenced (with links). [There is no such thing as an RDF DTD : XML is not even sufficient for RDF, they are using a lot of namespaces and sometimes, they go beyond what is actually specified. ] 

[One more reason in my eyes why XML is not an appropriate format for RDF. Their simple examples, using merely two or three namespaces, were already painful to read. And all that for simple annotations.]

Containers

Containers make it possible to group objects. There are three different types of containers : It is possible to create collections based on URL patterns. [Actually, the patterns are limited to prefixes].

Duplicates are allowed since there are no mechanisms to enforce uniqueness.

Example of a container in pseudo-language :
 

{ "http://...", dc:Creator, x}
{ x, rdf:_1, "Ora" }
{ x, rdf:_2, "Ralph" }
{ x, rdf:type, rdf:Seq }

[I find this specification in pseudo-language much easier to read than the actual XML source !] 
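The same container can be written down as plain data. The following is a minimal Python sketch, not an RDF library and not anything shown in the tutorial : the `seq_members` helper is hypothetical, the URL is left elided exactly as in the pseudo-language, and the type triple uses the `rdf:type` / `rdf:Seq` spelling of the RDF specification.

```python
import re

# Triples as (subject, predicate, object), mirroring the pseudo-language above.
# "http://..." stays elided as in the talk; "x" names the container node.
TRIPLES = [
    ("http://...", "dc:Creator", "x"),
    ("x", "rdf:_1", "Ora"),
    ("x", "rdf:_2", "Ralph"),
    ("x", "rdf:type", "rdf:Seq"),
]

# Membership predicates have the form rdf:_1, rdf:_2, ...
MEMBER = re.compile(r"^rdf:_(\d+)$")


def seq_members(triples, container):
    """Return the members of a sequence container, ordered by their rdf:_n index."""
    indexed = []
    for subject, predicate, obj in triples:
        match = MEMBER.match(predicate)
        if subject == container and match:
            indexed.append((int(match.group(1)), obj))
    return [obj for _, obj in sorted(indexed)]


if __name__ == "__main__":
    print(seq_members(TRIPLES, "x"))
```

Note that nothing here prevents two triples from carrying the same value, which matches the remark that duplicates are allowed.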

Referents

You can reference groups of URI's in different ways : For example, it is possible to specify that all documents of a given site (or sub-tree of the site) must carry a certain piece of metadata (like a copyright).
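Since the patterns are limited to prefixes, attaching metadata to a whole site or sub-tree boils down to a prefix test. Here is a rough Python sketch of that idea; the rule table, the `example.org` URLs, the property names and the `metadata_for` function are all made up for illustration and are not RDF syntax.

```python
# Hypothetical rules: (URL prefix, property, value). None of these come from the talk.
RULES = [
    ("http://example.org/", "copyright", "Example Org"),
    ("http://example.org/drafts/", "status", "draft"),
]


def metadata_for(url, rules=RULES):
    """Collect every property whose prefix matches the start of the URL."""
    result = {}
    for prefix, prop, value in rules:
        if url.startswith(prefix):
            result[prop] = value
    return result


if __name__ == "__main__":
    print(metadata_for("http://example.org/drafts/a.html"))
```

A document under the `drafts/` sub-tree picks up both the site-wide copyright and the sub-tree status, while a document elsewhere on the site only gets the copyright.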

[Overall an enjoyable talk, although the afternoon was much more boring : it went through a lot of different sources and explained them line by line. I have a few concerns, though :]


Tuesday, Mar 9, 1999

 

Keynote : Jon Bosak 

Jon Bosak opens the conference with his keynote. His goal is to move beyond "Boring Old Documents" (the current documents you can find on the Web) and on to the next step. It is vital to separate content and structure and then format them (HTML mixes all that). Whatever may come in the future, documents will live on, they won't go away.

The current trend is to move away from procedural and toward declarative and object-oriented documents (PostScript vs PDF). Printing will not disappear either. Actually, formatting is a superset of printing. A printed document allows very limited interaction whereas an online document will let you do almost anything.

Bosak then shows NetPost, a prototype used by Sun to illustrate the benefits of XML. It is an online newspaper that can change its look completely without changing the content. With only one click, the pages shown switch from the "NY Post" layout to the ESPN one. Advertising will gain more and more importance. Online documents make it possible to break new ground in that respect : you can customize your ad depending on the content of the article (for example, if an article tells the story of a plane crash, it would be a bad idea for a company like United Airlines to advertise on that very page).

[I am puzzled : I am not sure an online newspaper is a good illustration of the concept. People will never leave their paper newspaper for an online one. He could have picked a much more telling example, like a portal gathering summaries of major headlines.]

However good XML is, its rendering is limited to HTML right now, and it is trapped in Middleware. That's why Sun wants to jumpstart XSL.

[Bosak then announces that Sun is offering a $30,000 grant to whoever manages to add XSL support to Gecko (the new open-sourced Navigator) by the end of the year. The announcement comes as a complete surprise and still makes us wonder. In my opinion, it just can't be done. More on this below with the CiTEC presentation.]

Bosak concludes that "we are getting out of proprietary format, and XML is making that possible".

[I agree, and as a matter of fact, Office 2000 will come out with full XML support for save files, thus making the binary upward compatibility headaches of current Word and Excel documents history. However, I am definitely not convinced that this is happening because of Sun's sheer mental force. XML is just an obligatory step now, and Microsoft is pushing it more than anybody else. However, this will most likely allow us to see more and more non-Microsoft document viewers, and that is a good thing.] 
 

Keynote : New Web standards 

The next keynote is a panel of people taking part in the various W3C working groups. Five of them talked about the various activities they are in charge of :
 

 

SVG : Scalable Vector Graphics 

This talk was given by Jon Ferraiolo, who works for Adobe. Adobe is one of the main proponents (and main implementors) of this spec. The idea is to be able to specify 2D vector graphics in XML. Graphics are a very important piece of document presentation and they are blatantly absent from current Web standards. SVG addresses this need.

SVG is a joint work by Microsoft, AutoDesk, HP, Macromedia, Adobe and Visio. It must be

SVG offers a very rich feature set. It aims to completely replace GIF, JPG, PNG and the like as an image format. It allows full resolution printing and progressive rendering [This means that its content will be displayed incrementally, so as not to leave the user with a blank page while it is loading. It's a good thing that they're aware of the importance of such a feature]. Another crucial point : text embedded in the graphics must appear in textual form in the source, so that it can be accurately indexed by search engines. [Another fine idea] Finally, it must supply interactivity and animation.

SVG is made of three distinct items :

It features gradients, patterns, text paths and alpha masks [Alpha blending is a feature that makes a lot of promises for graphically appealing documents]. 

Jon then proceeds to do a demo. Pretty simple : the city of SF is represented, then zoomed, showing coffee places. It also illustrates curved text (textpath) with appropriate mouseOver() events. [I was unimpressed by the demo]

The main benefits of SVG are : faster downloads, fewer round trips to the server, searchable graphics and scalable and zoomable graphical elements.
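To make the "searchable graphics" point concrete, here is a small Python sketch that pulls the text labels out of an SVG source with the standard library. It is only an illustration : the sample document and the `extract_labels` function are invented, and the namespace is the one used by the SVG specification as later published, which may differ from the draft presented at the talk.

```python
import xml.etree.ElementTree as ET

# Namespace of the published SVG specification (an assumption for the 1999 draft).
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical map fragment: one shape and two text labels.
SAMPLE = f"""<svg xmlns="{SVG_NS}" width="200" height="100">
  <circle cx="40" cy="50" r="5"/>
  <text x="50" y="55">Coffee place</text>
  <text x="50" y="80">Market Street</text>
</svg>"""


def extract_labels(svg_source):
    """Return the content of every <text> element, in document order."""
    root = ET.fromstring(svg_source)
    return [
        "".join(node.itertext()).strip()
        for node in root.iter(f"{{{SVG_NS}}}text")
    ]


if __name__ == "__main__":
    print(extract_labels(SAMPLE))
```

A search engine could index these strings directly, which is not possible when the same labels are baked into the pixels of a GIF or JPG.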

[A good talk overall but Ferraiolo lacks a lot of "bigger picture" vision. He didn't mention any drawback to SVG, not even the most obvious one : scalable graphics imply much bigger documents for possibly unused data. This is where the current map Web sites shine : they only show you the map you want and will cause a round trip at each zoom, but it's very much acceptable even with a slow connection.

There is definitely a need for 2D graphics on the Web, but I am not so sure they have to be vector-based. We have a few ideas on that one, but it's still confidential ;-)]

XJ2 : Java based XML and XSL technology

XJ2 is a Java XML parser (it tests both well-formedness and validity). It is SAX compliant and implements XML Namespaces. It implements the first half of the XSL standard (Transformation) and is being developed in collaboration with DataChannel. It supports both DTD's and DCD's, including strong data typing.
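For readers who have never used SAX, the event-driven style looks roughly like the sketch below. It uses Python's standard `xml.sax` module rather than XJ2 (whose API was not shown in the talk), the `ElementCounter` and `check_well_formed` names are made up, and it only tests well-formedness : the parser bundled with Python does not validate against a DTD.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """SAX handler that counts start-element events."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called by the parser each time an opening tag is read.
        self.count += 1


def check_well_formed(document):
    """Return (is_well_formed, number of elements seen before any error)."""
    handler = ElementCounter()
    try:
        xml.sax.parseString(document.encode("utf-8"), handler)
    except xml.sax.SAXParseException:
        return False, handler.count
    return True, handler.count


if __name__ == "__main__":
    print(check_well_formed("<a><b/></a>"))
    print(check_well_formed("<a><b></a>"))
```

The parser pushes events to the handler as it reads, so the document never has to be held in memory as a tree, which is the main difference from a DOM-based approach.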

Available transformations as of today :

[not sure what this last one means] 

[The talk was fine until then. XJ2 definitely looked like something worth checking out. However, the speaker started contradicting himself by claiming support for a wide range of heterogeneous browsers but then admitting that they only work on the latest IE. Pressed to answer about additional features, the speaker would consistently answer "if it is implemented in IE5, we will do it".

It looked more and more to me like they must have some special contract with Microsoft but still have no idea of what lies ahead. I am definitely convinced that they are piggybacking on MSIE5 but I completely fail to see what their goal is or the point of their work. Oh well.]

Datacraft : databases on the Web

This tool comes from IBM Research. In order to achieve its e-commerce strategy, IBM has to deal with major database issues. One of the challenges is to map databases (often queried through XML or OO requests) to Web languages (XML and HTML). Creating such mappings is not easy and Datacraft aims at allowing developers to make this connection with a visual tool that will take care of all the gritty details for them.

Datacraft is an application development tool that allows to graphically

Datacraft liberates application builders from database details. A database schema is exported as both XML and RDF. The schema is then graphically converted into a query.

XML and RDF are used to describe database schemas and construct the query graph. Datacraft uses Resource Attachments (RA) to specify connections. The application runs as a Swing applet and therefore, needs a Swing-compatible browser (not very common).

The approach is definitely interesting but I suspect the speaker left out all the important stuff. The core of this tool is in RA's, which specify how an XML/RDF request is mapped into a database query and vice versa. Since this specification has to be generic, I suspect that writing RA's must be extremely tedious. It's probably okay to use Datacraft when the appropriate RA exists, but it's most likely a daunting task to write one. I also regretted that he wasn't more explicit about how they use RDF, since it is a format that can be easily abused and is definitely not easy to write either.

The Coolest Client-side XML application 

Tony Stewart, from Rivcom, a small (~10 employees) company, gave this talk. He spent about ten minutes raving about what they did and explaining that he wouldn't have the time to explain it all within 45 minutes.

[I was already hating the talk before it even started. Although this guy definitely has charisma and knows how to capture an audience, his never-ending rave without showing anything was very annoying.] Rivcom had to write a prototype of a Web application that they would show to VC's in order to raise funding. They were understaffed and faced impossible deadlines [bla bla bla, yeah, right, Rambo] but they took on the contract. They needed a shared XML database and had to be able to update data in real time, both for the developer of the site and for the customer. Stewart then starts a first demo.

[The demo consisted of store listings that are being updated in real time. The GUI and styles are also updated on all open pages in real time. Now the guy has my attention. What he is showing can definitely not be done with HTTP. My first guess is an underlying socket with a file protocol. This will turn out to be WinINet and ftp later in the talk.]

They use a proprietary language for both styles and behaviors (their project started two years ago, CSS did not exist then and Javascript was barely a blip on the radar). The whole demo fits in 13 XML files and uses some additional Javascript with empty DIV statements.

Rivcom was in charge of the browser-side functionality. The work was divided into four tasks among four people, and they used FrontPage and CSS to lay out the basic look. Then came some C++ programming (ActiveX) and scripting. The server side occupied three people.

The main guidelines they followed to build this prototype are :

[Now that last one is neat. Notice that it's a GET request but what is really returned is just a true/false value, to indicate whether the files could be successfully retrieved or not.]

[I wasn't very optimistic at the beginning of this talk but it turned out to be really interesting. Of course, certainly not as earth-shattering as the speaker had announced, but one should never set the bar for oneself too high. As a popular golf saying goes : "Never start a swing with something as predictable as 'watch this'".

Anyway. While their solution relies in large part on proprietary standards, these guys are definitely on to something. Their exploration of this area shows where the needs are and their prototype is a glimpse of future Web applications. This is as hacky as Web programming can be. I like that. But I hate people who brag about their work that much ;-)]

Mozilla from the trenches 

Michael Leventhal works for CiTEC, a small company in Finland. They are leveraging Mozilla's open source to hack into the code and add useful functionalities. Then they probably want to sell the browser with their added value, although he didn't make their business plan extremely clear.

This talk was not very interesting from a technical point of view. Leventhal emphasized the fact that dealing with Mozilla source code is Really Really Hard ("if you can defeat the Russian army, you should have no problem compiling Mozilla"). The code weighs 5M LOC.

[A few words on Mozilla.

Mozilla uses its own COM-like format (XPCOM, which can expose COM interfaces) and they invented their own cross-platform GUI specification format, based on XML and RDF. They have done an impressive amount of work in that area. I strongly suggest you take a look at their technical data and especially XUL and XPFE. The sad news is that while their approach is definitely interesting from a Research point of view, it is holding them back a lot, while Microsoft is not hampered at all by this problem.

More on Gecko below. Leventhal also mentioned the $30,000 grant offered by Sun to add XSL into Mozilla but my guess is that they can never make it by this summer].



Wednesday, March 10, 1999 

Keynote : How to find a hit-man online, David Siegel 


This humorous presentation was given by David Siegel, a GUI guru, who designed (among other things) some best-selling typefaces and also authored "Secrets of Successful Web Sites".

Nothing much to say on this one. It was entertaining but I can't think of anything really new that was said during this talk. 

The DNA Infrastructure in Windows, J. Allard (Microsoft)

Just a simple explanation on how Microsoft sees the Internet and how it can be put to work for enterprise business.

XML Beans, David Epstein (IBM Research)

The BML (Bean Markup Language) is a language that describes component based applications. It is made of a BML Processor (that realizes instances of the language) and a BML Player (a runtime kernel that dynamically constructs applications).

BML can operate on any Java object (not just Java beans) and provides a way to capture the component structure.

The markup is made of two categories :

The BML Processor takes a DOM as input and is self-contained in a 53k JAR file.

An interesting approach but to me, Java Beans have failed and I don't see in what way their approach can change this. I have yet to see a Bean demo that shows anything other than a juggler being manipulated at runtime.

Java standard extensions for XML, Dave Brownell (Sun) 

Dave gave a quick overview of his work and explained how his XML parser fits into the bigger picture of Sun's XML strategy.

[A talk that captured the audience's attention and probably gave a better idea on Sun's vision for XML. Which doesn't differ much from Microsoft's, by the way. All developers interested in writing or using an XML parser should definitely check out Dave's work, which has a lot of interesting features. ] 


Thursday, March 11, 1999 

Keynote : Sun and XML, Jonathan Schwartz (Sun) 

In this talk, Jonathan gave a very high-level (marketing oriented, not technical) overview of Sun's strategy for XML. Somebody pointed out to me that Jonathan managed to give a 45-minute lecture on XML without mentioning the W3C even once... not very nice.

Keynote : IBM's strategy for XML, Marie Wieck (IBM) 

IBM's vision for XML. [A slightly less interesting presentation than Jonathan's, although Marie has a strong stage presence].

XML in IE5 and on the server, David Walsch (Microsoft) 

David Walsch is the technical product leader for IE5 (final release due out next week). In this talk, he demoed the support that IE5 offers for the new standards.

Even though IE5 is definitely ahead of any other browser in the Web Standards Race, David's demo curiously didn't score very well on the "Wow" scale. He showed XML mixing with CSS, dynamically modifying the graphical layout of several pages. We also saw some XSL (not much though) but that was about all. 

XML and Related Standards in Gecko, Vidur Apparao (Netscape)

Vidur is the technical lead for the implementation of the DOM and XML for the Mozilla open-source project. I thought that Gecko would pale in comparison to IE5 (previous talk) but actually, the demo was okay. However, Gecko is clearly way behind IE5 in a lot of areas (except maybe for CSS).

