Saturday, 4 July 2009

On Learning XML, its Processing, and Unicode

In learning XML, I used both Mark Long's Introduction to XML and the W3Schools XML Tutorial. As much as I like the W3Schools tutorials, I rarely use them to learn a technology or language that is completely new to me. I visit the site relatively often, but usually for reference questions or background information. I liked Mark Long's casual and unpretentious style, which was to the point and free of jargon.

I think that one of the greatest benefits of XML is its ability to accommodate a variety of languages and scripts: a single source document can contain several languages and scripts, and yet the browser, if set up properly, will display only the language or script the user expects. Therefore, I looked at how both tutorials handle encoding and special characters. I am a great fan of Unicode, which I find to be an amazing intellectual achievement. It is a pity that many web developers seem to see virtually no alternative to ISO-8859-1 (a basic Western European Latin character set). Even on the W3Schools pages that present real-life examples of XML, in spite of the fact that UTF-8 is the default encoding for XML, one finds an unnecessary, suboptimal encoding value in the XML declaration.
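To make the encoding point concrete, here is a minimal sketch in Python using the standard library's ElementTree parser. The document, its element names (greetings, greeting), and the helper fits_latin1 are made up for illustration and do not come from either tutorial. It parses a small multilingual document declared as UTF-8 and then checks which of its strings could also be stored in ISO-8859-1.

```python
import xml.etree.ElementTree as ET

# Hypothetical multilingual document; the element names are invented for this sketch.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<greetings>'
    '<greeting xml:lang="en">Hello</greeting>'
    '<greeting xml:lang="el">Γειά σου</greeting>'
    '<greeting xml:lang="ja">こんにちは</greeting>'
    '</greetings>'
)

# Serialize as UTF-8 bytes, matching the encoding named in the XML declaration.
utf8_bytes = doc.encode("utf-8")
root = ET.fromstring(utf8_bytes)

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
greetings = {g.get(XML_LANG): g.text for g in root}


def fits_latin1(text):
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


for lang, text in greetings.items():
    print(lang, text, "fits ISO-8859-1:", fits_latin1(text))
```

Under these assumptions, only the English greeting survives a round trip through ISO-8859-1; the Greek and Japanese ones would need character references or a different encoding, which is the practical cost of not declaring UTF-8.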

However, the issue that I was most curious about was the processing of XML. I had heard of both SAX and DOM and knew that there are different XML parsers, but the difference between them was never really clear to me. After watching Mark Long's video, I think I now understand these tools much better than I did before.
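The difference can be sketched with Python's standard library, which ships both kinds of parser (xml.sax and xml.dom.minidom). This is only an illustration under my own assumptions, not something taken from the video: the sample document, the element names library and book, and the TitleHandler class are hypothetical. SAX reports events while it streams through the document, so the program has to keep track of where it is; DOM builds the whole tree in memory first and lets the program query it afterwards.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document used by both parsers.
XML = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<library><book>Dune</book><book>Emma</book></library>'
)


class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of each <book> element from SAX events."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_book = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "book":
            self._in_book = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_book:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._buffer))
            self._in_book = False


# SAX: event-driven, nothing is kept unless the handler stores it.
handler = TitleHandler()
xml.sax.parseString(XML, handler)
sax_titles = handler.titles

# DOM: the whole document becomes a tree that can be searched at will.
dom = minidom.parseString(XML)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("book")]

print(sax_titles)
print(dom_titles)
```

Both approaches produce the same list of titles here. The trade-off is that the SAX version needs more bookkeeping but never holds the full document in memory, while the DOM version is shorter but loads everything up front.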
