Slashdot
Tim Bray Says RELAX
Posted by
ScuttleMonkey
on Mon Dec 04, 2006 10:26 PM
from the holy-war-schema-2.7 dept.
twofish writes to tell us that Sun's Tim Bray (co-editor of XML and the XML namespace specifications) has posted a blog entry suggesting RELAX NG be used instead of the W3C XML Schema. From the blog: "W3C XML Schemas (XSD) suck. They are hard to read, hard to write, hard to understand, have interoperability problems, and are unable to describe lots of things you want to do all the time in XML. Schemas based on Relax NG, also known as ISO Standard 19757, are easy to write, easy to read, are backed by a rigorous formalism for interoperability, and can describe immensely more different XML constructs."
Related Stories
Developers: XML Co-Creator says XML Is Too Hard For Programmers 608 comments
orangerobot writes "Tim Bray, one of the co-authors of the original XML 1.0 specification has a new entry on his website explaining why he's been feeling unsatisfied lately with XML and says his last experience writing code for handling XML was 'irritating, time-consuming, and error-prone.' XML has always had a divided response among the technical community. The anti-XML community has several sites stating their positions."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Don't do it. (Score:4, Funny)
Couldn't agree more (Score:5, Insightful)
On the other hand, RELAX NG "just works".
(all IME of course...:)
ant.
Maximizing Composability and Relax NG Trivia (Score:5, Informative)
(http://www.donhopkins.com/ | Last Journal: Monday February 23 2004, @09:48AM)
Tim Bray is right, and he couldn't have put it better: W3C XML Schemas (XSD) suck. The reason Relax NG is so much cleaner and more powerful than committee-designed XML Schemas is that it's based on a sound mathematical foundation (tree regular expressions, or "hedge automata theory"). XML Schemas, by contrast, suffer from ad hoc design, committee-burn, lack of focus, and half-baked attempts to solve too many unrelated problems.
Here's some interesting stuff from my blog [donhopkins.com] about the design and development of Relax NG [oasis-open.org].
-Don
James Clark [oasis-open.org] wrote about maximizing composability:
Clark [oasis-open.org] describes the derivative algorithm's lazy approach to automaton construction:
The Relax NG derivative algorithm [thaiopensource.com] is implemented in a few hundred elegant declarative functional lines of Haskell [thaiopensource.com], and also in tens of thousands of lines and hundreds of classes of highly abstract complex Java code [thaiopensource.com].
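To give a feel for the derivative idea, here is a minimal sketch in Python. It is not Clark's code and it is heavily simplified: it works on plain strings of characters rather than XML trees, and the class and function names (Empty, NotAllowed, Char, Seq, Choice, Star, nullable, deriv, matches) are made up for this illustration. The point is only that matching proceeds by repeatedly taking the derivative of the pattern with respect to the next input symbol, so no automaton is built up front and only the states the input actually reaches are ever computed.

```python
from dataclasses import dataclass


# Pattern constructors (hypothetical names, for this sketch only).
@dataclass(frozen=True)
class Empty:          # matches only the empty string
    pass


@dataclass(frozen=True)
class NotAllowed:     # matches nothing at all
    pass


@dataclass(frozen=True)
class Char:           # matches exactly one given character
    c: str


@dataclass(frozen=True)
class Seq:            # first followed by second
    first: object
    second: object


@dataclass(frozen=True)
class Choice:         # left or right
    left: object
    right: object


@dataclass(frozen=True)
class Star:           # zero or more repetitions of inner
    inner: object


def nullable(p):
    """Return True if pattern p matches the empty string."""
    if isinstance(p, (Empty, Star)):
        return True
    if isinstance(p, (NotAllowed, Char)):
        return False
    if isinstance(p, Seq):
        return nullable(p.first) and nullable(p.second)
    if isinstance(p, Choice):
        return nullable(p.left) or nullable(p.right)
    raise TypeError(f"unknown pattern: {p!r}")


def deriv(p, ch):
    """Return the pattern for whatever may follow ch in a match of p."""
    if isinstance(p, (Empty, NotAllowed)):
        return NotAllowed()
    if isinstance(p, Char):
        return Empty() if p.c == ch else NotAllowed()
    if isinstance(p, Choice):
        return Choice(deriv(p.left, ch), deriv(p.right, ch))
    if isinstance(p, Star):
        return Seq(deriv(p.inner, ch), p)
    if isinstance(p, Seq):
        head = Seq(deriv(p.first, ch), p.second)
        if nullable(p.first):
            # first may match nothing, so ch may also start second
            return Choice(head, deriv(p.second, ch))
        return head
    raise TypeError(f"unknown pattern: {p!r}")


def matches(p, text):
    """Consume text one symbol at a time, deriving the pattern as we go."""
    for ch in text:
        p = deriv(p, ch)
    return nullable(p)


# The pattern (ab)*c
AB_STAR_C = Seq(Star(Seq(Char("a"), Char("b"))), Char("c"))
```

With this, matches(AB_STAR_C, "ababc") accepts and matches(AB_STAR_C, "abac") rejects. A real validator would also simplify and memoize the derived patterns so they do not keep growing; this sketch skips both.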
Clark's Java implementation of Relax NG is called "jing [thaiopensource.com]", which is a Thai word meaning truthful, real, serious, no-nonsense, and ending with "ng".
Comparing the Java and Haskell implementations of Relax NG illustrates what a wicked cool and powerful language Haskell [haskell.org] really is. The Java code must explicitly model and simulate many Haskell features like first-class functions [wikipedia.org], memoization [wikipedia.org], pattern matching [wikipedia.org], partial evaluation [wikipedia.org], lazy evaluation [wikipedia.org], declarative programming [wikipedia.org], and functional programming [wikipedia.org]. That requires many abstract interfaces [wikipedia.org], concrete classes [wikipedia.org] and brittle [wikipedia.org] lines of code [wikipedia.org].
While the Java code is quite brittle and verbose, the Haskell code is extremely flexible and concise. Haskell is an excellent design language, a vehicle for exploring complex problem spaces, designing and testing ingenious solutions, performing practical experiments, weighin
I have to agree. (Score:4, Insightful)
(http://www.luminance.org/ | Last Journal: Wednesday April 24 2002, @05:35PM)
Re:I have to agree. (Score:5, Interesting)
(http://slashdot.org/)
It also works really, really well with the nXML [thaiopensource.com] mode for emacs.
Finally, XML schemas that are not verbose, ugly and unreadable. And if you do need one of the older schema languages, there are translators from RelaxNG available.
Re:I have to agree. (Score:5, Interesting)
I was at SGML '96 where XML was first announced, and was one of those people who went home and wrote a (non-validating) XML parser over the weekend, based on the draft spec. I've used both DTDs and XML Schemas and can say without question that schemas are actually a bigger pain to work with than DTDs. DTDs were bad enough, but schemas have been a major step backwards, adding complexity without adding the features one actually needs.
Some years ago I wrote a code generator that used DTDs as the data modelling language. I sold it to the company I was working for at the time and someone I had no control over re-wrote it to use schemas because they were "simpler". The result had major bugs and dropped features, not entirely due to schema-related problems, although it is worth noting that the "simplifications" included handling schemas in completely incorrect ways, because if you handled them correctly they could not do the job. I created a new generator from scratch last year and tried to do things "properly" with schemas. It was essentially impossible, and I wound up creating a custom XML-based language to use as input.
At the time there was no Relax NG standards process, so I stayed clear of it. But it has the blessing of James Clark too (author of the SP SGML parser and the expat XML parser). So it is probably worth another very hard look.
To the point. (Score:2, Funny)
Hey Tim, don't hold back, tell us what you really think.
I agree! (Score:3, Funny)
Like two peas in a pod (Score:1)
Telltale Sign... (Score:2)
Relax NG: Design-by-Inspired-Individuals (Score:4, Interesting)
(http://www.donhopkins.com/ | Last Journal: Monday February 23 2004, @09:48AM)
Relax NG is a great example of the triumph of Design-by-Inspired-Individuals vs. Design-by-Committee.
In The State of XML [xml.com], Edd Dumbill explains the secret behind the success of Relax NG:
-Don
Great job, now to clean up XML itself (Score:3, Insightful)
Re:Great job, now to clean up XML itself (Score:4, Insightful)
One fix to XML I'd like to have... (Score:2)
(http://slashdot.org/ | Last Journal: Thursday November 01, @12:01PM)
Speaking of XML, how much smaller would XML files be if they made one minor simple change...
Add </> to mean "close the matching element".
*sigh* I wish I'd been on the committee when they specified the standard.
Re:One fix to XML I'd like to have... (Score:5, Insightful)
SGML is full of fun little hacks like that, and it was a pain in the ass to read. Omitting the tag name from the end tag makes it impossible to know you have an improperly closed tag til the end of the document, and then you have no idea which tag wasn't closed. And no, that wasn't a theoretical problem either, this became a real problem with giant SGML docs that used all the shortcuts.
If you really hate XML's verbosity so much, realize that it was designed for easy reading, not easy writing. I whipped up my own xml mode in emacs and made '</' trigger an "electric-slash" behavior that closes the tag automatically. Not rocket science.
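To make the error-reporting point concrete, here is a rough Python sketch. It is a toy, not a real XML or SGML parser: the regular expression only understands bare tags without attributes, and the function name first_error and both sample documents are invented for the illustration. It walks the tags with a stack and reports the first nesting problem it can prove.

```python
import re

# Matches <name>, </name> and the anonymous end tag </> (no attributes).
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)?\s*>")


def first_error(doc):
    """Return a message for the first nesting error found, or None."""
    stack = []  # (name, offset) for every element still open
    for m in TAG.finditer(doc):
        closing = m.group(1) == "/"
        name = m.group(2)
        if not closing:
            if name is None:
                continue  # a bare "<>" is not a tag in this toy grammar
            stack.append((name, m.start()))
        elif not stack:
            return f"offset {m.start()}: end tag with nothing open"
        else:
            open_name, _ = stack.pop()
            # A named end tag can be checked right here; "</>" cannot.
            if name is not None and name != open_name:
                return f"offset {m.start()}: </{name}> closes <{open_name}>"
    if stack:
        name, offset = stack[-1]
        return f"end of document: <{name}> opened at offset {offset} never closed"
    return None


# The same mistake twice: the first <p> is never closed.
NAMED = "<doc><p>one<p>two</p></doc>"
ANONYMOUS = "<doc><p>one<p>two</></>"
```

On NAMED the checker complains as soon as it reaches </doc>, and the message names the <p> that is still open. On ANONYMOUS every </> happily closes whatever is on top of the stack, so the problem only surfaces at the end of the document, and the element it blames is <doc> rather than the <p> that was actually forgotten.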
XML nightmare (Score:4, Insightful)
XSD: "Mission Accomplished!" (Score:4, Funny)
(http://www.donhopkins.com/ | Last Journal: Monday February 23 2004, @09:48AM)
From the xml-dev [xml.org] mailing list:
From: Rick Jelliffe
To: xml-dev@lists.xml.org
Date: Wed, 29 Nov 2006 12:46:06 +1100
Robert Koberg wrote:
Maybe a better analogy would be that the people who say that XSD is lovely is Mr Bush's "Mission Accomplished!"
Though of course there are differences between Iraq and XSD. One seems to be about people with their own fiefdom agendas stubbornly miring us in a quagmire, using a grabbag of thin reasons to justify it, denying any evidence that things are not rosy, perpetually promising that things are turning around, and enmeshing all sorts of decent people in a life of horror, difficulty and with no confidence in accomplishing the mission. The other is in the Middle East.
Just joking...
Rick
Slashdot Tags (Score:1)
(http://www.jaredbroad.com/ | Last Journal: Sunday November 26 2006, @01:39AM)
Relax NG. (Score:2)
(http://tirania.org/blog)
In addition to RelaxNG, it provides NVDL and RNC support.
Relax NG - constraining based on attribute values (Score:1)
One thing I really like about Relax NG is that it's possible (with very easy syntax) to constrain the XML structure based on an attribute value, something you can't do in schema or a DTD. For example, suppose you want to have an XML element:
<arg type="...">true</arg>
With Relax NG it's possible to constrain the text in the arg element (e.g. "true" or "false") based on the value of the type attribute. For example, if type="int", you could limit the text in arg to an integer value. This is something you can't do in schemas or DTDs.
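For anyone who hasn't seen this kind of rule, here is what it amounts to, written as a small Python sketch rather than as a schema. In Relax NG itself you would express it declaratively, as a choice between groups that each pair one attribute value with a content pattern; the code below only mimics that behaviour by hand. The "bool" type name, the CHECKS table and the arg_is_valid function are assumptions made up for the example; only the arg element, the type attribute and the "int" value come from the description above.

```python
import re
import xml.etree.ElementTree as ET


def _is_int(text):
    """Accept an optionally signed run of ASCII digits."""
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None


def _is_bool(text):
    """Accept exactly the two literal values."""
    return text in ("true", "false")


# Hypothetical rule table: value of the type attribute -> check on the text.
CHECKS = {
    "int": _is_int,
    "bool": _is_bool,
}


def arg_is_valid(xml_text):
    """Check that an <arg> element's text fits its type attribute."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "arg":
        return False
    check = CHECKS.get(elem.get("type"))
    if check is None:
        return False  # missing or unknown type value
    return check((elem.text or "").strip())
```

So arg_is_valid('<arg type="int">42</arg>') passes, while arg_is_valid('<arg type="int">true</arg>') fails even though "true" would be fine under type="bool".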
Why just RELAX when you can REST too? (Score:2)
I call this the LineOfView (as in PoV) Problem (Score:5, Insightful)
The question now is: where do you draw the line of view? Along which line do I take my knife to cut open my n-dimensional structure to unravel it and flatten it out into a 1-dimensional string of characters? This is a problem that is impossible to solve satisfactorily for all possible PoVs or - as I say - Lines of View, or better yet, Horizons to the structure. Will I unravel my DB of books by authors? By issues? By vendors? By publishers or by weight and size?
What I'm getting at is this: mapping n-dimensional stuff to 1-dimensional structures will always suck one way or the other. It's just that with XML we all start agreeing on which way it's supposed to suck. I don't think that changing the Schema standard (or worse: introducing additional standards) will actually attack this hard problem. I have a strong suspicion that Relax NG's relief is illusory and short-term, and that it re-introduces downsides that XML Schema has already tackled with its pesky and strict nature. One of those would be consistency with the View-Horizon, once chosen, all the way through the given data structure. I don't know for sure - go test and find out - but I do know that universal serialization will always come with downsides and RelaxNG (or any other schema) won't change that.
Wait wait wait (Score:1)
(http://ufy.sourceforge.net/)
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
is easier to read than this:
<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card (name, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
WTF ?!
Relax NG experiences (Score:1)
(http://twelves.blogspot.com/)
The compact syntax is enjoyable as you can be quite precise (compared to XSD), and there are tools that convert between the compact syntax and the XML Relax NG syntax, allowing you to use the syntax that suits your needs. In general, JING is quite a bit quicker than a few of the XSD validators for comparably complex schemas.
There are a few disadvantages:
* The full range of tools that are available are not advanced on a regular basis. I found a few bugs in the JING source code and had the opportunity to fix them where necessary.
* I feel that RelaxNG is marginalized because of XSD, and along with that goes a lot of additional OSS support. The tools are maintained by individuals instead of teams. I would recommend that the author of JING put his software forward to the Apache foundation (Jakarta Commons) and see if it can attract a bit more attention.
* Web services are a bit of a sticking point. The use of a Relax NG schema can be embedded into the WSDL, however, the various 3rd party clients may not necessarily understand the schema, and by extension, they would not generate any supporting classes making integration with a relax NG defined webservice a little more complex than it needs to be.
Relax NG really is great.
-Tim
RELAX compact syntax = BNF notation (Score:2)
It is really annoying when CS has to be discovered all over again. The problem of validating text against a certain format was solved many decades ago, and BNF and its variations have been known since the 60s...
Unfortunately, there is no automatic fetching.... (Score:2)
(http://slashdot.org/~wowbagger/journal/87552 | Last Journal: Monday September 03, @08:07PM)
I agree that RelaxNG is much easier to read, and it will much more completely describe a grammar than will the other standard - and MUCH more completely define it than will a DTD.
Unfortunately, as far as I can tell there is no way to, within an XML document, state "Use THIS RelaxNG schema file to validate this document", as you can with a DTD. Thus, even if I have placed my RelaxNG schema on my web server, I cannot set things up such that (for example) libXML2 can automatically fetch that schema when it starts parsing my document. I can map the RelaxNG schema to a DTD (losing information) and allow that to be fetched, but if I want to use a RelaxNG schema with libXML2, I, the programmer, must tell libXML2 where the schema is.
IMHO it would be a Good Thing if the W3C would standardize on some way to associate a RelaxNG schema with a given XML file - say, by some form of XML processing directive within the XML file.
Polishing a turd (Score:2)
(Last Journal: Thursday December 08 2005, @11:00PM)
Do we lose anything other than bandwidth use by doing this,
<tagNameThatCanBeLong>Some Text</>
instead of this:
<tagNameThatCanBeLong>Some Text</tagNameThatCanBeLong>
If the next end tag must belong to the last start tag what's the point of naming it?
Re:it's a rather straightforward observation (Score:2, Insightful)
xml is a b**ch to read
XML uses a binary format (Score:5, Insightful)
There are more sophisticated binary standards that are more efficient than ASCII and it wouldn't take a lot of effort to create viewers/editors for them as well. Of course most markup documents would be significantly smaller if tags didn't have to be S-P-E-L-L-E-D O-U-T character by character. Each HTML tag could be encoded in just two bytes with lots of room to spare.
It always fascinates me that we have no problem making customers use a new specialized tool like a browser, but it's taboo to use a non-ASCII tool for development. So we continue to structure our data as if it were going to be processed by a VT100.
Re:XML uses a binary format (Score:4, Interesting)
You could certainly make XML vastly more compact if you had some table of tags mapped to 2-byte codes. You're not the first to have such an idea, and I and others will be happy to use it... as soon as you've got it standardized, implemented, and as widely accepted as ASCII. Point being, I, and everyone I've never even met who will ever touch some particular XML file, already has a text editor.
We also all have some way of decompressing files in several standard compression formats, which will squash the XML down to the same size as your custom scheme, if storage space is an issue, which it generally isn't. There's all manner of custom schemes one can use to do various things better when one defines the platform. When you want to interoperate well, you need to use the capabilities that already exist on only semi-known systems.
Generally we don't actually make customers use new specialized tools. We take advantage of the new specialized tools they already have. I'm pretty sure not one of my customers ever got a browser to read my documentation; I wrote it in HTML because they've all got browsers already.
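If you want to try the size argument yourself, here is a rough Python sketch. Everything in it is made up for the experiment: the sample document, the four-entry tag table and the two-byte codes are hypothetical and not any standard encoding. It compares plain XML, a naive tag-table encoding of the kind suggested in the parent post, and ordinary zlib compression of the untouched XML.

```python
import zlib

# Hypothetical tag vocabulary; a real scheme would need a standardized table.
TAGS = ["invoices", "invoice", "customerNumber", "amountDue"]


def build_sample(records=200):
    """Build a repetitive XML document, loosely in the spirit of billing data."""
    rows = "".join(
        f"<invoice><customerNumber>{i}</customerNumber>"
        f"<amountDue>{i * 3}</amountDue></invoice>"
        for i in range(records)
    )
    return f"<invoices>{rows}</invoices>".encode("utf-8")


def tag_table_encode(xml_bytes):
    """Replace each known start or end tag with a two-byte code."""
    out = xml_bytes
    for code, name in enumerate(TAGS, start=1):
        tag = name.encode("ascii")
        out = out.replace(b"</" + tag + b">", bytes([0x02, code]))  # end tag
        out = out.replace(b"<" + tag + b">", bytes([0x01, code]))   # start tag
    return out


if __name__ == "__main__":
    raw = build_sample()
    print("plain XML:      ", len(raw), "bytes")
    print("tag table:      ", len(tag_table_encode(raw)), "bytes")
    print("zlib on the XML:", len(zlib.compress(raw)), "bytes")
```

How the last two numbers compare depends entirely on the data, so treat this as a way to measure your own files, not as proof either way. What it does show is that the standard compressor needs no agreed-upon tag table and the receiver gets plain text back.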
Re:Just sit back... (Score:5, Informative)
(http://www.b-list.org/)
Helpful hint for understanding the above: Tim Bray, author of TFA, is one of the guys who originally developed and spec'd out XML. Really. His name's on the spec [w3.org] and everything. So if he says that a particular XML tool has problems, it's probably a good idea to take him at his word ;)
Re:XML Totally Sucks - All of it! (Score:3, Insightful)
For flat data, sure a flat file is fine...for structured/hierarchical data, a flat file is
Re:it's a rather straightforward observation (Score:5, Informative)
(Last Journal: Wednesday July 06 2005, @10:01PM)
Re:XML Totally Sucks - All of it! (Score:2, Insightful)
Relax NG's compact non-XML syntax (Score:2, Interesting)
(http://www.donhopkins.com/ | Last Journal: Monday February 23 2004, @09:48AM)
Relax NG has a compact non-XML syntax. But C++/Java is a horrible syntax to use if you want a language to be readable and easy to understand. Since when was 17 levels of operator precedence [wikipedia.org] easy to understand? Of course any good programmer always uses parentheses to avoid ambiguity, so why should a language have 17 levels of built-in ambiguity just to make it that much easier to make hard-to-find mistakes?
-Don
From my blog: Relax NG Compact Syntax: no to operator precedence, yes to annotations! [donhopkins.com]
James Clark [jclark.com] is a fucking genius! He's the guy who wrote the Expat XML parser, works on Relax NG [oasis-open.org], and does tons [jclark.com] of other important stuff. Relax NG is an ingeniously designed, elegant XML schema language based on regular expressions, which also has a compact, convenient non-XML syntax [oasis-open.org].
I totally respect the way he throws down the gauntlet on operator precedence (take that you Perl and C++ weenies!):
You can translate back and forth between Relax NG's XML and compact syntaxes with full fidelity, without losing any important information. Relax NG supports annotating the grammar with standard and custom namespaces, so you can add standard extensions and extra user defined meta-data to the grammar. That's useful for many applications like user interface generators, programming tools, editors, compilers, data binding, serialization, documentation, etc.
Here's an interesting example of a complex Relax NG application: OpenLaszlo [openlaszlo.com] is an XML/JavaScript based programming language, which the Laszlo compiler translates into SWF files for the Flash player. The Laszlo compiler and programming tools use this lzx.rnc Relax NG schema for the OpenLaszlo XML language [openlaszlo.org]. This schema contains annotations used by the Laszlo compiler to define the syntax and semantics of the XML based programming language.
The schema starts out by defining a few namespaces:
default namespace = "http://www.laszlosystems.com/2003/05/lzx"
namespace rng = "http://relaxng.org/ns/structure/1.0"
namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
namespace lza = "http://www.laszlosystems.com/annotations/1.0"
The a: namespace [oasis-open.org] defines some standard annotations like a:defaultValue, and the lza: namespace defines some custom annotations private to the Laszlo compiler like lza:visibility and lza:modifiers. Thanks to the ability to annotate the grammar, much of the syntax and semantics of the Laszlo programming language are defined directly in the Relax NG schema in the compact syntax, so any other tool can read the exact same definition the compiler is using!
To show how truly simple and elegant it is, here is the snake eating its tail: The Relax NG XML syntax, written in the Relax NG compact syntax:
# RELAX NG XML syntax specified in compact syntax.
default namespace rng = "http://relaxng.org/ns/structure/1.0"
namespace loc
Re:XML Totally Sucks - All of it! (Score:2, Insightful)
While XML may have its places (I've yet to encounter one in the commercial world), passing large amounts of data is not one of them. A good flat file design is a lot more efficient than XML, and short of hardware acceleration I don't see that changing.
I'm currently trying to assist a customer who's changing from one system to another. The current system generates flat files of approx 2 GB every couple of days (billing data). The new system produces files of approx 13 GB. The data contained within the files results in the exact same bill being produced for the customers.
Needless to say, the extra diskspace (yes we do compress them), and processing time to parse/compress is such a waste.
In my mind, XML trades shorter development time / 'portability' (well so the theory goes), for greater resource usage (CPU/Disk), whereas most customers I've dealt with would rather take a little longer to develop, and have a lot less resource limitation issues on the production systems. The old methods of 'just throw more hardware at it' just don't work in the real world anymore.
Re:XML Totally Sucks - All of it! (Score:5, Insightful)
(http://honeypot.net/ | Last Journal: Friday April 07 2006, @09:33AM)
Yeah, well I have to look at EDI every day. I'd switch to XML in a heartbeat if it were up to me.
You picked some obvious strawmen to shoot down. XML isn't for building gigabyte databases (regardless of whether some people try to use it for that). It's for easily moving data between applications. If you think writing a flat text parser is easy, then you've never had to deal with nested data or escaped characters. Say what you will about XML, but it's nice to have one set standard that deals with all that, even if suboptimally, because I never want to write another ad-hoc parser for as long as I live. Been there, done that, have no desire to bother again.
XML is like Electricity (Score:5, Insightful)
(http://www.donhopkins.com/ | Last Journal: Monday February 23 2004, @09:48AM)
It's good for transmitting information/energy, but it's not good for storing it.
-Don
If it's not meant to be read by humans (Score:2)
(http://www.chesmontastro.org/)
Oh, and, "Hi! How you doing? Long time no see!"
Re:it's a rather straightforward observation (Score:1)
(Last Journal: Wednesday June 20, @12:01PM)
I'll take XML over a positional format any day, even if it only has to be looked at by human eyes 5% of the time. If you find yourself in a situation requiring eyeball examination of purchase order/shipping data at a large electronic commerce company it is likely an emergency and <ctrl>-f 'ing for a tag name, or using a web browser to check well-formedness can be a lifesaver.
Re:it's a rather straightforward observation (Score:2)
(http://www.people.virginia.edu/~drs2n/)
Like any other formalism, it's difficult until you get used to it. The more familiar you are with a particular XML tagset and markup conventions, the easier it is to pick out the relevant structures and information. I remember being appalled at the verbosity of XSLT when I first began to use it, but nowadays if I'm working with well structured XSLT code (and color-coding in the editor) I can scan it pretty efficiently.
That said, a non-XML syntax is almost always going to be more human-friendly. Which is another advantage of RELAX NG, of course, since it has a compact syntax that translates back and forth without loss of information to the XML form of the language.
COBOL Totally Rocks - All of it! (Score:1, Funny)
Well, I'm your counterpart in India and I'm happy to hear you're having problems getting used to newer technologies. Keep up the good work.
Re:MyXML scheme (sucks too) (Score:1, Funny)