I've had two recent run-ins with the new docx files produced by Word 2007. In both cases, an unsuspecting user created a document and emailed it out, thinking that since it was a Word document everyone would be able to read it. Of course, no one could.
At the core of docx is Microsoft's new "open standard," Office Open XML. Since the content is XML, I was able to just strip out the text in the first case. With the second document, those involved wanted the formatting preserved. I poked around a bit and discovered that Microsoft had released a patch for older versions of Word. I installed it and opened the file, but all the formatting disappeared. I then tried one of those online conversion pages and got a much better result. Microsoft can't even implement their own standard. That's really not that surprising given the size and complexity of the specification. It seems like they don't want anyone else to implement it either, which defeats the whole purpose of an open standard. There's a good write-up of some of the problems with the standard here. Oh, and it may also become an ISO standard soon.
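For anyone stuck with a docx and no Word 2007, the text-stripping trick can be sketched in a few lines of Python. This is not the exact method I used, just a minimal sketch that assumes the usual package layout: a docx is a ZIP archive with the main body stored in word/document.xml, paragraphs in w:p elements and text runs in w:t elements. The function name docx_text is my own invention, and the sketch ignores tabs, line breaks, headers, footers and footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_text(source):
    """Return the plain text of a .docx, one line per paragraph.

    `source` may be a file path or a file-like object.
    Assumes the body lives in word/document.xml (the usual location).
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Join the text of every run inside this paragraph.
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Calling something like print(docx_text("report.docx")), where report.docx is a made-up filename, should dump the paragraphs to the console. All formatting is thrown away, which was fine for the first document and useless for the second.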
Comments
I just got one of those .docx files yesterday. For me, the Word 2003 patch worked fine. Of course, maybe the document used to be formatted in curly script with pink unicorns, but I'll never know.
A favorite quote: the nice thing about standards is that there are so many to choose from.
Posted by: Jeremy Stein on Tuesday, April 24, 2007
The patched version of Word handled the text but not the color or borders. It seems like Microsoft wrote the "standard" to be as backward compatible as possible with previous Word formats, so I don't understand why they couldn't get this right.
Posted by: CJ on Tuesday, April 24, 2007