Monday, June 21, 2004

Yesterday, I fixed a problem with skin import/export when CDATA sections were involved. A CDATA section is a way to include just about any "character data" in an XML file, without having to worry about the well-formedness of the XML. It's used like this: <![CDATA[...contents...]]>

Since the only way a parser can tell that such a section is finished is by finding the closing ]]> sequence, nested CDATA sections are a big no-no. The sequence ]]> cannot occur inside a CDATA section.

Nucleus exports its skins/templates as such a section (it's the safest way, since we don't know if the skin is valid XML), so problems arise when the skin/template itself uses a CDATA section: the skinbackup.xml file is then no longer well-formed. Although only RSS/Atom/... skins are affected, and import still worked fine, the problem had to be taken care of.

The solution

The solution is to replace any occurrence of ]]> in the skin with ]]]]><![CDATA[>, so that when the contents are included in a CDATA section, that section is split into two valid sections: the first ends right after ]] and the second starts with >.
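As a rough illustration of that replacement (a minimal Python sketch, not the actual Nucleus code, which is PHP; the helper name to_cdata and the sample skin are made up for this example):

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting the section at every ']]>'.

    Hypothetical helper for illustration only.
    """
    # Each ']]>' becomes ']]' + end of section + start of new section + '>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# A skin fragment that itself uses a CDATA section (made-up sample data).
skin = "<description><![CDATA[<b>hi</b>]]></description>"

# The exported document stays well-formed, and a parser hands back
# the original skin text once the adjacent sections are joined again.
document = "<skin>" + to_cdata(skin) + "</skin>"
restored = ET.fromstring(document).text
```

Since the parser concatenates the two sections on read, no inverse transformation is needed on import.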

So, why didn't import fail with the broken files?

Ah, there's trickery involved!

The import/export code included in Nucleus v2.0/v3.0 transforms the file contents before parsing the XML: it replaces ]]> with }}> (only for nested CDATA). Once the skin data is read, the inverse transformation is done. I can't remember where that code came from. I probably copy-pasted it from somewhere, since the regex used was a rather complicated one.

Can I still import broken skinbackup.xml files?

Yup. The old trickery has been removed and replaced by some new trickery: when invalid CDATA nesting is found, it is transformed into proper nesting before the XML is parsed.
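The post doesn't show how that repair is done, so the following is only a guess at the idea, written as a Python sketch. It assumes that every nested <![CDATA[ in a broken file has its own matching ]]>; the function name repair_nested_cdata is invented for this example.

```python
import re
import xml.etree.ElementTree as ET

CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"
ESCAPED_CLOSE = "]]]]><![CDATA[>"

# Matches either delimiter; everything in between is copied unchanged.
_DELIMITER = re.compile(r"<!\[CDATA\[|\]\]>")


def repair_nested_cdata(xml_text: str) -> str:
    """Rewrite the ']]>' of nested sections so only the outer one closes.

    Hypothetical sketch: assumes nested openers and closers are balanced.
    """
    out = []
    depth = 0  # how many CDATA openers are currently unclosed
    pos = 0
    for match in _DELIMITER.finditer(xml_text):
        out.append(xml_text[pos:match.start()])
        pos = match.end()
        if match.group() == CDATA_OPEN:
            depth += 1
            out.append(CDATA_OPEN)  # a nested opener is plain text in CDATA
        elif depth > 1:
            depth -= 1
            out.append(ESCAPED_CLOSE)  # closer of a nested section: split it
        else:
            depth = max(depth - 1, 0)
            out.append(CDATA_CLOSE)  # closer of the outermost section
    out.append(xml_text[pos:])
    return "".join(out)


# Made-up example of an old, broken export.
broken = "<skin><![CDATA[a <![CDATA[b]]> c]]></skin>"
repaired = repair_nested_cdata(broken)
skin_text = ET.fromstring(repaired).text
```

A file that already uses the split form passes through unchanged, so the same import path can handle both old and new exports.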

Can new exports be imported using old Nucleus versions?

At first, I was afraid this would be a problem. But since the new exports don't have nested CDATA sections, all is fine.

Why all that work? Import/Export did work, didn't it?

From a user's perspective: it did. From a developer's perspective, it didn't. It wrote out a skinbackup.xml that claimed to be an XML file, but in fact might not be.

Comments

It's working fine now, I exported the nudn_rss2 feed and it shows in a browser window.

One thing I had to take out was the copyright line, the browser choked on the symbol. Is this needed?

Posted by hcgtv at Tuesday, June 22, 2004 17:17:01

It's probably a character encoding issue.

Here's what I _think_ is happening: Since the skinbackup.xml file does not define a character encoding, it gets interpreted by the browser using its default encoding, which does not have a copyright character. It could also be that the server is sending out some character encoding (charset) header by default.

I don't think it's possible to specify a character encoding in the skinbackup.xml file, since we don't know which encoding the contents of the skin is written in.

Character encoding is one of those things I'll probably never fully understand, or care to understand. As far as I've heard, the preferable encoding should be utf-8 or utf-16 (one of the Unicode variants) for most files. I mostly end up using iso-8859-1.
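For readers who want to check the encoding guess above, here is a small Python sketch. It assumes the skin was saved as iso-8859-1, which the thread does not confirm; the sample content and the helper name parses are made up. An XML file with no encoding declaration (and no byte order mark) is read as UTF-8, and the iso-8859-1 byte for the copyright sign is not valid UTF-8 on its own.

```python
import xml.etree.ElementTree as ET

# Made-up skin content containing a copyright sign, saved as iso-8859-1.
body = "<skin>\u00a9 2004</skin>".encode("iso-8859-1")


def parses(data: bytes) -> bool:
    """Return True if data is accepted by the XML parser."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Without a declaration the parser assumes UTF-8 and rejects the byte 0xA9.
undeclared = body

# With a declaration that matches the actual bytes, the same content parses.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>' + body

undeclared_ok = parses(undeclared)
declared_ok = parses(declared)
```

This only works when the declared encoding matches the bytes in the file, which is exactly the information the export does not have.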

Posted by karma at Tuesday, June 22, 2004 20:01:03

This is a great fix, karma. Although there is no need to use CDATA for an Atom feed, it is nice to have this bug fixed in case of any future problems.

This is what I use to wrap Atom feed:

<content
type="application/xhtml+xml"
xml:base=""
xml:lang="cs-CZ"
xml:space="preserve">
<div
xmlns="http://www.w3.org/1999/xhtml">
<p>....article XHTML text...</p>
</div>
</content>

Posted by rADo at Saturday, July 10, 2004 21:53:49
Posted by rADo at Saturday, July 10, 2004 21:55:12

@rado: I don't think that's a good solution: if a user enters invalid XHTML in his/her posts, the feed would become invalid.

Posted by karma at Tuesday, July 13, 2004 21:41:49
