I am still staying away from the social aspect of scientific reference management, à la CiteULike or Mendeley. Instead, I manage my research references the old-fashioned way: as BibTeX files using the excellent BibDesk. So when I set up this website I looked for a WordPress plugin to embed my BibTeX files into a page. I opted for a plugin instead of the various bibtex2html options to avoid manually updating the page whenever the BibTeX file changes.

Among the several available choices, the papercite plugin eventually won because it is actively updated and has a nice template out of the box. I quite like the expanding BibTeX entry…

It has recently been updated to version 0.4, which features a couple of my contributions: the plain citation style and the av-bibtex entry template. The plain citation style aims to mimic its namesake in LaTeX distributions. The av-bibtex template adds support for displaying the abstract and URL fields: it is used in my publications page. Yes, it is not a perfect showcase with only a couple of entries, but hey, I am working on adding more during my PhD!

Get the papercite plugin from the WordPress plugin directory.

XML 1.1 is not as widely adopted as XML 1.0 and is recommended for use only by those who need its unique features. I was writing a Java/EMF program that stored serialised objects in XML. The serialisation involved control characters, most of which XML 1.0 forbids even as character references, so XML 1.1 seemed a good choice.
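To illustrate why control characters push a document towards XML 1.1, here is a minimal sketch that asks the default JAXP SAX parser whether it accepts a character reference to a control character under each version. The class name `ControlCharCheck` and the choice of `&#x2;` are only illustrative assumptions; they are not taken from the original program.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
final class ControlCharCheck {

  /**
   * Returns true if the default SAX parser accepts a document of the given
   * XML version whose attribute contains the character reference &#x2;.
   */
  static boolean acceptsControlChar(String xmlVersion) {
    String xml = "<?xml version=\"" + xmlVersion + "\"?>"
        + "<test value=\"&#x2;\"/>";
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
      return true;
    } catch (SAXException e) {
      // Well-formedness error: the character is not allowed in this version.
      return false;
    } catch (Exception e) {
      // Configuration or I/O problems are not what this check is about.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("XML 1.0 accepts &#x2;: " + acceptsControlChar("1.0"));
    System.out.println("XML 1.1 accepts &#x2;: " + acceptsControlChar("1.1"));
  }
}
```

According to the two specifications, the XML 1.0 document should be rejected and the XML 1.1 document accepted.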

To my surprise, I could not parse some documents that I had just serialised. Some attributes went missing, and EMF then failed to load the objects. To check whether the fault was mine, I looked for a minimal example that reproduces the problem. The rest of this post gives the example in detail, but to cut things short, it looks like there is a bug in the Java 6 XML libraries: the problem is not reproducible with the latest version of the Apache Xerces libraries.

Parsing long attributes of XML 1.1

I created an example that parses an XML 1.1 file with long String attributes using the Java 6 SAX parser. If an element has a very long attribute, another (usually shorter) attribute of the same element is resolved erroneously. This happens only for XML 1.1, while XML 1.0 works correctly.

The Java example below reproduces the problem. It generates an XML file and parses it back using the SAX parser, then prints out the attributes:

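import java.io.StringReader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

public class LongAttributeTest {
  public static void main(String[] args) throws Exception {
    // Build an attribute value of 8175 digits, long enough to trigger the problem.
    StringBuilder longAttr = new StringBuilder();
    for (int i = 0; i < 8175; i++) {
      longAttr.append(i % 10);
    }

    // Declare the document as XML 1.1; the short attribute precedes the long one.
    String xmlText = "<?xml version=\"1.1\"?>" +
      "<test target=\"targetAttr\" long=\"" + longAttr.toString() + "\"/>";

    SAXParserFactory parserFactory = SAXParserFactory.newInstance();
    SAXParser parser = parserFactory.newSAXParser();

    // Parse the generated text and print every attribute of each element.
    InputSource source = new InputSource(new StringReader(xmlText));
    parser.parse(source, new DefaultHandler2() {
      @Override
      public void startElement(String uri, String localName, String qName,
          Attributes attributes) throws SAXException {
        for (int i = 0; i < attributes.getLength(); i++) {
          String name = attributes.getQName(i);
          String value = attributes.getValue(i);

          System.out.println("Attr: " + name + ", Value: " + value);
        }
      }
    });
  }
}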

When the above example is run, I get the following attribute values:

Attr: target, Value: 4"/>etAttr
Attr: long, Value: 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234

The length of the long attribute (8175 here) is chosen to exhibit the problematic result. If the long string is shorter (e.g. 8150), the value is resolved correctly: Attr: target, Value: targetAttr. For longer strings, the value of the target attribute seems to slide along the long string’s buffer.

This issue appears when the file is indicated to be XML 1.1. If changed to XML 1.0, the problem disappears (even for way longer strings) and the attribute value is resolved correctly.
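The comparison between versions and lengths can be sketched as a small probe. This is only an illustration under my own assumptions: the class `TargetAttrProbe` and its `parseTarget` method are hypothetical names, and the result for XML 1.1 depends on which parser implementation the JVM picks up, so no particular output is claimed here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
final class TargetAttrProbe {

  /** Parses a generated <test> element and returns its "target" attribute. */
  static String parseTarget(String xmlVersion, int longAttrLength) {
    StringBuilder longAttr = new StringBuilder(longAttrLength);
    for (int i = 0; i < longAttrLength; i++) {
      longAttr.append(i % 10);
    }
    String xml = "<?xml version=\"" + xmlVersion + "\"?>"
        + "<test target=\"targetAttr\" long=\"" + longAttr + "\"/>";

    // One-element array so the anonymous handler can store the result.
    final String[] target = new String[1];
    try {
      SAXParserFactory.newInstance().newSAXParser().parse(
          new InputSource(new StringReader(xml)),
          new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                String qName, Attributes attributes) {
              target[0] = attributes.getValue("target");
            }
          });
    } catch (Exception e) {
      throw new IllegalStateException("Parsing failed", e);
    }
    return target[0];
  }

  public static void main(String[] args) {
    // Try both XML versions with a short and a long attribute value.
    for (String version : new String[] {"1.0", "1.1"}) {
      for (int length : new int[] {8150, 8175}) {
        System.out.println("XML " + version + ", length " + length
            + ": target=" + parseTarget(version, length));
      }
    }
  }
}
```

Any line where the printed value differs from targetAttr points at an affected parser.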

The problem is not limited to the Java SAX parser: the DOM parser suffers from the same issue. In my case, it appeared when using Java 1.6.0_29 on Mac OS X Lion.
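A DOM-based check could look roughly like the sketch below. It is not the code I originally ran; the class `DomTargetProbe` and its method are hypothetical names, and the document is generated the same way as in the SAX example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class DomTargetProbe {

  /** Builds the test document and reads "target" back through the DOM API. */
  static String parseTargetWithDom(String xmlVersion, int longAttrLength) {
    StringBuilder longAttr = new StringBuilder(longAttrLength);
    for (int i = 0; i < longAttrLength; i++) {
      longAttr.append(i % 10);
    }
    String xml = "<?xml version=\"" + xmlVersion + "\"?>"
        + "<test target=\"targetAttr\" long=\"" + longAttr + "\"/>";
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      // The root element is <test>; read its short attribute.
      return doc.getDocumentElement().getAttribute("target");
    } catch (Exception e) {
      throw new IllegalStateException("Parsing failed", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("XML 1.0: " + parseTargetWithDom("1.0", 8175));
    System.out.println("XML 1.1: " + parseTargetWithDom("1.1", 8175));
  }
}
```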

Finding the solution

The text above is an almost exact copy of the question I was prepared to post on StackOverflow. I could not find an answer online and was worried that I might be using the serialisation wrongly, or that some hidden attributes needed to be set on the parser.

Before you can post, StackOverflow makes you jump through several hoops to ensure that the question is indeed unanswered. One of the site's creators explains the approach very nicely as “Rubber duck problem solving”. In one of the suggested “related questions”, I found a hint that Java usually bundles old XML parser libraries, which may have bugs.

After I downloaded the latest Apache Xerces libraries, the problem disappeared. So to use the long attributes, I would have to bundle the additional libraries with my program. Eventually I chose an alternative: to work with a different serialisation that avoids the control characters. This allowed me to use XML 1.0, which does not exhibit the problem.
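For anyone who does want to bundle Xerces, one way to make sure it is actually used is to request its factory class by name through JAXP. The sketch below is not what I ended up shipping: the class `ParserFactories` is a hypothetical name, and it assumes that a Xerces-J jar providing org.apache.xerces.jaxp.SAXParserFactoryImpl is on the classpath. If it is not, the code falls back to the parser bundled with the JDK.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper, for illustration only.
final class ParserFactories {

  // Factory class provided by the Apache Xerces-J distribution.
  static final String XERCES_FACTORY =
      "org.apache.xerces.jaxp.SAXParserFactoryImpl";

  /** Returns the Xerces factory if available, else the JDK default. */
  static SAXParserFactory preferXerces() {
    try {
      // A null class loader means the thread context class loader is used.
      return SAXParserFactory.newInstance(XERCES_FACTORY, null);
    } catch (FactoryConfigurationError e) {
      // Xerces is not on the classpath: use the built-in implementation.
      return SAXParserFactory.newInstance();
    }
  }

  public static void main(String[] args) {
    // Shows which implementation was actually selected.
    System.out.println(preferXerces().getClass().getName());
  }
}
```

Printing the factory class name is a quick way to confirm which implementation the program really runs with.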

Every now and then I spend some time on things that may also be useful to others. I often find myself turning to the Internet for that quick “how do I”—and the Internet is usually extremely helpful. I will try to put some of my small ideas and results here. If they save a minute or an hour of somebody’s time—that will be my contribution.

P.S. I am not an avid writer… so I may go silent for prolonged periods of time.