I would try another HTML5 parser. HTML5 is something of a unification of HTML and XHTML, so sorting out the syntax differences between the two with an XML parser is probably going to be an uphill battle. That said, I’m curious what the first line is; it could just be malformed entirely.
Hmm, doctype declarations are sort of the markup equivalent of headers. Parsers usually read them to know what flavor to expect and then parse the rest of the page separately. You shouldn’t have to do this, but if you chop off that first line and run the rest through a standard HTML parser, it might work fine.
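
For what it’s worth, here is a rough sketch of the chop-off-the-first-line idea in Python, using the standard library’s `html.parser`. The `TagCollector` class, the function names, and the sample markup are all made up for illustration, since I don’t know what your actual first line looks like. It only drops the first line when that line looks like a doctype, so a page without one passes through untouched.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def strip_first_line_if_doctype(markup):
    """Drop the first line only when it looks like a doctype declaration."""
    first, _sep, rest = markup.partition("\n")
    if first.lstrip().lower().startswith("<!doctype"):
        return rest
    return markup


def parse_tags(markup):
    """Parse the markup (minus a leading doctype line) and return its start tags."""
    parser = TagCollector()
    parser.feed(strip_first_line_if_doctype(markup))
    parser.close()
    return parser.tags


# Assumed sample input; substitute the real page here.
sample = "<!DOCTYPE html>\n<html><body><p>Hello</p></body></html>"
print(parse_tags(sample))  # ['html', 'body', 'p']
```

If the page parses cleanly this way but not with the doctype in place, that would point at the first line being the problem rather than the rest of the markup.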