Ticket #461 (closed defect: fixed)

Opened 3 years ago

Last modified 3 years ago

XSLTUtil's URIResolver resolves unresolvable URIs wrongly and ignores base

Reported by: clange Owned by: clange
Priority: critical Milestone:
Component: XSLT Version: v0.1.3
Keywords: Cc: dmisev, vzholudev, cmueller, kohlhase
Blocked By: Blocking:
Due to close: YYYY/MM/DD Include in GanttChart: no
Dependencies: Due to assign: YYYY/MM/DD

Description

I think I understand why we're getting strange XSLT error messages with some OMDoc input documents. These documents contain broken links (e.g. to images, via omlet/data). Inside the OMDoc XSLTs, these links are resolved using the document(uri) function, whose result is an XML document. We need a non-default URIResolver for JOMDoc in some cases (#432, or TNTBase in the future). When one is in operation, you (Dimitar) should know one more thing that I think I haven't told you (shame on me for that): it is invoked not only for resolving xsl:include (i.e. paths to other XSLTs relative to the current XSLT document), but also for document() (i.e. for resolving URIs relative to the URI of the current document).

That is what the base argument of the resolve method is for, and we have ignored it so far in the URIResolver implementation of XSLTUtil. (Ceterum censeo: Use a smarter URIResolver!) Every document D that we deal with has a notion of its own URI (the "system ID" that I mentioned before). When a URI relative to D is to be resolved, that system ID is passed as base to the URIResolver. That means that the URIResolver itself also has to set a reasonable system ID on any Source that it returns, otherwise relative URIs from that document won't be resolved correctly. This all works smoothly if we are in the file system and using the default URI resolver (i.e. don't set our own one).
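To make the role of base concrete, here is a minimal sketch of a resolver that resolves href against base and sets the resulting absolute URI as the system ID of the Source it returns. The class name BaseAwareResolver is made up for illustration and this is not the XSLTUtil code. It assumes base is a syntactically valid absolute URI, and it leaves opening the stream to the XSLT processor.

```java
import java.net.URI;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class, not part of JOMDoc.
public class BaseAwareResolver implements URIResolver {
    @Override
    public Source resolve(String href, String base) {
        // Resolve href relative to the system ID of the referring document.
        // URI.create throws IllegalArgumentException on malformed input.
        URI target = (base == null || base.isEmpty())
                ? URI.create(href)
                : URI.create(base).resolve(href);
        // StreamSource(String) stores its argument as the system ID, so
        // relative URIs inside the returned document resolve against it.
        return new StreamSource(target.toString());
    }
}
```

With base "file:/docs/main.omdoc" and href "img/fig.xml", the returned Source carries the system ID "file:/docs/img/fig.xml".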

Now for the return value of resolve. It may well be the case that a certain URI is not resolvable. For the case of resolving an xsl:include that is not desirable, but shit happens. But for the case of document(), it is quite common, because there it was the author of the document who placed a dangling link there -- whether through his own fault or not. When a URI target does not exist, URIResolver.resolve is supposed to return null, but the XSLTUtil URIResolver does not. It returns new StreamSource(null), and that gives us nice, almost incomprehensible error messages that appear to come from the XSLTs, because the XSLT processor (Saxon) then tries to parse the empty StreamSource into an XML document, but does not find an XML document in that place. The error messages are then like "non-wellformed input", "invalid character at position 1, line 1", or similar.
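A minimal sketch of the intended behaviour for the file system case: return null when the target does not exist, instead of an empty StreamSource. The class name NullOnMissingResolver is made up for illustration, and restricting the resolver to file: URIs is an assumption of this sketch, not a description of the actual XSLTUtil code.

```java
import java.io.File;
import java.net.URI;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class, not part of JOMDoc.
public class NullOnMissingResolver implements URIResolver {
    @Override
    public Source resolve(String href, String base) {
        URI target = (base == null || base.isEmpty())
                ? URI.create(href)
                : URI.create(base).resolve(href);
        // Assumption of this sketch: only hierarchical file: URIs are handled.
        if (!"file".equals(target.getScheme()) || target.getPath() == null) {
            return null;
        }
        File file = new File(target.getPath());
        if (!file.isFile()) {
            // Dangling link: report "cannot resolve" rather than handing
            // the processor an empty source to parse.
            return null;
        }
        // StreamSource(File) also sets the system ID from the file's URI.
        return new StreamSource(file);
    }
}
```

As far as I know, a null result hands the decision back to the XSLT processor, which then falls back to its own default resolution and reports a missing document in its own, more readable terms.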

Change History

Changed 3 years ago by clange

  • cc cmueller, kohlhase added

@Christine, I realize that this may affect a thing you are interested in. It also means that, when you are running JOMDoc with the "default" XSLT (= the -X option), ref/@xref links are not resolved correctly by the XSLT.

Changed 3 years ago by dmisev

Thank you for all the help. The URIResolver is a bit smarter now, and all issues mentioned here should be solved, so you can close this ticket.

Changed 3 years ago by clange

  • status changed from new to closed
  • resolution set to fixed

OK, I looked at the code and it looks good. This ticket can be closed. If there are still bugs, I think we will discover them during Michael's testing in the file system and Slava's testing in TNTBase.

Now some further comments that came to my mind while reading your new code. Note: reading, not debugging! ;-)

But now there's one thing I don't understand. Rendering a test document with an include-ref worked both with -x and -X, but in the -X case I don't understand why, because after "resolving" href against base, you load a file using XSLTUtil.class.getResourceAsStream, and an OMDoc document referenced by <ref xref="..."> of course is not in the scope of the Java class loader but somewhere in the filesystem. Still it works. -- OK, if you can't explain it either, let's not think about it (unless it turns out to be buggy still ;-).

Then, the URIResolver used for "-x" uses File -- which looks like it may only work in the filesystem, but not e.g. in TNTBase. But I suppose TNTBase will need its own resolver anyway -- is your changed interface still compatible with the one that Slava needs to plug his TNTBase-specific resolver in?

new StreamSource(new File(path)) works, but there is also a StreamSource(String) constructor that seems to do at least a very similar thing directly.
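A small sketch comparing the two constructors. One caveat I'm fairly sure of: StreamSource(String) takes a system ID, i.e. a URI, not a plain file path, so the path has to be converted first. The class and method names here are made up for illustration.

```java
import java.io.File;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class, not part of JOMDoc.
public class StreamSourceConstructors {
    // Variant used now: the File is converted to a file: URI internally.
    static StreamSource fromFile(String path) {
        return new StreamSource(new File(path));
    }

    // Variant with the String constructor: the argument is used as the
    // system ID as-is, so we build the file: URI ourselves.
    static StreamSource fromSystemId(String path) {
        return new StreamSource(new File(path).toURI().toASCIIString());
    }
}
```

Both variants end up with the same file: URI as system ID, so the String constructor only saves anything when the caller already holds a URI rather than a path.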

I see that you now feed the JAXP-style transformer from a StreamSource, and when the actual input is a XOM document, you obtain a StreamSource from it via a String. This is still quite inefficient. A "legal" way in XOM would be using DOMConverter and DOMSource, i.e. first converting the XOM document to a DOM document, then feeding the DOM document to the transformer, but that's also still quite inefficient. In some earlier comment I mentioned how to enable the most efficient way by using the hidden XOMSource class, so I'd still recommend that you do that.

And one different thing: in IOUtil.getScheme, it might not be so efficient, but maybe more reliable to construct a URI from the String and use URI.getScheme -- just to be prepared for changes/improvements to java.net.URI.
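What I have in mind is roughly the following sketch. The class and method names are made up, since I'm not quoting the actual signature of IOUtil.getScheme, and returning null for unparseable input is an assumption of the sketch.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical example class, not the actual IOUtil.
public class SchemeUtil {
    // Returns the scheme of the given URI string, or null if the string is
    // a relative reference or is not a syntactically valid URI.
    static String schemeOf(String uri) {
        try {
            return new URI(uri).getScheme();
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```

One thing to watch out for: a Windows-style path such as "C:/x" is parsed as a URI with scheme "C", so plain file paths should not be passed through this unchecked.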
