Ticket #460 (closed defect: fixed)

Opened 3 years ago

Last modified 3 years ago

XSLTUtil optimizations, questions

Reported by: vzholudev Owned by: nmueller
Priority: major Milestone:
Component: System Implementation (SI) Version: v0.1.3
Keywords: Cc: dmisev, clange
Blocked By: Blocking:
Due to close: YYYY/MM/DD Include in GanttChart: no
Dependencies: Due to assign: YYYY/MM/DD

Description

1) In the static block of XSLTUtil.java I can see a possible optimization. Currently the code is:

            Reader reader = new StringReader(getDefaultXSLT().toXML());
            transformer = tfactory.newTransformer(new StreamSource(reader));

I.e. we get the default XSLT in the method getDefaultXSLT() as an InputStream, then convert it to a XOM Document, then convert it back to a string via toXML(), and then tfactory.newTransformer(...) parses the XSLT yet again. Maybe it would be worthwhile to write the following line instead:

            transformer = tfactory.newTransformer(new StreamSource(XSLTUtil.class.getResourceAsStream("xsl2/bla.xsl")));
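
A minimal sketch of that idea, assuming only the standard JAXP classes: the stylesheet is compiled straight from a byte stream, so it is parsed exactly once. The class name StylesheetLoader and its methods are hypothetical and not part of JOMDoc.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper, not part of JOMDoc. */
final class StylesheetLoader {

    /** Compiles a stylesheet directly from a stream: one parse, no intermediate String. */
    static Templates compile(InputStream xslt) {
        // Class.getResourceAsStream returns null for a missing resource, so fail early.
        if (xslt == null) {
            throw new IllegalArgumentException("stylesheet resource not found");
        }
        try {
            return TransformerFactory.newInstance().newTemplates(new StreamSource(xslt));
        } catch (TransformerException e) {
            throw new IllegalStateException("cannot compile stylesheet", e);
        }
    }

    /** Applies a compiled stylesheet to an XML string and returns the result as text. */
    static String apply(Templates templates, String xml) {
        try {
            StringWriter out = new StringWriter();
            templates.newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```

In XSLTUtil this would presumably be called as StylesheetLoader.compile(XSLTUtil.class.getResourceAsStream("xsl2/bla.xsl")). A Templates object can be reused for many transformations, whereas each Transformer it creates should be used by one thread at a time.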

2) Also I don't understand why we use Xalan (the default value) when transforming with the default XSLT, that is, I don't see where we set a property in the static block like:

            System.setProperty("javax.xml.transform.TransformerFactory",
                    "net.sf.saxon.TransformerFactoryImpl");

3) Also, when we provide a custom stylesheet to XSLTUtil, there is no way to provide a URI resolver, so files inside a jar, for example, won't be resolved.
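
For illustration, a minimal sketch of a resolver that could serve included or imported stylesheets from the classpath, and therefore also from inside a jar. The class ClasspathUriResolver and the idea of a single fixed resource folder are assumptions made for this example; this is not the resolver that JOMDoc ships.

```java
import java.io.InputStream;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical resolver: looks up xsl:include / xsl:import targets on the classpath. */
final class ClasspathUriResolver implements URIResolver {
    private final ClassLoader loader;
    private final String prefix; // assumed resource folder, e.g. "xsl2/"

    ClasspathUriResolver(ClassLoader loader, String prefix) {
        this.loader = loader;
        this.prefix = prefix;
    }

    @Override
    public Source resolve(String href, String base) {
        // Simplification: every href is looked up in one fixed folder; 'base' is ignored.
        InputStream in = loader.getResourceAsStream(prefix + href);
        // Returning null asks the processor to fall back to its own resolution.
        return in == null ? null : new StreamSource(in);
    }
}
```

With plain JAXP, such a resolver would be registered via tfactory.setURIResolver(...) before the stylesheet is compiled. Note that ClassLoader.getResourceAsStream expects a path from the classpath root, unlike Class.getResourceAsStream, which resolves relative to the class's package, so the prefix would have to include the package path of the stylesheets.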

Change History

  Changed 3 years ago by vzholudev

Also I saw some other places where a document is parsed, then converted to a string and an input stream, and then the latter is passed to some XSLT classes, where the XML file is parsed again. We don't see much difference for small files, but for large ones (the one I'm using is more than 1 MB) the difference is considerable.
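
To illustrate the general pattern of handing an already parsed tree to the transformer, here is a minimal sketch. It uses the W3C DOM and DOMSource from the JDK rather than XOM, purely as an assumption to keep the example self-contained; the class name ParseOnce is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper: parse the input once, then transform the in-memory tree. */
final class ParseOnce {

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // XSLT processing needs namespace information
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse input", e);
        }
    }

    /** An identity transformer stands in for a real stylesheet in this sketch. */
    static Transformer identity() {
        try {
            return TransformerFactory.newInstance().newTransformer();
        } catch (TransformerException e) {
            throw new IllegalStateException("cannot create transformer", e);
        }
    }

    static String transform(Document doc, Transformer transformer) {
        try {
            StringWriter out = new StringWriter();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // DOMSource passes the parsed tree itself; nothing is serialized and re-parsed.
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```

The same document object can be passed to several transformations without being serialized in between.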

in reply to: ↑ description   Changed 3 years ago by clange

Replying to vzholudev:

Reader reader = new StringReader(getDefaultXSLT().toXML()); transformer = tfactory.newTransformer(new StreamSource(reader));

Argh! However, I perfectly understand where this comes from, so there's nothing to blame Dimitar for. I think he got trapped by the intentional incompatibility of XOM with JAXP (Java API for XML Processing = the javax.xml.* packages, and classes like Transformer, TransformerFactory, URIResolver, Source, Result, etc.). JAXP is ill-designed in certain aspects (details are less relevant here, but I can tell you if you're interested), and Elliotte Rusty Harold, the developer of XOM, is a zealot w.r.t. pure, elegant, and standards-conforming XML API design. Internally, XOM's XSLTransform class, which JOMDoc's XSLTUtil uses for transformations with external XSLTs but not for transformations with the built-in default XSLT (*), is also based on JAXP, but intentionally does not expose certain classes and interfaces, e.g. XOMSource and XOMResult. Therefore, there is only the one and only holy way of using XOM's XSLTransform, and it gives you no possibility to define a URIResolver -- which we need: first for resolving relative paths inside Tomcat WAR files (because Tomcat is too stupid for it), second for resolving custom URI schemes, as will be introduced with TNTBase. Therefore, Dimitar somehow tried to force a XOM document into a JAXP-style transformer.

(*) Aha! @Slava, that explains a lot! It does make a HUGE difference whether the "default" omdoc2pmml-copymobj.xsl is (1) used as the built-in XSLT, or whether you (2) pass the same file as an "external" XSLT to JOMDoc -- because in case (1) a different URI resolver is in operation, namely the very specific one that Dimitar implemented for the directory structure of the OMDoc XSLTs. And that URIResolver gets a certain thing wrong -- I will file my next ticket on that.

Dimitar, the class that would help here is XOMSource. It is available but private.  I know how to forcefully make it public, though. A very ugly workaround, but it works fine.

Maybe it would be worthwhile to write the following line: transformer = tfactory.newTransformer(new StreamSource(XSLTUtil.class.getResourceAsStream("xsl2/bla.xsl")));

Indeed, that would be a straightforward solution. That should be implemented for the "default" XSLT. Nobody needs the getDefaultXSLT method anyway; why has it been implemented? However, I can predict other situations where XOM and JAXP will clash, so we will also need a workaround to accommodate that.

2) Also I don't understand why we use Xalan (the default value) when transforming with the default XSLT, that is, I don't see where we set a property in the static block like: System.setProperty("javax.xml.transform.TransformerFactory", "net.sf.saxon.TransformerFactoryImpl");

Hmm, that is very interesting now. Indeed, Saxon should be used everywhere, as our XSLTs are version 2.0 and Xalan only supports 1.0. I.e. that system property also needs to be set for the default case, but the default case needs to be redesigned anyway. I'd set the system property in the static initializer altogether, as it will be valid for the whole time the JVM runs. Now that bug would explain why you (Slava) sometimes got strange syntax error messages that were no errors -- because that was XSLT 2.0 syntax not valid in 1.0. But then I wonder how and why the transformation with the default XSLT has ever worked.

OK, I got it: Slava, you called the XSLTUtil.transform(Document, Context) method, which applies the default XSLT. You called it directly, which is allowed, because it is public. So, Saxon was never set as the XSLT processor, and you got strange error messages from Xalan. However, that transform method was not made to be called from outside. JOMDoc itself only calls it from the other transform methods (search for transform in cli/cmd/RenderCommand) -- an example of how not to design method signatures and how not to use null default values :-( And all the other transform methods first make Saxon the default XSLT processor.
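
One way to check which processor a given code path really uses is sketched below, as an illustration only. The class XsltProcessorProbe is hypothetical; the JAXP calls are standard. It requests a factory by class name instead of relying on the JVM-wide system property, and asks the processor which XSLT version it implements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical diagnostic helper, not part of JOMDoc. */
final class XsltProcessorProbe {

    // Tiny stylesheet that prints the XSLT version the running processor implements.
    private static final String PROBE_XSLT =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'>"
          + "<xsl:value-of select=\"system-property('xsl:version')\"/>"
          + "</xsl:template></xsl:stylesheet>";

    /**
     * Requests a specific factory class, e.g. "net.sf.saxon.TransformerFactoryImpl".
     * Throws TransformerFactoryConfigurationError if the class cannot be loaded,
     * instead of silently falling back to the platform default.
     */
    static TransformerFactory newFactory(String factoryClassName) {
        return TransformerFactory.newInstance(factoryClassName, null);
    }

    /** Runs the probe stylesheet with the given factory and returns the reported version. */
    static String supportedXsltVersion(TransformerFactory factory) {
        try {
            StringWriter out = new StringWriter();
            factory.newTransformer(new StreamSource(new StringReader(PROBE_XSLT)))
                   .transform(new StreamSource(new StringReader("<probe/>")),
                              new StreamResult(out));
            return out.toString().trim();
        } catch (TransformerException e) {
            throw new IllegalStateException("probe transformation failed", e);
        }
    }
}
```

An XSLT 1.0 processor should report a 1.x version here, while an XSLT 2.0 processor such as Saxon should report 2.0 or later, which makes it easy to see whether a call path like the public transform(Document, Context) method ended up with the wrong processor.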

3) Also, when we provide a custom stylesheet to XSLTUtil, there is no way to provide a URI resolver, so files inside a jar, for example, won't be resolved.

Right. The answer is no, see above :-(

follow-up: ↓ 4   Changed 3 years ago by vzholudev

Actually, I tried setting the property to Saxon, and I still get the same error messages. I googled a bit and found out that this type of error can happen when somebody includes binary data (like images) in XML via XSLT.

Why doesn't the default transformation method work, while the custom method with the default XSLT passed explicitly does? Because the two cases use slightly different mechanisms. In the latter, XOM might be doing something that eliminates those errors. The disadvantage of the latter approach is that included XSLTs cannot be resolved inside a jar file.

in reply to: ↑ 3   Changed 3 years ago by clange

Replying to vzholudev:

Actually, I tried setting the property to Saxon, and I still get the same error messages. I googled a bit and found out that this type of error can happen when somebody includes binary data (like images) in XML via XSLT.

Please paste any such search result here. And please say in what case you get these error messages -- only when using the "default builtin XSLT" (I'd expect that), or also when using the same XSLT "externally"? (That would be strange)

Why doesn't the default transformation method work, while the custom method with the default XSLT passed explicitly does? Because the two cases use slightly different mechanisms. In the latter, XOM might be doing something that eliminates those errors. The disadvantage of the latter approach is that included XSLTs cannot be resolved inside a jar file.

XOM itself doesn't do any error prevention. It is our URIResolver, used in the case where we do JAXP instead of XOM, that introduces the errors (#461).

  Changed 3 years ago by dmisev

  • status changed from new to closed
  • resolution set to fixed