Ticket #472 (closed defect: fixed)

Opened 4 years ago

Last modified 3 years ago

rendered XHTML is pretty-printed but should not

Reported by: clange Owned by: dmisev
Priority: major Milestone:
Component: System Implementation (SI) Version: v0.1.3
Keywords: Cc: dmisev, vzholudev, kohlhase, cmueller, nmueller
Blocked By: Blocking:
Due to close: YYYY/MM/DD Include in GanttChart: no
Dependencies: Due to assign: YYYY/MM/DD

Description

Pretty-printing the XHTML rendered by JOMDoc is harmful. Consider nested inline markup like

<span class="a">This is <span class="b">cool</span></span>!

This is perfectly legal HTML and occurs frequently in GenCS. JOMDoc outputs it pretty-printed with indentation, producing something like

<span class="a">This is 
  <span class="b">cool</span>
</span>!

which in HTML renders as "This is cool !", because span is inline markup rather than block markup, so any run of whitespace between tags is rendered as a single space.
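To make the effect concrete, here is a minimal sketch that approximates how a browser collapses whitespace in inline content. It parses both variants with the JDK's DOM parser and compares the visible text. The class and method names are made up for illustration, the wrapping <p> element is only there to give the snippet a single root, and the regex-based collapsing is a simplification of real HTML whitespace handling.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of JOMDoc.
class InlineWhitespaceDemo {

    // Returns the text content of the document with every whitespace run
    // collapsed to a single space (a rough stand-in for inline HTML rendering).
    static String renderedText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent()
                    .replaceAll("\\s+", " ")
                    .trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        String compact =
                "<p><span class=\"a\">This is <span class=\"b\">cool</span></span>!</p>";
        String indented =
                "<p><span class=\"a\">This is \n"
              + "  <span class=\"b\">cool</span>\n"
              + "</span>!</p>";

        System.out.println(renderedText(compact));   // prints "This is cool!"
        System.out.println(renderedText(indented));  // prints "This is cool !"
    }
}
```

The line break that the pretty-printer inserts before the closing </span> is what turns into the stray space before the exclamation mark.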

For now, Slava hacked his TNTBase variant of JOMDoc like this:

Index: src/jomdoc/org/omdoc/jomdoc/util/xml/XMLUtil.java
===================================================================
--- src/jomdoc/org/omdoc/jomdoc/util/xml/XMLUtil.java   (revision 963)
+++ src/jomdoc/org/omdoc/jomdoc/util/xml/XMLUtil.java   (working copy)
@@ -226,7 +226,7 @@
      */
     public static void serialize(Document xomDocument, OutputStream os) throws IOException {
         nu.xom.Serializer serializer = new nu.xom.Serializer(os);
-        serializer.setIndent(2);
+        serializer.setIndent(0);
         serializer.write(xomDocument);
     }

I could imagine the following proper solution: pretty-printing is never needed for production output, but it is needed for debugging. So I'd say JOMDoc should pretty-print when the "verbose" (debug) mode is on. Alternatively, pretty-printing could be a separate configuration option.

Change History

  Changed 4 years ago by clange

BTW, @Michael: I had to try hard to find a reasonable example where the nesting of two spans is not avoidable. In our XHTML documents, we have lots of avoidable span nestings, e.g.

<span class="foo"><span class="bar">hello</span></span>

However, I have no idea how to easily get rid of them. Usually, "foo" and "bar" are generated by different XSL templates, so it would require post-processing with an additional XSLT run in order to obtain the equivalent

<span class="foo bar">hello</span>

Or consider

<span class="foo">
  <math>...</math>
</span>

Here, <math class="foo"> would also have the same effect. But as the span is generated by XSLT and the math by JOMDoc, it would be a bit harder to implement (but not impossible).

Michael, what do you recommend?

  Changed 4 years ago by dmisev

I added a --no-pretty-print option for the command line and XMLUtil.serialize(document, .., boolean noPrettyPrint) methods.

follow-up: ↓ 5   Changed 4 years ago by clange

@Dimitar, Michael: As I found out now, the problem was not only this pretty-printing. I mixed up certain things when testing yesterday. The JOMDoc/XOM pretty-printing has to be turned off, and in addition there must be <xsl:output indent="no"/> in the XSLT. I have made that change in the XSLT now; please rebuild JOMDoc. I put all other JOMDoc users on Cc, as the new behavior may be unexpected, but it is unavoidable. (BTW, when outputting to HTML, as opposed to XML or XHTML, XSLT processors apply some heuristics and do not create whitespace around HTML inline markup. But as we want to use XML namespaces and MathML inside (X)HTML, we can't output to good old HTML.)

Now that means that for the pretty-print mode, we also need to tell Saxon that we do want indentation at the XSLT level. Via XOM's restricted XSLTransform this is again not possible. Via JAXP's Transformer it is:

Transformer t = tf.newTransformer();
t.setOutputProperty(OutputKeys.INDENT, "yes");

There is also SaxonOutputKeys.INDENT_SPACES taking an integer value for the number of spaces.

Dimitar, maybe you can also connect such a customization of the transformer, in order to make the --[no-]pretty-print option useful again.
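A minimal sketch of how such a switch could be wired to the JAXP Transformer, using only JDK classes. The class name, the serialize helper and its prettyPrint flag are hypothetical and do not reflect the actual JOMDoc code. It uses an identity transform on the JDK's default transformer rather than Saxon with a real stylesheet, so the Saxon-specific INDENT_SPACES key is left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of JOMDoc.
class IndentToggleDemo {

    // Runs an identity transform over the given XML and serializes the result.
    // Indentation is requested from the transformer only when prettyPrint is true.
    static String serialize(String xml, boolean prettyPrint) {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer(); // no stylesheet: identity transform
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.INDENT, prettyPrint ? "yes" : "no");

            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml =
                "<p><span class=\"a\">This is <span class=\"b\">cool</span></span>!</p>";

        // Production-style output: no whitespace is added between the inline tags.
        System.out.println(serialize(xml, false));

        // Debug-style output: layout depends on the transformer implementation.
        System.out.println(serialize(xml, true));
    }
}
```

Note that a stylesheet's own <xsl:output indent="..."/> setting and an explicit setOutputProperty call both affect the result, so the command-line option would have to set the property explicitly in both modes.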

  Changed 4 years ago by clange

  • cc cmueller, nmueller added

in reply to: ↑ 3   Changed 4 years ago by dmisev

  • status changed from new to closed
  • resolution set to fixed

Replying to clange:

Dimitar, maybe you can also connect such a customization of the transformer, in order to make the --[no-]pretty-print option useful again.

Done. When --no-pretty-print is specified, there is no indentation; if it isn't specified, the transformed output is indented with 2 spaces:

transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(SaxonOutputKeys.INDENT_SPACES, "2");

Thanks for the help Christoph!

  Changed 3 years ago by clange

  • status changed from closed to reopened
  • resolution fixed deleted

Via #663 this bug is coming back. I tried

$ jomdoc render --no-pretty-print -x ~/svn/omdoc.org/jomdoc/src/jomdoc/trunk/xsl/omdoc2pmml-copymobj.xsl mathtalk-definitions.omdoc

(same setting as in #664), and I still get indentation. Indentation breaks <span>s in HTML and causes unwanted whitespace. For HTML output we really only want indentation for debugging purposes.

  Changed 3 years ago by vzholudev

  • owner changed from nmueller to dmisev
  • status changed from reopened to new

@Dimitar, please write tests on any fixed bug or new feature.

  Changed 3 years ago by dmisev

  • status changed from new to closed
  • resolution set to fixed

I only had a test for -X; I have now fixed it for -x.
