Ticket #640 (closed defect: fixed)

Opened 3 years ago

Last modified 3 years ago

Include cdbase in OpenMath output

Reported by: clange Owned by: vzholudev
Priority: major Milestone: Release v1.3.0
Component: System Implementation (SI) Version: v0.1.4
Keywords: Cc: kohlhase, frabe, dmisev
Blocked By: Blocking:
Due to close: YYYY/MM/DD Include in GanttChart: no
Dependencies: Due to assign: YYYY/MM/DD

Description

We would like to translate <OMS cd="..." name="..."/> into URIs (cf. #639). For that to work, we need an additional disambiguation aid: the CDBase. In OMDoc input documents this is usually not needed, as symbols are resolved via theory imports, but in the XHTML+MathML+OpenMath output we need to make this information explicit for the JOBAD client.

(@Michael, @Florian, I'm just Ccing you FYI. You see that OpenMath's standard way of translating OMS to URIs can be really troublesome, especially when combined with OMDoc. I'm glad that with OMDoc 1.6 we can get rid of that mess.)

(@Slava, can you please "process" this ticket and delegate appropriate parts of the work to Dimitar? I would be available for a meeting on Monday if there are things to be discussed.)

In GenCS in TNTBase using OMDoc 1.3, the situation is always like this: Behind a cool URI like http://linkeddata.kwarc.info/gencs/dmath/en/sets-introduction, which redirects to a semi-cool TNTBase URI (e.g. http://alpha.tntbase.mathweb.org:8080/...) we have an OMDoc document, which contains a theory with the same name. The case of having more than one theory in a document does not occur in practice. It would get us into trouble, which I don't want to deal with now. OMDoc 1.6 with MMT will solve that problem for us.

When rendering an OMDoc document, JOMDoc knows what URI it has. In DOM this is done via Document.setDocumentURI/getDocumentURI; I don't recall the XOM way right now. TNTBase should set that to the "cool URI" of the document. So, JOMDoc knows that it is currently rendering e.g. http://linkeddata.kwarc.info/gencs/dmath/en/sets-introduction. And we can safely assume that the only theory in this document is named "sets-introduction". Thus we will have symbols like <OMS cd="sets-introduction" name="emptyset"/>. Therefore, the CDBase URI is the document's URI without the name of the document, i.e. http://linkeddata.kwarc.info/gencs/dmath/en. For the parallel markup output, the OMS thus has to be <OMS cdbase="http://linkeddata.kwarc.info/gencs/dmath/en" cd="sets-introduction" name="emptyset"/>, in order to make the OMS→URI construction from #639 work.
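
A minimal sketch of the GenCS-specific split described above, assuming the document's cool URI has no query or fragment and that its last path segment is the name of the only theory in it. The class and method names (GencsCdbase, cdbaseOf, cdOf) are hypothetical, not JOMDoc API.

```java
import java.net.URI;

/** Hypothetical helper for the GenCS convention: one theory per document, named like the document. */
final class GencsCdbase {
    private GencsCdbase() {}

    /** Position of the slash that separates the CDBase from the document (= theory) name. */
    private static int lastSegmentSlash(String documentUri) {
        URI uri = URI.create(documentUri);
        // A query or fragment could itself contain '/', so such URIs are rejected.
        if (uri.getRawQuery() != null || uri.getRawFragment() != null) {
            throw new IllegalArgumentException("unexpected query or fragment: " + documentUri);
        }
        String path = uri.getRawPath();
        if (path == null || path.length() <= 1 || path.endsWith("/")) {
            throw new IllegalArgumentException("no document name in: " + documentUri);
        }
        return documentUri.lastIndexOf('/');
    }

    /** The document URI without its last path segment, e.g. ".../gencs/dmath/en". */
    static String cdbaseOf(String documentUri) {
        return documentUri.substring(0, lastSegmentSlash(documentUri));
    }

    /** The last path segment, assumed to be the name of the only theory in the document. */
    static String cdOf(String documentUri) {
        return documentUri.substring(lastSegmentSlash(documentUri) + 1);
    }
}
```

For http://linkeddata.kwarc.info/gencs/dmath/en/sets-introduction this gives the cdbase http://linkeddata.kwarc.info/gencs/dmath/en and the cd sets-introduction, i.e. exactly the attributes of the OMS above.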

So far I have sketched this for symbols from the current document. When the current document uses symbols from another document, you somehow have to do the same construction, i.e. from a given <OMS cd="..." name="..."/> find out in what document the symbol has been declared.
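
One way to picture the cross-document case, as a sketch only: assume the imports of the current document have already been resolved into a map from theory (CD) names to the cool URIs of the declaring documents. TheoryIndex, register and cdbaseFor are hypothetical names, and the one-theory-per-document convention from above is again assumed when deriving the CDBase.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical index from theory (CD) names to the cool URIs of their declaring documents. */
final class TheoryIndex {
    private final Map<String, String> declaringDocument = new HashMap<>();

    /** Records that theory {@code cd} is declared in the document at {@code documentUri}. */
    void register(String cd, String documentUri) {
        if (documentUri.lastIndexOf('/') < 0) {
            throw new IllegalArgumentException("not a hierarchical URI: " + documentUri);
        }
        String previous = declaringDocument.putIfAbsent(cd, documentUri);
        if (previous != null && !previous.equals(documentUri)) {
            // Same-named theories from different documents: a bare cd cannot tell them apart.
            throw new IllegalStateException(
                    "theory " + cd + " is declared in " + previous + " and " + documentUri);
        }
    }

    /** CDBase for an OMS with the given cd: the declaring document's URI minus its last segment. */
    Optional<String> cdbaseFor(String cd) {
        return Optional.ofNullable(declaringDocument.get(cd))
                .map(uri -> uri.substring(0, uri.lastIndexOf('/')));
    }
}
```

An unknown cd yields an empty Optional rather than a guessed CDBase, which keeps the failure visible instead of silently falling back to the OpenMath default.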

Initially we can add a @cdbase attribute to every OMS. As @cdbase is scoped like an XML namespace declaration (i.e. when an OMS has no @cdbase, the @cdbase of its closest ancestor counts), we could later post-process the output and move common @cdbase values up to parent nodes or even to the root of a formula.
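
A rough DOM sketch of both ideas, assuming namespace-aware parsing and OMS elements in the OpenMath namespace http://www.openmath.org/OpenMath: effectiveCdbase applies the closest-ancestor rule (falling back to the OpenMath default http://www.openmath.org/cd), and hoistCommonCdbase moves a value shared by all OMS descendants up to a given element. The class and method names are hypothetical.

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helpers for the namespace-like scoping of @cdbase (assumes a namespace-aware DOM). */
final class CdbaseScoping {
    static final String OPENMATH_NS = "http://www.openmath.org/OpenMath";
    static final String DEFAULT_CDBASE = "http://www.openmath.org/cd";

    private CdbaseScoping() {}

    /** The @cdbase of the element itself or of its closest ancestor; the OpenMath default otherwise. */
    static String effectiveCdbase(Element element) {
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            Element e = (Element) n;
            if (e.hasAttribute("cdbase")) {
                return e.getAttribute("cdbase");
            }
        }
        return DEFAULT_CDBASE;
    }

    /**
     * If every OMS below {@code root} carries the same explicit @cdbase and no element in
     * between carries one, moves that value onto {@code root}; otherwise changes nothing.
     */
    static void hoistCommonCdbase(Element root) {
        NodeList symbols = root.getElementsByTagNameNS(OPENMATH_NS, "OMS");
        String common = null;
        for (int i = 0; i < symbols.getLength(); i++) {
            Element oms = (Element) symbols.item(i);
            if (!oms.hasAttribute("cdbase")) {
                return; // this OMS inherits its cdbase; hoisting could change its meaning
            }
            for (Node p = oms.getParentNode(); p != root; p = p.getParentNode()) {
                if (((Element) p).hasAttribute("cdbase")) {
                    return; // an intermediate @cdbase would shadow the hoisted one
                }
            }
            String value = oms.getAttribute("cdbase");
            if (common == null) {
                common = value;
            } else if (!common.equals(value)) {
                return; // different CDBases: nothing to share
            }
        }
        if (common == null) {
            return; // no OMS at all
        }
        root.setAttribute("cdbase", common);
        for (int i = 0; i < symbols.getLength(); i++) {
            ((Element) symbols.item(i)).removeAttribute("cdbase");
        }
    }
}
```

The hoisting step is deliberately conservative: it only acts when removing the per-OMS attributes cannot change any symbol's effective CDBase.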

Change History

follow-up: ↓ 2   Changed 3 years ago by vzholudev

Wouldn't it affect performance or pollute a document when not needed?

in reply to: ↑ 1   Changed 3 years ago by clange

Replying to vzholudev:

Wouldn't it affect performance or pollute a document when not needed?

Of course it will slightly decrease performance, but pollution is the wrong word. The whole parallel markup and RDFa output decreases performance, but it is needed for our services. And for the parallel markup I have just identified a thing that has simply been done wrong so far. Things first have to be right, then we can talk about performance ;-) It is wrong to have <OMS cd="sets-introduction" name="emptyset"/> in a parallel markup fragment, as that would make a client think that the URI of the symbol is http://www.openmath.org/cd/sets-introduction#emptyset, which it is not. Therefore we need this additional information. The reason why our definition lookup has worked so far is just that we were lucky not to have two theories with the same name in different places of GenCS.
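
To make the failure concrete, here is a minimal sketch of the OpenMath-style OMS-to-URI construction (cdbase, then "/", cd, "#", name), assuming the OpenMath default CDBase when none is in scope. The construction actually used is the one from #639; OmsUri.symbolUri is a hypothetical name.

```java
/** Hypothetical helper: OpenMath-style symbol URI, cdbase + "/" + cd + "#" + name. */
final class OmsUri {
    /** CDBase assumed by OpenMath when no @cdbase is in scope. */
    static final String OPENMATH_DEFAULT_CDBASE = "http://www.openmath.org/cd";

    private OmsUri() {}

    /** @param cdbase the effective @cdbase, or null if none is in scope */
    static String symbolUri(String cdbase, String cd, String name) {
        String base = (cdbase != null) ? cdbase : OPENMATH_DEFAULT_CDBASE;
        return base + "/" + cd + "#" + name;
    }
}
```

Without a cdbase, symbolUri(null, "sets-introduction", "emptyset") yields http://www.openmath.org/cd/sets-introduction#emptyset, the wrong URI mentioned above; with the GenCS CDBase it yields http://linkeddata.kwarc.info/gencs/dmath/en/sets-introduction#emptyset.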

BTW, are you talking about JavaScript performance or about JOMDoc performance? Deep inside JOMDoc's rendering algorithm, it should not be too expensive to make the CDBase information available, as we know what document we are rendering, and what theories from what other documents it imports.

follow-up: ↓ 4   Changed 3 years ago by vzholudev

  • owner changed from vzholudev to dmisev

I was talking about JOMDoc performance. ok, @Dimitar, can you please work on this ticket with the highest priority?

in reply to: ↑ 3   Changed 3 years ago by clange

Replying to vzholudev:

ok, @Dimitar, can you please work on this ticket with the highest priority?

Thanks for prioritizing!

One more comment: the assumption that each document contains exactly one theory of the same name is GenCS-specific and therefore should not be implemented too deeply inside JOMDoc. What I would actually imagine is that TNTBase somehow passes a "CDBase of this document" into JOMDoc, just as it passes the "URI of this document" into JOMDoc via the setBaseURI/getBaseURI mechanism (that's XOM syntax now).

  Changed 3 years ago by dmisev

  • status changed from new to assigned
  • version changed from v0.1.3 to v0.1.4
  • milestone set to Release v0.1.5

Slava, you need to pass this cool URI into JOMDoc in two places:

  • Document: as I understand it, a cdbase is different from a baseURI, so the baseURI should not be modified. Maybe you could add the cdbase as an attribute to the root element, and JOMDoc would remove it after it has finished processing? Or I could extend Document with a custom class that additionally has a cdbase field? I'm not sure which is better.
  • DetachedElement: I'll add a cdbase field here.

follow-up: ↓ 7   Changed 3 years ago by dmisev

From my side it's done. Slava, I added CDBaseDocument; you should use it by setting the cdbase in it when rendering a document, and by setting the cdbase in a DetachedElement when rendering a fragment.

Christoph, when a specific cdbase URI is missing, I take the cdbase from the baseURI. If this is incorrect, let me know and I'll fix it.

Note that including the cdbases in the OpenMath of the parallel markup will only work when ImportsAware is also used, because it uses the already cached ImportsAware information from the input document to compute the cdbases. It should be quite efficient because the needed things are already cached in the notation collection.

Btw, this was already implemented in JOMDoc (pragmatic-strict conversion) and just needed some fine-tuning. Christoph, you can easily test it with the command-line client:

jomdoc transform --pragmatic ...

in reply to: ↑ 6   Changed 3 years ago by clange

Replying to dmisev:

From my side it's done.

Thanks for fixing it so quickly! At the bottom, see an important question about jomdoc transform --pragmatic. Some other sections are FYI, giving you a bit more background about CDBases and URIs.

@Slava, you are taking care of this feature being used correctly in TNTBase, right?

Slava, I added CDBaseDocument; you should use it by setting the cdbase in it when rendering a document, and by setting the cdbase in a DetachedElement when rendering a fragment.

That looks like good design, because that way the concrete CDBase to be used is determined by the application.

FYI, the whole CDBase issue is mis-designed in OMDoc 1.2/1.3; therefore there is no single right way of giving a symbol a CDBase such that the URI of the symbol can be decomposed into CDBase, CD and name. The whole problem will be solved by OMDoc 1.6, which specifies that cdbase?cd?name is the complete URI of a symbol. The implementation of the MMT server makes sure that each of cdbase, cdbase?cd and cdbase?cd?name yields something retrievable: the document containing the theory graph, the theory, and finally the symbol.
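
A simplified sketch of that decomposition, assuming none of the three parts itself contains a "?" (real MMT URIs have further escaping rules that are not modeled here). MmtUri is a hypothetical name, and the example URI in the note below is illustrative only.

```java
import java.util.List;

/** Hypothetical, simplified parser for symbol URIs of the form cdbase?cd?name. */
final class MmtUri {
    final String cdbase;
    final String cd;
    final String name;

    private MmtUri(String cdbase, String cd, String name) {
        this.cdbase = cdbase;
        this.cd = cd;
        this.name = name;
    }

    /** Splits at the two '?' separators; assumes no part contains '?' itself. */
    static MmtUri parse(String uri) {
        int first = uri.indexOf('?');
        int second = (first < 0) ? -1 : uri.indexOf('?', first + 1);
        if (first < 0 || second < 0 || uri.indexOf('?', second + 1) >= 0) {
            throw new IllegalArgumentException("expected cdbase?cd?name: " + uri);
        }
        return new MmtUri(uri.substring(0, first), uri.substring(first + 1, second), uri.substring(second + 1));
    }

    /** cdbase, cdbase?cd, cdbase?cd?name: the document, the theory and the symbol. */
    List<String> retrievablePrefixes() {
        String theory = cdbase + "?" + cd;
        return List.of(cdbase, theory, theory + "?" + name);
    }
}
```

For example, MmtUri.parse("http://example.org/doc?thy?sym").retrievablePrefixes() returns [http://example.org/doc, http://example.org/doc?thy, http://example.org/doc?thy?sym].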

Christoph, when a specific cdbase URI is missing, I take the cdbase from the baseURI. If this is incorrect, let me know and I'll fix it.

You mean when the application doesn't set a CDBase via the above-mentioned CDBaseDocument mechanism, you use the base URI of the document, e.g. cdbase="http://domain.tld/documents/doc.omdoc"? That is reasonable.
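
The fallback can be sketched as follows; CdbaseFallback.cdbaseFor is a hypothetical helper, not the actual JOMDoc code. With no explicit CDBase, the base URI http://domain.tld/documents/doc.omdoc itself becomes the cdbase.

```java
/** Hypothetical sketch of the CDBase fallback; not the actual JOMDoc code. */
final class CdbaseFallback {
    private CdbaseFallback() {}

    /** The explicitly set CDBase if there is one, otherwise the document's base URI. */
    static String cdbaseFor(String explicitCdbase, String baseUri) {
        if (explicitCdbase != null && !explicitCdbase.isEmpty()) {
            return explicitCdbase;
        }
        if (baseUri == null || baseUri.isEmpty()) {
            throw new IllegalArgumentException("neither a CDBase nor a base URI is available");
        }
        return baseUri;
    }
}
```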

FYI, in that case, the cdbase would just serve as a means of disambiguation for same-named theories, but one would not be able to retrieve a symbol from the URI cdbase / cd # symbol (the OpenMath and pre-OMDoc-1.6 convention of constructing a symbol URI). But that's not our fault, because the application could have used CDBaseDocument to set a more reasonable CDBase. Insofar as we want to use the linked data principle (i.e. retrieve things from their URIs) in TNTBase, TNTBase will take care of setting the right CDBase.

For further background, see http://en.wikipedia.org/wiki/Dereferenceable_Uniform_Resource_Identifier and http://en.wikipedia.org/wiki/Linked_Data

Note that including the cdbases in the OpenMath of the parallel markup will only work when ImportsAware is also used, because it uses the already cached ImportsAware information from the input document to compute the cdbases. It should be quite efficient because the needed things are already cached in the notation collection.

I suppose that TNTBase uses ImportsAware?; is that right?

Btw, this was already implemented in JOMDoc (pragmatic-strict conversion) and just needed some fine-tuning. Christoph, you can easily test it with the command-line client:

jomdoc transform --pragmatic ...

I tested that and it works as expected. I retraced the old discussion on that in #84. Let me just emphasize once more that for CDBase output in the documents rendered by TNTBase, I don't want the other side effects of #84. That is:

  • I don't want TNTBase to call the Java equivalent of jomdoc transform --pragmatic to obtain a transformed OMDoc document, which would then only be rendered in the 2nd step. But I want the generation of CDBases to take place during the single-step rendering process.
  • During rendering, CDBases should be added to all OMSes in the parallel markup output. The input OMDoc document should not be modified. In particular, no imports elements should be removed from it.
  • I.e. the XHTML+MathML+OpenMath output of the renderer should be the same as it would have been without this ticket, except that additionally all OMSes should carry a @cdbase attribute.
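
As a sketch of these requirements, assuming a namespace-aware DOM and a caller-supplied lookup from cd to CDBase (for instance derived from the cached imports), the renderer could work on a deep copy of the content object so that the input document stays untouched. ParallelMarkupCdbase.withCdbases is a hypothetical name, not JOMDoc's actual renderer code.

```java
import java.util.function.Function;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical render-time step: add @cdbase to the OMSes of a copy, never to the input. */
final class ParallelMarkupCdbase {
    static final String OPENMATH_NS = "http://www.openmath.org/OpenMath";

    private ParallelMarkupCdbase() {}

    /**
     * Returns a deep copy of {@code contentObject} in which every OMS without a @cdbase gets
     * the value {@code cdbaseForCd} returns for its @cd; a null result leaves that OMS as is.
     */
    static Element withCdbases(Element contentObject, Function<String, String> cdbaseForCd) {
        Element copy = (Element) contentObject.cloneNode(true); // the input stays untouched
        if (OPENMATH_NS.equals(copy.getNamespaceURI()) && "OMS".equals(copy.getLocalName())) {
            addCdbase(copy, cdbaseForCd);
        }
        NodeList symbols = copy.getElementsByTagNameNS(OPENMATH_NS, "OMS");
        for (int i = 0; i < symbols.getLength(); i++) {
            addCdbase((Element) symbols.item(i), cdbaseForCd);
        }
        return copy;
    }

    private static void addCdbase(Element oms, Function<String, String> cdbaseForCd) {
        if (!oms.hasAttribute("cdbase")) {
            String cdbase = cdbaseForCd.apply(oms.getAttribute("cd"));
            if (cdbase != null) {
                oms.setAttribute("cdbase", cdbase);
            }
        }
    }
}
```

Working on a clone keeps the single-step rendering: nothing is written back to the input, and no imports are touched.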

Is all that the case?

  Changed 3 years ago by dmisev

Is all that the case?

Yes, that's exactly how I implemented it: the cdbase is added in the ParallelRenderer, separately from the imports-removal code, and only affects the content object that is rendered. It should work with the command-line client as well when you use --imports-aware.

  Changed 3 years ago by clange

  • cc dmisev added
  • owner changed from dmisev to vzholudev
  • status changed from assigned to new

I just checked the cdbase in a document rendered by TNTBase. For example, it is:

<om:OMS cd="sets-operations" name="sseteq" cdbase="tntbase:/slides/dmath/en/sets-operations.omdoc" id="a4b741447-2b3d-49e5-93e3-9d3e2c419535-fun" />

which is wrong, as it reflects the TNTBase-internal URL of the document. Instead it should be the external "cool" URI of the document, i.e.:

cdbase="http://linkeddata.tntbase.org/slides/dmath/en/sets-operations"
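
Assuming the mapping is as regular as this one example suggests (replace the tntbase: scheme with http://linkeddata.tntbase.org and drop the .omdoc extension), the rewrite could look like this. Both the rule and the CoolUris.toCoolUri name are hypothetical; TNTBase's real mapping may well differ.

```java
/** Hypothetical rewrite from TNTBase-internal URLs to cool URIs, inferred from a single example. */
final class CoolUris {
    static final String INTERNAL_PREFIX = "tntbase:";
    static final String COOL_PREFIX = "http://linkeddata.tntbase.org";
    static final String EXTENSION = ".omdoc";

    private CoolUris() {}

    static String toCoolUri(String internalUri) {
        if (!internalUri.startsWith(INTERNAL_PREFIX + "/")) {
            throw new IllegalArgumentException("not a tntbase:/ URI: " + internalUri);
        }
        String path = internalUri.substring(INTERNAL_PREFIX.length()); // keeps the leading "/"
        if (path.endsWith(EXTENSION)) {
            path = path.substring(0, path.length() - EXTENSION.length());
        }
        return COOL_PREFIX + path;
    }
}
```

With this rule, tntbase:/slides/dmath/en/sets-operations.omdoc becomes http://linkeddata.tntbase.org/slides/dmath/en/sets-operations, the cdbase requested above.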

@Slava, can you somehow pass that parameter into JOMDoc? (Similarly to the current passing into Krextor)

But note that this is less urgent now. We will survive the ESWC demo without that fix, as definition lookup bypasses the linked data mechanisms (which is good for us, as it allows us to do a better customization of how it works). But it will be annoying for external users and linked data crawlers, as they can currently not make any sense out of our OMSes.

follow-up: ↓ 11   Changed 3 years ago by dmisev

  • status changed from new to closed
  • resolution set to fixed

I think this can be closed

in reply to: ↑ 10 ; follow-up: ↓ 12   Changed 3 years ago by clange

Replying to dmisev:

I think this can be closed

From your e-mail I conclude that you do now include the document name in the cdbase, in order to account for cases when a document contains more than one theory, and the name of the theory is not the same as the name of the document. If I got this right, then the ticket can indeed be closed.

Other tickets also depend on this cool URI schema change, but they can be resolved independently from this one:

in reply to: ↑ 11   Changed 3 years ago by dmisev

Replying to clange:

Replying to dmisev:

I think this can be closed

From your e-mail I conclude that you do now include the document name in the cdbase, in order to account for cases when a document contains more than one theory, and the name of the theory is not the same as the name of the document. If I got this right, then the ticket can indeed be closed.

Yes, exactly
