Ticket #84 (closed task: fixed)

Opened 5 years ago

Last modified 4 years ago

Conversion between IMPORT and CDBASE

Reported by: cmueller Owned by: dmisev
Priority: major Milestone: Release v0.1.2
Component: System Integration Version: v0.1.2
Keywords: Cc: nmueller, kohlhase, clange, frabe
Blocked By: Blocking:
Due to close: Include in GanttChart: no
Dependencies: Due to assign:

Description

From our meeting on imports and cdbase, I have identified two important features that the panta rhei system needs in order to read in and output OMDoc documents:

For the IMPORT

JOMDoc can be called to produce documents with cdbase in all OMS elements. It takes the import URL and extracts the respective cdbase and cd values. panta rhei thus needs to call JOMDoc for each imported document, which will slow down the import tremendously. However, it saves us from re-implementing JOMDoc functionality in panta rhei.

For the OUTPUT (used for the conversion)

panta rhei produces the following OMDoc. It does not construct IMPORT elements:

<omdoc>
<theory>
<OMOBJ>
<OMS cdbase="http://openmath.org" cd="arith1" name="minus"/>
<OMV name="x"/>
<OMV name="y"/>
</OMOBJ>
</theory>
</omdoc>

However, the cdbase and cd are interpreted as implicit imports. When reading in the document from panta rhei, JOMDoc constructs the value of the IMPORT for attribute from the cdbase and cd attributes.
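As an illustration only, the "implicit import" reading of cdbase and cd can be sketched in Python. This is not JOMDoc code: the function names are hypothetical, the sample is un-namespaced like the snippet above, and joining cdbase and cd with "/" to address the theory is an assumed convention.

```python
import xml.etree.ElementTree as ET

# Same shape as the panta rhei output shown in the ticket (no XML namespaces).
SAMPLE = """<omdoc>
<theory>
<OMOBJ>
<OMS cdbase="http://openmath.org" cd="arith1" name="minus"/>
<OMV name="x"/>
<OMV name="y"/>
</OMOBJ>
</theory>
</omdoc>"""


def implicit_imports(xml_text):
    """Return the distinct (cdbase, cd) pairs used by OMS elements, in document order."""
    root = ET.fromstring(xml_text)
    pairs = []
    for oms in root.iter("OMS"):
        cdbase, cd = oms.get("cdbase"), oms.get("cd")
        if cdbase is None or cd is None:
            continue  # nothing to derive an import from
        if (cdbase, cd) not in pairs:
            pairs.append((cdbase, cd))
    return pairs


def import_target(cdbase, cd):
    """Assumed convention: the imported theory is addressed as <cdbase>/<cd>."""
    return cdbase.rstrip("/") + "/" + cd
```

Each pair returned by implicit_imports corresponds to one import that a reader of the document would have to construct; a real OMDoc file would additionally need namespace-aware tag matching.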

Change History

  Changed 5 years ago by cmueller

for further details on the side of panta rhei see also:  https://trac.kwarc.info/panta-rhei/ticket/408

  Changed 5 years ago by dmisev

  • status changed from new to assigned

Ok, after some thinking and baffling around, all this with the imports comes down to what we discussed with Christoph some time ago about the T notation source. So I'll open a new ticket about that idea, and then this translation OMS <-> imports will be very easy to do.

  Changed 5 years ago by cmueller

see also ticket:76

  Changed 5 years ago by cmueller

Can you please link to that ticket and give me an update on the implementation status?

I'd like to know, whether ...

(1) We can call the latest JOMDoc client and include cdbase attributes, which are then converted to import-elements by JOMDoc

(2) We can call JOMDoc to convert Michael's slides (without cdbase attributes) into slides with cdbase attributes

As soon as this is provided by JOMDoc, we'll test and integrate it in panta rhei! Thank you.

  Changed 5 years ago by dmisev

  • cc kohlhase, clange added

The general algorithms I came up with for these conversions. Please check if I missed something :-)

imports to cdbases

  • if cd is this theory or a theory in this document, then the cdbase is the path to this document
  • if cd is a theory not in this document, then
    1. search for cd in the imports of the outer theories which contain this theory. If found, construct the cdbase from the 'from' attribute
    2. if not found, then for each import (going backwards) resolve the theory it points to, and repeat 1. for the resolved theory

I'm just not sure here whether to do 2. in a BFS or DFS way. I think that I should do it as depth first in order to have proper shadowing.
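The two cases above can be sketched in Python over a hypothetical in-memory model. The DOCS dictionary, the "<document URL>#<theory name>" form of an import target, and all function names are assumptions made for illustration; nesting of theories is ignored, so "outer theories" collapses to the single current theory. Step 2 is done depth first, as proposed.

```python
# Hypothetical model: document URL -> theory name -> list of import targets.
# Each target is written "<document URL>#<theory name>" (an assumed convention).
DOCS = {
    "http://example.org/algebra.omdoc": {
        "group": ["http://example.org/base.omdoc#monoid"],
    },
    "http://example.org/base.omdoc": {
        "monoid": ["http://openmath.org/arith1.omdoc#arith1"],
    },
    "http://openmath.org/arith1.omdoc": {"arith1": []},
}


def split_target(target):
    doc, _, theory = target.partition("#")
    return doc, theory


def resolve_cdbase(cd, doc, theory, docs):
    """Return the cdbase for cd as seen from `theory` in `doc`, or None."""
    # Case 1: cd is this theory or another theory in this document.
    if cd in docs.get(doc, {}):
        return doc
    # Case 2: search the imports.
    return _search_imports(cd, doc, theory, docs, set())


def _search_imports(cd, doc, theory, docs, visited):
    if (doc, theory) in visited:  # guard against cyclic imports
        return None
    visited.add((doc, theory))
    imports = docs.get(doc, {}).get(theory, [])
    # Step 1: look for cd among the imports themselves, going backwards.
    for target in reversed(imports):
        target_doc, target_theory = split_target(target)
        if target_theory == cd:
            return target_doc
    # Step 2: resolve each imported theory and repeat step 1 there (depth first).
    for target in reversed(imports):
        target_doc, target_theory = split_target(target)
        found = _search_imports(cd, target_doc, target_theory, docs, visited)
        if found is not None:
            return found
    return None
```

With this model, resolve_cdbase("arith1", "http://example.org/algebra.omdoc", "group", DOCS) follows group to monoid and finds arith1 among the imports of monoid.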

cdbases to imports

This is fairly simple to do.

  Changed 5 years ago by kohlhase

This algorithm is correct, except that, to get around the BFS/DFS question, we decided last time in my office that it is forbidden to have an import clash, i.e. a mapping from a cd name to two different cdbases somewhere up the tree.

I am not sure whether you want to implement this the way you are saying. I would probably construct the current environment (as a partial function from cd names to cdbases) as we recursively descend through the document. This would seem more efficient and would make it simpler to find the errors talked about above.
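A rough Python sketch of this environment-based variant follows. It is not the JOMDoc implementation: the element name imports, the "<cdbase>#<cd>" form of its from attribute, and the names ImportClash, extend_env and add_cdbases are all assumptions. The environment is a dictionary from cd names to cdbases that is extended while descending, and a clash raises an error.

```python
import xml.etree.ElementTree as ET


class ImportClash(Exception):
    """Raised when one cd name is mapped to two different cdbases."""


def extend_env(env, cd, cdbase):
    """Return a copy of env with cd -> cdbase added, rejecting clashes."""
    known = env.get(cd)
    if known is not None and known != cdbase:
        raise ImportClash(f"{cd!r} maps to both {known!r} and {cdbase!r}")
    new_env = dict(env)
    new_env[cd] = cdbase
    return new_env


def add_cdbases(element, env=None):
    """Recursively fill in missing cdbase attributes on OMS elements (in place)."""
    env = {} if env is None else env
    # First pass: the imports directly under this element extend the environment.
    for child in element:
        if child.tag == "imports":
            cdbase, _, cd = child.get("from", "").partition("#")
            env = extend_env(env, cd, cdbase)
    # Second pass: annotate symbols and descend with the extended environment.
    for child in element:
        if child.tag == "OMS" and child.get("cdbase") is None:
            cd = child.get("cd")
            if cd in env:
                child.set("cdbase", env[cd])
        add_cdbases(child, env)
```

Because extend_env copies the dictionary, imports made inside one theory do not leak into its siblings, while nested theories inherit everything from their ancestors. That inheritance is where a clash "somewhere up the tree" is detected.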

  Changed 5 years ago by dmisev

Christine, can you please attach some examples from panta rhei on which I can test?

  Changed 5 years ago by cmueller

I am happy to do so, but currently have a quite tight schedule. I have tagged this ticket and will respond asap.

follow-up: ↓ 10   Changed 5 years ago by dmisev

Sure no problem, I can make a couple of simple tests in the meantime ;-)

in reply to: ↑ 9   Changed 5 years ago by nmueller

Replying to dmisev:

Sure no problem, I can make a couple of simple tests in the meantime ;-)

I am very happy to hear that! So you are making progress with the cdbase/import-traversal issue. Very good job, man!

follow-ups: ↓ 12 ↓ 13   Changed 5 years ago by cmueller

I am happy to give you test cases, but I first need to know about the interface for the functionality:

For simplicity, I distinguish (please note that this is not necessarily the official use of strict and pragmatic):

  • strict OMDoc, i.e. OMDoc with import-elements
  • pragmatic OMDoc, i.e. OMDoc without import-elements

Alternative 1

The rendering command only accepts strict OMDoc, i.e. OMDoc with import-references. Thus, panta rhei has to call JOMDoc twice during the import of OMDoc content and for the adaptation of panta-material:

  • During the import panta rhei first calls JOMDoc to convert strict OMDoc into pragmatic OMDoc (i.e. remove the import elements and construct cdbase/cd/name attributes).
  • For the adaptation, panta rhei calls JOMDoc to produce strict OMDoc, i.e. to convert cdbase/cd/name attributes into import-reference.

After the pre-processing, it executes the rendering.

Alternative 2

panta rhei can use JOMDoc without any change to the system, i.e. if JOMDoc receives (pragmatic) OMDoc input it first converts it to strict OMDoc, i.e. JOMDoc creates the import-references from cdbase/cd/name attributes. If JOMDoc receives strict OMDoc, no additional conversions are needed as the import-references are already provided.

However, even if this alternative is available, it would be preferable to also have an interface for the conversion from strict to pragmatic and vice versa. So basically, alternative 2 builds on the pre-processing functionality of alternative 1, except that the pre-processing would be hidden from the user.
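To make alternative 2 concrete, here is a small Python sketch of a render entry point that hides the pre-processing. It uses the working definition from this comment (strict means import elements are present); the element name imports and the callables to_strict and render are placeholders, not JOMDoc interfaces.

```python
import xml.etree.ElementTree as ET


def is_strict(xml_text):
    """Working definition from this ticket: strict OMDoc contains import elements."""
    root = ET.fromstring(xml_text)
    return next(root.iter("imports"), None) is not None


def render_any(xml_text, to_strict, render):
    """Render strict input directly; convert pragmatic input to strict first.

    to_strict and render are caller-supplied functions standing in for the
    conversion and rendering steps (hypothetical names).
    """
    if not is_strict(xml_text):
        xml_text = to_strict(xml_text)
    return render(xml_text)
```

Keeping to_strict as a separate callable matches the wish above: the conversion stays available on its own, and the shortcut merely composes it with rendering.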

Please ask me, if the above is unclear or confusing and I'll try my best to express myself better.

in reply to: ↑ 11 ; follow-up: ↓ 15   Changed 5 years ago by clange

Replying to cmueller:

Alternative 1: The rendering command only accepts strict OMDoc, i.e. OMDoc with import-references.

Note that there are formats that do not have import references, but that we would still like JOMDoc to render -- think of OpenMath CDs. Therefore, I assume that, here, we are talking about a case where JOMDoc knows that the input is OMDoc, and not just any document containing OMOBJs.

@Dimitar, you mentioned the "T source" earlier. What notation source are we assuming here, is it already a T source?

* During the import panta rhei first calls JOMDoc to convert strict OMDoc into pragmatic OMDoc (i.e. remove the import elements and construct cdbase/cd/name attributes).

I don't understand why panta rhei internally needs pragmatic OMDoc.

* For the adaptation, panta rhei calls JOMDoc to produce strict OMDoc, i.e. to convert cdbase/cd/name attributes into import-reference. After the pre-processing, it executes the rendering.

If you say that JOMDoc needs strict OMDoc (in the setting that we assume here), there must be some other use that panta rhei has for pragmatic OMDoc. Is it needed for some adaptations that are done without the help of JOMDoc?

Note that such mandatory conversions add even more overhead in a SWiM setting where users edit documents. Let me talk very generally, as I do not fully understand why you need to convert. But let me assume that the user authors a document in format A, and the JOMDoc renderer needs format B. Then, SWiM would have to convert the document from A to B after every edit. As a document is rendered at least as often as it is edited, it makes sense to cache the B version in addition to the A version, which the author sees.
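The caching idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ConversionCache is a hypothetical name, the converter is any function from A text to B text, and the cache key is a SHA-256 hash of the A version, so an edit automatically invalidates the stale entry.

```python
import hashlib


class ConversionCache:
    """Keep the converted (B) version next to the authored (A) version."""

    def __init__(self, convert):
        self._convert = convert  # function: A text -> B text
        self._store = {}         # hash of A text -> B text
        self.misses = 0          # number of real conversions performed

    def get(self, source_text):
        key = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._convert(source_text)
        return self._store[key]
```

Rendering the same unedited document repeatedly then costs one conversion; only an edit that changes the A text triggers another.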

in reply to: ↑ 11 ; follow-up: ↓ 14   Changed 5 years ago by dmisev

Replying to cmueller:

Please ask me, if the above is unclear or confusing and I'll try my best to express myself better.

It's all clear, I just didn't know about strict/pragmatic, so I was baffling with some weird names like c2i, i2c.. :-)

The way I do it is most probably alternative 1, I added two options to the transform command for converting to strict or pragmatic, and then the external system will use these as needed before rendering. But it's not a problem to add a shortcut option to the render command for automatically doing this conversion.

in reply to: ↑ 13 ; follow-up: ↓ 19   Changed 5 years ago by cmueller

  • cc frabe added

Replying to dmisev:

Replying to cmueller:

Please ask me, if the above is unclear or confusing and I'll try my best to express myself better.

It's all clear, I just didn't know about strict/pragmatic, so I was baffling with some weird names like c2i, i2c.. :-)

Please note that I only used strict and pragmatic for illustration. However, Florian and Michael might use these terms a bit differently. So before using this distinction inside JOMDoc, we should talk to them about whether this already fits in this context. @Florian: can you please comment?

The way I do it is most probably alternative 1, I added two options to the transform command for converting to strict or pragmatic, and then the external system will use these as needed before rendering. But it's not a problem to add a shortcut option to the render command for automatically doing this conversion.

I prefer alternative one as well. Before integrating any shortcut, please talk to Normen about whether he approves of this kind of extension (is this good design?). However, for panta rhei this would be an extreme increase in efficiency, as the system only has to call JOMDoc once. So as a developer of the system, I have to vote for the shortcuts.

in reply to: ↑ 12 ; follow-up: ↓ 16   Changed 5 years ago by cmueller

Replying to clange:

Replying to cmueller:

Alternative 1: The rendering command only accepts strict OMDoc, i.e. OMDoc with import-references.

Note that there are formats that do not have import references, but that we would still like JOMDoc to render -- think of OpenMath CDs. Therefore, I assume that, here, we are talking about a case where JOMDoc knows that the input is OMDoc, and not just any document containing OMOBJs.

I was aware of this, thus I wrote "OMDoc" and assumed that JOMDoc will distinguish between OMDoc input and other XML, e.g. OpenMath CDs. @Dimitar: Do you do that, i.e. handle OMDoc differently from arbitrary XML? If so, how is this done? Based on the file extension, or do you make use of the MIME type of the input?

I don't understand why panta rhei internally needs pragmatic OMDoc.

I will try to explain, let me know whether this is clearer to you afterwards.

Conversion

panta rhei manages notation definitions for a notation survey. In particular, the system maintains a set of OpenMath examples which match the notation definitions and allow it to generate example notations for the survey. For the conversion, I currently simply take a set of OpenMath expressions I want to convert, create an OMDoc file (see below) and pass it to JOMDoc. I do not create any imports, as this would be an overhead for the system.

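// Wrap the OpenMath snippets in a bare OMDoc container (no imports are generated).
// The XML declaration has to be the very first thing in the document, so the string starts with it.
$inDoc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<omdoc xml:id=\"panta-omdoc-$date\" modules=\"CD\" version=\"1.8\" xmlns=\"$omdocNS\"
       xmlns:om=\"http://www.openmath.org/OpenMath\"
       xmlns:dc=\"http://purl.org/dc/elements/1.1/\"
       xmlns:cc=\"http://creativecommons.org/ns\"
       xmlns:m=\"http://www.w3.org/1998/Math/MathML\">";
// Append every snippet that is set, counting how many were added.
for ($i = 0; $i < count($snippet_array); $i++) {
  if (isset($snippet_array[$i])) {
    $inDoc .= $snippet_array[$i];
    $length++;
  }
}
$inDoc .= "</omdoc>";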

So basically, I am using OMDoc as a container for the conversion-objects. But maybe I should not use OMDoc for this, but rather arbitrary XML (e.g. a panta-rhei XML). Then this would not be recognized as OMDoc by JOMDoc, and treated differently - then I would not need to call JOMDoc to convert the above to a more strict version ...

Extraction

I am not sure about this, but I think we said that the OMS in OMDoc will no longer have cdbase/cd/name attributes but be limited to cd/name, as the cdbase is given by the import, correct? For panta rhei, this is a problem, as the system extracts all symbols from the imported OMDoc for its notation survey. Here I want to store the cdbase/cd/name attributes, so I wanted to use JOMDoc to convert imports back into these triples to ease the extraction in panta rhei.

* For the adaptation, panta rhei calls JOMDoc to produce strict OMDoc, i.e. to convert cdbase/cd/name attributes into import-reference. After the pre-processing, it executes the rendering.

If you say that JOMDoc needs strict OMDoc (in the setting that we assume here), there must be some other use that panta rhei has for pragmatic OMDoc. Is it needed for some adaptations that are done without the help of JOMDoc? Note that such mandatory conversions add even more overhead in a SWiM setting where users edit documents. Let me talk very generally, as I do not fully understand why you need to convert. But let me assume that the user authors a document in format A, and the JOMDoc renderer needs format B. Then, SWiM would have to convert the document from A to B after every edit. As a document is rendered at least as often as it is edited, it makes sense to cache the B version in addition to the A version, which the author sees.

I don't fully understand, but I currently see no problems for SWiM. You have "pure" OMDoc and deal with theories and imports. So there should be no change in using JOMDoc, right?

in reply to: ↑ 15   Changed 5 years ago by dmisev

Replying to cmueller:

@Dimitar: Do you do that, i.e. handle OMDoc differently from arbitrary XML? If so, how is this done? Based on the file extension, or do you make use of the MIME type of the input?

Why would I need to do that? These conversions are separate from the rendering.

Replying to clange:

@Dimitar, you mentioned the "T source" earlier. What notation source are we assuming here, is it already a T source?

No not yet, this is needed by panta rhei as Christine explained. I'll implement that "imports aware rendering" in a day or two, and it should be a quick job since I already got most of the needed stuff by implementing this.

I'd really appreciate it if anyone could give me some real test cases (from SWiM, panta rhei, GenCS slides?, anything at all) on which you'd use this imports/cdbase conversion and then imports aware rendering, so that I spend less time on writing XML and guessing and more on coding.

  Changed 5 years ago by dmisev

The respective commands for this conversion are:

jomdoc transform --strict - convert to strict OMDoc
jomdoc transform --pragmatic - convert to pragmatic OMDoc

  Changed 5 years ago by nmueller

  • version changed from unknown to v0.1.2

in reply to: ↑ 14   Changed 4 years ago by frabe

Replying to cmueller:

Please note that I only used strict and pragmatic for illustration. However, Florian and Michael might use these terms a bit differently. So before using this distinction inside JOMDoc, we should talk to them about whether this already fits in this context. @Florian: can you please comment?

Omitted cdbase attributes are a typical example of a strict/pragmatic difference.

The difficulty for JOMDoc is that some serious thinking is needed about how strict and pragmatic OMDoc should be handled in general.

  Changed 4 years ago by dmisev

  • status changed from assigned to closed
  • resolution set to fixed