Ticket #48 (new task)

Opened 4 years ago

Last modified 3 years ago

Implement a translation from abstract notation syntax to XML syntax

Reported by: cmueller Owned by: clange
Priority: major Milestone: zzz Future zzz
Component: [SI] NTN Version: v0.1.2
Keywords: Cc: cmueller, kohlhase, clange, frabe, dmisev
Blocked By: Blocking:
Due to close: 2009/01/31 Include in GanttChart: no
Dependencies: Due to assign:

Description

This ticket was moved from the mmlkit trac:

Implement a parser that transforms text input given in the abstract syntax for notation definitions (see  http://www.kwarc.info/cmueller/papers/mkm2008.pdf) to the XML syntax specified in the MathML specification (see  http://www.w3.org/TR/MathML3, section 8.6).

Note: Christoph Lange has already implemented a similar parser for text --> OpenMath? in Java.

Attachments

table(5).pdf Download (9.7 KB) - added by mgrintal 4 years ago.
Notation Declaration in ASCII

Change History

  Changed 4 years ago by nmueller

  • cc cmueller, kohlhase, clange, frabe added
  • component changed from Presentation to [SI] NTN
  • priority changed from trivial to major
  • milestone changed from zzz Future zzz to xxx Branch - NtnASCII xxx
  • due_assign YYYY/MM/DD deleted
  • due_close changed from YYYY/MM/DD to 2009/01/31

  Changed 4 years ago by nmueller

  • type changed from potential innovation to task

  Changed 4 years ago by nmueller

[Copied form an e-mail discussion on this task]

Regarding on how to set up a parser for java you should ask Christoph Lange (cc) / Dimitar.

@Dimitar, what experience with JavaCC do you have? My existing implementation of an OpenMath?-ASCII -> OpenMath?-XML parser is here:

 https://svn.salzburgresearch.at/svn/kiwi/IkeWiki/branches/SWiM/trunk/WEB-INF/src/info/kwarc/swim/render/MObjParser.jj

I think we should be able to reuse most of it for the notations. Of course I don't insist on DOM output; I just needed it for technical reasons in SWiM ;-)

@Maja: JavaCC has some easy examples to get started with. The file linked above is not the first thing you should look at ;-)

FYI (@Maja, don't be scared, this is not part of your first task!): The reverse direction, which is not yet part of THIS task for the notations, is done by a straightforward XSLT implementation in  https://svn.salzburgresearch.at/svn/kiwi/IkeWiki/branches/SWiM/trunk/WEB-INF/src/at/srfg/ikewiki/render/transform-omdoc-editor.xsl

In the longer run, we might have the possibility to get a dedicated notation editor from Alberto, who could implement this as a variant of his formula editor. But at the moment he has different priorities. Our experience from THIS task will however be valuable input for that. See  https://trac.mathweb.org/sentido/ticket/27.

  Changed 4 years ago by nmueller

  • owner changed from nmueller to mgrintal

  Changed 4 years ago by mgrintal

  • cc dmisev added
  • status changed from new to assigned
  • version set to v0.1.2

Changed 4 years ago by mgrintal

Notation Declaration in ASCII

follow-up: ↓ 7   Changed 4 years ago by mgrintal

I attached the Notation declaration in ASCII corresponding to the notation declaration given in  http://www.kwarc.info/cmueller/papers/mkm2008.pdf. Let me know if you have any suggestions.

in reply to: ↑ 6   Changed 4 years ago by clange

Replying to mgrintal:

I attached the Notation declaration in ASCII corresponding to the notation declaration given in  http://www.kwarc.info/cmueller/papers/mkm2008.pdf. Let me know if you have any suggestions.

Cool! Here are my suggestions:

  • symbols: why not s(n,n,n)? Should be possible; see below for the "s" in "symbol jokers". Or maybe the more OpenMath?-like syntax n/n#n (where the first n is the cdbase, the second n the cd, and the third one the name). (@All: Do we want relative symbol names, i.e. just cd#name or name, when the rest is known from the context?)
  • variables: here, we could also be creative.  In my formula syntax, I say, for example, that _var is a variable of name "var", and that _ is not allowed for anything else. Just an example.
  • applications: I think we can keep the @, as it is perfect ASCII. That would allow us to use a, or maybe A, for attributions, which is more intuitive.
  • jokers: I think that, in the table in the paper, the underscore is the actual syntax, as you see in the example on the following page. If it's the underscore that's important, we have to rethink the ASCII.

About contexts and renderings, I have no idea at the moment. For the last section of the table, I'd do it as in the basic syntax of popular programming languages, but that's the easiest section of the table anyway, and won't be subject to debate.

Two general remarks:

  • I tend to pledge for creativity in using more of the ASCII "vocabulary", as programming languages do, e.g. using characters like { } [ ] _ : ;, and not sticking too closely to the syntax of the paper, but the others may not agree with me.
  • We should try to rewrite the long example from page 5 in the ASCII syntax and see if that works. (@Others: Want me to create an additional ticket for that?)

BTW, let me give you an example about the syntax that I developed for formulas earlier, just to give you an idea of that:

B(A([color=blue], binders#lambda), [ _y ], @(P, @(arith1#times, A([type=integer], 2), _y)))

Which means something like \lambda y . P(2y) in LaTeX, where 2 and \lambda are attributed with something.

Note that I would like the ASCII syntaxes for formulas and the one for notations to be somewhat similar, but that doesn't mean that I'm not willing to revise mine, once we agree on something that's better than its current state.

  Changed 4 years ago by clange

Hi all! I'd like some more discussion about this. Florian and Michael, particularly from you, because you designed the syntax in the paper. From implementing an ASCII syntax for OpenMath? I do have some experience, and a good understanding of how this notation syntax could be done, but on the other hand, …

  • the notation syntax is a bit more complex than OpenMath?, and there are aspects where I'm not an expert.
  • my personal taste, despite being backed by some experience, might not be shared by all JOMDoc stakeholders.

So please make your comments! Once we have agreed on the syntax, Maja can actually start to implement the final version.

follow-ups: ↓ 10 ↓ 12   Changed 4 years ago by frabe

Hi,

The abstract syntax in the paper was designed by me. I should clarify one thing: I wrote the syntax very quickly so that there be something we can build on. I have no idea what syntax is actually implemented. As far as I know there is no paper-like description of the implemented syntax.

That being said, maybe these ideas help:

Patterns:

  • symbol: cdbase?cd?name with components optionally empty, and trailing ?s optional. This is the syntax of MMT. See the paper I announce in a separate email.
  • variable: name
  • attribution: term{key1 -> value1, key2 -> value2, ...}
  • application: (function arg1 arg2 ...)
  • binding: (binder [var1, var2, ...] scope)
  • jokers: %name for symbols, #name for variables, $name for objects, $(name: pattern) for lists

On a first glance, that looks parsable if some characters are not used in symbol and variable names.

Renderings:

  • XML and text literally
  • for attribute values and element content, use {} to escape to our syntax as in Scala
  • references to jokers by the joker names
  • iteration over a list: some kind of for loop

in reply to: ↑ 9   Changed 4 years ago by clange

Replying to frabe:

I wrote the syntax very quickly so that there be something we can build on. I have no idea what syntax is actually implemented. As far as I know there is no paper-like description of the implemented syntax.

What do you mean by "implemented syntax"? So far, only the XML syntax has been implemented, but looking at that is not helpful for designing an ASCII syntax.

* symbol: cdbase?cd?name with components optionally empty, and trailing ?s optional. This is the syntax of MMT. See the paper I announce in a separate email.

That will be incompatible to OpenMath?, but more scalable, so it's fine with me.

* attribution: term{key1 -> value1, key2 -> value2, ...} * application: (function arg1 arg2 ...) * binding: (binder [var1, var2, ...] scope)

Here I'd rather like to continue using my syntax, as it's closer to the abstract syntax for OpenMath?.

* jokers: %name for symbols, #name for variables, $name for objects, $(name: pattern) for lists

Sounds good to me.

  Changed 4 years ago by clange

@Maja: Even though some issues in the discussion are still open, we can nevertheless get started. Just contact me by e-mail for an appointment.

in reply to: ↑ 9   Changed 4 years ago by mgrintal

Replying to frabe:

Patterns: * symbol: cdbase?cd?name with components optionally empty, and trailing ?s optional. This is the syntax of MMT. See the paper I announce in a separate email.

Could you please send me a link from the paper I can not find the email.

* attribution: term{key1 -> value1, key2 -> value2, ...} * application: (function arg1 arg2 ...)

I would agree with Christoph to use a for attribution and @ for application.

* jokers: %name for symbols, #name for variables, $name for objects, $(name: pattern) for lists

I agree on this.

  Changed 4 years ago by clange

BTW, I collected some information on the JavaCC parser generator. Just click on that link.

  Changed 4 years ago by clange

Here is Florian's paper. (@Florian, you should also have sent it to Maja!)

 https://svn.kwarc.info/repos/kwarc/rabe/Papers/omdoc-spec/KEAPPA_08/paper.pdf

  Changed 4 years ago by clange

follow-up: ↓ 17   Changed 3 years ago by clange

Some updates, especially for Maja:

  • First of all: All developers who are interested, please put the ASCIINotationParser page on your watch list!
  • I discussed the details of the syntax with Florian. He gave a good justification for his proposed syntax for attributions, binders and applications, which convinced me. The idea is not to be compatible to OpenMath? at any cost, but to be intuitive for programmers and people who know paper mathematics. Consider e.g. (??forall [x] (??p x)).
  • Note that the parser can only distinguish a binder from an application once the [ is parsed. But that's a thing JavaCC can are about; it doesn't lead us into ambiguity.
  • @Florian, what was again you argument in favor of your attribution syntax?
  • Anyway, I remember that he suggested a shorthand syntax for a single attribute indicating the type of some term, if the key is a default one. Something like term:type, which would expand to term{some?type?key : type}. Note that we're not definitely decided about the ASCII symbol mapping a key to a value. -> looks intuitive to paper mathematicians, : is more common to programmers ({key:value, key2:value2} is also known from JavaScript, e.g.)
  • Indeed, a symbol looks like cdbase?cd?name, ?cd?name, or ??name.

in reply to: ↑ 16 ; follow-up: ↓ 18   Changed 3 years ago by frabe

* @Florian, what was again you argument in favor of your attribution syntax?

Attributions are terrible in OpenMath?. So I tried to come up with something nice-looking. But that wasn't such a good idea yet. So we should think about it.

The overall motivation is that

OMATTR(OMATP(?mycd?oftype,type),term)

should look as similar to

term : type

as possible.

What about this idea:

OMATTR(OMATP(key1,value1),...,OMATP(keyn,valuen),term)

is written as

{term key1 value1 ... keyn valuen}

Then a user could introduce : as a shorthand for ?mycd?oftype and the ASCII would look good.

in reply to: ↑ 17 ; follow-up: ↓ 19   Changed 3 years ago by clange

Replying to frabe:

* @Florian, what was again you argument in favor of your attribution syntax?

Attributions are terrible in OpenMath?. So I tried to come up with something nice-looking. But that wasn't such a good idea yet. So we should think about it.

I think it was already OK.

The overall motivation is that OMATTR(OMATP(?mycd?oftype,type),term) should look as similar to term : type as possible.

Let me disagree and say that I think that that is your motivation and you're biased towards types ;-) Maybe in your use case the type attribution is the most, or even the only important one, but there are other cases of attributions.

@All: What attributes have you seen so far, which ones do you consider important? -- Let me start:

  • (notation) context, if we want to encode it that way
  • dimension of a group (CAS use case from the OpenMath? spec): not important for us, but maybe important for impact in the OpenMath? community
  • "spoken or visual rendering" (OpenMath? spec) -- invalid here because
    • "spoken" is not possible with an ASCII syntax
    • concerning "visual", this seems to be an obsolete alternative to MathML parallel markup
  • color (OpenMath? spec): not relevant for us (or is it?), as we do presentation differently

What about this idea: OMATTR(OMATP(key1,value1),...,OMATP(keyn,valuen),term) is written as {term key1 value1 ... keyn valuen}

I don't agree. Imagine a long list: Without additional formatting (e.g. newlines), you would easily lose track of which word is a key and which is a value. And this is not related to any syntax of a formal language (e.g. a programming language) that the author may already be familiar with.

Then a user could introduce : as a shorthand for ?mycd?oftype and the ASCII would look good.

As type attributions are important, no doubt, we can hard-code it into the parser anyway. (A generic shorthand mechanism, as you suggest, is overkill IMHO, and applications would need to offer additional support for it.)

in reply to: ↑ 18 ; follow-up: ↓ 20   Changed 3 years ago by clange

Any further comments on this? Florian, what do you think about my comments?

in reply to: ↑ 19   Changed 3 years ago by frabe

Replying to clange:

Any further comments on this? Florian, what do you think about my comments?

So far I've only encountered two attributions that I care about: the type and an equality in a let-binder (let x = a in b). Both could be handled nicely with my above idea.

I'm not sure how useful/important others are.

  Changed 3 years ago by clange

Florian and I just realized that the table with the abstract syntax in the MKM 2008 paper and the prototype/rendering implementation of JOMDoc is not exactly equivalent. In particular, his symbol/variable/expression/literal jokers do not have a counterpart in the prototype/rendering syntax. This is not nice but we have to accept it.

@Normen, Michael: What's your opinion on that?

  Changed 3 years ago by clange

General remark: Once I'm going to integrate ASCII -> OpenMath?/notation parsing into the new KiWi?-based SWiM again, I will take care of this.

  Changed 3 years ago by clange

  • owner changed from mgrintal to clange
  • status changed from assigned to new
Note: See TracTickets for help on using tickets.