Changeset 1466

Timestamp:
01/23/10 17:17:43 (3 years ago)
Author:
clange
Message:

more "spin" for intro

Files:
1 modified

  • doc/pubs/eswc-demo10/gencs-lod.tex

    r1465 r1466  
    33% Draft? 
    44\newif\ifdraft 
    5 \drafttrue 
    6 % \draftfalse 
     5% \drafttrue 
     6\draftfalse 
    77 
    88% \usepackage[english]{babel} 
     
    2323\fi 
    2424 
    25 \usepackage{biblatex} 
     25\usepackage[firstinits=true,mincrossrefs=3,minnames=3,maxnames=3]{biblatex} 
     26\makeatletter 
     27\AtBeginBibliography{% 
     28\setcounter{maxnames}{100} 
     29} 
     30\makeatother 
    2631\bibliography{kwarc} 
    2732 
     
    127132 
    128133\begin{abstract} 
    129   Lecture notes, particularly such with mathematical formulæ, are often written   non-semantically in {\LaTeX} and thus only really useful for reading and printing.  We   present a setup that converts them to semantic markup and exposes them as Linked Open   Data using {\xhtml}+{\mathml}+\linebreak[1]{\rdfa}.  Our demo application makes the   resulting documents interactively browsable.  All components of our setup are   reusable; therefore, we also discuss how to reuse them in other applications. 
     134  Lecture notes, particularly such with mathematical formulæ, are often written   non-semantically in {\LaTeX} and thus only really useful for reading and printing.   Important questions of learners (“what does the $\vDash$ symbol mean?”, “what examples   are related to structural induction?”) and lecturers (“what new concepts can I   introduce in next semester's lecture, given the students' previous knowledge?”) cannot   be answered.  We convert a corpus of {\LaTeX} lecture notes to semantic markup and   expose them as Linked Open Data in   {\xhtml}+\linebreak[1]{\mathml}+\linebreak[1]{\rdfa}.  Our demo application makes the   resulting documents interactively browsable, and our ontology enables query answering   and paves the path towards an integration of our corpus with external data sources. 
    130135\end{abstract} 
    131136 
    132 \ifdraft 
    133 Message\ednote{CL: just FYI, delete for final version}: 
    134 \begin{itemize} 
    135 \item Semantic Web technologies (RDF, RDFa, Linked Open Data) are the best, and they   help us to get our problems solved 
    136 \item We must not advertise sTeX and OMDoc as better alternatives to anything the   semantic web community loves, but present them as means to an end, as intermediate   formats that help us to obtain XHTML+RDFa from {\LaTeX} 
    137 \item Copy/paste from any of our existing MKM literature does not work, they won't   understand it. 
    138 \item parallel presentation/content math markup is similar to RDFa, is Linked Open Data,   just differently encoded -- we are among the first to really operationalize parallel   markup 
    139 \end{itemize} 
    140 \fi 
    141 \section{Application} 
     137\section{Application: Computer Science Lecture Notes} 
    142138\label{sec:application} 
    143139 
    144 One of the authors (Michael Kohlhase) has been giving computer science lectures – a general first-year introduction and specialized lectures on logics – over a time of seven\ednote{CL@MK: correct figures} years and accumulated more than 3,000 slides written in {\LaTeX}.\ednote{CL@MK: I counted \textbackslash begin\{frame\}.  FYI, this   includes talks and presentations, but we don't have space for a detailed elaboration.} {\LaTeX} has proven suitable for writing high-quality lecture notes – both slides and handouts – and publishing them as PDF, especially in our setting with a lot of mathematical formulæ.  However, the PDF output is barely usable for anything \emph{beyond} on-screen reading and printing, as it does not have any semantic structure.  Save for a few exceptions, such as \verb|\title| for the title of a document or \verb|\frac{a}{b}| for a fraction, {\LaTeX} does not allow the author to semantically annotate a document, and where it does, the semantics is not preserved in the output.\footnote{We do not consider the mere sectional structure of a document   (\texttt{\textbackslash section} etc.), which \emph{is} preserved in PDF, semantic.   The usual reader does not want to know where section 1 is, but where the basics of set   theory are introduced.}  This is particularly unfortunate for mathematical symbols, as most of them are not easily retrievable via full-text search, and as they are often overloaded with multiple definitions – consider $+$ for adding integers vs.\ vectors – or presented using multiple notations – consider $\binom{n}{k}$ vs.\ $\mathcal{C}^k_n$ for the binomial coefficient –, which can be an issue in a large corpus written by multiple authors having different backgrounds. 
Therefore, we have developed a semantic representation of mathematical knowledge in {\LaTeX} and a presentation process that preserves these semantic structures as Linked Open Data in the output, making them amenable to mashups that offer interactive exploration and thus promote a deeper understanding of mathematical structures, such as “what does the $\vDash$ symbol mean and where is it defined?”. 
     140One of the authors (Michael Kohlhase) has been giving computer science lectures – a general first-year introduction and graduate logics courses – for the past seven years and accumulated more than 3,000 slides.\ednote{CL@MK: I counted \textbackslash   begin\{frame\}.  FYI, this includes talks and presentations, but we don't have space   for a detailed elaboration.} {\LaTeX} has proven suitable for writing high-quality lecture notes and publishing them as PDF, especially in our setting with a lot of mathematical formulæ.  However, the PDF output is barely usable for anything \emph{beyond} on-screen reading and printing. Important questions that occur to students while learning, and to lecturers while preparing their lecture notes, cannot be answered based on the {\LaTeX} or PDF data. Students often wonder what a symbol (e.\,g.\ $\vDash$) in a formula means, or where to find examples about a difficult concept, such as structural induction.  A lecturer preparing a lecture for the upcoming semester would often like to know what existing content from the repository is reusable, given the prerequisites that students are expected to meet.  Pure {\LaTeX} does not support semantic annotation to the extent required for answering such queries. \verb|\title| is a rare example of a command making explicit what its argument means. \verb|\frac{a}{b}| is a rare example of a formula whose representation focuses on functional structure instead of visual layout.  Symbols in formulæ are even less trivial, as they are often overloaded with multiple definitions or presentable using different notations. $\binom{n}{k}$ can be a two-dimensional vector or a binomial coefficient, and in the latter case the French or the Russians would write it as $\mathcal{C}^k_n$.  
We have developed a semantic representation of mathematical knowledge in {\LaTeX} and a presentation process that preserves these semantic structures as Linked Open Data in the output, making them amenable to mashups that offer interactive exploration, as well as semantic searching and querying. 
    145141   
    146142\section{Research Background and Related Work} 
    147143\label{sec:research} 
    148144 
    149 The importance of {\LaTeX} in scientific authoring and its extensibility by macros has motivated research on semantic extensions enabling modern publishing workflows – not only our own s{\TeX} (see below).  SALT (Semantically annotated {\LaTeX}~\cite{Groza:SALT07}) marks up rhetorical structures and fine-grained citations in scientific documents; however, its vocabulary is not extensible, whereas s{\TeX} offers macros for introducing new mathematical symbols.  Research on mathematics e-learning has led to the interactive systems ActiveMath~\cite{Melisetal-SemanticAware-BJET-2005} and MathDox~\cite{CuypCoheKnop2008g4}, in which students can explore lecture notes adapted to their previous knowledge and interactively solve exercises.  These systems draw on a semantic representation of mathematical formulæ and higher-level structures, such as proof steps or dependencies of course modules, in standardized XML-based languages, such as OpenMath~\cite{BusCapCar:2oms04} and OMDoc~\cite{Kohlhase:omdoc1.2}.  They utilize the semantic structures of mathematical knowledge but do not preserve it in their HTML output; thus, it is not generally interchangeable with other systems on the web.  The Linked Data movement promotes best practices for publishing semantic data on the web~\cite{LinkedDataGuidesTutorials}; they can be published as standalone RDF, or embedded into HTML documents as RDFa~\cite{AdidaEtAl08:RDFa}.  The Sparks Ozone Browser is an example of a mashup that utilizes RDFa annotations in HTML documents for interactive browsing~\cite{BCL:OzoneBrowserSemanticOverlays09}.  The design of our interactive documents is similar but additionally supports annotations in MathML formulæ.  MathML has pioneered embedded annotations long before RDFa.  
Since 1998, it has supported \emph{parallel markup}, interlinking both the rendered appearance and the semantic\ednote{CL@MK: let's not introduce the term ``content'' markup here; they don't   understand that.} structure of mathematical expressions, where the meaning of mathematical symbols is usually defined in lightweight ontologies called OpenMath Content Dictionaries~\cite{W3C:MathML3:biblatex}.  However, hardly any mathematical software that uses a semantic representation of mathematical formulæ internally also exposes it on the web\ednote{CL@MK: do you know more?  Are there any other systems that   seriously output parallel markup?}.  The situation is worse for higher-level structures.  In the early days of the Semantic Web, HELM (Hypertext Electronic Library of Mathematics~\cite{APSGS:MKM-HELM03}) pioneered the use of RDF for representing structures of mathematical knowledge, such as in what mathematical theory a symbol is introduced, what of its properties have been declared or asserted, and how the latter are proved.  Again, these semantic structures were not preserved in documents published on the web, thus only accessible to services running in the backend. 
     145The importance of {\LaTeX} in scientific authoring and its extensibility by macros has motivated research on semantic extensions enabling modern publishing workflows.  SALT (Semantically annotated {\LaTeX}~\cite{Groza:SALT07}) marks up rhetorical structures and fine-grained citations in scientific documents; however, its vocabulary is not extensible, whereas our own s{\TeX} (see below) offers macros for introducing new mathematical symbols.  Research on mathematics e-learning has led to the interactive systems ActiveMath~\cite{Melisetal-SemanticAware-BJET-2005} and MathDox~\cite{CuypCoheKnop2008g4}, in which students can explore lecture notes adapted to their previous knowledge and interactively solve exercises.  These systems draw on a semantic representation of mathematical formulæ and higher-level structures, such as proof steps or dependencies of course modules, in standardized XML-based languages, such as OpenMath~\cite{BusCapCar:2oms04} and OMDoc~\cite{Kohlhase:omdoc1.2}.  They utilize the semantic structures of mathematical knowledge but do not preserve it in their HTML output; thus, it is not generally interchangeable with other systems on the web.  The Linked Data movement promotes best practices for publishing semantic data on the web~\cite{LinkedDataGuidesTutorials}; they can be published as standalone RDF, or embedded into HTML documents as RDFa~\cite{AdidaEtAl08:RDFa}.  The Sparks Ozone Browser is an example of a mashup that utilizes RDFa annotations in HTML documents for interactive browsing~\cite{BCL:OzoneBrowserSemanticOverlays09}.  The design of our interactive documents is similar but additionally supports annotations in MathML formulæ.  MathML has pioneered embedded annotations long before RDFa.  
Since 1998, it has supported \emph{parallel markup}, interlinking both the rendered appearance and the semantic\ednote{CL@MK: let's not introduce the term ``content'' markup here; they don't   understand that.} structure of mathematical expressions, where the meaning of mathematical symbols is usually defined in lightweight ontologies called OpenMath Content Dictionaries~\cite{W3C:MathML3:biblatex}.  However, hardly any mathematical software that uses a semantic representation of mathematical formulæ internally also exposes it on the web\ednote{CL@MK: do you know more?  Are there any other systems that   seriously output parallel markup?}.  The situation is worse for higher-level structures.  In the early days of the Semantic Web, HELM (Hypertext Electronic Library of Mathematics~\cite{APSGS:MKM-HELM03}) pioneered the use of RDF for representing structures of mathematical knowledge, such as in what mathematical theory a symbol is introduced, what of its properties have been declared or asserted, and how the latter are proved.  Again, these semantic structures were not preserved in documents published on the web, thus only accessible to services running in the backend. 
    150146 
    151147\section{Architecture and Demo}