Help:WikiText DocBookXML Conversion

From ApCoCoAWiki

Warning: This article may be deleted! We think it contains no essential information and is obsolete. If you disagree, please leave a message at the cocoa forum. Thank you! All articles being discussed as obsolete can be found here.

Goal

There has been a discussion in the CoCoA team whether to use the wiki as its primary repository for the documentation or not. To convince everybody that this is a useful system it would be good to have an implementation to demonstrate that such a system is feasible.

Concept

The concept would look something like this:

  • Transfer the documentation from the current XML-based format to WikiText. This will be done manually, since there isn't any efficient way to automate it.
  • Implement software that
    • translates WikiText to DocBookXML.
    • aggregates the individual pages of a section, e.g. all the pages describing the commands of CoCoAL.
  • Implement a makefile to create the documentation in the required format out of DocBookXML.

Some open-source software already does something similar to what we need, but it would be easier to implement our own tool from scratch while reusing parts of that code. Mabshoff has developed a prototype in Python that could have been demonstrated at the CoCoA Meeting at Dortmund University, Germany (26.09. - 30.09.2005).

Obviously we should initially test this on a couple of pages from the current CoCoA documentation and go ahead with the conversion if everybody agrees that it works.

Offline-Editing

To enable Anna to edit the wiki pages offline, we came up with a proposal that consists of three phases:

Phase 1 - Pull:

  • Create a new directory in DocRoot, e.g. 2005-09-30
  • Open list of Wikipages to edit offline and for each one:
    • extract wikitext to Name_of_page.wikiml
    • extract changes with user and timestamp and save in Name_of_page.edits
    • md5sum Name_of_page.wikiml and save into Name_of_page.md5sum
    • create empty Name_of_page.comment with format
minor = 0
comment = NONE
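
The file layout of the pull phase can be sketched as follows. This is only an illustration, not the prototype mentioned on this page: it assumes a modern Python 3 rather than Python 2.3, the function name write_pull_files is hypothetical, and the one-line-per-edit layout of the .edits file (user, tab, timestamp) is an assumption, since the page does not specify it. Fetching the wikitext from the wiki is left out.

```python
import hashlib
from pathlib import Path


def write_pull_files(directory, page_name, wikitext, edits):
    """Write the four per-page files described in Phase 1.

    `edits` is a list of (user, timestamp) tuples; how they are obtained
    from the wiki is out of scope for this sketch.
    """
    base = Path(directory)
    base.mkdir(parents=True, exist_ok=True)

    data = wikitext.encode("utf-8")
    (base / f"{page_name}.wikiml").write_bytes(data)

    # Assumed layout: one "user<TAB>timestamp" line per edit.
    edit_lines = "".join(f"{user}\t{stamp}\n" for user, stamp in edits)
    (base / f"{page_name}.edits").write_text(edit_lines, encoding="utf-8")

    # Store only the hex digest of the unedited wikitext.
    digest = hashlib.md5(data).hexdigest()
    (base / f"{page_name}.md5sum").write_text(digest + "\n", encoding="ascii")

    # Initial comment file in the format given above.
    (base / f"{page_name}.comment").write_text(
        "minor = 0\ncomment = NONE\n", encoding="ascii"
    )
    return digest
```

Storing the digest of the exact bytes written to the .wikiml file keeps the later "has this file changed?" check independent of how the text was fetched.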

Phase 2 - Edit:

  • The user edits Name_of_page.wikiml as he/she pleases.

Phase 3 - Push:

  • The user gives the push command with a directory as parameter.
  • Find all *.wikiml files and make sure that the matching *.edits, *.md5sum and *.comment files exist.
  • Check whether the md5sum of Name_of_page.wikiml has changed. If not, proceed with the next Name_of_page.wikiml.
  • If Name_of_page.wikiml has changed, get the current online version from the wiki and extract its wikitext to Name_of_page.wikiml.current.
  • Compute the md5sum of Name_of_page.wikiml.current and compare it to the original md5sum of the unedited Name_of_page.wikiml. If they are identical, push Name_of_page.wikiml into the wiki. Make sure that the comment in Name_of_page.comment is not empty, and set the minor-edit flag depending on whether minor == 0. If comment == NONE, ask for a comment. Proceed with the next Name_of_page.
  • If the md5sum of Name_of_page.wikiml.current differs from that of the original unedited Name_of_page.wikiml, diff Name_of_page.wikiml against Name_of_page.wikiml.current and ask what to do, i.e. push anyway, skip or drop. Proceed with the next Name_of_page.
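
The per-page decision in the push phase boils down to two checksum comparisons. The sketch below assumes Python 3; the function names md5_hex, push_action and conflict_diff and the three result labels are hypothetical and only serve to illustrate the steps above. Talking to the wiki and prompting the user are not shown.

```python
import difflib
import hashlib


def md5_hex(text):
    """Return the MD5 hex digest of a piece of wikitext."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def push_action(original_md5, local_text, online_text):
    """Decide what to do with one page.

    original_md5 -- digest saved in Name_of_page.md5sum at pull time
    local_text   -- content of the (possibly edited) Name_of_page.wikiml
    online_text  -- content of Name_of_page.wikiml.current
    """
    if md5_hex(local_text) == original_md5:
        return "unchanged"  # nothing was edited offline
    if md5_hex(online_text) == original_md5:
        return "push"       # wiki untouched since the pull
    return "conflict"       # both sides changed: show a diff and ask


def conflict_diff(local_text, online_text, page_name):
    """Unified diff from the online version to the local version."""
    return "".join(
        difflib.unified_diff(
            online_text.splitlines(keepends=True),
            local_text.splitlines(keepends=True),
            fromfile=page_name + ".wikiml.current",
            tofile=page_name + ".wikiml",
        )
    )
```

In a real tool the online version would only be fetched once the local file is known to have changed; the function takes both texts here simply to keep the logic in one place.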

What do we use for implementation?

Python 2.3 or higher. We don't need many language features, but we should use a decent XML parser.

What is implemented?

  • Pull: Getting the XML out of the wiki and extracting the interesting bits is done in a prototype.
  • Edit: Nothing to see here, move along.
  • Push: Nothing yet; it shouldn't be too much trouble, since it is all basic file I/O or html-push with clever parameters.

What needs to be specified?

Some meta-information, e.g. the user account and password for the wiki, the directory where the files are stored, etc., as needed. We should use a format like

key=value;
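
A settings file in this format could be read with a few lines of Python. This is a sketch under assumptions: it uses Python 3, the name parse_settings is hypothetical, one setting per line is assumed, and treating lines that start with # as comments is an addition not mentioned on this page.

```python
def parse_settings(text):
    """Parse lines of the form 'key=value;' into a dictionary."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and (assumed) '#' comment lines.
        if not line or line.startswith("#"):
            continue
        # Drop the trailing semicolon, then split at the first '='.
        line = line.rstrip(";").rstrip()
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings
```

For example, parse_settings("user=anna;\ndocroot=/tmp/docs;") yields a dictionary with the keys user and docroot; both key names are made up for this illustration.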

Building documentation

There are two possible ways to do things:

a) Templates: Per unit of the documentation

  • {{Blah: PageA}}
  • {{Blah: PageB}}

b) Manually: Extract individual pages and merge them after conversion to DocBookXML.

WikiML -> Man, Texi, HTML, PDF, ... conversion

Initially we plan to support only a small subset of the WikiML-tags:

= =
== ==
=== ===
\   (verbatim beginning with spaces)
* (list, only one level deep at the moment)
# (numbered list, only one level deep at the moment)
[[Blah]] [[Blah|Link]] [[Blah Link]]
<code>
<nowiki>
Tex-Formula/MathML

Ignore certain tags: {{stub}}, {{COPYRIGHT}}
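
To give an idea of how small a first converter for this subset could be, here is a line-based sketch in Python 3. It is not the prototype and covers only headings, one-level lists, verbatim lines, plain paragraphs and the two ignored tags; links, <code>, <nowiki> and formulas are left out. The names classify_line and to_docbook_fragment are hypothetical, and mapping headings to DocBook bridgehead elements (instead of properly nested sections) is a simplification chosen for this example.

```python
import re
from xml.sax.saxutils import escape

HEADING = re.compile(r"^(={1,3})\s*(.*?)\s*\1\s*$")
IGNORED = re.compile(r"\{\{(?:stub|COPYRIGHT)\}\}")


def classify_line(line):
    """Return (kind, text) for one line of wikitext."""
    line = IGNORED.sub("", line).rstrip()
    if not line.strip():
        return ("blank", "")
    match = HEADING.match(line)
    if match:
        return ("heading%d" % len(match.group(1)), match.group(2))
    if line.startswith("*"):
        return ("bullet", line[1:].strip())
    if line.startswith("#"):
        return ("numbered", line[1:].strip())
    if line.startswith(" "):
        return ("verbatim", line[1:])
    return ("para", line)


def to_docbook_fragment(wikitext):
    """Convert wikitext to a flat DocBook fragment (no enclosing section)."""
    out = []
    open_list = None  # name of the list element currently open, if any
    list_tags = {"bullet": "itemizedlist", "numbered": "orderedlist"}
    for kind, text in map(classify_line, wikitext.splitlines()):
        wanted = list_tags.get(kind)
        # Close the current list when the line does not continue it.
        if open_list and open_list != wanted:
            out.append("</%s>" % open_list)
            open_list = None
        if wanted:
            if not open_list:
                out.append("<%s>" % wanted)
                open_list = wanted
            out.append("<listitem><para>%s</para></listitem>" % escape(text))
        elif kind.startswith("heading"):
            out.append('<bridgehead renderas="sect%s">%s</bridgehead>'
                       % (kind[-1], escape(text)))
        elif kind == "verbatim":
            out.append("<programlisting>%s</programlisting>" % escape(text))
        elif kind == "para":
            out.append("<para>%s</para>" % escape(text))
    if open_list:
        out.append("</%s>" % open_list)
    return "\n".join(out)
```

Note that each verbatim line becomes its own programlisting element here; a real converter would merge consecutive verbatim lines and wrap the result in proper section elements.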

Edit history

  1. Pull individual pages (see above).
  2. Iterate over Name_of_page.edits and extract the edits since a given date into separate XML or text files.
  3. Open an editor to inspect the list of edits and make changes as needed, e.g. remove spam reverts (might be automated via keywords) or minor edits.
  4. Convert the list to HTML, LaTeX, PDF, whatever.
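
Step 2 could look roughly like the following Python 3 sketch. The format of Name_of_page.edits is not specified on this page, so the code assumes one edit per line, written as user name, a tab, and an ISO 8601 timestamp; the function name edits_since is hypothetical as well.

```python
from datetime import datetime


def edits_since(edit_lines, since):
    """Return (user, timestamp) pairs for all edits at or after `since`.

    edit_lines -- iterable of 'user<TAB>ISO timestamp' strings (assumed format)
    since      -- datetime marking the start of the period of interest
    """
    recent = []
    for line in edit_lines:
        line = line.strip()
        if not line:
            continue
        user, _, stamp = line.partition("\t")
        when = datetime.fromisoformat(stamp)
        if when >= since:
            recent.append((user, when))
    return recent
```

The resulting list can then be written to a text file for manual inspection in step 3.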


Useful links

meta.wikipedia.org/wiki/Alternative_parsers

Resources

  • Not all tags of the WikiText specification will be supported, at least not initially. Hence we have a testpage to demonstrate which tags we currently support.

ToDo

  • write more code and do some more testing