{{trash_bin}}

=Goal=

There has been a discussion in the CoCoA team about whether to use the wiki as its primary repository for the documentation. To convince everybody that this is a useful system, it would be good to have an implementation demonstrating that such a system is feasible.

=Concept=

The concept would look something like this:

* Transfer documentation from the current XML-based format to WikiText. This will be done manually since there isn't any efficient way to automate it.
* Implement software that
** translates WikiText to DocBookXML.
** aggregates the individual pages of a section, i.e. all the pages describing the commands of the CoCoAL.
* Implement a makefile to create documentation in the required formats out of DocBookXML. (A sketch of the intended data flow follows below.)
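
To make the intended data flow concrete, here is a minimal sketch in modern Python; translate_page() and aggregate_sections() are hypothetical stubs standing in for the translator and aggregator named above, and the final make call assumes the makefile from the last bullet exists:

 import glob
 import subprocess
 from xml.sax.saxutils import escape
 
 def translate_page(wikitext):
     # stub: a real line-level translator is sketched further below
     return "<section><para>%s</para></section>" % escape(wikitext)
 
 def aggregate_sections(sections):
     # stub: merge the per-page fragments into one DocBook document
     return "<book>\n%s\n</book>" % "\n".join(sections)
 
 def build_documentation(doc_root):
     sections = []
     for path in sorted(glob.glob(doc_root + "/*.wikiml")):
         sections.append(translate_page(open(path).read()))
     open(doc_root + "/manual.xml", "w").write(aggregate_sections(sections))
     subprocess.call(["make", "-C", doc_root])   # makefile renders HTML/PDF/...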

There is some open source software out there that does something similar to what we need, but it would be easier to implement what we need from scratch while reusing some of the code out there. [[User:Mabshoff|Mabshoff]] has developed a prototype in Python that could have been demonstrated at the CoCoA Meeting at Dortmund University, Germany (26.09. - 30.09.2005).

Obviously we should initially test this on a couple of pages from the current CoCoA documentation and go ahead with the conversion if everybody agrees that it works.

=Offline-Editing=

In order to enable Anna to do offline editing of the Wiki pages we came up with a proposal consisting of three phases:

Phase 1 - Pull:

* Create a new directory in DocRoot, e.g. 2005-09-30
* Open the list of Wiki pages to edit offline and for each one (a sketch of such a pull follows this list):
** extract the wikitext to Name_of_page.wikiml
** extract the changes with user and timestamp and save them in Name_of_page.edits
** md5sum Name_of_page.wikiml and save the digest into Name_of_page.md5sum
** create an empty Name_of_page.comment with format
  minor = 0
  comment = NONE
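
A minimal sketch of such a pull, in modern Python rather than the 2.3 mentioned below; the wiki base URL is a hypothetical placeholder, it relies on MediaWiki's standard Special:Export endpoint (which wraps the wikitext in a namespaced text element), and extracting Name_of_page.edits from the page history is omitted:

 import hashlib
 import urllib.request
 import xml.etree.ElementTree as ET
 
 WIKI = "http://wiki.example.org"   # hypothetical base URL of the wiki
 
 def pull_page(name, directory):
     # Special:Export returns the page as XML with the wikitext inside <text>
     url = "%s/wiki/Special:Export/%s" % (WIKI, name)
     root = ET.fromstring(urllib.request.urlopen(url).read())
     text = next(el for el in root.iter() if el.tag.endswith("}text")).text or ""
     base = "%s/%s" % (directory, name)
     open(base + ".wikiml", "w", encoding="utf-8").write(text)
     # store the md5sum of the unedited wikitext for the push phase
     digest = hashlib.md5(text.encode("utf-8")).hexdigest()
     open(base + ".md5sum", "w").write(digest + "\n")
     open(base + ".comment", "w").write("minor = 0\ncomment = NONE\n")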

Phase 2 - Edit

* user edits Name_of_page.wikiml as he/she pleases.

Phase 3 - Push

* user gives the push command and a directory as parameter.
* find all *.wikiml and make sure that *.edits, *.md5sum & *.comment exist.
* check if the md5sum of Name_of_page.wikiml has changed. If not, proceed with the next Name_of_page.wikiml.
* since Name_of_page.wikiml has changed, get the current online version from the wiki and extract its wikitext to Name_of_page.wikiml.current
* md5sum Name_of_page.wikiml.current and compare it to the original md5sum of the unedited Name_of_page.wikiml. If they are identical, push Name_of_page.wikiml into the wiki. Make sure that the comment in Name_of_page.comment is not empty and set the minor-edit flag depending on whether minor == 0. If comment == NONE, ask for a comment. Proceed with the next Name_of_page. (A sketch of this decision logic follows this list.)
* if the md5sum of Name_of_page.wikiml.current has changed from the original unedited Name_of_page.wikiml, diff Name_of_page.wikiml and Name_of_page.wikiml.current and ask what to do, i.e. push anyway, skip or drop. Proceed with the next Name_of_page.
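
A minimal sketch of this decision logic, assuming the file layout above; the actual push into the wiki is sketched separately further below:

 import difflib
 import hashlib
 
 def md5_of(path):
     return hashlib.md5(open(path, "rb").read()).hexdigest()
 
 def push_decision(name, directory):
     base = "%s/%s" % (directory, name)
     original = open(base + ".md5sum").read().strip()  # digest of the unedited pull
     if md5_of(base + ".wikiml") == original:
         return "unchanged"           # no local edits, proceed with the next page
     if md5_of(base + ".wikiml.current") == original:
         return "push"                # the wiki was not touched since the pull
     # conflict: show a diff and let the user decide
     mine = open(base + ".wikiml").readlines()
     theirs = open(base + ".wikiml.current").readlines()
     print("".join(difflib.unified_diff(mine, theirs)))
     return input("push anyway, skip or drop? ")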

== What do we use for implementation? ==

Python 2.3 or higher. We don't need that many features, but we should use a decent XML parser.

== What is implemented? ==

* Pull: Getting the XML out of the wiki and ripping out the interesting bits is done in a prototype.
* Edit: Nothing to see here, move along.
* Push: Nothing yet; shouldn't be too much trouble, since it is all basic file I/O or HTTP pushes with clever parameters. (A sketch of such a push follows this list.)
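
A minimal sketch of such an HTTP push, POSTing the standard MediaWiki edit-form fields; the base URL is a hypothetical placeholder, and the login cookie and edit token that a real wiki additionally requires are omitted here:

 import urllib.parse
 import urllib.request
 
 WIKI = "http://wiki.example.org"   # hypothetical base URL of the wiki
 
 def push_page(name, wikitext, comment, minor):
     # wpTextbox1/wpSummary/wpMinoredit are the MediaWiki edit-form fields
     fields = {"wpTextbox1": wikitext, "wpSummary": comment, "wpSave": "Save page"}
     if minor:
         fields["wpMinoredit"] = "1"
     data = urllib.parse.urlencode(fields).encode("utf-8")
     url = "%s/index.php?title=%s&action=submit" % (WIKI, urllib.parse.quote(name))
     return urllib.request.urlopen(url, data)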

== What needs to be specified? ==

Some meta-information, e.g. the user account and password for the wiki, the directory where to store files, etc., as needed. We should use some simple format like the following (a sketch of a parser for it follows below):

 key=value;
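
A minimal sketch of reading such a file into a dictionary; the blank-line and '#'-comment handling is an assumption, not part of the format above:

 def read_config(path):
     config = {}
     for line in open(path):
         line = line.strip()
         if not line or line.startswith("#"):  # assumed: allow blanks/comments
             continue
         key, value = line.rstrip(";").split("=", 1)
         config[key.strip()] = value.strip()
     return config
 
 # e.g. with a file containing
 #   user=Anna;
 #   password=secret;
 #   docroot=/home/anna/2005-09-30;
 # read_config() returns {'user': 'Anna', 'password': 'secret', ...}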

=Building documentation=

There are two possible ways to do things:

a) Templates: Per unit of the documentation

* <nowiki>{{Blah: PageA}}</nowiki>
* <nowiki>{{Blah: PageB}}</nowiki>

b) Manually: Extract the individual pages and merge them after conversion to DocBookXML. (A sketch of such a merge follows below.)
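
For the manual route, a minimal merge sketch, assuming each page has already been converted into a single DocBook section fragment stored one per file:

 import glob
 import xml.etree.ElementTree as ET
 
 def merge_sections(directory, title):
     # wrap one section fragment per converted page into a single chapter
     chapter = ET.Element("chapter")
     ET.SubElement(chapter, "title").text = title
     for path in sorted(glob.glob(directory + "/*.xml")):
         chapter.append(ET.parse(path).getroot())
     return ET.tostring(chapter, encoding="unicode")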

=WikiML -> Man, Texi, HTML, PDF, ... conversion=

Initially we plan to support only a small subset of the WikiML tags (a line-level translator for part of this subset is sketched after the list):

 <nowiki>= =</nowiki>
 <nowiki>== ==</nowiki>
 <nowiki>=== ===</nowiki>
 \   (verbatim beginning with spaces)
 <nowiki>* (list, only one level deep at the moment)</nowiki>
 <nowiki># (numbered list, only one level deep at the moment)</nowiki>
 <nowiki>[[Blah]] [[Blah|Link]] [[Blah Link]]</nowiki>
 <nowiki><code></nowiki>
 <nowiki><nowiki></nowiki>
 Tex-Formula/MathML

Ignore certain tags: <nowiki>{{stub}}, {{COPYRIGHT}}</nowiki>
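
A minimal sketch of such a line-level translator, covering only the headings, the ignore list and the three link forms above; the cross-line grouping needed for lists and verbatim blocks is left out, and the mapping to DocBook bridgehead and ulink elements is just one possible rendering:

 import re
 from xml.sax.saxutils import escape
 
 HEADING = re.compile(r"^(={1,3})\s*(.*?)\s*\1\s*$")
 LINK = re.compile(r"\[\[([^|\]]+)\|?([^\]]*)\]\]")
 IGNORED = re.compile(r"\{\{(stub|COPYRIGHT)\}\}")
 
 def translate_line(line):
     line = IGNORED.sub("", line)
     mo = HEADING.match(line)
     if mo:
         # map = / == / === headings to bridgeheads of rank sect1/sect2/sect3
         return '<bridgehead renderas="sect%d">%s</bridgehead>' % (
             len(mo.group(1)), escape(mo.group(2)))
     # bare, piped and multi-word wiki links become ulink elements
     line = LINK.sub(lambda mo: '<ulink url="%s">%s</ulink>' % (
         mo.group(1), mo.group(2) or mo.group(1)), escape(line))
     return "<para>%s</para>" % line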

=Edit history=

# Pull individual pages (see above).
# Iterate over Name_of_page.edits and extract the edits since a given date into separate XML or text files. (A sketch follows this list.)
# Open an editor to inspect the list of edits and make changes as needed, e.g. remove spam reverts (might be automated via keywords) or minor edits.
# Convert the list to HTML, LaTeX, PDF, whatever.
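
A minimal sketch of step 2; the line format of Name_of_page.edits (ISO timestamp, user and comment separated by tabs) is an assumption, since it was never pinned down above:

 import datetime
 
 def edits_since(path, cutoff):
     # assumed line format: "2005-09-30T12:00:00Z<TAB>user<TAB>comment"
     kept = []
     for line in open(path):
         stamp, user, comment = line.rstrip("\n").split("\t", 2)
         when = datetime.datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
         if when >= cutoff:
             kept.append((when, user, comment))
     return kept
 
 # e.g. edits_since("Name_of_page.edits", datetime.datetime(2005, 9, 1))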


=useful links=

[http://meta.wikipedia.org/wiki/Alternative_parsers meta.wikipedia.org/wiki/Alternative_parsers]

=Resources=

* Not all tags of the WikiText specification will be supported, at least not initially. Hence we have a [[Help:WikiText_DocBookXML_Conversion_testpage | testpage]] to demonstrate which tags we currently support.

=ToDo=

* write more code and do some more testing

[[Category:CoCoAXML]]