OpenOffice and DocBook Tutorial
XML Publishing with DocBook support in OpenOffice Writer

About
Author: Sandro Zic
Date: 2003-04-30
Company: ZZ/OSS Information Networking GbR
Website: http://www.zzoss.com
Copyright: 2003, ZZ/OSS GbR
Contributors: Axel “Burgi” Burkhardt and Michael Schaller.
License: This work is licensed under the Creative Commons Attribution-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/1.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
Abbreviations
API  Application Programming Interface
DbK  DocBook
OOo  OpenOffice.org
OOW  OpenOffice Writer
Zip  A compression algorithm and archive
Introduction

OpenOffice is an Open Source office suite that internally works with XML. As XML was created to separate semi-structured content from layout, it is extremely useful for advanced publishing environments and interoperable information transfer.

OpenOffice Writer (OOW) is the document editor of OpenOffice. It writes text to a specified XML vocabulary and saves the resulting XML file together with some meta information, a stylesheet, and binary data in a compressed archive. This archive is actually a Zip file and can be decompressed easily.

The latest OpenOffice development release, called 1.1beta, has some new features for testing. Interestingly enough, it comes with export and import filters for DocBook. DocBook is itself an established XML vocabulary, widely used for software documentation. It is known for its rich element set that covers a wide variety of semantic concepts and is therefore ideal for storing meaning. Furthermore, there are many tools available to transform DocBook XML to various document formats.

This tutorial explains:
• where to obtain the required software and how to install it;
• how to create a document in OpenOffice Writer that can later be saved in DocBook XML;
• how to convert Microsoft Word documents to DocBook XML via OpenOffice Writer;
• how to convert from DocBook to HTML.
This text has itself been created with OpenOffice and exported to DocBook XML. At http://www.zzoss.com/oowdbk.php you will find the OpenOffice Writer, DocBook XML and HTML files of this text. Of course, this text suffers from the bugs described here ;) You can download and use these files to test the examples below for yourself. More formats will be added at that location, and new versions of this tutorial will describe how to create the respective formats. So keep coming back to our website or subscribe to our newsletter to be notified: http://www.zzoss.com/newsletter.php
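Because an OOW file is an ordinary Zip archive, its parts can be unpacked and inspected from the command line. The following is a minimal sketch, not part of the original tutorial: the function name sxw_extract and its arguments are hypothetical, and it assumes that the unzip tool is installed.

```shell
# Sketch: unpack an OpenOffice Writer file, which is a Zip archive.
# sxw_extract is a hypothetical helper; it assumes the unzip tool is installed.
sxw_extract() (
    sxw_file=$1
    target_dir=$2
    # Stop if the input file does not exist.
    [ -f "$sxw_file" ] || { echo "no such file: $sxw_file" >&2; exit 1; }
    mkdir -p "$target_dir" || exit 1
    # -q: quiet, -o: overwrite without asking, -d: extraction directory.
    unzip -q -o "$sxw_file" -d "$target_dir"
)
```

A call such as sxw_extract oowdbk.sxw ./oowdbk-unpacked would leave the XML parts in the target directory. In OpenOffice 1.x files the document text is typically found in content.xml, next to styles.xml and meta.xml.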
Stability

Be aware that DocBook support is labeled alpha by the developers, hence bugs are likely to occur and functionality is likely to be missing. This tutorial keeps an eye on the status quo and points out all annoyances and deficiencies. Nevertheless, DocBook support in OpenOffice is a great tool and the existing implementation in the 1.1beta version is a promising start. Furthermore, beta releases of OpenOffice can be buggy and are primarily released for brave users who want to test new features. During my work with OpenOffice 1.1beta, some occasional crashes occurred and some minor bugs appeared, but no data was lost.
Installation

This section tells you how to set up the software environment that was used to create the examples of this tutorial. All of the software is available for free download under an Open Source license. The paths provided are relative to a user's home directory on a Linux system, indicated by the “~”. For Windows, please change the paths accordingly, e.g. replace “~” with “C:”.

The tutorial has been written with OpenOffice 1.1beta, which is the first OpenOffice release with a built-in DocBook export filter, thus the examples will not work with any older version of OpenOffice. You can download all sample files mentioned here from our Website to perform testing on your own. For more information consult http://www.zzoss.com/oowdbk.php.

Before installing OpenOffice, you first need to install the Java Runtime Environment version 1.4. This is because the DocBook filters depend on the XML parser Crimson and the XSLT processor Xalan, which are both included in the 1.4 package. Get your copy of JRE 1.4 for your operating system here: http://java.sun.com/j2se/1.4.1/download.html. If you already have JRE 1.3 installed, you can instead place Crimson (or Xerces) and Xalan in your Java class path. Please refer to the links below.

The next step is to carefully read how to enable DocBook support in OOW 1.1beta at http://xml.openoffice.org/xmerge/docbook/. Finally, you can find instructions on how to obtain and install OpenOffice 1.1 Beta at http://www.openoffice.org/dev_docs/source/1.1beta/. Installation should be straightforward; simply follow the instructions.
Now we install the software needed for DocBook XML transformation. First of all, we download the DocBook XSL stylesheets from http://sourceforge.net/project/showfiles.php?group_id=21935. I have used docbook-xsl-1.60.1 for the examples below. Extract the files to any place on your hard disk and delete the version number from the directory name; let's assume you placed them in
~/com.zzoss.oowdbk.doc/docbook-xsl/

Of course, we need an XSLT processor to work with the stylesheets. Although Xalan is already available once OpenOffice is installed, I recommend you additionally install the Saxon XSLT processor for the DocBook stylesheets. Download Saxon 6.5.2 from http://saxon.sourceforge.net/#F6.5.2 and extract the archive to any place on your hard disk; let's assume you placed the files in
~/com.zzoss.oowdbk.doc/java/

The following examples will work with this path. Optionally, you can copy the saxon.jar file to your Java class path. Make sure that you install 6.5.2 and not Saxon 7.x, because the stylesheets simply don't work with the latest Saxon releases.
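The step of extracting the stylesheets and dropping the version number can be sketched as a small shell function. This is only an illustration under assumptions: install_stylesheets is a hypothetical helper name, and it assumes the downloaded archive has already been extracted into a directory named after the release (for example docbook-xsl-1.60.1).

```shell
# Sketch: move an extracted stylesheet directory to the version-free path
# used in this tutorial. install_stylesheets is a hypothetical helper name.
install_stylesheets() (
    extracted_dir=$1   # e.g. ./docbook-xsl-1.60.1 (assumed directory name)
    base_dir=$2        # e.g. ~/com.zzoss.oowdbk.doc
    [ -d "$extracted_dir" ] || { echo "not a directory: $extracted_dir" >&2; exit 1; }
    # Do not overwrite an earlier installation.
    [ ! -e "$base_dir/docbook-xsl" ] || { echo "already installed" >&2; exit 1; }
    mkdir -p "$base_dir" || exit 1
    # Renaming the directory is what removes the version number from the path.
    mv "$extracted_dir" "$base_dir/docbook-xsl"
)
```

For the layout assumed in this tutorial, the call would be install_stylesheets ./docbook-xsl-1.60.1 ~/com.zzoss.oowdbk.doc, after which the HTML stylesheets are found under ~/com.zzoss.oowdbk.doc/docbook-xsl/html/.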
Creating A New DocBook Document With OpenOffice

Fortunately, there is a good tutorial online called Getting Started With DocBook on OpenOffice which explains the basic functionality. Please read this introduction carefully, as I will now proceed based on the information provided there. Make sure that you get the DocBookTemplate.stw file referenced in this text. Do not proceed as described in the section “Using OpenOffice.org to create and edit DocBook XML” at http://xml.openoffice.org/xmerge/docbook/ (this is the same document as referenced above on how to enable DocBook support), because this will leave you frustrated with the results: the export filters do not actually work with the provided DocBook sample document.

If you are familiar with the logical divisions of DocBook, you can roughly relate the paragraph styles of the OpenOffice Stylist to DocBook block elements and the OOW character styles to DbK inline elements. The following two graphics show how to select the paragraph and character styles from the Stylist. Always pay attention to the selected button on the top left.
There's a table available that indicates which DocBook elements are supported by OpenOffice: http://xml.openoffice.org/xmerge/docbook/DocBookTags.html

Inserting a graphic is a bit tricky if you want to make sure it is exported to DocBook correctly. Currently, there are two possibilities: you either treat the image as an inlinegraphic or as a mediaobject. Here I deal solely with an inlinegraphic and show how to insert it in a way that the DocBook export filter can detect it and write the proper DocBook XML.

Choose Insert > Graphic > From File. Then, when the Insert Graphics dialog appears, make sure that the checkbox Link at the bottom left is activated and choose an image. This ensures that the binary data of the image is not written to the OpenOffice XML file; instead, the file path is saved. If you do not activate the checkbox, the file attribute of the inlinegraphic element stays empty in the exported DocBook XML.
If you now save the document as DocBook XML, you may find the following line:
Including your images via link in OpenOffice works fine even when you move the document around, as long as the image files reside in the same directory as your OOW document. Otherwise, OpenOffice will not be able to resolve the link. The same is true if you rename or delete the linked image file.
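Whether an image lost its path during export can be checked from the command line. The following is a rough sketch, not a definitive tool: check_inlinegraphics is a hypothetical helper name, and it assumes that the export filter writes the path into the standard DocBook fileref attribute and that each inlinegraphic start tag sits on a single line.

```shell
# Sketch: flag inlinegraphic elements whose fileref attribute is empty,
# the symptom of an image that was inserted without the Link checkbox.
# check_inlinegraphics is a hypothetical helper; fileref is an assumption.
check_inlinegraphics() (
    docbook_file=$1
    [ -f "$docbook_file" ] || { echo "no such file: $docbook_file" >&2; exit 2; }
    # grep prints each offending line with its line number and
    # succeeds only if at least one such line exists.
    if grep -n '<inlinegraphic[^>]*fileref=""' "$docbook_file"; then
        exit 1   # at least one image has no file path
    fi
    exit 0       # no empty fileref found
)
```

The function returns 0 when no empty path is found and 1 otherwise, so it can be used in a test such as: check_inlinegraphics oowdbk-docbook.xml || echo "relink the images listed above".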
Converting MS Word Documents To DocBook Via OpenOffice [TODO]
Usability Of OpenOffice Writer Creating DocBook XML

It took me about 3 days of practice before I could create new DocBook files with OpenOffice Writer, or convert existing documents to useful DocBook documents, with relative ease. I regard myself as a DocBook aficionado and XML/XSLT expert. Hence, I quickly understood the technical concepts behind the export/import functions. Additionally, I knew what my resulting DocBook pages should look like. Unfortunately, the current state of usability makes it very hard for untrained users to create new documents, or even convert existing ones, so that they work properly with the DocBook export filter. These are the basic obstacles users are currently (1.1 Beta) confronted with:

• Installation: DocBook support currently has to be enabled manually in TypeDetection.xcu; this should be done automatically in the next stable OpenOffice release.
• Templates: The need to load an OOW template is ok if you start a new document, but it takes some knowledge about OpenOffice to assign the template elements to an existing document (e.g. one imported from MS Word).
• Styles: For DocBook newbies, it is confusing that DocBook template elements coexist with OpenOffice Writer template elements in the Stylist. It would be much easier if there were only DocBook styles to choose from. Furthermore, users not only have to keep OOW and DbK styles apart, they also have to translate the given names of DbK styles back to the related DbK element name. For example, the style Text body relates to the DbK element para. Also, up to now, only a subset of DocBook elements is supported. Although they are sufficient for simple use scenarios, it would surely be nice for power users to have most elements implemented.
• Sections: The section hierarchy approach of DocBook might be hard to grasp for ordinary users. Furthermore, they have to get used to working with the OOo Navigator.
• Saving: Repeatedly saving a longer DocBook publication in OpenOffice takes some time. Users might become frustrated with documents exceeding 10 pages, which can take up to a minute to save on Linux/KDE with an 1800 MHz CPU and 512 MB of RAM. A document should rather be saved in the native OpenOffice format during editing and exported to DocBook upon completion.
• Articleinfo: There's a bug in OpenOffice 1.1 Beta that prevents the DocBook export filter from correctly writing the document properties. These properties are set either automatically (author, created, modified) or manually in the OOW dialog File > Properties > Description (title, subject, keywords, description). The filter fails to write them to the articleinfo context of the DocBook XML file.
• Toolbar: Unfortunately, the text formatting buttons on the Object Bar do not translate into DbK-conformant styles. For example, Italic should automatically be translated into emphasis and Bullets On/Off into listitem (or unorderedList, as the corresponding style is called).
• Tabulators: Inline tabulators get screwed up when saving to DocBook XML. For example, the text before_3_tabs and after_3_tabs, separated by three tabulators, becomes: before_3_tabsafter_3_tabs
OpenOffice Bugs And Issues

You can find information on bugs and open issues in the IssueZilla repository at http://www.openoffice.org/issues/query.cgi. Below you will find a link with a result list of known issues/bugs returned when searching IssueZilla for the term “docbook”.
Creating HTML Documents From DocBook

Once you have saved your OpenOffice Writer document to a DocBook XML file, the DocBook XSL stylesheets can be used to transform the DocBook XML to various formats like HTML, XHTML, PDF, RTF, MS Help, etc. Here, I describe how to create HTML pages. You can download all sample files mentioned here from our Website to perform testing on your own. For more information consult http://www.zzoss.com/oowdbk.php.

Let's assume your OpenOffice Writer document is called
oowdbk.sxw
and is located at
~/com.zzoss.oowdbk.doc/openoffice/oowdbk.sxw

When you save this file as DocBook XML, the DocBook XML document created with OpenOffice might be called
oowdbk-docbook.xml
and be located at
~/com.zzoss.oowdbk.doc/openoffice/oowdbk-docbook.xml

Furthermore, we assume that the DocBook XSL stylesheets you have downloaded reside at
~/com.zzoss.oowdbk.doc/docbook-xsl/

This means that the XSLT stylesheets to transform DocBook XML to HTML are located at
~/com.zzoss.oowdbk.doc/docbook-xsl/html/

The root XSLT stylesheet that will be passed to the XSLT processor is
~/com.zzoss.oowdbk.doc/docbook-xsl/html/docbook.xsl

We previously decided to unzip the downloaded files of the Saxon XSLT processor to
~/com.zzoss.oowdbk.doc/java/

If you have a look there, the saxon.jar file should be located at
~/com.zzoss.oowdbk.doc/java/saxon.jar

What's left is to decide where to place our HTML output, maybe at
~/com.zzoss.oowdbk.doc/docbook/html-all.html
Now we know all the paths that we pass to Java and the Saxon XSLT processor. The general syntax to create the HTML output would be:
shell>java -classpath [path_to_saxon.jar] com.icl.saxon.StyleSheet [path_to_xml] [path_to_xsl] > [path_to_output]

The part of the command setting the classpath to the Saxon JAR file can be omitted if you placed saxon.jar in your default Java classpath. This is the command we issue for our concrete example:
shell>java -classpath ~/com.zzoss.oowdbk.doc/java/saxon.jar com.icl.saxon.StyleSheet ~/com.zzoss.oowdbk.doc/openoffice/oowdbk-docbook.xml ~/com.zzoss.oowdbk.doc/docbook-xsl/html/docbook.xsl > ~/com.zzoss.oowdbk.doc/docbook/html-all.html
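Since the command is long and every path derives from the same base directory, it can be wrapped in a shell function that checks the inputs first. This is a sketch under the path assumptions made above; the function name docbook_to_html is hypothetical.

```shell
# Sketch: run the single-page transformation with all paths derived from
# one base directory. docbook_to_html is a hypothetical helper name.
docbook_to_html() (
    base_dir=$1
    saxon_jar="$base_dir/java/saxon.jar"
    source_xml="$base_dir/openoffice/oowdbk-docbook.xml"
    stylesheet="$base_dir/docbook-xsl/html/docbook.xsl"
    output_html="$base_dir/docbook/html-all.html"

    # Stop early with a clear message if an input file is missing.
    for required in "$saxon_jar" "$source_xml" "$stylesheet"; do
        [ -f "$required" ] || { echo "missing: $required" >&2; exit 1; }
    done

    # Make sure the output directory exists before redirecting into it.
    mkdir -p "$base_dir/docbook" || exit 1
    java -classpath "$saxon_jar" com.icl.saxon.StyleSheet \
        "$source_xml" "$stylesheet" > "$output_html"
)
```

With the layout of this tutorial, the call is docbook_to_html ~/com.zzoss.oowdbk.doc. Checking the inputs first avoids leaving an empty html-all.html behind when one of the paths is mistyped.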
The result we get is one HTML page containing all sections. With the DocBook XSL stylesheets for HTML output, we can also create one page for each section and thus allow for section-by-section browsing of our document. [TODO: How to use XSLT parameters of DocBook stylesheets]
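Until that part is written, here is a hedged sketch of how section-by-section output might be produced. It assumes that the downloaded stylesheet distribution contains html/chunk.xsl, that this stylesheet honours the base.dir parameter for its output directory, and that Saxon 6.5.2 accepts stylesheet parameters as name=value pairs after the stylesheet path. The function name chunk_command and the html-chunks directory are hypothetical. The function only prints the command so that it can be reviewed before it is run.

```shell
# Sketch: print a Saxon command line for chunked (one page per section)
# HTML output. chunk_command and the html-chunks directory are hypothetical;
# chunk.xsl and base.dir are assumptions about the stylesheet distribution.
# Paths containing spaces would need extra quoting before the line is run.
chunk_command() {
    base_dir=$1
    printf 'java -classpath %s com.icl.saxon.StyleSheet %s %s base.dir=%s\n' \
        "$base_dir/java/saxon.jar" \
        "$base_dir/openoffice/oowdbk-docbook.xml" \
        "$base_dir/docbook-xsl/html/chunk.xsl" \
        "$base_dir/docbook/html-chunks/"
}
```

Note that no output redirection is used here: under the assumptions above, the chunking stylesheet writes its HTML files into the directory given by base.dir itself.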
WYSIWYG DocBook Editing With OpenOffice?

The question is whether DocBook editing in OpenOffice Writer is truly What You See Is What You Get (WYSIWYG) editing. This must be doubted, because HTML pages created from DocBook sometimes look very different from the respective OOW document, at least if you use the DocBook stylesheets applied above. For example, although OOW might show a paragraph starting on a new line after an image, the DocBook HTML output might place the paragraph right behind the image, which is the natural behaviour of images defined with the inlinegraphic DocBook element. Furthermore, line breaks inside a list item will appear in OOW, but not in DbK HTML.

Line spacing can be quite different in OOW compared to a derived DbK HTML article. The current sample template for OOW, downloadable from openoffice.org, does not create a blank line between the last line of preformatted text (aka programlisting in DbK parlance) and the first line of a text body paragraph (aka para in DbK).

Most notably, content might well disappear from the exported DocBook document if the wrong styles have been assigned. This can happen very easily if you copy and paste text from an HTML page in your Web browser to OpenOffice Writer, because OOW tries to preserve the format of the copied text; e.g. HTML formatting can end up as the OOW style “Strong Emphasis”, which is not translated to DocBook when exporting.
Summary

DocBook support in OpenOffice is a useful tool in connection with Content Management Systems that internally work with DocBook XML. Trained personnel who know what the CMS needs will find the DocBook features useful for importing legacy documents (like MS Word files). The developers of the Open Source CMS Typo3 already use OpenOffice for their software documentation. Although they do not convert it to DocBook before displaying the documentation in HTML, this is a good example of what an XML publishing workflow may look like.

Currently, untrained users will feel swamped by the obstacles described above. Most notably, they may be disappointed when the WYSIWYG layout they created in OpenOffice Writer does not look the same in, for example, the HTML page created from a DocBook file which has itself been exported from OpenOffice. In fact, ordinary users should be confronted with only a small subset of OpenOffice styles (aka DocBook elements) and told that layout work such as centered text bodies will not have any effect.

If the DocBook feature of OpenOffice is used for document conversion only, for example in a CMS import filter, then programmers will have to deal with the problem that OpenOffice-specific styles are not automatically translated to DocBook elements. Currently, this means that document conversion to DocBook via OpenOffice cannot be done without human intervention, at least not if you want proper results. Costs therefore become high if you plan to migrate lots of documents available in a proprietary format (like MS Word) to DocBook XML.

On the other hand, anyone can patch the XSLT stylesheets that do the DocBook import/export in OpenOffice and furthermore write scripts that do some preprocessing of the OpenOffice XML before it is transformed to DocBook XML (e.g. extracting the binary code of images). It would be nice if one day all of this were done by OpenOffice itself.

In general, the new DocBook feature is a good step forward for software documentation writers and XML-based Content Management Systems. Especially CMS software that puts strong emphasis on structured documents with semantically rich XML annotation will profit from further development of the DocBook filters.
Related Resources

About OpenOffice and DocBook in the DocBook Wiki
http://docbook.org/wiki/moin.cgi/OpenOffice?action=show

“DocBook Publishing mit OpenOffice” (in German)
http://www.stefan-rinke.de/articles/publish/ch06.html

OpenOffice.org
http://www.openoffice.org

The OpenOffice.org 1.1 Beta Office Suite
http://www.openoffice.org/dev_docs/source/1.1beta/

DocBook: The Definitive Guide
http://docbook.org/tdg/en/html/docbook.html

DocBook Wiki
http://docbook.org/wiki/

List of DocBook Authoring Tools
http://www.docbook.org/wiki/moin.cgi/DocBookAuthoringTools

Logical Divisions of DocBook
http://docbook.org/tdg/en/html/ch02.html#ch02-logdiv

OpenOffice Template for DocBook
http://xml.openoffice.org/xmerge/downloads/DocBookTemplate.stw

Table of the DocBook tags supported by OpenOffice
http://xml.openoffice.org/xmerge/docbook/DocBookTags.html

The XSLT transforming OpenOffice XML to DocBook XML
http://xml.openoffice.org/source/browse/xml/xmerge/java/org/openoffice/xmerge/converter/xml/xslt/docbook/sofftodocbook.xsl?only_with_tag=HEAD

The XSLT transforming DocBook XML to OpenOffice XML
http://xml.openoffice.org/source/browse/xml/xmerge/java/org/openoffice/xmerge/converter/xml/xslt/docbook/docbooktosoff.xsl?only_with_tag=HEAD

Information about DocBook XSL stylesheets
http://docbook.org/wiki/moin.cgi/DocBookXslStylesheets
http://docbook.org/wiki/moin.cgi/DocBookXslStylesheetDocs

Download of DocBook XSL Stylesheets
http://sourceforge.net/project/showfiles.php?group_id=21935

List of tools for converting other formats to DocBook
http://docbook.org/wiki/moin.cgi/ConvertOtherFormatsToDocBook

Bugs and issues search result for “docbook”
http://www.openoffice.org/issues/buglist.cgi?issue_status=NEW&issue_status=STARTED&issue_status=REOPENED&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&issueidtype=include&issue_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&short_desc=docbook&short_desc_type=substring&long_desc=docbook&long_desc_type=substring&issue_file_loc=&issue_file_loc_type=substring&status_whiteboard=&status_whiteboard_type=substring&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=%27Importance%27

Java Runtime Environment (JRE) 1.4
http://java.sun.com/j2se/1.4.1/download.html

Crimson XML Parser
http://xml.apache.org/crimson/index.html

Xerces XML Parser
http://xml.apache.org/xerces2-j/index.html

Xalan XSLT Processor
http://xml.apache.org/xalan-j/index.html

Saxon XSLT Processor
http://saxon.sourceforge.net

Open Source Initiative (OSI)
http://www.opensource.org

Typo3 Content Management System
http://www.typo3.org

Writing Documentation for Typo3 with OpenOffice
http://typo3.org/doc.0.html?&tx_extrepmgm_pi1[extUid]=291&cHash=26debf4caa

Slashdot: The myth of separating presentation and content
http://slashdot.org/comments.pl?sid=29766&threshold=1&commentsort=0&tid=117&mode=thread&cid=3197352