documentation/trunk/tutorials/xml-source/tutorial_en.xml r32035.

The UnknownConverterPlugin builds on the idea of the UnknownPlugin, allowing you to work with document formats unknown to Greenstone. But it offers the additional advantage of being able to extract the text of the unknown document, subject to an important proviso: that you have a software tool installed on your machine, one that can be run readily from the commandline, which can convert the unknown document format into text or HTML (or a series of images). If the tool can convert the document to text or HTML, Greenstone can proceed as usual to index the content to make it searchable on previewing.

So in order to process the "superhero.djvu" document in our collection, such that its text content gets indexed for searching, we need to do a number of things: find out if there's a free djvu to text conversion tool out there, work out how to run it from the commandline and finally configure the UnknownConverterPlugin to run it for us, so Greenstone can take care of the rest.

We're in luck, because among the DjVu related tools that DjVuLibre provides is a tool called "djvutxt" that can perform the text extraction for us. DjVuLibre is available for Windows, Mac and Linux.

DjVuLibre provides binary installers for Windows and Mac. Grab the one for your operating system and install it somewhere sensible, somewhere you have permissions to install and run it from. Upon successful installation, you're given the option to launch DjVuLibre's DjView tool, which will open the DjVuLibre manual (in djvu format). In the left pane of DjView, you can see a listing of the various tools that make up DjVuLibre, and read up on them. You can also read about djvutxt or the other DjVu tools that DjVuLibre provides in their documentation page, but for this tutorial, we'll just be using their djvutxt tool.

As for Linux, some Linux machines may even come pre-installed with DjVuLibre. If not, you can use a package manager to install it for you, or compile it from source in the usual Unix manner.

The next step is to find out how to run DjVuLibre's djvutxt conversion tool from the commandline. Open a DOS prompt on Windows or a terminal on Mac/Linux and experiment to see what it takes to convert your Greenstone installation's web/sites/localsite/collect/DjVuColl/superhero.djvu file. You may have to invoke djvutxt using its full filepath, in which case the command would look like:

PATH/TO/YOUR/djvutxt /PATH/TO/GS/web/sites/localsite/collect/DjVuColl/superhero.djvu /PATH/TO/YOUR/GS/superhero.txt

Once you have the command working, inspect the output file. You should see mostly legible text in it. Only when you've been able to successfully complete this step should you proceed to the next steps.

If you have any spaces in any filepaths in your exec_cmd, make sure to always nest them in escaped double quotes (\"), so Greenstone can preserve the spaces in the filepath.
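The commandline check and the quoting advice for exec_cmd can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the function names try_djvutxt and quote_for_exec_cmd are hypothetical helpers invented here, not part of Greenstone or DjVuLibre, and the only thing assumed about djvutxt is the "input file, then output file" invocation shown in this tutorial.

```shell
#!/bin/sh
# Hypothetical helper (not part of Greenstone or DjVuLibre):
# run a djvutxt-style converter and check that it produced some output.
# Usage: try_djvutxt PATH/TO/YOUR/djvutxt input.djvu output.txt
try_djvutxt() {
    # Quote every argument so filepaths containing spaces survive intact.
    "$1" "$2" "$3" || return 1
    # Succeed only if the output file exists and is non-empty.
    [ -s "$3" ]
}

# Hypothetical helper: wrap a filepath in escaped double quotes (\")
# when it contains a space, following the tutorial's advice for exec_cmd.
quote_for_exec_cmd() {
    case "$1" in
        *" "*) printf '\\"%s\\"' "$1" ;;
        *)     printf '%s' "$1" ;;
    esac
}
```

For instance, calling quote_for_exec_cmd on a filepath such as /my docs/superhero.djvu yields the same path wrapped in \" on both sides, while a path without spaces is returned unchanged. A non-empty output file is only a first check: you still need to open it and confirm that the text is mostly legible.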