Version 0.12.1
May 2012
This is the first maintenance release of the 0.12 series.
Overview
While bug fixes and minor features were the focus of this release, some bigger changes were included because of their importance. Document parsing was completely rewritten: it is now class based and allows more than one parser per MIME type, with sequential fallback. This provides the best text extraction on deployments where users control the installation, and basic extraction when deploying to the cloud or other environments where users cannot install OS-level binaries.
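The idea of several parsers per MIME type with sequential fallback can be pictured with the minimal Python sketch below. All names in it (Parser, ParserError, register_parser, parse_document, BinaryParser, PurePythonParser) are hypothetical illustrations, not the actual Mayan EDMS classes. The sketch only shows the pattern: the parsers registered for a MIME type are tried in order until one succeeds.

```python
from collections import defaultdict


class ParserError(Exception):
    """Raised by a parser that cannot extract text from a file (hypothetical)."""


class Parser:
    """Hypothetical base class; subclasses implement parse()."""

    def parse(self, path):
        raise NotImplementedError


# Maps a MIME type to an ordered list of parser instances.
_registry = defaultdict(list)


def register_parser(mimetypes, parser):
    """Append a parser for each MIME type; earlier parsers are tried first."""
    for mimetype in mimetypes:
        _registry[mimetype].append(parser)


def parse_document(path, mimetype):
    """Try each registered parser in order and return the first result."""
    errors = []
    for parser in _registry.get(mimetype, []):
        try:
            return parser.parse(path)
        except ParserError as exc:
            # Remember why this parser failed, then fall back to the next one.
            errors.append(exc)
    raise ParserError(f"No parser succeeded for {mimetype}: {errors}")


class BinaryParser(Parser):
    """Stands in for a parser that needs an OS-level binary that is missing."""

    def parse(self, path):
        raise ParserError("binary not installed")


class PurePythonParser(Parser):
    """Stands in for a basic parser that works in any environment."""

    def parse(self, path):
        return "basic text"


# The better parser is registered first; the basic one is the fallback.
register_parser(["application/pdf"], BinaryParser())
register_parser(["application/pdf"], PurePythonParser())
print(parse_document("example.pdf", "application/pdf"))  # prints: basic text
```

Keeping the parsers in an ordered list per MIME type means a deployment with the OS binaries installed gets the better result. A restricted environment still gets the basic one without any configuration change.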
Changes
Fabric file (fabfile)
A Fabric file is included to help users who are not very familiar with Ubuntu, Python and Django install Mayan EDMS. It also helps system administrators automate the installation on local or remote systems. At the moment the fabfile installs Mayan EDMS in the same configurations listed in this documentation, that is: (Ubuntu/Debian/Fedora) + virtualenv + Apache + MySQL. Feel free to submit your configuration settings and files for other databases, web servers or Linux distributions. More configurations will be added to the fabfile as they are tested.
Documentation update
The installation instructions were updated to include the installation of the libpng-dev and libjpeg-dev libraries as well as the installation of the poppler-utils package. An additional step to help users test their new installation of Mayan EDMS was also added.
Translations
The Italian translation has been synchronized with the source files at Transifex and brought to 100% completion.
Usability improvements
The index instance view now features the same multi-document action buttons (Submit to OCR, delete, download, etc.) as the main and recent document views.
Better office document conversion
A new method of converting office documents has been implemented; it doesn't require the command line utility UNOCONV. If this new method proves to work better than previous solutions, the use of UNOCONV may be deprecated in the future. The conversion method adds just one new configuration option: CONVERTER_LIBREOFFICE_PATH, which defaults to /usr/bin/libreoffice.
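These notes do not say how the new method invokes LibreOffice. As an assumption, the sketch below shows one common way to convert an office document to PDF. It runs the binary named by CONVERTER_LIBREOFFICE_PATH in headless mode with LibreOffice's --convert-to and --outdir options. The helper names build_convert_command and convert_to_pdf are hypothetical, and this is not necessarily what Mayan EDMS does internally.

```python
import os
import subprocess

# Mirrors the documented default of the CONVERTER_LIBREOFFICE_PATH setting.
CONVERTER_LIBREOFFICE_PATH = "/usr/bin/libreoffice"


def build_convert_command(input_path, output_dir,
                          libreoffice_path=CONVERTER_LIBREOFFICE_PATH):
    """Return the argument list for a headless conversion to PDF."""
    return [
        libreoffice_path,
        "--headless",            # run without a user interface
        "--convert-to", "pdf",   # target format
        "--outdir", output_dir,  # directory that receives the converted file
        input_path,
    ]


def convert_to_pdf(input_path, output_dir,
                   libreoffice_path=CONVERTER_LIBREOFFICE_PATH):
    """Convert input_path to PDF and return the expected output file path."""
    subprocess.run(
        build_convert_command(input_path, output_dir, libreoffice_path),
        check=True,  # raise CalledProcessError on a non-zero exit status
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # LibreOffice keeps the base name and swaps the extension for .pdf.
    base = os.path.splitext(os.path.basename(input_path))[0]
    return os.path.join(output_dir, base + ".pdf")
```

For example, convert_to_pdf("report.odt", "/tmp") would write /tmp/report.pdf if LibreOffice is installed at the configured path.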
Better PDF text parsing
Brian E. submitted a patch that uses the pdftotext utility from the Poppler package to extract text from PDF files. This is now the default method Mayan EDMS uses to extract text from a PDF; if it fails, Mayan EDMS falls back to the previous method. This change adds a new configuration option, OCR_PDFTOTEXT_PATH, which specifies the location of the pdftotext executable and defaults to /usr/bin/pdftotext. Be sure to install the poppler-utils OS package to take advantage of this new parser.
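The "try pdftotext first, then fall back" behaviour can be sketched in Python as follows. This is an illustration, not the code from the patch. The helper names are hypothetical, and the fallback argument stands in for "the previous method", which these notes do not name. Passing "-" as the output file makes pdftotext write the extracted text to standard output.

```python
import subprocess

# Mirrors the documented default of the OCR_PDFTOTEXT_PATH setting.
OCR_PDFTOTEXT_PATH = "/usr/bin/pdftotext"


def pdftotext_extract(pdf_path, pdftotext_path=OCR_PDFTOTEXT_PATH):
    """Run pdftotext; the "-" output argument sends the text to stdout."""
    result = subprocess.run(
        [pdftotext_path, pdf_path, "-"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    # Decode as UTF-8, replacing any bytes that cannot be decoded.
    return result.stdout.decode("utf-8", errors="replace")


def extract_pdf_text(pdf_path, fallback, pdftotext_path=OCR_PDFTOTEXT_PATH):
    """Prefer pdftotext; call fallback(pdf_path) if it is missing or fails."""
    try:
        return pdftotext_extract(pdf_path, pdftotext_path)
    except (OSError, subprocess.CalledProcessError):
        # OSError covers a missing or non-executable binary;
        # CalledProcessError covers pdftotext exiting with an error.
        return fallback(pdf_path)
```

When poppler-utils is not installed, the OSError branch is taken, so extraction still works through the fallback, only with lower quality.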
Changed defaults
The OCR queue is now active by default when it is first created during the syncdb phase, and the OCR_AUTOMATIC_OCR option now defaults to True. These two changes reduce the steps new users must take to benefit from automatic text extraction of uploaded documents without having to read the documentation first. They also provide a more functional default install.
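Installations that prefer the previous behaviour can override the new default in their settings. The fragment below assumes a local settings module, named settings_local.py here, that the project settings import; adjust the file name to match your deployment. OCR_AUTOMATIC_OCR is the option named above. The two path values simply repeat the documented defaults of the options added in this release.

```python
# settings_local.py (assumed file name; it must be imported by your settings)

# Turn automatic OCR of newly uploaded documents off again;
# starting with this release it defaults to True.
OCR_AUTOMATIC_OCR = False

# Options added in this release, shown with their documented defaults.
CONVERTER_LIBREOFFICE_PATH = "/usr/bin/libreoffice"
OCR_PDFTOTEXT_PATH = "/usr/bin/pdftotext"
```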
Upgrading from a previous version
Start by installing the new requirements:
$ pip install -r requirements/production.txt
Migrate existing database schema with:
$ ./manage.py migrate documents
Install the poppler-utils package:
Ubuntu, Debian:
$ apt-get install -y poppler-utils
Fedora:
$ yum install -y poppler-utils
The upgrade procedure is now complete.
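To confirm that the executables referenced by the new options are present after upgrading, a short Python check can help. The helper below is hypothetical and not part of Mayan EDMS. It uses the documented default paths and only verifies that each path exists and is executable; it does not run the programs.

```python
import os

# Documented defaults; replace them with the values from your settings.
REQUIRED_BINARIES = {
    "OCR_PDFTOTEXT_PATH": "/usr/bin/pdftotext",
    "CONVERTER_LIBREOFFICE_PATH": "/usr/bin/libreoffice",
}


def check_binaries(binaries=REQUIRED_BINARIES):
    """Return {setting name: True/False} for each configured executable."""
    return {
        # isfile() follows symlinks, so a linked /usr/bin entry counts.
        name: os.path.isfile(path) and os.access(path, os.X_OK)
        for name, path in binaries.items()
    }


if __name__ == "__main__":
    for name, ok in check_binaries().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```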
Backward incompatible changes
None
Bugs fixed
GitHub issue #25 “Office document conversion error”
Removals
None