Version 0.5

  • Added tag list view and global tag delete support

  • Added tag editing view and listing of documents with a specific tag

  • Changed the staging file preview and delete views to require the DOCUMENT_CREATE permission

  • Added a no-parent-history class to document page links so that clicking inside the iframe doesn’t affect the parent window history

    • Fixes back button issue on Chrome 9 & 10

  • Added a per-app version display tag

  • Added loading spinner animation

  • Message tweaks and translation updates

  • Converter app cleanups, document pre-cache, magic number removal

  • Added OCR view displaying all active OCR tasks from all cluster nodes

  • Disabled CELERY_DISABLE_RATE_LIMITS by default

  • Implemented local task locking using Django’s locmem cache backend

  • Added the doc extension to the office document format list

  • Removed redundant transformation calculation

  • Made sure documents currently being processed by OCR cannot be deleted

  • PEP8, pylint cleanups and removal of relative imports

  • Removed the obsolete DOCUMENTS_GROUP_MAX_RESULTS setting option

  • Improved visual appearance of messages by displaying them outside the main form

  • Added link to close all notifications with one click

  • Made the queue processing interval configurable by means of a new setting: OCR_QUEUE_PROCESSING_INTERVAL

  • Added detection and reset of orphaned OCR documents left in the ‘processing’ state when Celery dies

  • Improved unknown format detection in the GraphicsMagick backend

  • Improved document conversion API

  • Added initial support for converting office documents (only ods and docx tested)

  • Added sample configuration files for supervisor and apache under contrib/

  • Avoid duplicates in recent document list

  • Added the configuration option CONVERTER_GM_SETTINGS to pass GraphicsMagick-specific commands to the GM backend

  • Lower image conversion quality if the format is jpg

  • Inverted the rotation button direction, which is more intuitive

  • Merged and reduced the document page zoom and rotation views

  • Increased the size of the permissions app’s permission label field

    • Database update required

  • Added support for metadata group actions

  • Reduced the document pages widget size

  • Display the metadata group numeric total in the metadata group form title

  • Reorganized page detail icons

  • Added first & last page navigation links to document page view

  • Added interactive zoom support to document page detail view

  • Spanish translation updates

  • Added DOCUMENTS_ZOOM_PERCENT_STEP, DOCUMENTS_ZOOM_MAX_LEVEL, DOCUMENTS_ZOOM_MIN_LEVEL configuration options to allow detailed zoom control

  • Added interactive document page view rotation support

  • Replaced the sidebar document grouping with a carousel-style document grouping form widget

  • Removed the obsolete DOCUMENTS_TRANFORMATION_PREVIEW_SIZE and DOCUMENTS_GROUP_SHOW_THUMBNAIL setting options

  • Improved double submit prevention

  • Added a direct rename field to the local update and staging upload forms

  • Separated document page detail view into document text and document image views

  • Added grab-scroll to document page view

  • Disabled submit buttons and all other buttons during a form submit

  • Updated the page preview widget to display an infinite-style horizontal carousel of page previews

  • Added support for user document folders

    • Run syncdb to add the new tables

  • Added support for listing the most recently accessed documents per user

  • Added document page navigation

  • Fixed diagnostics url resolution

  • Added a confirmation dialog to the document ‘find missing document file’ diagnostic

  • Added a document page edit view

  • Added support for the command line program pdftotext from the poppler-utils package to extract text from PDF documents without doing OCR

  • Fixed document description editing

  • Replaced page break text with page number when displaying document content

  • Implemented read-only detail form fields correctly; this fixes copy & paste issues with Firefox

  • New document page view

  • Added a view to add users to or remove users from a specific role

  • Updated the jQuery packages in the web_theme app to version 1.5.2

  • Made the AVAILABLE_INDEXING_FUNCTION setting part of the documents app instead of the filesystem_serving app

  • Fixed document download in Firefox for documents containing spaces in the filename

  • If MIME detection fails, set the MIME type to ‘’ instead of ‘unknown’

  • Use the document’s MIME type when downloading, falling back to ‘application/octet-stream’ if none is set

  • Changed the way document page count is parsed from the graphics backend, fixing issue #7

  • Optimized document metadata query and display

  • Implemented OCR output cleanups for English and Spanish

  • Redirect the user to the website entry point if they are already logged in and land on the login page

  • Changed file streaming from the SimpleUploadedFile class to the simpler File class wrapper

  • Updated staging files previews to use sendfile instead of serve_file

  • Moved staging file preview creation logic from documents.views to staging.py

  • When deleting a staging file, its cached preview is also deleted

  • Added a new setup option:

    • FILESYSTEM_INDEXING_AVAILABLE_FUNCTIONS - a dictionary to allow users to add custom functions

  • Made automatic OCR a function of the OCR app and not of Documents app (via signals)

    • Renamed setup option DOCUMENT_AUTOMATIC_OCR to OCR_AUTOMATIC_OCR

  • Clear node name when requeueing a document for OCR

  • Added support for editing the metadata of multiple documents at the same time

  • Added GraphicsMagick support by means of user-selectable graphics conversion backends

    • Some settings renamed to support this change:

      • CONVERTER_CONVERT_PATH is now CONVERTER_IM_CONVERT_PATH

      • CONVERTER_IDENTIFY_PATH is now CONVERTER_IM_IDENTIFY_PATH

    • Added options:

      • CONVERTER_GM_PATH - File path to the GraphicsMagick program.

      • CONVERTER_GRAPHICS_BACKEND - Backend to use: ImageMagick or GraphicsMagick

  • Raise ImportError and notify the user when specifying a non-existent converter graphics backend

  • Fixed issue #4, avoid circular import in permissions/__init__.py

  • Add a user to a default role only when the user is created

  • Added total page count to statistics view

  • Added support to disable the default scrolling JS code included in web_theme app, saving some KBs in transfer

  • Clear the last OCR results when requeueing a document

  • Removed the ‘exists’ column in the document list view; diagnostics superseded this

  • Added the 3rd party sendfile app (supports Apache’s X-Sendfile)

  • Updated the get_document_image view to use the new sendfile app

  • Fixed the issue of the strip spaces middleware conflicting with downloads

  • Removed custom IE9 tags

  • Closed Issue #6

  • Allow deletion of non-existent documents from the OCR queue

  • Allow OCR requeue of pending documents

  • Invalid page numbers now raise Http404 (not found) instead of an error

  • Added an additional check to lower the chance of OCR race conditions between nodes

  • Introduce a random delay to each node to further reduce the chance of a race condition, until row locking can be implemented or is implemented by Django

  • Moved navigation code to its own app

  • Reimplemented the OCR delay code so that only new documents are delayed. Added a new field, delay; update your database schema accordingly

  • Made the concurrent OCR code more granular, per node: every node can handle a different number of concurrent OCR tasks. Added a new field, node_name; update your database schema accordingly

  • Reduced the default OCR delay time

  • Added a new diagnostics tab under the tools menu

  • Added a new option, OCR_REPLICATION_DELAY, to give the storage time to replicate before attempting OCR on a document

  • Added OCR multi document re-queue and delete support

  • Added simple statistics page (total used storage, total docs, etc)

  • Implemented form based and button based multi item actions (button based by default)

  • Added multi document delete

  • Fixed a few HTML validation errors

  • Issues are now tracked using GitHub

  • Added indexing flags to the OCR model

  • Small optimization in document list view

  • Small search optimization

  • Display “DEBUG mode” string in title if DEBUG variable is set to True

  • Added the fix-permissions bash script under misc/ folder

  • Plugged another file descriptor leak

  • Show class name in config settings view

  • Added missing config option from the setup menu

  • Close file descriptor to avoid leaks

  • Don’t allow duplicate documents in queues

  • Don’t raise PermissionDenied exception in PermissionDenied middleware, even while debugging

  • Fixed page number detection

  • Created a ‘simple document’ for non-technical users containing the content of all of a document’s pages

  • Use the document preview code for staging files as well

  • Error picture literal name removal

  • Spanish translation updates

  • Show the document file path relative to its storage

  • Added new setting: side bar search box

  • Implemented a new PermissionDenied exception middleware handler

  • Permissions app api now returns a PermissionDenied exception instead of a custom one

  • Added new 403 error template

  • Updated the 404 template to display only a not found message

  • Moved the login required middleware to the common app

  • Fixed search app’s model.objects.filter indentation, improved result count calculation

  • Added dynamic comparison types to search app

  • Separated search code from view code

  • Correctly calculate the displayed result count for multi-model searches

  • Fixed OCR queue list showing wrong thumbnail

  • Fixed staging file preview

  • Show current metadata in document upload view sidebar

  • Show sentry login for admin users

  • Do not reinitialize document queue and/or queued document on reentry

  • Try extra hard not to assign same uuid to two documents

  • Added new transformation preview size setting

  • Renamed document queue state links

  • Changed the OCR status display sidebar from form based to text based

  • Added document action to clear all the document’s page transformations

  • Allow search across related fields

  • Optimized search for speed and memory footprint

  • Added LIMIT setting to search

  • Show search elapsed time on result page

  • Converter now differentiates between unknown file format and convert errors

  • Close file descriptors when executing external programs to prevent/reduce file descriptor leaks

  • Improved exception handling of external programs

  • Show the document thumbnail in the document OCR queue list

  • Made the OCR document ‘date submitted’ column non-breaking

  • Fix permissions, directories set to mode 755 and files to mode 644

  • Try to fix issue #2, “random ORM field error on search while doing OCR”

  • Added configurable location setting for file based storage

  • Prepend storage name to differentiate config options

  • Fixed duplicated document search

  • Optimized document duplicate search

  • Added locale middleware, menu bar language switching works now

  • Only show the language selection list if LocaleMiddleware is active

  • Spanish translation updates

  • Added links, views and permissions to disable or enable an OCR queue

  • Enabled Django’s template caching

  • Added document queue property side bar window to the document queue list view

  • Added HTML spaceless middleware to remove whitespace in HTML code

  • If current user is superuser or staff show thumbnail & preview generation error messages

  • Added a setting to show document thumbnail in metadata group list

  • Started adding configuration setting descriptions

  • Initial GridFS storage support

  • Implemented size and delete methods for GridFS

  • Implement GridFS storage user settings

  • Added document link in the OCR document queue list

  • Added a link to manually re-queue failed OCR documents

  • Don’t separate links (enclose object list links with white-space: nowrap;)

  • Added document description to the field search list

  • Sort OCR queued documents according to submitted date & time

  • Document filesystem serving is now a separate app

    • Steps to update (Some warnings may be returned, but these are not fatal as they might be related to missing metadata in some documents):

      • Rename the following settings:

        • DOCUMENTS_FILESYSTEM_FILESERVING_ENABLE to FILESYSTEM_FILESERVING_ENABLE

        • DOCUMENTS_FILESYSTEM_FILESERVING_PATH to FILESYSTEM_FILESERVING_PATH

        • DOCUMENTS_FILESYSTEM_SLUGIFY_PATHS to FILESYSTEM_SLUGIFY_PATHS

        • DOCUMENTS_FILESYSTEM_MAX_RENAME_COUNT to FILESYSTEM_MAX_RENAME_COUNT

      • Do a ./manage.py syncdb

      • Execute ‘Recreate index links’ located in the tools menu

      • Wait a few minutes

  • Added per-document duplicate search and a tools menu option to search for all duplicate documents

  • Added document tool that deletes and re-creates all documents filesystem links

  • Increased the size of the document and document metadata index filename fields to 255 characters

  • Added sentry to monitor and store errors for later debugging

  • Zip files can now be uncompressed in memory and their content uploaded individually in one step

  • Added support for concurrent, queued OCR processing using celery

  • Apply default transformations to document before OCR

  • Added unpaper to the OCR conversion pipe

  • Added views to create, edit and grant/revoke permissions to roles

  • Added multipage documents support (only tested on pdfs)

    • To update an existing database, run: [d.update_page_count() for d in Document.objects.all()]

  • Added support for document page transformation (no GUI yet)

  • Added permissions and roles support

  • Added python-magic for smarter MIME type detection (https://github.com/ahupp/python-magic).

  • Added a new Document model field: file_mime_encoding.

  • Show only document metadata in document list view.

  • If only one document type exists, the create document wizard skips the first step.

  • Changed to a liquid css grid

  • Added the ability to group documents by their metadata

  • New abstracted options to adjust document conversion quality (default, low, high)
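Several settings were renamed in this release (the DOCUMENTS_FILESYSTEM_* options moving to FILESYSTEM_*, DOCUMENT_AUTOMATIC_OCR becoming OCR_AUTOMATIC_OCR, and the converter paths gaining an IM_ prefix). The following is a minimal sketch for applying those renames to a set of local overrides. It is not part of the project: it assumes your overrides are available as a plain dictionary, and the names SETTING_RENAMES and migrate_settings are hypothetical.

```python
# Hypothetical helper, not part of the project.
# Old setting name -> new setting name, as listed in the 0.5 notes above.
SETTING_RENAMES = {
    "DOCUMENTS_FILESYSTEM_FILESERVING_ENABLE": "FILESYSTEM_FILESERVING_ENABLE",
    "DOCUMENTS_FILESYSTEM_FILESERVING_PATH": "FILESYSTEM_FILESERVING_PATH",
    "DOCUMENTS_FILESYSTEM_SLUGIFY_PATHS": "FILESYSTEM_SLUGIFY_PATHS",
    "DOCUMENTS_FILESYSTEM_MAX_RENAME_COUNT": "FILESYSTEM_MAX_RENAME_COUNT",
    "DOCUMENT_AUTOMATIC_OCR": "OCR_AUTOMATIC_OCR",
    "CONVERTER_CONVERT_PATH": "CONVERTER_IM_CONVERT_PATH",
    "CONVERTER_IDENTIFY_PATH": "CONVERTER_IM_IDENTIFY_PATH",
}


def migrate_settings(old_settings):
    """Return (new_settings, renamed), where renamed lists (old, new) pairs.

    Names that are not being renamed are copied unchanged. If both the old
    and the new name are present, the value already set under the new name wins.
    """
    new_settings = {}
    renamed = []
    # First copy every setting that is not an old, renamed name.
    for name, value in old_settings.items():
        if name not in SETTING_RENAMES:
            new_settings[name] = value
    # Then carry old values over to their new names, without overwriting.
    for old, new in SETTING_RENAMES.items():
        if old in old_settings:
            renamed.append((old, new))
            new_settings.setdefault(new, old_settings[old])
    return new_settings, renamed
```

For example, migrate_settings({"DOCUMENT_AUTOMATIC_OCR": True, "DEBUG": False}) keeps DEBUG as-is and moves the OCR flag to OCR_AUTOMATIC_OCR, while the returned renamed list can be printed as a reminder of which entries to update by hand. Database changes from this release (syncdb, the new delay and node_name fields) still need to be applied separately.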