Version 0.5
Added tag list view and global tag delete support.
Added a tag editing view and a view listing the documents with a specific tag.
Changed the staging file preview and delete views to require the DOCUMENT_CREATE permission.
Added the no-parent-history class to document page links so that clicking inside the iframe doesn’t affect the parent window history.
Fixes back button issue on Chrome 9 & 10.
Added per app version display tag.
Added loading spinner animation.
Messages tweaks and translation updates.
Converter app cleanups, document pre-cache, magic number removal.
Added OCR view displaying all active OCR tasks from all cluster nodes.
Disabled CELERY_DISABLE_RATE_LIMITS by default.
Implement local task locking using Django locmem cache backend.
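A cache can act as a lock because Django’s cache API offers `cache.add(key, value, timeout)`, which stores the key only if it is absent and returns whether it did. The sketch below imitates that add-if-absent behaviour with a hypothetical dict-backed stand-in (`LocMemLikeCache`) so it runs without Django; the helper `run_locked`, the lock key and the timeout are illustrative assumptions, not Mayan’s actual code.

```python
import threading
import time


class LocMemLikeCache:
    """Hypothetical stand-in for a local-memory cache with add-if-absent semantics."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def add(self, key, value, timeout):
        # Store the key only if it is missing or expired; report whether it was stored.
        now = time.monotonic()
        with self._mutex:
            entry = self._data.get(key)
            if entry is not None and entry[1] > now:
                return False
            self._data[key] = (value, now + timeout)
            return True

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)


def run_locked(cache, lock_key, task, timeout=60):
    """Run task() only if nobody else currently holds lock_key (hypothetical helper)."""
    if not cache.add(lock_key, "locked", timeout):
        return None  # lock already held: skip this run instead of overlapping
    try:
        return task()
    finally:
        cache.delete(lock_key)  # always release, even if task() raises
```

Since a locmem cache lives inside one process, a lock built this way only prevents overlap locally; it does not coordinate separate cluster nodes.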
Added doc extension to office document format list.
Removed redundant transformation calculation.
Made sure documents that are being processed by OCR cannot be deleted.
PEP8, pylint cleanups and removal of relative imports.
Removed the obsolete DOCUMENTS_GROUP_MAX_RESULTS setting option.
Improved visual appearance of messages by displaying them outside the main form.
Added link to close all notifications with one click.
Made the queue processing interval configurable by means of a new setting: OCR_QUEUE_PROCESSING_INTERVAL.
Added detection and reset of orphaned OCR documents left in the ‘processing’ state when Celery dies.
Improved unknown format detection in the graphicsmagick backend.
Improved document conversion API.
Added initial support for converting office documents (only ods and docx tested).
Added sample configuration files for Supervisor and Apache under contrib/.
Avoid duplicates in recent document list.
Added the configuration option CONVERTER_GM_SETTINGS to pass GraphicsMagick-specific commands to the GM backend.
Lower image conversion quality if the format is JPG.
Inverted the rotation button, more intuitive this way.
Merged and reduced the document page zoom and rotation views.
Increased permissions app permission’s label field size.
DB Update required.
Added support for metadata group actions.
Reduced the document pages widget size.
Display the metadata group numeric total in the metadata group form title.
Reorganized page detail icons.
Added first & last page navigation links to document page view.
Added interactive zoom support to document page detail view.
Spanish translation updates.
Added DOCUMENTS_ZOOM_PERCENT_STEP, DOCUMENTS_ZOOM_MAX_LEVEL and DOCUMENTS_ZOOM_MIN_LEVEL configuration options to allow detailed zoom control.
Added interactive document page view rotation support.
Replaced the sidebar document grouping with a carousel-style document grouping form widget.
Removed the obsolete DOCUMENTS_TRANFORMATION_PREVIEW_SIZE and DOCUMENTS_GROUP_SHOW_THUMBNAIL setting options.
Improved double submit prevention.
Added a direct rename field to the local update and staging upload forms.
Separated document page detail view into document text and document image views.
Added grab-scroll to document page view.
Disabled submit buttons and all other buttons during a form submit.
Updated the page preview widget to display an infinite-style horizontal carousel of page previews.
Added support for user document folders.
Must do a syncdb to add the new tables.
Added support for listing the most recent accessed documents per user
Added document page navigation.
Fixed diagnostics URL resolution.
Added confirmation dialog to document’s find missing document file diagnostic.
Added a document page edit view.
Added support for the command line program pdftotext from the poppler-utils packages to extract text from PDF documents without doing OCR.
Fixed document description editing.
Replaced page break text with page number when displaying document content.
Implemented detail form readonly fields the correct way, this fixes copy & paste issues with Firefox.
New document page view.
Added a view to add users to or remove users from a specific role.
Updated the jQuery packages bundled with the web_theme app to version 1.5.2.
Made the AVAILABLE_INDEXING_FUNCTION setting a setting of the documents app instead of the filesystem_serving app.
Fixed document download in Firefox for documents containing spaces in the filename.
If mime detection fails set mime type to ‘’ instead of ‘unknown’.
Use the document’s MIME type when downloading, or ‘application/octet-stream’ if it has none.
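The two rules above (store ‘’ when detection fails, fall back to a generic binary type on download) fit in two small functions. This is a minimal sketch: the function names are hypothetical, and the stdlib `mimetypes` module guesses from the file extension, standing in for the content-based python-magic detection mentioned later in these notes.

```python
import mimetypes


def detect_mime_type(filename):
    """Guess a MIME type; return '' (not 'unknown') when detection fails.

    Extension-based stand-in for real content sniffing.
    """
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or ""


def download_content_type(mime_type):
    """Content-Type for a download: the stored MIME type, else a generic binary type."""
    return mime_type or "application/octet-stream"
```

Storing ‘’ rather than a sentinel string such as ‘unknown’ lets a single falsy check drive the fallback.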
Changed the way document page count is parsed from the graphics backend, fixing issue #7.
Optimized document metadata query and display.
Implemented OCR output cleanups for English and Spanish.
Redirect the user to the website entry point if already logged in and landing on the login template.
Changed from using SimpleUploadedFile class to stream file to the simpler File class wrapper.
Updated staging files previews to use sendfile instead of serve_file.
Moved staging file preview creation logic from documents.views to staging.py.
When deleting a staging file, its cached preview is also deleted.
Added a new setup option: FILESYSTEM_INDEXING_AVAILABLE_FUNCTIONS - a dictionary to allow users to add custom functions.
Made automatic OCR a function of the OCR app and not of Documents app (via signals).
Renamed setup option DOCUMENT_AUTOMATIC_OCR to OCR_AUTOMATIC_OCR.
Clear node name when requeueing a document for OCR.
Added support for editing the metadata of multiple documents at the same time.
Added GraphicsMagick support by means of user-selectable graphic conversion backends.
Some settings were renamed to support this change:
CONVERTER_CONVERT_PATH is now CONVERTER_IM_CONVERT_PATH
CONVERTER_IDENTIFY_PATH is now CONVERTER_IM_IDENTIFY_PATH
Added options:
CONVERTER_GM_PATH - File path to GraphicsMagick’s program.
CONVERTER_GRAPHICS_BACKEND - Backend to use: ImageMagick or GraphicsMagick
Raise ImportError and notify the user when specifying a nonexistent converter graphics backend.
Fixed issue #4, avoid circular import in permissions/__init__.py.
Add a user to a default role only when the user is created.
Added total page count to statistics view.
Added support to disable the default scrolling JS code included in web_theme app, saving some KBs in transfer.
Clear last OCR results when requeueing a document.
Removed the ‘exists’ column in document list view, diagnostics superseded this.
Added 3rd party sendfile app (supports Apache’s X-Sendfile).
Updated the get_document_image view to use the new sendfile app.
Fixed the issue of the strip spaces middleware conflicting with downloads.
Removed custom IE9 tags.
Closed Issue #6.
Allow deletion of nonexistent documents from the OCR queue.
Allow OCR requeue of pending documents.
Invalid page numbers now raise Http404 (not found) instead of an error.
Added an additional check to lower the chance of OCR race conditions between nodes.
Introduce a random delay to each node to further reduce the chance of a race condition, until row locking can be implemented or is implemented by Django.
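The random delay idea can be sketched as: sleep for a random interval, then re-check the item’s state before claiming it, so two nodes waking at once are unlikely to grab the same document. This is a hypothetical illustration over plain dicts (the `state` and `node_name` keys are assumptions); it lowers the odds of a race but, unlike row locking, cannot rule one out.

```python
import random
import time


def claim_next_document(queue, node_name, max_delay=1.0, sleep=time.sleep):
    """Hypothetical helper: wait a random delay, re-check, then claim a pending item."""
    # Spread nodes out in time so they rarely inspect the queue at the same instant.
    sleep(random.uniform(0, max_delay))
    for item in queue:
        # Re-check after the delay: another node may have claimed this item meanwhile.
        if item["state"] == "pending":
            item["state"] = "processing"
            item["node_name"] = node_name
            return item
    return None
```

The `sleep` parameter is injectable so the delay can be skipped in tests.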
Moved navigation code to its own app.
Re-implemented the OCR delay code; only new documents are delayed. Added a new field, delay; update your database schema accordingly.
Made the concurrent OCR code more granular, per node; every node can handle a different number of concurrent OCR tasks. Added a new field, node_name; update your database schema accordingly.
Reduced default OCR delay time.
Added a new diagnostics tab under the tools menu.
Added a new option OCR_REPLICATION_DELAY to give the storage some time for replication before attempting to OCR a document.
Added OCR multi document re-queue and delete support.
Added simple statistics page (total used storage, total docs, etc).
Implemented form based and button based multi item actions (button based by default).
Added multi document delete.
Fixed a few HTML validation errors.
Issues are now tracked using GitHub.
Added indexing flags to OCR model.
Small optimization in document list view.
Small search optimization.
Display “DEBUG mode” string in title if DEBUG variable is set to True.
Added the fix-permissions bash script under the misc/ folder.
Plugged another file descriptor leak.
Show class name in config settings view.
Added missing config option from the setup menu.
Close file descriptor to avoid leaks.
Don’t allow duplicate documents in queues.
Don’t raise PermissionDenied exception in PermissionDenied middleware, even while debugging.
Fixed page number detection.
Created ‘simple document’ for non-technical users, showing the content of all of a document’s pages.
Use document preview code for staging file also.
Error picture literal name removal.
Spanish translation updates.
Show document file path in relation to its storage.
Added new setting: side bar search box.
Implemented new PermissionDenied exception middleware handler.
Permissions app API now returns a PermissionDenied exception instead of a custom one.
Added new 403 error template.
Updated the 404 template to display only a not found message.
Moved the login required middleware to the common app.
Fixed search app’s model.objects.filter indentation, improved result count calculation.
Added dynamic comparison types to search app.
Separated search code from view code.
Correctly calculate the displayed result count for multi-model searches.
Fixed OCR queue list showing wrong thumbnail.
Fixed staging file preview.
Show current metadata in document upload view sidebar.
Show Sentry login for admin users.
Do not reinitialize document queue and/or queued document on reentry.
Try extra hard not to assign same UUID to two documents.
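One way to “try extra hard” is to check each freshly generated UUID against those already in use and retry on a collision. The helper below is a hypothetical sketch, not Mayan’s code; `existing` stands in for the set of UUIDs already stored, and `factory` is injectable so the retry path can be exercised.

```python
import uuid


def new_unique_uuid(existing, factory=uuid.uuid4, max_attempts=10):
    """Hypothetical helper: return a UUID string not already present in `existing`."""
    for _ in range(max_attempts):
        candidate = str(factory())
        if candidate not in existing:
            return candidate
    # Repeated collisions with uuid4 would point to a broken random source.
    raise RuntimeError("could not generate an unused UUID")
```

With random (version 4) UUIDs a collision is astronomically unlikely, so the loop is a cheap safety net; a database unique constraint remains the stronger guarantee.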
Added new transformation preview size setting.
Renamed document queue state links.
Changed OCR status display sidebar from form based to text based.
Added document action to clear all the document’s page transformations.
Allow search across related fields.
Optimized search for speed and memory footprint.
Added LIMIT setting to search.
Show search elapsed time on result page.
Converter now differentiates between unknown file format and convert errors.
Close file descriptors when executing external programs to prevent/reduce file descriptor leaks.
Improved exception handling of external programs.
Show document thumbnail in document OCR queue list.
Make OCR document date submitted column non breakable.
Fix permissions, directories set to mode 755 and files to mode 644.
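The 755/644 rule can be sketched in Python as a walk over a tree that sets directory and file modes. This is an illustrative equivalent under the assumption of a POSIX filesystem; the fix-permissions script mentioned in these notes is a bash script and is not reproduced here, and the function name is hypothetical.

```python
import os


def fix_permissions(root):
    """Set directories to mode 755 and files to mode 644 under `root` (POSIX)."""
    os.chmod(root, 0o755)
    for dirpath, dirnames, filenames in os.walk(root):
        # Top-down walk: fix subdirectories before os.walk descends into them.
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                os.chmod(path, 0o755)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # leave symlink targets alone
                os.chmod(path, 0o644)
```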
Try to fix issue #2, “random ORM field error on search while doing OCR”.
Added configurable location setting for file based storage.
Prepend storage name to differentiate config options.
Fixed duplicated document search.
Optimized document duplicate search.
Added locale middleware, menu bar language switching works now.
Only show language selection list if LocaleMiddleware is active.
Spanish translation updates.
Added links, views and permissions to disable or enable an OCR queue.
Enabled Django’s template caching.
Added document queue property side bar window to the document queue list view.
Added HTML spaceless middleware to remove whitespace in HTML code.
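A spaceless filter removes whitespace that sits only between tags, which is what Django’s `strip_spaces_between_tags` utility does with the pattern `>\s+<`. The sketch below reimplements that pattern and, as an assumption about how to avoid the download conflict noted in an earlier entry, only rewrites responses whose content type is `text/html`. `spaceless_body` is a hypothetical helper, not the app’s middleware.

```python
import re

_BETWEEN_TAGS = re.compile(r">\s+<")


def strip_spaces_between_tags(html):
    """Remove whitespace that sits only between a closing '>' and the next '<'."""
    return _BETWEEN_TAGS.sub("><", html)


def spaceless_body(content_type, body):
    """Hypothetical middleware core: rewrite HTML bodies, pass everything else through."""
    # Ignore parameters such as '; charset=utf-8' when comparing the media type.
    if content_type.split(";")[0].strip() == "text/html":
        return strip_spaces_between_tags(body)
    return body  # e.g. file downloads: never touch binary content
```

Whitespace inside text nodes (`<p> x </p>`) is preserved, since the pattern needs nothing but whitespace between the two angle brackets.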
If current user is superuser or staff show thumbnail & preview generation error messages.
Added a setting to show document thumbnail in metadata group list.
Started adding configuration setting descriptions.
Initial GridFS storage support.
Implemented size and delete methods for GridFS.
Implement GridFS storage user settings.
Added document link in the OCR document queue list.
Link to manually re-queue failed OCR.
Don’t separate links (enclose object list links with white-space: nowrap;).
Added document description to the field search list.
Sort OCR queued documents according to submitted date & time.
Document filesystem serving is now a separate app.
Steps to update (Some warnings may be returned, but these are not fatal as they might be related to missing metadata in some documents):
Rename the following settings:
DOCUMENTS_FILESYSTEM_FILESERVING_ENABLE to FILESYSTEM_FILESERVING_ENABLE
DOCUMENTS_FILESYSTEM_FILESERVING_PATH to FILESYSTEM_FILESERVING_PATH
DOCUMENTS_FILESYSTEM_SLUGIFY_PATHS to FILESYSTEM_SLUGIFY_PATHS
DOCUMENTS_FILESYSTEM_MAX_RENAME_COUNT to FILESYSTEM_MAX_RENAME_COUNT
Do a ./manage.py syncdb.
Execute ‘Recreate index links’ located in the tools menu.
Wait a few minutes.
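The renames listed in these notes can be applied mechanically. The sketch below is a hypothetical helper (Mayan does not ship it) that operates on a plain dict of setting names to values, for example values copied from a local settings file. It refuses to guess when both the old and the new name are present.

```python
# Old -> new setting names, as listed in these release notes.
RENAMED_SETTINGS = {
    "DOCUMENTS_FILESYSTEM_FILESERVING_ENABLE": "FILESYSTEM_FILESERVING_ENABLE",
    "DOCUMENTS_FILESYSTEM_FILESERVING_PATH": "FILESYSTEM_FILESERVING_PATH",
    "DOCUMENTS_FILESYSTEM_SLUGIFY_PATHS": "FILESYSTEM_SLUGIFY_PATHS",
    "DOCUMENTS_FILESYSTEM_MAX_RENAME_COUNT": "FILESYSTEM_MAX_RENAME_COUNT",
    "DOCUMENT_AUTOMATIC_OCR": "OCR_AUTOMATIC_OCR",
    "CONVERTER_CONVERT_PATH": "CONVERTER_IM_CONVERT_PATH",
    "CONVERTER_IDENTIFY_PATH": "CONVERTER_IM_IDENTIFY_PATH",
}


def migrate_settings(old):
    """Return a copy of `old` with renamed settings moved to their new names."""
    new = dict(old)
    for old_name, new_name in RENAMED_SETTINGS.items():
        if old_name in new:
            if new_name in new:
                # Both spellings set: surface the conflict instead of picking one.
                raise ValueError(f"both {old_name} and {new_name} are set")
            new[new_name] = new.pop(old_name)
    return new
```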
Added per document duplicate search and a tools menu option to search all duplicated documents.
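Assuming “duplicate” means byte-identical content, a search over all documents can group them by a content checksum in one pass instead of comparing every pair. The sketch below uses SHA-256 from `hashlib`; the function name and the in-memory `{name: bytes}` input are illustrative assumptions, not the app’s actual model fields.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group document names by the SHA-256 digest of their content.

    `files` maps a document name to its bytes; only groups with more than
    one member are returned.
    """
    groups = defaultdict(list)
    for name, content in files.items():
        groups[hashlib.sha256(content).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Storing each document’s checksum once would let the per-document search become a single equality lookup.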
Added document tool that deletes and re-creates all documents filesystem links.
Increased document’s and document metadata index filename field’s size to 255 characters.
Added Sentry to monitor and store errors for later debugging.
Zip files can now be uncompressed in memory and their content uploaded individually in one step.
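Uncompressing in memory means the uploaded bytes never need to be written to a temporary file: wrapping them in `io.BytesIO` gives `zipfile.ZipFile` the file-like object it expects. The generator below is a minimal sketch (its name is hypothetical) that yields each member so the caller could create one document per file.

```python
import io
import zipfile


def iter_zip_members(uploaded_bytes):
    """Yield (filename, content) for each file in a zip held entirely in memory."""
    with zipfile.ZipFile(io.BytesIO(uploaded_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no document content
            yield info.filename, archive.read(info)
```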
Added support for concurrent, queued OCR processing using Celery.
Apply default transformations to document before OCR.
Added unpaper to the OCR conversion pipe.
Added views to create, edit and grant/revoke permissions to roles.
Added multipage documents support (only tested on PDFs).
To update a previous database do: [d.update_page_count() for d in Document.objects.all()]
Added support for document page transformation (no GUI yet).
Added permissions and roles support.
Added python-magic for smarter MIME type detection (https://github.com/ahupp/python-magic).
Added a new Document model field: file_mime_encoding.
Show only document metadata in document list view.
If only one document type exists, the create document wizard skips the first step.
Changed to a liquid CSS grid.
Added the ability to group documents by their metadata.
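Grouping documents by metadata amounts to bucketing them by the value of one metadata field. The sketch below is a hypothetical illustration over `(name, metadata_dict)` pairs; it leaves out documents that lack the field, which is one possible policy rather than the app’s documented behaviour.

```python
from collections import defaultdict


def group_by_metadata(documents, key):
    """Group document names by the value of one metadata field.

    `documents` is a list of (name, metadata_dict) pairs; documents without
    the field are skipped.
    """
    groups = defaultdict(list)
    for name, metadata in documents:
        if key in metadata:
            groups[metadata[key]].append(name)
    return dict(groups)
```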
New abstracted options to adjust document conversion quality (default, low, high).