Version 0.5¶
Added tag list view and global tag delete support
Added tag editing view and listing documents with an specific tag
Changed the previewing and deleting staging files views to required
DOCUMENT_CREATE
permissionAdded no-parent-history class to document page links so that iframe clicking doesn’t affect the parent window history
Fixes back button issue on Chrome 9 & 10
Added per app version display tag
Added loading spinner animation
Messages tweaks and translation updates
Converter app cleanups, document pre-cache, magic number removal
Added OCR view displaying all active OCR tasks from all cluster nodes
Disabled
CELERY_DISABLE_RATE_LIMITS
by defaultImplement local task locking using Django locmem cache backend
Added doc extension to office document format list
Removed redundant transformation calculation
Make sure OCR in processing documents cannot be deleted
PEP8, pylint cleanups and removal of relative imports
Removed the obsolete
DOCUMENTS_GROUP_MAX_RESULTS
setting optionImproved visual appearance of messages by displaying them outside the main form
Added link to close all notifications with one click
Made the queue processing interval configurable by means of a new setting:
OCR_QUEUE_PROCESSING_INTERVAL
Added detection and reset of orphaned ocr documents being left as ‘processing’ when celery dies
Improved unknown format detection in the graphicsmagick backend
Improved document convertion API
Added initial support for converting office documents (only ods and docx tested)
Added sample configuration files for supervisor and apache under contrib/
Avoid duplicates in recent document list
Added the configuration option CONVERTER_GM_SETTINGS to pass GraphicsMagicks specific commands the the GM backend
Lower image convertion quality if the format is jpg
Inverted the rotation button, more intuitive this way
Merged and reduced the document page zoom and rotation views
Increased permissions app permission’s label field size
DB Update required
Added support for metadata group actions
Reduced the document pages widget size
Display the metadata group numeric total in the metadata group form title
Reorganized page detail icons
Added first & last page navigation links to document page view
Added interactive zoom support to document page detail view
Spanish translation updates
Added
DOCUMENTS_ZOOM_PERCENT_STEP
,DOCUMENTS_ZOOM_MAX_LEVEL
,DOCUMENTS_ZOOM_MIN_LEVEL
configuration options to allow detailed zoom controlAdded interactive document page view rotation support
Changed the side bar document grouping with carousel style document grouping form widget
Removed the obsolete
DOCUMENTS_TRANFORMATION_PREVIEW_SIZE
andDOCUMENTS_GROUP_SHOW_THUMBNAIL
setting optionsImproved double submit prevention
Added a direct rename field to the local update and staging upload forms
Separated document page detail view into document text and document image views
Added grab-scroll to document page view
Disabled submit buttons and any buttons when during a form submit
Updated the page preview widget to display a infinite-style horizontal carousel of page previews
Added support user document folders
Must do a
syncdb
to add the new tables
Added support for listing the most recent accessed documents per user
Added document page navigation
Fixed diagnostics url resolution
Added confirmation dialog to document’s find missing document file diagnostic
Added a document page edit view
Added support for the command line program pdftotext from the poppler-utils packages to extract text from PDF documents without doing OCR
Fixed document description editing
Replaced page break text with page number when displaying document content
Implemented detail form readonly fields the correct way, this fixes copy & paste issues with Firefox
New document page view
Added view to add or remove user to a specific role
Updated the jQuery packages with the web_theme app to version 1.5.2
Made
AVAILABLE_INDEXING_FUNCTION
setting a setting of the documents app instead of the filesystem_serving appFixed document download in FireFox for documents containing spaces in the filename
If mime detection fails set mime type to ‘’ instead of ‘unknown’
Use document MIME type when downloading otherwise use ‘application/octet-stream’ if none
Changed the way document page count is parsed from the graphics backend, fixing issue #7
Optimized document metadata query and display
Implemented OCR output cleanups for English and Spanish
Redirect user to the website entry point if already logged and lands in the login template
Changed from using SimpleUploadedFile class to stream file to the simpler File class wrapper
Updated staging files previews to use sendfile instead of serve_file
Moved staging file preview creation logic from documents.views to staging.py
When deleting staging file, it’s cached preview is also deleted
Added a new setup option:
FILESYSTEM_INDEXING_AVAILABLE_FUNCTIONS
- a dictionary to allow users to add custom functions
Made automatic OCR a function of the OCR app and not of Documents app (via signals)
Renamed setup option
DOCUMENT_AUTOMATIC_OCR
toOCR_AUTOMATIC_OCR
Clear node name when requeueing a document for OCR
Added support for editing the metadata of multiple documents at the same time
Added Graphics magick support by means of user selectable graphic convertion backends
Some settings renamed to support this change:
CONVERTER_CONVERT_PATH
is nowCONVERTER_IM_CONVERT_PATH
CONVERTER_IDENTIFY_PATH
is nowCONVERTER_IM_IDENTIFY_PATH
Added options:
CONVERTER_GM_PATH
- File path to graphicsmagick’s program.CONVERTER_GRAPHICS_BACKEND
- Backend to use:ImageMagick
orGraphicMagick
Raise ImportError and notify user when specifying a non existant converter graphics backend
Fixed issue #4, avoid circular import in permissions/__init__.py
Add a user to a default role only when the user is created
Added total page count to statistics view
Added support to disable the default scrolling JS code included in web_theme app, saving some KBs in transfer
Clear last ocr results when requeueing a document
Removed the ‘exists’ column in document list view, diagnostics superceded this
Added 3rd party sendfile app (support apache’s X-sendfile)
Updated the get_document_image view to use the new sendfile app
Fixed the issue of the strip spaces middleware conflicting with downloads
Removed custom IE9 tags
Closed Issue #6
Allow deletion of non existing documents from OCR queue
Allow OCR requeue of pending documents
Invalid page numbers now raise Http404, not found instead of error
Added an additional check to lower the chance of OCR race conditions between nodes
Introduce a random delay to each node to further reduce the chance of a race condition, until row locking can be implemented or is implemented by Django
Moved navigation code to its own app
Reimplemented OCR delay code, only delay new document Added a new field: delay, update your database schema accordingly
Made the concurrent ocr code more granular, per node, every node can handle different amounts of concurrent ocr tasks Added a new field: node_name, update your database schema acordinging
Reduced default ocr delay time
Added a new diagnostics tab under the tools menu
Added a new option
OCR_REPLICATION_DELAY
to allow the storage some time for replication before attempting to do OCR to a documentAdded OCR multi document re-queue and delete support
Added simple statistics page (total used storage, total docs, etc)
Implemented form based and button based multi item actions (button based by default)
Added multi document delete
Fixed a few HTML validation errors
Issues are now tracked using github
Added indexing flags to ocr model
Small optimization in document list view
Small search optimization
Display “DEBUG mode” string in title if
DEBUG
variable is set to TrueAdded the fix-permissions bash script under misc/ folder
Plugged another file descriptor leak
Show class name in config settings view
Added missing config option from the setup menu
Close file descriptor to avoid leaks
Don’t allow duplicate documents in queues
Don’t raise
PermissionDenied
exception inPermissionDenied middleware
, even while debuggingFixed page number detection
Created ‘simple document’ for non technical users with all of a document pages content
Use document preview code for staging file also
Error picture literal name removal
Spanish translation updates
Show document file path in regards of its storage
Added new setting: side bar search box
Implemented new
PermissioDenied
exception middleware handlerPermissions app api now returns a
PermissionDenied
exception instead of a custom oneAdded new 403 error template
Updated the 404 template to display only a not found message
Moved the login required middleware to the common app
Fixed search app’s model.objects.filter indentation, improved result count calculation
Added dynamic comparison types to search app
Separated search code from view code
Correctly calculate show result count for multi model searches
Fixed OCR queue list showing wrong thumbnail
Fixed staging file preview
Show current metadata in document upload view sidebar
Show sentry login for admin users
Do not reinitialize document queue and/or queued document on reentry
Try extra hard not to assign same uuid to two documents
Added new transformation preview size setting
Renamed document queue state links
Changed ocr status display sidebar from form based to text based
Added document action to clear all the document’s page transformations
Allow search across related fields
Optimzed search for speed and memory footprint
Added
LIMIT
setting to searchShow search elapsed time on result page
Converter now differentiates between unknown file format and convert errors
Close file descriptors when executing external programs to prevent/reduce file descriptior leaks
Improved exception handling of external programs
Show document thumbnail in document ocr queue list
Make ocr document date submitted column non breakable
Fix permissions, directories set to mode 755 and files to mode 644
Try to fix issue #2, “random ORM field error on search while doing OCR”
Added configurable location setting for file based storage
Prepend storage name to differentiate config options
Fixed duplicated document search
Optimized document duplicate search
Added locale middleware, menu bar language switching works now
Only show language selection list if localemiddleware is active
Spanish translation updates
Added links, views and permissions to disable or enable an OCR queue
Enabled Django’s template caching
Added document queue property side bar window to the document queue list view
Added HTML spaceless middleware to remove whitespace in HTML code
If current user is superuser or staff show thumbnail & preview generation error messages
Added a setting to show document thumbnail in metadata group list
Started adding configurations setting descriptions
Initial GridFS storage support
Implemented size and delete methods for GridFS
Implement GridFS storage user settings
Added document link in the OCR document queue list
Link to manually re queue failed OCR
Don’t separate links (encose object list links with white-space: nowrap;)
Added document description to the field search list
Sort OCR queued documents according to submitted date & time
Document filesystem serving is now a separate app
Steps to update (Some warnings may be returned, but these are not fatal as they might be related to missing metadata in some documents):
rename the following settings:
DOCUMENTS_FILESYSTEM_FILESERVING_ENABLE
toFILESYSTEM_FILESERVING_ENABLE
DOCUMENTS_FILESYSTEM_FILESERVING_PATH
toFILESYSTEM_FILESERVING_PATH
DOCUMENTS_FILESYSTEM_SLUGIFY_PATHS
toFILESYSTEM_SLUGIFY_PATHS
DOCUMENTS_FILESYSTEM_MAX_RENAME_COUNT
toFILESYSTEM_MAX_RENAME_COUNT
Do a ./manage.py syncdb
Execute ‘Recreate index links’ locate in the tools menu
Wait a few minutes
Added per document duplicate search and a tools menu option to seach all duplicated documents
Added document tool that deletes and re-creates all documents filesystem links
Increased document’s and document metadata index filename field’s size to 255 characters
Added sentry to monitor and store error for later debugging
Zip files can now be uncompressed in memory and their content uploaded individually in one step
Added support for concurrent, queued OCR processing using celery
Apply default transformations to document before OCR
Added unpaper to the OCR convertion pipe
Added views to create, edit and grant/revoke permissions to roles
Added multipage documents support (only tested on pdfs)
To update a previous database do: [d.update_page_count() for d in Document.objects.all()]
Added support for document page transformation (no GUI yet)
Added permissions and roles support
Added python-magic for smarter MIME type detection (https://github.com/ahupp/python-magic).
Added a new Document model field: file_mime_encoding.
Show only document metadata in document list view.
If one document type exists, the create document wizard skips the first step.
Changed to a liquid css grid
Added the ability to group documents by their metadata
New abstracted options to adjust document conversion quality (default, low, high)