Version 0.5

  • Added tag list view and global tag delete support
  • Added tag editing view and listing documents with a specific tag
  • Changed the previewing and deleting staging files views to require the DOCUMENT_CREATE permission
  • Added no-parent-history class to document page links so that iframe clicking doesn’t affect the parent window history
    • Fixes back button issue on Chrome 9 & 10
  • Added per app version display tag
  • Added loading spinner animation
  • Message tweaks and translation updates
  • Converter app cleanups, document pre-cache, magic number removal
  • Added OCR view displaying all active OCR tasks from all cluster nodes
  • Disabled CELERY_DISABLE_RATE_LIMITS by default
  • Implement local task locking using Django locmem cache backend
  • Added doc extension to office document format list
  • Removed redundant transformation calculation
  • Make sure documents being processed by OCR cannot be deleted
  • PEP8, pylint cleanups and removal of relative imports
  • Removed the obsolete DOCUMENTS_GROUP_MAX_RESULTS setting option
  • Improved visual appearance of messages by displaying them outside the main form
  • Added link to close all notifications with one click
  • Made the queue processing interval configurable by means of a new setting: OCR_QUEUE_PROCESSING_INTERVAL
  • Added detection and reset of orphaned ocr documents being left as ‘processing’ when celery dies
  • Improved unknown format detection in the graphicsmagick backend
  • Improved document conversion API
  • Added initial support for converting office documents (only ods and docx tested)
  • Added sample configuration files for supervisor and apache under contrib/
  • Avoid duplicates in recent document list
  • Added the configuration option CONVERTER_GM_SETTINGS to pass GraphicsMagick-specific commands to the GM backend
  • Lower image conversion quality if the format is jpg
  • Inverted the rotation button, more intuitive this way
  • Merged and reduced the document page zoom and rotation views
  • Increased the size of the permissions app’s permission label field
    • DB Update required
  • Added support for metadata group actions
  • Reduced the document pages widget size
  • Display the metadata group numeric total in the metadata group form title
  • Reorganized page detail icons
  • Added first & last page navigation links to document page view
  • Added interactive zoom support to document page detail view
  • Spanish translation updates
  • Added DOCUMENTS_ZOOM_PERCENT_STEP, DOCUMENTS_ZOOM_MAX_LEVEL, DOCUMENTS_ZOOM_MIN_LEVEL configuration options to allow detailed zoom control
  • Added interactive document page view rotation support
  • Replaced the side bar document grouping with a carousel-style document grouping form widget
  • Removed the obsolete DOCUMENTS_TRANFORMATION_PREVIEW_SIZE and DOCUMENTS_GROUP_SHOW_THUMBNAIL setting options
  • Improved double submit prevention
  • Added a direct rename field to the local update and staging upload forms
  • Separated document page detail view into document text and document image views
  • Added grab-scroll to document page view
  • Disabled submit buttons and any other buttons during a form submit
  • Updated the page preview widget to display an infinite-style horizontal carousel of page previews
  • Added support for user document folders
    • Must do a syncdb to add the new tables
  • Added support for listing the most recently accessed documents per user
  • Added document page navigation
  • Fixed diagnostics url resolution
  • Added confirmation dialog to document’s find missing document file diagnostic
  • Added a document page edit view
  • Added support for the command line program pdftotext from the poppler-utils packages to extract text from PDF documents without doing OCR
  • Fixed document description editing
  • Replaced page break text with page number when displaying document content
  • Implemented detail form readonly fields the correct way, this fixes copy & paste issues with Firefox
  • New document page view
  • Added view to add users to or remove users from a specific role
  • Updated the jQuery packages with the web_theme app to version 1.5.2
  • Made AVAILABLE_INDEXING_FUNCTION setting a setting of the documents app instead of the filesystem_serving app
  • Fixed document download in Firefox for documents containing spaces in the filename
  • If mime detection fails set mime type to ‘’ instead of ‘unknown’
  • Use the document MIME type when downloading, falling back to ‘application/octet-stream’ if none is set
  • Changed the way document page count is parsed from the graphics backend, fixing issue #7
  • Optimized document metadata query and display
  • Implemented OCR output cleanups for English and Spanish
  • Redirect user to the website entry point if already logged in and landing on the login template
  • Changed from using SimpleUploadedFile class to stream file to the simpler File class wrapper
  • Updated staging files previews to use sendfile instead of serve_file
  • Moved staging file preview creation logic from documents.views to staging.py
  • When deleting a staging file, its cached preview is also deleted
  • Added a new setup option:
    • FILESYSTEM_INDEXING_AVAILABLE_FUNCTIONS - a dictionary to allow users to add custom functions
  • Made automatic OCR a function of the OCR app and not of Documents app (via signals)
    • Renamed setup option DOCUMENT_AUTOMATIC_OCR to OCR_AUTOMATIC_OCR
  • Clear node name when requeueing a document for OCR
  • Added support for editing the metadata of multiple documents at the same time
  • Added GraphicsMagick support by means of user-selectable graphics conversion backends
    • Some settings renamed to support this change:
      • CONVERTER_CONVERT_PATH is now CONVERTER_IM_CONVERT_PATH
      • CONVERTER_IDENTIFY_PATH is now CONVERTER_IM_IDENTIFY_PATH
    • Added options:
      • CONVERTER_GM_PATH - File path to graphicsmagick’s program.
      • CONVERTER_GRAPHICS_BACKEND - Backend to use: ImageMagick or GraphicsMagick
  • Raise ImportError and notify user when specifying a nonexistent converter graphics backend
  • Fixed issue #4, avoid circular import in permissions/__init__.py
  • Add a user to a default role only when the user is created
  • Added total page count to statistics view
  • Added support to disable the default scrolling JS code included in web_theme app, saving some KBs in transfer
  • Clear last ocr results when requeueing a document
  • Removed the ‘exists’ column in document list view, diagnostics superseded this
  • Added 3rd party sendfile app (supports Apache’s X-Sendfile)
  • Updated the get_document_image view to use the new sendfile app
  • Fixed the issue of the strip spaces middleware conflicting with downloads
  • Removed custom IE9 tags
  • Closed Issue #6
  • Allow deletion of non existing documents from OCR queue
  • Allow OCR requeue of pending documents
  • Invalid page numbers now raise Http404 (not found) instead of an error
  • Added an additional check to lower the chance of OCR race conditions between nodes
  • Introduce a random delay to each node to further reduce the chance of a race condition, until row locking can be implemented or is implemented by Django
  • Moved navigation code to its own app
  • Reimplemented OCR delay code, only delay new documents
    • Added a new field: delay, update your database schema accordingly
  • Made the concurrent ocr code more granular, per node; every node can handle a different number of concurrent ocr tasks
    • Added a new field: node_name, update your database schema accordingly
  • Reduced default ocr delay time
  • Added a new diagnostics tab under the tools menu
  • Added a new option OCR_REPLICATION_DELAY to allow the storage some time for replication before attempting to do OCR to a document
  • Added OCR multi document re-queue and delete support
  • Added simple statistics page (total used storage, total docs, etc)
  • Implemented form based and button based multi item actions (button based by default)
  • Added multi document delete
  • Fixed a few HTML validation errors
  • Issues are now tracked using github
  • Added indexing flags to ocr model
  • Small optimization in document list view
  • Small search optimization
  • Display “DEBUG mode” string in title if DEBUG variable is set to True
  • Added the fix-permissions bash script under misc/ folder
  • Plugged another file descriptor leak
  • Show class name in config settings view
  • Added missing config option from the setup menu
  • Close file descriptor to avoid leaks
  • Don’t allow duplicate documents in queues
  • Don’t raise PermissionDenied exception in PermissionDenied middleware, even while debugging
  • Fixed page number detection
  • Created ‘simple document’ for non-technical users with all of a document’s page content
  • Use document preview code for staging file also
  • Error picture literal name removal
  • Spanish translation updates
  • Show document file path with regard to its storage
  • Added new setting: side bar search box
  • Implemented new PermissionDenied exception middleware handler
  • Permissions app API now returns a PermissionDenied exception instead of a custom one
  • Added new 403 error template
  • Updated the 404 template to display only a not found message
  • Moved the login required middleware to the common app
  • Fixed search app’s model.objects.filter indentation, improved result count calculation
  • Added dynamic comparison types to search app
  • Separated search code from view code
  • Correctly calculate show result count for multi model searches
  • Fixed OCR queue list showing wrong thumbnail
  • Fixed staging file preview
  • Show current metadata in document upload view sidebar
  • Show sentry login for admin users
  • Do not reinitialize document queue and/or queued document on reentry
  • Try extra hard not to assign same uuid to two documents
  • Added new transformation preview size setting
  • Renamed document queue state links
  • Changed ocr status display sidebar from form based to text based
  • Added document action to clear all the document’s page transformations
  • Allow search across related fields
  • Optimized search for speed and memory footprint
  • Added LIMIT setting to search
  • Show search elapsed time on result page
  • Converter now differentiates between unknown file format and convert errors
  • Close file descriptors when executing external programs to prevent/reduce file descriptor leaks
  • Improved exception handling of external programs
  • Show document thumbnail in document ocr queue list
  • Make ocr document date submitted column non breakable
  • Fix permissions, directories set to mode 755 and files to mode 644
  • Try to fix issue #2, “random ORM field error on search while doing OCR”
  • Added configurable location setting for file based storage
  • Prepend storage name to differentiate config options
  • Fixed duplicated document search
  • Optimized document duplicate search
  • Added locale middleware, menu bar language switching works now
  • Only show language selection list if localemiddleware is active
  • Spanish translation updates
  • Added links, views and permissions to disable or enable an OCR queue
  • Enabled Django’s template caching
  • Added document queue property side bar window to the document queue list view
  • Added HTML spaceless middleware to remove whitespace in HTML code
  • If current user is superuser or staff show thumbnail & preview generation error messages
  • Added a setting to show document thumbnail in metadata group list
  • Started adding configurations setting descriptions
  • Initial GridFS storage support
  • Implemented size and delete methods for GridFS
  • Implement GridFS storage user settings
  • Added document link in the OCR document queue list
  • Link to manually requeue failed OCR
  • Don’t separate links (enclose object list links with white-space: nowrap;)
  • Added document description to the field search list
  • Sort OCR queued documents according to submitted date & time
  • Document filesystem serving is now a separate app
    • Steps to update (Some warnings may be returned, but these are not fatal as they might be related to missing metadata in some documents):
      • rename the following settings:
        • DOCUMENTS_FILESYSTEM_FILESERVING_ENABLE to FILESYSTEM_FILESERVING_ENABLE
        • DOCUMENTS_FILESYSTEM_FILESERVING_PATH to FILESYSTEM_FILESERVING_PATH
        • DOCUMENTS_FILESYSTEM_SLUGIFY_PATHS to FILESYSTEM_SLUGIFY_PATHS
        • DOCUMENTS_FILESYSTEM_MAX_RENAME_COUNT to FILESYSTEM_MAX_RENAME_COUNT
      • Do a ./manage.py syncdb
      • Execute ‘Recreate index links’, located in the tools menu
      • Wait a few minutes
  • Added per document duplicate search and a tools menu option to search all duplicated documents
  • Added document tool that deletes and re-creates all documents filesystem links
  • Increased document’s and document metadata index filename field’s size to 255 characters
  • Added sentry to monitor and store errors for later debugging
  • Zip files can now be uncompressed in memory and their content uploaded individually in one step
  • Added support for concurrent, queued OCR processing using celery
  • Apply default transformations to document before OCR
  • Added unpaper to the OCR conversion pipe
  • Added views to create, edit and grant/revoke permissions to roles
  • Added multipage documents support (only tested on pdfs)
    • To update a previous database do: [d.update_page_count() for d in Document.objects.all()]
  • Added support for document page transformation (no GUI yet)
  • Added permissions and roles support
  • Added python-magic for smarter MIME type detection (https://github.com/ahupp/python-magic).
  • Added a new Document model field: file_mime_encoding.
  • Show only document metadata in document list view.
  • If one document type exists, the create document wizard skips the first step.
  • Changed to a liquid css grid
  • Added the ability to group documents by their metadata
  • New abstracted options to adjust document conversion quality (default, low, high)
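
Several entries above rename existing setting options. As a rough aid when updating an older settings file, the renames can be collected in one mapping. The sketch below is not part of the project: the helper name rename_settings and the sample values are hypothetical, and only the old and new setting names are taken from this changelog.

```python
# Old-to-new setting names, copied from the rename notes in this changelog.
RENAMED_SETTINGS = {
    "DOCUMENT_AUTOMATIC_OCR": "OCR_AUTOMATIC_OCR",
    "CONVERTER_CONVERT_PATH": "CONVERTER_IM_CONVERT_PATH",
    "CONVERTER_IDENTIFY_PATH": "CONVERTER_IM_IDENTIFY_PATH",
    "DOCUMENTS_FILESYSTEM_FILESERVING_ENABLE": "FILESYSTEM_FILESERVING_ENABLE",
    "DOCUMENTS_FILESYSTEM_FILESERVING_PATH": "FILESYSTEM_FILESERVING_PATH",
    "DOCUMENTS_FILESYSTEM_SLUGIFY_PATHS": "FILESYSTEM_SLUGIFY_PATHS",
    "DOCUMENTS_FILESYSTEM_MAX_RENAME_COUNT": "FILESYSTEM_MAX_RENAME_COUNT",
}


def rename_settings(old_settings):
    """Hypothetical helper: return a copy of a settings dict using the new names."""
    updated = {}
    for name, value in old_settings.items():
        new_name = RENAMED_SETTINGS.get(name, name)
        # If the new-style name is already set explicitly, it wins and the
        # old-style entry is dropped.
        if new_name != name and new_name in old_settings:
            continue
        updated[new_name] = value
    return updated


if __name__ == "__main__":
    # Illustrative values only; they are not defaults from the project.
    sample = {
        "DOCUMENT_AUTOMATIC_OCR": True,
        "CONVERTER_CONVERT_PATH": "/usr/bin/convert",
        "OCR_QUEUE_PROCESSING_INTERVAL": 10,
    }
    for key, value in sorted(rename_settings(sample).items()):
        print(key, "=", repr(value))
```

Settings that were not renamed pass through unchanged, so the helper can be run over a full settings dictionary without listing every option.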