Version 4.5
Released: September 12, 2023
Status: Stable
Changes
Converter
The PyPDF2 package was replaced with the original pypdf package and updated to the latest version. Several PDF parsing and page introspection issues were solved by this change.
Transformation layers now support icon classes and don’t expect a FontAwesome identifier anymore.
Credentials
The commercial credentials app was generalized and added to the core apps. This app provides a centralized location to store and protect external authentication credentials. By default two credential backends are provided: token, and username and password. The credential backend system is extensible and other credential systems can be added.
Apps that use external authentication, like the mailer and sources, were updated to use credentials in their setup forms. In the case of features that use optional external credentials or where the credentials are the result of a template, like the HTTP workflow action, staging storage source, and watch storage source, the credential is selected and passed as a variable to the template.
Dependencies
Updated Python dependency versions:
PIP from 22.2 to 23.2.1
Redis from 4.2.0 to 4.6.0
Wheel from 0.37.0 to 0.41.0
Bleach from 4.1.0 to 6.0.0
django-auth-ldap from 4.0.0 to 4.4.0
PyYAML from 6.0 to 6.0.1
importlib-metadata from 5.0.0 to 6.8.0
requests from 1.14.3 to 2.0.4
django-extensions from 3.1.5 to 3.2.3
django-rosetta from 0.9.8 to 0.9.9
django-silk from 4.3.0 to 5.0.3
flake8 from 4.0.1 to 6.1.0
ipython from 7.32.0 to 8.14.0
twine from 3.8.0 to 4.0.2
Pillow from 9.4.0 to 10.0.0
dateparser from 1.1.1 to 1.1.8
elasticsearch from 7.17.1 to 7.17.9
elasticsearch-dsl from 7.4.0 to 7.4.1
python-magic from 0.4.26 to 0.4.27
gunicorn from 20.1.0 to 21.2.0
sentry-sdk from 1.12.1 to 1.29.0
whitenoise from 6.2.0 to 6.5.0
django-cors-headers from 3.10.0 to 4.2.0
drf-yasg from 1.21.4 to 1.21.7
jsonschema from 4.4.0 to 4.18.0
swagger-spec-validator from 2.7.4 to 3.0.3
boto3 from 1.24.70 to 1.28.16
django-storages from 1.13.1 to 1.13.2
extract-msg from 0.36.4 to 0.37.1
pycryptodome from 3.10.4 to 3.18.0
celery from 5.2.7 to 5.3.1
django-celery-beat from 2.2.1 to 2.3.0
django-formtools from 2.3 to 2.4.1
psycopg2 from 2.9.3 to 2.9.6
The Mayan EDMS version checking was improved.
Support was added for comparing versions. This is relevant for custom versions of Mayan EDMS when deciding when to merge back from upstream.
When reporting version mismatches, the differing version numbers are shown.
Improved how dependency copyright and license information is extracted.
Development
Support was added for PEP-0440 local versions.
The release exporter was updated to use pathlib for internal path computations. Support for bbcode was removed.
The release exporter no longer requires Mayan EDMS or Django to work.
Support for configurable release directory location was added.
Custom version parsing code was replaced with a wrapper for the Python packaging library.
Support was added to extract and manipulate more parts of the version string, like the pre-release and post-release parts.
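As an illustration of the version parts mentioned above, here is a minimal sketch that uses the packaging library directly. It does not reproduce the Mayan EDMS wrapper; the describe helper and the "acme.1" local label are hypothetical.

```python
from packaging.version import Version


def describe(version_string):
    """Split a PEP 440 version string into the parts discussed above."""
    version = Version(version_string)
    return {
        "public": version.public,  # version without the local segment
        "local": version.local,    # local segment after "+", or None
        "pre": version.pre,        # pre-release tuple such as ("rc", 1), or None
        "post": version.post,      # post-release number, or None
    }


# A hypothetical custom build based on upstream 4.5.
custom = Version("4.5+acme.1")
upstream = Version("4.5")

# A local version sorts after the public version it is based on.
print(custom > upstream)
print(describe("4.5rc1"))
```

Comparing a custom build against an upstream release this way is one possible input for deciding when to merge back from upstream.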
Docker
The Debian Docker base image was updated from version 11.5-slim to 12.1 “Bookworm”.
Environment variables were added to expand configuration of services and function of the Docker Compose file. The environment variable MAYAN_DOCKER_RABBITMQ_HOSTNAME was added to control the default host for the Celery broker URL.
The environment variable MAYAN_DOCKER_RABBITMQ_HOSTNAME also sets the hostname of the RabbitMQ service container, which enables persistent queue content even after shutting down the Docker Compose stack.
The Redis Celery result database was made configurable via MAYAN_REDIS_RESULT_DATABASE and defaults to 1.
The Mayan EDMS Redis lock database was made configurable via MAYAN_REDIS_LOCK_MANAGER_DATABASE and defaults to 2.
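The defaults described above can be modeled with a short sketch. Only the two variable names and their default values come from these notes; the redis_database_numbers helper is hypothetical and is not how the Docker Compose file itself resolves the values.

```python
import os


def redis_database_numbers(environ=None):
    """Return the Redis database numbers, falling back to the documented defaults."""
    environ = os.environ if environ is None else environ
    return {
        # Celery result backend database, documented default: 1
        "result": int(environ.get("MAYAN_REDIS_RESULT_DATABASE", "1")),
        # Lock manager database, documented default: 2
        "lock_manager": int(environ.get("MAYAN_REDIS_LOCK_MANAGER_DATABASE", "2")),
    }


print(redis_database_numbers({}))
print(redis_database_numbers({"MAYAN_REDIS_RESULT_DATABASE": "5"}))
```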
A note regarding opening up the RabbitMQ data port was added.
Docker image versions were updated:
mysql from 8.0.32 to 8.0.34
docker from version 20.10.21-dind to 23.10.6-dind
postgresql from 13.10-alpine to 13.11-alpine
python from 3.10.11-slim to 3.11.4-slim
rabbitmq from 3.11.13-alpine to 3.12.2-alpine
redis from 7.0.10-alpine to 7.0.12-alpine
A container dependency was added to ensure containers are started only after the setup_or_upgrade container finishes, which properly initializes the database and runs migrations on subsequent startups. This results in more predictable and repeatable deployments at the expense of a small increase in startup time.
Support for Django’s CONN_MAX_AGE was added via the new environment variable MAYAN_DATABASE_CONN_MAX_AGE.
The PostgreSQL container command arguments were updated to increase the performance and scalability of the out-of-the-box installation.
Added a maximum Docker logging size for all Mayan EDMS containers. The values are 3 log files of a maximum 100MB each.
Added the multi_container profile. This allows easy switching from a single all-in-one container Docker Compose deployment to a multi container deployment. The multi container profile uses substantially more resources and is not meant for basic installations, or for installations running on NAS or other small appliances.
The Keycloak database name is now ensured to be the same as the Keycloak PostgreSQL one.
Renamed all environment variables containing POSTGRES to use the full name POSTGRESQL. These are: MAYAN_DOCKER_KEYCLOAK_POSTGRES_TAG, MAYAN_KEYCLOAK_POSTGRES_VOLUME, MAYAN_DOCKER_POSTGRES_IMAGE, MAYAN_POSTGRES_VOLUME.
Added support for changing the worker log level via the new environment variable MAYAN_WORKER_LOG_LEVEL which defaults to ERROR.
Documentation
Added support for per app documentation. Each app can include its own documentation which is collected at build time.
Documents
The document file deletion operation was updated to now be a background task. Requesting a document file deletion will no longer return a 204 status but instead code 202 (Accepted).
Updated the type of the document file size field to a PositiveBigIntegerField to allow tracking document files bigger than 2GB in quota queries.
Added support for per document type document stub pruning. This change adds the document type fields document_stub_pruning_enabled and document_stub_expiration_interval, and removes the setting DOCUMENTS_STUB_EXPIRATION_INTERVAL which is now configured per document type. By default pruning of document stubs is enabled to preserve the existing behavior. Disabling document stub pruning can be used to support document archiving where the document files are deleted but the document database information is kept for reference.
All references to document type deletion policies were renamed to document type retention policies.
The active version and latest file attributes of documents were updated to be stored fields instead of computed values. This increases performance in large installations by a significant margin. Migrations for large installations will take longer in order to populate these fields. Account for these extended times during the upgrade.
Added a new document file introspection link and view. This view re-scans the document file and populates the size, checksum, and mimetype of files. It also updates the document file page count and creates a new document version linking all discovered file pages. This view replaces the previous document file page count update link and view.
Deleting a document file page will now also delete any document version pages that link to it. This avoids leftover invalid page references.
New document versions created manually will no longer become active by default. Only new document versions created as a result of a document file upload will become active by default.
The new document file pages and document version pages code was updated to create the objects as a single bulk operation. The documents queues were also split into more, smaller queues that can be spread over more workers, increasing scalability.
Removed the unused signal signal_post_document_created. Use signal_post_document_file_upload instead, or Django’s default post_save signal for the Document sender.
The document tasks callback interface was updated to accept a single dictionary for all the task and subtasks callbacks instead of having to pass different structures for each callback.
Document downloads and document exports were moved to their own queues.
Downloads
The API for the download file area was added. This API allows listing, deleting, and download actions.
Support was added for document download message templating. This allows customizing the message users receive when their document or document bundle is ready for download.
The document and document file creation code paths were updated to occur in smaller units that can be spread over multiple workers more easily to increase parallel upload speeds and scalability. This is very important when performing multi million initial document imports or migrations from existing systems.
Duplicates
Duplicate bulk creation was updated to work in batches of 100 entries in order to avoid overwhelming the database cache when a large set of documents is uploaded.
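The batching idea can be sketched in plain Python. This is an illustration only and not the code used by the duplicates app; the batched helper is hypothetical. Django's bulk_create also accepts a batch_size argument, which is one way such a limit could be applied at the database layer.

```python
from itertools import islice


def batched(iterable, size=100):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# 250 pending entries are processed as three batches instead of one large write.
batch_sizes = [len(batch) for batch in batched(range(250))]
print(batch_sizes)
```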
The duplicates task queue was moved to the C worker.
Events
The EventManager classes were moved to their own module. Update your custom app imports to handle this change.
The event system was updated to work in asynchronous mode. This increases the performance of the event system by several orders of magnitude but has the disadvantage that events might not be processed in the same order they occurred. Update any automation or workflow to account for this change. Future improvements are planned to provide order-of-execution support.
To allow disabling asynchronous mode and reverting to the previous behavior, the setting EVENTS_DISABLE_ASYNCHRONOUS_MODE was added.
The events app queues were split into two to process fast and slow operations separately.
File caching
Updated the file cache maximum_size field from a BigIntegerField to a PositiveBigIntegerField. This removes the need to do validation for negative numbers.
The cache model field maximum_size was promoted to a database index to speed up cache calculations.
Widgets for file cache were added to the administrator dashboard.
Updated the cache and cache partition purge loop to continue executing even when there are files that cannot be purged. Cache partition files that could not be deleted will be skipped and retried on the next purge execution.
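The skip-and-retry behavior of the purge loop can be pictured with a minimal sketch. The purge function and the delete callable are hypothetical stand-ins, not Mayan EDMS APIs, and the sketch assumes that failures surface as OSError.

```python
def purge(files, delete):
    """Try to delete every file; return the ones that failed for a later retry."""
    skipped = []
    for file in files:
        try:
            delete(file)
        except OSError:
            # Do not abort the loop: remember the file and move on.
            skipped.append(file)
    return skipped


def fake_delete(name):
    """Stand-in storage call that fails for one specific file."""
    if name == "locked.bin":
        raise PermissionError(name)


remaining = purge(["a.bin", "locked.bin", "c.bin"], fake_delete)
print(remaining)
```

The returned list is what a later purge run would attempt again.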
Indexing
The logic of the indexes was updated to remove documents even if the index is disabled.
Another wave of indexing algorithm optimization updates was merged for increased update performance for centi million document installations and for installations under heavy load.
The size of the document indexing value field was updated from 128 to 255 characters to allow for large template results. This field is indexed and the increase does not affect performance.
Search
Several redundant fields were removed from the search index. This includes document level fields from document files, document file pages, document version, and document version pages.
The advanced search form was improved to use fieldsets. It will also now sort the search fields.
A new worker E was added and devoted to search tasks. This isolates the search index uploads and prevents them from slowing down other tasks.
Settings
Added a new setting named CONFIGURATION_FILE_IGNORE which causes the setting system to not load settings from the config.yml file or save the current configuration to the config_backup.yml file. This is used on immutable installations and now during testing.
Custom setting cache implementation was removed in favor of Python’s functools.cache.
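For readers unfamiliar with functools.cache, the following sketch shows the memoization behavior it provides. The load_setting function and its body are hypothetical and only stand in for an expensive settings lookup.

```python
from functools import cache

# Records every time the underlying lookup really runs.
lookups = []


@cache
def load_setting(name):
    """Hypothetical expensive lookup; the result is memoized per argument."""
    lookups.append(name)
    return name.upper()


load_setting("documents_language")
load_setting("documents_language")  # served from the cache, no second lookup
print(len(lookups))

# cache_clear() discards memoized results, forcing a fresh lookup next time.
load_setting.cache_clear()
load_setting("documents_language")
print(len(lookups))
```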
A set_value method was added to allow overriding a bootstrap setting’s value.
Support was added for passing a global_symbol_table argument when updating the setting namespace global symbol table. This is used by enterprise installations that require setting partitioning based on deployment or platform logic.
Sources
When documents are uploaded, information about their source, like where they came from and who sent them, is now stored in a new structure that continues to be accessible as long as the document exists. This information is easily accessed via the new {{ document.source_metadata_value_of.source_id }} template helper.
By default all uploads store the ID of the source used. Other backends like the email sources store more information like the sender, receiver, subject, message ID.
This feature replaces the metadata assignment in the email sources, works for all source types, and allows for more values to be stored other than just the previous sender, subject, and message ID.
The next wave of sources app refactor patches was merged. These split the single sources app into separate apps per source type. This makes adding custom sources much easier. Individual source apps can also be disabled independently.
Two new sources were added: staging storage and watch storage. These work like the staging folder and watch folder but use a configured storage system that can access storages like object storage or remote storages. This allows the staging storage and watch storage to monitor files in remote locations without having to first mount NFS or SMB directories at the OS level.
Specific source backend functionality was consolidated into reusable mixins. These work in a modular fashion and can be used to add capabilities to any built in or custom defined source very easily.
Fieldsets support was added to the source backend setup forms.
Support was added for single or multiple document API uploads. This feature is called immediate mode and allows receiving a document ID immediately after a file is submitted to a source via the API. This document ID allows working with the document while it is being processed. Since only one document ID can be returned per upload, using immediate mode will disable processing compressed files.
The entire source system was updated to work in an asynchronous manner. Submitting large files via the staging or watch sources will work very fast regardless of the file being submitted, even if the file is a multi gigabyte file. This change combined with object storage support for sources allows uploading very large files and bypassing HTTP and reverse proxy upload limits.
A new source actions system was merged. This change unifies and exposes the capabilities of each source via the API so that they can be inspected and used via the API or via Python code exactly as they are used via the HTTP user interface. Each action defines what interface they support along with the arguments that each interface requires. In effect this is a minimal custom MVC framework for sources.
Reduced the size of the secondary icon of the FontAwesomeDualClassesDriver driver to make the source metadata icon more readable.
Updated the source backend’s get_upload_form_class attribute to be an instance method and allow backends to dynamically change the form fields. Custom sources can now enable or disable fields based on user access or other system state.
Staging folder file image cache errors are now ignored if the image cache is not already generated when deleting the staging folder file.
The SANE scanner source was updated to perform the scan as a background task. This means that the scan might not happen immediately when the scan form is submitted but won’t block the user interface and allows scanning documents of unlimited pages that would normally be interrupted by the HTTP timeout.
The document upload wizard was updated to support user access filtering. This feature is used to filter cabinets, metadata and tags during upload based on the access of the logged in user.
Support was added to disable the wizard next button conditionally. This is used to stop uploads when a required metadata type is not available to the user due to missing access.
Storage
The stale shared uploaded file and download file deletion loop was updated to continue executing even when there are files that cannot be deleted. Remaining files are skipped and retried on the next iteration.
Apps that handled shared upload files can now offload deletion of these files after use to the storage app. The documents app makes use of this new capability resulting in faster document processing during large document batch uploads.
Storage task queue was moved to the B worker.
Task manager
The shared “Tools” queue was removed. Each app is now responsible for defining its own queue for slow tools tasks.
Task queues were balanced among workers to prevent slower tasks from blocking faster tasks.
The task manager was updated to provide more information. It will now list workers, queues, and task types along with help information for each.
Increased the default maximum worker tasks by 10x.
The default maximum memory per Celery worker child was increased from 300000 to 400000.
The options --without-gossip and --without-heartbeat were removed from the run_worker script. This undoes a previous Celery optimization to ensure task queue processing does not freeze under some situations when using RabbitMQ. The task queue processing is now more resilient to communication failures, sudden shutdowns, and high latency network situations.
Templating
Improved template initialization to support custom tag loading.
Testing
Added a system check named check_app_tests to ensure each Mayan app’s tests flag matches whether the app actually has tests.
Added improved test case tag inheritance to work around a limitation in Django’s test tag inheritance that prevents tags from test mixins from propagating to the child classes.
The test system will now create a temporary MEDIA_ROOT folder when running tests. This change allows further isolation of testing artifacts.
Support was added for running tests by label/tag from the makefile with make test TAG=.
User interface
On small screens, the main menu will now collapse when a link is clicked.
Return links were added to the “Tools” and “Setup” areas to speed up navigation.
The AJAX loading spinner is now shown on mobile devices.
The way the project title setting works was updated. The code was updated to reflect the actual purpose of the setting which is to identify an installation and not to do rebranding.
The way Tools and Setup view buttons are rendered was improved to ensure they do so with consistent heights always.
The Dropzone.js usage was improved and turned into a Django widget for cleaner form integration.
The HTML widgets modules were split into HTML and column widget modules.
User management
All usage of the term “superuser” was updated to “super user” or “super_user”.
A warning message will now be displayed informing that it is not possible to change the password of staff or super user accounts from the user interface.
A super user column was added to the user list view.
Workflows
Workflow previews were updated to change the symbol to identify transitions, actions, and escalations with conditions from the math arrow to a math symbol for function (fn of).
Escalations are now presented in the workflow previews. Escalation hashes are now used to invalidate workflow previews caches.
Ensure the workflow state action column is not shown for the workflow state runtime proxies where it does not make sense to show.
An escalation list column was added to the workflow states list view.
Transitions are now shown in the workflow template state escalation list view.
Only the correct transitions can now be selected for the corresponding workflow template state escalation in the user interface and the API.
Other
Add ContentType API detail view.
PropertyHelper updates:
Move all PropertyHelper usage to their own modules.
Add property helper file_metadata_value_of to document files.
Formalize PropertyHelper behaviors and testing.
Tag all PropertyHelper with classes_property_helper.
Removals
Dependencies
PyPDF2 is replaced with the original pypdf.
Documents
The queue named uploads has been split into multiple new queues. Ensure that the uploads queue is empty before performing the upgrade.
Removal of the unused signal signal_post_document_created. Use signal_post_document_file_upload instead, or Django’s default post_save signal for the Document sender.
Sources
Removal of the email metadata attachment support. Removal of the email message attribute to metadata support. These are replaced with the new source metadata feature.
Task manager
The “Tools” queue was removed. Each app is now responsible for defining its own queue for slow tasks.
Removal of the unused sources_fast queue.
Backward incompatible changes
Docker
Renamed all environment variables containing POSTGRES to use the full name POSTGRESQL. These are: MAYAN_DOCKER_KEYCLOAK_POSTGRES_TAG, MAYAN_KEYCLOAK_POSTGRES_VOLUME, MAYAN_DOCKER_POSTGRES_IMAGE, MAYAN_POSTGRES_VOLUME.
Rename any usage of these variables in your .env file.
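The rename can be scripted. The sketch below is a hypothetical helper, not a tool shipped with Mayan EDMS. It rewrites only the four variables listed above and derives the new names from the stated rule of replacing POSTGRES with POSTGRESQL.

```python
import re

# Old variable names as listed in these release notes.
OLD_NAMES = (
    "MAYAN_DOCKER_KEYCLOAK_POSTGRES_TAG",
    "MAYAN_KEYCLOAK_POSTGRES_VOLUME",
    "MAYAN_DOCKER_POSTGRES_IMAGE",
    "MAYAN_POSTGRES_VOLUME",
)

# Match the old names only as whole words so already renamed ones are untouched.
_PATTERN = re.compile(r"\b(" + "|".join(OLD_NAMES) + r")\b")


def rename_postgres_variables(env_text):
    """Return the text of a .env file with the four old names renamed."""
    return _PATTERN.sub(
        lambda match: match.group(1).replace("POSTGRES", "POSTGRESQL"),
        env_text,
    )


sample = "MAYAN_POSTGRES_VOLUME=pgdata\nOTHER_SETTING=1\n"
print(rename_postgres_variables(sample))
```

To apply it, read the .env file with pathlib.Path.read_text(), pass the text through the function, and write the result back after making a backup copy.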
Documents
Removal of the setting DOCUMENTS_STUB_EXPIRATION_INTERVAL. Document stub expiration interval is now configured on a per document type basis via the user interface or API.
Removal of the unused signal signal_post_document_created. Use Django’s default post_save signal for documents or signal_post_document_file_upload to handle new file uploads.
Search fields removed:
Cabinet membership of document removed from:
Document file
Document file page
Document version
Document version page
Document comments removed from:
Document file
Document file page
Document version
Document version page
Workflow transition comments removed from:
Document file
Document file page
Document version
Document version page
Document description removed from:
Document file page
Document version page
Document file MIME type removed from:
Document file page
Document version comment removed from:
Document version page
Metadata removed from:
Document file
Document file page
Document version
Document version page
Tags removed from:
Document file
Document file page
Document version
Document version page
Deleting a document file is now a background task and not an immediate operation. The document file view and API view are updated to reflect this. The result code of the document file deletion API view has changed from 204 (No content) to 202 (Accepted).
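API clients that checked for 204 need to accept 202 as well. A minimal sketch of such a check follows; the deletion_outcome function and its return labels are hypothetical, and only the two status codes come from these notes.

```python
from http import HTTPStatus


def deletion_outcome(status_code):
    """Classify the response code of a document file deletion request."""
    if status_code == HTTPStatus.ACCEPTED:
        # Version 4.5: the deletion was queued as a background task.
        return "queued"
    if status_code == HTTPStatus.NO_CONTENT:
        # Earlier versions: the deletion finished before the response.
        return "deleted"
    return "failed"


print(deletion_outcome(202))
```

A "queued" result means the file may still be visible for a short time, so a client that lists files right after the request should tolerate that.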
Document file actions now use a short name instead of a non-descriptive number. The new action names are as follows:
1 is now ‘replace’
2 is now ‘append’
3 is now ‘keep’
Likewise the API field has changed from ‘action’ to ‘action_name’.
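Existing API clients can translate old payloads with a small mapping. The sketch below is hypothetical client-side code. The number-to-name mapping and the two field names come from these notes, while the migrate_payload helper and the "comment" key in the example are assumptions.

```python
# Old numeric action values mapped to the new short names.
ACTION_NAMES = {1: "replace", 2: "append", 3: "keep"}


def migrate_payload(payload):
    """Return a copy of a payload using the new field name and value."""
    migrated = dict(payload)
    if "action" in migrated:
        old_value = int(migrated.pop("action"))
        migrated["action_name"] = ACTION_NAMES[old_value]
    return migrated


print(migrate_payload({"action": 2, "comment": "second scan"}))
```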
The queue named uploads has been split into multiple new queues. Ensure that the uploads queue is empty (there are no pending uploads or uploads in progress) before performing the upgrade. Failure to do so will cause pending uploads to be ignored and then deleted at the next garbage collection cycle.
Sources
Removal of YAML metadata attachments in email sources.
Removal of sender, subject, and message ID metadata collection.
Staging folder and watch folder now only work with real files and ignore symlinks.
Task manager
Removal of the “Tools” queue. Each app must now define its own queue for slow, long lived operations.
Workflows
The WorkflowTemplateStateActionSerializer field action_path is now called backend_path, and the field action_data is now called backend_data.
The workflow template state action template no longer has direct access to the document to which the workflow is attached via the variable {{ document }}. To access the document, do so indirectly via the {{ workflow_instance }} variable with {{ workflow_instance.document }}.
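Existing action templates can be updated by hand or with a rough script. The following sketch is a hypothetical helper that rewrites bare document references inside template tags. It is regex based and does not parse the template, so it would also rewrite the word document inside a quoted string within a tag. Review the result before saving it.

```python
import re

# Template variable and block tags: {{ ... }} and {% ... %}.
_TAG = re.compile(r"(\{\{.*?\}\}|\{%.*?%\})", re.DOTALL)
# A bare "document" name, not an attribute and not part of a longer name.
_DOCUMENT = re.compile(r"(?<![\w.])document\b")


def migrate_template(template):
    """Prefix bare `document` references inside tags with `workflow_instance.`."""
    def rewrite(match):
        return _DOCUMENT.sub("workflow_instance.document", match.group(0))

    return _TAG.sub(rewrite, template)


print(migrate_template("Label: {{ document.label }}"))
```

Text outside of tags is left alone, and references that already start with workflow_instance are not changed, so running the helper twice gives the same result.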
Deprecations
None
Issues closed
GitLab issue #664 Possible incorrect work of “metadata_value_of” with absent and unset metadata
GitLab issue #1037 Show which version of Mayan EDMS when checking for updates
GitLab issue #1113 main menu not collapsing after choice on small screens
GitLab issue #1135 Template tags and filters from app in COMMON_EXTRA_APPS no longer available
GitLab issue #1140 Loading spinner is not shown in mobile layout
GitLab issue #1148 Upgrade Documentation: need to include db name when restoring from SQL dump
GitLab issue #1152 Old PyPDF2 dependency: cannot read PDF with modern encryption