Administration

Languages

The list of language choices in the language dropdown used for documents is based on the current ISO 639 list. This list can be quite extensive. To reduce the number of languages available, use the setting DOCUMENTS_LANGUAGE_CODES and set it to a list of abbreviations. This setting can be found in the System ‣ Setup ‣ Settings ‣ Common menu.

For example, to reduce the list to just English and Spanish use

DOCUMENTS_LANGUAGE_CODES = ('eng', 'spa')

The default language to appear on the dropdown can also be configured using:

DOCUMENTS_LANGUAGE = 'spa'

Use the correct ISO 639-3 language abbreviation (https://en.wikipedia.org/wiki/ISO_639) as this code is used in several subsystems in Mayan EDMS such as the OCR app to determine how to interpret the document.

If using the Docker image, these settings can also be passed to the container as environment variables by prepending the MAYAN_ prefix.

Example:

-e MAYAN_DOCUMENTS_LANGUAGE_CODES='["eng", "spa"]'

For more information check out the environment variables chapter of the Settings topic.
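As a rough illustration of how a setting maps to the option shown in the example, the sketch below serializes a list of codes into the same list syntax and prepends the MAYAN_ prefix. The helper name docker_env_option is hypothetical and is not part of Mayan EDMS.

```python
import json


def docker_env_option(setting_name, value):
    """Return a `-e` option string for `docker run`.

    Hypothetical helper: prepends the MAYAN_ prefix and serializes the
    value as JSON, matching the list syntax used in the example above.
    """
    return "-e MAYAN_{}='{}'".format(setting_name, json.dumps(value))


option = docker_env_option('DOCUMENTS_LANGUAGE_CODES', ['eng', 'spa'])
print(option)
```

Generating the value this way avoids quoting mistakes when the list of languages is long.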

Password reset

To use the password reset feature, administrative emails need to be configured. These are sent by the system itself and not by the users. Their usage and configuration differ from those of the email system used to share documents via email.

Sending administrative emails

To be able to send password reset emails configure the Django email settings via the configuration file.

Example:

EMAIL_BACKEND: django.core.mail.backends.smtp.EmailBackend
EMAIL_HOST: '<your smtp ip address or hostname>'
EMAIL_HOST_PASSWORD: '<your smtp password>'
EMAIL_HOST_USER: '<your smtp username>'
EMAIL_PORT: 25  # or 587 or your server's SMTP port
EMAIL_TIMEOUT:
EMAIL_USE_SSL: true
EMAIL_USE_TLS: false
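Before saving the configuration it can help to sanity-check the values. The following is a minimal sketch, assuming the settings have been loaded into a plain dictionary; the function check_email_settings is hypothetical. It relies on the fact that Django treats EMAIL_USE_SSL and EMAIL_USE_TLS as mutually exclusive.

```python
def check_email_settings(settings):
    """Return a list of problems found in a dict of Django email settings."""
    problems = []
    # Django treats EMAIL_USE_SSL and EMAIL_USE_TLS as mutually exclusive.
    if settings.get('EMAIL_USE_SSL') and settings.get('EMAIL_USE_TLS'):
        problems.append('EMAIL_USE_SSL and EMAIL_USE_TLS cannot both be true')
    if not settings.get('EMAIL_HOST'):
        problems.append('EMAIL_HOST is empty')
    port = settings.get('EMAIL_PORT')
    if not isinstance(port, int) or not 0 < port < 65536:
        problems.append('EMAIL_PORT must be an integer between 1 and 65535')
    return problems


example = {
    'EMAIL_HOST': 'smtp.example.com',  # placeholder hostname
    'EMAIL_PORT': 25,
    'EMAIL_USE_SSL': True,
    'EMAIL_USE_TLS': False,
}
print(check_email_settings(example))
```

An empty list means none of the checks in this sketch failed; it does not prove that the SMTP server accepts the connection.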

To change the reference URL in the password reset emails or in the default document mailing template, modify the COMMON_PROJECT_URL setting. For information on the different ways to change a setting check the Settings topic.

To test the email settings use the management command sendtestemail. Example:

mayan-edms.py sendtestemail myself@example.com

OCR backend

Mayan EDMS ships an OCR backend that uses the FLOSS engine Tesseract (https://github.com/tesseract-ocr/tesseract/), but it can use other engines. To support other engines, create a wrapper that subclasses the OCRBackendBase class defined in mayan/apps/ocr/classes. This subclass should expose the execute method. For an example of how the Tesseract backend is implemented, take a look at the file mayan/apps/ocr/backends/tesseract.py.

Once you create your own backend, add the option OCR_BACKEND to your local.py settings and point it to your new OCR backend class path.

The default value of OCR_BACKEND is "ocr.backends.tesseract.Tesseract"
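The shape of a custom backend can be sketched as follows. This is only an illustration: the OCRBackendBase class below is a stand-in defined inside the example so that it runs on its own, and the execute signature (a file object plus an optional language code) is an assumption. Check the Tesseract backend file for the arguments the real base class expects.

```python
import io


class OCRBackendBase:
    """Stand-in for the base class in mayan/apps/ocr/classes.

    Assumption: the real class may define a different execute() signature.
    """

    def execute(self, file_object, language=None):
        raise NotImplementedError


class EchoOCRBackend(OCRBackendBase):
    """Hypothetical backend that 'recognizes' text by decoding the bytes."""

    def execute(self, file_object, language=None):
        # A real backend would hand the page image to an OCR engine here.
        return file_object.read().decode('utf-8', errors='replace')


backend = EchoOCRBackend()
text = backend.execute(io.BytesIO(b'sample page'), language='eng')
print(text)
```

In a real installation the subclass would import the base class from Mayan EDMS instead of redefining it, and OCR_BACKEND would hold the dotted path of the new class.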

To add OCR support for more languages when using Tesseract, install the corresponding language file. If using a Debian based OS, this command will display the available language files:

apt-cache search tesseract-ocr

If using the Docker image, pass the environment variable MAYAN_APT_INSTALLS with the corresponding Tesseract language option. Example:

-e MAYAN_APT_INSTALLS='tesseract-ocr-deu'

Backups

To back up your install of Mayan EDMS, copy the actual document files and the database content. If you are using the default storage backend, the document files should be found in the media folder of your installation.

To dump the content of your database, refer to the chapter of your database manager's documentation that covers data “dumping”.

Here is an example of how to perform a backup and a restore of a PostgreSQL database.

To dump the database into an SQL text file:

pg_dump -h <host> -U <database user> -c <database name> -W > `date +%Y-%m-%d"_"%H-%M-%S`.sql

Example:

pg_dump -h 127.0.0.1 -U mayan -c mayan -W > `date +%Y-%m-%d"_"%H-%M-%S`.sql

To restore the database from the SQL text file:

psql -h <host> -U <database user> -d <database name> -W -f <sql dump file>

Example:

psql -h 127.0.0.1 -U mayan -d mayan -W -f 2018-06-07_18-10-56.sql
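For scheduled backups, the same command can be assembled from a script. The sketch below only builds the argument list and the timestamped filename used in the examples above; it does not run pg_dump. The function name build_pg_dump_command is hypothetical.

```python
from datetime import datetime


def build_pg_dump_command(host, user, database, now=None):
    """Return (argument list, output filename) for an SQL text dump."""
    now = now or datetime.now()
    # Same pattern as the shell examples: YYYY-MM-DD_HH-MM-SS.sql
    filename = now.strftime('%Y-%m-%d_%H-%M-%S') + '.sql'
    args = ['pg_dump', '-h', host, '-U', user, '-c', '-W', database]
    return args, filename


args, filename = build_pg_dump_command(
    '127.0.0.1', 'mayan', 'mayan', now=datetime(2018, 6, 7, 18, 10, 56)
)
print(filename)
print(' '.join(args))
```

The returned list could be passed to subprocess.run with stdout redirected to a file opened under the returned name.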

Here is an example of how to perform a backup and a restore of a PostgreSQL Docker container using a compressed dump file. A compressed dump file is not compatible with an SQL text file, and the two cannot be used interchangeably.

To back up a PostgreSQL Docker container:

docker exec <container name> pg_dump -U <database user> -Fc -c <database name> > `date +%Y-%m-%d"_"%H-%M-%S`.dump

Example:

docker exec mayan-edms-db pg_dump -U mayan -Fc -c mayan > `date +%Y-%m-%d"_"%H-%M-%S`.dump

This will produce a compressed dump file with the current date and time as the filename.

To restore a PostgreSQL Docker container:

docker exec -i <container name> pg_restore -U <database user> -d <database name> < <dump file>

Since it is not possible to drop a currently open PostgreSQL database, this command must be used on a new and empty PostgreSQL container.

Example:

docker run -d \
--name mayan-edms-pg-new \
--restart=always \
-p 5432:5432 \
-e POSTGRES_USER=mayan \
-e POSTGRES_DB=mayan \
-e POSTGRES_PASSWORD=mayanuserpass \
-v /docker-volumes/mayan-edms/postgres-new:/var/lib/postgresql/data \
postgres:9.5

docker exec -i mayan-edms-pg-new pg_restore -U mayan -d mayan < 2018-06-07_17-09-34.dump


Scaling up

The default installation method fits most use cases. If your use case requires more speed or capacity, here are some suggestions that can help you improve the performance of your installation.

Change the database manager

Use PostgreSQL or MySQL as the database manager. Tweak the memory settings of the database manager to increase memory allocation. More PostgreSQL specific examples are available on the PostgreSQL wiki: https://wiki.postgresql.org/wiki/Performance_Optimization

Increase the number of Gunicorn workers

The Gunicorn workers process HTTP requests and affect the speed at which the website responds.

If you are using the Docker image, change the value of the MAYAN_GUNICORN_WORKERS environment variable (check the Docker image chapter: Environment Variables). Normally this variable defaults to 2. Increase this number to match the number of CPU cores + 1.

If you are using the direct deployment methods, change the line that reads:

command = /opt/mayan-edms/bin/gunicorn -w 2 mayan.wsgi --max-requests 500 --max-requests-jitter 50 --worker-class gevent --bind 0.0.0.0:8000 --timeout 120

And increase the value of the -w 2 argument. This line is found in the [program:mayan-gunicorn] section of the supervisor configuration file.
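The "CPU cores + 1" guideline can be computed on the target host. A minimal sketch, with a hypothetical function name:

```python
import os


def suggested_gunicorn_workers(cpu_cores=None):
    """Return the CPU core count plus one, per the guideline above."""
    if cpu_cores is None:
        # os.cpu_count() can return None if the count is undetermined.
        cpu_cores = os.cpu_count() or 1
    return cpu_cores + 1


print(suggested_gunicorn_workers())
```

The result is the value to use for MAYAN_GUNICORN_WORKERS or for the -w argument, depending on the deployment method.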

Background task processing

The Celery workers are system processes that take care of the background tasks requested by frontend interactions, like document image rendering, and of periodic tasks, like OCR. There are several dozen tasks defined in the code. These tasks are divided into queues based on the app or the relationship between the tasks. The queues by default are divided into three groups based on the speed at which they need to be processed. The document page image rendering, for example, is categorized as a high volume, short duration task. The OCR is a high volume, long duration task. Email checking is a low volume, medium duration task.

It is not advisable to have the same worker that processes OCR also process image rendering. If the worker is processing several OCR tasks, it will not be able to provide images quickly when a user is browsing the user interface. This is why by default the queues are split among 3 workers: fast, medium, and slow.

The fast worker handles the queues:

  • converter: Handles document page rendering
  • sources_fast: Does staging file image rendering

The medium worker handles the queues:

  • checkouts_periodic: Scheduled tasks that check if a document’s checkout period has expired
  • documents_periodic:
  • indexing: Does reindexing of documents in the background when their properties change
  • metadata:
  • sources:
  • sources_periodic: Checking email accounts and watch folders for new documents.
  • uploads: Processes files to turn them into Mayan documents. Processing encompasses MIME type detection and page count detection.
  • documents:

The slow worker handles the queues:

  • mailing: Does the actual sending of documents via email as requested by users via the mailing profiles
  • tools: Executes in the background maintenance requests from the options in the tools menu
  • statistics: Recalculates statistics and charts
  • parsing: Parses documents to extract actual text content
  • ocr: Performs OCR to transcribe page images to text
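The assignment of queues to workers listed above can be written down as a small lookup table, which is handy when planning how to redistribute queues. This is a sketch for planning purposes only and not the way Mayan EDMS defines its queues internally; the function names are hypothetical.

```python
# Queue assignment as listed in this chapter.
WORKER_QUEUES = {
    'fast': ['converter', 'sources_fast'],
    'medium': [
        'checkouts_periodic', 'documents_periodic', 'indexing', 'metadata',
        'sources', 'sources_periodic', 'uploads', 'documents',
    ],
    'slow': ['mailing', 'tools', 'statistics', 'parsing', 'ocr'],
}


def worker_for_queue(queue_name):
    """Return the name of the worker that services the given queue."""
    for worker, queues in WORKER_QUEUES.items():
        if queue_name in queues:
            return worker
    raise KeyError(queue_name)


def celery_queue_argument(worker):
    """Comma-separated queue list, the form Celery's -Q option accepts."""
    return ','.join(WORKER_QUEUES[worker])


print(worker_for_queue('ocr'))
print(celery_queue_argument('fast'))
```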

Optimizations

  • Increase the number of workers and redistribute the queues among them (only possible with direct deployments).
  • Launch more workers to service a queue. For example, for faster document image generation, launch 2 workers to process the converter queue (only possible with direct deployments).
  • By default each worker process uses 1 thread. You can increase the thread count of each worker process with the Docker environment options:
    • MAYAN_WORKER_FAST_CONCURRENCY
    • MAYAN_WORKER_MEDIUM_CONCURRENCY
    • MAYAN_WORKER_SLOW_CONCURRENCY
  • If using direct deployment, increase the value of the --concurrency=1 argument of each worker in the supervisor file. You can also remove this argument and let the Celery algorithm choose the number of threads to launch. Usually this defaults to the number of CPU cores + 1.

Change the message broker

Messages are the method of communication between the front end interactive code and the background tasks. In this regard, messages can be thought of as homologous to task requests. Improving how many messages can be sent, stored and sorted will impact the number of tasks the system can handle. To save on memory, the basic deployment method and the Docker image default to using Redis as a message broker. To increase capacity and reduce volatility of messages (pending tasks are not lost during shutdown), use RabbitMQ to shuffle messages.

For direct installs refer to the Advanced deployment documentation section for the required changes.

For the Docker image, launch a separate RabbitMQ container (https://hub.docker.com/_/rabbitmq/):

docker run -d --name mayan-edms-rabbitmq -e RABBITMQ_DEFAULT_USER=mayan -e RABBITMQ_DEFAULT_PASS=mayanrabbitmqpassword -e RABBITMQ_DEFAULT_VHOST=mayan rabbitmq:3

Pass the MAYAN_BROKER_URL environment variable (https://kombu.readthedocs.io/en/latest/userguide/connections.html#connection-urls) to the Mayan EDMS container so that it uses the RabbitMQ container as the message broker:

-e MAYAN_BROKER_URL="amqp://mayan:mayanrabbitmqpassword@localhost:5672/mayan"
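The broker URL follows the pattern amqp://user:password@host:port/vhost. When the password contains characters such as "@" or "/", those need to be percent-encoded for the URL to parse correctly. A minimal sketch with a hypothetical helper name:

```python
from urllib.parse import quote


def build_broker_url(user, password, host, port, vhost):
    """Assemble an AMQP connection URL, percent-encoding the credentials."""
    return 'amqp://{}:{}@{}:{}/{}'.format(
        quote(user, safe=''),
        quote(password, safe=''),
        host,
        port,
        quote(vhost, safe=''),
    )


url = build_broker_url(
    'mayan', 'mayanrabbitmqpassword', 'localhost', 5672, 'mayan'
)
print(url)
```

With the values from the RabbitMQ container example, this reproduces the URL shown above.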

When tasks finish, they leave behind a return status or the result of a calculation. These are stored for a while so that whoever requested the background task is able to retrieve the result. These results are stored in the result storage. By default a Redis server is launched inside the Mayan EDMS container. You can launch a separate Docker Redis container and tell the Mayan EDMS container to use it via the MAYAN_CELERY_RESULT_BACKEND environment variable. The format of this variable is explained here: http://docs.celeryproject.org/en/3.1/configuration.html#celery-result-backend

Deployment type

Docker provides a faster deployment and the overhead is not high on modern systems. It is, however, memory and CPU limited by default, and you need to increase these limits. The settings to change the container resource limits are here: https://docs.docker.com/config/containers/resource_constraints/#limit-a-containers-access-to-memory

For the best performance possible use the advanced deployment method on a host dedicated to serving only Mayan EDMS.

Storage

For best input and output speed, use a block based local filesystem on an SSD drive for the /media sub folder. The location of the /media folder is specified by the MEDIA_ROOT setting.

If capacity is your bottom line, switch to an object storage system.

Use additional hosts

When one host is not enough, you can use multiple hosts and share the load. Make sure that all hosts share the /media folder as specified by the MEDIA_ROOT setting, as well as the database, the broker, and the result storage. One setting that needs to be changed in this configuration is the lock manager backend.

Resource locking is a technique that prevents two processes or tasks from modifying the same resource at the same time and causing a race condition. Mayan EDMS uses its own lock manager. By default the lock manager will use a simple file based lock backend, ideal for single host installations. For multiple host installations the database backend must be used in order to coordinate the resource locks between the different hosts over a shared data medium. This is accomplished by modifying the environment variable LOCK_MANAGER_BACKEND in either the direct deployment or the Docker image. Use the value lock_manager.backends.model_lock.ModelLock to switch to the database resource lock backend. You can also write your own lock manager backend for other data sharing mediums with better performance than a relational database, like Redis, Memcached, or ZooKeeper.
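To illustrate why a file based lock only suits a single host, here is a minimal sketch of the general technique. It is not the Mayan EDMS lock manager; the FileLock class and its methods are hypothetical. The lock is a file created atomically on the local filesystem, so a process on another host that does not share that filesystem never sees it.

```python
import os
import tempfile


class FileLock:
    """Hypothetical single-host lock built on atomic file creation."""

    def __init__(self, name, directory=None):
        directory = directory or tempfile.gettempdir()
        self.path = os.path.join(directory, name + '.lock')
        self.held = False

    def acquire(self):
        """Try to take the lock; return True on success, False if taken."""
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.close(fd)
        self.held = True
        return True

    def release(self):
        """Remove the lock file if this instance holds the lock."""
        if self.held:
            os.remove(self.path)
            self.held = False


with tempfile.TemporaryDirectory() as lock_dir:
    first = FileLock('document-1', lock_dir)
    second = FileLock('document-1', lock_dir)
    print(first.acquire())   # the first task gets the lock
    print(second.acquire())  # the second task is refused
    first.release()
```

A database, Redis or similar shared service plays the role of the directory when several hosts need to see the same locks.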

Database conversion

Version 3.1.x added a new management command to help convert data residing in an SQLite database to other database managers like PostgreSQL. Here is the conversion procedure.

Direct install

  • Make a backup of your existing SQLite database and documents by copying the /opt/mayan-edms/media folder.

  • Upgrade to at least version 3.1.3.

  • Migrate the existing SQLite database with the command performupgrade:

    sudo -u mayan MAYAN_MEDIA_ROOT=/opt/mayan-edms/media /opt/mayan-edms/bin/mayan-edms.py performupgrade
    
  • Install PostgreSQL:

    sudo apt-get install postgresql libpq-dev
    
  • Provision a PostgreSQL database:

    sudo -u postgres psql -c "CREATE USER mayan WITH password 'mayanuserpass';"
    sudo -u postgres createdb -O mayan mayan
    
  • Install the Python client for PostgreSQL:

    sudo -u mayan /opt/mayan-edms/bin/pip install --no-cache-dir psycopg2==2.7.3.2
    
  • Copy the newly created fallback config file:

    cp /opt/mayan-edms/media/config_backup.yml /opt/mayan-edms/media/config.yml
    
  • Edit the configuration file to add the entry for the PostgreSQL database and rename the SQLite database to ‘old’:

    # Before
    DATABASES:
      default:
        ATOMIC_REQUESTS: false
        AUTOCOMMIT: true
        CONN_MAX_AGE: 0
        ENGINE: django.db.backends.sqlite3
        HOST: ''
        NAME: /opt/mayan-edms/media/db.sqlite3
        OPTIONS: {}
        PASSWORD: ''
        PORT: ''
        TEST: {CHARSET: null, COLLATION: null, MIRROR: null, NAME: null}
        TIME_ZONE: null
        USER: ''
    
    # After
    DATABASES:
      old:
        ATOMIC_REQUESTS: false
        AUTOCOMMIT: true
        CONN_MAX_AGE: 0
        ENGINE: django.db.backends.sqlite3
        HOST: ''
        NAME: /opt/mayan-edms/media/db.sqlite3
        OPTIONS: {}
        PASSWORD: ''
        PORT: ''
        TEST: {CHARSET: null, COLLATION: null, MIRROR: null, NAME: null}
        TIME_ZONE: null
        USER: ''
      default:
        ATOMIC_REQUESTS: false
        AUTOCOMMIT: true
        CONN_MAX_AGE: 0
        ENGINE: django.db.backends.postgresql
        HOST: '127.0.0.1'
        NAME: mayan
        OPTIONS: {}
        PASSWORD: 'mayanuserpass'
        PORT: ''
        TEST: {CHARSET: null, COLLATION: null, MIRROR: null, NAME: null}
        TIME_ZONE: null
        USER: 'mayan'
    
  • Migrate the new database to create the empty tables:

    sudo -u mayan MAYAN_DATABASE_ENGINE=django.db.backends.postgresql MAYAN_DATABASE_NAME=mayan MAYAN_DATABASE_PASSWORD=mayanuserpass MAYAN_DATABASE_USER=mayan MAYAN_DATABASE_HOST=127.0.0.1 MAYAN_MEDIA_ROOT=/opt/mayan-edms/media /opt/mayan-edms/bin/mayan-edms.py migrate
    
  • Convert the data in the SQLite and store it in the PostgreSQL database:

    sudo -u mayan MAYAN_DATABASE_ENGINE=django.db.backends.postgresql MAYAN_DATABASE_NAME=mayan MAYAN_DATABASE_PASSWORD=mayanuserpass MAYAN_DATABASE_USER=mayan MAYAN_DATABASE_HOST=127.0.0.1 MAYAN_MEDIA_ROOT=/opt/mayan-edms/media /opt/mayan-edms/bin/mayan-edms.py convertdb --from=old --to=default --force
    
  • Update the supervisor config file to have Mayan EDMS run from the PostgreSQL database:

    [supervisord]
    environment=
        <...>
        MAYAN_DATABASE_ENGINE=django.db.backends.postgresql,
        MAYAN_DATABASE_HOST=127.0.0.1,
        MAYAN_DATABASE_NAME=mayan,
        MAYAN_DATABASE_PASSWORD=mayanuserpass,
        MAYAN_DATABASE_USER=mayan,
        MAYAN_DATABASE_CONN_MAX_AGE=360,
        <...>
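The configuration edit in the procedure above (renaming the SQLite entry to "old" and adding a PostgreSQL entry as "default") can be expressed as a small transformation, which is useful for double-checking the result before running convertdb. This is a sketch on plain dictionaries, using the database name, user and password provisioned earlier in the procedure; the function name switch_default_database is hypothetical, and reading or writing the YAML file is left out.

```python
import copy


def switch_default_database(databases, new_default):
    """Return a copy where the current 'default' entry becomes 'old'.

    The new 'default' starts from the old entry's keys and is then
    overridden with the values in new_default.
    """
    result = copy.deepcopy(databases)
    result['old'] = result.pop('default')
    entry = copy.deepcopy(result['old'])
    entry.update(new_default)
    result['default'] = entry
    return result


before = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'HOST': '',
        'NAME': '/opt/mayan-edms/media/db.sqlite3',
        'PASSWORD': '',
        'USER': '',
    }
}

after = switch_default_database(before, {
    'ENGINE': 'django.db.backends.postgresql',
    'HOST': '127.0.0.1',
    'NAME': 'mayan',
    'PASSWORD': 'mayanuserpass',
    'USER': 'mayan',
})
print(sorted(after))
```

The --from=old --to=default arguments of convertdb then refer to the two keys of the resulting dictionary.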