Manual Installation

This section describes manual installation, if you cannot or do not want to use Vagrant as indicated above. It also serves as installation guide for production. Dissemin is split in two parts:

  • the web frontend, powered by Django;
  • the tasks backend, powered by Celery.

Installing the tasks backend requires additional dependencies and is not necessary if you want to do light dev that does not require harvesting metadata or running author disambiguation. The next subsections describe how to install the frontend; the last one explains how to install the backend or how to bypass it in case you do not want to install it.

Frontend

Install Packages and Create Virtualenv

First, install the following dependencies (debian packages):

*postgresql postgresql-server-dev-all postgresql-client python3-venv build-essential libxml2-dev libxslt1-dev python3-dev gettext libjpeg-dev libffi-dev libmagickwand-dev gdal-bin*

Note

On Debian 10+ and Ubuntu 18+, libmagickwand has dropped PDF processing for security reason. To reenable you have to change the config to at least read access, e.g. with:

sudo sed -i 's/<policy domain="coder" rights="none" pattern="PDF" \/>/<policy domain="coder" rights="read" pattern="PDF" \/>/' /etc/ImageMagick-6/policy.xml

Make also sure to have pdftk installed.

Then, build a virtual environment to isolate all the python dependencies:

python3 -m venv .virtualenv
source .virtualenv/bin/activate
pip install --upgrade setuptools
pip install --upgrade pip
pip install -r requirements.txt

In case you want to use the development packages, run addiotionally:

pip install -r requirements-dev.txt

Database

Choose a unique database and user name (they can be identical), such as dissemin_myuni. Choose a strong password for your user:

sudo su postgres
psql
CREATE USER dissemin_myuni WITH PASSWORD 'b3a55787b3adc3913c2129205821765d';
ALTER USER dissemin_myuni CREATEDB;
CREATE DATABASE dissemin_myuni WITH OWNER dissemin_myuni;

Search Engine

Dissemin uses the Elasticsearch backend for Haystack. The current supported version is 2.x.x.

Download Elasticsearch and unzip it:

cd elasticsearch-<version>
./bin/elasticsearch    # Add -d to start elasticsearch in the background

Alternatively you can install the .rpm or .deb package, see the documentation of Elasticsearch for further information.

Make sure to set the initial heapsize accordingly.

Backend

Some features in Dissemin rely on an asynchronous tasks backend, celery. If you want to simplify your installation and ignore this asynchronous behaviour, you can add CELERY_ALWAYS_EAGER = True to your dissemin/settings/__init__.py. This way, all asynchronous tasks will be run from the main thread synchronously.

Otherwise, you need to run celery in a separate process. The rest of this subsection explains how.

Redis

The backend communicates with the frontend through a message passing infrastructure. We recommend redis for that (and the source code is configured for it). This serves also as a cache backend (to cache template fragments) and provides locks (to ensure that we do not fetch the publications of a given researcher twice, for instance).

First, install the redis server:

apt-get install redis-server

Celery

You can run Celery either in the shell or as daemon. The letter is recommend for production.

Shell

To run the backend (still in the virtualenv):

celery --app=dissemin.celery:app worker -B -l INFO

The -B option starts the scheduler for periodic tasks, the -l option sets the debug level to INFO.

Daemon

In production you want to run celery and celerybeat as a daemon and be controlled by systemd. celery and celerybeat are installed in the virtual environment of dissemin, so you have to take care to use this environment. In particular you should use the same user for Dissemin and Celery.

You should use the following sample files that are similar to the official sample files. The main differences are a different PYTHONPATH, respect of the virtual environment and stop command for celerybeat. Put this into /etc/default/celery and change CELERY_BIN path.:

# See
# http://docs.celeryproject.org/en/latest/userguide/daemonizing.html

CELERY_APP="dissemin.celery:app"
CELERYD_NODES="dissem"
CELERYD_OPTS=""
CELERY_BIN="/path/to/venv/bin/celery"
CELERYD_PID_FILE="/var/run/celery/%n.pid"
CELERYD_LOG_FILE="/var/log/celery/%n.log"
CELERYD_LOG_LEVEL="INFO"

CELERYBEAT_SCHEDULE_FILE="/var/run/celery/beat-schedule"
CELERYBEAT_PID_FILE="/var/run/celery/beat.pid"
CELERYBEAT_LOG_FILE="/var/log/celery/beat.log"

For the celeryd systemd service put the following in /etc/systemd/system/celery.service and change WorkingDirectory to your dissemin root.:

[Unit]
Description=Celery service
After=network.target

[Service]
Type=forking
User=dissemin
Group=dissemin
Restart=always
EnvironmentFile=-/etc/default/celery
WorkingDirectory=/path/to/dissemin/
ExecStart=/bin/sh -c '${CELERY_BIN} -A ${CELERY_APP} multi start ${CELERYD_NODES} --pidfile=${CELERYD_PID_FILE} --logfile=${CELERYD_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL} ${CELERYD_OPTS}'
ExecStop=/bin/sh -c '${CELERY_BIN} multi stopwait ${CELERYD_NODES} --pidfile=${CELERYD_PID_FILE}'
ExecReload=/bin/sh -c '${CELERY_BIN} multi restart ${CELERYD_NODES} -A ${CELERY_APP} --pidfile=${CELERYD_PID_FILE} --logfile=${CELERYD_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL} ${CELERYD_OPTS}'

[Install]
WantedBy=multi-user.target

For the celerybeatd systemd service put the following in /etc/systemd/system/celerybeat.service and change WorkingDirectory to your dissemin root.:

[Unit]
Description=Celerybeat service
After=network.target

[Service]
Type=simple
User=dissemin
Group=dissemin
Restart=always
EnvironmentFile=-/etc/default/celery
WorkingDirectory=/path/to/dissemin/
ExecStart=/bin/sh -c 'PYTHONPATH=$(pwd) ${CELERY_BIN} -A ${CELERY_APP} beat --pidfile=${CELERYBEAT_PID_FILE} --logfile=${CELERYBEAT_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL} -s ${CELERYBEAT_SCHEDULE_FILE}'
ExecStop=/bin/kill -s TERM $MAINPID

[Install]
WantedBy=multi-user.target

Note that we use /bin/sh -c to process the PYTHONPATH and ${CELERY_BIN}.

To make systemd create the necessary directories with permissions put the follwing into /etc/tmpfiles.d/celery.conf:

d /var/run/celery 0755 dissemin dissemin
d /var/log/celery 0755 dissemin dissemin

After that run systemctl daemon-reload to reload systemd service files and you are ready to use celery and celerybeat with systemd by calling:

systemctl start celery.service
systemctl start celerybeat.service

To make them start on boot call:

systemctl enable celery.service
systemctl enable celerybeat.service

Logrotate

Over time the logfiles of celery tend to get rather big, so you should enable log rotation. Celery does not complain if the log file is removed, it just opens it again.