Defaultsettings
snoop.defaultsettings
Default settings file.
This file is imported both in the Docker image and in the testing configuration.
Attributes
ALLOWED_HOSTS
List of domains to allow requests for.
Loaded from environment variable SNOOP_HOSTNAME; the default is * (no restrictions).
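A minimal sketch of how this value might be read, assuming SNOOP_HOSTNAME holds a single hostname. The helper name load_allowed_hosts is hypothetical:

```python
import os


# Hypothetical helper: build ALLOWED_HOSTS from SNOOP_HOSTNAME.
# Treating the variable as one hostname (not a list) is an assumption.
def load_allowed_hosts(environ=os.environ):
    # "*" is Django's wildcard and matches any host.
    return [environ.get("SNOOP_HOSTNAME", "*")]


ALLOWED_HOSTS = load_allowed_hosts()
```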
ALWAYS_QUEUE_NOW
Setting this to True disables the Task queueing system and executes Task functions in the foreground. Used for testing.
base_dir
Helper pointing to the root directory of the repository.
CELERY_DB_REUSE_MAX
Instruct Celery not to reuse database connections.
CHILD_QUEUE_LIMIT
Limit used when queueing large numbers of child tasks.
DATABASE_ROUTERS
Activate our database router, snoop.data.collections.CollectionsRouter.
DATABASES
Django databases configuration.
Populated from the SNOOP_COLLECTIONS constant at import time.
DEBUG
Enable debug logging.
Loaded from the environment variable with the same name.
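A sketch of reading a boolean flag such as DEBUG from the environment. The helper env_flag and the set of spellings accepted as "true" are assumptions, not snoop's actual parsing:

```python
import os

# Assumed set of values that count as "true"; anything else is False.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=os.environ):
    """Hypothetical helper: interpret an environment variable as a bool."""
    value = environ.get(name)
    if value is None:
        return default  # variable not set at all
    return value.strip().lower() in TRUTHY


DEBUG = env_flag("DEBUG")
```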
DEFAULT_AUTO_FIELD
Define type for automatically generated primary keys. This is needed since Django 3.2.
DISPATCH_MAX_QUEUE_SIZE
Don't queue anything on a queue if its length is greater than this value.
DISPATCH_MIN_QUEUE_SIZE
If the number of tasks on the queue is below this value (70%), and we would queue at least another DISPATCH_QUEUE_LIMIT tasks, then dispatch more tasks. This reduces waiting between batches.
DISPATCH_QUEUE_LIMIT
Number of pending tasks to queue per collection when an empty queue is found.
A single worker core running zero-length tasks processes at most around 40 tasks/s, so keeping it occupied for 5 minutes (300 s) takes 40 × 300 = 12000 tasks.
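The three DISPATCH_* settings above combine into a single dispatch rule, sketched below. DISPATCH_QUEUE_LIMIT uses the documented 12000; the DISPATCH_MAX_QUEUE_SIZE value, the reading of "70%" as 70% of that maximum, and the function tasks_to_dispatch are assumptions for illustration:

```python
# Documented value: about 5 minutes of work for one worker core.
DISPATCH_QUEUE_LIMIT = 12000
# Hypothetical value; the real setting may differ.
DISPATCH_MAX_QUEUE_SIZE = 40000
# Assumption: "70%" means 70% of the max queue size (integer math, no float rounding).
DISPATCH_MIN_QUEUE_SIZE = DISPATCH_MAX_QUEUE_SIZE * 7 // 10


def tasks_to_dispatch(queue_length, pending_count):
    """Return how many pending tasks to push for one collection (0 means wait)."""
    if queue_length > DISPATCH_MAX_QUEUE_SIZE:
        return 0  # queue already too long: queue nothing
    batch = min(pending_count, DISPATCH_QUEUE_LIMIT)
    if queue_length == 0:
        return batch  # empty queue: refill immediately
    if queue_length < DISPATCH_MIN_QUEUE_SIZE and batch >= DISPATCH_QUEUE_LIMIT:
        return batch  # top up early so workers don't wait between batches
    return 0
```

Topping up only when a full DISPATCH_QUEUE_LIMIT batch is available avoids many tiny dispatches while the queue is still busy.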
INSTALLED_APPS
List of Django apps to load.
LANGUAGE_CODE
Django locale.
MIDDLEWARE
List of Django middleware to load.
NLP_TEXT_LENGTH_LIMIT
Truncate text sent to NLP service after this many characters.
OCR_ENABLED
Flag to enable/disable OCR processing.
OCR_PROCESSES_PER_DOC
Number of parallel OCR processes used by each OCR task with pdf2pdfocr.py.
REST_FRAMEWORK
Configuration for Django Rest Framework.
Disables authentication, allows all access. Sets JSON as the default input and output.
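A sketch of a REST_FRAMEWORK value matching this description. The keys and class paths are standard Django REST Framework settings, but the exact value used by snoop may differ:

```python
# Sketch only: not necessarily snoop's exact configuration.
REST_FRAMEWORK = {
    # An empty list disables all authentication schemes.
    "DEFAULT_AUTHENTICATION_CLASSES": [],
    # AllowAny grants access to every request.
    "DEFAULT_PERMISSION_CLASSES": ["rest_framework.permissions.AllowAny"],
    # JSON parser/renderer listed first, so JSON is the default input and output.
    "DEFAULT_PARSER_CLASSES": ["rest_framework.parsers.JSONParser"],
    "DEFAULT_RENDERER_CLASSES": ["rest_framework.renderers.JSONRenderer"],
}
```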
RETRY_LIMIT_TASKS
Number of BROKEN/ERROR tasks to retry every minute, as long as their fail count has not reached the limit.
See TASK_RETRY_FAIL_LIMIT.
SECRET_KEY
Django secret key.
Loaded from the environment variable with the same name.
SILENCED_SYSTEM_CHECKS
List of Django system checks to silence; used to disable specific Django warnings.
SNOOP_CLEAR_MOUNTS_EVERY_TASK
Run "killall" on the various mount sub-processes started by the system. Only useful when running one worker per task; otherwise, tasks will interfere with each other.
SNOOP_COLLECTIONS
Static configuration for the collections list and their settings.
Provided through an environment variable at server boot time.
DATABASES is extended here with the databases for all of these collections.
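A hypothetical sketch of the DATABASES expansion. It assumes SNOOP_COLLECTIONS is a JSON list of objects with a "name" key; the "collection_<name>" alias, the BASE_DB connection values and expand_databases are invented for illustration:

```python
import json
import os

# Hypothetical shared connection settings for every database.
BASE_DB = {
    "ENGINE": "django.db.backends.postgresql",
    "HOST": "snoop-pg",  # hypothetical host name
    "USER": "snoop",
    "PASSWORD": "snoop",
}


def expand_databases(databases, raw_collections):
    """Return a copy of `databases` with one entry added per collection."""
    collections = json.loads(raw_collections)  # assumed: JSON list of {"name": ...}
    expanded = dict(databases)
    for col in collections:
        alias = f"collection_{col['name']}"  # hypothetical alias scheme
        expanded[alias] = {**BASE_DB, "NAME": alias}
    return expanded


DATABASES = {"default": {**BASE_DB, "NAME": "snoop"}}
DATABASES = expand_databases(DATABASES, os.environ.get("SNOOP_COLLECTIONS", "[]"))
```

A database router (such as the one configured in DATABASE_ROUTERS) can then send each collection's queries to its own alias.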
SNOOP_COLLECTIONS_ELASTICSEARCH_URL
URL pointing to Elasticsearch server.
SNOOP_DOCUMENT_CHILD_QUERY_LIMIT
Limit page size when listing directory children.
SNOOP_DOCUMENT_LOCATIONS_QUERY_LIMIT
Limit page size when listing document locations.
SNOOP_FEED_PAGE_SIZE
Pagination size for the /feed URLs.
Todo: remove this value, as the API is no longer used.
SNOOP_NLP_URL
URL pointing to the NLP server.
SNOOP_RABBITMQ_HTTP_PASSWORD
Password for the RabbitMQ HTTP interface. Default: 'guest'.
SNOOP_RABBITMQ_HTTP_URL
URL pointing to the HTTP interface of the RabbitMQ message queue.
Of the form "1.2.3.4:1234/_path/" (no "http://" prefix). Used to query queue lengths.
The username and password are set in SNOOP_RABBITMQ_HTTP_USERNAME and SNOOP_RABBITMQ_HTTP_PASSWORD.
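A sketch of a queue-length lookup using the RabbitMQ management HTTP API endpoint GET /api/queues/<vhost>/<queue>, whose JSON response includes a "messages" count. That snoop uses exactly this endpoint, the default "/" vhost, and the helper names queue_url and queue_length are assumptions:

```python
import base64
import json
import urllib.parse
import urllib.request


def queue_url(base, queue, vhost="/"):
    """Build the management API URL; `base` has the "host:port/path/" form."""
    vhost_enc = urllib.parse.quote(vhost, safe="")  # "/" becomes "%2F"
    queue_enc = urllib.parse.quote(queue, safe="")
    return f"http://{base}api/queues/{vhost_enc}/{queue_enc}"


def queue_length(base, queue, username="guest", password="guest"):
    """Fetch the number of messages on `queue` using HTTP basic auth."""
    request = urllib.request.Request(queue_url(base, queue))
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["messages"]
```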
SNOOP_RABBITMQ_HTTP_USERNAME
Username for the RabbitMQ HTTP interface. Default: 'guest'.
SNOOP_S3FS_MOUNT_DIR
Location on disk where s3fs mounts are stored.
SNOOP_S3FS_MOUNT_LIMIT
Global limit of parallel S3 mounts (buckets).
SNOOP_TASK_DISABLE_TAIL_QUEUE
Flag to disable queueing more tasks of the same or a different type after a task completes.
Useful for running tests with the Celery Eager executor, to avoid infinite loops.
SNOOP_TEMP_STORAGE
Full disk path pointing to temp storage.
SNOOP_TIKA_URL
URL pointing to Apache Tika server.
SNOOP_TOTAL_WORKER_COUNT
Rough total number of executors to be run on the system.
STATIC_ROOT
Full path to the static files directory on disk, for Django.
STATIC_URL
URL path pointing to static files, for Django.
SYNC_RETRY_LIMIT_DIRS
If there are no pending tasks, this is how many directories will be retried by sync every minute.
SYSTEM_QUEUES
List of "system queues": Celery tasks that must be executed periodically.
A single execution of any of these functions processes all collections in a for loop.
TABLES_SPLIT_FILE_ROW_COUNT
Number of rows inside each table split. Limits the time spent by a single unarchive task to a few minutes, increasing parallelism.
This also keeps the number of child (row) documents for a given table within the inode performance limit of 4000 files per directory.
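A hypothetical sketch of splitting a table into chunks of at most TABLES_SPLIT_FILE_ROW_COUNT rows. The value 4000 is a placeholder rather than the documented setting, and split_rows / split_count are illustrative names:

```python
import math
from itertools import islice

TABLES_SPLIT_FILE_ROW_COUNT = 4000  # placeholder value, not the real setting


def split_rows(rows, size=TABLES_SPLIT_FILE_ROW_COUNT):
    """Yield successive lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, size)):
        yield chunk


def split_count(total_rows, size=TABLES_SPLIT_FILE_ROW_COUNT):
    """Number of splits (and so of unarchive tasks) needed for a table."""
    return math.ceil(total_rows / size)
```

Because split_rows consumes its input lazily, a large table never has to be held in memory all at once.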
TASK_PREFIX
Prefix to add to all snoop task queues.
Todo: remove this value, as it is no longer used.
TASK_RETRY_AFTER_MINUTES
Errored tasks are retried at most once every TASK_RETRY_AFTER_MINUTES minutes.
TASK_RETRY_FAIL_LIMIT
Maximum number of times an errored task is retried.
The effective limit is higher, since very old tasks are retried more times.
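A hypothetical sketch of how RETRY_LIMIT_TASKS, TASK_RETRY_AFTER_MINUTES and TASK_RETRY_FAIL_LIMIT could combine when picking failed tasks to retry. The Task fields, the numeric values and pick_retries are invented for illustration, and the extra retries granted to very old tasks are not modelled:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RETRY_LIMIT_TASKS = 100        # hypothetical value
TASK_RETRY_AFTER_MINUTES = 5   # hypothetical value
TASK_RETRY_FAIL_LIMIT = 3      # hypothetical value


@dataclass
class Task:
    status: str              # assumed names: "BROKEN", "ERROR", "SUCCESS", ...
    fail_count: int          # how many times the task has failed so far
    date_modified: datetime  # time of the last attempt


def pick_retries(tasks, now):
    """Return up to RETRY_LIMIT_TASKS failed tasks that may be retried now."""
    cutoff = now - timedelta(minutes=TASK_RETRY_AFTER_MINUTES)
    eligible = [
        t for t in tasks
        if t.status in ("BROKEN", "ERROR")
        and t.fail_count < TASK_RETRY_FAIL_LIMIT  # fail limit not yet reached
        and t.date_modified <= cutoff             # waited long enough since last try
    ]
    eligible.sort(key=lambda t: t.date_modified)  # oldest attempts first
    return eligible[:RETRY_LIMIT_TASKS]
```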
UNARCHIVE_THREADS
Number of threads used by 7z when unarchiving.
URL_PREFIX
Configuration to set the URL prefix for all service routes. For example: "snoop/".
WORKER_PREFETCH
Celery-rabbitmq prefetch count.
WORKER_TASK_LIMIT
Maximum number of tasks a single worker process finishes before it is restarted.
WSGI_APPLICATION
Configure which WSGI application to use, for Django.