Tuning and Configuring Server

Combine is designed to handle metadata sets ranging from small to large, from 400 to 4,000,000 Records. Its major server components include:

MySQL

stores Records and their associated full XML documents

stores Transformations, Validations, and most other persistent, user-defined data

stores transactions from Validations, OAI requests, etc.

ElasticSearch

indexes mapped fields from Records

serves as the main engine for field-level analysis

Apache Spark

the workhorse for running Jobs, including Harvests, Transformations, Validations, etc.

Apache Livy

submits and queues Jobs for Spark

Django

the GUI

Django Background Tasks

runs long-running tasks that would otherwise keep the GUI from being responsive

such tasks include deleting, re-indexing, and exporting Jobs
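When tuning or troubleshooting this stack, it can help to first confirm that the network-facing services listed above are accepting connections. The sketch below assumes everything runs on `localhost` with the usual upstream default ports (MySQL 3306, ElasticSearch 9200, Livy 8998); the service map and helper names are hypothetical, and a given Combine deployment may bind these services elsewhere. Spark is not checked directly because Jobs reach it through Livy, and Django Background Tasks runs as a worker process rather than a listening service.

```python
import socket

# Hypothetical map of component -> (host, port). The ports are common upstream
# defaults, NOT values confirmed for a particular Combine deployment.
DEFAULT_SERVICES = {
    "MySQL": ("localhost", 3306),
    "ElasticSearch": ("localhost", 9200),
    "Apache Livy": ("localhost", 8998),
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host unreachable
        return False


def check_services(services=DEFAULT_SERVICES, timeout=1.0):
    """Map each service name to whether its port accepted a TCP connection."""
    return {
        name: is_listening(host, port, timeout)
        for name, (host, port) in services.items()
    }


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:15} {'up' if ok else 'DOWN'}")
```

A successful TCP connection only shows that something is listening on the port; it does not prove the service is healthy, so service-specific checks are still worthwhile once this passes.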

Given the relative complexity of this stack and the interconnected nature of its components, Combine is designed to be deployed via an Ansible playbook, which you can read more about here. The default build requires 8 GB of RAM, and the more CPU cores, the better.
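Before running the playbook, a quick look at the host's memory and core count can show whether it meets the 8 GB default. The following is a minimal sketch, not part of Combine or its playbook; `summarize_host` and `total_ram_bytes` are hypothetical helpers, and reading physical memory through `os.sysconf` assumes a POSIX system such as Linux. The threshold uses decimal gigabytes as an assumption, because the kernel reports slightly less than the installed RAM and a strict 8 GiB comparison would reject most 8 GB machines.

```python
import os

# Assumed floor: the documented 8 GB default, in decimal bytes (see note above).
MIN_RAM_BYTES = 8 * 10**9


def total_ram_bytes():
    """Physical memory in bytes via POSIX sysconf (unavailable on Windows)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def summarize_host(ram_bytes, cpu_cores, min_ram=MIN_RAM_BYTES):
    """Report RAM in GB, whether it meets the default, and the core count.

    No minimum core count is enforced, since the documentation only says that
    more cores are better.
    """
    return {
        "ram_gb": round(ram_bytes / 10**9, 1),
        "meets_ram_default": ram_bytes >= min_ram,
        "cpu_cores": cpu_cores,
    }


if __name__ == "__main__":
    if hasattr(os, "sysconf"):
        print(summarize_host(total_ram_bytes(), os.cpu_count() or 1))
```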

This part of the documentation explains some of these critical components and indicates how to modify or configure them.
