Writing Tests

This guide is for contributors writing tests for the QHAna plugin runner core or for individual plugins. It covers the pytest configuration, the test file locations, the shared fixtures and helpers, the recommended pattern for testing Celery tasks, how to run the suite locally, and how it is executed in CI.

The testing strategy is anchored in two architecture decision records:

Celery Task Testing Strategy: Celery tasks are tested with an in-process worker on an in-memory broker.
Co-locate plugin tests with plugin code: each plugin's tests live next to the code they cover.

For background on why plugins use Celery in the first place, see Use Celery Task Queue.

Pytest setup

The pytest configuration is defined in pyproject.toml:

[tool.pytest.ini_options]
testpaths = ["tests", "plugins", "stable_plugins"]
pythonpath = ["tests"]
addopts = "--import-mode=importlib"
python_files = ["test_*.py"]

testpaths collects tests from the runner-core tests/ directory and from both plugin trees.
pythonpath = ["tests"] puts tests/ on sys.path so shared helpers can be imported as from utils import ….
addopts = "--import-mode=importlib" switches pytest from the default prepend import mode to importlib. This is required because plugin test modules can collide on names (for example several plugins each shipping their own test_routes.py). See Co-locate plugin tests with plugin code for the rationale.
python_files = ["test_*.py"] restricts collection to files starting with test_.
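The name collision mentioned above can be illustrated with the standard library alone. The sketch below is not pytest's implementation. It creates two hypothetical plugin folders (plugin_a and plugin_b are made-up names) that each ship a test_routes.py, then loads both files with importlib under distinct module names so that neither shadows the other:

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(unique_name: str, path: Path):
    """Load a file as a module under a caller-chosen name, without touching sys.path."""
    spec = importlib.util.spec_from_file_location(unique_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    loaded = {}
    for plugin in ("plugin_a", "plugin_b"):  # hypothetical plugin names
        folder = root / plugin
        folder.mkdir()
        test_file = folder / "test_routes.py"
        test_file.write_text(f"OWNER = {plugin!r}\n")
        # The module name is derived from the folder, not just the file basename.
        loaded[plugin] = load_module_from_path(f"{plugin}.test_routes", test_file)

# Both files share the basename test_routes.py but stay distinct module objects.
owners = sorted(module.OWNER for module in loaded.values())
print(owners)  # ['plugin_a', 'plugin_b']
```

Loading by file path rather than by a bare top-level name is the idea behind the importlib import mode. The exact naming scheme pytest uses internally may differ from this sketch.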

The unit tests depend on pytest and hypothesis, both pulled in by the dev dependency group.

Test file locations

Test files reside in three valid locations:

tests/ for runner-core tests. Examples: tests/test_db.py, tests/test_entity_marshalling.py, tests/test_plugin_imports.py.
plugins/<name>/ for plugin tests, in either a nested or a flat layout (see below).
stable_plugins/<theme>/<plugin>/ for stable-plugin tests, following the same nested-or-flat convention.

A plugin chooses one of two layouts for its tests:

Nested layout. Test files reside in a dedicated tests/ subdirectory within the plugin package:

plugins/foo/
├── __init__.py
├── routes.py
├── tasks.py
└── tests/
    ├── __init__.py
    └── test_routes.py

Flat layout. Test files reside directly next to the source files, prefixed with test_:

plugins/bar/
├── __init__.py
├── routes.py
├── tasks.py
└── test_routes.py

Pytest discovers both layouts. The nested layout suits plugins with many test files or shared fixtures specific to the plugin. The flat layout suits small plugins where one or two test modules sit comfortably alongside the source. Module-name collisions across plugins (for example two plugins each having test_routes.py) are handled by --import-mode=importlib.
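As a rough illustration of why both layouts are found, the following sketch builds the two example trees in a temporary directory and matches files against the test_*.py pattern from python_files. It approximates only the file-name matching step. Real pytest collection also honours testpaths, ignore rules, and its other collection settings.

```python
import tempfile
from pathlib import Path


def collect_test_files(root: Path, pattern: str = "test_*.py") -> list[str]:
    """Return all files below root whose name matches the pattern, as POSIX paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in [
        # nested layout
        "plugins/foo/__init__.py",
        "plugins/foo/routes.py",
        "plugins/foo/tests/__init__.py",
        "plugins/foo/tests/test_routes.py",
        # flat layout
        "plugins/bar/__init__.py",
        "plugins/bar/routes.py",
        "plugins/bar/test_routes.py",
    ]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    found = collect_test_files(root)

print(found)
# ['plugins/bar/test_routes.py', 'plugins/foo/tests/test_routes.py']
```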

Shared fixtures and helpers

Pytest fixtures are functions that prepare test state (a database row, a temp file, a Flask app) and inject it into tests by parameter name. Pytest runs a fixture's setup once per scope (for example function, module, or session) and tears it down when that scope ends. Fixtures replace the setUp/tearDown boilerplate of class-based test frameworks and let each test declare exactly the dependencies it needs. To use a fixture, name it as a parameter of the test function:

def test_something(task_data):
    assert task_data.task_name == "test-data"

See the pytest fixture guide for the full feature set (scopes, parametrization, yield teardown, autouse, indirect fixtures).

Pytest auto-discovers a repo-root conftest.py that provides shared fixtures usable from any test in any of the three test locations. Tests do not need to import these fixtures explicitly. Pytest injects them by parameter name.

The task_data fixture

task_data builds an in-memory SQLite Flask app via create_app(), creates the database schema, and yields a saved ProcessingTask. Use it in any test that needs a real Flask app context plus a ProcessingTask row to operate on:

from qhana_plugin_runner.db.models.tasks import ProcessingTask


def test_task_persists(task_data):
    reloaded = ProcessingTask.get_by_id(task_data.id)
    assert reloaded.task_name == "test-data"

The fixture is function-scoped, so each test gets a fresh database.

The app and client fixtures

app is a module-scoped Flask application built via create_app() with the same in-memory SQLite configuration as task_data. Plugin discovery runs as part of create_app, so every blueprint declared under PLUGIN_FOLDERS is registered on the returned app. Module scope amortises the startup cost of plugin discovery across the test cases in a file. Use app whenever a test needs the full configured application, an application context, or flask.url_for() without a request context (the test configuration sets SERVER_NAME so url_for can build URLs outside a request).

client is a function-scoped flask.Flask.test_client() bound to app. It is the standard entry point for HTTP-level tests and removes the need for a plugin-local Flask fixture:

from http import HTTPStatus
from flask import url_for


def test_metadata_endpoint(client):
    response = client.get(url_for("data-creator-v0-1-1.PluginsView"))
    assert response.status_code == HTTPStatus.OK

Both fixtures are defined in the repo-root conftest.py and are auto-discovered.

Assertion helpers in tests/utils.py

tests/utils.py holds reusable assertions. Import them with from utils import …; this works because pythonpath = ["tests"] in the pytest configuration puts tests/ on sys.path.

assert_sequence_equals(expected, actual): element-wise equality with index-aware error messages.
assert_sequence_partial_equals(expected, actual, attributes_to_test): checks only the listed attributes of dict or namedtuple elements.
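To make the described behaviour concrete, here is a minimal sketch of how helpers like these could be written. It is not the code in tests/utils.py. The sketch_ prefixed names, the _get helper, and the Entity tuple are hypothetical, and the real helpers may differ in details such as message wording.

```python
from collections import namedtuple
from typing import Any, Iterable, Sequence


def sketch_assert_sequence_equals(expected: Sequence[Any], actual: Sequence[Any]) -> None:
    """Compare two sequences element by element and name the failing index."""
    assert len(expected) == len(actual), (
        f"length {len(actual)} != expected length {len(expected)}"
    )
    for index, (exp, act) in enumerate(zip(expected, actual)):
        assert exp == act, f"element {index}: {act!r} != expected {exp!r}"


def _get(item: Any, attribute: str) -> Any:
    """Read an attribute from a dict (by key) or from a namedtuple (by name)."""
    return item[attribute] if isinstance(item, dict) else getattr(item, attribute)


def sketch_assert_sequence_partial_equals(
    expected: Sequence[Any], actual: Sequence[Any], attributes_to_test: Iterable[str]
) -> None:
    """Compare only the listed attributes of each pair of elements."""
    attributes = list(attributes_to_test)
    assert len(expected) == len(actual), (
        f"length {len(actual)} != expected length {len(expected)}"
    )
    for index, (exp, act) in enumerate(zip(expected, actual)):
        for attribute in attributes:
            assert _get(exp, attribute) == _get(act, attribute), (
                f"element {index}, attribute {attribute!r}: "
                f"{_get(act, attribute)!r} != expected {_get(exp, attribute)!r}"
            )


# Hypothetical entity type used only for this illustration.
Entity = namedtuple("Entity", ["ID", "href", "color"])
expected = [Entity("a", "http://example.com/a", "red")]
actual = [Entity("a", "http://example.com/other", "red")]

# Passes: href differs, but only ID and color are compared.
sketch_assert_sequence_partial_equals(expected, actual, ["ID", "color"])
```

Carrying the element index in the failure message is what makes such helpers more useful than a plain assert expected == actual on long sequences.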

Why fixtures?

The shared fixtures give plugin authors a real Flask app with a real database without booting Redis, Postgres, or Docker. Tests stay fast, deterministic, and run on plain CI runners with no extra services.

Writing tests for the plugin runner

Tests for runner-core code go in tests/ and follow standard pytest patterns. The existing modules are good templates:

tests/test_db.py: minimal fixture usage with the task_data fixture.
tests/test_entity_marshalling.py: exercises CSV/JSON entity round-trips with assert_sequence_equals.
tests/test_plugin_imports.py: validates the plugin import contract enforced on plugin source files.

Use task_data whenever a test needs a Flask app context or DB access.

Writing tests for plugins

To test a plugin:

Place test files at plugins/<name>/tests/test_*.py or plugins/<name>/test_*.py.
Reuse the shared task_data fixture and the helpers from tests/utils.py, with no boilerplate and no duplication.
Plugin source files must use relative imports (this is enforced by tests/test_plugin_imports.py so plugins remain relocatable, see Writing Plugins). Test files are excluded from this check, so plugin tests can use absolute imports.
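The kind of check behind the relative-import rule can be sketched with the ast module. This is an illustration only, not the logic in tests/test_plugin_imports.py. The function name find_absolute_self_imports and the package name plugins.foo are assumptions made for the example.

```python
import ast


def find_absolute_self_imports(source: str, package: str) -> list[str]:
    """Return import statements that reference `package` by its absolute name."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == package or alias.name.startswith(package + "."):
                    offenders.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports have level >= 1.
            if node.module == package or node.module.startswith(package + "."):
                offenders.append(f"from {node.module} import ...")
    return offenders


# Hypothetical plugin source: one relative import (fine) and two absolute ones.
plugin_source = """
from .tasks import calculation_task
from plugins.foo.schemas import InputParametersSchema
import plugins.foo.backend
"""

offenders = find_absolute_self_imports(plugin_source, "plugins.foo")
print(offenders)
```

A plugin that refers to itself by an absolute package name breaks as soon as its folder is moved or renamed, which is why only the relative form keeps it relocatable.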

Test module names can collide across plugins (multiple plugins each having a test_routes.py is fine). --import-mode=importlib handles the disambiguation.

Examples

The stable_plugins/data_synthesis/data_creator/tests/ directory demonstrates the test types described in this guide. It uses the nested layout and relies on the client fixture from the repo-root conftest.py. Each file covers one aspect of the plugin:

stable_plugins/data_synthesis/data_creator/tests/test_datasets.py

Pure unit tests and hypothesis property tests for the numpy-based dataset generators in stable_plugins/data_synthesis/data_creator/backend/datasets.py. Demonstrates pytest.mark.parametrize() for shape checks across every DataTypeEnum member, and @given strategies for invariants (output length, finite values, integer label dtype, label range bounded by centers). No fixtures are required because the generators have no Flask, DB, or Celery dependencies.

stable_plugins/data_synthesis/data_creator/tests/test_schemas.py

Marshmallow schema tests for InputParametersSchema. Covers the round-trip from JSON payload to InputParameters dataclass (including the camelCase rewriting performed by MaBaseSchema), the per-type required-field rules in REQUIRED_FIELDS_BY_TYPE, range validators on num_train_points / noise / turns / centers, and rejection of unknown dataset_type values. Schemas are pure Python, so these tests also run without a Flask app.

stable_plugins/data_synthesis/data_creator/tests/test_routes.py

HTTP-level tests using the shared client fixture. Covers the metadata endpoint (GET /plugins/<id>/), the micro frontend form rendering and default values, and the form’s behavior on invalid input (the route uses validate_errors_as_result=True and re-renders rather than returning a 400). The /process/ endpoint enqueues a Celery task and is therefore covered by Celery-aware tests instead, see Celery Task Testing Strategy.

stable_plugins/data_synthesis/data_creator/tests/test_tasks.py

End-to-end Celery tests for the calculation_task enqueued by /process/. Persists a ProcessingTask the way routes.py does, calls calculation_task.apply_async against the in-memory broker, and asserts on the four output files written by the worker (file names, file_type, mimetype, and JSON payload shape). Also covers the centers parameter for DataTypeEnum.blobs and the KeyError raised when the db_id does not resolve to a row. Uses the broker_app and celery_worker fixtures from the repo-root conftest.py, following the pattern described in Testing Celery tasks.

Testing Celery tasks

Plugins use Celery for long-running work (see Use Celery Task Queue). The recommended testing strategy, set out in Celery Task Testing Strategy, is to run a real Celery worker thread inside the test process against an in-memory broker. This exercises the full apply_async → broker → worker round-trip (including task registration, argument serialization, result serialization, and worker-side error handling) without requiring Redis or Docker.

The fixtures and test config required for this pattern can be found in the repo-root conftest.py and are auto-discovered by pytest. Plugin authors do not need to copy them. Import the plugin’s tasks at module level in the test file so the CELERY singleton picks up the registration when broker_app builds the app.

Configuration

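The Flask + Celery test config in conftest.py combines an in-memory SQLite database (with a thread-safe pool) with an in-memory Celery broker:

from dotenv import dotenv_values
from sqlalchemy.pool import StaticPool

DEFAULT_TEST_CONFIG = {
    # ...
    "DEFAULT_FILE_STORE": "local_filesystem",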
    "FILE_STORE_ROOT_PATH": "files",
    "OPENAPI_VERSION": "3.0.2",
    "OPENAPI_JSON_PATH": "api-spec.json",
    "OPENAPI_URL_PREFIX": "",
    # ``SERVER_NAME`` lets ``flask.url_for`` build URLs without a request
    # context, which the route-level tests in plugin test suites rely on.
    "SERVER_NAME": "localhost.localdomain",
    # StaticPool keeps a single connection alive across threads so the
    # in-memory SQLite database is visible from both the test thread and
    # the worker thread.
    "SQLALCHEMY_ENGINE_OPTIONS": {
        "connect_args": {"check_same_thread": False},
        "poolclass": StaticPool,
    },
    "CELERY": {
        "task_default_queue": "qhana_plugin_runner",
        "broker_url": "memory://",
        "result_backend": "cache+memory://",
        "task_always_eager": False,
        "broker_connection_retry_on_startup": True,
    },
    "PLUGIN_FOLDERS": [
        f for f in dotenv_values(".flaskenv")["PLUGIN_FOLDERS"].split(":") if f
    ],
}



Three parts are critical:

SQLALCHEMY_ENGINE_OPTIONS uses StaticPool and check_same_thread=False so the in-memory SQLite database is visible from both the test thread and the worker thread.
The CELERY block uses broker_url = "memory://" and result_backend = "cache+memory://" and keeps task_always_eager = False so calls actually go through the broker.
PLUGIN_FOLDERS is read from the .flaskenv file, so every plugin folder registered there is available to the test app.
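The reason for the first point can be reproduced with the sqlite3 module directly. The sketch below does not use SQLAlchemy. It shows that two connections to ":memory:" are two unrelated databases, while a single connection opened with check_same_thread=False can be read from another thread. Keeping exactly one such connection is the role StaticPool plays in the configuration above. The table and variable names are made up for the illustration.

```python
import sqlite3
import threading

# Two separate connections to ":memory:" open two unrelated databases.
first = sqlite3.connect(":memory:")
second = sqlite3.connect(":memory:")
first.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT)")
tables_in_second = second.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# One shared connection that other threads are allowed to use.
shared = sqlite3.connect(":memory:", check_same_thread=False)
shared.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT)")
shared.execute("INSERT INTO task (name) VALUES ('test-data')")
shared.commit()

rows_seen_by_worker = []


def worker() -> None:
    # Runs in a second thread, standing in for the Celery worker thread.
    rows_seen_by_worker.extend(shared.execute("SELECT name FROM task").fetchall())


thread = threading.Thread(target=worker)
thread.start()
thread.join()

print(tables_in_second)     # []
print(rows_seen_by_worker)  # [('test-data',)]

for connection in (first, second, shared):
    connection.close()
```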

Fixtures

Two module-scoped fixtures in conftest.py set up the app and the worker thread:


@pytest.fixture(scope="module")
def broker_app():
    """App configured with a real Celery broker (in-memory)."""
    test_config = dict(DEFAULT_TEST_CONFIG)
    test_config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///:memory:"
    app = create_app(test_config, silent_log=True)
    with app.app_context():
        create_db_function(app)
        yield app


@pytest.fixture(scope="module")
def celery_worker(broker_app):
    """Start an in-process Celery worker thread for the test module.

    ``broker_app`` is required so the CELERY singleton is reconfigured
    against the memory broker before the worker boots. The fixture
    is module-scoped because spinning the worker up and down per test
    is slow.
    """
    from celery.contrib.testing.worker import start_worker

    with start_worker(  # pyright: ignore[reportGeneralTypeIssues]

broker_app builds the Flask app with the test config and creates the database schema.
celery_worker starts a real in-process Celery worker via celery.contrib.testing.worker.start_worker. pool="solo" keeps the worker single-threaded for simpler debugging. The fixture is module-scoped because spinning the worker up and down per test is slow.

Use both fixtures in every Celery test, either by naming them as parameters or via @pytest.mark.usefixtures("broker_app", "celery_worker") when the test body does not reference them directly.

Example tests

stable_plugins/data_synthesis/data_creator/tests/test_tasks.py demonstrates this pattern for a plugin.

@pytest.mark.usefixtures("broker_app", "celery_worker")
def test_calculation_task_persists_four_files():
    db_id = _enqueue_processing_task(...)
    result = calculation_task.apply_async(kwargs={"db_id": db_id}).get(timeout=30)
    assert result == "Result stored in file"

Errors propagate through the result backend and can be asserted with pytest.raises:

@pytest.mark.usefixtures("broker_app", "celery_worker")
def test_calculation_task_missing_db_id_raises():
    async_result = calculation_task.apply_async(kwargs={"db_id": 99999})
    with pytest.raises(KeyError, match="Could not load task data"):
        async_result.get(timeout=30)

A test that exercises a DB-mutating task must expire the test session before re-reading the row, otherwise SQLAlchemy returns the cached identity-mapped instance from before the worker committed:

@pytest.mark.usefixtures("broker_app", "celery_worker")
def test_reads_worker_mutation():
    db_id = _enqueue_processing_task(...)
    calculation_task.apply_async(kwargs={"db_id": db_id}).get(timeout=30)
    DB.session.expire_all()
    task = ProcessingTask.get_by_id(db_id)
    assert task.outputs  # written by the worker thread

Gotchas

Warning

Use StaticPool with check_same_thread=False for any in-memory SQLite database that the worker thread will touch. Without this, the test thread and the worker thread see different databases.
Call DB.session.expire_all() before re-reading rows that the worker mutated. The test session caches identity-mapped instances and will otherwise return stale state.
Do not use task_always_eager = True. It bypasses the broker, the worker, and the serialization layer, so the code path under test does not match production. Celery Task Testing Strategy explicitly rejects this option.
The in-memory broker does not model Redis-specific behavior (visibility timeouts, persistence, priorities). Tests that depend on those features need a real broker.

Property-based testing with hypothesis

Hypothesis is a property-based testing library. Instead of writing example-driven assertions, you describe the property a function should satisfy and hypothesis generates many inputs to try to falsify it. When it finds a counter-example, it shrinks the input to a minimal failing case before reporting it. This is well suited to code with structured input domains (entity marshalling, attribute parsers, serialization round-trips), where hand-picking examples tends to miss edge cases.

A round-trip property looks like:

from hypothesis import given, strategies as st


@given(st.dictionaries(st.text(), st.integers()))
def test_roundtrip(data):
    assert deserialize(serialize(data)) == data
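The serialize and deserialize names above are placeholders. A runnable variant, assuming JSON as the serialization format (chosen here only for illustration and not taken from the runner code), could look like this:

```python
import json

from hypothesis import given, strategies as st


@given(st.dictionaries(st.text(), st.integers()))
def test_json_roundtrip(data):
    # JSON object keys are strings and Python ints survive unchanged,
    # so dictionaries of this shape should round-trip exactly.
    assert json.loads(json.dumps(data)) == data
```

The strategy matters as much as the assertion. With integer keys instead of text keys, for example, the property would fail because JSON turns object keys into strings.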

Hypothesis is pulled in by the dev dependency group, so no extra setup is needed. Running poetry run pytest --hypothesis-explain enables hypothesis's explain phase, which annotates a failing example with hints about which parts of the input could vary without changing the outcome. This helps when the minimized counter-example is not self-explanatory.

See the hypothesis quickstart and the strategies reference for the full API.

Running tests

The full set of pytest commands is documented in A Runner for QHAna Plugins. The most-used invocations:

# Run the whole suite
poetry run pytest

# Run a single test
poetry run pytest path/to/test_x.py::test_name

# Re-run only failures from the last run
poetry run pytest --last-failed

# Coverage with a terminal summary and an HTML report under htmlcov/
poetry run pytest -p pytest_cov --cov=qhana_plugin_runner --cov-report=html --cov-report=term

Continuous integration

Unit tests run on every push to main and on every pull request via .github/workflows/pytest.yml. The job sets up Python 3.10, installs dependencies with poetry install --no-interaction --with dev, runs poetry run pytest --cov=qhana_plugin_runner --cov-report=html --cov-report=term, and uploads the HTML coverage report as a build artifact. No external services are started. The in-memory SQLite database and the in-memory Celery broker keep the suite self-contained.

A separate workflow at .github/workflows/integration-tests.yml runs the full QHAna integration suite (UST-QuAntiL/qhana-integration-tests) on a weekly schedule and on manual dispatch. That workflow exercises the runner against a real broker, registry, backend, and UI. It is out of scope for this guide.