QIIME 2 provides formal support for parallel computing of Pipelines through Parsl.[1]
This allows for faster execution of QIIME 2 Pipelines, assuming the compute resources are available, by ensuring that pipeline steps that can run simultaneously do run simultaneously.
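The benefit of letting independent steps overlap can be sketched with Python's standard-library concurrent.futures module. This is not how QIIME 2 or Parsl are implemented. It only shows why running independent steps concurrently shortens wall-clock time. The step function, the chunk names, and the sleep durations are hypothetical stand-ins for real Pipeline steps.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def step(name, seconds):
    """Hypothetical stand-in for one Pipeline step; sleeps to simulate work."""
    time.sleep(seconds)
    return name


def run_serially(steps):
    # Each step starts only after the previous one has finished.
    return [step(name, secs) for name, secs in steps]


def run_in_parallel(steps, max_workers=4):
    # Independent steps are submitted together, so their work can overlap.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(step, name, secs) for name, secs in steps]
        # Results are collected in submission order.
        return [future.result() for future in futures]


# Three hypothetical, mutually independent steps.
STEPS = [("chunk-1", 0.2), ("chunk-2", 0.2), ("chunk-3", 0.2)]
```

Here run_serially(STEPS) takes roughly 0.6 seconds, while run_in_parallel(STEPS) takes roughly 0.2 seconds because the three sleeps overlap. Parsl follows a similar principle: its apps return futures, so independent work can be dispatched before earlier results are collected.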
Parallel Pipeline execution is accessible in different ways depending on which interface you’re using.
Here we illustrate how to run Pipelines in parallel using q2cli and QIIME 2’s Python 3 API.
q2cli
Review the help text for a QIIME 2 Pipeline.
Pay special attention to the usage examples at the bottom of the help text.
qiime dwq2 search-and-summarize --help
Have QIIME 2 generate example data that can be used to run the usage example.
qiime dwq2 search-and-summarize --example-data ss-usage
This will create a new directory for search-and-summarize usage example data.
Change into that new directory by running:
cd ss-usage/
Serial
Run the usage example serially first.
Note that in the following command, the output filenames from the usage example are prefixed with serial-.
qiime dwq2 search-and-summarize \
--i-query-seqs query-seqs.qza \
--i-reference-seqs reference-seqs.qza \
--m-reference-metadata-file reference-metadata.tsv \
--p-split-size 1 \
--o-hits serial-hits.qza \
--o-hits-table serial-hits-table.qzv
To re-run this Pipeline in parallel, append the --parallel flag.
This will run this command in parallel using a default parallel configuration (learn more about this in Parallel Pipeline configuration).
Note that this time, the output filenames are prefixed with parallel-.
qiime dwq2 search-and-summarize \
--i-query-seqs query-seqs.qza \
--i-reference-seqs reference-seqs.qza \
--m-reference-metadata-file reference-metadata.tsv \
--p-split-size 1 \
--o-hits parallel-hits.qza \
--o-hits-table parallel-hits-table.qzv \
--parallel
If you’re using a system with parallel computing capabilities (e.g., at least six cores), the parallel execution of this command should have run faster than the serial execution.
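To compare the serial and parallel runs yourself, a small Python wrapper can rebuild both commands and time them. This is a sketch, assuming q2cli is on your PATH and that you run it from the ss-usage/ directory. The helper names build_command and time_command are hypothetical and are not part of QIIME 2.

```python
import subprocess
import time


def build_command(prefix, parallel=False):
    """Rebuild the search-and-summarize usage example with prefixed outputs."""
    cmd = [
        "qiime", "dwq2", "search-and-summarize",
        "--i-query-seqs", "query-seqs.qza",
        "--i-reference-seqs", "reference-seqs.qza",
        "--m-reference-metadata-file", "reference-metadata.tsv",
        "--p-split-size", "1",
        "--o-hits", f"{prefix}hits.qza",
        "--o-hits-table", f"{prefix}hits-table.qzv",
    ]
    if parallel:
        # Same command, run with the default parallel configuration.
        cmd.append("--parallel")
    return cmd


def time_command(cmd):
    """Run a command (raising CalledProcessError on failure); return seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start
```

For example, time_command(build_command("serial-")) followed by time_command(build_command("parallel-", parallel=True)) repeats the two runs above. Timings depend on your hardware; on a machine with few cores, the parallel run may not be faster.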
Python 3 API
Parallel Pipeline execution through the Python API is done using a ParallelConfig object as a context manager.
A ParallelConfig takes a parsl.Config object (parallel_config) and, optionally, a dictionary mapping action names to executor names (action_executor_mapping).
If no config is provided, your default configuration will be used (see QIIME 2 configuration file precedence).
from qiime2.sdk.parallel_config import ParallelConfig
from qiime2.plugins.dwq2.pipelines import search_and_summarize
from qiime2 import Artifact, Metadata
query_seqs = Artifact.load('query-seqs.qza')
reference_seqs = Artifact.load('reference-seqs.qza')
reference_metadata = Metadata.load('reference-metadata.tsv')
with ParallelConfig():
    future = search_and_summarize.parallel(
        query_seqs=query_seqs,
        reference_seqs=reference_seqs,
        reference_metadata=reference_metadata,
        split_size=1)
    # call future._result() inside of the context manager
    result = future._result()
To use a specific configuration, you can create it directly or load one from a file. For example:
from qiime2.sdk.parallel_config import ParallelConfig, get_config_from_file
from qiime2.plugins.dwq2.pipelines import search_and_summarize
from qiime2 import Artifact, Metadata
query_seqs = Artifact.load('query-seqs.qza')
reference_seqs = Artifact.load('reference-seqs.qza')
reference_metadata = Metadata.load('reference-metadata.tsv')
# Placeholder: replace with the path to the config file you'd like to load
path_to_config_file = 'path/to/your/config-file'
c, m = get_config_from_file(path_to_config_file)
with ParallelConfig(parallel_config=c, action_executor_mapping=m):
    future = search_and_summarize.parallel(
        query_seqs=query_seqs,
        reference_seqs=reference_seqs,
        reference_metadata=reference_metadata,
        split_size=1)
    # call future._result() inside of the context manager
    result = future._result()
Parsl configuration
To learn how to configure Parsl for your own usage, refer to Parallel Pipeline configuration.
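The optional action_executor_mapping dictionary passed to ParallelConfig routes actions to executors by name. The routing idea can be sketched with standard-library executors standing in for Parsl ones. This is an illustration only: the action names, the executor labels, and the fallback to an executor labeled "default" for unmapped actions are assumptions made for this sketch, not a description of QIIME 2's internals.

```python
from concurrent.futures import ThreadPoolExecutor


def route(action_name, action_executor_mapping, default_label="default"):
    # Assumption of this sketch: unmapped actions use the "default" label.
    return action_executor_mapping.get(action_name, default_label)


def run_actions(actions, executors, action_executor_mapping):
    """Submit each (name, func) pair to the executor its name maps to."""
    futures = {
        name: executors[route(name, action_executor_mapping)].submit(func)
        for name, func in actions
    }
    return {name: future.result() for name, future in futures.items()}


# Hypothetical executors keyed by label, mirroring labeled Parsl executors.
executors = {
    "default": ThreadPoolExecutor(max_workers=1),
    "large": ThreadPoolExecutor(max_workers=4),
}
# Hypothetical mapping: only the expensive action goes to the larger executor.
mapping = {"expensive_action": "large"}
```

One advantage of keeping the routing in a plain dictionary is that the same Pipeline call can be pointed at different resources by changing configuration rather than code.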
QIIME 2 Actions can provide formal (i.e., Parsl-based) or informal (e.g., multi-threaded execution of a third party program) parallel computing support. To learn more about the distinction, see Types of parallel computing support.