Parallel Pipeline configuration#
QIIME 2 provides formal support for parallel computing of Pipelines through Parsl.
Parsl configuration#
A Parsl configuration tells Parsl what resources are available and how to use them, and is required to use Parsl. The Parsl documentation provides full detail on Parsl configuration.
In the context of QIIME 2, Parsl configuration information is maintained in a QIIME 2 configuration file. QIIME 2 configuration files are stored on disk in TOML files.
Default Parsl configuration#
For basic multi-processor usage, QIIME 2 writes a default configuration file the first time it’s needed (e.g., if you instruct QIIME 2 to execute in parallel without a particular configuration).
The default qiime2_config.toml
file, as of QIIME 2 2024.10, looks like the following:
[parsl]
strategy = "None"
[[parsl.executors]]
class = "ThreadPoolExecutor"
label = "tpool"
max_threads = ...
[[parsl.executors]]
class = "HighThroughputExecutor"
label = "default"
max_workers = ...
[parsl.executors.provider]
class = "LocalProvider"
When this file is written to disk, the max_threads
and max_workers
values (represented above by ...
) are computed by QIIME 2 as one less than the CPU count on the computer where it is running (max(psutil.cpu_count() - 1, 1)
).
This configuration defines two Executors
.
The
ThreadPoolExecutor
that parallelizes jobs across multiple threads in a process.The
HighThroughputExecutor
that parallelizes jobs across multiple processes.
In this case, the HighThroughputExecutor
is designated as the default by nature of it’s label
value being default
.
Your parsl configuration must define an executor with the label default
, and this is the executor that QIIME 2 will use to dispatch your jobs to if you do not specify an alternative.
The parsl.Config object
This parsl configuration is ultimately read into a parsl.Config
object internally in QIIME 2.
The parsl.Config
object that corresponds to the above example would look like the following:
config = parsl.Config(
executors=[
ThreadPoolExecutor(
label='tpool',
max_threads=... # will be an integer value
),
HighThroughputExecutor(
label='default',
max_workers=..., # will be an integer value
provider=LocalProvider()
)
],
strategy=None
)
Parsl configuration, line-by-line#
This first line of the default configuration file presented above indicates that this is the parsl section (or table, to use TOML’s terminology) of our configuration file.
[parsl]
The next line:
strategy = "None"
is a top-level Parsl configuration parameter that you can read more about in the Parsl documentation. This may need to be set differently depending on your system.
Next, the first executor is added.
[[parsl.executors]]
class = "ThreadPoolExecutor"
label = "tpool"
max_threads = 7
The double square brackets ([[ ... ]]
) indicates that this is an array, executors
, that is nested under the parsl
table.
class
indicates the specific parsl class that is being configured (parsl.executors.ThreadPoolExecutor
in this case); label
provides a label that you can use to refer to this executor elsewhere; and max_threads
is a configuration value for the ThreadPoolExecutor class which corresponds to a parameter name for the class’s constructor.
In this example a value of 7 is specified for max_threads
, but as noted above this will be computed specifically for your machine when this file is created.
Parsl’s ThreadPoolExecutor
runs on a single node, so we provide a second executor which can utilize up to 2000 nodes.
[[parsl.executors]]
class = "HighThroughputExecutor"
label = "default"
max_workers = 7
[parsl.executors.provider]
class = "LocalProvider"
The definition of this executor, parsl.executors.HighThroughputExecutor
, looks similar to the definition of the ThreadPoolExecutor
, but it additionally defines a provider
.
The provider class provides access to computational resources.
In this case, we use parsl.providers.LocalProvider
, which provides access to local resources (i.e., on the laptop or workstation).
Other providers are available as well, including for Slurm, Amazon Web Services, Kubernetes, and more.
Mapping Actions to executors#
An executor mapping can be added to your parsl configuration that defines which actions should run on which executors. If an action is unmapped, it will run on the default executor. This can be specified as follows:
[parsl.executor_mapping]
action_name = "tpool"
Warning
The mechanism for specifying action names at present does not handle the case of different plugins defining actions with the same name. This mechanism will likely change soon, and may be a breaking change. You can track progress on this here.
Viewing the current configuration#
Using q2cli, you can see your current qiime2_config.toml
file by running:
qiime info --config-level 2
QIIME 2 configuration file precedence#
When QIIME 2 needs configuration information, the following precedence order is followed to load a configuration file:
The path specified in the environment variable
$QIIME2_CONFIG
.The file at
<user_config_dir>/qiime2/qiime2_config.toml
The file at
<site_config_dir>/qiime2/qiime2_config.toml
The file at
$CONDA_PREFIX/etc/qiime2_config.toml
If no configuration is found after checking those four locations, QIIME 2 writes a default configuration file to $CONDA_PREFIX/etc/qiime2_config.toml
and uses that.
This implies that after your first time running QIIME 2 in parallel without a config in at least one of the first 3 locations, the path referenced in step 4 will exist and contain a configuration file.
Alternatively, when using q2cli, you can provide a specific configuration for use in configuring parsl using the --parallel-config
option.
If provided, this overrides the priority order above.
Similarly, when using the Python 3 API, you can provide a specific configuration by passing a parsl.Config
object into your ParallelConfig
context manager.
user_config_dir and site_config_dir
On Linux, user_config_dir
will usually be $HOME/.config/qiime2/
.
On macOS, it will usually be $HOME/Library/Application Support/qiime2/
.
You can get find the directory used on your system by running the following command:
python -c "import appdirs; print(appdirs.user_config_dir('qiime2'))"
On Linux site_config_dir
will usually be something like /etc/xdg/qiime2/
, but it may vary based on Linux distribution.
On macOS it will usually be /Library/Application Support/qiime2/
.
You can get find the directory used on your system by running the following command:
python -c "import appdirs; print(appdirs.site_config_dir('qiime2'))"