Using an Artifact Cache

An artifact cache allows users to have finer control over where and how QIIME 2 Results are stored on disk.

Artifact caches serve two primary purposes:

  1. Providing the user with control over where QIIME 2 stores its working (temporary) files.

  2. Avoiding the overhead of unzipping QIIME 2 Results every time they’re used.

Users can create and interact with artifact caches via both the Python API and the CLI. This tutorial will provide instructions for both.

An artifact cache is created in a specific location on your file system. Once created, it can store Results as unzipped directories rather than as .qza or .qzv files. A Result in a cache is referred to by the path to the cache followed by a user-defined key.
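To make the idea of keyed, unzipped storage concrete, here is a toy sketch using only the Python standard library. It is not how QIIME 2 implements its cache: the ToyCache class, its method names, and the stand-in archive are all hypothetical and exist only to show why paying the unzip cost once at store time makes later reads cheaper.

```python
import tempfile
import zipfile
from pathlib import Path


class ToyCache:
    """Toy keyed store of unzipped archives (hypothetical, not QIIME 2's code)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def store(self, archive_path, key):
        # Pay the unzip cost once, when the entry is added.
        target = self.root / key
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(target)
        return target

    def keys(self):
        # Each entry is a directory named after its key.
        return sorted(p.name for p in self.root.iterdir() if p.is_dir())

    def path_for(self, key):
        return self.root / key


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # Stand-in for a .qza file, which is a zip archive on disk.
        archive_path = tmp / "reference-seqs.zip"
        with zipfile.ZipFile(archive_path, "w") as archive:
            archive.writestr("data/sequences.fasta", ">seq1\nACGT\n")

        cache = ToyCache(tmp / "my-cache")
        cache.store(archive_path, "my-reference")

        # Later reads go straight to the unzipped files; nothing is extracted.
        fasta = cache.path_for("my-reference") / "data" / "sequences.fasta"
        return cache.keys(), fasta.read_text()
```

Calling demo() stores one archive under the key my-reference and then reads a file directly from the unzipped directory, which is the access pattern a cache is meant to enable.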

A use case for the artifact cache

Consider a use case where you have a very large artifact, say an 80 gigabyte reference sequence database, and you regularly use this database as an input to QIIME 2 Actions on a cluster. It would be ideal to avoid unzipping this large database from its .qza file every time it is used. It would also be ideal to store this artifact in a single location that all users and all worker nodes on the cluster can access, to avoid multiple copies of the file being stored on the system (e.g., under different users’ home directories).

These issues can be resolved by putting the reference data artifact in a cache in a location on the cluster’s file system that is globally accessible by the users and worker nodes. By nature of being stored in a cache, the artifact will be stored unzipped, so the action will not need to unzip the artifact before using it. As long as the cache is stored in a location that all worker nodes can access, it will not need to be moved around the filesystem before the action can execute.

Tutorial

The following steps illustrate how to create a new cache, add an artifact to it, use an artifact from it, and more. Each step may depend on some or all of the preceding steps having been run.

First, have QIIME 2 generate some data that we can use.

qiime dwq2 search-and-summarize --example-data ss-usage

Then, change to the directory containing the data.

cd ss-usage/Serial

Creating a cache

Now, let’s create a new artifact cache in the current working directory:

Command line interface
qiime tools cache-create --cache my-cache

Loading entries in a cache

Next, let’s simulate the example described above where you have a reference data set that you’d like to store in the cache, so it doesn’t need to be unzipped every time it’s used.

This will store an artifact in the specified cache (my-cache) with the specified key (my-reference).

Command line interface
qiime tools cache-store \
   --cache my-cache \
   --artifact-path ./reference-seqs.qza \
   --key my-reference

Reviewing entries in a cache

You can confirm that the Artifact was added to the cache as follows.

Command line interface
qiime tools cache-status --cache my-cache
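For readers working from Python rather than the command line, the create, store, and review steps might look roughly like the sketch below. It assumes that the qiime2 package exposes a Cache class with save and get_keys methods and an Artifact.load function; check these against the QIIME 2 Python API reference for your installed version. The function name store_in_cache is a hypothetical helper introduced here for illustration.

```python
def store_in_cache(cache_path, artifact_path, key):
    """Save an artifact into a cache and return the keys the cache then holds.

    Assumes qiime2.Cache and qiime2.Artifact behave as described in the
    QIIME 2 Python API documentation; verify against your installed version.
    """
    # Imported inside the function so it can be defined where QIIME 2
    # is not installed; the import only runs when the function is called.
    from qiime2 import Artifact, Cache

    # Assumption: opens the cache at cache_path, creating it if needed.
    cache = Cache(cache_path)
    artifact = Artifact.load(artifact_path)
    cache.save(artifact, key)
    return cache.get_keys()
```

With the tutorial's file names, the call would be store_in_cache("my-cache", "./reference-seqs.qza", "my-reference"), and the returned keys should include my-reference.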

Using caches with Actions

Now, you can reference the artifact from the cache when calling Actions. Artifacts in a cache are referenced as path-to-cache:key. So, as long as you’re working in the directory that contains your cache, the following command should run. (This command takes a couple of minutes to run. If you want it to go faster, you can run it in parallel.)

Command line interface
qiime dwq2 search-and-summarize \
    --i-query-seqs query-seqs.qza \
    --i-reference-seqs my-cache:my-reference \
    --m-reference-metadata-file reference-metadata.tsv \
    --p-split-size 1 \
    --o-hits hits.qza \
    --o-hits-table hits-table.qzv
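The path-to-cache:key form passed to --i-reference-seqs above can be illustrated with a small parsing sketch. The helper name split_cache_reference is hypothetical, and splitting on the last colon is a choice made for this example; QIIME 2 performs its own parsing, which may differ in detail.

```python
def split_cache_reference(reference):
    """Split a 'path-to-cache:key' string into (cache_path, key).

    Hypothetical helper for illustration; not part of QIIME 2.
    """
    # Split on the last colon so that the key is everything after it.
    cache_path, sep, key = reference.rpartition(":")
    if not sep or not cache_path or not key:
        raise ValueError(f"expected 'path-to-cache:key', got {reference!r}")
    return cache_path, key


# The reference used in the command above.
cache_path, key = split_cache_reference("my-cache:my-reference")
print(cache_path, key)  # prints: my-cache my-reference
```

A plain file path such as hits.qza contains no colon, so this helper raises ValueError for it rather than guessing at a cache.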

Remove a Result from the cache

If there’s an item in your cache that you no longer need, you can remove it. Here, we remove our reference data.

Command line interface
qiime tools cache-remove \
   --cache my-cache \
   --key my-reference

After removing the artifact, check the status of your cache to confirm that it was removed.
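If you want to script the cache management steps shown so far, one option is to drive the same commands from Python. The sketch below only uses the qiime tools subcommands and flags that appear in this tutorial; the function names cache_commands and run_all are hypothetical, and run_all assumes that the qiime executable is on your PATH.

```python
import subprocess


def cache_commands(cache, artifact_path, key):
    """Return the tutorial's cache commands as argument lists, in order."""
    return [
        ["qiime", "tools", "cache-create", "--cache", cache],
        ["qiime", "tools", "cache-store", "--cache", cache,
         "--artifact-path", artifact_path, "--key", key],
        ["qiime", "tools", "cache-status", "--cache", cache],
        ["qiime", "tools", "cache-remove", "--cache", cache, "--key", key],
        # Check the status again to confirm that the entry was removed.
        ["qiime", "tools", "cache-status", "--cache", cache],
    ]


def run_all(commands):
    """Run each command in turn, stopping at the first one that fails."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, run_all(cache_commands("my-cache", "./reference-seqs.qza", "my-reference")) would create the cache, store and then remove the reference artifact, and print the cache status before and after the removal. Building the argument lists separately from running them makes it easy to inspect the commands first.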

There are some other command line tools and APIs accessible to help you interact with your cache(s). You can learn about these as follows.

Command line interface
qiime tools --help

See the tools that begin with cache-.

Remove a cache

If you no longer need your cache, you can just remove the directory from disk as follows.

rm -r my-cache