PECANS Ensemble Runners

PECANS includes the capability to run an ensemble of models, all based on an initial configuration, but with changes made for each ensemble member. This is done through the pecans.ensembles.api module, documented below. You will need to import this API and write a script using it; there is no command line interface for it.

Tutorial

To begin, import the API in your script and create an instance of the EnsembleRunner:

from pecans.ensembles.api import EnsembleRunner

runner = EnsembleRunner(
    base_config_file='examples/one_box_ideal/one_box_ideal.toml',
    ensemble_mode='iterations',
    root_output_dir='test_output'
)

This creates a runner that will use the “one_box_ideal” example as the baseline configuration and will write its output under the test_output directory. (Note that this assumes the script is run from the root of the PECANS repository, so these paths are relative to that directory.) We will come back to the ensemble mode shortly.

Next, we need to define which options should change for each member of the ensemble. Let’s say that we want to test the effect of different lifetimes and initial concentrations on the output. To do that, we would add:

runner.add_ens_var_by_string('CHEMISTRY/mechanism_opts/lifetime_seconds', [3600, 7200, 10800])
runner.add_ens_var_by_string('CHEMISTRY/initial_cond/0/concentration', [2, 20, 200])

The first argument to this method is the path to the option we want to vary. The CHEMISTRY section of our config file is:

[CHEMISTRY]
do_chemistry = true
mechanism = "ideal_first_order"
mechanism_opts = {lifetime_seconds = 3600}

[[CHEMISTRY.initial_cond]]
specie = "A"
initial_type = "point"
center_x = 500
concentration = 1

so the first option, 'CHEMISTRY/mechanism_opts/lifetime_seconds', contains the key for each of the dictionaries we need to access, separated by slashes: “CHEMISTRY” in the top-level dictionary, “mechanism_opts” in the CHEMISTRY dictionary, and “lifetime_seconds” in the mechanism_opts dictionary. Likewise, the second option, 'CHEMISTRY/initial_cond/0/concentration', has dictionary keys for the first, second, and fourth parts of the path, but the initial_cond value is a list (each [[CHEMISTRY.initial_cond]] block adds one element), so the third part of the path is the numeric list index 0.
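The path lookup described above can be sketched in a few lines of Python. This is an illustration only, assuming the TOML file has already been parsed into nested dicts and lists; the helper name get_by_path is hypothetical and is not part of the PECANS API.

```python
def get_by_path(config, path):
    """Follow a slash-separated path such as 'CHEMISTRY/initial_cond/0/concentration'
    through nested dicts and lists (hypothetical helper, not PECANS code)."""
    node = config
    for key in path.split('/'):
        if isinstance(node, list):
            # List elements are addressed by position, so convert the key to an int
            node = node[int(key)]
        else:
            # Dict elements are addressed by their string key
            node = node[key]
    return node


# The CHEMISTRY section shown above, as it looks after parsing the TOML file
config = {
    'CHEMISTRY': {
        'do_chemistry': True,
        'mechanism': 'ideal_first_order',
        'mechanism_opts': {'lifetime_seconds': 3600},
        'initial_cond': [
            {'specie': 'A', 'initial_type': 'point', 'center_x': 500, 'concentration': 1},
        ],
    },
}

print(get_by_path(config, 'CHEMISTRY/mechanism_opts/lifetime_seconds'))  # 3600
print(get_by_path(config, 'CHEMISTRY/initial_cond/0/concentration'))     # 1
```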

The second argument to the add_ens_var_by_string method is the list of values that the option should have in the ensemble members. This is where the ensemble_mode argument of the EnsembleRunner comes in. It can have two values:

  • 'iterations' means that each option modified must have the same number of values, \(n\). The ensemble will have \(n\) members, and for each member \(i\) (where \(0 \leq i < n\)), the modified options will have the value at index \(i\).

  • 'combinations' means that each option modified can have any number of values, and the ensemble will consist of all possible combinations of those values.
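The difference between the two modes can be illustrated with zip and itertools.product from the Python standard library. This is a sketch of the documented semantics rather than the PECANS implementation; the function enumerate_members is hypothetical, and the order in which PECANS actually numbers the members is not guaranteed to match.

```python
import itertools


def enumerate_members(ensemble_variables, mode):
    """Return one {option path: value} dict per ensemble member.

    Hypothetical illustration of the two ensemble modes, not PECANS code.
    """
    names = list(ensemble_variables)
    value_lists = [ensemble_variables[name] for name in names]
    if mode == 'iterations':
        # Member i takes the i-th value of every option. zip stops at the
        # shortest list, so unequal lengths would silently drop values.
        rows = zip(*value_lists)
    elif mode == 'combinations':
        # Every combination of values, with the last option varying fastest
        rows = itertools.product(*value_lists)
    else:
        raise ValueError('unknown mode: {!r}'.format(mode))
    return [dict(zip(names, row)) for row in rows]


ens_vars = {
    'CHEMISTRY/mechanism_opts/lifetime_seconds': [3600, 7200, 10800],
    'CHEMISTRY/initial_cond/0/concentration': [2, 20, 200],
}
print(len(enumerate_members(ens_vars, 'iterations')))    # 3
print(len(enumerate_members(ens_vars, 'combinations')))  # 9
```

With the values from this tutorial, 'iterations' yields three members and 'combinations' yields nine.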

In our case, we chose 'iterations', so our ensemble will have three members:

Member   Lifetime (s)   Init. concentration
1        3600           2
2        7200           20
3        10,800         200

If instead we had chosen 'combinations', our ensemble would have nine members:

Member   Lifetime (s)   Init. concentration
1        3600           2
2        3600           20
3        3600           200
4        7200           2
5        7200           20
6        7200           200
7        10,800         2
8        10,800         20
9        10,800         200

For each of these members, the ensemble runner will create a new directory in test_output to write to. By default, these directories are named pecans_ens_member_INDEX, where INDEX is the ensemble member number (starting from 0). You can change this with the member_naming_fxn argument of EnsembleRunner. Each of those directories will contain all the output files from its respective ensemble member’s run.
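Because the default directory names carry only the member index, it can be handy to keep a record of which values each directory corresponds to. The sketch below builds such a manifest for the 'iterations' example. It assumes the default pecans_ens_member_INDEX naming described above and that members are numbered in list order; the helper build_manifest is hypothetical and not part of PECANS.

```python
import os


def build_manifest(root_output_dir, ensemble_variables):
    """Map each member's output directory to the option values it uses.

    Assumes 'iterations' mode and the default pecans_ens_member_INDEX names
    (hypothetical helper, not part of PECANS).
    """
    names = list(ensemble_variables)
    manifest = {}
    for index, values in enumerate(zip(*(ensemble_variables[n] for n in names))):
        member_dir = os.path.join(root_output_dir, 'pecans_ens_member_{}'.format(index))
        manifest[member_dir] = dict(zip(names, values))
    return manifest


ens_vars = {
    'CHEMISTRY/mechanism_opts/lifetime_seconds': [3600, 7200, 10800],
    'CHEMISTRY/initial_cond/0/concentration': [2, 20, 200],
}
for member_dir, opts in build_manifest('test_output', ens_vars).items():
    print(member_dir, opts)
```

The resulting dict could, for example, be saved next to the ensemble output with json.dump so the mapping is not lost.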

There is one last step: we have to execute the ensemble by calling its run method. Putting it all together, the complete example is:

from pecans.ensembles.api import EnsembleRunner

runner = EnsembleRunner(
    base_config_file='examples/one_box_ideal/one_box_ideal.toml',
    ensemble_mode='iterations',
    root_output_dir='test_output'
)

runner.add_ens_var_by_string('CHEMISTRY/mechanism_opts/lifetime_seconds', [3600, 7200, 10800])
runner.add_ens_var_by_string('CHEMISTRY/initial_cond/0/concentration', [2, 20, 200])

runner.run()

Ensemble API functions

exception pecans.ensembles.api.EnsembleError

Error type used for problems in setting up an ensemble run

class pecans.ensembles.api.EnsembleRunner(base_config_file: str, ensemble_variables: dict | None = None, ensemble_mode: str = 'iterations', save_in_individual_dirs: bool = True, save_final_output_only: bool = False, member_naming_fxn=<function _make_member_name>, root_output_dir: str | None = '.')

Manager class to help run an ensemble of PECANS instances.

Parameters:
  • base_config_file

    the path to the configuration file to use as a starting point for the ensemble.

    Note

    The output_path argument in the configuration is ignored; the output path is set by the root_output_dir, member_naming_fxn, and save_in_individual_dirs options of this class.

  • ensemble_variables

    the configuration variables to modify for the different ensemble members. This must be a dictionary where the keys are the options given as strings and the values are lists of the values that each ensemble member will have (see the tutorial for details). The keys will have the form "key1/key2/key3" and map to the configuration as config[key1][key2][key3]. For list configuration elements, give the index as part of the key (e.g. "CHEMISTRY/initial_cond/0/concentration") and it will be automatically converted to an integer if needed.

    Alternatively, you can construct the ensemble runner without this argument and add values later using the add_ens_var_by_string() method.

  • ensemble_mode

    this determines how the ensemble_variables are varied. Possible options are:

    • 'iterations' - each ensemble variable has its values specified in an iterable (e.g. a list or numpy array); the nth ensemble member will use the nth value for each variable.

    • 'combinations' - each ensemble variable has its possible values specified in an iterable, but unlike 'iterations', all possible combinations are tested.

    If running in 'iterations' mode, then the length of the value lists in ensemble_variables must all be equal.

    See the tutorial for detailed examples.

  • root_output_dir – the root directory to place the ensemble output in. Default is the current directory.

  • member_naming_fxn

    a function that accepts the ensemble member index as an integer and the member’s options as keyword arguments, and returns a string that should be a unique name for that ensemble member. The default returns "pecans_ens_member_N", where N is the ensemble member index. This name is used for the member’s output directory name (if save_in_individual_dirs is True) and, with “.nc” appended, for the output file name (if save_final_output_only is True).

    You can use this to set the output names to something that incorporates the varied ensemble variables into the file/directory names. For example, if dx and dy are being varied, you could do:

    def custom_name(member_index, **config_opts):
        dx = config_opts['DOMAIN/dx'] / 1000  # convert to kilometers
        dy = config_opts['DOMAIN/dy'] / 1000
    
        return 'pecans_ens_dx-{}km_dy-{}km'.format(dx, dy)
    
    ensemble = EnsembleRunner( ... , member_naming_fxn=custom_name)
    

    This would put the dx and dy values (converted to kilometers) into the file or directory names.

  • save_in_individual_dirs – optional, default is True. This creates separate directories for the output of each ensemble member, named using the member_naming_fxn. Each member’s output is saved in the corresponding directory.

  • save_final_output_only – optional, default is False, meaning that each ensemble member will save output at the output frequency defined in its configuration. If True, only the final state of the model will be saved, and it will be named by the member_naming_fxn plus the ‘.nc’ extension. If this is False, save_in_individual_dirs will automatically be set to True; if save_in_individual_dirs is not already True, a warning is issued.
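The length requirement for 'iterations' mode can be checked before any member runs. The sketch below is a hypothetical pre-flight check written independently of PECANS: it defines its own stand-in exception (EnsembleSetupError) instead of importing EnsembleError, and the function count_members reports how many members each mode would produce.

```python
import math


class EnsembleSetupError(Exception):
    """Hypothetical stand-in for pecans.ensembles.api.EnsembleError."""


def count_members(ensemble_variables, ensemble_mode='iterations'):
    """Return how many members an ensemble would have, raising
    EnsembleSetupError if the settings are inconsistent (hypothetical check)."""
    if not ensemble_variables:
        raise EnsembleSetupError('no ensemble variables given')
    lengths = {name: len(values) for name, values in ensemble_variables.items()}
    if ensemble_mode == 'iterations':
        # Every value list must have the same length n; the ensemble has n members
        if len(set(lengths.values())) != 1:
            raise EnsembleSetupError(
                "'iterations' mode needs equal-length value lists, got {}".format(lengths))
        return next(iter(lengths.values()))
    if ensemble_mode == 'combinations':
        # One member per combination: the product of the list lengths
        return math.prod(lengths.values())
    raise EnsembleSetupError('unknown ensemble_mode: {!r}'.format(ensemble_mode))


ens_vars = {
    'CHEMISTRY/mechanism_opts/lifetime_seconds': [3600, 7200, 10800],
    'CHEMISTRY/initial_cond/0/concentration': [2, 20, 200],
}
print(count_members(ens_vars, 'iterations'))    # 3
print(count_members(ens_vars, 'combinations'))  # 9
```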

add_ens_var_by_string(ensemble_variable: str, values: Sequence[Any])

Add an option that should be varied among the different ensemble members.

Parameters:
  • ensemble_variable – The option to vary as a string, must follow the same format as the keys for the ensemble_variables argument of the class constructor.

  • values – values describing how that option should be varied among the different ensemble members. The required form varies depending on the ensemble_mode option set during initialization. See the class documentation for specifics.

Returns:

none

run()

Carry out all the ensemble simulations.

Returns:

none

run_one_member(member_index, config_opts)

Run a single member of the ensemble.

Parameters:
  • member_index (int) – a unique index identifying which member of the ensemble this is.

  • config_opts

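    a dictionary whose keys specify the options to change from the base configuration (in the same "key1/key2/key3" form as the ensemble_variables keys) and whose values are the new values for this member. For example: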

    runner.run_one_member(0, {'DOMAIN/dx': 1000, 'DOMAIN/dy': 2000})
    

    will run a member of the ensemble (#0) with the dx value set to 1000 m and dy to 2000 m.

Returns:

none

This method is usually called internally by the run method, which automatically iterates over all ensemble members. In some cases, however, you may want more control over how the different members are run while still taking advantage of this method’s built-in ability to modify only a few of the configuration options and to redirect model output to a member-specific file or directory.
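One way to picture what run_one_member does with config_opts is to apply each path/value pair to a copy of the base configuration, leaving the original untouched. The sketch below assumes the configuration is held as nested dicts and lists; the function apply_member_opts is hypothetical, does not reproduce PECANS’s own code, and does not handle output redirection.

```python
import copy


def apply_member_opts(base_config, config_opts):
    """Return a deep copy of base_config with each 'key1/key2/...' option set
    to its new value (hypothetical helper, not PECANS code)."""
    member_config = copy.deepcopy(base_config)  # leave the shared base config untouched
    for path, value in config_opts.items():
        *parents, last = path.split('/')
        node = member_config
        for key in parents:
            # Numeric path parts index into lists; other parts are dict keys
            node = node[int(key)] if isinstance(node, list) else node[key]
        if isinstance(node, list):
            node[int(last)] = value
        else:
            node[last] = value
    return member_config


# The CHEMISTRY section from the tutorial, as parsed into Python objects
base_config = {
    'CHEMISTRY': {
        'do_chemistry': True,
        'mechanism': 'ideal_first_order',
        'mechanism_opts': {'lifetime_seconds': 3600},
        'initial_cond': [
            {'specie': 'A', 'initial_type': 'point', 'center_x': 500, 'concentration': 1},
        ],
    },
}

member_config = apply_member_opts(base_config, {
    'CHEMISTRY/mechanism_opts/lifetime_seconds': 7200,
    'CHEMISTRY/initial_cond/0/concentration': 20,
})
print(member_config['CHEMISTRY']['mechanism_opts'])  # {'lifetime_seconds': 7200}
print(base_config['CHEMISTRY']['mechanism_opts'])    # {'lifetime_seconds': 3600}
```

With a real runner, a dictionary of the same shape is what the documented signature expects, e.g. runner.run_one_member(1, {'CHEMISTRY/mechanism_opts/lifetime_seconds': 7200, 'CHEMISTRY/initial_cond/0/concentration': 20}).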