PECANS Ensemble Runners
PECANS includes the capability to run an ensemble of models, all based on an initial configuration,
but with changes made for each ensemble member. This is done through the pecans.ensembles.api
module, documented below. You will need to import this API and write a script using it; there is no
command line interface for it.
Tutorial
To begin, import the API in your script and create an instance of the EnsembleRunner:
from pecans.ensembles.api import EnsembleRunner
runner = EnsembleRunner(
    base_config_file='examples/one_box_ideal/one_box_ideal.toml',
    ensemble_mode='iterations',
    root_output_dir='test_output'
)
This has created a runner that will use the “one_box_ideal” example as the baseline configuration and which will
output its data under the test_output directory. (Note, this assumes we will run this script from the root
of the PECANS repo, so these paths are relative to that.) We’ll come back to the ensemble mode in a bit.
Next we need to define which options should change for each member of the ensemble. Let’s say that we want to test the effect of different lifetimes and initial concentrations on the output. To do that, we would do:
runner.add_ens_var_by_string('CHEMISTRY/mechanism_opts/lifetime_seconds', [3600, 7200, 10800])
runner.add_ens_var_by_string('CHEMISTRY/initial_cond/0/concentration', [2, 20, 200])
The first argument to this method is the path to the option we want to vary. The CHEMISTRY section of our
config file is:
[CHEMISTRY]
do_chemistry = true
mechanism = "ideal_first_order"
mechanism_opts = {lifetime_seconds = 3600}
[[CHEMISTRY.initial_cond]]
specie = "A"
initial_type = "point"
center_x = 500
concentration = 1
so the first option, 'CHEMISTRY/mechanism_opts/lifetime_seconds', contains the keys for each of the dictionaries we need
to access, separated by slashes: “CHEMISTRY” in the top dictionary, “mechanism_opts” in the CHEMISTRY dict, and
“lifetime_seconds” in the mechanism_opts dict. Likewise, the second option, 'CHEMISTRY/initial_cond/0/concentration',
has dictionary keys for the first, second, and fourth parts of the path, but the initial_cond value is a list, so the third part is
the numeric list index 0.
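The path lookup described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the code PECANS uses internally: the helper name `set_by_path` is hypothetical, and the `config` dictionary is a hand-written stand-in for the parsed CHEMISTRY section shown earlier.

```python
from typing import Any


def set_by_path(config: dict, path: str, value: Any) -> None:
    """Set a nested option given a slash-separated path (hypothetical helper)."""
    keys = path.split('/')
    target = config
    # Walk down to the container that holds the final key.
    for key in keys[:-1]:
        # Lists are indexed by integers, dictionaries by string keys.
        target = target[int(key)] if isinstance(target, list) else target[key]
    last = keys[-1]
    if isinstance(target, list):
        target[int(last)] = value
    else:
        target[last] = value


# Stand-in for the parsed configuration (assumed structure).
config = {
    'CHEMISTRY': {
        'mechanism_opts': {'lifetime_seconds': 3600},
        'initial_cond': [{'specie': 'A', 'concentration': 1}],
    }
}

set_by_path(config, 'CHEMISTRY/mechanism_opts/lifetime_seconds', 7200)
set_by_path(config, 'CHEMISTRY/initial_cond/0/concentration', 20)
```

The "0" in the second path is converted to an integer only because the value reached at that point is a list; every other part of the path is used as a dictionary key.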
The second argument to the add_ens_var_by_string method is the list of values that the option should take across the ensemble members.
This is where the ensemble_mode argument for the EnsembleRunner comes in. It can have two values:
- 'iterations' means that each option modified must have the same number of values, \(n\). The ensemble will have \(n\) members, and for each member \(i\) (where \(0 \leq i < n\)), the modified options will have the value at index \(i\).
- 'combinations' means that each option modified can have any number of values, and the ensemble will consist of all possible combinations of those values.
In our case, we chose 'iterations', so our ensemble will have three members:
| Member | Lifetime | Init. concentration |
|---|---|---|
| 1 | 3600 | 2 |
| 2 | 7200 | 20 |
| 3 | 10800 | 200 |
If instead we had chosen 'combinations', our ensemble would have nine members:
| Member | Lifetime | Init. concentration |
|---|---|---|
| 1 | 3600 | 2 |
| 2 | 3600 | 20 |
| 3 | 3600 | 200 |
| 4 | 7200 | 2 |
| 5 | 7200 | 20 |
| 6 | 7200 | 200 |
| 7 | 10800 | 2 |
| 8 | 10800 | 20 |
| 9 | 10800 | 200 |
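The difference between the two modes can be reproduced with the standard library. The sketch below is an illustration under assumptions, not the PECANS implementation: `expand_members` is a hypothetical helper, and it raises a plain `ValueError` where PECANS would presumably use its own error type.

```python
from itertools import product


def expand_members(variables: dict, mode: str) -> list:
    """Return one {option: value} dict per ensemble member (hypothetical helper)."""
    names = list(variables)
    value_lists = [list(variables[name]) for name in names]
    if mode == 'iterations':
        # Every option must supply the same number of values.
        if len({len(values) for values in value_lists}) > 1:
            raise ValueError('all options need the same number of values')
        rows = zip(*value_lists)      # member i takes the value at index i
    elif mode == 'combinations':
        rows = product(*value_lists)  # every possible combination of values
    else:
        raise ValueError('unknown ensemble mode: {}'.format(mode))
    return [dict(zip(names, row)) for row in rows]


variables = {
    'CHEMISTRY/mechanism_opts/lifetime_seconds': [3600, 7200, 10800],
    'CHEMISTRY/initial_cond/0/concentration': [2, 20, 200],
}
iteration_members = expand_members(variables, 'iterations')      # 3 members
combination_members = expand_members(variables, 'combinations')  # 9 members
```

Because `itertools.product` varies the last option fastest, the nine combinations come out in the same order as the rows of the table above.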
For each of these members, the ensemble runner will create a new directory in test_output to write to.
By default, these directories will have the name pecans_ens_member_INDEX, with INDEX being the ensemble member
number (starting from 0). You can change this - see the member_naming_fxn argument of EnsembleRunner.
Each of those directories will have all the output files from its respective ensemble member’s run.
There’s one last step: we have to execute the ensemble by calling its run method. Putting it all together, this example is:
from pecans.ensembles.api import EnsembleRunner
runner = EnsembleRunner(
    base_config_file='examples/one_box_ideal/one_box_ideal.toml',
    ensemble_mode='iterations',
    root_output_dir='test_output'
)
runner.add_ens_var_by_string('CHEMISTRY/mechanism_opts/lifetime_seconds', [3600, 7200, 10800])
runner.add_ens_var_by_string('CHEMISTRY/initial_cond/0/concentration', [2, 20, 200])
runner.run()
Ensemble API functions
- exception pecans.ensembles.api.EnsembleError
Error type used for problems in setting up an ensemble run
- class pecans.ensembles.api.EnsembleRunner(base_config_file: str, ensemble_variables: dict | None = None, ensemble_mode: str = 'iterations', save_in_individual_dirs: bool = True, save_final_output_only: bool = False, member_naming_fxn=<function _make_member_name>, root_output_dir: str | None = '.')
Manager class to help run an ensemble of PECANS instances.
- Parameters:
base_config_file –
the path to the configuration file to use as a starting point for the ensemble.
Note
The output_path argument in the configuration is ignored; the output path is set by the root_output_dir, member_naming_fxn, and save_in_individual_dirs options of this class.
ensemble_variables –
the configuration variables to modify for the different ensemble members. This must be a dictionary where the keys are the options given as strings and the values are lists of the values that each ensemble member will have (see the tutorial for details). The keys will have the form "key1/key2/key3" and map to the configuration as config[key1][key2][key3]. For list configuration elements, give the index as part of the key (e.g. "CHEMISTRY/initial_cond/0/concentration") and it will be automatically converted to an integer if needed.
Alternatively, you can construct the ensemble runner without this argument and add values later using the add_ens_var_by_string() method.
ensemble_mode –
this determines how the ensemble_variables are varied. Possible options are:
- 'iterations': each ensemble variable has its values specified in an iterable (i.e. list or numpy array); the nth ensemble member will use the nth value for each variable.
- 'combinations': each ensemble variable has its possible values specified in an iterable, but unlike 'iterations', all possible combinations are tested.
If running in 'iterations' mode, then the length of the value lists in ensemble_variables must all be equal.
See the tutorial for detailed examples.
root_output_dir – the root directory to place the ensemble output in. Default is the current directory.
member_naming_fxn –
a function that accepts the ensemble member index as an integer and the member’s options as keyword arguments and returns a string that should be a unique name for that ensemble member. The default will return "pecans_ens_member_N" where N is the ensemble member index. This name will be used for the member’s output directory name (if save_in_individual_dirs is True) and, with “.nc” appended, the output file name (if save_final_output_only is True).
You can use this to set the output names to something that incorporates the varied ensemble variables into the file/directory names. For example, if dx and dy are being varied, you could do:
def custom_name(member_index, **config_opts):
    dx = config_opts['DOMAIN/dx'] / 1000  # convert to kilometers
    dy = config_opts['DOMAIN/dy'] / 1000
    return 'pecans_ens_dx-{}km_dy-{}km'.format(dx, dy)

ensemble = EnsembleRunner( ... , member_naming_fxn=custom_name)
This would put the dx and dy values (converted to kilometers) into the file or directory names.
save_in_individual_dirs – optional, default is True. This creates separate directories for the output of each ensemble member, named using the member_naming_fxn. Each member’s output is saved in the corresponding directory.
save_final_output_only – optional, default is False, meaning that each ensemble member will save output at the output frequency defined in its configuration. If True, only the final state of the model will be saved, and it will be named by the member_naming_fxn plus the ‘.nc’ extension. If this is False, save_in_individual_dirs will automatically be set to True; if save_in_individual_dirs is not already True, a warning is issued.
- add_ens_var_by_string(ensemble_variable: str, values: Sequence[Any])
Add an option that should be varied among the different ensemble members.
- Parameters:
ensemble_variable – The option to vary as a string, must follow the same format as the keys for the ensemble_variables argument of the class constructor.
values – values describing how that option should be varied among the different ensemble members. The required form varies depending on the ensemble_mode option set during initialization. See the class documentation for specifics.
- Returns:
none
- run_one_member(member_index, config_opts)
Run a single member of the ensemble.
- Parameters:
member_index (int) – a unique index identifying which member of the ensemble this is.
config_opts –
dictionary where each key specifies an option to change from the base configuration and each value is the value to use for that option. For example:
run_one_member(0, {'DOMAIN/dx':1000, 'DOMAIN/dy':2000})
will run a member of the ensemble (#0) with the dx value set to 1000 m and dy to 2000 m.
- Returns:
none
This method is usually called internally by the run method, which automatically iterates over all ensemble members, but in some cases you may want more control over how the different members are run, while still taking advantage of this method’s built-in capability to only modify a few of the configuration options and redirect model output to a member-specific file or directory.
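As a rough sketch of what such manual control might look like, the loop below calls run_one_member itself and skips members that should not be rerun. The helper `run_selected_members` and the `RecordingRunner` stand-in are hypothetical and exist only so the example runs without PECANS installed; with a real ensemble you would pass an EnsembleRunner instance instead. Whether run_one_member needs any other setup beforehand is not covered here.

```python
def run_selected_members(runner, members, skip=()):
    """Run each member's options through runner.run_one_member, skipping some.

    `runner` is any object with a run_one_member(member_index, config_opts)
    method, such as an EnsembleRunner. Returns the indices that were run.
    """
    completed = []
    for index, config_opts in enumerate(members):
        if index in skip:
            continue  # e.g. a member whose output already exists
        runner.run_one_member(index, config_opts)
        completed.append(index)
    return completed


class RecordingRunner:
    """Hypothetical stand-in for EnsembleRunner that only records its calls."""

    def __init__(self):
        self.calls = []

    def run_one_member(self, member_index, config_opts):
        self.calls.append((member_index, config_opts))


# Option dictionaries use the same "key1/key2" form as the example above.
members = [
    {'DOMAIN/dx': 1000, 'DOMAIN/dy': 2000},
    {'DOMAIN/dx': 2000, 'DOMAIN/dy': 2000},
    {'DOMAIN/dx': 4000, 'DOMAIN/dy': 2000},
]
stub = RecordingRunner()
completed = run_selected_members(stub, members, skip={1})
```

The member index is passed through unchanged, so a skipped member leaves a gap in the numbering rather than shifting the names of later members.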