Already Implemented Benchmarks
SnAr Benchmark
- class summit.benchmarks.SnarBenchmark(noise_level=0, **kwargs)
Benchmark representing a nucleophilic aromatic substitution (SnAr) reaction
The SnAr reaction occurs in a plug flow reactor where residence time, stoichiometry and temperature can be adjusted. The objectives are to maximize space-time yield (STY) and minimize E-factor.
- Parameters
noise_level (float, optional) – The mean of the random noise added to the concentration measurements in terms of percent of the signal. Default is 0.
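The parameter description leaves the exact noise model open. The sketch below shows one plausible reading, assuming each concentration is perturbed by zero-centred Gaussian noise whose standard deviation is noise_level percent of that concentration. The helper name add_percent_noise and the Gaussian choice are illustrative assumptions, not code taken from Summit.

```python
import random


def add_percent_noise(concentrations, noise_level, seed=None):
    """Perturb each value by Gaussian noise scaled to a percentage of the value.

    Assumption for illustration: the noise standard deviation equals
    noise_level percent of the signal. Summit's actual model may differ.
    """
    rng = random.Random(seed)
    return [c + c * rng.gauss(0.0, noise_level) / 100.0 for c in concentrations]


clean = [0.8, 0.1, 0.05]  # made-up concentrations, mol/L
noisy = add_percent_noise(clean, noise_level=1.0, seed=0)
unchanged = add_percent_noise(clean, noise_level=0.0, seed=0)
print(noisy)
print(unchanged == clean)
```

With noise_level=0 the values come back untouched, which matches the documented default of a noise-free benchmark.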
Examples
>>> import numpy as np
>>> from summit.benchmarks import SnarBenchmark
>>> from summit.utils.dataset import DataSet
>>> b = SnarBenchmark()
>>> columns = [v.name for v in b.domain.variables]
>>> values = [v.bounds[0]+0.1*(v.bounds[1]-v.bounds[0]) for v in b.domain.variables]
>>> values = np.array(values)
>>> values = np.atleast_2d(values)
>>> conditions = DataSet(values, columns=columns)
>>> results = b.run_experiments(conditions)
Notes
This benchmark relies on the kinetics observed by [Hone] et al. The mechanistic model is integrated using scipy to find the outlet concentrations of all species. These concentrations are then used to calculate STY and E-factor.
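As a rough illustration of the workflow in these notes, the sketch below integrates a hypothetical single-step reaction A + B -> P over the residence time of a plug flow reactor and derives STY and E-factor from the outlet concentrations. The rate constant, molecular weights and function names are made-up placeholders, not the kinetics of Hone et al. or Summit's implementation, and a fixed-step RK4 integrator stands in for the scipy solver to keep the example dependency-free.

```python
def integrate_pfr(c_a0, c_b0, k, tau, steps=1000):
    """Integrate A + B -> P (rate = k*[A]*[B]) over residence time tau using RK4.

    Concentrations in mol/L, k in L/(mol*min), tau in min.
    Returns the outlet concentrations (c_a, c_b, c_p).
    """
    def rhs(c):
        rate = k * c[0] * c[1]
        return (-rate, -rate, rate)

    c = (c_a0, c_b0, 0.0)
    h = tau / steps
    for _ in range(steps):
        k1 = rhs(c)
        k2 = rhs(tuple(ci + 0.5 * h * ki for ci, ki in zip(c, k1)))
        k3 = rhs(tuple(ci + 0.5 * h * ki for ci, ki in zip(c, k2)))
        k4 = rhs(tuple(ci + h * ki for ci, ki in zip(c, k3)))
        c = tuple(
            ci + h / 6.0 * (a + 2.0 * b + 2.0 * d + e)
            for ci, a, b, d, e in zip(c, k1, k2, k3, k4)
        )
    return c


def sty_and_e_factor(c_out, mw, tau):
    """Return STY in g/(L*min) and E-factor (mass of waste per mass of product).

    Simplification: only unreacted A and B count as waste; solvent is ignored.
    """
    c_a, c_b, c_p = c_out
    product_mass = c_p * mw["P"]  # grams per litre of reactor volume
    waste_mass = c_a * mw["A"] + c_b * mw["B"]
    return product_mass / tau, waste_mass / product_mass


mw = {"A": 100.0, "B": 50.0, "P": 150.0}  # hypothetical molecular weights, g/mol
c_out = integrate_pfr(c_a0=1.0, c_b0=1.5, k=0.5, tau=2.0)
sty, e_factor = sty_and_e_factor(c_out, mw, tau=2.0)
print(f"STY = {sty:.3f} g/(L*min), E-factor = {e_factor:.3f}")
```

The two objectives pull in opposite directions here: a shorter residence time raises STY per unit time but leaves more unreacted material, which raises the E-factor.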
References
- Hone
C. A. Hone et al., React. Chem. Eng., 2017, 2, 103–108. DOI: 10.1039/C6RE00109B
- property data
Dataset of all experiments run
- property domain
The domain for the experiment
- pareto_plot(objectives=None, colorbar=False, ax=None)
Make a 2D Pareto plot of the experiments thus far
- Parameters
objectives (array-like, optional) – List of names of objectives to plot. By default picks the first two objectives
ax (matplotlib.pyplot.axes, optional) – An existing axis to apply the plot to
- Returns
If ax is None, returns a tuple of the new figure and the axis.
If ax is a matplotlib axis, returns only the axis.
- Raises
ValueError – If the number of objectives is not equal to two
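For readers unfamiliar with what a 2D Pareto plot highlights, here is a minimal sketch of selecting the non-dominated experiments when the first objective is maximized and the second is minimized, as with STY and E-factor. The function pareto_front and the data points are invented for illustration and are not how pareto_plot is implemented.

```python
def pareto_front(points):
    """Return the non-dominated points, maximizing the first value
    and minimizing the second."""
    front = []
    for i, (sty_i, e_i) in enumerate(points):
        dominated = any(
            sty_j >= sty_i and e_j <= e_i and (sty_j > sty_i or e_j < e_i)
            for j, (sty_j, e_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((sty_i, e_i))
    return sorted(front)


# Made-up (STY, E-factor) pairs for five experiments
experiments = [(100.0, 5.0), (200.0, 8.0), (150.0, 9.0), (50.0, 4.0), (120.0, 5.0)]
front = pareto_front(experiments)
print(front)  # [(50.0, 4.0), (120.0, 5.0), (200.0, 8.0)]
```

The point (100.0, 5.0) drops out because (120.0, 5.0) has a higher STY at the same E-factor, and (150.0, 9.0) drops out because (200.0, 8.0) is better on both objectives.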
- reset()
Reset the experiment
This will clear all data.
- run_experiments(conditions, computation_time=None, **kwargs)
Run the experiment(s) at the given conditions
- Parameters
conditions (summit.utils.dataset.DataSet) – A dataset of the experiment(s) to run, with columns matching the variables in the domain.
computation_time (float, optional) – The time used by the strategy in calculating the next experiments. By default, the time since the last call to run_experiments is used.
Cross-Coupling Emulator Benchmarks
- summit.benchmarks.get_pretrained_reizman_suzuki_emulator(case=1)
Get the pretrained Reizman Suzuki Emulator
- Parameters
case (int, optional, default=1) – Reizman et al. (2016) reported experimental data for 4 different cases. Each case has a different set of substrates but the same possible catalysts. Please see their paper for more information on the cases.
Examples
>>> import matplotlib.pyplot as plt
>>> from summit.benchmarks import get_pretrained_reizman_suzuki_emulator
>>> from summit.utils.dataset import DataSet
>>> import pandas as pd
>>> b = get_pretrained_reizman_suzuki_emulator(case=1)
>>> fig, ax = b.parity_plot(include_test=True)
>>> plt.show()
>>> columns = [v.name for v in b.domain.variables]
>>> values = {"catalyst": ["P1-L3"], "t_res": [600], "temperature": [30], "catalyst_loading": [0.498]}
>>> conditions = pd.DataFrame(values)
>>> conditions = DataSet.from_df(conditions)
>>> results = b.run_experiments(conditions, return_std=True)
- class summit.benchmarks.ReizmanSuzukiEmulator(case=1, **kwargs)
Reizman Suzuki Emulator
Virtual experiments representing the Suzuki-Miyaura cross-coupling reaction, similar to Reizman et al. (2016). Experimental outcomes are based on an emulator that is trained on the experimental data published by Reizman et al.
You should use get_pretrained_reizman_suzuki_emulator to get a pretrained version.
- Parameters
case (int, optional, default=1) – Reizman et al. (2016) reported experimental data for 4 different cases. Each case has a different set of substrates but the same possible catalysts. Please see their paper for more information on the cases.
Examples
>>> reizman_emulator = ReizmanSuzukiEmulator(case=1)
Notes
This benchmark is based on data from [Reizman] et al.
References
- Reizman
B. J. Reizman et al., React. Chem. Eng., 2016, 1, 658–666. DOI: 10.1039/C6RE00153J.
- property data
Dataset of all experiments run
- property domain
The domain for the experiment
- classmethod from_dict(d, **kwargs)
Create ExperimentalEmulator from a dictionary
Notes
This does not load the regressor weights and biases. After calling from_dict, call load_regressor to load the weights and biases.
- classmethod load(save_dir, case=1, **kwargs)
Load all the essential parameters of the ExperimentalEmulator from disk
- Parameters
save_dir (str or pathlib.Path) – The directory from which to load emulator files.
Notes
This loads the parameters needed to reproduce results but not the associated data. You can separately load X_test, y_test, X_train, and y_train attributes if you want to be able to reproduce splits, test results and parity plots.
Examples
>>> from summit import *
>>> import pkg_resources, pathlib
>>> DATA_PATH = pathlib.Path(pkg_resources.resource_filename("summit", "benchmarks/data"))
>>> model_name = "reizman_suzuki_case_1"
>>> domain = ReizmanSuzukiEmulator.setup_domain()
>>> ds = DataSet.read_csv(DATA_PATH / f"{model_name}.csv")
>>> exp = ExperimentalEmulator(model_name, domain, dataset=ds, regressor=ANNRegressor)
>>> res = exp.train(max_epochs=10)
>>> exp.save("reizman_test")
>>> # Load data for new experimental emulator
>>> exp_new = ExperimentalEmulator.load(model_name, "reizman_test")
>>> exp_new.X_train, exp_new.y_train, exp_new.X_test, exp_new.y_test = exp.X_train, exp.y_train, exp.X_test, exp.y_test
>>> res = exp_new.test()
>>> fig, ax = exp_new.parity_plot(include_test=True)
- load_regressor(save_dir)
Load the weights and biases of the regressor from disk
- Parameters
save_dir (str or pathlib.Path) – The directory used for saving emulator files.
- pareto_plot(objectives=None, colorbar=False, ax=None)
Make a 2D Pareto plot of the experiments thus far
- Parameters
objectives (array-like, optional) – List of names of objectives to plot. By default picks the first two objectives
ax (matplotlib.pyplot.axes, optional) – An existing axis to apply the plot to
- Returns
If ax is None, returns a tuple of the new figure and the axis.
If ax is a matplotlib axis, returns only the axis.
- Raises
ValueError – If the number of objectives is not equal to two
- parity_plot(**kwargs)
Produce a parity plot for the trained model using matplotlib
- Parameters
output_variable_names (str or list, optional) – The output variables to plot. Defaults to all.
include_test (bool, optional) – Include the performance of the model on the test set. Defaults to False.
train_color (str, optional) – Hex string for the train points. Defaults to "#6f3666".
test_color (str, optional) – Hex string for the test points. Defaults to "#3c328c".
- reset()
Reset the experiment
This will clear all data.
- run_experiments(conditions, computation_time=None, **kwargs)
Run the experiment(s) at the given conditions
- Parameters
conditions (summit.utils.dataset.DataSet) – A dataset of the experiment(s) to run, with columns matching the variables in the domain.
computation_time (float, optional) – The time used by the strategy in calculating the next experiments. By default, the time since the last call to run_experiments is used.
- save(save_dir)
Save all the essential parameters of the ExperimentalEmulator to disk
- Parameters
save_dir (str or pathlib.Path) – The directory used for saving emulator files.
Notes
This saves the parameters needed to reproduce results but not the associated data. You can separately save X_test, y_test, X_train, and y_train attributes if you want to be able to reproduce splits, test results and parity plots.
Examples
>>> from summit import *
>>> import pkg_resources, pathlib
>>> DATA_PATH = pathlib.Path(pkg_resources.resource_filename("summit", "benchmarks/data"))
>>> model_name = "reizman_suzuki_case_1"
>>> domain = ReizmanSuzukiEmulator.setup_domain()
>>> ds = DataSet.read_csv(DATA_PATH / f"{model_name}.csv")
>>> exp = ExperimentalEmulator(model_name, domain, dataset=ds, regressor=ANNRegressor)
>>> res = exp.train(max_epochs=10)
>>> exp.save("reizman_test/")
>>> # Load data for new experimental emulator
>>> exp_new = ExperimentalEmulator.load(model_name, "reizman_test/")
>>> exp_new.X_train, exp_new.y_train, exp_new.X_test, exp_new.y_test = exp.X_train, exp.y_train, exp.X_test, exp.y_test
>>> res = exp_new.test()
>>> fig, ax = exp_new.parity_plot(include_test=True)
- save_regressor(save_dir)
Save the weights and biases of the regressor to disk
- Parameters
save_dir (str or pathlib.Path) – The directory used for saving emulator files.
- test(**kwargs)
Get test results
This requires that train has already been called or the ExperimentalEmulator was initialized from a pretrained model.
- Parameters
scoring (str or list, optional) – A list of scoring functions or names of them. Defaults to R2 and MSE. For more options, see https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
X_test (np.ndarray, optional) – Test X inputs
y_test (np.ndarray, optional) – Corresponding test labels
Notes
The method loops over the predictors, so the resulting scores are averaged over all objectives for each of the predictors. In contrast, the parity_plot code gives the scores for each objective averaged over the predictors.
- Returns
scores_dict – A dictionary of scores with test_SCORE as the key and values as an array of scores for each of the models in the ensemble.
- Return type
dict
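The difference between the two averaging directions described in the notes can be seen on a small score matrix. The sketch below uses invented R2 values with one row per predictor in the ensemble and one column per objective. The key name test_r2 is only a guess that follows the test_SCORE pattern mentioned above.

```python
from statistics import mean

# Made-up R2 scores: rows are predictors in the ensemble, columns are objectives
r2 = [
    [0.90, 0.70],
    [0.80, 0.60],
    [0.85, 0.65],
]

# test(): one score per predictor, averaged over the objectives
per_predictor = [mean(row) for row in r2]

# parity_plot: one score per objective, averaged over the predictors
per_objective = [mean(column) for column in zip(*r2)]

scores_dict = {"test_r2": per_predictor}  # hypothetical key name
print(per_predictor)
print(per_objective)
```

The first list has three entries (one per predictor, roughly 0.8, 0.7 and 0.75), while the second has two (one per objective, roughly 0.85 and 0.65), so the two outputs are not directly comparable.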
- train(**kwargs)
Train the model on the dataset
This will automatically do a train-test split and then train via cross-validation on the train set.
- Parameters
test_size (float, optional) – The size of the test set as a fraction of the total dataset. Defaults to 0.1.
cv_folds (int, optional) – The number of cross-validation folds. Defaults to 5.
max_epochs (int, optional) – The maximum number of epochs for each CV fold. Defaults to 100.
scoring (str or list, optional) – A list of scoring functions or names of them. Defaults to R2 and MSE. For more options, see https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
search_params (dict, optional) – A dictionary with parameter values to change in a grid search.
regressor_kwargs (dict, optional) – You can pass extra arguments to the regressor here.
callbacks (None, "disable" or list of Callbacks) – Skorch callbacks passed to skorch.net. See: https://skorch.readthedocs.io/en/latest/net.html
verbose (int) – 0 for no logging, 1 for logging
Notes
If predictor was set in the initialization, it will not be overwritten.
- Returns
A dictionary containing the results of the training.
- Return type
dict
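To make the split-then-cross-validate procedure concrete, the following sketch partitions sample indices using the documented defaults (test_size=0.1, cv_folds=5). It only illustrates the general idea: the function split_indices and its shuffling scheme are assumptions, and the library's own splitting may differ in detail.

```python
import random


def split_indices(n_samples, test_size=0.1, cv_folds=5, seed=0):
    """Hold out a test set, then build cross-validation folds from the rest."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    n_test = max(1, round(n_samples * test_size))
    test, train = indices[:n_test], indices[n_test:]

    # Deal the training indices into cv_folds roughly equal folds
    folds = [train[i::cv_folds] for i in range(cv_folds)]
    splits = []
    for i, validation in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((fit, validation))
    return train, test, splits


train, test, splits = split_indices(100)
print(len(train), len(test), len(splits))  # 90 10 5
```

Each of the five splits fits on 72 samples and validates on the remaining 18, and the 10 held-out test samples are never seen during cross-validation.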
Examples
>>> from summit import *
>>> import pkg_resources, pathlib
>>> DATA_PATH = pathlib.Path(pkg_resources.resource_filename("summit", "benchmarks/data"))
>>> model_name = "reizman_suzuki_case_1"
>>> domain = ReizmanSuzukiEmulator.setup_domain()
>>> ds = DataSet.read_csv(DATA_PATH / f"{model_name}.csv")
>>> exp = ExperimentalEmulator(model_name, domain, dataset=ds, regressor=ANNRegressor)
>>> # Test grid search cross validation and training
>>> params = {"regressor__net__max_epochs": [1, 1000]}
>>> exp.train(cv_folds=5, random_state=100, search_params=params, verbose=0)
- summit.benchmarks.get_pretrained_baumgartner_cc_emulator(include_cost=False, use_descriptors=False)
Get a pretrained BaumgartnerCrossCouplingEmulator
- Parameters
include_cost (bool, optional) – Include minimization of cost as an extra objective. Cost is calculated as a deterministic function of the inputs (i.e., no model is trained). Defaults to False.
use_descriptors (bool, optional) – Use descriptors for the catalyst and base instead of one-hot encoding. Defaults to False. The descriptors have been pre-calculated using COSMO-RS. To only use descriptors with a single feature, pass descriptors_features a list where the only item is the name of the desired categorical variable.
Examples
>>> import matplotlib.pyplot as plt
>>> from summit.benchmarks import get_pretrained_baumgartner_cc_emulator
>>> from summit.utils.dataset import DataSet
>>> import pandas as pd
>>> b = get_pretrained_baumgartner_cc_emulator(include_cost=True, use_descriptors=False)
>>> fig, ax = b.parity_plot(include_test=True)
>>> plt.show()
>>> columns = [v.name for v in b.domain.variables]
>>> values = {"catalyst": ["tBuXPhos"], "base": ["DBU"], "t_res": [328.717801570892], "temperature": [30], "base_equivalents": [2.18301549894049]}
>>> conditions = pd.DataFrame(values)
>>> conditions = DataSet.from_df(conditions)
>>> results = b.run_experiments(conditions, return_std=True)
- class summit.benchmarks.BaumgartnerCrossCouplingEmulator(include_cost=False, use_descriptors=False, **kwargs)
Baumgartner Cross Coupling Emulator
Virtual experiments representing the aniline cross-coupling reaction, similar to Baumgartner et al. (2019). Experimental outcomes are based on an emulator that is trained on the experimental data published by Baumgartner et al.
This is a five-dimensional optimisation of temperature, residence time, base equivalents, catalyst and base.
The categorical variables (catalyst and base) contain descriptors calculated using COSMO-RS. Specifically, the descriptors are the first two sigma moments.
To use the pretrained version, call get_pretrained_baumgartner_cc_emulator.
- Parameters
include_cost (bool, optional) – Include minimization of cost as an extra objective. Cost is calculated as a deterministic function of the inputs (i.e., no model is trained). Defaults to False.
use_descriptors (bool, optional) – Use descriptors for the catalyst and base instead of one-hot encoding. Defaults to False. The descriptors have been pre-calculated using COSMO-RS. To only use descriptors with a single feature, pass descriptors_features a list where the only item is the name of the desired categorical variable.
Examples
>>> bemul = BaumgartnerCrossCouplingEmulator()
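The use_descriptors switch chooses between two ways of turning a categorical variable into model inputs. The sketch below contrasts them for a single catalyst value. Apart from tBuXPhos, the level names are placeholders, and the descriptor numbers are dummy values rather than real COSMO-RS sigma moments. The helper functions are illustrative and not part of Summit.

```python
def one_hot(value, levels):
    """Encode a categorical value as a 0/1 vector with one slot per level."""
    return [1.0 if value == level else 0.0 for level in levels]


def encode(value, levels, descriptors=None):
    """Use descriptor features when available, otherwise fall back to one-hot."""
    if descriptors is not None:
        return list(descriptors[value])
    return one_hot(value, levels)


levels = ["tBuXPhos", "catalyst_B", "catalyst_C"]  # placeholder level names
dummy_descriptors = {                               # dummy numbers, not real data
    "tBuXPhos": (0.1, 0.2),
    "catalyst_B": (0.3, 0.4),
    "catalyst_C": (0.5, 0.6),
}

as_one_hot = encode("tBuXPhos", levels)
as_descriptors = encode("tBuXPhos", levels, dummy_descriptors)
print(as_one_hot)      # [1.0, 0.0, 0.0]
print(as_descriptors)  # [0.1, 0.2]
```

One-hot encoding grows with the number of levels and treats every level as equally different from the others, whereas two descriptors per level keep the input width fixed and let chemically similar levels sit close together.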
Notes
This benchmark is based on data from [Baumgartner] et al.
References
- Baumgartner
L. M. Baumgartner et al., Org. Process Res. Dev., 2019, 23, 1594–1601. DOI: 10.1021/acs.oprd.9b00236
- property data
Dataset of all experiments run
- property domain
The domain for the experiment
- classmethod from_dict(d, **kwargs)
Create ExperimentalEmulator from a dictionary
Notes
This does not load the regressor weights and biases. After calling from_dict, call load_regressor to load the weights and biases.
- classmethod load(save_dir, include_cost=False, use_descriptors=False, **kwargs)
Load all the essential parameters of the BaumgartnerCrossCouplingEmulator from disk
- Parameters
save_dir (str or pathlib.Path) – The directory from which to load emulator files.
include_cost (bool, optional) – Include minimization of cost as an extra objective. Cost is calculated as a deterministic function of the inputs (i.e., no model is trained). Defaults to False.
use_descriptors (bool, optional) – Use descriptors for the catalyst and base instead of one-hot encoding. Defaults to False. The descriptors have been pre-calculated using COSMO-RS. To only use descriptors with a single feature, pass descriptors_features a list where the only item is the name of the desired categorical variable.
- load_regressor(save_dir)
Load the weights and biases of the regressor from disk
- Parameters
save_dir (str or pathlib.Path) – The directory used for saving emulator files.
- pareto_plot(objectives=None, colorbar=False, ax=None)
Make a 2D Pareto plot of the experiments thus far
- Parameters
objectives (array-like, optional) – List of names of objectives to plot. By default picks the first two objectives
ax (matplotlib.pyplot.axes, optional) – An existing axis to apply the plot to
- Returns
If ax is None, returns a tuple of the new figure and the axis.
If ax is a matplotlib axis, returns only the axis.
- Raises
ValueError – If the number of objectives is not equal to two
- parity_plot(**kwargs)
Produce a parity plot for the trained model using matplotlib
- Parameters
output_variable_names (str or list, optional) – The output variables to plot. Defaults to all.
include_test (bool, optional) – Include the performance of the model on the test set. Defaults to False.
train_color (str, optional) – Hex string for the train points. Defaults to "#6f3666".
test_color (str, optional) – Hex string for the test points. Defaults to "#3c328c".
- reset()
Reset the experiment
This will clear all data.
- run_experiments(conditions, computation_time=None, **kwargs)
Run the experiment(s) at the given conditions
- Parameters
conditions (summit.utils.dataset.DataSet) – A dataset of the experiment(s) to run, with columns matching the variables in the domain.
computation_time (float, optional) – The time used by the strategy in calculating the next experiments. By default, the time since the last call to run_experiments is used.
- save(save_dir)
Save all the essential parameters of the ExperimentalEmulator to disk
- Parameters
save_dir (str or pathlib.Path) – The directory used for saving emulator files.
Notes
This saves the parameters needed to reproduce results but not the associated data. You can separately save X_test, y_test, X_train, and y_train attributes if you want to be able to reproduce splits, test results and parity plots.
Examples
>>> from summit import *
>>> import pkg_resources, pathlib
>>> DATA_PATH = pathlib.Path(pkg_resources.resource_filename("summit", "benchmarks/data"))
>>> model_name = "reizman_suzuki_case_1"
>>> domain = ReizmanSuzukiEmulator.setup_domain()
>>> ds = DataSet.read_csv(DATA_PATH / f"{model_name}.csv")
>>> exp = ExperimentalEmulator(model_name, domain, dataset=ds, regressor=ANNRegressor)
>>> res = exp.train(max_epochs=10)
>>> exp.save("reizman_test/")
>>> # Load data for new experimental emulator
>>> exp_new = ExperimentalEmulator.load(model_name, "reizman_test/")
>>> exp_new.X_train, exp_new.y_train, exp_new.X_test, exp_new.y_test = exp.X_train, exp.y_train, exp.X_test, exp.y_test
>>> res = exp_new.test()
>>> fig, ax = exp_new.parity_plot(include_test=True)
- save_regressor(save_dir)
Save the weights and biases of the regressor to disk
- Parameters
save_dir (str or pathlib.Path) – The directory used for saving emulator files.
- test(**kwargs)
Get test results
This requires that train has already been called or the ExperimentalEmulator was initialized from a pretrained model.
- Parameters
scoring (str or list, optional) – A list of scoring functions or names of them. Defaults to R2 and MSE. For more options, see https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
X_test (np.ndarray, optional) – Test X inputs
y_test (np.ndarray, optional) – Corresponding test labels
Notes
The method loops over the predictors, so the resulting scores are averaged over all objectives for each of the predictors. In contrast, the parity_plot code gives the scores for each objective averaged over the predictors.
- Returns
scores_dict – A dictionary of scores with test_SCORE as the key and values as an array of scores for each of the models in the ensemble.
- Return type
dict
- to_dict(**experiment_params)
Convert emulator parameters to a dictionary
Notes
This does not save the weights and biases of the regressor. Use the save_regressor method for that.
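The notes on to_dict and from_dict describe a two-step pattern: the configuration travels as a dictionary, while the regressor weights are written and read separately. The toy class below mimics that pattern with JSON files so the order of calls is easy to follow. ToyEmulator and its file layout are hypothetical and do not reflect how Summit stores its emulators.

```python
import json
import pathlib
import tempfile


class ToyEmulator:
    """Hypothetical stand-in that keeps configuration and weights separate."""

    def __init__(self, name, weights=None):
        self.name = name
        self.weights = weights  # stays None until loaded or trained

    def to_dict(self):
        # Configuration only; the weights are deliberately left out
        return {"name": self.name}

    @classmethod
    def from_dict(cls, d):
        return cls(d["name"])

    def _weights_path(self, save_dir):
        return pathlib.Path(save_dir) / f"{self.name}_weights.json"

    def save_regressor(self, save_dir):
        self._weights_path(save_dir).write_text(json.dumps(self.weights))

    def load_regressor(self, save_dir):
        self.weights = json.loads(self._weights_path(save_dir).read_text())


original = ToyEmulator("toy", weights=[0.1, -0.2, 0.3])
with tempfile.TemporaryDirectory() as save_dir:
    config = original.to_dict()
    original.save_regressor(save_dir)

    restored = ToyEmulator.from_dict(config)
    weights_before = restored.weights  # None: from_dict alone restores no weights
    restored.load_regressor(save_dir)

print(weights_before, restored.weights)
```

Forgetting the second step leaves an object that looks configured but has nothing to predict with, which is the situation the notes warn about.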
- train(**kwargs)
Train the model on the dataset
This will automatically do a train-test split and then train via cross-validation on the train set.
- Parameters
test_size (float, optional) – The size of the test set as a fraction of the total dataset. Defaults to 0.1.
cv_folds (int, optional) – The number of cross-validation folds. Defaults to 5.
max_epochs (int, optional) – The maximum number of epochs for each CV fold. Defaults to 100.
scoring (str or list, optional) – A list of scoring functions or names of them. Defaults to R2 and MSE. For more options, see https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
search_params (dict, optional) – A dictionary with parameter values to change in a grid search.
regressor_kwargs (dict, optional) – You can pass extra arguments to the regressor here.
callbacks (None, "disable" or list of Callbacks) – Skorch callbacks passed to skorch.net. See: https://skorch.readthedocs.io/en/latest/net.html
verbose (int) – 0 for no logging, 1 for logging
Notes
If predictor was set in the initialization, it will not be overwritten.
- Returns
A dictionary containing the results of the training.
- Return type
dict
Examples
>>> from summit import *
>>> import pkg_resources, pathlib
>>> DATA_PATH = pathlib.Path(pkg_resources.resource_filename("summit", "benchmarks/data"))
>>> model_name = "reizman_suzuki_case_1"
>>> domain = ReizmanSuzukiEmulator.setup_domain()
>>> ds = DataSet.read_csv(DATA_PATH / f"{model_name}.csv")
>>> exp = ExperimentalEmulator(model_name, domain, dataset=ds, regressor=ANNRegressor)
>>> # Test grid search cross validation and training
>>> params = {"regressor__net__max_epochs": [1, 1000]}
>>> exp.train(cv_folds=5, random_state=100, search_params=params, verbose=0)
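A search_params dictionary like the one in the example maps each parameter name to the values to try, and a grid search evaluates every combination. The sketch below shows that expansion in plain Python. The helper expand_grid is illustrative, the second parameter name is hypothetical, and how Summit hands the grid to its search routine is not shown here.

```python
from itertools import product


def expand_grid(search_params):
    """List every combination of the candidate values, one dict per combination."""
    names = sorted(search_params)
    value_lists = [search_params[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


params = {"regressor__net__max_epochs": [1, 1000]}
candidates = expand_grid(params)
print(candidates)
# [{'regressor__net__max_epochs': 1}, {'regressor__net__max_epochs': 1000}]

bigger = expand_grid({
    "regressor__net__max_epochs": [1, 1000],
    "regressor__net__lr": [0.001, 0.01],  # hypothetical second parameter
})
print(len(bigger))  # 4
```

The number of candidates is the product of the list lengths, and each candidate is trained once per cross-validation fold, so a grid of 4 combinations with cv_folds=5 already means 20 training runs.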