Simulated design-build-test-learn-cycles

A simulation of metabolic engineering experiments

DBTL cycles are widely used in the optimization of microorganisms for producing valuable compounds in a sustainable way. Despite the widespread use, many open questions exist when it comes to effectively using DBTL cycles. The reason for this is that due to the costly nature (in terms of time and money), effectively comparing design choices is never considered. It is for example highly unlikely that for the same optimization process, different sampling scenarios are compared, even though this might be valuable. An alternative cheap way to answer these types of questions is by simulating DBTL cycles¹. Here, we show a reimplemented software of simulated-DBTL in Jax/Diffrax². This allows users to test scenarios for their own lab settings.

sim-dbtl

Figure 1: The design-build-test-learn cycle of iterative metabolic engineering, a widely adopted paradigm for strain engineering.

Usage

Import required functions

from jaxkineticmodel.load_sbml.sbml_model import SBMLModel
import jax.numpy as jnp
from jaxkineticmodel.utils import get_logger
from jaxkineticmodel.simulated_dbtl.dbtl import DesignBuildTestLearnCycle
logger = get_logger(__name__)

We first choose a kinetic model that we consider only as a black-box that outputs some target value. From this model scenarios are simulated that might be encountered in real metabolic engineering. At the heart of this implementation is the DesignBuildTestLearnCycle class. This requires setting the initial parameters (the reference state), initial conditions, the timespan of process, and the target that we wish to simulate.

# load model (Messiha et. al (2013))
filepath = ("models/sbml_models/working_models/Messiha2013.xml")

model = SBMLModel(filepath)
# we retrieve parameters from the model


# timespan of model
ts = jnp.linspace(0, 6000, 1000)

dbtl_cycle = DesignBuildTestLearnCycle(model=model,
                                       parameters=model.parameters,
                                       initial_conditions=model.y0,
                                       timespan=ts,
                                       target=['PEP'])

Design phase

We now set up the combinatorial pathway optimization experiment. We define some parameter targets that we want to perturb. Then, each target gets some "promoter" values that perturb the parameters by multiplication. Library transformation have a fixed number of positions in the strain. The design_assign_positions can be used to assign the library 'cassettes' to each position. If no assignment is set, this defaults to all library units being set for each position. Then, we assign probabilities to each promoter-parameter values. If this function is empty, each promoter-parameter is equally probable. Finally, we generate the designs, for 50 samples and 6 chosen pro-parameter elements.

# design phase
parameter_perturbations = {'lp_ADH__Kacald': [0.2, 0.5, 1, 1.5, 2],
                           'lp_ENO__kcat_ENO1': [1.2, 1.5, 1.8],
                           'lp_FBA__Kdhap': [1.1, 1.6, 1.3],
                           'lp_HXK__kcat_HXK1': [0.6, 1.1, 2, 3],
                           'lp_PGK__kcat': [1, 2, 3],
                           'lp_HXT__Vmax': [0.5, 1, 1.5],
                           'lp_GND__Kp6g_GND1': [2, 3, 4]}

dbtl_cycle.design_establish_library_elements(parameter_perturbations=parameter_perturbations)

_ = dbtl_cycle.design_assign_positions(n_positions=6)
_ = dbtl_cycle.design_assign_probabilities(probabilities_per_position=None)
strain_designs = dbtl_cycle.design_generate_strains(samples=50)

Build/Test phase

In the build phase we simulate all the designs and add a noise model to the outcomes for the target. The first simulation will take quite long, but after that it is compiled. If you do not run the class DesignBuildTestLearnCycle again, the simulations remain fast. The values that are taken in the dataset is the average of the last 10 datapoints of the target state variable.

# build phase
values = dbtl_cycle.build_simulate_strains(strain_designs, plot=False)


# test phase
noised_values = dbtl_cycle.test_add_noise(values, percentage=0.1, noise_type='heteroschedastic')
data = dbtl_cycle.test_format_dataset(strain_designs=strain_designs,
                                      production_values=noised_values,
                                      reference_parameters=dbtl_cycle.parameters)

Learn phase

From here, the produced data can be used to compare whatever hyperparameter of the DBTL cycle you are interested: the performance of ML models, DoE v.s. Random sampling, etc.. As an example, we train an XGBoost model on the set of generated datapoints, as well as a quick validation on newly generated strain designs.

# learn phase
xgbparameters = {'tree_method': 'auto', 'reg_lambda': 1, 'max_depth': 2, "disable_default_eval_metric": 0}
alternative_params = {'num_boost_round': 10, 'early_stopping_rounds': 40}
bst, r2_scores = dbtl_cycle.learn_train_model(data=data,
                                              target="PEP",
                                              model_type="XGBoost",
                                              args=(xgbparameters, alternative_params), test_size=0.20)

dbtl_cycle.learn_validate_model(samples=50,
                                target='PEP',
                                plotting=True
                                )

plot_validate

Figure 2: Model performance (true versups predicted values).

References

van Lent, P., Schmitz, J., & Abeel, T. (2023). Simulated design–build–test–learn cycles for consistent comparison of machine learning methods in metabolic engineering. ACS Synthetic Biology, 12(9), 2588-2599. ↩
Kidger, P. (2022). On neural differential equations. arXiv preprint arXiv:2202.02435. ↩