Simulated design-build-test-learn-cycles
A simulation of metabolic engineering experiments
DBTL cycles are widely used in the optimization of microorganisms for producing valuable compounds in a sustainable way. Despite the widespread use, many open questions exist when it comes to effectively using DBTL cycles. The reason for this is that due to the costly nature (in terms of time and money), effectively comparing design choices is never considered. It is for example highly unlikely that for the same optimization process, different sampling scenarios are compared, even though this might be valuable. An alternative cheap way to answer these types of questions is by simulating DBTL cycles1. Here, we show a reimplemented software of simulated-DBTL in Jax/Diffrax2. This allows users to test scenarios for their own lab settings.
Figure 1: The design-build-test-learn cycle of iterative metabolic engineering, a widely adopted paradigm for strain engineering.
Usage
Import required functions
from jaxkineticmodel.load_sbml.sbml_model import SBMLModel
import jax.numpy as jnp
from jaxkineticmodel.utils import get_logger
from jaxkineticmodel.simulated_dbtl.dbtl import DesignBuildTestLearnCycle
logger = get_logger(__name__)
We first choose a kinetic model that we consider only as a black-box that outputs some target value.
From this model scenarios are simulated that might be encountered in real metabolic engineering.
At the heart of this implementation is the DesignBuildTestLearnCycle
class.
This requires setting the initial parameters (the reference state), initial conditions,
the timespan of process, and the target that we wish to simulate.
# load model (Messiha et. al (2013))
filepath = ("models/sbml_models/working_models/Messiha2013.xml")
model = SBMLModel(filepath)
# we retrieve parameters from the model
# timespan of model
ts = jnp.linspace(0, 6000, 1000)
dbtl_cycle = DesignBuildTestLearnCycle(model=model,
parameters=model.parameters,
initial_conditions=model.y0,
timespan=ts,
target=['PEP'])
Design phase
We now set up the combinatorial pathway optimization experiment.
We define some parameter targets that we want to perturb.
Then, each target gets some "promoter" values that perturb the parameters by
multiplication. Library transformation have a fixed number of positions in the strain. The
design_assign_positions
can be used to assign the library 'cassettes' to each position.
If no assignment is set, this defaults to all library units being set for each position.
Then, we assign probabilities to each promoter-parameter values. If this function is empty, each promoter-parameter is equally probable.
Finally, we generate the designs, for 50 samples and 6 chosen pro-parameter elements.
# design phase
parameter_target_names = ['lp_ADH__Kacald', 'lp_ENO__kcat_ENO1',
'lp_FBA__Kdhap', 'lp_HXK__kcat_HXK1',
'lp_PGK__kcat', 'lp_HXT__Vmax', 'lp_GND__Kp6g_GND1']
parameter_perturbation_value = [[0.2, 0.5, 1, 1.5, 2], # 'lp_ADH__Kacald'
[1.2, 1.5, 1.8], # 'lp_ENO__kcat_ENO1'
[1.1, 1.6, 1.3], # 'lp_FBA__Kdhap'
[0.6, 1.1, 2, 3], # 'lp_HXK__kcat_HXK1'
[1, 2, 3], # 'lp_PGK__kcat'
[0.5, 1, 1.5], # 'lp_HXT__Vmax'
[2, 3, 4]] # 'lp_GND__Kp6g_GND1'
dbtl_cycle.design_establish_library_elements(parameter_target_names,
parameter_perturbation_value)
_ = dbtl_cycle.design_assign_positions(n_positions=6)
_ = dbtl_cycle.design_assign_probabilities(probabilities_per_position=None)
# The replacement is false means that each pro-parameter
# pair can only be chosen once from the list per strain design
strain_designs = dbtl_cycle.design_generate_strains(samples=50)
Build/Test phase
In the build phase we simulate all the designs and add a noise model to the outcomes for the target. The first simulation will take quite long, but after that it is compiled. If you do not run the class DesignBuildTestLearnCycle
again, the simulations remain fast. The values that are taken in the dataset is the average of the last 10 datapoints of the target state variable.
# build phase
values = dbtl_cycle.build_simulate_strains(strain_designs, plot=False)
#
# # test phase
noised_values = dbtl_cycle.test_add_noise(values, percentage=0.1, noisetype='heteroschedastic')
data = dbtl_cycle.test_format_dataset(strain_designs=strain_designs,
production_values=noised_values,
reference_parameters=dbtl_cycle.parameters)
Learn phase
From here, the produced data can be used to compare whatever hyperparameter of the DBTL cycle you are interested: the performance of ML models, DoE v.s. Random sampling, etc.. As an example, we train an XGBoost model on the set of generated datapoints, as well as a quick validation on newly generated strain designs.
# learn phase
xgbparameters = {'tree_method': 'auto', 'reg_lambda': 1, 'max_depth': 2, "disable_default_eval_metric": 0}
alternative_params = {'num_boost_round': 10, 'early_stopping_rounds': 40}
bst, r2_scores = dbtl_cycle.learn_train_model(data=data,
target="PEP",
model_type="XGBoost",
args=(xgbparameters, alternative_params), test_size=0.20)
dbtl_cycle.learn_validate_model(samples=50,
target='PEP',
plotting=True
)
Figure 2: Model performance (true versups predicted values).