Development of a hybrid model for a partially known intracellular signaling pathway through correction term estimation and neural network modeling

Dongheon Lee, Arul Jayaraman, Joseph S. Kwon

The authors have declared that no competing interests exist.

¤Current address: Department of Biomedical Engineering, Duke University, Durham, North Carolina, USA

https://doi.org/10.1371/journal.pcbi.1008472, Volume: 16, Issue: 12, Pages: 1-31

Article Type: research-article Article History

Publisher: Public Library of Science

- Facebook
- Twitter
- Linkedin
- Whatsapp
Altmetric

Table of Contents

Introduction
Methods
Results
Discussion
Supporting information

Abstract

Developing an accurate first-principle model is an important step in employing systems biology approaches to analyze an intracellular signaling pathway. However, an accurate first-principle model is difficult to be developed since it requires in-depth mechanistic understandings of the signaling pathway. Since underlying mechanisms such as the reaction network structure are not fully understood, significant discrepancy exists between predicted and actual signaling dynamics. Motivated by these considerations, this work proposes a hybrid modeling approach that combines a first-principle model and an artificial neural network (ANN) model so that predictions of the hybrid model surpass those of the original model. First, the proposed approach determines an optimal subset of model states whose dynamics should be corrected by the ANN by examining the correlation between each state and outputs through relative order. Second, an L2-regularized least-squares problem is solved to infer values of the correction terms that are necessary to minimize the discrepancy between the model predictions and available measurements. Third, an ANN is developed to generalize relationships between the values of the correction terms and the system dynamics. Lastly, the original first-principle model is coupled with the developed ANN to finalize the hybrid model development so that the model will possess generalized prediction capabilities while retaining the model interpretability. We have successfully validated the proposed methodology with two case studies, simplified apoptosis and lipopolysaccharide-induced NFκB signaling pathways, to develop hybrid models with in silico and in vitro measurements, respectively.

An intracellular signaling pathway is often represented by a set of nonlinear ordinary differential equations, which translate our current knowledge about the signaling pathway into a testable mathematical model. However, predictions from such models are often subject to high uncertainty since many signaling pathways are only partially known beforehand. In this study, we propose a systematic approach to develop a hybrid model to improve model accuracy by combining machine learning and the first-principle modeling. Specifically, model correction terms are learned from discrepancy between model predictions and measurements, and these terms are added to the first-principle model to enhance the prediction accuracy. Once these correction terms are learned from the data, an artificial neural network (ANN) model is developed to find an empirical relation between the model and the correction terms so that the developed ANN can be used to posses improved predictive capabilities even in new operating conditions (i.e., generalizability). The final hybrid model is then constructed by coupling the first-principle model with the developed ANN.

Lee,Jayaraman,Kwon,and Diamond: Development of a hybrid model for a partially known intracellular signaling pathway through correction term estimation and neural network modeling

Introduction

An intracellular signaling pathway is a biochemical reaction network of cells to adjust their metabolism, gene expression, and other necessary actions so that the cells can appropriately respond to perturbations present in their environment [1, 2]. Since an intracellular signaling pathway is complex involving interactions among a large number of biomolecules inside a cell, it is common to implement a systems biology approach, which integrates experimental observations and mathematical modeling, to analyze the signaling pathway comprehensively [3, 4]. As a result, one of the key tasks in systems biology is to develop a predictive mathematical model for analyzing underlying mechanisms and generating new hypotheses to be tested in the future. To this end, a first-principle based mechanistic model has been a preferred choice for modeling an intracellular signaling pathway. Specifically, a system of nonlinear ordinary differential equations (ODEs) is constructed based on the current knowledge about a system, where its differential equations are derived using kinetic laws such as mass-action and Michaelis-Menten kinetics [4–6]. Since a first-principle model represents the current understandings of a system, its ODEs are physically meaningful, and its predictions are valid over a wide range of conditions [7, 8]. However, since a signaling pathway of interest is often only partially understood due to its inherent complexity, structural mismatches often exist between dynamics of the true system and those predicted by the corresponding first-principle model [4, 8–12].

Instead of a first-principle model, a data-driven model can be developed from available experimental measurements, which can describe the input-output dynamics adequately even when mechanistic understanding of the system is limited [13–17]. However, a data-driven model has narrow applicability as it is tailored to describe input-output relationships contained in the training datasets [18, 19]. Also, available measurements of an intracellular signaling pathway are often limited both in quantity and quality, which may further limit the direct application of a data-driven approach [20, 21].

As an alternative, a hybrid modeling approach that combines first-principle and data-driven modeling techniques has been proposed to describe a process that is only partially known [19]. Here, a hybrid model refers to an improved version of a first-principle model that compensates for model-system mismatches by introducing empirical parameters or terms, which are inferred from available measurements. As a result, a hybrid model has better prediction capabilities than a first-principle one while it preserves generalizability and interpretability, which are difficult to be achieved through a data-driven model [18, 22, 23]. A classic example of the hybrid modeling approach is to model a fedbatch bioreactor, where the biomass growth rate is estimated from process data and coupled with mass conversation laws [18, 22]. In these studies, the mass conservation laws represent our prior knowledge of the system (i.e., the first-principle model), whereas its growth rate is uncertain and inferred from the measurements to improve the model’s prediction accuracy. Due to its merits, the hybrid modeling approach has been implemented to model various biological and chemical processes such as bioprocess development and optimization [24, 25], modeling propagation of fractures during hydraulic fracturing process [7], transcription factor dynamics [8], thin film deposition [26], metallurgic processes [27, 28], and flour beetles population dynamics [29, 30].

In this work, we propose a systematic hybrid modeling approach for an intracellular signaling pathway by incorporating available mechanistic knowledge and experimental measurements. Compared to previous implementation of hybrid models, unique challenges arise when a hybrid model is developed for an intracellular signaling pathway. First, a first-principle model of an intracellular signaling pathway is usually high-dimensional with a large number of states while the amount of available measurements is usually limited. Hence, it is desirable to minimize the number of components in a hybrid model that should be inferred from experimental measurements to minimize the possibility of overfitting, which may compromise the generalizability of the hybrid model. At the same time, it is usually unknown beforehand which parts of a hybrid model should be represented by a data-driven model. In modeling a bioreactor, it is known that the largest uncertainty resides in the cell growth rate, and it is inferred from experimental measurements [18, 22]. However, such knowledge is usually not available for an intracellular signaling pathway [4, 21].

Motivated by the above considerations, we propose a systematic approach to construct a hybrid model to describe the dynamics of an intracellular signaling pathway, which is only partially known. Among a few possible hybrid model structures, a hybrid model formulation proposed by Engelhardt et al. [31, 32] is adopted, where differential equations of model states are adjusted by correction terms inferred from experiments. Since an intracellular signaling pathway is often high-dimensional and its origin of prediction inaccuracy is unknown beforehand, a graphical approach is implemented to determine a subset of model states that have the highest correlations with the measurements. And only these states’ dynamics are modified by the correction terms. Specifically, the values of the correction terms are estimated at the times when the measurements are available so that the model with the estimated correction terms can reproduce the measurements accurately. Then, an empirical map between the first-principle model and the correction terms is approximated by an artificial neural network (ANN). Once an ANN is trained, the hybrid model now can be constructed by integrating the first-principle model and the ANN. The effectiveness and feasibility of the proposed methodology are demonstrated by developing hybrid models of intracellular signaling pathways for two case studies.

Methods

System description

Dynamics of an intracellular signaling pathway are described by a system of nonlinear ODEs as follows:

where

x \in R^{n_{x}}

is the state vector,

θ \in R^{n_{θ}}

is the parameter vector, x₀ is the initial value of the state vector x, and

y \in R^{n_{y}}

is the output vector.

Such ODE models for an intracellular signaling pathway are formulated based on the current understandings of the underlying signaling pathway. Hence, the accuracy of an ODE model depends on the accuracy and completeness of the prior knowledge. Unfortunately, an intracellular signaling pathway is quite complex, which involves interactions among a large number of intracellular biomolecules. As a result, it is likely to have a model-system mismatch, which prevents from utilizing the model for system analysis and prediction.

Under this circumstance, the following hybrid model can be used to minimize potential model-system mismatches while preserving the available knowledge of the system [31, 32]:

where

\tilde{x} \in R^{n_{x}}

and

\tilde{y} \in R^{n_{y}}

are the state and output vectors of the hybrid model, respectively, u is the external input, and

w (t) \in R^{n_{x}}

is the vector of correction terms introduced to improve the overall model prediction accuracy.

In order for Eq 2 to properly predict true dynamics of the system, the values of w need to be explicitly known at any arbitrary time instants so that the hybrid model (Eq 2) can be numerically integrated. Therefore, w values are inferred from available experimental measurements. On the other hand, even with experimentally inferred w(t), the hybrid model cannot be used to predict the system dynamics under a new condition since the corresponding temporal profile of w are not available. Hence, this study aims to develop a functional map $H$ that can compute the value of w at time t for given values of the model states x(t) and t; that is, we aim to develop $w (t) = H (x (t); t)$ for prediction generalizability of the hybrid model.

In summary, this study aims to develop a hybrid model by the following two subsequent steps:

Infer w(t) from available experimental measurements.

Develop the function, $H$ , that maps from x and t to w(t).

Estimation of w(t)

Suppose that measurements are obtained at N_t discrete time instants (i.e., $t_{s} = [0, \dots, t_{N_{t}}]$ ) under N_u different u. Then, the estimation of w(t) can be formulated into the following minimization problem:

where w^s(t) is the continuous temporal profile of w(t) from t = 0 to

t_{N_{t}}

under input u_s, and

{\hat{y}}_{i} (u_{s}; t)

is the i^th output measured under input u_s at time t.

However, Eq 3 is likely to be ill-conditioned because (1) it is an infinite dimensional problem in which the decision variables (i.e., w(t)) are continuous temporal profiles, and (2) the available measurements are limited in quantity (i.e., small values of n_y and N_t) [33–36]. Hence, its solution is subject to high uncertainties, and the resulted hybrid model based on the estimated w(t) will be difficult to be generalized for future predictions.

Proposed methodology for estimating w from measurements

In order to address the aforementioned ill-posedness, the following assumptions are made to reduce the dimension of the estimation problem. First, a L2-regularized least-squares problem is solved to estimate w by handling potential overfitting issues [37–39]. Second, this study aims to infer the values of w at the time instants only when the measurements are available (i.e., t_s). Third, instead of adding correction terms to all the model states, correction terms are given to only a subset of states, which will be denoted as $x_{c} \in R^{n_{s}}$ n_s < n_x, is selected a priori. Lastly, a linear interpolation is used to compute values of the correction terms at time t, when the measurements are not available (i.e., t ∉ t_s), as follows:

where w_c,i is the i^th correction term, t_l is the latest time point of t_s preceding t, and t_l+1 is the earliest time point in t_s following t.

With these assumptions, the estimation of w(t) is reformulated as follows:

where H is a n_x × n_s matrix,

x_{c} = {[x_{c, 1}, \dots, x_{c, n_{s}}]}^{T} \in R^{n_{s}}

is a subset of x whose dynamics are corrected by

w_{c} = {[w_{c, 1}, \dots, w_{c, n_{s}}]}^{T} \in R^{n_{s}}

, H_ij is the entry in H at the i^th row and the j^th column, and α is the L2-regularization tuning parameter. It should be noted that the above assumptions are introduced to estimate the dynamics of w(t) minimizing the likelihood of the overfitting by interpolating the values of w(t) at the time instants when the measurements are not available. Hence, the overall accuracy of the inferred w(t) is likely to improve if the outputs are measured more frequently, which in turn increases the overall prediction power. Also, Eqs and assume that each correction term is added to only one state’s differential equation. As mentioned before, values of w_c are linearly interpolated to solve Eqs and at time instants when measurements are not available.

In Eq 5, the optimal value of α is unknown beforehand, so its value is optimized by five-fold cross-validation. Specifically, the available measurements $Y = \hat{y}$ are divided into training and validation datasets with a 8:2 ratio in five different ways, and the regularized least-square problem (Eq 5) is solved with one particular value of α with respect to each of the five training datasets. Then, the optimal value of α is chosen by examining average model errors with respect to both the training and validation datasets.

Selection of x_c

A key step before solving Eq 5 is to identify x_c, whose trajectories are corrected by w_c. Specifically, two questions need to be addressed: first, what is the dimension of x_c (i.e., n_s), and second, when the value of n_s is known, which states in x should be selected to form x_c. In this study, we employ the idea of invertibility and a graph-theoretical approach to determine x_c.

Specifically, we choose x_c from x so that the resulted system is close to be invertible [40]. If a system is invertible, for a given value of x₀, unique values of y will correspond to unique values of inputs, so one could reconstruct the values of inputs from available output measurements [41, 42]. If w_c in the hybrid model (Eqs 5b and 5c) is viewed as an input to the system and the hybrid model is invertible, the values of w_c can be uniquely characterized from given measurements [41, 43]. Hence, it is our best interest to select the dimension of w_c as well as its placement so that the resulted hybrid model is invertible, which will attenuate the ill-posedness of the inverse problem (i.e., Eq 5). In this regard, Daoutidis and Kravaris [41] have shown that a dynamic system is invertible when the following matrix is nonsingular:

where C(x) is the characteristic matrix of the system [],

L

represents Lie derivative defined as

L_{f} g_{i} (x) = \sum_{j = 1}^{n_{x}} (\partial g_{i} / \partial x_{j}) f_{j} (x)

, h_k is the k^th column vector of the matrix H in Eqs and , where k = 1, …, n_s, and r_i is the relative order of output y_i with respect to w_c, which is defined as the smallest integer for which

or r_i = ∞, if such integer does not exist []. Additionally, the following relation holds true for r_i:

where r_ij is the relative order of y_i with respect to w_c,j, which is the smallest integer for which

L_{h_{j}} L_{f}^{r_{i j} - 1} g_{i} (x) \neq 0

or r_ij = ∞ if such integer does not exist. Based on the definition of the relative order, the relative order matrix of Eqs and can be defined as follows:

In this study, x_c is chosen so that C(x) of Eqs 5b and 5c is nonsingular, which will minimize the ill-posedness of the inverse problem (Eq 5). First, we let the size of x_c equal to the size of outputs (i.e., n_s = n_y) so that C(x) is square. Second, an optimal choice of x_c is identified systematically from x. In this regard, this study will assess the optimality of x_c through the following criteria that are based on the relative order:

The value of $\sum_{i = 1}^{n_{s}} r_{i}$ is minimum.

The value of $\sum_{i}^{n_{s}} \sum_{j \neq i} r_{i} / r_{i j}$ is minimum.

Previous studies have demonstrated that the relative order measures ‘physical closeness’ between a correction term and an output [43, 45]: a smaller value of r_i represents a stronger connection between w_c and y_i. So, the first criterion renders w_c to have the maximum correlations with the outputs. On the other hand, the second criterion is to render each w_c,j will have the strongest correlation with only one output while having the weakest correlation with the remaining outputs [45, 46]. Consequently, by meeting the above two criteria, each correction term in w_c will have the strongest correlation with only one correction term, which will minimize the likelihood of the ill-posedness of Eq 5 [47].

In the perspective of the invertibility, selecting x_c based on the above two criteria, particularly the second one, maximizes the likelihood of C(x) to be nonsingular. To understand this point more clearly, the outputs and x_c are re-ordered so that the smallest element in each row is diagonally located in the relative order matrix, and C(x) is also rearranged correspondingly. By minimizing the second criterion, the possibility that values of all r_ij, ∀j ≠ i, are larger than r_i is maximized; therefore, the non-diagonal entries in the rearranged C(x) are likely to be zero. As a result, C(x) will be close to be a square diagonal matrix. Thus, achieving the aforementioned two criteria would guarantee the hybrid model (Eqs 5b and 5c) to be invertible as well as the reliability of the solution to Eq 5.

In order to select a combination based on the above criteria, the following steps are taken:

Enumerate all n_y permutations of x.

For each candidate, construct the corresponding relative order matrix, and compute its $\sum_{i = 1}^{n_{y}} r_{i}$ .

Compute their ∑_i ∑_j≠i r_i/r_ij values.

Find the candidate that satisfies the condition.

For implementing the above procedure, a relative order matrix has to be constructed for each candidate to calculate the two criteria. However, performing Lie differentiation can be computationally expensive; hence, a graphical approach is implemented to evaluate relative orders of a system [43], which will be discussed in the next section.

A graphical approach to evaluate relative orders

A state-space model of a process (Eqs 5b and 5c) can be represented by a directed graph, which is defined by a set of vertices and a set of edges by the following rules [43, 48]:

States ( $x \in R^{n_{x}}$ ), outputs ( $y \in R^{n_{y}}$ ), and manipulated inputs ( $w_{c} \in R^{n_{s}}$ ) are represented by a set of vertices in a graph.

If ∂f_i(x)/∂x_j ≠ 0, i, j = 1, …, n_x, there is a unidirectional edge pointing from the vertex of x_j to that of x_i.

If ∂f_i(x)/∂w_c,k ≠ 0, k = 1, …, n_s, there is a unidirectional edge pointing from the vertex of w_c,k to that of x_i.

If ∂y_l/∂x_j ≠ 0, l = 1, …, n_y, there is a unidirectional edge pointing from the vertex of x_j to that of y_l.

A path from one vertex to another is a sequence of edges without repeating vertices, and the path length is the number of edges included in one particular path.

Previously, Daoutidis and Kravaris [43] demonstrated that r_ij can be calculated by computing the shortest path length from an input w_c,j to an output y_i as follows:

where l_ij is the shortest path length from an input w_c,j to an output y_i. Therefore, the relative order matrix can be easily computed once a graph of a state-space model is constructed. As an example, Fig 1 provides an example on how a state-space model is translated into its corresponding directional graph; specifically, Fig 1 is a representation of the following dynamic system:

The relative order of this system is two, which can be easily computed from its graph.

Fig 1

Schematic illustration of the directional graph of a state-space model.

In summary, the procedures for selecting x_c, whose dynamics will be corrected by w_c, are as follows:

Set n_s, the size of x_c, to be equal to the dimension of outputs (i.e., n_y).

Enumerate all n_s permutations of x as candidates for x_c.

Construct a directional graph by adding correction terms to each x_c candidate enumerated in the previous step.

Construct the corresponding relative order matrix based on Eq 10.

Find a configuration that has the lowest $\sum_{i = 1}^{n_{y}} r_{i}$ and ∑_i ∑_j≠i r_i/r_ij values.

Once the optimal x_c is chosen, is solved to infer W.

It should be noted that the identification of x_c by the relative order and graph theory can also guide the future model refinement. Specifically, since the identified x_c has the highest correlations with the output, further literature survey and experimentation on these states can be implemented to improve the differential equations for these states and thus increase the overall prediction accuracy of the first-principle model.

Development of artificial neural network models

Once W is estimated by solving Eq 5, the available (imperfect) first-principle model coupled with the estimated W is now able to predict the system dynamics under the experimental measurements more accurately. However, it cannot predict system dynamics under a new operating condition since its corresponding w_c(t) is not available. Hence, the next step is to infer $H$ , which maps x and current time t, to w_c(t), to generalize the model prediction so that the resulted hybrid model can predict the system dynamics under new conditions. In this regard, multiple w_c(t) should be obtained under a few different operating conditions (i.e., N_u > 1) so that the empirical function $H$ mapping x and t to w_c(t) is accurate.

However, the functional form of $H$ is usually unknown a priori. Although there are some methods proposed in the literature to identify functional forms from the data, inferring functional forms usually requires a large amount of data and can be computationally expensive [49–54]. Instead, we assume $H$ to be an ANN model. Here, an ANN is chosen here due to its proven ability to represent any arbitrary input-output relations with sufficient accuracy [55, 56].

An ANN consists of an input layer, multiple hidden layers, and an output layer. Specifically, each layer contains multiple neurons, and each neuron in each layer is fully connected to all the neurons in the next layer (Fig 2). In each hidden layer, the following hyperbolic tangent sigmoid transfer function is used:

where

o_{i}^{(k)}

is the output from the i^th neuron in the k^th hidden layer,

{\hat{u}}_{i}^{(k)}

is the weighted sum of inputs given to the i^th neuron,

N_{n}^{(k)}

is the number of neurons in the k^th hidden layer,

α_{i j}^{(k)}

is the weightage for the input

z_{j}^{(k)}

to the i^th neuron,

b_{j}^{(k)}

is the bias term given to the i^th neuron, and N_h is the number of hidden layers in an ANN. It should be noted that o^(k−1) for the first hidden layer is the inputs to an ANN.

Fig 2

Schematic illustration of an ANN model.

On the other hand, the ANN outputs are computed as follows:

where

{\hat{w}}_{c, l}

is the l^th element of

{\hat{w}}_{c}

{\hat{w}}_{c}

is the predicted w_c from the ANN, β_li is the weighthage given to

o_{i}^{(N_{h})}

for w_l, and c_l is the bias term of w_l.

Model selection and training

The goal of the ANN training is to estimate hyperparameters of an ANN model, which include α, b, β, and c in Eqs 12 and 13 from available datasets. Here, the datasets include ANN input and output datasets, where the ANN inputs refer to t and x(t) from simulating the original first-principle model (Eq 1) while the ANN outputs refer to w estimated from solving Eq 5. All the ANN training sessions in this study are performed in the MATLAB Neural Network Toolbox with the Levenberg-Marquardt algorithm.

Since the structure of an ANN (i.e., the numbers of hidden layers and neurons in each hidden layer) is unknown beforehand, a number of ANN models with different structures are trained and compared to find the best one through evaluating their average corrected Akaike information criterion (AIC_c) [57, 58]. For a model with p number of parameters, AIC_c can be calculated as follows:

where n is the number of data points in the dataset, SSE is the sum of squared errors between the observations and ANN predictions, and p is the number of the ANN hyperparameters.

To find an optimal structure, datasets are randomly partitioned into the training, testing, and validation sets with a 70:15:15 ratio 100 times, and ANN models with different structures are trained 100 times to compute their average AIC_c [59, 60]. Then, the ANN structure with the minimum AIC_c is selected as the optimal one.

Once an ANN is developed, the final mathematical form of a hybrid model can be described as follows:

where

{\hat{w}}_{c}

is the predicted value of w_c from the developed ANN,

H

is the ANN developed as above, and h is a vector containing the ANN’s hyperparameters. It should be noted that the ANN inputs are t and x, which are the model states simulated from the original first-principle model ().

Results

In this section, two case studies are presented to demonstrate how the proposed methodology can be implemented to develop a hybrid model of an intracellular signaling pathway.

Case study 1: TNFα signaling pathway

In this case study, the proposed scheme is first used to construct a hybrid model to describe a tumor necrosis factor-α (TNFα) signaling pathway, which is illustrated in Fig 3a. Specifically, this system describes how TNFα, which is an important inflammatory cytokine, can initiate apoptotic and nuclear factor-κB (NFκB) signaling pathways as well as crosstalks between these two pathways. In this system, the apoptotic signaling pathway is described by dynamics of caspase 3 (C3a) and caspase 8 (C8a), where C3a is a protein, whose high activity leads to apoptosis, and TNFα-activated C8a increases the C3a activity. On the other hand, TNFα activates NFκB protein by suppressing inhibitor of NFκB (IκB), which inhibits the NFκB activity. Since the NFκB activation in turn increases the IκB activity, the NFκB activity will naturally decay over time. Furthermore, the NFκB activation suppresses the C3a and C8a activities and thus promotes cellular survival, while the increase in the apoptotic signaling pathway lowers the NFκB activity. More details on this system can be found in [61, 62].

Fig 3

Schematic diagrams of the simplified NFκB signaling pathway models.

(a) The correct model that is used for generating in silico experimental measurements (adopted from Chaves et al. [61]). (b) The incorrect model that is used as the first-principle model for constructing a hybrid model. (c) A graph representation of the available (incorrect) first-principle model (Eq 19).

In silico measurements

The TNFα signaling pathway represented in Fig 3a can be described by the following mathematical model [61, 62]:

where x_i, i = 1, …, 4, represents the relative activities of C8a, C3a, NFκB, and IκB, respectively, and u represents the TNFα. Also, the functions, inh_i and act_i, in the model are rational functions given by:

where a_i, i = 1, …, 5, and b_i, i = 1, …, 4, are the model parameters whose values are shown in Table 1. The initial concentrations are [x₁(0) x₂(0) x₃(0) x₄(0)] = [0, 0, 0.29, 0.63].

Table 1

Nominal parameter values of the correct TNFα signaling model shown in Fig 3a.

Parameter	Value	Parameter	Value
a₁	0.6	b₁	0.4
a₂	0.2	b₂	0.7
a₃	0.2	b₃	0.3
a₄	0.5	b₄	0.5
		b₅	0.4

In this case study, the model (Eq 16) is considered as the true system, and it is used to generate in silico experimental measurements. Specifically, the NFκB activity (i.e., x₃) is measured every hour from 0 to 14 hours under three different input conditions (i.e., u = 0.5, 1, 2). It should be noted that the value of u is fixed while generating the in silico measurements. Also, experimental noise is introduced as follows:

where y(t) is the measurement at time t, and μ_× is the multiplicative experimental noise term that is randomly sampled from a log-normal distributions (i.e.,

ln μ_{\times} \sim N (0, 0.01)

), and μ₊ is the additive noise term that is randomly sampled from a normal distribution

N (0, 0.01)

. This particular formulation is used since it is a realistic representation for the noise in measurements. Particularly, multiplicative noise (i.e., μ_×) is often observed for non-negative data, and it is suggested to be one of main sources of variability in the biological data [–].

Available first-principle model

On the other hand, we assume that the system is only partially understood, and Fig 3b represents the current understanding of the system, which is described by the following ODE model:

Compared with the accurate system dynamics described by , the imperfect first-principle model () misses two mechanisms in the true system: the C3a-induced suppression of NFκB activity and NFκB-induced IκB activation. Due to such system-model mismatches, there is a considerable degree of discrepancy between measured and predicted dynamics as shown in Fig 4. Therefore, the proposed methodology is implemented to construct a hybrid model that can compensate for the model-system mismatches.

Fig 4

System-model mismatches in the TNFα signaling pathway model.

Comparison between experimental measurements (empty circles) and model predictions (solid lines) with 0.5 (blue), 1 (red), and 2 (black) of TNFα. The measurements are generated by simulating Eq 16 while the model predictions are obtained from Eq 19.

Hybrid model development

The first step of the proposed methodology is to determine a set of states whose trajectories need to be corrected by the addition of w_c. Since there is only one output (i.e., x₃), the number of states to be corrected by correction terms is one as described earlier. Then, a graphical approach is implemented to determine which state should be corrected by w_c. Fig 3c is a graphical representation of the first-principle model (Eq 19) to visualize the interconnections among the states. Since the number of the correction term to be added is determined to be one, the correction term can be added to one of x₁, x₂, and x₄ as shown in S1 Fig. In Fig 3, it is clear that there is only one directed edge pointing to the output (x₃), which stems from x₄. Consequently, the only feasible configuration for the correction term in this system is the first one in S1 Fig, where the correction term is placed to adjust the dynamics of x₄ and eventually the system output. It should be noted that the correction term is added to the differential equation of x₄ only.

Next, the regularized least-squares problem is solved to estimate the values of w_c at the time instants when the measurements are taken under three different input conditions. As the α value in Eq 5 for this system is unknown, its optimal value is found by the five-fold cross-validation. Table 2 shows the average normalized mean squared errors (MSE) between model predictions and the measurements for five different α values, and the optimal α value is determined to be one. Hence, the w_c estimation results corresponding to α = 1 are used for the subsequent analysis and ANN development. Before constructing an ANN, the accuracy of the estimated values of w_c is assessed by comparing the experimental measurements and the dynamics predicted by the available (incorrect) first-principle model (Eq 19) coupled with the estimated w_c. Fig 5 shows that the discrepancy between the predicted dynamics and the experimental measurements is diminished and thus validates the w_c estimation results.

Table 2

Comparison of the normalized MSE at different α values for the first case study.

α	normalized MSE
0.001	0.777
0.01	0.150
0.1	0.072
1	5.75 × 10⁻³
10	9.77 × 10⁻³

Fig 5

Validation of the accuracy of inferred w_c.

Comparison between predicted (red solid line) and measured (blue empty circle) NFκB dynamics when the activity of TNFα equals to (a) 0.5, (b) 1, and (c) 2. Blue dash lines represent the model predictions without the correction terms (w_c).

As the last step of the hybrid model construction, an ANN is developed to compute the w_c value. Here, inputs to the ANN are selected to be the values of states and input as well as the current time (i.e., [x₁ x₂ x₃ x₄ u t]), and the ANN output is the value of w_c.

The optimal number of hidden layers as well as the number of neurons in each hidden layer is to be optimized. As outlined earlier in the methodology, the AIC_c criterion is used to determine the ANN structure. To reduce the combinatorial complexity, the number of neurons in each hidden layer and the number of hidden layers are limited to ten and two, respectively. Then, each ANN is trained 100 times to compute the average AIC_c value with 100 different initial conditions for its hyperparameters, and the optimal ANN structure is determined by finding an ANN structure which results in the minimum average AIC_c value. Fig 6 plots the average AIC_c values for all possible ANN structures, and the AIC_c value reaches its minimum with eight and four neurons in the first and second hidden layers, respectively; hence, this particular structure is used for the subsequent analysis.

Fig 6

The average AIC_c values for different ANN structures for the first case study.

The filled circle represents the minimum average AIC_c value.

Among the 100 different ANNs with the optimal structure that have been trained in the previous step, the best ANN is chosen based on its R² statistic value. Specifically, an ANN with the highest R² value is chosen. As shown in Table 3, the R² statistics of the best ANN are sufficiently high to ensure its accuracy. Then, this ANN is coupled with the available (incorrect) first-principle model (Eq 19) to finalize the hybrid model development. In order to validate the prediction accuracy of this hybrid model, it is simulated under three input conditions and compared with the experimental measurements. As shown in Fig 7, the developed hybrid model can describe the true system dynamics fairly accurately under all the conditions, and the MSE of the model prediction is reduced to 0.00027 from 0.12 which is the MSE value of the original first-principle model (Eq 19). This result shows that the hybrid model constructed by the proposed methodology can accurately describe the system dynamics even when there is limited knowledge on the underlying system (i.e., only a model with model-system mismatches is available from the literature).

Table 3

The R² statistic values of the best ANN.

Training dataset	Validation dataset	Test dataset	Overall dataset
0.997	0.994	0.987	0.994

Fig 7

Validation of the developed hybrid model for the first case study.

Comparison between experimental measurements (empty circles) and hybrid model predictions (solid lines) with 0.5 (blue), 1 (red), and 2 (black) of TNFα.

Prediction capability of the hybrid model

The hybrid model is analyzed further in this subsection to assess whether the developed hybrid model possess the desired features of a hybrid model: intrepretability and generalized prediction capability.

First, the intrepretability of the developed hybrid model is tested by simulating and examining the temporal profiles of unmeasured states. Specifically, the dynamics of x₁, x₂, and x₄ predicted by the developed hybrid model are compared with those of the true system (Eq 16) to assess whether the hybrid model can be used in predicting unmeasured states. As shown in Fig 8, the predictions from the developed hybrid model agree well with the true system dynamics (Eq 16) and show significant improvement in the prediction accuracy compared with the available (incorrect) first principle model (Eq 19). Particularly, it is remarkable to note that the hybrid model can predict the dynamics of x₁ and x₂ quite well even though w_c is added only to x₄ for correcting its dynamics. Such improvement is possible since the hybrid model has incorporated the existing (known) interactions among the model states via the first-principle model (Eq 19).

Fig 8

Validation of the interpretability of the hybrid model.

The developed hybrid model is used to infer the dynamics of unmeasured states (i.e., (a) x₁, (b) x₂, and (c) x₄) under two input conditions: u = 0.5 (red lines) and u = 2 (black lines). The hybrid model predictions are compared with the true system dynamics (Eq 16) and the available (incorrect) first-principle model (Eq 19).

Additionally, it is of great interest to know whether the developed hybrid model has a generalized prediction capability. To this end, the hybrid model is used to predict the dynamics under three different input conditions (i.e., u = 0.7, 1.3, 1.7), and the model predictions are compared with the true system dynamics obtained from Eq 16. Here, these three particular input conditions are chosen since they lie within the input range used for training the ANN, but they are not identical to the inputs. As shown in Fig 9, the first-principle model as expected fails to capture the NFκB dynamics accurately. However, the hybrid model is capable of predicting the dynamics of x₄ fairly accurately. These results show that the hybrid model possesses the generalized prediction capability due to the incorporation of an ANN. In summary, this case study highlights advantages of using a hybrid model over a purely data-driven model or first-principle one since a hybrid model can essentially improve the overall model prediction accuracy while maintaining the model interpretability.

Fig 9

Generalized prediction capability of the developed hybrid model for the first case study.

The prediction accuracy of the hybrid model is assessed by comparing with true system dynamics when the activity of TNFα equals to (a) 0.7 and (b) 1.7. The dynamics predicted by the original first-principle model (Eq 19) are also plotted for the comparison.

Robustness of hybrid model

Additionally, since it is imperative to understand how the presence of measurement noise might impact the performance of the proposed methodology, different levels of noise are introduced by varying the distributions for μ_× and μ₊ in Eq 18 to examine the impact of the measurement noise. Specifically, a noiseless dataset (i.e., μ_× = 1 and μ₊ = 0) and that with a higher noise level ( $ln μ_{\times} \sim N (0, 0.05)$ and $μ_{+} \sim N (0, 0.05)$ ) are generated, and their corresponding hybrid models are constructed. The additional datasets as well as the original dataset are plotted in S2 Fig to illustrate their difference in the noise level. With these additional datasets, the proposed methodology is implemented to develop the corresponding hybrid models, and S3 Fig illustrates the model accuracy of the hybrid models developed based on the noiseless and more noisy datasets, respectively. MSE of the developed hybrid models based on the noiseless and more noisy measurements is 0.00015 and 0.0048, respectively, while that of the hybrid model in Fig 7 is 0.00027. Overall, regardless of the noise level in the measurements, the developed hybrid models had significantly improved prediction capability as their predicted dynamics show reasonable agreements with the measurements. To further assess the impacts of the measurement noise, the interpretability of the hybrid models is assessed by using them to predict the dynamics of unobserved states as shown in S4 and S5 Figs. Overall, all the hybrid models were able to predict the dynamics of the unobserved states with reasonable accuracy.

However, it should be noted that both in the MSE and intrepretability analysis, the oscillations in the predicted dynamics become more noticeable with a higher level of noise. Since the oscillations are not present in the true dynamics, this comparison shows that the noise level may negatively influence the accuracy of a hybrid model as well as its interpretability. Also, even the hybrid model developed from the noiseless measurements produces the dynamics with nontrivial oscillations, which indicates that the ANN might have been overfitted. In order to mitigate such a problem, the future work in this direction could incorporate the following ideas to improve the hybrid model performance. First, a de-noise technique can be implemented prior to the hybrid model development so that the noise in the measurements will become less influential. In the literature, various methods such as finite difference with polynomial spline [67], spectral transformation [68], sparse Bayesian regression [69], and neural networks [70] have been proposed. Second, alternative ANN training mechanisms can be implemented to improve the performance of an ANN. So far, an ANN is trained with the Levenberg-Marquardt algorithm through Matlab Neural Network Toolbox. Since the presence of the oscillations suggests the potential overfitting issues in the ANN training, Bayesian regularization training method [71] may be implemented to mitigate this issue.

In a systems biology study, the parameter estimation is usually performed to quantitatively calibrate an uncertain model based on available measurements. In the literature, numerous studies have proposed a wide variety of methods to efficiently estimate parameter values, and these methods have been implemented to successfully model various biological systems [2, 72, 73]. However, if there is significant model-system mismatch, the parameter estimation alone may not be enough for calibrating a model. In such a circumstance, the proposed hybrid modeling approach can be implemented. As an example, the parameter estimation is performed for this system to see whether it is enough to train the model (Eq 19). Here, the parameter estimation is preceded by global sensitivity analysis to determine a subset of parameters that are identifiable from available measurements (see [74, 75] for details). The result of the sensitivity analysis is shown in Table 4, where ϕ_D represents the sensitivity index of a parameter set. Based on the magnitude of the sensitivity index, it is determined that b₁ and b₅ are identifiable. Then, a least-squares problem is solved to estimate their values by minimizing the difference between the model predictions and the measurements (i.e., in silico measurements simulated from Eq 16). The estimated values of b₁ and b₅ are 1 and 0.30, respectively, and the accuracy of the calibrated model is assessed by comparing the model predictions with the measurements as show in S6 Fig. It is clear that the parameter estimation alone is not enough to overcome the underlying structural mismatch between the model and the true system for this signaling pathway. In summary, the parameter estimation alone may not be enough for quantitatively calibrating the model if there is a significant degree of model-system mismatch. And, the proposed hybrid modeling approach will be a valuable option to construct an accurate and physically meaningful model.

Table 4

The sensitivity analysis results.

Parameter subsets	ϕ_D
b₁	17.25
b₅	15.18
b₁, b₅	137.4

Case study 2: NFκB signaling pathway dynamics induced by LPS and BFA

While the previous case study demonstrates how the proposed methodology can be applicable to a relatively simple system and the in silico measurements, the second case study will examine how the proposed scheme can be used to develop a hybrid model for a much more complex system with real in vitro measurements. Specifically, a hybrid model is developed to describe the lipopolysaccharide (LPS)-induced NFκB signaling pathway in the presence of brefeldin A (BFA).

As briefly described in the previous case study, the NFκB signaling pathway is involved in the apoptotic signaling pathway, but it is also involved in a number of different cellular processes such as inflammation and differentiation [76, 77]. Our previous study [75] aimed to model how the NFκB signaling pathway can be activated by LPS, an endotoxin derived from gram-negative bacteria [78], in the presence of BFA. One of the major products of the signaling pathway is TNFα protein, which is a pro-inflammatory cytokine and propagates inflammatory signals to adjacent cells [79, 80]. While the LPS-induced NFκB signaling pathway is relatively well studied and has been previously modeled in the literature [81, 82], the impact of BFA on the overall signaling pathway is less known. It is suggested that BFA activates the NF-κB signaling pathway by activating another signaling pathway called the endoplasmic reticulum (ER)-stress pathway, which will subsequently initiate the NF-κB [75, 83]. Since the ER-stress pathway itself and how it activates the NFκB signaling pathway have not been fully elucidated yet [84, 85], it is very difficult to formulate an accurate mechanistic model to describe the overall signaling dynamics. In our previous study, we introduced time-varying functions to represent the effects of the BFA addition on the NFκB signaling pathway. By the introduction of new mechanisms and subsequent parameter estimation, the model accuracy was improved significantly, but there was still noticeable discrepancy between the model prediction and the measurements. Also, developing the time-varying components was also a nontrivial task, which involved further literature review and experimentation. In this study, the proposed hybrid modeling approach is to develop a hybrid model to infer the effects of the BFA addition on the dynamics of the LPS-induced NFκB signaling pathway from the measurements.

The first-principle model that will serve as the basis of the hybrid model to be developed is adopted from our previous study [75]. This model describes how LPS can induce the NFκB activation and TNFα synthesis (Fig 10) by a system of nonlinear ODEs. This model consists of 49 states and 149 parameters, and a detailed description on the model can be found in [75].

Fig 10

A schematic illustration of the LPS-induced NFκB signaling pathway.

In our previous study [75], RAW 264.7 murine macrophages were stimulated by LPS in the presence of BFA, and the dynamics of the NFκB signaling pathway were measured by flow cytometry. Specifically, we measured fold changes of TNFα and IκBα, which are two important proteins in the overall NFκB signaling pathway, under three different LPS concentrations in the presence of BFA. As a result, the model outputs are defined as the fold change of the two proteins with respect to their initial conditions:

where y₁(t) and y₂(t) are the fold changes of TNFα and IκBα concentrations, respectively, at time t. It should be noted that the measurements were taken at eight time instants, that is, t_s = [0, 10, 20, 30, 60, 120, 240, 360] minutes after addition of LPS.

w_c estimation

As outlined in the previous section, the first step in developing a hybrid model is to determine the number of correction terms needed as well as to which states these correction terms should be added. As the number of outputs for the system of interest is two (i.e., TNFα and IκBα), the dimension of x_c is also two.

Second, all possible permutations of choosing two from 49 model states are enumerated, and their corresponding digraphs are constructed to compute their corresponding relative order matrices. Based on the constructed relative order matrices, the minimum value of $\sum_{i = 1}^{n_{y}} r_{i}$ is found to be two, and Table 5 lists all the configurations whose $\sum_{i = 1}^{n_{y}} r_{i}$ values are equal to two. It should be note that r_1j and r_2j compute the relative order with respect to IκBα and TNFα measurements, respectively, in Table 5. Based on the result presented in Table 5, x₅ and x₃₇ are chosen as the best candidates to add w_c since this pair has the lowest ∑_i ∑_j≠i r_i/r_ij value. Here, x₅ and x₃₇ represent concentrations of IκBα and TNFα transcripts, respectively. Therefore, adding correction terms to these states is a reasonable choice since the predicted dynamics of a protein will become more accurate with more understanding of its transcript dynamics.

Table 5

Selection of optimal w_c locations by minimizing relative-order based criteria.

x_s1	x_s2	r₁₁	r₁₂	r₂₁	r₂₂		∑_i ∑_j≠i r_i/r_ij
5	37	1	6	6	1	2	0.33
1	37	1	6	5	1	2	0.37
3	37	1	6	5	1	2	0.37
34	37	1	6	5	1	2	0.37
2	37	1	6	4	1	2	0.417
4	37	1	6	4	1	2	0.417
5	28	1	4	6	1	2	0.417
1	28	1	4	5	1	2	0.45
3	28	1	4	5	1	2	0.45
34	28	1	4	5	1	2	0.45
2	28	1	4	4	1	2	0.50
4	28	1	4	4	1	2	0.5

With x_c = [x₁ x₃₇], the regularized least-squares problem (Eq 5) is solved to estimate W that contains the values of w_c at eight time points for each LPS concentration. Since the value of the regularization parameter α in Eq 5 is unknown beforehand, its optimal value is determined by the five-fold cross-validation. Table 6 shows the values of normalized MSE with respect to different values of α. From this result, the optimal value of α is determined to be 0.001, and the estimated values of W obtained with α = 0.001 are considered to develop a hybrid model.

Table 6

Comparison of the normalized MSE at different α values for the second case study.

α	normalized MSE
1 × 10⁻⁵	0.0275
0.0001	0.0283
0.001	0.0181
0.01	0.0187
0.1	0.0235
1	0.0310
10	0.0390

Before constructing an ANN model, the accuracy of the inferred W is assessed by comparing the experimental measurements and predictions from the imperfect model coupled with the inferred W. It should be noted that w_c is linearly interpolated to obtain its values at the time instants when the measurements are not taken. Fig 11 compares the predicted and measured TNFα and IκBα dynamics under three different LPS concentrations. Additionally, the predictions of the model coupled with the inferred W are compared with those of the imperfect model without W. Fig 11 shows that the addition of W significantly improves the model accuracy across all three LPS concentrations. Specifically, the addition of W renders the model prediction to match with trends observed in the experiments, which show the sustained low concentration of IκBα. At the same time, the addition of W helps the model predictions agree much better with the measured TNFα dynamics. Overall, these comparisons have demonstrated that the integration of W greatly improves the predictive capability of the available (imperfect) first-principle model.

Fig 11

The model prediction accuracy is improved by coupling the available (imperfect) first-principle model with the estimated w_c.

Red solid lines and blue empty circles represent simulated and measured TNFα dynamics and IκBα, respectively, in the presence of BFA under three different LPS concentrations. Blue dash lines represent the model predictions without the correction terms (w_c).

Additionally, in order to demonstrate the necessity of choosing an optimal x_c, a different x_c is chosen, and their corresponding values of W are estimated. Specifically, x_c = {2, 17} is selected, and their selection criterion values are $\sum_{i = 1}^{n_{s}} = 4$ and $\sum_{i}^{n_{s}} \sum_{j \neq i} r_{i} / r_{i j} = 1.25$ , which are much higher than those of the optimal choice of x_c. Then, the first-principle model is coupled with the estimated w_c, and its predictions are compared with the available measurements. Fig 12 illustrates that addition of w_c does not significantly improve the accuracy of the model prediction, which highlights the validity of determining the optimal set of x_c from Table 5 based on the relative-order criteria.

Fig 12

Importance of selecting an optimal x_c.

w_c are added to x_c = {2, 17} as explained in the test. Red solid lines and blue empty circles represent simulated and measured TNFα dynamics and IκBα, respectively, in the presence of BFA under three LPS concentrations. Blue dash lines represent the model predictions without the correction terms (w_c).

ANN development

With the inferred W, an ANN is developed to generalize the relationship between the first-principle model and w_c values. Specifically, inputs and output of the ANN are [x(t), t] and w_c(t), respectively.

Next, the ANN structure is optimized by minimizing the average AIC_c values. Similar to the previous case study, we will limit the numbers of hidden layers and neurons in each hidden layer to two and ten, respectively, and each ANN structure is trained 100 times. Fig 13 plots the average AIC_c value for all possible ANN structures, and the one with three and six neurons in the first and second hidden layers, respectively, is shown to be optimal since it provides the minimal average AIC_c value. Then, among 100 different trained ANNs with this particular structure, the best ANN is selected by its R² statistics. Table 7 shows the R² values of the chosen ANN, which are all above 0.98 and thus demonstrate its prediction accuracy.

Fig 13

The average AIC_c values for different ANN structures.

The filled circle represents the minimum average AIC_c value.

Table 7

The R² statistic valued of the best ANN for the second case study.

Training dataset	Validation dataset	Test dataset	Overall dataset
0.998	0.999	0.986	0.997

The developed ANN is then coupled with the available (imperfect) first principle-model to finalize the hybrid model. Fig 14 compare the predicted dynamics from the resulted hybrid model with the experimental measurements. The normalized MSE values of the hybrid model are 1.95 and 6.3 × 10⁻⁴ with respect to the TNFα and IκB measurements, respectively, while those of the original first-principle model are 12.5 and 7.0 × 10⁻³, respectively.

Fig 14

Validation of the developed hybrid model.

The IκBα dynamics predicted from the hybrid model with the developed ANN (red solid line) are compared with the measurements (blue empty circle) in the presence of BFA under the LPS concentrations of (a) 10ng/mL, (b) 50ng/mL, and (c) 250ng/mL. The TNFα dynamics predicted from the hybrid model with the developed ANN (red solid line) are compared with the measurements (blue empty circle) under the LPS concentrations of (d) 10ng/mL, (e) 50ng/mL, and (f) 250ng/mL. Blue dash lines represent the model predictions without the ANN.

This shows that the hybrid model with the developed ANN has generalized the prediction capability of the first-principle model coupled with the experimentally inferred w_c; as a result, the developed hybrid model now can be utilized to predict the dynamics of the NFκB signaling pathway in new conditions.

Although the developed hybrid model greatly improves the prediction accuracy, the model-system mismatch still remains. Specifically, under LPS = 10 ng/mL, the predicted dynamics of both TNFα and IκBα do not perfectly agree with the measurements. Specifically, the hybrid model predicts a monotonic increase in the IκBα dynamics and thus attenuation of TNFα synthesis, which indicates that the NFκB activity gradually decays during this time period. On the other hand, the measurements show that the TNFα synthesis slows down beyond 240 minutes while the IκBα level decreases after 240 minutes, which is not consistent with the model predictions. There is an additional mismatch in the IκBα dynamics under LPS = 50 ng/mL. Specifically, the hybrid model predicts an oscillatory behavior with two troughs at 60 and 240 minutes and a peak at 120 minutes while the experimental measurements indicate a monotonic increase from 60 to 240 minutes without an intermediate peak. Overall, such remaining model-system mismatches demonstrate that the BFA addition has more pronounced effects on the overall signaling dynamics after 200 minutes. This has been documented in our previous work [75], where the dynamics of IκBα in the presence of BFA deviate from those of IκBα without BFA after 200 minutes. Increasing the dimension of w_c may further improve the prediction accuracy due the increase in the degree of freedom. Alternatively, the first-principle model can be modified further by incorporating known mechanisms of the BFA-induced signaling pathways to improve the first-principle model before estimating the values of w_c. Specifically, it was suggested that the addition of BFA will elicit another signaling pathway called ER stress signaling pathway [75], which will suppress the translation of IκBα. In the future, this mechanism can be incorporated into the first-principle model to improve the accuracy of the hybrid model.

Discussion

In this work, we have presented a systematic way to construct a hybrid model that can accurately describe the dynamics of an intracellular signaling pathway even when we have partial understanding of the system. In order to simulate the dynamics of a signaling pathway of interest, prior understandings of the system are formulated into a system of nonlinear ODEs as its first-principle model. Since the first-principle model incorporates underlying mechanisms of the system, the model can be used to predict the system dynamics under new conditions and infer unmeasured model states’ dynamics once the model is properly calibrated by experiments [2–4, 86]. However, the development of such a first-principle model is nontrivial, and one of the largest bottlenecks in the model development process is lack of fundamental knowledge that may lead to the inaccurate formulation of a first-principle model.

Under such circumstances, data-driven modeling approaches such as proper orthogonal decomposition [15], subspace identification [17] and partial least squares regression [25] are commonly implemented to derive a dynamic model of a system whose corresponding first-principle model is too difficult to be formulated or is computationally too costly. Such models are advantageous to accurately derive empirical relationships between inputs and outputs. However, these models are difficult to be generalized and usually require a large amount of observations. To this end, this study proposes to use a hybrid model to improve prediction accuracy by combining the characteristics of both first-principle and data-driven modeling approaches. The available, albeit incomplete, knowledge of the system is used in a hybrid model, and the model-system mismatches are incorporated by inferring unknown components’ dynamics (i.e., W) from experimental measurements [87]. In order to construct a hybrid model systematically, the presented work proposes a sequential two-step approach. First, x_c and its dynamics are identified and estimated through the graphical approaches and solving an L2-regularized least-squares problem. Second, an ANN model is developed to correlate the available (imperfect) first-principle model with the values of w_c.

Through the proposed hybrid modeling approach, the developed hybrid model is able to posses prediction generalizability as a first-principle model does through the incorporation of the first-principle model coupled with an ANN. That is, a hybrid model can accurately predict the unmeasured states’ dynamics, and it can also be used to predict the system dynamics under a new operating condition as shown in two cases studies. At the same time, the hybrid model is likely to have more accurate predictions by inferring and incorporating the dynamics of components (i.e., w_c), which are missing in the first-principle model [29]. Such representations of the hidden components with an empirical function such as an ANN is particularly attractive when mechanistic understandings are completely lacking or only partially known, which may result in highly uncertain mechanistic equations with unreliable model predictions.

Acknowledgements

Portions of this research were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing.

References

EKlipp, WLiebermeister. Mathematical modeling of intracellular signaling pathways. BMC Neuroscience. 2006;7:S10 10.1186/1471-2202-7-S1-S10

AJayaraman, JHahn. Methods in bioengineering: systems analysis of biological networks. Boston: Artech House; 2009.

HKitano. Systems biology: a brief overview. Science. 2002;295:1662–1664. 10.1126/science.1069492

DLee, AJayaraman, JSKwon. Identification of a time-varying intracellular signaling model through data clustering and parameter selection: Application to NFκB signaling pathway induced by LPS in the presence of BFA. IET Systems Biology. 2019;13:169–179.

PMendes, DKell. Non-linear optimization of biochemical pathways: applications to metabolic engineering and parameter estimation. Bioinformatics. 1998;14:869–883. 10.1093/bioinformatics/14.10.869

AYazdani, MRaissi, GEKarniadakis. Systems biology informed deep learning for inferring parameters and hidden dynamics. bioRxiv. 2019;865063.

MSFBangi, SKwon. Deep hybrid modeling of chemical process: Application to hydraulic fracturing. Computers & Chemical Engineering. 2020;134:106696 10.1016/j.compchemeng.2019.106696

Mvon Stosch, JPeres, EFde Azevedo, ROliveira. Modeling Biochemical Networks with Intrinsic Time Delays: A Hybrid Semi-parameteric Approach. BMC Systems Biology. 2010;4:131 10.1186/1752-0509-4-131

KGGadkar, RGunawan, FJDoyle. Iterative approach to model identification of biological networks. BMC bioinformatics. 2005;6:155 10.1186/1471-2105-6-155

APTeixeira, NCarinhas, Jao M L Dias, PCruz, PMAlves, MJTCarrondo, et al Hybrid semi-parametric mathematical systems: Bridging the gap between systems biology and process engineering. Journal of Biotechnology. 2007;132:418–425. 10.1016/j.jbiotec.2007.08.020

JPPerley, JMikolajczak, MLHarrison, GTBuzzard, AERundell. Multiple model-informed open-loop control of uncertain intracellular signaling dynamics. PLoS Computational Biology. 2014;10:e1003546 10.1371/journal.pcbi.1003546

DHenriques, AFVillaverde, MRocha, JSaez-Rodriguez, JRBanga. Data-driven reverse engineering of signaling pathways using ensembles of dynamic models. PLoS Computational Biology. 2017;13:e1005379 10.1371/journal.pcbi.1005379

Mvon Stosch, NCarinhas, ROliveira. Hybrid Modeling for Systems Biology: Theory and Practice In: PBBenner, RFindeisen, DFlockerzi, UReichl, KSundmacher, editors. Large-Scale Networks in Engineering and Life Sciences. Switzerland: Birkhäuser; 2014 p. 367–388.

JFPollard, MRBroussard, DBGarrison, KYSan. Process identification using neural networks. Computers & Chemical Engineering. 1992;16:253–270. 10.1016/0098-1354(92)80046-C

ANarasingam, PSiddhamshetty, JSKwon. Temporal clustering for order reduction of nonlinear parabolic PDE systems with time-dependent spatial domains: Application to a hydraulic fracturing process. AIChE Journal. 2017;63:3818–3831. 10.1002/aic.15733

ANarasingam, JSKwon. Development of local dynamic mode decomposition with control: Application to model predictive control of hydraulic fracturing. Computers & Chemical Engineering. 2017;106:501–511. 10.1016/j.compchemeng.2017.07.002

AGarg, PMhaskar. Subspace Identification-Based Modeling and Control of Batch Particulate Processes. Industrial & Engineering Chemistry Research. 2017;56:7491–7502. 10.1021/acs.iecr.7b00682

MLThompson, MAKramer. Modeling chemical processes using prior knowledge and neural networks. AIChE Journal. 1994;40:1328–1340. 10.1002/aic.690400806

Mvon Stosch, ROliveira, JPeres, EFde Azevedo, ROliveira. Hybrid semi-parametric modeling in process systems engineering: Past, present and future. Computers & Chemical Engineering. 2014;60:86–101. 10.1016/j.compchemeng.2013.08.008

ARaue, CKreutz, TMaiwald, JBachmann, MSchilling, UKlingmüller, et al Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics. 2009;25:1923–1929. 10.1093/bioinformatics/btp358

CKravaris, JHahn, YChu. Advances and selected recent developments in state and parameter estimation. Computers & Chemical Engineering. 2013;51:111–123. 10.1016/j.compchemeng.2012.06.001

DCPsichogios, LHUngar. A hybrid neural network-First principles approach to process modeling. AIChE Journal. 1992;38:1499–1511. 10.1002/aic.690381003

CATiemann, JVanlier, MHOosterveer, AKGroen, PAJHilbers, NAWvan Riel. Parameter trajectory analysis to identify treatment effects of pharmacological interventions. PLoS Computational Biology. 2013;9:e1003166 10.1371/journal.pcbi.1003166

JSchubert, RSimutis, MDors, IHavlik, ALübbert. Hybrid modelling of yeast production processes—combination of a priori knowledge on different levels of sophistication. Chemical Engineering & Technology. 1994;14:10–20. 10.1002/ceat.270170103

JPinto, CRde Azevedo, ROliveira, Mvon Stosch. A Bootstrap-aggregated Hybrid Semi-parametric Modeling Framework for Bioprocess Development. Bioprocess and Biosystems Engineering. 2019;42:1853–1865. 10.1007/s00449-019-02181-y

DChaffart, LARicardez-Sandoval. Optimization and control of a thin film growth process: A hybrid first principles/artificial neural network based multiscale modelling approach. Computers & Chemical Engineering. 2018;119:465–479. 10.1016/j.compchemeng.2018.08.029

GHu, ZMao, DHe, FYang. Hybrid modeling for the prediction of leaching rate in leaching process based onnegative correlation learning bagging ensemble algorithm. Computers & Chemical Engineering. 2011;35:2611–2617. 10.1016/j.compchemeng.2011.02.012

BSun, CYang, YWang, WGui, ICraig, LOlivier. A comprehensive hybrid first principles/machine learning modeling framework for complex industrial processes. Journal of Process Control. 2020;86:30–43. 10.1016/j.jprocont.2019.11.012

FHamilton, ALLloyd, KBFlores. Hybrid modeling and prediction of dynamical systems. PLoS Computational Biology. 2017;13:e1005655 10.1371/journal.pcbi.1005655

JLagergren, AReeder, FHamilton, RCSmith, KBFlores. Forecasting and Uncertainty Quantification Using a Hybrid of Mechanistic and Non-mechanistic Models for an Age-Structured Population Model. Bulletin of Mathematical Biology. 2018;80:1578–1595. 10.1007/s11538-018-0421-7

BEngelhardt, HFröhlich, MKschischo. Learning (from) the errors of a systems biology approach. Scientific Reports. 2016;6:20772 10.1038/srep20772

BEngelhardt, MKschischo, HFröhlich. A Bayesian approach to estimating hidden variables as well as missing and wrong molecular interactions in ordinary differential equation-based mathematical models. Journal of the Royal Society of Interface. 2017;14:20170332 10.1098/rsif.2017.0332

Bampou D. Polynomial approximations for infinite-dimensional optimization problems [PhD Thesis]. Imperial College London; 2012.

DLee, AJayaraman, JSKwon. Identification of cell-to-cell heterogeneity through systems engineering approaches. AIChE Journal. 2020;66:e16925 10.1002/aic.16925

CKravaris, JHSeinfeld. Identification of parameters in distributed parameter systems by regularization. SIAM Journal on Control and Optimization. 1985;23(2):217–241. 10.1137/0323017

ANarasingam, PSiddhamshetty, JSKwon. Handling spatial heterogeneity in reservoir parameters using proper orthogonal decomposition based ensemble Kalman filter for model-based feedback control of hydraulic fracturing. Industrial & Engineering Chemistry Research. 2018;57:3977–3989. 10.1021/acs.iecr.7b04927

LBansal, YChu, CLaird, JHahn. Regularization of Inverse Problems to Determine Transcription Factor Profiles from Fluorescent Reporter Systems. AIChE Journal. 2012;58:3751–3762. 10.1002/aic.13782

DPHowsmon, JHahn. Regularization techniques to overcome overparameterization of complex biochemical reaction networks. IEEE life sciences letters. 2016;2:31–34. 10.1109/LLS.2016.2646498

HSSidhu, ANarasingam, PSiddhamshetty, JSKwon. Model order reduction of nonlinear parabolic PDE systems with moving boundaries using sparse proper orthogonal decomposition: Application to hydraulic fracturing. Computers & Chemical Engineering. 2018;112:9–100.

RMHirschorn. lnvertibility of multivariable nonlinear control systems. IEEE Transactions on Automatic Control. 1979;24:855–865. 10.1109/TAC.1979.1102181

PDaoutidis, CKravaris. Inversion and zero dynamics in nonlinear multivariable control. AIChE Journal. 1991;37:527–538. 10.1002/aic.690370406

DKahl, PWendland, MNeidhardt, AWeber, MKschischo. Structural invertibility and optimal sensor node placement for error and input reconstruction in dynamic systems. Physical Review X. 2019;9:041046 10.1103/PhysRevX.9.041046

PDaoutidis, CKravaris. Structural evaluation of control configurations for multivariable nonlinear processes. Chemical Engineering Science. 1992;47:1091–1107. 10.1016/0009-2509(92)80234-4

DClaude. Commande non interactive simple des systèmes non linéaires par bouclages statiques réguliers. C R Acad Sc Paris. 1986;303:833.

BLee, YKim, DShin, ESYoon. A study on the evaluation of structural controllability in chemical processes. Chemical Engineering Science. 2001;25:85–95.

MSoroush. Evaluation of achievable control quality in nonlinear processes. Computers & Chemical Engineering. 1996;20:357–364. 10.1016/0098-1354(95)00000-3

DLee, ASingla, HWu, JSKwon. An integrated numerical and experimental framework for modeling of CTB and GD1b ganglioside binding kinetics. AIChE Journal. 2018;64:3882–3893. 10.1002/aic.16209

SHeo, PDaoutidis. Control-relevant decomposition of process networks via optimization-based hierarchical clustering. AIChE Journal. 2016;62:3177–3188. 10.1002/aic.15323

MSchmidt, HLipson. Distilling free-form natural laws from experimental data. Science. 2009;324:81–85. 10.1126/science.1165893

SLBrunton, JLProctor, JNKutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences. 2016;113:3932–3937. 10.1073/pnas.1517384113

ANarasingam, JSKwon. Data-driven identification of interpretable reduced-order models using sparse regression. AIChE Journal. 2018;119:101–111.

BBhadriraju, ANarasingam, JSKwon. Machine learning-based adaptive model identification of systems: Application to a chemical process. Chemical Engineering Research and Design. 2019;152:372–383. 10.1016/j.cherd.2019.09.009

ANarasingam, JSKwon. Koopman Lyapunov-based model predictive control of nonlinear chemical process systems. AIChE Journal. 2019;65:e16743 10.1002/aic.16743

SMUdrescu, MTegmark. AI Feynman: A physics-inspired method for symbolic regression. Science Advance. 2020;6:eaay2631.

GCybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems. 1989;2:303–314. 10.1007/BF02551274

DMHimmelblau. Applications of artificial neural networks in chemical engineering. Korean Journal of Chemical Engineering. 2000;17:373–392. 10.1007/BF02706848

HAkaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 1974;19:716–723. 10.1109/TAC.1974.1100705

CMHurvich, CLTsai. Regression and time series model selection in small samples. Biometrika. 1989;76:297–307. 10.1093/biomet/76.2.297

SSteinmeyer, DPHowsmon, RCAlaniz, JHahn, AJayaraman. Empirical modeling of T cell activation predicts interplay of host cytokines and bacterial indole. Biotechnology and Bioengineering. 2017;114:2660–2667. 10.1002/bit.26371

Lee D, Jayaraman A, Kwon JS. Identification of heterogeneous parameters in an intracellular reaction network from population snapshot measurements through sensitivity analysis and neural network. In: Proceedings of 8th Conference on Foundations of Systems Biology in Engineering FOSBE 2019; 2019. p. 107–112.

MChaves, TEissing, FAllgöwer. Bistable biological systems: A characterization through local compact input-to-state stability. IEEE Transactions on Automatic Control. 2008;53:87–100. 10.1109/TAC.2007.911328

Chaves M, Eissing T, Allgöwer F. Identifying mechanisms for bistability in an apoptosis network. In: Proceedings of Réseaux d‘interaction: analyse, modélisation et simulation; 2006. p. 6.

CFurusawa, TSuzuki, AKashiwagi, TYomo, KKaneko. Ubiquity of log-normal distributions in intra-cellular reaction dynamics. Biophysics. 2005;1:25–31. 10.2142/biophysics.1.25

JHasenauer, SWaldherr, MDoszczak, PScheurich, NRadde, FAllgöwer. Analysis of heterogeneous cell populations: A density-based modeling and identification framework. Journal of Process Control. 2011;21:1417–1425. 10.1016/j.jprocont.2011.06.020

CKreutz, MMBRodriguez, TMaiwald, MSeidl, HEBlum, LMohr, et al An error model for protein quantification. Bioinformatics. 2007;23:2747–2753. 10.1093/bioinformatics/btm397

ELimpert, WAStahel, MAbbt. Log-normal distributions across the sciences: keys and clues: on the charms of statistics, and how mechanical models resembling gambling machines offer a link to a handy way to characterize log-normal distributions, which can provide deeper insight into variability and probability—normal or log-normal: that is the question. BioScience. 2001;51:341–352.

SHRudy, SLBrunton, JLProctor, JNKutz. Data-driven discovery of partial differential equations. Science Advances. 2017;3:e160261 10.1126/sciadv.1602614

HSchaeffer. Learning partial differential equations via data discovery and sparse optimization. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2017;473:20160446 10.1098/rspa.2016.0446

SZhang, GLin. Robust data-driven discovery of governing physical laws with error bars. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2018;474:20180305 10.1098/rspa.2018.0305

JHLagergren, JTNardini, GMLavigne, EMRutter, KBFlores. Learning partial differential equations for biological transport models from noisy spatio-temporal data. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2020;476:20190800 10.1098/rspa.2019.0800

DJCMcKay. Bayesian interpolation. Neural computation. 1992;4:415–447. 10.1162/neco.1992.4.3.415

MAshyraliyev, YFomekong-Nanfack, JAKaandorp, JGBlom. Systems biology: parameter estimation for biochemical models. FEBS Journal. 2009;276:886–902. 10.1111/j.1742-4658.2008.06844.x

CGMoles, PMendes, JRBanga. Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome research. 2003;13:2467–2474. 10.1101/gr.1262503

YChu, JHahn. Parameter set selection for estimation of nonlinear dynamic systems. AIChE Journal. 2007;53:2858–2870. 10.1002/aic.11295

DLee, YDing, AJayaraman, JSKwon. Mathematical modeling and parameter estimation of intracellular signaling pathway: Application to LPS-induced NFκB activation and TNFα production in macrophages. Processes. 2018;6:21 10.3390/pr6030021

AOeckinghaus, SGhosh. The NF-κB family of transcription factors and its regulation. Cold Spring Harbor Perspectives in Biology. 2009;1:a000034 10.1101/cshperspect.a000034

SMitchell, JVargas, AHoffmann. Signaling via the NFκB system. WIREs Systems Biology and Medicine. 2016;8:227–241. 10.1002/wsbm.1331

RRajaiah, DJPerkins, DDCIreland, SNVogel. CD14 dependence of TLR4 endocytosis and TRIF signaling displays ligand specificity and is dissociable in endotoxin tolerance. Proceedings of the National Academy of Sciences. 2013;112(27):8391–8396. 10.1073/pnas.1424980112

NParameswaran, SPatial. Tumor necrosis factor-α signaling in macrophages. Critical Reviews in Eukaryotic Gene Expression. 2010;20:87–103. 10.1615/CritRevEukarGeneExpr.v20.i2.10

SMaiti, WDai, RCAlaniz, JHahn, AJayaraman. Mathematical modeling of pro- and anti-inflammatory signaling in macrophages. Processes. 2015;3:1–18. 10.3390/pr3010001

ZCheng, BTaylor, DROurthiague, AHoffmann. Distinct single-cell signaling characteristics are conferred by the MyD88 and TRIF pathways during TLR4 activation. Science Signaling. 2015;8:ra69 10.1126/scisignal.aaa5208

RAKellogg, CTian, TLipniacki, SRQuake, STay. Digital signaling decouples activation probability and population heterogeneity. eLife. 2015;4:e08931 10.7554/eLife.08931

HLPahl, PABaeuerle. A novel signal transduction pathway from the endoplasmic reticulum to the nucleus is mediated by transcription factor NF-κB. EMBO Journal. 1995;14:2580–2588. 10.1002/j.1460-2075.1995.tb07256.x

DRDiedrichs, JAGomez, CSHuang, DTRutkowski, RCurtu. A data-entrained computational model for testing the regulatory logic of the vertebrate unfolded protein response. Molecular Biology of the Cell. 2018;29:1502–1517. 10.1091/mbc.E17-09-0565

KErguler, MPieri, CDeltas. A mathematical model of the unfolded protein stress response reveals the decision mechanism for recovery, adaptation and apoptosis. BMC Systems Biology. 2013;7:1–18. 10.1186/1752-0509-7-16

DLee, AMohr, JSKwon, HJWu. Kinetic Monte Carlo modeling of multivalent binding of CTB proteins with GM1 receptors. Computers & Chemical Engineering. 2018;118:283–295. 10.1016/j.compchemeng.2018.08.011

RBellazzi, RGuglielmann, LIroni, CPatrini. A Hybrid Input-Output Approach to Model Metabolic Systems: An Application to Intracellular Thiamine Kinetics. Journal of Biomedical Informatics. 2001;34:221–248. 10.1006/jbin.2001.1022

Introduction

Methods

System description

Estimation of w(t)

Proposed methodology for estimating w from measurements

Selection of xc

A graphical approach to evaluate relative orders

Development of artificial neural network models

Model selection and training

Results

Case study 1: TNFα signaling pathway

In silico measurements

Available first-principle model

Hybrid model development

Prediction capability of the hybrid model

Robustness of hybrid model

Case study 2: NFκB signaling pathway dynamics induced by LPS and BFA

wc estimation

ANN development

Discussion

Acknowledgements

References

Selection of x_c

w_c estimation