The authors have declared that no competing interests exist.
¤Current address: Department of Biomedical Engineering, Duke University, Durham, North Carolina, USA
Developing an accurate first-principle model is an important step in employing systems biology approaches to analyze an intracellular signaling pathway. However, an accurate first-principle model is difficult to be developed since it requires in-depth mechanistic understandings of the signaling pathway. Since underlying mechanisms such as the reaction network structure are not fully understood, significant discrepancy exists between predicted and actual signaling dynamics. Motivated by these considerations, this work proposes a hybrid modeling approach that combines a first-principle model and an artificial neural network (ANN) model so that predictions of the hybrid model surpass those of the original model. First, the proposed approach determines an optimal subset of model states whose dynamics should be corrected by the ANN by examining the correlation between each state and outputs through relative order. Second, an L2-regularized least-squares problem is solved to infer values of the correction terms that are necessary to minimize the discrepancy between the model predictions and available measurements. Third, an ANN is developed to generalize relationships between the values of the correction terms and the system dynamics. Lastly, the original first-principle model is coupled with the developed ANN to finalize the hybrid model development so that the model will possess generalized prediction capabilities while retaining the model interpretability. We have successfully validated the proposed methodology with two case studies, simplified apoptosis and lipopolysaccharide-induced NFκB signaling pathways, to develop hybrid models with in silico and in vitro measurements, respectively.
An intracellular signaling pathway is often represented by a set of nonlinear ordinary differential equations, which translate our current knowledge about the signaling pathway into a testable mathematical model. However, predictions from such models are often subject to high uncertainty since many signaling pathways are only partially known beforehand. In this study, we propose a systematic approach to develop a hybrid model to improve model accuracy by combining machine learning and the first-principle modeling. Specifically, model correction terms are learned from discrepancy between model predictions and measurements, and these terms are added to the first-principle model to enhance the prediction accuracy. Once these correction terms are learned from the data, an artificial neural network (ANN) model is developed to find an empirical relation between the model and the correction terms so that the developed ANN can be used to posses improved predictive capabilities even in new operating conditions (i.e., generalizability). The final hybrid model is then constructed by coupling the first-principle model with the developed ANN.
An intracellular signaling pathway is a biochemical reaction network of cells to adjust their metabolism, gene expression, and other necessary actions so that the cells can appropriately respond to perturbations present in their environment [1, 2]. Since an intracellular signaling pathway is complex involving interactions among a large number of biomolecules inside a cell, it is common to implement a systems biology approach, which integrates experimental observations and mathematical modeling, to analyze the signaling pathway comprehensively [3, 4]. As a result, one of the key tasks in systems biology is to develop a predictive mathematical model for analyzing underlying mechanisms and generating new hypotheses to be tested in the future. To this end, a first-principle based mechanistic model has been a preferred choice for modeling an intracellular signaling pathway. Specifically, a system of nonlinear ordinary differential equations (ODEs) is constructed based on the current knowledge about a system, where its differential equations are derived using kinetic laws such as mass-action and Michaelis-Menten kinetics [4–6]. Since a first-principle model represents the current understandings of a system, its ODEs are physically meaningful, and its predictions are valid over a wide range of conditions [7, 8]. However, since a signaling pathway of interest is often only partially understood due to its inherent complexity, structural mismatches often exist between dynamics of the true system and those predicted by the corresponding first-principle model [4, 8–12].
Instead of a first-principle model, a data-driven model can be developed from available experimental measurements, which can describe the input-output dynamics adequately even when mechanistic understanding of the system is limited [13–17]. However, a data-driven model has narrow applicability as it is tailored to describe input-output relationships contained in the training datasets [18, 19]. Also, available measurements of an intracellular signaling pathway are often limited both in quantity and quality, which may further limit the direct application of a data-driven approach [20, 21].
As an alternative, a hybrid modeling approach that combines first-principle and data-driven modeling techniques has been proposed to describe a process that is only partially known [19]. Here, a hybrid model refers to an improved version of a first-principle model that compensates for model-system mismatches by introducing empirical parameters or terms, which are inferred from available measurements. As a result, a hybrid model has better prediction capabilities than a first-principle one while it preserves generalizability and interpretability, which are difficult to be achieved through a data-driven model [18, 22, 23]. A classic example of the hybrid modeling approach is to model a fedbatch bioreactor, where the biomass growth rate is estimated from process data and coupled with mass conversation laws [18, 22]. In these studies, the mass conservation laws represent our prior knowledge of the system (i.e., the first-principle model), whereas its growth rate is uncertain and inferred from the measurements to improve the model’s prediction accuracy. Due to its merits, the hybrid modeling approach has been implemented to model various biological and chemical processes such as bioprocess development and optimization [24, 25], modeling propagation of fractures during hydraulic fracturing process [7], transcription factor dynamics [8], thin film deposition [26], metallurgic processes [27, 28], and flour beetles population dynamics [29, 30].
In this work, we propose a systematic hybrid modeling approach for an intracellular signaling pathway by incorporating available mechanistic knowledge and experimental measurements. Compared to previous implementation of hybrid models, unique challenges arise when a hybrid model is developed for an intracellular signaling pathway. First, a first-principle model of an intracellular signaling pathway is usually high-dimensional with a large number of states while the amount of available measurements is usually limited. Hence, it is desirable to minimize the number of components in a hybrid model that should be inferred from experimental measurements to minimize the possibility of overfitting, which may compromise the generalizability of the hybrid model. At the same time, it is usually unknown beforehand which parts of a hybrid model should be represented by a data-driven model. In modeling a bioreactor, it is known that the largest uncertainty resides in the cell growth rate, and it is inferred from experimental measurements [18, 22]. However, such knowledge is usually not available for an intracellular signaling pathway [4, 21].
Motivated by the above considerations, we propose a systematic approach to construct a hybrid model to describe the dynamics of an intracellular signaling pathway, which is only partially known. Among a few possible hybrid model structures, a hybrid model formulation proposed by Engelhardt et al. [31, 32] is adopted, where differential equations of model states are adjusted by correction terms inferred from experiments. Since an intracellular signaling pathway is often high-dimensional and its origin of prediction inaccuracy is unknown beforehand, a graphical approach is implemented to determine a subset of model states that have the highest correlations with the measurements. And only these states’ dynamics are modified by the correction terms. Specifically, the values of the correction terms are estimated at the times when the measurements are available so that the model with the estimated correction terms can reproduce the measurements accurately. Then, an empirical map between the first-principle model and the correction terms is approximated by an artificial neural network (ANN). Once an ANN is trained, the hybrid model now can be constructed by integrating the first-principle model and the ANN. The effectiveness and feasibility of the proposed methodology are demonstrated by developing hybrid models of intracellular signaling pathways for two case studies.
Dynamics of an intracellular signaling pathway are described by a system of nonlinear ODEs as follows:

Such ODE models for an intracellular signaling pathway are formulated based on the current understandings of the underlying signaling pathway. Hence, the accuracy of an ODE model depends on the accuracy and completeness of the prior knowledge. Unfortunately, an intracellular signaling pathway is quite complex, which involves interactions among a large number of intracellular biomolecules. As a result, it is likely to have a model-system mismatch, which prevents from utilizing the model for system analysis and prediction.
Under this circumstance, the following hybrid model can be used to minimize potential model-system mismatches while preserving the available knowledge of the system [31, 32]:

In order for Eq 2 to properly predict true dynamics of the system, the values of w need to be explicitly known at any arbitrary time instants so that the hybrid model (Eq 2) can be numerically integrated. Therefore, w values are inferred from available experimental measurements. On the other hand, even with experimentally inferred w(t), the hybrid model cannot be used to predict the system dynamics under a new condition since the corresponding temporal profile of w are not available. Hence, this study aims to develop a functional map
In summary, this study aims to develop a hybrid model by the following two subsequent steps:
Infer w(t) from available experimental measurements.
Develop the function,
Suppose that measurements are obtained at Nt discrete time instants (i.e.,



However, Eq 3 is likely to be ill-conditioned because (1) it is an infinite dimensional problem in which the decision variables (i.e., w(t)) are continuous temporal profiles, and (2) the available measurements are limited in quantity (i.e., small values of ny and Nt) [33–36]. Hence, its solution is subject to high uncertainties, and the resulted hybrid model based on the estimated w(t) will be difficult to be generalized for future predictions.
In order to address the aforementioned ill-posedness, the following assumptions are made to reduce the dimension of the estimation problem. First, a L2-regularized least-squares problem is solved to estimate w by handling potential overfitting issues [37–39]. Second, this study aims to infer the values of w at the time instants only when the measurements are available (i.e., ts). Third, instead of adding correction terms to all the model states, correction terms are given to only a subset of states, which will be denoted as

With these assumptions, the estimation of w(t) is reformulated as follows:






In Eq 5, the optimal value of α is unknown beforehand, so its value is optimized by five-fold cross-validation. Specifically, the available measurements
A key step before solving Eq 5 is to identify xc, whose trajectories are corrected by wc. Specifically, two questions need to be addressed: first, what is the dimension of xc (i.e., ns), and second, when the value of ns is known, which states in x should be selected to form xc. In this study, we employ the idea of invertibility and a graph-theoretical approach to determine xc.
Specifically, we choose xc from x so that the resulted system is close to be invertible [40]. If a system is invertible, for a given value of x0, unique values of y will correspond to unique values of inputs, so one could reconstruct the values of inputs from available output measurements [41, 42]. If wc in the hybrid model (Eqs 5b and 5c) is viewed as an input to the system and the hybrid model is invertible, the values of wc can be uniquely characterized from given measurements [41, 43]. Hence, it is our best interest to select the dimension of wc as well as its placement so that the resulted hybrid model is invertible, which will attenuate the ill-posedness of the inverse problem (i.e., Eq 5). In this regard, Daoutidis and Kravaris [41] have shown that a dynamic system is invertible when the following matrix is nonsingular:




In this study, xc is chosen so that C(x) of Eqs 5b and 5c is nonsingular, which will minimize the ill-posedness of the inverse problem (Eq 5). First, we let the size of xc equal to the size of outputs (i.e., ns = ny) so that C(x) is square. Second, an optimal choice of xc is identified systematically from x. In this regard, this study will assess the optimality of xc through the following criteria that are based on the relative order:
The value of
The value of
Previous studies have demonstrated that the relative order measures ‘physical closeness’ between a correction term and an output [43, 45]: a smaller value of ri represents a stronger connection between wc and yi. So, the first criterion renders wc to have the maximum correlations with the outputs. On the other hand, the second criterion is to render each wc,j will have the strongest correlation with only one output while having the weakest correlation with the remaining outputs [45, 46]. Consequently, by meeting the above two criteria, each correction term in wc will have the strongest correlation with only one correction term, which will minimize the likelihood of the ill-posedness of Eq 5 [47].
In the perspective of the invertibility, selecting xc based on the above two criteria, particularly the second one, maximizes the likelihood of C(x) to be nonsingular. To understand this point more clearly, the outputs and xc are re-ordered so that the smallest element in each row is diagonally located in the relative order matrix, and C(x) is also rearranged correspondingly. By minimizing the second criterion, the possibility that values of all rij, ∀j ≠ i, are larger than ri is maximized; therefore, the non-diagonal entries in the rearranged C(x) are likely to be zero. As a result, C(x) will be close to be a square diagonal matrix. Thus, achieving the aforementioned two criteria would guarantee the hybrid model (Eqs 5b and 5c) to be invertible as well as the reliability of the solution to Eq 5.
In order to select a combination based on the above criteria, the following steps are taken:
Enumerate all ny permutations of x.
For each candidate, construct the corresponding relative order matrix, and compute its
Compute their ∑i ∑j≠i ri/rij values.
Find the candidate that satisfies the condition.
For implementing the above procedure, a relative order matrix has to be constructed for each candidate to calculate the two criteria. However, performing Lie differentiation can be computationally expensive; hence, a graphical approach is implemented to evaluate relative orders of a system [43], which will be discussed in the next section.
A state-space model of a process (Eqs 5b and 5c) can be represented by a directed graph, which is defined by a set of vertices and a set of edges by the following rules [43, 48]:
States (
If ∂fi(x)/∂xj ≠ 0, i, j = 1, …, nx, there is a unidirectional edge pointing from the vertex of xj to that of xi.
If ∂fi(x)/∂wc,k ≠ 0, k = 1, …, ns, there is a unidirectional edge pointing from the vertex of wc,k to that of xi.
If ∂yl/∂xj ≠ 0, l = 1, …, ny, there is a unidirectional edge pointing from the vertex of xj to that of yl.
A path from one vertex to another is a sequence of edges without repeating vertices, and the path length is the number of edges included in one particular path.
Previously, Daoutidis and Kravaris [43] demonstrated that rij can be calculated by computing the shortest path length from an input wc,j to an output yi as follows:




Schematic illustration of the directional graph of a state-space model.
In summary, the procedures for selecting xc, whose dynamics will be corrected by wc, are as follows:
Set ns, the size of xc, to be equal to the dimension of outputs (i.e., ny).
Enumerate all ns permutations of x as candidates for xc.
Construct a directional graph by adding correction terms to each xc candidate enumerated in the previous step.
Construct the corresponding relative order matrix based on Eq 10.
Find a configuration that has the lowest
It should be noted that the identification of xc by the relative order and graph theory can also guide the future model refinement. Specifically, since the identified xc has the highest correlations with the output, further literature survey and experimentation on these states can be implemented to improve the differential equations for these states and thus increase the overall prediction accuracy of the first-principle model.
Once W is estimated by solving Eq 5, the available (imperfect) first-principle model coupled with the estimated W is now able to predict the system dynamics under the experimental measurements more accurately. However, it cannot predict system dynamics under a new operating condition since its corresponding wc(t) is not available. Hence, the next step is to infer
However, the functional form of
An ANN consists of an input layer, multiple hidden layers, and an output layer. Specifically, each layer contains multiple neurons, and each neuron in each layer is fully connected to all the neurons in the next layer (Fig 2). In each hidden layer, the following hyperbolic tangent sigmoid transfer function is used:



Schematic illustration of an ANN model.
On the other hand, the ANN outputs are computed as follows:

The goal of the ANN training is to estimate hyperparameters of an ANN model, which include α, b, β, and c in Eqs 12 and 13 from available datasets. Here, the datasets include ANN input and output datasets, where the ANN inputs refer to t and x(t) from simulating the original first-principle model (Eq 1) while the ANN outputs refer to w estimated from solving Eq 5. All the ANN training sessions in this study are performed in the MATLAB Neural Network Toolbox with the Levenberg-Marquardt algorithm.
Since the structure of an ANN (i.e., the numbers of hidden layers and neurons in each hidden layer) is unknown beforehand, a number of ANN models with different structures are trained and compared to find the best one through evaluating their average corrected Akaike information criterion (AICc) [57, 58]. For a model with p number of parameters, AICc can be calculated as follows:

To find an optimal structure, datasets are randomly partitioned into the training, testing, and validation sets with a 70:15:15 ratio 100 times, and ANN models with different structures are trained 100 times to compute their average AICc [59, 60]. Then, the ANN structure with the minimum AICc is selected as the optimal one.
Once an ANN is developed, the final mathematical form of a hybrid model can be described as follows:

In this section, two case studies are presented to demonstrate how the proposed methodology can be implemented to develop a hybrid model of an intracellular signaling pathway.
In this case study, the proposed scheme is first used to construct a hybrid model to describe a tumor necrosis factor-α (TNFα) signaling pathway, which is illustrated in Fig 3a. Specifically, this system describes how TNFα, which is an important inflammatory cytokine, can initiate apoptotic and nuclear factor-κB (NFκB) signaling pathways as well as crosstalks between these two pathways. In this system, the apoptotic signaling pathway is described by dynamics of caspase 3 (C3a) and caspase 8 (C8a), where C3a is a protein, whose high activity leads to apoptosis, and TNFα-activated C8a increases the C3a activity. On the other hand, TNFα activates NFκB protein by suppressing inhibitor of NFκB (IκB), which inhibits the NFκB activity. Since the NFκB activation in turn increases the IκB activity, the NFκB activity will naturally decay over time. Furthermore, the NFκB activation suppresses the C3a and C8a activities and thus promotes cellular survival, while the increase in the apoptotic signaling pathway lowers the NFκB activity. More details on this system can be found in [61, 62].


Schematic diagrams of the simplified NFκB signaling pathway models.
(a) The correct model that is used for generating in silico experimental measurements (adopted from Chaves et al. [61]). (b) The incorrect model that is used as the first-principle model for constructing a hybrid model. (c) A graph representation of the available (incorrect) first-principle model (Eq 19).
The TNFα signaling pathway represented in Fig 3a can be described by the following mathematical model [61, 62]:



| Parameter | Value | Parameter | Value |
|---|---|---|---|
| a1 | 0.6 | b1 | 0.4 |
| a2 | 0.2 | b2 | 0.7 |
| a3 | 0.2 | b3 | 0.3 |
| a4 | 0.5 | b4 | 0.5 |
| b5 | 0.4 |
In this case study, the model (Eq 16) is considered as the true system, and it is used to generate in silico experimental measurements. Specifically, the NFκB activity (i.e., x3) is measured every hour from 0 to 14 hours under three different input conditions (i.e., u = 0.5, 1, 2). It should be noted that the value of u is fixed while generating the in silico measurements. Also, experimental noise is introduced as follows:

On the other hand, we assume that the system is only partially understood, and Fig 3b represents the current understanding of the system, which is described by the following ODE model:



System-model mismatches in the TNFα signaling pathway model.
Comparison between experimental measurements (empty circles) and model predictions (solid lines) with 0.5 (blue), 1 (red), and 2 (black) of TNFα. The measurements are generated by simulating Eq 16 while the model predictions are obtained from Eq 19.
The first step of the proposed methodology is to determine a set of states whose trajectories need to be corrected by the addition of wc. Since there is only one output (i.e., x3), the number of states to be corrected by correction terms is one as described earlier. Then, a graphical approach is implemented to determine which state should be corrected by wc. Fig 3c is a graphical representation of the first-principle model (Eq 19) to visualize the interconnections among the states. Since the number of the correction term to be added is determined to be one, the correction term can be added to one of x1, x2, and x4 as shown in S1 Fig. In Fig 3, it is clear that there is only one directed edge pointing to the output (x3), which stems from x4. Consequently, the only feasible configuration for the correction term in this system is the first one in S1 Fig, where the correction term is placed to adjust the dynamics of x4 and eventually the system output. It should be noted that the correction term is added to the differential equation of x4 only.
Next, the regularized least-squares problem is solved to estimate the values of wc at the time instants when the measurements are taken under three different input conditions. As the α value in Eq 5 for this system is unknown, its optimal value is found by the five-fold cross-validation. Table 2 shows the average normalized mean squared errors (MSE) between model predictions and the measurements for five different α values, and the optimal α value is determined to be one. Hence, the wc estimation results corresponding to α = 1 are used for the subsequent analysis and ANN development. Before constructing an ANN, the accuracy of the estimated values of wc is assessed by comparing the experimental measurements and the dynamics predicted by the available (incorrect) first-principle model (Eq 19) coupled with the estimated wc. Fig 5 shows that the discrepancy between the predicted dynamics and the experimental measurements is diminished and thus validates the wc estimation results.

| α | normalized MSE |
|---|---|
| 0.001 | 0.777 |
| 0.01 | 0.150 |
| 0.1 | 0.072 |
| 1 | 5.75 × 10−3 |
| 10 | 9.77 × 10−3 |


Validation of the accuracy of inferred wc.
Comparison between predicted (red solid line) and measured (blue empty circle) NFκB dynamics when the activity of TNFα equals to (a) 0.5, (b) 1, and (c) 2. Blue dash lines represent the model predictions without the correction terms (wc).
As the last step of the hybrid model construction, an ANN is developed to compute the wc value. Here, inputs to the ANN are selected to be the values of states and input as well as the current time (i.e., [x1 x2 x3 x4 u t]), and the ANN output is the value of wc.
The optimal number of hidden layers as well as the number of neurons in each hidden layer is to be optimized. As outlined earlier in the methodology, the AICc criterion is used to determine the ANN structure. To reduce the combinatorial complexity, the number of neurons in each hidden layer and the number of hidden layers are limited to ten and two, respectively. Then, each ANN is trained 100 times to compute the average AICc value with 100 different initial conditions for its hyperparameters, and the optimal ANN structure is determined by finding an ANN structure which results in the minimum average AICc value. Fig 6 plots the average AICc values for all possible ANN structures, and the AICc value reaches its minimum with eight and four neurons in the first and second hidden layers, respectively; hence, this particular structure is used for the subsequent analysis.


The average AICc values for different ANN structures for the first case study.
The filled circle represents the minimum average AICc value.
Among the 100 different ANNs with the optimal structure that have been trained in the previous step, the best ANN is chosen based on its R2 statistic value. Specifically, an ANN with the highest R2 value is chosen. As shown in Table 3, the R2 statistics of the best ANN are sufficiently high to ensure its accuracy. Then, this ANN is coupled with the available (incorrect) first-principle model (Eq 19) to finalize the hybrid model development. In order to validate the prediction accuracy of this hybrid model, it is simulated under three input conditions and compared with the experimental measurements. As shown in Fig 7, the developed hybrid model can describe the true system dynamics fairly accurately under all the conditions, and the MSE of the model prediction is reduced to 0.00027 from 0.12 which is the MSE value of the original first-principle model (Eq 19). This result shows that the hybrid model constructed by the proposed methodology can accurately describe the system dynamics even when there is limited knowledge on the underlying system (i.e., only a model with model-system mismatches is available from the literature).

| Training dataset | Validation dataset | Test dataset | Overall dataset |
|---|---|---|---|
| 0.997 | 0.994 | 0.987 | 0.994 |


Validation of the developed hybrid model for the first case study.
Comparison between experimental measurements (empty circles) and hybrid model predictions (solid lines) with 0.5 (blue), 1 (red), and 2 (black) of TNFα.
The hybrid model is analyzed further in this subsection to assess whether the developed hybrid model possess the desired features of a hybrid model: intrepretability and generalized prediction capability.
First, the intrepretability of the developed hybrid model is tested by simulating and examining the temporal profiles of unmeasured states. Specifically, the dynamics of x1, x2, and x4 predicted by the developed hybrid model are compared with those of the true system (Eq 16) to assess whether the hybrid model can be used in predicting unmeasured states. As shown in Fig 8, the predictions from the developed hybrid model agree well with the true system dynamics (Eq 16) and show significant improvement in the prediction accuracy compared with the available (incorrect) first principle model (Eq 19). Particularly, it is remarkable to note that the hybrid model can predict the dynamics of x1 and x2 quite well even though wc is added only to x4 for correcting its dynamics. Such improvement is possible since the hybrid model has incorporated the existing (known) interactions among the model states via the first-principle model (Eq 19).


Validation of the interpretability of the hybrid model.
The developed hybrid model is used to infer the dynamics of unmeasured states (i.e., (a) x1, (b) x2, and (c) x4) under two input conditions: u = 0.5 (red lines) and u = 2 (black lines). The hybrid model predictions are compared with the true system dynamics (Eq 16) and the available (incorrect) first-principle model (Eq 19).
Additionally, it is of great interest to know whether the developed hybrid model has a generalized prediction capability. To this end, the hybrid model is used to predict the dynamics under three different input conditions (i.e., u = 0.7, 1.3, 1.7), and the model predictions are compared with the true system dynamics obtained from Eq 16. Here, these three particular input conditions are chosen since they lie within the input range used for training the ANN, but they are not identical to the inputs. As shown in Fig 9, the first-principle model as expected fails to capture the NFκB dynamics accurately. However, the hybrid model is capable of predicting the dynamics of x4 fairly accurately. These results show that the hybrid model possesses the generalized prediction capability due to the incorporation of an ANN. In summary, this case study highlights advantages of using a hybrid model over a purely data-driven model or first-principle one since a hybrid model can essentially improve the overall model prediction accuracy while maintaining the model interpretability.


Generalized prediction capability of the developed hybrid model for the first case study.
The prediction accuracy of the hybrid model is assessed by comparing with true system dynamics when the activity of TNFα equals to (a) 0.7 and (b) 1.7. The dynamics predicted by the original first-principle model (Eq 19) are also plotted for the comparison.
Additionally, since it is imperative to understand how the presence of measurement noise might impact the performance of the proposed methodology, different levels of noise are introduced by varying the distributions for μ× and μ+ in Eq 18 to examine the impact of the measurement noise. Specifically, a noiseless dataset (i.e., μ× = 1 and μ+ = 0) and that with a higher noise level (
However, it should be noted that both in the MSE and intrepretability analysis, the oscillations in the predicted dynamics become more noticeable with a higher level of noise. Since the oscillations are not present in the true dynamics, this comparison shows that the noise level may negatively influence the accuracy of a hybrid model as well as its interpretability. Also, even the hybrid model developed from the noiseless measurements produces the dynamics with nontrivial oscillations, which indicates that the ANN might have been overfitted. In order to mitigate such a problem, the future work in this direction could incorporate the following ideas to improve the hybrid model performance. First, a de-noise technique can be implemented prior to the hybrid model development so that the noise in the measurements will become less influential. In the literature, various methods such as finite difference with polynomial spline [67], spectral transformation [68], sparse Bayesian regression [69], and neural networks [70] have been proposed. Second, alternative ANN training mechanisms can be implemented to improve the performance of an ANN. So far, an ANN is trained with the Levenberg-Marquardt algorithm through Matlab Neural Network Toolbox. Since the presence of the oscillations suggests the potential overfitting issues in the ANN training, Bayesian regularization training method [71] may be implemented to mitigate this issue.
In a systems biology study, the parameter estimation is usually performed to quantitatively calibrate an uncertain model based on available measurements. In the literature, numerous studies have proposed a wide variety of methods to efficiently estimate parameter values, and these methods have been implemented to successfully model various biological systems [2, 72, 73]. However, if there is significant model-system mismatch, the parameter estimation alone may not be enough for calibrating a model. In such a circumstance, the proposed hybrid modeling approach can be implemented. As an example, the parameter estimation is performed for this system to see whether it is enough to train the model (Eq 19). Here, the parameter estimation is preceded by global sensitivity analysis to determine a subset of parameters that are identifiable from available measurements (see [74, 75] for details). The result of the sensitivity analysis is shown in Table 4, where ϕD represents the sensitivity index of a parameter set. Based on the magnitude of the sensitivity index, it is determined that b1 and b5 are identifiable. Then, a least-squares problem is solved to estimate their values by minimizing the difference between the model predictions and the measurements (i.e., in silico measurements simulated from Eq 16). The estimated values of b1 and b5 are 1 and 0.30, respectively, and the accuracy of the calibrated model is assessed by comparing the model predictions with the measurements as show in S6 Fig. It is clear that the parameter estimation alone is not enough to overcome the underlying structural mismatch between the model and the true system for this signaling pathway. In summary, the parameter estimation alone may not be enough for quantitatively calibrating the model if there is a significant degree of model-system mismatch. And, the proposed hybrid modeling approach will be a valuable option to construct an accurate and physically meaningful model.

| Parameter subsets | ϕD |
|---|---|
| b1 | 17.25 |
| b5 | 15.18 |
| b1, b5 | 137.4 |
While the previous case study demonstrates how the proposed methodology can be applicable to a relatively simple system and the in silico measurements, the second case study will examine how the proposed scheme can be used to develop a hybrid model for a much more complex system with real in vitro measurements. Specifically, a hybrid model is developed to describe the lipopolysaccharide (LPS)-induced NFκB signaling pathway in the presence of brefeldin A (BFA).
As briefly described in the previous case study, the NFκB signaling pathway is involved in the apoptotic signaling pathway, but it is also involved in a number of different cellular processes such as inflammation and differentiation [76, 77]. Our previous study [75] aimed to model how the NFκB signaling pathway can be activated by LPS, an endotoxin derived from gram-negative bacteria [78], in the presence of BFA. One of the major products of the signaling pathway is TNFα protein, which is a pro-inflammatory cytokine and propagates inflammatory signals to adjacent cells [79, 80]. While the LPS-induced NFκB signaling pathway is relatively well studied and has been previously modeled in the literature [81, 82], the impact of BFA on the overall signaling pathway is less known. It is suggested that BFA activates the NF-κB signaling pathway by activating another signaling pathway called the endoplasmic reticulum (ER)-stress pathway, which will subsequently initiate the NF-κB [75, 83]. Since the ER-stress pathway itself and how it activates the NFκB signaling pathway have not been fully elucidated yet [84, 85], it is very difficult to formulate an accurate mechanistic model to describe the overall signaling dynamics. In our previous study, we introduced time-varying functions to represent the effects of the BFA addition on the NFκB signaling pathway. By the introduction of new mechanisms and subsequent parameter estimation, the model accuracy was improved significantly, but there was still noticeable discrepancy between the model prediction and the measurements. Also, developing the time-varying components was also a nontrivial task, which involved further literature review and experimentation. In this study, the proposed hybrid modeling approach is to develop a hybrid model to infer the effects of the BFA addition on the dynamics of the LPS-induced NFκB signaling pathway from the measurements.
The first-principle model that will serve as the basis of the hybrid model to be developed is adopted from our previous study [75]. This model describes how LPS can induce the NFκB activation and TNFα synthesis (Fig 10) by a system of nonlinear ODEs. This model consists of 49 states and 149 parameters, and a detailed description on the model can be found in [75].


A schematic illustration of the LPS-induced NFκB signaling pathway.
In our previous study [75], RAW 264.7 murine macrophages were stimulated by LPS in the presence of BFA, and the dynamics of the NFκB signaling pathway were measured by flow cytometry. Specifically, we measured fold changes of TNFα and IκBα, which are two important proteins in the overall NFκB signaling pathway, under three different LPS concentrations in the presence of BFA. As a result, the model outputs are defined as the fold change of the two proteins with respect to their initial conditions:

As outlined in the previous section, the first step in developing a hybrid model is to determine the number of correction terms needed as well as to which states these correction terms should be added. As the number of outputs for the system of interest is two (i.e., TNFα and IκBα), the dimension of xc is also two.
Second, all possible permutations of choosing two from 49 model states are enumerated, and their corresponding digraphs are constructed to compute their corresponding relative order matrices. Based on the constructed relative order matrices, the minimum value of

| xs1 | xs2 | r11 | r12 | r21 | r22 |
![]() | ∑i ∑j≠i ri/rij |
|---|---|---|---|---|---|---|---|
| 5 | 37 | 1 | 6 | 6 | 1 | 2 | 0.33 |
| 1 | 37 | 1 | 6 | 5 | 1 | 2 | 0.37 |
| 3 | 37 | 1 | 6 | 5 | 1 | 2 | 0.37 |
| 34 | 37 | 1 | 6 | 5 | 1 | 2 | 0.37 |
| 2 | 37 | 1 | 6 | 4 | 1 | 2 | 0.417 |
| 4 | 37 | 1 | 6 | 4 | 1 | 2 | 0.417 |
| 5 | 28 | 1 | 4 | 6 | 1 | 2 | 0.417 |
| 1 | 28 | 1 | 4 | 5 | 1 | 2 | 0.45 |
| 3 | 28 | 1 | 4 | 5 | 1 | 2 | 0.45 |
| 34 | 28 | 1 | 4 | 5 | 1 | 2 | 0.45 |
| 2 | 28 | 1 | 4 | 4 | 1 | 2 | 0.50 |
| 4 | 28 | 1 | 4 | 4 | 1 | 2 | 0.5 |
With xc = [x1 x37], the regularized least-squares problem (Eq 5) is solved to estimate W that contains the values of wc at eight time points for each LPS concentration. Since the value of the regularization parameter α in Eq 5 is unknown beforehand, its optimal value is determined by the five-fold cross-validation. Table 6 shows the values of normalized MSE with respect to different values of α. From this result, the optimal value of α is determined to be 0.001, and the estimated values of W obtained with α = 0.001 are considered to develop a hybrid model.

| α | normalized MSE |
|---|---|
| 1 × 10−5 | 0.0275 |
| 0.0001 | 0.0283 |
| 0.001 | 0.0181 |
| 0.01 | 0.0187 |
| 0.1 | 0.0235 |
| 1 | 0.0310 |
| 10 | 0.0390 |
Before constructing an ANN model, the accuracy of the inferred W is assessed by comparing the experimental measurements and predictions from the imperfect model coupled with the inferred W. It should be noted that wc is linearly interpolated to obtain its values at the time instants when the measurements are not taken. Fig 11 compares the predicted and measured TNFα and IκBα dynamics under three different LPS concentrations. Additionally, the predictions of the model coupled with the inferred W are compared with those of the imperfect model without W. Fig 11 shows that the addition of W significantly improves the model accuracy across all three LPS concentrations. Specifically, the addition of W renders the model prediction to match with trends observed in the experiments, which show the sustained low concentration of IκBα. At the same time, the addition of W helps the model predictions agree much better with the measured TNFα dynamics. Overall, these comparisons have demonstrated that the integration of W greatly improves the predictive capability of the available (imperfect) first-principle model.


The model prediction accuracy is improved by coupling the available (imperfect) first-principle model with the estimated wc.
Red solid lines and blue empty circles represent simulated and measured TNFα dynamics and IκBα, respectively, in the presence of BFA under three different LPS concentrations. Blue dash lines represent the model predictions without the correction terms (wc).
Additionally, in order to demonstrate the necessity of choosing an optimal xc, a different xc is chosen, and their corresponding values of W are estimated. Specifically, xc = {2, 17} is selected, and their selection criterion values are


Importance of selecting an optimal xc.
wc are added to xc = {2, 17} as explained in the test. Red solid lines and blue empty circles represent simulated and measured TNFα dynamics and IκBα, respectively, in the presence of BFA under three LPS concentrations. Blue dash lines represent the model predictions without the correction terms (wc).
With the inferred W, an ANN is developed to generalize the relationship between the first-principle model and wc values. Specifically, inputs and output of the ANN are [x(t), t] and wc(t), respectively.
Next, the ANN structure is optimized by minimizing the average AICc values. Similar to the previous case study, we will limit the numbers of hidden layers and neurons in each hidden layer to two and ten, respectively, and each ANN structure is trained 100 times. Fig 13 plots the average AICc value for all possible ANN structures, and the one with three and six neurons in the first and second hidden layers, respectively, is shown to be optimal since it provides the minimal average AICc value. Then, among 100 different trained ANNs with this particular structure, the best ANN is selected by its R2 statistics. Table 7 shows the R2 values of the chosen ANN, which are all above 0.98 and thus demonstrate its prediction accuracy.


The average AICc values for different ANN structures.
The filled circle represents the minimum average AICc value.

| Training dataset | Validation dataset | Test dataset | Overall dataset |
|---|---|---|---|
| 0.998 | 0.999 | 0.986 | 0.997 |
The developed ANN is then coupled with the available (imperfect) first principle-model to finalize the hybrid model. Fig 14 compare the predicted dynamics from the resulted hybrid model with the experimental measurements. The normalized MSE values of the hybrid model are 1.95 and 6.3 × 10−4 with respect to the TNFα and IκB measurements, respectively, while those of the original first-principle model are 12.5 and 7.0 × 10−3, respectively.


Validation of the developed hybrid model.
The IκBα dynamics predicted from the hybrid model with the developed ANN (red solid line) are compared with the measurements (blue empty circle) in the presence of BFA under the LPS concentrations of (a) 10ng/mL, (b) 50ng/mL, and (c) 250ng/mL. The TNFα dynamics predicted from the hybrid model with the developed ANN (red solid line) are compared with the measurements (blue empty circle) under the LPS concentrations of (d) 10ng/mL, (e) 50ng/mL, and (f) 250ng/mL. Blue dash lines represent the model predictions without the ANN.
This shows that the hybrid model with the developed ANN has generalized the prediction capability of the first-principle model coupled with the experimentally inferred wc; as a result, the developed hybrid model now can be utilized to predict the dynamics of the NFκB signaling pathway in new conditions.
Although the developed hybrid model greatly improves the prediction accuracy, the model-system mismatch still remains. Specifically, under LPS = 10 ng/mL, the predicted dynamics of both TNFα and IκBα do not perfectly agree with the measurements. Specifically, the hybrid model predicts a monotonic increase in the IκBα dynamics and thus attenuation of TNFα synthesis, which indicates that the NFκB activity gradually decays during this time period. On the other hand, the measurements show that the TNFα synthesis slows down beyond 240 minutes while the IκBα level decreases after 240 minutes, which is not consistent with the model predictions. There is an additional mismatch in the IκBα dynamics under LPS = 50 ng/mL. Specifically, the hybrid model predicts an oscillatory behavior with two troughs at 60 and 240 minutes and a peak at 120 minutes while the experimental measurements indicate a monotonic increase from 60 to 240 minutes without an intermediate peak. Overall, such remaining model-system mismatches demonstrate that the BFA addition has more pronounced effects on the overall signaling dynamics after 200 minutes. This has been documented in our previous work [75], where the dynamics of IκBα in the presence of BFA deviate from those of IκBα without BFA after 200 minutes. Increasing the dimension of wc may further improve the prediction accuracy due the increase in the degree of freedom. Alternatively, the first-principle model can be modified further by incorporating known mechanisms of the BFA-induced signaling pathways to improve the first-principle model before estimating the values of wc. Specifically, it was suggested that the addition of BFA will elicit another signaling pathway called ER stress signaling pathway [75], which will suppress the translation of IκBα. In the future, this mechanism can be incorporated into the first-principle model to improve the accuracy of the hybrid model.
In this work, we have presented a systematic way to construct a hybrid model that can accurately describe the dynamics of an intracellular signaling pathway even when we have partial understanding of the system. In order to simulate the dynamics of a signaling pathway of interest, prior understandings of the system are formulated into a system of nonlinear ODEs as its first-principle model. Since the first-principle model incorporates underlying mechanisms of the system, the model can be used to predict the system dynamics under new conditions and infer unmeasured model states’ dynamics once the model is properly calibrated by experiments [2–4, 86]. However, the development of such a first-principle model is nontrivial, and one of the largest bottlenecks in the model development process is lack of fundamental knowledge that may lead to the inaccurate formulation of a first-principle model.
Under such circumstances, data-driven modeling approaches such as proper orthogonal decomposition [15], subspace identification [17] and partial least squares regression [25] are commonly implemented to derive a dynamic model of a system whose corresponding first-principle model is too difficult to be formulated or is computationally too costly. Such models are advantageous to accurately derive empirical relationships between inputs and outputs. However, these models are difficult to be generalized and usually require a large amount of observations. To this end, this study proposes to use a hybrid model to improve prediction accuracy by combining the characteristics of both first-principle and data-driven modeling approaches. The available, albeit incomplete, knowledge of the system is used in a hybrid model, and the model-system mismatches are incorporated by inferring unknown components’ dynamics (i.e., W) from experimental measurements [87]. In order to construct a hybrid model systematically, the presented work proposes a sequential two-step approach. First, xc and its dynamics are identified and estimated through the graphical approaches and solving an L2-regularized least-squares problem. Second, an ANN model is developed to correlate the available (imperfect) first-principle model with the values of wc.
Through the proposed hybrid modeling approach, the developed hybrid model is able to posses prediction generalizability as a first-principle model does through the incorporation of the first-principle model coupled with an ANN. That is, a hybrid model can accurately predict the unmeasured states’ dynamics, and it can also be used to predict the system dynamics under a new operating condition as shown in two cases studies. At the same time, the hybrid model is likely to have more accurate predictions by inferring and incorporating the dynamics of components (i.e., wc), which are missing in the first-principle model [29]. Such representations of the hidden components with an empirical function such as an ANN is particularly attractive when mechanistic understandings are completely lacking or only partially known, which may result in highly uncertain mechanistic equations with unreliable model predictions.
Portions of this research were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87