The field of optimal control typically requires the assumption of perfect knowledge of the system one desires to control, which is an unrealistic assumption for biological systems, or networks, typically affected by high levels of uncertainty. Here, we investigate the minimum energy control of network ensembles, which may take one of a number of possible realizations. We ensure the controller derived can perform the desired control with a tunable amount of accuracy and we study how the control energy and the overall control cost scale with the number of possible realizations. Our focus is in characterizing the solution of the optimal control problem in the limit in which the systems are drawn from a continuous distribution, and in particular, how to properly pose the weighting terms in the objective function. We verify the theory in three examples of interest: a unidirectional chain network with uncertain edge weights and self-loop weights, a network where each edge weight is drawn from a given distribution, and the Jacobian of the dynamics corresponding to the cell signaling network of autophagy in the presence of uncertain parameters.
Application of the control usually requires complete knowledge of the system, which is rare for biological networks characterized by uncertainty. Klickstein et al. propose an optimal control for uncertain systems represented by network ensembles where only weight distributions for edges are known.
Our ability to numerically solve and implement optimal controls1–3 has improved greatly this decade, but one typically must assume that nearly perfect knowledge of the system is available4. While this is usually not an issue for mechanical or designed systems5, the optimal control of biological systems, or networks, cannot yet provide certain mathematical models6. There are several reasons why the underlying network structure and parameters may be affected by uncertainty: (i) our knowledge of the network connections may be imperfect, e.g., due to noisy measurements, (ii) networks change with time so a change may occur between the time the network is measured and the time when a control action is introduced and (iii) measurements performed by different research groups or by the same group under different environmental conditions may differ from each other. As an example of (iii), one can find several versions of the neural network of the worm C. Elegans in the literature7,8 or the metabolic network of E. Coli9,10, or variations between brain scans over time of the same individual11. The E. Coli protein–protein interaction network12 is known to be affected by higher uncertainty than its metabolic network. Another example of (iii) is taking into account the effect of the environment on the transition rate between metabolites13.
While considerable research efforts have been addressed at designing control laws for biological networks and other networked systems14–21, a main limitation of these approaches is that an accurate mathematical model of these systems is typically unavailable. Recent work on applying optimal control to autophagy in cells22 and regulating glucose levels in type 1 diabetes23 required applying the resulting control to many possible realizations of the set of parameters to demonstrate their robustness. While the optimal control can be derived for any particular system realization, the resulting control is only optimal for that system. Instead, here, we derive the optimal control that is applicable to any one of a large number of possible realizations simultaneously. This has remained a fundamental open question; how can an optimal control be applied to systems and networks affected by uncertainty.
In general, uncertainty can appear in the form of both measurement and process noise affecting the system dynamics. The prototypical example of uncertainty entering a system is in the form of additive Gaussian noise, which in the case of a linear system and quadratic objective function, leads to the solution of the classical optimal control problem known as the linear-quadratic-Gaussian regulator24. In the field of stochastic optimal control25, a control is derived for a system described by stochastic differential equations.
A classic approach for handling uncertainty is encompassed by the field of robust control, which is concerned with both model inaccuracies, such as imperfect knowledge of the system parameters, and the effects of exogenous disturbances, which naturally arise in any experimental setting. The problem of integrating optimal control and robustness has been dealt with in a number of classic works26–32. Feedback is known to significantly increase robustness of the control action33. The use of Lyapunov functions to design feedback controls for nonlinear systems with guaranteed optimality has also been addressed34. However, performing closed loop control of biological systems is notoriously difficult due to the lack of availability of real time measurements. Here we deal with a different problem for which open loop control is applied to an uncertain system and we are willing to accept an increase in cost to accommodate for such uncertainty. Instead of using common approaches such as system identification or learning, we study how the solution of the optimal control problem changes as uncertainty (i.e., the number of possible system realizations) grows and compute scaling relations for how the solution of the optimal control problem varies in response to increasing uncertainty. Our results are relevant to systems and networks, for which identification may not be viable, such as biological time-evolving systems.
The minimum energy control of complex networks has recently been used to analyze the controllability of complex networks14,35,36 and our ability to allocate resources spatially to perform desired control tasks37–40. The work on controlling complex networks has currently centered around linear systems, which typically only provide rough approximations of biological systems as they normally exhibit multiple attractors. Nonetheless, examining linear systems has provided useful results18 that can be used in experiments. Consider the general network ensemble described in Fig. 1, where the weight associated with each network edge is drawn from a given distribution. For example, for gene regulatory networks the weight distributions are typically estimated from a series of expensive measurements, performed in a noisy environment41–43. The main question we address in this paper is whether it is possible to design an optimal control strategy for a network ensemble, like the one presented in Fig. 1. By network ensemble, we mean a family of weighted, possibly directed, networks that satisfy a set of constraints44–46, also sometimes called the canonical weighted network ensemble47. One possible solution to our proposed problem is to incorporate robustness in the optimal control strategy so that the strategy is effective regardless of the particular network realization drawn from the ensemble. Imagine for example to sample a number of network realizations A(0), A(2), . . . , A(N−1) from the ensemble, such as those networks whose edge weights correspond to the distributions shown in Fig. 1. This problem is addressed by the optimal control problem discussed in the remainder of this paper, with particular focus on the case when N → ∞, thus ensuring one can control a possibly infinite ensemble of systems.


A network ensemble described by edge weights each drawn from a distribution.
A network with n = 6 nodes and directed edges. The edge weight associated with each edge is not known precisely but is instead drawn from some distribution indicated by the plots along each edge.
To provide an example of a situation to which our analysis applies, imagine seeking to control a biological network for which certain connections are well known and characterized by experimental measures but other connections are not. For example, it may be uncertain whether given pairs of nodes are connected or not, and as a result, the network may exist in many different configurations based on which connections are present. Our approach described in the rest of this paper is based on incorporating such uncertainty in the control action and in designing a control solution that works for all the possible network configurations. Similarly we can deal with other sources of uncertainty, e.g., affecting the weights associated with the network connections.
We consider systems which can be described by the triplet

A small example of this type of composite system is shown in Fig. 2. Consider a five state linear dynamical system whose state matrix can be described by the adjacency matrix of a network shown on the top of Fig. 2 where the single control input is assigned to node 4 (in blue) so B = e4 and there is a single output, node 5 (in magenta), so


An outline of the method in terms of composite matrices
A system that can be described as a network is shown at the top where the presence of two edges, (2, 3) and (3, 4) is uncertain. Driver nodes are in blue and target nodes are in magenta. Then make N copies containing each possible network which contains a combination of those two edges. The composite adjacency matrix,
The control energy (or effort) of the control input is defined as,








To determine the behavior of the cost, the control energy, and the deviation, as expressed in Eq. (9) as a function of N, we make the following two assumptions:

In the following section, we present our main result, that under the proper choice of α = α(N), as N → ∞, the control energy EN(α) is upper bounded by a constant value (as are the average deviation DN(α)/N and the total cost JN(α)), as long as Assumption 1 and Assumption 2 hold. The main result of our work investigates the behavior of the control energy and the average deviation in Eq. (9) after applying Assumption 1 and 2, which we call
In Supplementary Note 1.4, an upper and lower bound for


While the average deviation can be bounded below an arbitrarily small value for large enough b, it is important to note that the upper bound of the control energy grows quadratically in b, re-expressing the weight that appeared in the optimal control problem in Eq. (4) as a single parameter trade-off in b. This trade-off between accuracy and control energy is an important consideration for the scientist to consider when designing the controller. For example, if one is interested in reducing
Through the following examples, the control energy, average deviation, and total cost are shown to be bounded as stated in Eq. (11) as well as that assumptions 1 and 2 hold. As a first example, we consider the simplest possible network, a unidirectional path graph
An example of the uncertain unidirectional chain graph is shown in Fig. 3(A) where the single input, labeled u and colored blue, is connected to the N copies of the driver node v0. Each copy of node vj is connected to the corresponding copy of the node vj−1, j > 0. The simplicity of this network and choice of only two unknown weights removes many of the other complicating factors, reducing the problem to only 3 variables; the distribution from which the loop weights are drawn,


An example of the derivations applied to the unidirectional chain graph.
A A diagram of a unidirectional path graph of length n = 4 and N possible realizations with loops −pk and edge weights sk, k = 0, …, N−1. B The largest eigenvalues of the COCG when we choose N realizations, where we see the linear growth with N. C The associated inner products
As both Assumptions 1 and 2 hold, we can be sure that the deviation, the control energy, and the total cost remain bounded in the N → ∞ limit. The values used in Assumptions 1 and 2 are found to be approximately c1 = 5.70 × 10−3, c2 = 0.911, r1 = 10−2.04, r2 = 10−3.14, and
Recently, it was shown that the graph distance between driver nodes and target nodes is an extremely important property when determining the control energy for single system realizations52,53. A study on the effect of uncertainty on results previously derived which state that control energy grows exponentially with distance between a single driver node and a single target node is presented in Supplementary Note 2. The next model we consider is a linear system which can be described by a network where the edge weights are drawn from distributions assigned to each edge. An example of this kind of network is shown in Fig. 4A where the distributions each edge weight is drawn from are shown qualitatively along the edges with further details collected in Table 1. We choose delta distributions for three edges which represents the case where an edge weight is known exactly, uniform distributions for three edges, triangular distributions for two edges, defined as


A network with uncertain edge weights.
A The diagram of the network with six nodes and 10 edges. Each edge weight is drawn from the distribution shown in the associated plot, with numerical values for the distributions shown in Table 1. Additionally, there is a negative self-loop assigned to each node drawn from the uniform distribution

| Edge | Distribution | Edge | Distribution |
|---|---|---|---|
| (1,4) | δ(0.5) | (1,3) | |
| (1,5) | (2,3) | ||
| (3,4) | δ(1) | (3,5) | |
| (4,1) | (4,2) | ||
| (5,6) | δ(0.75) | (6,2) |
For this network, we choose nodes 1 and 2 to be driver nodes and nodes 5 and 6 to be target nodes so that
Again, alternative choices of distributions for each edge weight and loop weight, sets of target nodes, and sets of drivers nodes, lead to qualitatively similar plots as shown in Fig. 4 except that the particular rates of increase, or constant values, will change. A common control goal is driving a nonlinear system near one of its fixed points using its linearization. Even for the case the system is not near a fixed point, the linearization can be used in a piecewise manner as discussed in Kilckstein et al.20. Generically, a controlled nonlinear system is written as,


As an example of this type of system, we apply our methodology to a recently published model of autophagy in cells22. The model contains five internal states which represent the properties of the cell itself, labeled x1 through x5, and six auxiliary states that represent the current concentration of drugs which may be introduced to the cell, labeled w1 through w6. This model consists of dozens of parameters but here we consider two in particular, CEN and CNU, which are coefficients that represent the amount of energy and nutrients available in a cell. As these parameters are cell dependent, their particular values may vary across multiple cells. This model was shown to have a stable fixed point for a range of values of CEN and CNU. We assume that all that is known about CEN and CNU is that they both lie between 0.1 and 0.6. The model is linearized about the stable fixed point and the resulting network is shown in Fig. 5A. In this system, we are interested in adjusting the amount of drug of type 1 (making w1 the sole driver node) to regulate the level of autophagy (making x5 the sole target node) which are color coded accordingly.


The Jacobian of a system with uncertain parameters.
A The Jacobian of the simplified model of autophagy represented as a network. Red edges have weights in which CNU appears explicitly and green edges have weights in which CEN appears explicitly, while black edges have weights that may or may not implicitly depend on CEN and CNU. For 500 choices of CNU and CEN, the stable fixed point is computed and collected in the bar plots in B (for
The fixed point of the system, about which the linearization is performed, is computed for 500 random choices of CEN and CNU selected uniformly from
Qualitatively similar results can be seen for alternative choices of therapy, that is, rather than choosing only drug 1, one could instead choose any combination of the six drugs. Also, if more information is known about the probability of CNU and CEN, then alternative distributions can be chosen from which these parameters are drawn. Next we examine the relationship between the number of target nodes and the cost. We have seen that controlling network ensembles requires more control energy than controlling a single network realization. Here we investigate the relationship between the number of target nodes and the energy required for controlling the ensemble. We see that in average the control energy decreases exponentially, as the number of target nodes is reduced, which indicates feasibility of our approach, as long as the number of target nodes remains small. To demonstrate this relationship, for each realization of N uncertain systems, b is chosen such that DN(b)/(Np) is a constant value regardless of the set of target nodes. To find b, bisection is used as DN(b) monotonically decreases with b. The values of b are averaged over target sets of the same cardinality in Fig. 6A and are seen to grow exponentially as the set of target nodes only grows linearly. The desired deviation is seen to be achieved in Fig. 6B where the error bars are smaller than the size of the marks as the bisection tolerance was set to 10−16. The resulting control energies are collected and their geometric mean is taken over sets of target nodes of the same cardinality in Fig. 6C. We see that as the cardinality of the target node set,


Target control of network ensembles.
The costs (weighting term b, deviation DN(b), control energy EN(b), and total cost JN(b)) averaged over sets of target nodes of the same cardinality for the small network shown in Fig. 4A. The weighting term b is chosen such that DN(b)/(Np) = 0.1 and the result is shown in panel A. The deviation is shown in panel B where the desired value is seen to be achieved. In C, the control energy is shown where it is clear as the number of target nodes decrease, the control energy decreases exponentially. The total cost in D is seen to grow approximately linearly. Error bars represent standard deviation.
The lack of precise information about the mathematics behind many biological systems motivated us to study optimal control of uncertain systems represented by network ensembles, where each edge weight is drawn from a given distribution rather than being exactly known. A practical application of our analysis is an experimental situation in which some of the system parameters are known to lie in a bounded range, but their exact value is unknown. In the presence of such uncertainty, we are able to analytically solve an associated optimal control problem. We have characterized the solution to this problem in the limit of infinitely many system realizations corresponding to the case the realizations are drawn from a continuous distribution. We have then showed how to properly formulate the objective function to ensure feasibility of the problem as the number of realizations grows.
We first demonstrated the feasibility of controlling uncertain linear systems, for the case that the state matrix A may be one of N possible choices drawn from some possibly continuous distribution such that the deviation, or variance, of the final state around some desired final state is maintained below a desirable threshold. We then extended this analysis to nonlinear systems with uncertain parameters. An example of such a system studied in the paper is the cell regulatory network of autophagy. We assumed that the amounts of energy and nutrient available to the cells were uncertain, which yielded different fixed points for the dynamics, about which the equations were linearized. We then characterized the optimal cost of controlling more and more realizations of this network (each one corresponding to different levels of available nutrient and energy). As long as the two assumptions about the COCG hold, which we have found to be the case for all systems analyzed, from simple networks to linearizations of complicated nonlinear dynamical networks, we have analytically shown that the average deviation and the control energy remain finite in the N → ∞ limit. This implies the feasibility of deriving a control input, not for a particular system, but rather for a system described only in terms of distributions, possibly determined experimentally.
Our main result is that as long as the weighting parameter α(N) is chosen properly, the cost of the optimal control solution remains finite. The price to pay for controlling uncertain systems is a higher cost of the optimal control solution. However, this cost can be consistently (exponentially) reduced by limiting the number of target nodes, i.e., the nodes chosen as targets of the control action.
To check assumptions 1 and 2, we required an ability to compute eigenvalues with additional accuracy not possible using double precision as they will typically be extremely small. To do this, we implement a few numerical methods with the multiple precision data type provided in the MPFR library50 which is built on top of Gnu GMP49. Additionally, for multiple precision complex variables, we use the extension to MPFR called MPC51. The code which we use to perform the simulations contained in the text is available at the following Github repository.
To find each block of the COCG as defined in Eq. (5), we solve the Sylvester equation,



Once the complete COCG is available, we are interested in computing the total eigendecomposition. As the COCG is real and symmetric, we use a symmetric tridiagonal decomposition using Householder matrices. Once the symmetric tridiagonal matrix is available, we can use QR steps again to determine the eigenvalues, as well as we can recover the eigenvectors from the Householder matrices as described in Chapter 8 in Golub and Van Loan54.
To compute the costs more efficiently than using the eigendecomposition, we use the quadratic form in Eq. (8). This requires solving the linear system in Eq. (7) which is a symmetric positive definite system of equations. The Cholesky decomposition of
The online version contains supplementary material available at 10.1038/s41467-021-22172-6.
This work has been supported by the National Science Foundation through grants No. 1727948 and No. CRISP- 1541148. The authors thank Franco Garofalo, Francesco Lo Iudice, and Anna Di Meglio for insightful discussions during the development of this problem.
F.S. proposed the problem; I.K. developed the theoretical results and performed the numerical studies; I.K. and F.S. wrote the paper.
Data for each of the figures is available upon request.
Code used to generate data is available from the online repository https://github.com/iklick/controlling_network_ensembles
The authors declare no competing interests.
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
52.
53.
54.