Competing Interests: The authors have declared that no competing interests exist.
In this work, we explore the possibility of using a heterogeneous Susceptible- Infected-Susceptible SIS spreading process on an airline network to model airport congestion contagion with the objective to reproduce airport vulnerability. We derive the vulnerability of each airport from the US Airport Network data as the congestion probability of each airport. In order to capture diverse flight features between airports, e.g. frequency and duration, we construct three types of airline networks. The infection rate of each link in the SIS spreading process is proportional to its corresponding weight in the underlying airline network constructed. The recovery rate of each node is also heterogeneous, dependent on its node strength in the underlying airline network, which is the total weight of the links incident to the node. Such heterogeneous recovery rate is motivated by the fact that large airports may recover fast from congestion due to their well-equipped infrastructures. The nodal infection probability in the meta-stable state is used as a prediction of the vulnerability of the corresponding airport. We illustrate that our model could reproduce the distribution of nodal vulnerability and rank the airports in vulnerability evidently better than the SIS model whose recovery rate is homogeneous. The vulnerability is the largest at airports whose strength in the airline network is neither too large nor too small. This phenomenon can be captured by our heterogeneous model, but not the homogeneous model where a node with a larger strength has a higher infection probability. This explains partially the out-performance of the heterogeneous model. This proposed congestion contagion model may shed lights on the development of strategies to identify vulnerable airports and to mitigate global congestion by e.g. congestion reduction at selected airports.
Networks, ranging from social, transportation to physical contact networks, support the diffusion of information, transportation of goods and spreading of epidemics. Therefore, networks and processes that unfold on them have been investigated in a wide range of fields such as mathematics, engineering and social sciences [1–5]. The Susceptible-Infected-Susceptible (SIS) epidemic spreading process is one of the most studied dynamic processes on networks [6–14]. The classic homogeneous SIS spreading process has been defined as follows. At any time t, a node is either susceptible S or infected I. A susceptible node can be infected by each of its infected neighbors with an infection rate β. Each infected node recovers to be susceptible again with a recovery rate δ. Both the infection and recovery processes are independent Poisson processes. For a given network upon which the SIS process is deployed, a critical epidemic threshold τc exists. When the effective spreading rate τ = (β/δ) > τc, a non-zero fraction of infected nodes persists in the meta-stable state. When τ < τc, the epidemic dies out. The vulnerability of a network to an epidemic is estimated by the prevalence, defined as the average fraction of infected nodes in the meta-stable state. The infection probability vi∞ of a node i indicates its vulnerability to the epidemic. Recent studies have focused on the influence of the underlying network topology and heterogeneous infection/recovery rates on the epidemic threshold, the prevalence [15, 16] and nodal infection probabilities [17]. Epidemic spreading processes have been developed to model e.g. the propagation of epidemic, information, failures and computer worms.
A fundamental question is to what extent an abstract process like the epidemic spreading process could model a generic complex system, i.e. reproduce the key properties of the system. This question is motivated at least from the following perspective. The operating mechanisms of many complex systems like social systems and the brain are far from well understood. A model that could well reproduce the key properties of a complex system may unravel the possible operating mechanism. The operating mechanisms of many complex systems are possibly known, however, too complex to derive optimization/control solutions. In this case, an abstract model that well captures the key features of the system may possibly facilitate the development of optimization solutions.
For airline transportation networks, initial effort has been devoted to the analysis of their topologies, demonstrating properties such as the small-world and scale-free degree distribution [18, 19]. Topological properties of subsets of a network based on geography and airlines/alliances have also been explored [20, 21]. Recent investigations have focused on network resilience and vulnerability regarding random failures [22, 23]. The performance or state of an airport (e.g. congested or not and the average delay per hour) is not independent of the states of other airports. The delay propagation between airports has been studied via e.g. the correlation or causality measures between the time series (average delay per hour) of airports [24–28]. One of the main reasons why delay propagates is that each aircraft has a flight sequence where it travels between possibly multiple airports a day. The congestion at an airport can be introduced by local factors such as the slow boarding of passengers, the mechanical issues of an aircraft at the airport. Beyond, delayed flights that depart from a congested airport could cause an overcharge at the arrival airports. The Air-Traffic Flow Management systems use strategies such as ground holding (intentionally delaying an aircraft’s takeoff) and re-routing to reduce overload [29]. The weather condition could lead to the congestion of several nearby airports, which may further cascade to more airports due the rescheduling or re-routing of aircraft. These perspectives imply the possible contagion of congestion between airports. Airline congestion has been studied via network dynamics like queuing models [30]. Epidemic spreading process has been recently used to model the spreading of traffic jams in urban networks, assuming both homogeneous infection and recovery rate and homogeneous mixing approximation in network topology [31]. The possibility of modeling congestion contagion on an airline network using epidemic spreading process has been barely explored, not to mention how to develop a full-fledged heterogeneous spreading model.
In this work, we explore the possibility and limits of modelling airport congestion contagion by a heterogeneous SIS spreading process on an airline network in reproducing or predicting airport vulnerability. We consider the US Airport Network data [32]. The airport vulnerability is defined as the ratio of the duration of traffic congestion over the total operation time and derived from data. We construct three types of airport networks to capture diverse features such as the frequency and duration of flights. In the heterogeneous SIS model that we proposed, the infection rate of a link is proportional to the weight of the link, as defined in each of the three airline networks. Moreover, the recovery rate of a node is also heterogeneous, dependent on the strength of the node in the underlying network. We use the nodal infection probability in the meta-stable state as an estimation of the corresponding airport’s vulnerability, which will be further compared with the airport vulnerability derived from the US Airport dataset to evaluate our model. Specifically, our model is evaluated according to its capability to reproduce the distribution of the vulnerability of a node and the ranking of nodes in vulnerability. The modeling of airport congestion contagion by the SIS process, where the infection rate of a link is proportional to the weight of the link and the recovery rate is homogeneous, has been explored in [33]. That SIS process is a special case of our heterogeneous model and is called the homogeneous SIS model in this paper to emphasize its homogeneous recovery rate. We illustrate that the heterogeneous SIS model evidently outperforms the homogeneous model according both aforementioned evaluation perspectives. Our further exploration of the infection probability in relation to the node strength of an airport explains the better performance of the heterogeneous model in reproducing the ranking of nodes in vulnerabilities.
We propose and illustrate the basic method to model a complex system by an epidemic spreading process, via the airline system. The relatively good performance of the model does not imply that the derived model is the precise mechanism of congestion contagion. Further verification of the contagion mechanism is needed, e.g. regarding whether nodes with a large strength recover faster. The derived model may inspire the development of strategies to identify vulnerable airports and to mitigate global congestion by e.g. reducing congestion at selected airports.
The content of this paper is arranged as follows. Firstly we define, derive and characterize the airport vulnerability derived from data. Furthermore we introduce the heterogeneous SIS spreading model and network construction. Afterwards, the methods to evaluate the model are presented. In results, we compare the performance of our model with the homogeneous model. The final section summarizes our key findings and discusses possible future work.
Firstly, we describe the US Airport Network data. Airport vulnerability and its distribution are further defined and derived respectively. Airport vulnerability obtained from data will be adopted as a benchmark to evaluate the performance of our model.
We obtain the U.S. airport dataset from the Bureau of Transportation Statistics (BTS). This data set includes detailed information about the U.S. flight schedules since 1987 [32]. The computer reservation system (CRS) further distinguishes flight schedules as the planned schedule under optimal operation conditions, and the actual schedule. In order to demonstrate our modelling approach, we use the data spanning the high season period from July 1st 2018 to July 14th 2018, since flight schedule and rotations periodically repeat. In total N = 349 airports and E = 645299 flights have been considered. This data set contains as well extra information for each flight e.g. Tail-number, Origin and Destination, Date, the actual and scheduled Departure/Arrival Times.
The vulnerability of an airport is defined as its duration of traffic congestion over its total operation time, which is its probability of being congested. Per hour, an airport’s declared capacity corresponds approximately to the number of movements (the total number of departure and arrival flights) planned for that hour, such that a reasonable level of service (LOS) can be ensured. Delay is the principal indicator of LOS. Usually the declared capacity of an airport is up to 85–95% of its maximum throughput capacity, which is the maximal number of movements per hour that the airport’s runway system allows according to air traffic management rules and assuming continuous aircraft demands. An airport is considered congested if its actual number of movements per hour during operation is greater than its declared capacity (the planned number of movements) divided by a parameter α, where 0.85 ≤ α ≤ 1. We consider α = 0.9 as an example to illustrate our methods. The state of each airport i at each hour t is derived from U.S. airport dataset as follows: the airport is congested (Xi(t) = 1) if the actual number of movements is larger than the number of movement planned at time t divided by 0.9. If this condition is not satisfied, the airport is not congested (Xi(t) = 0). Airport i’s vulnerability is the fraction of time that airport i is congested. We considered all hours in the previously specified two week’s interval (excluding hours between 0 and 6 of each day due to their low number of movements). The hours considered are indexed as [1, 2, …, m], where m = 18 ⋅ 14 = 252.
In this work, we confine ourselves to this limited definition of airport vulnerability to start and to illustrate our method. The definition could be further generalized to capture the level of congestion per hour. The declared capacity can also also be better estimated based on airport characteristics (e.g. active runways, taxiways, etc.) and weather conditions, beyond flight schedule.
Fig 1 shows the distribution of airport vulnerability, whose average is 0.15 and variance is 0.01. The vulnerabilities of all the airports are within the range [0, 0.4].


Probability density function fϕ(x) of the vulnerability ϕ of an airport.
The average vulnerability is E[ϕ] = 0.15 and the variance is Var[ϕ] = 0.01. In total 45 bins are split within the interval [0, 1] with the same bin size. The probability density fϕ(x) at a given bin x eqals the percentage of the airports whose vulnerability falls within the bin normalized by the bin size 1/45.
We model the contagion of airport congestion as a heterogeneous SIS spreading process on an airline network. Firstly we introduce how to construct the three types of airline networks. Secondly, we propose the heterogeneous SIS spreading model. The last subsection illustrates the individual-based mean-field approximation to compute nodal infection probabilities in the meta-stable state, given the underlying network and the model parameters.
We derive three types of undirected networks from the U.S. Airport Network data over the two weeks’ period in order to capture various flight properties. This is motivated by the fact that the SIS spreading process unfolds differently on different underlying networks. Network G1 is unweighted: two nodes (airports) are connected if at least one direct flight exists in between. Each existing link has a weight wij = 1. Network G2 and G3 are both weighted and have the same network topology as network G1. It is assumed that the infection rate along a link is proportional to the link’s weight. In G2, the link which connects node i and j has weight
Finally, the weights in Networks G2 and G3 are respectively normalized as

Heterogeneous infection rate and recovery rate (link weight) have been shown to influence the nodal infection probabilities [11, 17]. Since the infection rate of a link and the recovery rate will later be defined as a function of the link weight and node strength of a node respectively, we examine the distribution of the link weight and node strength (the total weight of the links incident to a node) in Fig 2. Network G2 and G3 manifest different link weight and node strength distributions, which motivate again the consideration of the three types of networks that capture different features of the airline system.


The probability density functions
Horizontal and vertical axes are presented in logarithmic scale. The horizontal axis is split into 20 bins, each with the same bin size in the linear scale. The probability density
We explore relation between the strength of a node and other centrality metrics that describe varies topological properties of a node via the linear correlation coefficient. The following centrality metrics have been considered:
Clustering Coefficient. In an unweighted network, the clustering coefficient is the probability that two random neighbors of a node are connected. In a weighted network, a generalized definition for clustering coefficient has been introduced by [34]. The intensity of a triangle among node i, j and k is defined as
Betweenness Centality. The betweenness centrality of a node is the fraction of the shortest paths between all possible node pairs that pass through the node. To compute the shortest path between a node pair, we define the distance of each link in the underlying network as the reciprocal of its link weight [35].
Closeness Centrality. The closeness centrality is the average hopcount of a node to any other node. The hopcount between two nodes is the number of links of the shortest path, which is computed as described in betweenness.
Principal Eigenvector Component The principal eigenvector component of a node is its corresponding component in the principal eigenvector of the weighted adjacency matrix. The principal eigenvector is the one corresponding to the largest eigenvalue.
The linear correlation coefficient between node strength and each of centrality metric in the three networks constructed are shown in Table 1.

| Network | Clustering | Betweenness | Closeness | Eigenvector |
|---|---|---|---|---|
| G1 | -0.09 | 0.80 | 0.81 | 0.95 |
| G2 | 0.00 | 0.81 | 0.54 | 0.98 |
| G3 | -0.17 | 0.81 | 0.44 | 0.92 |
Node strength is strongly correlated with all the centrality metrics that describe a given importance of node in the whole network except for the clustering coefficient, a nodal property derived from local network connections. Hence, node strength that will be used to define the nodal recovery rate in the epidemic spreading model, captures as well nodal properties like betweenness, closeness and principal eigenvector component.
Furthermore, we study the relation between the vulnerability ϕ of an airport and a given centrality metric of the corresponding node in each of the three underlying networks. This helps us to evaluate the possibility of using a nodal centrality measure to estimate nodal vulnerability. In the scatter plot in Fig 3, we do not observe any monotonic trend between the vulnerability ϕ of an airport and the centrality metric of the corresponding node. This implies that centrality metrics can not be used as a good estimation of airport vulnerability. Our previous work [33] illustrated as well the worse performance of vulnerability prediction via centrality metrics than that via the homogeneous SIS model. Hence, we will compare performance of the heterogeneous SIS model with that of the homogeneous SIS model but not of the centrality metrics.


Airport vulnerability versus a nodal centrality metric.
The scatter plot of airport vulnerability ϕ versus node strength (a,b,c), clustering coefficient (d,e,f), betweenness (g,h,i), closeness (j,k,l) and eigenvector (m,n,o) centrality in network G1 (first column, blue color), G2 (second column, red color) and G3 respectively.
The networks we constructed have not taken the geographical locations of the airports explicitly into account. One may wonder whether the vulnerability of an airport may strongly correlate with its location, thus can be possibly estimated by its location. Fig 4 shows that vulnerable airports are scattered in location and no evident relation between vulnerability and location.


Geographic location and vulnerability of U.S. airports.
The geographic location and vulnerability of an airport in U.S. mainland (a), Alaska (b), Hawaii Islands (c), Puerto Rico (d), American Samoa and Guam (e). The nodes/airports are color-coded according to their airport vulnerability ϕ. We show the names of the top 30 most vulnerable airports.
We model the airport congestion dynamics as a heterogeneous SIS spreading process, where both the infection rate per link and the recovery rate per node are heterogeneous. The infection rate of a link with weight wij is βij = βwij. In network G1, which is unweighted, the infection rate is homogeneous. The heterogeneous recovery rate is motivated by the fact that airports with a larger declared capacity may recovery faster i.e. are more capable to deal with operational delay and congestion due to their better infrastructure. The declared capacity of an airport is affected by the number and geometric layout of the runways, type and location of taxiway exits from the runway and the ATM system. The primary factor in determining the capacity is the number of simultaneous active runways. The selection of runways to be operated depends on demand, weather conditions (visibility, wind speed/direction) and noise restrictions. During periods of high congestion, a large airport can decide to keep more runways active to match the demand, however, a small airport does not have that option. Furthermore, a large airport with several runways will have even more runway configurations, which is a combination of simultaneous active runways, weather conditions and assignment of aircraft types and movements (arrival/departures). This makes larger airports more suitable to handle congestion [29]. Similarly, recent studies showed that large airports are less likely to propagate delay [27, 28]. In the three networks we constructed, the node strength tends to be a good proxy of the declared capacity and it is strongly correlated with several other nodal centrality metrics. Hence, we define the recovery rate δi of a node as a function of its node strength:

We derive nodal infection probabilities via mean-field approximation instead of simulating the SIS stochastic process for computational efficiency. The N-Intertwined Mean-Field Approximation (NIMFA) is one of the most precise individual-based mean-field approximations [9]. Different from homogeneous or degree-based mean-field approximations where only the degree of a node is taken into account, NIMFA preserves the whole network topology in its governing equations, coupling the infection probability of neighboring nodes. It further assumes that the states of neighboring nodes are uncorrelated. Under NIMFA, the governing equation for a node i in our heterogeneous SIS spreading model is

In a heterogeneous SIS model, the condition for the epidemic to spread out on a given network G is

We evaluate our model via its capacity to capture: (a) the probability distribution of airport vulnerability and (b) the rank of airports in vulnerability.
We firstly quantify the similarity of the probability distribution of nodal infection probability obtained from the heterogeneous SIS model with that of airport vulnerability via the Jensen Shannon divergence JSD. Given two discrete probability distributions P = (p1, p2, …, pK) and Q = (q1, q2, …, qK) where K ≥ 2, the Jensen-Shannon divergence(JSD) [37] measures the similarity of P and Q. We define the mixture of P and Q as M = (m1, m2, …, mK) where

From the application perspective, the identification of the most vulnerable airports is crucial. We can evaluate the quality of using nodal infection probability to rank airports in vulnerability as follows. A node with a high infection probability is supposed to correspond to an airport with a high vulnerability. We rank the nodes (airports) according to their infection probability and vulnerability respectively. These two rankings are recorded by two vectors

We define the overall recognition quality ξ as the area under the rϕv(f) function:

Our heterogeneous SIS model has three control parameters δ, c and θ. In order to understand the influence of the parameters on the performance of the model, we consider all possible combinations of the parameters. We consider for c all possible values within [0, 2] and with step size 0.02. Similarly, θ can be any value within [0, 2] and with step size 0.1. The smaller step size of c is motivated by the high sensitivity of the model’s performances (especially the recognition quality ξ) on c. This is because the term
The Jensen Shannon divergence JSD evaluates the similarity between nodal infection and vulnerability distribution, whereas the recognition quality ξ assesses the capability of identifying the most vulnerable airports according to their corresponding infection probabilities. In this section we explore the performance of the heterogeneous SIS model in comparison with the baseline homogeneous SIS model. If we aim to develop a model to reproduce the vulnerability distribution alone (to minimize the JSD) or the ranking of nodal vulnerability (to maximize ξ), but not both at the same time, the heterogeneous SIS model evidently outperforms the homogeneous one. As shown in Fig 5, the minimal possible JSD and the maximal ξ achieved by the heterogeneous model are far lower and higher respectively than those obtained by the homogeneous model. The minimal JSD and the maximal ξ are not obtained by the heterogeneous model at the same time, i.e. via the same parameter set θ and c.


The scatter plot of the recognition quality ξ versus Jansen-Shannon divergence JSD for both heterogeneous and homogeneous SIS model with diverse parameter sets.
The scatter plot is obtained in network G1 (figure a1, a2), G2 (b1, b2) and G3 (c1, c2). Points correspond to the heterogeneous model, where θ ∈ [0, 2] with step size 0.1 and c ∈ (0, 2] with step size 0.02. The points are colored according to parameter c in a1, b1, c1 and according to θ in a2, b2, c2. The dash lines correspond to the baseline homogeneous model.
Furthermore, the data points on the top-left panel in each sub-figure of Fig 5 correspond to the parameter sets with which the heterogeneous model outperforms the homogeneous one in reproducing both the vulnerability distribution and ranking the airports in vulnerability. Among those points, those that lead to an evidently high recognition quality are within the parameter range θ > 1 and c = 0.02, when the recovery rate is highly heterogeneous. The heterogeneous SIS model on the unweighted network G1 could possibly achieve slightly better recognition quality than the model on G2 and G3. The homogeneous model on network G1 however, performs worse than that on G2 and G3 in recognition quality. The network G1, which contains less information than the other two networks, is sufficient for the heterogeneous model to perform well. When c = 0.02, the heterogeneous model achieves the best performance in ξ. This suggests that a fine tuning of the c within the range (0, 0.02) may further improve the performance of the model. The parameter sets that we have considered are sufficient for us to illustrate that the heterogeneous SIS model could perform better than the homogeneous one.
Identifying the most vulnerable airports is crucial for operations. In this section, we aim to understand why the heterogeneous SIS model better recognizes vulnerable airports, i.e. is higher in recognition quality than the homogeneous model. In the homogeneous SIS model, a node with a large strength tends to have a high infection probability. In the heterogeneous SIS model, a node with a large strength has high rates of getting infected by its neighbors, contributing to a high infection probability. On the other hand, a node with a large strength could have a large recovery rate when θ > 0. These two factors imply that a node with a large node strength does not necessarily have a high infection probability. In this section, we explore whether the better performance of our heterogeneous SIS model in recognition quality corresponds to its better capability to reproducing the relationship between the vulnerability and strength of a node if compared to the homogeneous SIS model.
We take network G1 as an example. The heterogeneous SIS model on G1 achieves the highest recognition quality ξ when c = 0.02 and θ = 1.5. We consider the SIS model when c = 0.02 whereas θ varies and when θ = 1.5 whereas c varies. We plot the vulnerability ϕ and the meta-stable infection probability v (derived by the heterogeneous SIS model or the homogeneous SIS baseline model) of a node versus the strength of the node in Fig 6a. When θ < 1, and c = 0.02, the infection probability increases monotonically with the strength of a node (see Fig 6a). When θ > 1, the new phenomena unfolds: high nodal infection probability is obtained by nodes with an intermediate strength, but not those having a small nor large strength. A large θ attributes to the heterogeneity of the recovery rates, allowing nodes with a large strength to have a small infection probability. Given the θ = 1.5, Fig 6b shows that the nodal infection probability increases monotonically with the node strength when c is large, e.g. c > 1. A large c reduces the heterogeneity of the recovery rate. When c is small, the maximal vulnerability has also been obtained by nodes with an intermediate node strength.


Airport vulnerability and nodal infection probability versus normalized node strength.
The scatter plot of the vulnerability ϕ (points) and infection probability v (lines) of a node versus the normalized node strength
The node strength that leads to the maximal infection probability increases as c increases because a larger c makes the recovery rate more homogeneous. In the extreme case, the most heterogeneous case, when θ > 1 and c = 0, v decreases monotonically as the node strength increases, which can be seen in Fig 6a. In this special case, a larger θ > 1 corresponds to a steeper decrease. The relative magnitude of the constant term c with respect to the node strength dependant term


Cumulative distribution of
We model airport traffic congestion contagion as a heterogeneous SIS spreading process on an airport transportation network, aiming to identify airport’s vulnerability, i.e. probability of being congested, using nodal infection probabilities derived from our model. Three airline networks are constructed to capture diverse information e.g. flight frequency and duration and the infection rate of each link is assumed to be proportional to its link weight. Per node, we introduce an heterogeneous recovery rate which is a function of its node strength. The model is evaluated via its capability to reproduce the distribution of nodal vulnerability and to rank airports in vulnerability. Our model evidently outperforms the SIS model with a homogeneous recovery rate in ranking airports from both perspectives. One explanation of the better performance of our heterogeneous model in reproducing the ranking of airports in vulnerability is that: the phenomena that the vulnerability is the largest at airports whose strength in the airline network is neither too large nor too small can be only captured by the heterogeneous model. In particular, a node with a large strength has high rates (link weights) of getting infected by its neighbors, whereas its large recovery rate could reduce its infection probability. Finally, the simplest airline network that represents which airports have direct flight(s) in between already allows the heterogeneous model to evidently outperform the homogeneous one.
The identification of vulnerable airports is crucial for airport operations. Beyond, our model may facilitate the development and evaluation of optimization strategies. The optimization problem can be, e.g. which airports should be invested in improving their capacity thus reducing their vulnerability or in improving their recovery rates in order to minimize the global vulnerability. The derived model that describes how congestion at one airport spreads to other airports could be used to evaluate optimization solutions as a starting point. Such questions require as well further improvement and validation of the model, accounting for e.g. other operational factors and the time varying nature of airport vulnerability. The definition of airport vulnerability can also be generalized by considering e.g. the extent of congestion at an airport.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39