I have read the journal’s policy and the authors of this manuscript have no competing interests.
By modifying and calibrating an active vertex model to experiments, we have simulated numerically a confluent cellular monolayer spreading on an empty space and the collision of two monolayers of different cells in an antagonistic migration assay. Cells are subject to inertial forces and to active forces that try to align their velocities with those of neighboring ones. In agreement with experiments in the literature, the spreading test exhibits formation of fingers in the moving interfaces, there appear swirls in the velocity field, and the polar order parameter and the correlation and swirl lengths increase with time. Numerical simulations show that cells inside the tissue have smaller area than those at the interface, which has been observed in recent experiments. In the antagonistic migration assay, a population of fluidlike Ras cells invades a population of wild type solidlike cells having shape parameters above and below the geometric critical value, respectively. Cell mixing or segregation depends on the junction tensions between different cells. We reproduce the experimentally observed antagonistic migration assays by assuming that a fraction of cells favor mixing, the others segregation, and that these cells are randomly distributed in space. To characterize and compare the structure of interfaces between cell types or of interfaces of spreading cellular monolayers in an automatic manner, we apply topological data analysis to experimental data and to results of our numerical simulations. We use time series of data generated by numerical simulations to automatically group, track and classify the advancing interfaces of cellular aggregates by means of bottleneck or Wasserstein distances of persistent homologies. These techniques of topological data analysis are scalable and could be used in studies involving large amounts of data. Besides applications to wound healing and metastatic cancer, these studies are relevant for tissue engineering, biological effects of materials, tissue and organ regeneration.
Confluent motion of cells in tissues plays a crucial role in wound healing, tissue repair, development, morphogenesis and in numerous pathological processes such as tumor invasion and metastatic cancer. For such complex processes, controlled experiments help clarifying the roles of chemical, mechanical and biological cues. Among them, spreading of cellular tissues on an empty space and antagonistic migration assays between cancerous and normal cells are quite revealing. The interfaces between confluent cellular aggregates uncover properties thereof when a combination of modeling, numerical simulation and data analysis is used. Here we have modified an active vertex model with a dynamics that includes inertia, friction and active forces that tend to align cells based on interaction with its immediate neighborhood. Selecting appropriately junction tensions among cells and using the SAMoS software, we have succeed in simulating assays of cellular tissue spreading on an empty space and the invasion of healthy tissue by cancerous one. We have introduced topological data analysis to characterize, track and compare in an automatic manner the interfaces of the tissue both in numerical simulations and from experimental data of normal and Ras modified precancerous Human Embryonic Kidney cells. We find good agreement when normal cells are solidlike and modified cells are liquidlike according to their shape parameters. In addition, cell variability means that a fraction of randomly distributed cells favor mixing, the others segregation. Topological data analysis techniques are scalable and could be used in studies involving large amounts of data. Besides applications to wound healing and metastatic cancer, these studies are relevant in ascertaining how the biophysical features of materials may affect tissue and organ regeneration.
Confluent motion of epithelial cell monolayers [1–28] is crucial in many biological processes, such as morphogenesis [3, 26], biological pattern formation [9, 23], biological aggregation and swarming [17, 21], tissue repair [6, 18, 19], development [4], and tumor invasion and metastasis [1–3, 28]. It serves as a relatively simple paradigm for collective motion of cells that retain their cell-cell junctions as they move on a two dimensional (2D) substrate. Confluent cellular motion can be tracked and visualized in experiments. Velocity and stress fields can be obtained by particle imaging velocimetry (PIV), time resolved cellular motion is observed using time-lapse imaging and fluorescence microscopy, traction microscopy allows to measure the forces that cells exert on the substrate as they move [5, 6, 15]. Collective cell migration also poses challenging questions in soft and active matter physics, as it may exhibit fluid, solid or glass behavior with interesting flocking and jamming/unjamming transitions [12, 20, 29–35]. Interesting dynamics occurs as an epithelial cell aggregate advances through an empty space, as in wound healing, or it collides and encroaches a different tissue, as in cancer invasion. Advancing cellular fronts may display wave phenomena [15, 36], grow fingers [16, 37, 38], or breakdown and interpenetration against an oppositely moving front [22, 27]. Different aspects of these phenomena have been studied by models ranging from macroscopic continuum mechanics to detailed subcellular agent models [25, 29, 37, 39, 40].
Here we combine particle dynamics [16] with the active vertex model (AVM) [39] to provide a cellular dynamics perspective on monolayers colliding in antagonistic migration assays (AMA) [22, 27] or on monolayers spreading over an empty space [11, 16, 37, 38]. The resulting model describes the collective migration dynamics of a large number of cells and implements exchanges of neighboring cells automatically (T1 transitions) [39]. In contrast to the usual overdamped dynamics in the AVM, the dynamics of the cell centers is underdamped. The underdamped AVM incorporates internal dissipation of cells through a friction parameter, a Vicsek-like velocity alignment of neighboring cells [30, 41, 42], noise and and active forces that may include cell polarity. We calibrate its parameters so that the simulations agree with experiments. Parameters for collective migration to an empty space are calibrated for Madin-Darby canine kidney (MDCK) cells [10, 16]. In AMA between MDCK cells, Ras modified cells collapse and are pushed backward by normal cells, which detect the former by an ephrin related mechanism [22]. The repulsive interactions between the two cell types drives cell segregation, produce sharp borders [24], and may generate deformation waves at the interface between the two cell types that propagate across the monolayers [36]. In agreement with experiments in the literature, simulations of spreading test with our model exhibit formation of fingers in the moving interfaces, there appear swirls in the velocity field, and the polar order parameter and the correlation and swirl lengths increase with time, all of which has been observed in experiments [10, 11, 16, 37, 38, 43]. Our model is quite flexible, which gives it some advantages when describing behavior across different scales. Compared with particle models with underdamped dynamics, our model does not require introducing leader cells [16] to account for fingering instabilities. Compared to continuum models [38], stochasticity enables our model to reproduce the observed spatial autocorrelation of the velocity [11]. Simulations of our model show that cells in a finger of a moving interface may exhibit fast irregular oscillations in their velocity (periods of about one hour). This has been reported in early experiments [43]. Our underdamped dynamics also predicts that cells inside an aggregate spreading onto an empty space have smaller area than those at the tissue interface. This prediction has been corroborated by experiments [44]. Simulating the AVM with overdamped dynamics, we observe the opposite: cells at the interface and fingers have smaller are than cells inside the tissue [39].
In AMA with Human Embryonic Kidney (HEC) cell assemblies, precancerous Ras modified cells displace normal cells [27]. These latter experiments have been interpreted using continuum mechanics in a simple biophysical model through phenomenological couplings [38], without recourse to biochemical signaling mechanisms and without clear relations to cellular processes. In this paper, we consider wild type (wt) HEC cells to be solidlike whereas invading Ras cells are fluidlike and push the former backward. Experiments show that wt HEC cells keep their shape and area quite unchanged whereas Ras HEC cells may change shape and undergo larger deformations. This enforces our characterization of wt and Ras HEC cells as solidlike and fluidlike, respectively. As time elapses, there are cell exchanges and islands of one cell type form inside the tissue of the other cells, which characterizes a flocking liquid state [32, 34, 40]. In AMA with MDCK cells, the roles are inverted: Ras cells are solidlike and wt cells are fluidlike. The precise form of the separating interface among monolayers of different cell type depends on cell parameters governing segregation vs aggregation of these cells. We characterize it by topological data analysis (TDA). A measure of cellular diversity in the junction tensions produces islands of one type of cells inside the monolayer of the other cells, which is reflected in TDA of simulations and experiments. Cell cohesion given by the underdamped AVM, the cell alignment rule and the active noise force produce fingers in interfaces during assays of cell invasions of empty spaces rendering unnecessary to assume a different phenotype for lead cells [16]. Recapitulating, our model explains a wide variety of experiments on confluent motion of cellular aggregates onto free space (wound healing) and invasion of one aggregate by another (antagonistic assays, cancer). It does this by choosing judiciously physical parameters such as cellular junction tension, adhesion and those in active forces. Fine tuning of parameters may require a deeper study of experimental data. One promising area where our results are very relevant is the study of the biophysical features of materials as they affect tissue and organ regeneration (materiobiology, tissue engineering) [45].
Recent experiments have connected metastasis in colorectal cancer to wound healing and tumor invasion of tissue using appropriate molecular markers [28]. Thus, our description of spreading of cellular tissue and antagonistic migration assays using our modified active vertex model might be relevant for metastatic cancer. In particular, we have shown the role of cellular junction tensions in cell invasion, agglomeration and segregation. Promising mechanisms include Notch signaling pathways [46] and models of the epithelial-mesenchymal transition and cancer stem cell formation [47, 48]. Incorporating these cellular mechanisms to our vertex model may pave the way to future progress in this area, much as incorporating the Notch signaling pathway to cellular Potts models helps understanding many aspects of angiogenesis [49]. Understanding precise biochemical mechanisms influencing cell-cell contact and confluent cellular tissue may help develop therapies for metastatic cancers [48].
When studying spreading and collisions between cellular aggregates, the interfaces become rough and can shed and absorb groups of cells. It is important to be able to track automatically these changes for long time numerical simulations and experiments generate large data sets that is hard to visualize and follow. For the first time in studies of confluent motion of cellular aggregates, we use topological data analysis of time series generated by numerical simulations to automatically group, track and classify the advancing interfaces of cellular aggregates. Topological changes in the interfaces are reflected in barcodes and persistence diagrams of clusters and holes that change with scales (filtration parameters) [50, 51] and themselves evolve in time. We track and study these changes by means of bottleneck or Wasserstein distances [51, 52]. Measuring these changes with time in data available from experiments and comparing with data from numerical simulations allows characterizing milestones in confluent motion of aggregates and the important time scales involved. In this work, we use techniques of topological data analysis with some data taken from experiments and a modest amount of data from numerical simulations so as to explain techniques and results in a clear manner. However, our techniques are scalable and could be used in studies involving large amounts of data, as, for example, those generated to characterize zebrafish patterns by combining machine learning and topological data analysis [53].
The paper is organized as follows. The Section Mathematical Model describes the models we simulate. The numerical values of the parameters are calibrated so as to reproduce experimental observations of collective cell migration in two different cases: an aggregate spreading to an empty space and the collision of two different cellular monolayers in antagonistic migration assays. The Results and discussion section contains the numerical simulations, the characterization of the structure of advancing and interpenetrating cell fronts by means of topological data analysis, and our conclusions. The Methods section provides additional background on topological data analysis for the readers’ ease of use, and it details our study of evolving interfaces of a spreading aggregate by taking slices of cells near the front.
We modify an active vertex model (AVM) [39] and simulate it by adapting the SAMoS software [54]. The AVM combines the Vertex Model (VM) for confluent epithelial tissues [29, 55] with active matter dynamics [39]. Sometimes what we call AVM following Ref. [39] is called an active self-propelled Voronoi model [40]. Let us describe first the VM, then the AVM and our modification of its dynamics.
The VM assumes that all cells in the epithelium are roughly the same height and thus that the entire system can be well approximated as a two-dimensional sheet. The conformation of the tissue in the VM is computed as a configuration that simultaneously optimizes area and perimeter of all cells. Two neighboring cells share a single edge, which is a straight line. Three junction lines typically meet at a vertex, although vertices with a higher number of contacts are also possible. The model tissue is therefore a mesh consisting of polygons (i.e., cells), edges (i.e., cell junctions), and vertices (i.e., meeting points of three or more cells). Each configuration of the mesh has the following associated energy

While the moduli Ki and Γi are positive, Λμν < 0. When the cell i shares junctions only with others of the same type, ∑〈μ, ν〉Λμν
lμν = Λμν∑〈μ, ν〉
lμν = Λμν
Pi, and this term can be put together with the perimeter term, thereby yielding

Clearly, Λ12 < (Λ11 + Λ22)/2 implies that energy is minimized when the number of junctions between both types of cells increases. Cells of different types therefore tend to mix. Conversely, when Λ12 > (Λ11 + Λ22)/2 cells of different type segregate, as suppressing junctions between cells of different type minimizes energy. There is also a competition between the two first terms in Eq (2) to minimize energy. Assume Λ12 = (Λ11 + Λ22)/2 and therefore the last term in Eq (2) vanishes. The shape index
To introduce dynamics in the VM, we have to go from polygon vertices rμ to polygon centers that represent cells, ri, consider these centers as particles and introduce dynamics for them [39]. In this, the AVM is similar to the self-propelled Voronoi model [31]. The core assumption of the AVM is that the tissue configurations that optimize the energy in Eq (1) correspond to the Voronoi tessellations of the plane with polygons as cells and cell centers acting as Voronoi seeds. Given a Voronoi tessellation, we consider its dual Delaunay triangulation, comprising Voronoi seeds and the edges joining them (triangles), which have the property that no seed is inside the circumcircle of any triangle; see Fig 1. From a Voronoi tesselation it is straightforward to obtain the dual Delaunay triangulation and vice versa. However, working with Delaunay triangulations has an advantage: they retain their nature when triangle vertices move by flipping edges conveniently [39], whereas Voronoi tessellations do not. The latter have to be reset after motion of polygon vertices. In the AVM, the area Ai in Eq (1) of the cell i is the area of the associated Voronoi polygon, Ωi, given by the following discrete version of Green’s formula:






Voronoi tessellation and Delaunay triangulation.
(a) Here rμ are vertices of polygons in the Voronoi tessselation and ri are centers of polygons that are vertices of Delaunay triangles. Here the zoom of a monolayer shows (b) the Voronoi tesselation and the Delaunay triangulation.
In the AVM, the usual dynamics for the cell centers is a gradient flow of the energy in Eq (1), that is overdamped dynamics with forces Fi given by Eq (6), plus active forces fa ni, and stochastic forces νi [39]

In this work, we shall modify the AVM dynamics. Instead of Eq (7), we shall use the particle dynamics of Ref. [16] but with different forces between particles. As discussed in Ref. [56], trajectories of motile cells can be explained by assuming that their acceleration is a certain functional of velocity. Despite the mass of the cell being so small that inertia is negligible compared with typical forces exerted on the cell, the formula for acceleration resembles Newton’s second law [56]. In this formula, a linear damping term represents dissipative processes coming from friction with substrate, with other cells, or rupture of adhesion bonds. Active memory terms, which are linear in the velocity, may propel single cells and account for the observed non-monotonic velocity autocorrelation [56]. When considering cellular tissue, Sepúlveda et al model cells as actively motile particles and replace the memory terms by Vicsek-like alignment “forces” [41], and interparticle and random “forces” [16]. Thus, the acceleration in these models is a consequence of the collective motion of cells and the interaction with the environment and it does not follow from Newton’s second law with a mass given by that of a single cell. However, we will continue denoting by forces (per unit mass) the terms comprising the acceleration [16]. In contrast to Eq (7), the cells in Ref. [16] are not self-propelled, so that they can stop their motion and start moving again if there are missing cells in their neighborhood and the active force is zero:


| α | β | τ | σ0 | K | Γ | Λ | λ | l0 | ζ |
|---|---|---|---|---|---|---|---|---|---|
| h−1 | h−1 | h | ![]() | - | - | - | - | - | - |
| 0.534 | 41.36 | 0.56 | 95 | 1 | 0.1 | -1 | 0.1 | 0 | 0.5 |
The cells at the boundary between a cellular monolayer and an empty space, or between tissues, are special. They may form actin cables, thereby having a line tension and a bending stiffness [39]:


A random configuration of the particles comprising a confluent cell monolayer is usually different from those configurations observed in experiments. Thus, we have to carry out an initialization stage until the particle configuration is compatible with their observed velocity distributions. For spread tests, we proceed as follows. We set a square box of size 1 mm2 area, N ≈ 4000 particles (comparable to the number of cells in the experiments), the packing ratio and the particle mean velocity. Then, we numerically solve Eqs (1) and (8) with forces φi = 0 and ∑〈j, i〉 fij = Fi/mi, Fi given by Eq (6), until the velocity probability density functions (PDFs) of the experiments are fitted. The parameters adjusted to the experimental data at early time (30 min after stencil removal) are listed in Table 1. We stop the initialization stage when the distribution of mean distances between particles is close to the initial distribution as observed in experiments and displayed in Fig 2. From this simulation, we obtain the particle positions ri and solve the underdamped AVM with forces given by Eq (6) and initial random directions for the particle velocities. As we can see in Fig 3, the velocity field obtained from the simulations, Fig 3(b), is very similar to that measured by PIV analysis [16], Fig 3(a).

![Probability distribution function (PDF) for particle velocities: (a) vx, (b) vy, (c) v = |v|; and (d) mean distance d between neighboring particles; after the initialization procedure (red triangles) as compared to the experimentally observed PDF (black line) [16].](/dataresources/secured/content-1765608618460-8a7acce1-c0d9-4d07-a5f2-58f96d02579e/assets/pcbi.1008407.g002.jpg)
Probability distribution function (PDF) for particle velocities: (a) vx, (b) vy, (c) v = |v|; and (d) mean distance d between neighboring particles; after the initialization procedure (red triangles) as compared to the experimentally observed PDF (black line) [16].
Parameter values are those in Table 1.
For AMA, we choose a random configuration having the same number of wt and Ras cells separated by a vertical straight line and we set a known velocity distribution from experiments [27]. This represents the situation of the two monolayers when they first make contact. See details in the next section.
We have simulated two different tissue configurations: (a) a cellular monolayer spreads over an empty space, and (b) two monolayers comprising wild type and modified cells collide. In each case, the simulations are compared to relevant experimental observations.
Inspired by wound healing phenomena and experiments on tissue scratching, we are interested in the movement of an epithelium which encroaches on a virgin substrate. The experimental protocol consists of microfabricated stencils whose removal increases the motility of the epithelium. In our simulations, we consider a narrow strip configuration as that in Fig 4(a), which is similar to those in Ref. [43]. We adapt the SAMoS code [54] to simulate the AVM with dynamics given by Eqs (8) and (6), in which φi = 0. Parameter values are those in Table 1. Cells migrate on the surface maintaining their junctions with their neighbors, which is enforced by the term proportional to β in Eq (8). During healing, noisy forcing in Eq (8) makes some cells to move faster that the others while keeping their contacts. This is the origin of the fingers or instabilities of the interface with the cell free space, which are illustrated by Fig 4(b), see S1 Video for the complete time evolution. In addition, cells on the interface, or close to it, may grow beyond the target area A0 in Eq (1). As they do so, each cell has a probability to divide into two daughter cells, which equals rd(A − A0)dt. Here dt is the time step and rd is the division rate. We have normalized the target area to A0 = π, dt = 0.05, rd = 0.01, and we check whether the cell divides with ten times the above probability every 10 time steps that we observe A > A0. With these parameters, there is some cell division near the interface of the confluent layer and the empty space.


Initial configuration and configuration after 20 h of stencil removal showing the formation of fingers according to the numerical simulation of the model.
(a) Full view, (b) zoom. Initial box size is 1.6 mm2, P0 = 10, A0 = π, and shape index p0 = 5.65. Parameter values are those in Table 1. See S1 Video.
After two hours of stencil removal, the PIV recorded from the experiments reveals the complex movements that can appear inside the bulk of the tissue, cf. Fig 2A of Ref. [10]. Cells do not move independently and their velocities are correlated. The presence of these cellular flows shows the existence of motion inside the monolayer [11]. Similar to experiments, our simulations in Fig 5 show that incipient fingers appear in areas of high speed; see also S2 Video. In our simulations, we find these areas without having to postulate the existence of special leader cells. Having calculated numerically the velocity field, we can quantify the orientational motion inside the epithelium. Take for example, the configuration after 35 h of stencil removal shown in Fig 6. In addition to the velocity field and the speed (modulus of the velocity vector) map, we have depicted a density map of the polar order parameter Spol,



Cell velocity field after 2 h of stencil removal in an invasion configuration calculated from simulations of the model with the parameters of Fig 4 and Table 1.
(a) Phase contrast visualizing cells, (b) profile of cell speed (modulus of velocity), (c) velocity field. These panels should be compared with those obtained from experimental MDCK data in Fig 2A of Ref. [10].
The fronts of advancing cells in Figs 5 and 6 clearly show the formation of fingers. The AVM keeps cells together while the term proportional to β in Eq (8) induces a common average direction in their motion. This effect becomes stronger the larger β is, which promotes and enforces finger formation. Thus, unlike the particle model of Ref. [16], we do not need a longer range attractive potential interaction between cells. We do not need to distinguish leader cells to trigger finger formation [16] because advancing cells at the forefront of the monolayer pull those behind them. A comparison of our simulation results in Figs 5 and 6 to the experiments reported in Refs. [10, 43] shows that the appearance and size of the cell velocity field are reproduced qualitatively. Our simulations show that the area and velocity of cells both increase as their distance to the boundary of the cellular tissue decreases. Fig 8 shows that cells near the interface in a spreading configuration have larger areas than cells far from the interface. This is particularly noticeable in the fingers: the cells in them are faster and have a larger area than the cells elsewhere. The cells far from the tissue border are compressed and have smaller area than boundary ones. This prediction of the underdamped AVM with dynamics as in Eq (8) has been observed in experiments; see Fig 4 of Ref. [44]. In experiments, the area of finger cells reaches larger values than in the simulations, which is related to the fact that we use a fixed target area for all cells and the cell area cannot depart arbitrarily far from target in the AVM. We have also simulated the AVM with the overdamped dynamics of Eq (7) and with the same boundary and initial conditions. In this case, after fingers are formed as in Fig 6 of Ref. [39], the interior cells far from the interface have larger area than cells at the boundary and in the fingers. This is also shown in Fig 8 of Ref. [39]. However, this behavior is contrary to experimental observations [44].


Areas of cells during a simulation of a spreading configuration: (a) Area of cells near the interface, (b) area of cells far from the interface.
Our simulations exhibit the same trend as measurements reported in Ref. [44]. The bar length in both panels is 100 μm.
Our numerical simulations of spreading configurations show that the cells inside a finger move faster than those at other portions of the interface. We have observed that the average velocity of finger cells may oscillate irregularly about some average value with a short period of about one hour. Fig 9 shows the average velocity of 9 finger cells during a 7 hour time interval. The velocity of a single cell in the finger oscillates somewhat more irregularly in a similar fashion. For much longer time intervals, the average velocity may experience an overall upward trend. The average velocity of boundary cells in flat regions also oscillates with time but it does not show a definite behavior over long time intervals: it may even display a downward trend. In experiments, the velocity of cells leading interfacial fingers has also been observed to oscillate rapidly and irregularly with periods of about one hour or less, which is similar to the findings based on numerical simulations of our model; see Fig 101A of Ref. [43]. Some models based on continuum mechanics predict longer periods of tens of hours [43].


Average velocity of the marked cells during finger expansion.
The velocity of each cell oscillates in a similar but somewhat more irregular manner (not shown).
The velocity field in Fig 6(a) exhibits swirl patterns [11]. To characterize them, we have depicted in Fig 10(a) the correlation function for the x-component of the velocity field:

Recently, Moitrier et al have reported confrontation assays between antagonistically migrating cell sheets [27]. In their experiment, the two confluent cellular monolayers (wild type and modified Ras HEK cells) advance toward an intermediate empty space, collide and the Ras monolayer displaces the wt one. The experiment shows that the velocities of the cells decay exponentially fast the farther they are from the advancing fronts [27]. If x = L(t) is the position of the monolayer front, the velocity of the cells at position x < L is Vwt exp[(x − L)/λwt] for the wt and −VRas exp[−(x − L)/λRas] for the Ras cells at x > L. After the collision, these velocity functions remain the same but now Vwt and VRas acquire a common and lower value −Vinterface. Moitrier et al interpret their experiments by comparing with simple solutions of a 1D continuum model [27]. In our simulations, we use the SAMoS code to simulate the underdamped AVM cellular model with dynamics given by Eqs (8) and (2). The invading Ras cells (magenta) move to the left whereas the wt cells (green) are pushed backward because they experience aversion to mixing with Ras cells. We model this situation by adding a negative active force


Simulation of the antagonistic migration assay: One population advances pushing back the other.
Junction tensions are Λ11 = −6.2, Λ22 = −6.8, which yield shape indices 3.50 (green cells) and 3.84 (magenta cells), respectively. Other parameters are listed in the first row of Table 2, and
Our underdamped AVM uses more features of wt and Ras cells obtained from the experiments than kept by continuum models. The latter lose features at distances close to the cell size. Continuum models fit friction, viscosity and strength of active forces for the two cell populations to explain how Ras cells invade the wt monolayer [27].
The AVM allows us to study tissues that behave differently. In our simulations, 5000 cells are split into two populations with different properties specified by the junction tensions Λij, j = 1, 2, which affect each pair of cell-cell contacts. The simulations producing Figs 11, 12 and 13 have open boundaries because we have focused on the interface between populations. We have fixed K = Γ = 1 and −6.8 = Λ22 < Λ11 = −6.2, which produce shape parameters p0 of 3.50 (green cells) and 3.84 (magenta cells), below and above the transition value p0* = 3.812, respectively. Thus, Ras magenta cells are fluidlike (supercritical shape index) and their strain energy density is smaller than that of the solidlike wt cells. This is consistent with the observation that wt cells have larger mean traction force amplitudes than Ras cells [27]. Our aim is to analyze the effect of Λ12 on the AMA. Both monolayers occupy the right and left portions of a 4.4 mm wide, 3.1 mm tall box. In Figs 11–13, we show a 1 mm × 2.5 mm region.


Simulation of the antagonistic migration assay: Creation of a extremely rugged interface.
First and second snapshots:


Simulation of the antagonistic migration assay: One population advances pushing back the other, while the interfaces mix, creating scattered islands of cells of different type.
Parameters and times are as in Fig 12, except that one fifth of the overall population (randomly placed green and magenta cells) have Λ12 = −7.5 (population mixing) and the other four fifths have Λ12 = −6.0 (population segregation). The marked region has similar size to that reported in experiments [27] and will be used in our TDA studies. See S5 Video.
In our simulations, we start from having the cell populations separated by a straight vertical interface at L(0) = 0. The active force φ pushes Ras cells with x > L(0) to the left, whereas φ = 0 for any cell to the left of x = L(0). The junction tension Λ12 in Fig 11 (Λ12 = −7.0) and in the two left panels of Fig 12 (Λ12 = −7.5) favors population mixing. Ras (magenta) cells push wt (green) cells backwards at a velocity close to the observed Vinterface, meanwhile creating a rugged interface between cell populations. As time elapses, fingers and some isolated islands (lagging wt in the Ras assembly and advancing Ras islands in the receding wt assembly) appear. These effects are more pronounced the smaller Λ12 is, as shown by comparison of Figs 11 and 12. It is possible to create some realistic mixing of the populations by changing the junction tension Λ12 with time. The first two snapshots in Fig 12 have
We have also focused on the effects of cellular alignment. There are two terms in Eq (8) that try to synchronize cell velocities: the term proportional to β and the active force φ, which pushes the Ras cells to the left. Although the values of β used to draw Figs 11–13 are smaller than that in Table 1, different β still make a difference in the behavior during tissue collision, specially in the Ras population. Fig 11 exhibits global polar migration because its β value is larger than that in Figs 12 and 13, but types of cells are not mixed despite having a favorable value Γ12 = −7.0. The smaller value of β in Figs 12 and 13 creates a weaker polar alignment than that in Fig 11. The different patterns observed in these figures illustrate that cell alignment affects importantly the shape and configuration of the interface. S3–S5 Videos compare the time dynamics of these three sets of simulations.
While the rightmost panel of Fig 12 is similar to some of the experimental data [27], we can obtain a similar formation of islands and fingers by assuming that Λ12 is randomly distributed among cells. In particular, we assume that one fifth of magenta and green cells have Λ12 = −7.5, which favors mixing of populations, while the remaining ones have Λ12 = −6.0 and favor population segregation. The result is depicted in Fig 13, which exhibits behavior similar to experimental observations [27], compare also to Fig 14. The topological data analyses of the next section characterize the geometry of the interface between cell types in antagonistic migration assays.

![Structure of the interface between colliding layers corresponding to two snapshot sof the collision of two confluent cellular monolayers in Moitrier et al’s experiment [27].](/dataresources/secured/content-1765608618460-8a7acce1-c0d9-4d07-a5f2-58f96d02579e/assets/pcbi.1008407.g014.jpg)
Structure of the interface between colliding layers corresponding to two snapshot sof the collision of two confluent cellular monolayers in Moitrier et al’s experiment [27].
These profiles correspond to (a) the third and (b) the fourth panels (counting from the left) in the cover of Soft Matter corresponding to Ref. [27].
Experiments and numerical simulations of cell monolayers produce time series of images that make it possible to identify the structure of interfaces and to compare their time evolution. It is quite cumbersome to process manually these time series. Here we use Topological Data Analysis (TDA) as a computational tool to process automatically time series of images. We next illustrate how to use TDA for this purpose and how to interpret the obtained results. We focus on specific parts of selected snapshots of images from experiments and then on time series of images from numerical simulations. While we have few images of interfaces from experiments, we can generate arbitrarily many from numerical simulations. Having many images, the automatic TDA tool enables us to describe in detail the topological changes of the interfaces and to implement hierarchical clustering strategies, thereby classifying the evolving interface structures.
Fig 14 shows the interfaces between two colliding confluent cellular monolayers in an AMA [27]. In this experiment, magenta Ras cells make green wild type cells move back, cf. third and fourth snapshots in the cover of Soft Matter, vol. 15 [27]. The interface between the two cell populations is rather rough, it exhibits fingers, and there are islands or pockets of green cells left behind by the advance of the magenta front. To quantify these phenomena in an automatic way, we proceed as follows. Using Matlab, we transform the images in matrices of ones (green) and zeros (magenta). Then we extract the positions of green/magenta interfaces, represented by the point clouds shown in Fig 14, and process them using TDA. We pursue a similar strategy for images extracted from numerical simulations of our underdamped AVM, which yields a more complete picture of the evolution of interfaces.
A finite set of data points may be considered a sampling from the underlying topological space. Homology distinguishes topological spaces (e.g., annulus, sphere, torus, or more complicated surface or manifold) by quantifying their connected components, topological circles, trapped volumes, and so forth. Persistent homology characterizes the topological features of clouds of point data or particles at different spatial resolutions [58]. Highly persistent features span a wide range of spatial scales. Persistent features are more likely to represent true features of the data/pattern under study than to constitute artifacts of sampling, noise, or parameter choice [50]. To find the persistent homology of a cloud of point data/set of particles, we must first view them as a simplicial complex C. Roughly speaking, a simplicial complex is defined by a set of vertices (points or particles) and collections of k-simplices. The latter are the convex hulls of subsets with k + 1 vertices, comprising also faces; see the Methods section for precise definitions. Defining a distance function on the underlying space (the euclidean distance, for instance), we can generate a filtration of the simplicial complex, which is a nested sequence of increasingly bigger subsets. More precisely, a filtration of a simplicial complex C is a family of subcomplexes
How do we construct a filtration? The Vietoris-Rips filtration VR(X, r) [50, 58], which we will use here, is constructed as follows:
The set of vertices X is the cloud of points under study.
Given vertices x1 and x2, the edge [x1, x2] is included in VR(X, r) if the distance d(x1, x2) ≤ r.
If all the edges of a higher dimensional simplex are included in VR(X, r), the simplex belongs to VR(X, r).
A default choice for the distance d to study homology of 2D particle configurations is the Euclidean metric. Fig 15 displays two simplexes of a Vietoris-Rips filtration for the point cloud in Fig 14(a). Notice the appearance and disappearance of holes and isolated components as the threshold distance r to connect points increases. This filtration is governed by three parameters:
The maximum dimension dmax. This is the maximum dimension of the simplices to be constructed. The persistent homology (characterized by its Betti numbers) can be computed up to dimension dmax − 1. In this case dmax = 2, we consider points (0-simplices), edges (1-simplices), and triangles (2-simplices).
The maximum filtration value rmax and the number of divisions N. These values define the filtered simplicial complexes to be constructed, for


Visualization of the complexes VR(X, r) for the point cloud depicted in Fig 14(a) when (a) r = 6 and (b) r = 10.
For large enough r all the components merge in a single one. Holes appear and disappear as new connections are created, reflecting the overall point cloud arrangement.
Notice that for a set of P points, the full simplicial complex will have about 2P − 1 simplices in it. Therefore, dmax and rmax are usually slowly increased to get information without reaching computational limits. The computation is not too sensitive to the specific value of N. When rmax is greater than the diameter of the point cloud, all possible edges form and join all the points in one simplex.
For the readers’ ease of use, we include more detailed definitions and intuitive examples in the Methods section. In the next two sections, we apply TDA to experimental and numerical images.
Let us consider the snapshots depicted in Fig 14. Fig 15 processes the earlier snapshot depicted in Fig 14(a), in which the green and magenta monolayers have made contact and started interpenetrating each other. Ras cells (magenta) are pushing back wt cells (green) towards the left. As they do so, there are islands of wt cells inside the Ras monolayer. How does TDA capture these features? After constructing the Vietoris-Rips filtration, there are two commonly employed graphical representations that visualize the persistent homology of a point cloud: barcodes and persistence diagrams [59].
Barcodes of a homology Hk depict Betti intervals [rb, rd) for k-holes (k > 0) or connected components (k = 0) as the filtration parameter r varies. The homology class H0 comprises the points forming the green/magenta interfaces. As the size filtration parameter r increases from zero, there appear edges joining these points, thereby forming clusters as illustrated by Fig 15 for specific values of r and indicated by the barcodes in Fig 16(a) for the selected range of r. The class H1 further distinguishes compact components of the interface that are detached from the main part of the interface and form topological cycles, cf. the corresponding barcode in Fig 16(a). These components are islands of one cell type (phase) inside the bulk of the other phase.


Barcodes (left) and persistence diagrams (right) for the homologies H0 (circles) and H1 (asterisks) of the interfaces separating cell types in images from experiments and numerical simulations.
We use Vietoris-Rips filtrations with parameters N and rmax. (a)-(b) TDA from Fig 14(a) (experiments) with N = 45, rmax = 45; (c)-(d) TDA from Fig 14(b) (experiments) with N = 45, rmax = 45; (e)-(f) TDA from the leftmost panel in Fig 13 (numerical simulations) with N = 60, rmax = 30; (g)-(h) TDA from the rightmost panel in Fig 13 (numerical simulations) with N = 30, rmax = 30. Points in the persistence diagrams mark the beginning (birth) and end (death) of a bar (homology class) in the barcode. Triangles represent a component with infinite persistence. The green line is the diagonal.
Persistence diagrams represent the Betti intervals by points in a birth-death plane (see the Methods section for precise definitions). The x axis represents the filtration value r at which components/holes are created. The y axis represents the filtration value r at which they disappear. Those points less close to the diagonal (green) tend to mark robust underlying geometrical features. Fig 16(b) depicts the persistence diagram corresponding to Fig 14(a). Red circles mark connected components of the interface between cell monolayers and the magnitude of the filtration parameter r at which they disappear. As the filtration parameter increases, points comprising the main front merge rapidly in one component that absorbs neighboring clusters. They correspond to blocks of bars in the H0 panel of Fig 16(a) that start at the lowest value of r. Blue asterisks represent the appearance (horizontal axis) and disappearance (vertical axis) of holes inside such clusters. The first column of asterisks represents the ten bars in the H1 panel of Fig 16(a) that start at the same value of r and form four groups of bars, which end at about the same value of r. The remaining bars and asterisks are similarly related. They represent the new holes that form as the clusters merge, which gives an idea of the relative arrangement thereof. Relatively narrow barcodes produce points in the persistence diagram that are close packed.
Fig 16(c) and 16(d) display the barcodes and persistence diagram corresponding to Fig 14(b). Compared to the earlier snapshot of Fig 14(a) and its TDA in Fig 16(a) and 16(b), there are more islands of each phase in the bulk of the other: the invasion of Ras cells leaves pockets of wt cells inside their midst. The main interface has become more meandering and exhibits more fingers than in the earlier snapshot. As a consequence, the number of clusters or interface components is larger than at the earlier time. Similarly, there are more topological cycles, which reflects the larger number of islands of one cell type in the midst of the other cell type. Barcodes and persistence diagram are more spread out. This is further quantified by the Betti numbers bj that count the number of elements in Hj, for j = 0 (clusters) and for j = 1 (holes), as depicted in Fig 17(a) and 17(b) for the snapshots shown in Fig 14. The trends are similar in the simulations, as shown in Fig 17(c) and 17(d).


(a)-(b) Betti numbers versus filtration parameter diagrams for Fig 14(a) (blue asterisks, from S1 Data) and 14(b) (magenta circles, later time in the AMA experiment, from S2 Data) show that the number of clusters and holes in the interface between aggregates increases with time. (c)-(d) Same for the numerical simulations considered in Fig 16(e)–16(h) corresponding to the leftmost (S3 Data) and rightmost (S4 Data) panels in Fig 13. As a result of island formation and motion, which increases with time, Panels (a) and (c) show that the number of components decreases more slowly with r for the later time. The peaks in Panels (b) and (d) are similar for r below 20. In both cases, the interfaces formed at the later time display larger numbers of holes with larger sizes as a result of island formation. The additional peaks in Panel (b) near r = 40 correspond to islands that have already penetrated further inside the other cell population in the experiment.
How do we characterize quantitatively variations in the persistence diagrams of point clouds? We have to introduce distances between diagrams to measure their differences. In the next section, we explain the bottleneck and Wasserstein distances using time series of numerical simulations, out from which we have generated much more complete data sets than available from experiments [27].
As indicated in the previous section, to observe island formation, we have to tune the (negative) junction tensions when simulating antagonistic migration assays. In particular,
Let us now interpret the evolution shown in the panels of Fig 13 using TDA (see also S5 Video, out from which we have extracted 12 snapshots). Figs 16(e)–16(h) and 17(c) and 17(d) show the barcodes, persistence diagrams and Betti numbers for the marked sections of the leftmost and rightmost panels in Fig 13. As before, we represent the interfaces by point clouds. At r = 0, each point of the interface is a component. For the more regular interface of the leftmost panel in Fig 13, increasing r produces point components appearing as the short H0 bars in Fig 16(e). These bars end at similar filtration values and appear as a single red circle in the persistence diagram of Fig 16(f). The main three islands correspond to the three intermediate bars in the inset of Fig 16(e), which disappear at larger filtration values. The lowest circle in Fig 16(f) represents the point components, the three intermediate ones represent the islands in the barcode and their sizes. All clusters finally merge in the main front represented by the arrow on top of the vertical axis in Fig 16(f). Analysis of H1 confirms that the intermediate circles/bars are round islands and not strings. Each component corresponds to a cycle represented by the three largest H1 bars in Fig 16(e) and the two first asterisks in Fig 16(f), one of which represents the two bars of similar length. The two shortest bars represent holes formed as components merge during the filtration process and correspond to the two asterisks closer to the diagonal in Fig 16(f).
Fig 16(g) and 16(h) correspond to the more meandering interface of the rightmost panel in Fig 13. There are more points in the cloud representing the interface, whose irregularity results in different extinction values of r for the associate H0 bars. The main seven islands correspond to the intermediate bars in the inset of Fig 16(g), and their extinction values in the persistence diagram give an idea of the distance to the main front or to another island. The fact that they are islands (enclosed by a boundary) is inferred from the H1 bars in Fig 16(g). They correspond to the seven bars that appeared first, which are also represented by the first column five asterisks in Fig 16(h) having smaller r. Two of the asterisks correspond to two islands of similar size length each, which have bars of similar size. The length of the bars in the barcode or the distance of the asterisks from the diagonal in the persistence diagram give an idea of the island size. Additional H1 bars represent holes created during the filtration process as components merge and give an idea of the relative arrangement of the islands or of the fingers in the main front. They are represented by the additional asterisks in Fig 16(h). The Betti numbers in Fig 17(c) and 17(d) show a larger number of island and holes as time increases from the leftmost snapshot in Fig 13 to the rightmost one. Compared to the TDA of experiments in Fig 16(a)–16(d), there are no gaps between bars and asterisks appearing for large r in Fig 16(e)–16(h). The reason is that the distance of islands to the main front is smaller for the simulation than for the experiment.
We have applied TDA to a time series of 12 snapshots (extracted from S5 Video, which visualizes the evolution of the numerically simulated interface in an AMA). Fig 16(e)–16(h) correspond to snapshots 2 and 10. For each of them, we calculate barcodes as explained above. To quantify the variations in the barcode patterns, we introduce distances between persistence diagrams that are stable against random perturbations [51, 52]. Given two persistence diagrams X and Y, their bottleneck distance is defined as




(a) Bottleneck distance matrix for the interfaces between cells populations appearing in the 12 snapshots forming S5 Video and (b) associated dendrogram illustrating how the interfaces between cell populations can be grouped in clusters.
Interfaces in frames 1-3, 4-10, 11-12 can be grouped together, and the last two groups are closer to each other than to the initial frames. These groupings reflect similarities between frames as they succeed one another, and the disruptions between frames reflect significant topological changes of the interfaces (e.g., detachment and reattachment of islands). S5–S16 Data have been used to draw this figure.
Once we have a matrix of distances, we resort to unsupervised clustering methods to classify the frames in similar blocks. This classification automatically extracts the similitudes and changes between interfaces at different times of the AMA. Fig 18(b) displays a dendrogram obtained by agglomerative hierarchical clustering using Ward’s method [61] and the bottleneck distance. A dendrogram consists of U-shaped lines that connect data points (which, in this case, are the interfaces in each frame) in a hierarchical tree. The height of each U represents the distance between the interfaces that are connected by it. A dendrogram is not a single set of clusters, but a multilevel hierarchy. For a given dendrogram, we identify the natural cluster divisions relying on the inconsistency coefficient [62]. The latter compares the height of a link in a cluster hierarchy with the average height of links below it: larger inconsistency coefficients mark natural divisions [62]. By defining a cutoff value for the inconsistency coefficient, we automatically detect clusters. A 0.9 cutoff in the inconsistency coefficient detects 3 clusters, corresponding to times {1, 2, 3}, {4, 5, 6, 7, 8, 9, 10}, and {11, 12} in S5 Video. Thus, we distinguish the initial period (interfaces close to the original connected one, times {1, 2, 3}), the intermediate period (a phase in which a few islands form and advance, times {4, 5, 6, 7, 8, 9, 10}), and the final period (severe disruption with the abrupt formation of several islands, times {11, 12}). In this way, we gain insight on the time evolution: how fast the interfaces experience significative changes, and when abrupt changes do occur and mark the onset of a new cluster of frames. The chosen cutoff 0.9 is a critical value: Increasing it, we find only one cluster. Lowering the cutoff, we detect 5 clusters of frames, corresponding to times {1, 2, 3}, {4, 6, 8}, {5, 7}, {9, 10}, {11, 12}. With respect to the 0.9 value, the intermediate cluster splits into other 3, reflecting detachment, reattachment and slow progression of islands.
We can also obtain clusters by setting distance cutoffs, i.e., a height in the dendrogram of Fig 18(b). The height of a link between clusters represents the distance between them, which is called cophenetic distance. It is possible to calculate the correlation between the cophenetic distance and the distances in the matrix of Fig 18(a) (the cophenetic correlation [63]). For our simulations, Ward’s hierarchical clustering provides the largest cophenetic correlation when compared with other clustering approaches, such as single-linkage [53]. Thus, the Ward dendrogram gives the most faithful representation of the distance matrix, which is why we use it. Depending on the chosen cutoff height in Fig 18(b), we obtain one, two, three, four or five clusters. The sets of three or five clusters are the same as before (this does not necessarily occur in general). Alternatively, we can use the K-means algorithm to group the interfaces in clusters, selecting an optimal number of clusters by silhouette or elbow type criteria [64, 65]. In our case, K-means with 3 or 5 clusters produces the ones already obtained (this does not necessarily occur in general). We have illustrated TDA with a relatively short time series, but it is clear that it could be used for automatic detection of topological changes in much longer time series, or to quantify how close the interfaces obtained from different simulations or experiments are. In the Methods section, we have used TDA with 1-Wasserstein distance to interpret the evolution of tissue interfaces in the numerically simulated spreading assay of Fig 4.
We have modeled how epithelial cell aggregates advance through empty spaces (wound healing, tissue spreading) and collisions between aggregates (tumoral invasion) using an active vertex model with dynamics for cell centers that includes collective tissue forces [39], and velocity alignment and inertia [16]. The active vertex model implements exchanges of neighboring cells automatically (T1 transitions) and uses the SAMoS software. Compared with particle models with underdamped dynamics, our model accounts for fingering instabilities in spreading tissue without having to introduce leader cells [16]. Compared to continuum models [38], stochasticity enables our model to reproduce the observed fast irregular oscillation of cell velocities in fingers [43] and the spatial autocorrelation of the velocity [11]. Our underdamped AVM predicts that cells at the interface and the fingers have larger area than those well inside the tissue, which has been corroborated by recent experiments [44]. We also observe in numerical simulations of tissue spreading that the velocity of the fastest cells in a finger may oscillate with a short period in a range between 30 minutes to about one hour. A similar short period oscillation has been observed in experiments; cf Fig 101A in L. Petitjean’s PhD thesis [43]. Thus, for spreading tissue, detailed comparison to experimental data provides a quantitatively accurate description of cell motion (speed, velocity correlation function and polar order parameter). For antagonistic migration assays, we have reproduced collisions in which one cell population pushes back another whereas both populations mix forming different types of interfaces. The key element to model mixing is to keep different junction parameters for the two colliding tissues: the invading cells are liquid like whereas the receding tissue comprises solid like cells. In addition, a fraction of cells favor mixing, the others segregation, and that these cells are randomly distributed in space. Thus characterized, numerical simulations produce outcomes similar to those observed in experiments [27]. Compared to particle models, ours includes active vertex forces between cells that keep them together preventing gaps and keeping track of cellular compression, enlargement and changes of area. To characterize automatically the dynamics of islands and the rugged interface between aggregates, we have introduced topological data analyses of experiments and time series from numerical simulations. In collisions between aggregates, the interface between wt and Ras cell populations roughens and islands appear. The persistence diagrams of Homology classes 0 (clusters) and 1 (cycles) spread out and the number of these classes given by the corresponding Betti numbers increases. Using time series of data generated by numerical simulations, we have explained how to cluster interfaces using distance matrices based on the bottleneck distance between their persistence diagrams, which are stable to perturbations in the process. Despite the amount of data from experiments being limited, disruptive events such as island and cluster formation can be automatically captured by topological data analyses of numerical simulations and contrasted with experiments. Similarly, the Wasserstein distance between images enables us to track and classify automatically the evolving shapes of interfaces between cell populations by using time series from experimental or numerical studies. These techniques of topological data analysis are scalable and could be used in studies involving large amounts of data whenever available.
Our results allow to extract parameter values and to determine biologically relevant physical mechanisms for characterizing confluent motion of cellular aggregates, as described above. In particular: (i) cells at the interface are larger, inform the aggregate motion and are influenced by it, without needing leader cells to form fingers at the interface; and (ii) in colliding cellular aggregates, the solid or liquid like character of the cells (as determined by their junction parameters) decides the way the invasion goes. These aspects of our model are important in ascertaining how the biophysical features of materials influence tissue/organ regeneration [45]. Our work provides researchers in the field with useful tools to gain biological insight, to devise and to interpret data from experiments. To enhance the value of our results, e.g, for studies of metastatic cancer, future works may add cellular mechanisms such as Notch signaling dynamics [46], models of epithelial/mesenchymal transition and cancer stem cell formation [47, 48] to our vertex model. This venue has been successfully followed in studies of angiogenesis [49].
For the reader’s ease of use, this section makes more precise some definitions and includes simple examples to provide an intuitive idea of the meaning of persistent homology features. It is structured as follows. First, we give more precise definitions of persistent homology concepts. Second, we present applications to simpler synthetic data, for an easier visual interpretation of the results in the main text when interfaces are formed by many connected components. Finally, we discuss how to extract information on front roughness from numerical or experimental data, when interfaces define a single component.
As said before, a finite set of data points may be considered a sampling from the underlying topological space. Data structure can be investigated by creating connections between proximate data points, varying the scale over which these connections are made, and looking for features that persist across scales [59]. Homology distinguishes topological spaces (e.g., annulus, sphere, torus, or more complicated surface or manifold) by quantifying their connected components, topological circles, trapped volumes, and so forth. Persistent homology describes how the homology of a nested family of simplicial complexes changes with respect to a defining parameter. What is a simplicial complex S? To define it, we need three elements [66]:
A set of points X in a space of dimension D.
Sets of k-simplices [ν0, ν1, …, νk] with vertices νi ∈ S, i = 0, 1, …, k, for each k ≥ 1. A k-simplex is a k-dimensional polytope which is the convex hull of its k + 1 vertices:

A k-simplex has k + 1 faces, each constructed by deleting one of the vertices. The faces must satisfy the following property: If [ν0, ν1, …, νk] belongs to the simplicial complex S, then all its faces must also be in the simplicial complex S. This can be made more precise. The set of all k-simplices in S is a vector space Ck. The boundary of a k-simplex is the union of all its (k − 1)-subsimplices. For each k ≥ 1, the boundary map ∂k: Ck → Ck−1 is the linear transformation defined by

The motivation for studying the homology of simplicial complexes is the observation that two shapes can be distinguished by comparing their topological features. A disk is not a circle because the disk is solid, while the circle has a hole. Similarly, a circle is not a sphere, because the sphere encloses a two dimensional hole, whereas the circle encloses a one dimensional hole. To distinguish topological features, we need several definitions. Boundary operators connect the vector spaces Ck into a chain complex … → Ck+1 → Ck → Ck−1 → …→ C0 → 0. The kernel and image of boundary operators determine k-cycles Zk = Ker{∂k: Ck → Ck−1} and k-boundaries Bk = Im{∂k+1: Ck+1 → Ck, respectively. Since a boundary has no boundary [66], Bk is a subspace of Zk. Thus, Ck is the vector space of all k-chains in the simplicial complex Sr, Zk is the subspace of Ck consisting of k-chains that are also k-cycles, and Bk is the subspace of Zk consisting of k-cycles that are also k-boundaries. We say that two k-cycles are homologous (equivalent) if they differ by a k-boundary. This equivalence relation splits Zk in equivalence classes denoted by [z] if z ∈ Zk. The kth homology of Sr is the quotient set Hk = Zk/Bk comprising all equivalent k-cycles. The dimension of Hk, bk = dimZk − dimBk, is the kth Betti number. In terms of the topological characteristics, bk is the number of independent holes of dimension k. For instance, b0 is the number of connected components, b1 is the number of topological circles, b2 is the number of trapped volumes, and so on. The topology of a simplicial complex may be described by the sequence of Betti numbers, b = (b0, b1, …). For instance, a topological circle has b = (1, 1, 0, …), a topological torus has b = (1, 2, 1, 0, …), and a topological sphere has b = (1, 0, 1, 0, …,). Betti numbers are a topological invariant, meaning that topologically equivalent spaces have the same Betti number.
The Vietoris-Rips filtration (VRF) constructed in the main text provides an example. For each value of a scale proximity parameter r > 0 and a given set of points X, we form a simplicial complex VR(X, r) = Sr by finding all k-simplices such that all pairwise distances between their points are smaller than r. The simplicial complex Sr comprises finitely many simplices such that (i) every nonempty subset of a simplex in Sr is also in Sr, and (ii) two k-simplices in Sr are either disjoint or intersect in a lower dimensional simplex. Clearly, if r1 ≤ r2, then
b0 gives the number of components for a filtration value r, thereby providing the number of clusters in H0 for that r. We can use this information and the knowledge of the distance, to find out which points belong to which cluster. All the points of a cluster are connected in a simplex. Thus, we have a clustering strategy.
Similarly, b1 gives the number of 1-dimensional holes in H1 for a given value of the proximity parameter r. These holes may be inherent to the shape of a cluster or appear when basic clusters connect to more distant ones. See Figs 20 (one island), 21 (two islands) and 22 (seven islands). Thus, H1 contains information on both the structure of basic clusters and their relative arrangements. The accompanying barcodes may characterize interfaces or data sets.


Persistent homology for data on a circle.
(a) Noisy versus true circle data. (b) Barcodes for the Betti numbers b0 (H0) and b1 (H1) for the noisy data. (c) Persistence diagram for the noisy data with rmax = 4 and N = 100. (d)-(g) Vietoris-Rips simplicial complexes formed from the noisy data increasing the filtration parameter r. (h) Barcodes for the Betti numbers b0 and b1 for clean data on the circle. (i) Persistence diagram for clean data on the circle, with rmax = 3 and N = 100.
Next, we consider the homology of point clouds defining an interface between two populations, see Figs 20–22.


Persistent homology for the border of two colliding populations.
We should stress that these examples use schematic figures, with clear fronts, not results from experiments or simulations (which have larger noise, and less clear features). Thus, the persistent homology of schematic figures is clearer and easier to interpret. (a) Interface separating the two populations. (b) Barcodes for the Betti numbers b0 (H0) and b1 (H1), and (c) Persistence diagram with rmax = 0.4 and N = 100. (d)-(f) Vietoris-Rips simplicial complexes formed increasing the filtration parameter r. S17 Data have been used to draw this figure.
Fig 20(d)–20(f) visualize the filtration process: For a grid of values of the filtration distance parameter r, we depict balls centered at points with radius r and count the components (clusters) formed. Initially we have as many clusters as points. As the filtration parameter r increases, clusters merge and their number is reduced to two: the continuous front and the detached island in Fig 20(d), which are represented by the two top bars in H0 of Fig 20(b) and circles in Fig 20(c). Fig 20(e) exhibits only one cluster represented by the top bar in Fig 20(b), which persists for larger values of r and is represented by an arrow. Similarly, the largest bar in H1 of Fig 20(b) represents the dominant hole defined by the island border in Fig 20(d). This bar, and island, corresponds to the asterisk in Fig 20(c) placed furthest from the diagonal. The small bar corresponds to the asterisk in Fig 20(c) which is closest to the diagonal. As seen in Fig 20(e), the small bar forms and disappears as holes form when clusters merge during the filtration process.
Figs 21 and 22 can be similarly interpreted. Fig 21(d)–21(f) visualize the filtration process. Initially we have one cluster per point, represented as a bar in the top panel of Fig 21(b). As the filtration parameter r increases, clusters merge and the number of bars diminishes. Fig 21(d) exhibits three components, represented by the three top bars in H0 of Fig 21(b) and circles in Fig 21(c). There are two clusters in Fig 21(e) represented by the two top bars in Fig 21(b). Similarly, the two largest bars in H1 of Fig 21(b) represent the two dominant holes defined by the island borders in Fig 21(d). These bars, and islands, correspond to the two asterisks in Fig 21(c) furthest from the diagonal. The small bar corresponds to the circle in Fig 21(c) which is closest to the diagonal. As seen in Fig 21(e), it forms and disappears as holes form when clusters merge during the filtration process.
Fig 22 exhibits an increased number of islands and it is described in the same manner. The main eight components correspond to one interface and seven islands. They are represented by the top bars in H0 of Fig 22(b) and circles in Fig 22(c). Seven bars represent the seven dominant islands, associated to the seven largest bars in H1 of Fig 22(b) and asterisks in Fig 22(c). The remaining bars represent gaps in the simplicial structure formed during the filtration process. The largest one represent a late hole appearing due to the fact that some islands are far from the main interface front, and corresponds to the final asterisk distant from the diagonal.
When the interface is simply connected, homology studies of the boundary points, or all the population points in the plane, hardly give information on its roughness. Instead we may study the evolution of the H0 homology of slices x = c as c varies, see Figs 23 and 24. We choose as rmax the degree of roughness we want to capture, the ‘scale’ at which we wish to ‘resolve’. Consider the two dimensional region occupied by cells (magenta patch) in Fig 24(a). We build a square mesh of step smaller than rmax and consider the points that are inside the occupied region. We define a matrix on the mesh M(i, c), equal to one at points inside the patch, and zero outside. For each slice x = c, the points at which M(i, c) = 1 define a point cloud, we evaluate the zero homology of that cloud using as maximum filtration value rmax. The variation of the Betti number b0(rmax, c) with c measures how irregular the front is: the larger b0 is, the rougher the interface. As shown in Fig 24 corresponding to the spread assay of Fig 4(a), the rougher lower front has a larger maximal Betti number b0 than the upper front; cf Fig 24(c) versus Fig 24(b). However, the bigger and more persistent fingers of the upper front cause b0 to decay more slowly than the corresponding Betti number of the lower front. The fingers of the latter are smaller in size, cf Fig 24(a). Fig 23 visualizes the idea on fragments of this configuration. This complementary slice by slice study gives qualitative information on the shape. This strategy would allow to study the time evolution of 2D interfaces comparing the information obtained for each time, and comparing the effect of different controlling parameters on each population, as done in the main text for mixing interfaces.


For the expanding strip in (a), we get the Betti numbers b0 for the upper front (b), and for the lower one (c).
The lower front is rougher (larger b0) but its fingers are shorter (b0 decays faster to one and zero), whereas the upper front has a dominant persistent finger. We have set rmax = 1 mm in the scale of the image Fig 4. S22 and S23 Data have been used to draw this figure.
Fig 25 quantifies differences between configurations by means of distances between computational images [67]. To compare images and shapes, we first have to define measures ρj(x), x ∈ Ω, over grids of images j (j = 0, 1, 2, …). Given two images, j = 0 and 1, with measures ρ0(x) and ρ1(x), we define their Wasserstein-1 distance as a particular type of optimal transport distance [67],



(a)-(d) Consecutive snapshots of the evolution of the spreading configuration in Fig 4. (e)-(g) Dendrograms for hierarchical clustering constructed using Wasserstein distances W1,∞ between the four snapshots. We distinguish between overall snapshots (numbered 1 to 4) in Panel (e), and half snapshots representing right (numbered 1 to 4) and left (numbered 5 to 8) moving interfaces in Panels (f) and (g). (e) The smoother overall snapshots 1 and 2 (corresponding to Panels (a) and (b)) are clustered together and, likewise, the rougher overall snapshots 3 and 4 of Panels (c) and (d). (f) The Wasserstein distance clusters together successive pairs of left and right fronts. (g) Dendrograms using the Betti number profiles b0(rmax) (analogous to those in Fig 24) calculated for the left and right interfaces of each snapshot. Note that the interfaces of the first two snapshots are clustered together because they have similar roughness levels.
Parts of this work were carried out at the Workshop on Modeling Biological Phenomena from Nano to Macro Scales, held in 2018 at the Fields Institute in Toronto, Canada. We thank Prof. I. Hambleton, director of the Fields Institute, for the invitation and support during the Workshop. We thank Dr. Rastko Sknepnek for his help in implementing SAMoS software and for related and useful comments during the Workshop. The authors thank Prof. Russel Caflisch for hospitality during their stay at the Courant Institute of Mathematical Sciences, New York University, where the bulk of the work was carried out.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67