The authors have declared that no competing interests exist.
A game of rock-paper-scissors is an interesting example of an interaction where none of the pure strategies strictly dominates all others, leading to a cyclic pattern. In this work, we consider an unstable version of rock-paper-scissors dynamics and allow individuals to make behavioural mistakes during the strategy execution. We show that such an assumption can break a cyclic relationship leading to a stable equilibrium emerging with only one strategy surviving. We consider two cases: completely random mistakes when individuals have no bias towards any strategy and a general form of mistakes. Then, we determine conditions for a strategy to dominate all other strategies. However, given that individuals who adopt a dominating strategy are still prone to behavioural mistakes in the observed behaviour, we may still observe extinct strategies. That is, behavioural mistakes in strategy execution stabilise evolutionary dynamics leading to an evolutionary stable and, potentially, mixed co-existence equilibrium.
A game of rock-paper-scissors is more than just a children’s game. This type of interaction is often used to describe competition among animals or humans. A special feature of such an interaction is that none of the pure strategies dominates, resulting in a cyclic pattern. However, in wild communities such interactions are rarely observed by biologists. Our results suggest that this lack of cyclicity may stem from the imperfection of interacting individuals. In other words, we show analytically that heterogeneity in behavioural patterns may break a cyclic relationship and lead to a stable equilibrium in pure or mixed strategies.
The question frequently arising in ecology is: Under which conditions does a particular type of species survive? This question is also relevant in the context of understanding a wide range of environmental, social, genetic and other conditions potentially influencing evolutionary trajectories. Evolutionary game theory, a branch of game theory and ecological sciences, aims to answer that question [1–5]. One of the most well-known games applied to biology is the rock-paper-scissors game (RPS). Here, rock beats scissors, scissors beat paper and paper beats rock. Whether we are talking about population dynamics or economics and human behaviour, this game is known to illustrate salient features while being easy to understand (for a thorough review of the models used to study RPS games see [6]). In biology, this game was applied to explain cyclic dynamics in some species, such as mating strategies of side-blotched lizards [7, 8] and phenotypic competition in bacterial strains of E. coli [9, 10]. Furthermore, in engineered microbial populations, the introduction of such competition seemed to stabilise the community [11] and even promote cooperation [12]. Moreover, it was suggested that the introduction of new strategies into classic social dilemmas, such as loners [13–15] or risk-averse hedgers [16], can lead to cyclic competition. Nevertheless, cyclicity is rarely observed in wild communities of microbes [17], even though it was shown experimentally that behavioural heterogeneity in microbes can stabilise communities [18]. Recently, it was suggested that it might be challenging for such non-transitive competition to evolve in the first place [19]. However, even if cyclic competition emerges, its stability can be very sensitive to the exact balance in the community, potentially leading to the dominance of only one strategy [20].
In this paper, we utilise a game-theoretic concept of incompetence [21, 22] which allows individuals to make mistakes during the execution of their strategy. This results in a potentially unintended strategy being actually played during the interaction with another individual. We show that such an assumption can induce evolutionary stability in the initially unstable rock-paper-scissors dynamics and predict possible outcomes of the competition under the assumption of execution errors.
Behavioural stochasticity is an expanding field rich in different approaches to the problem. An approximation of behavioural errors of players in games was first considered as “trembling hands” [23] with the presence of mistakes during the strategies’ execution with some small probability. Later, in evolutionary games it was modelled via mutations [24, 25], language learning [26–28] or other experimental learning processes [29–32], adaptation dynamics [33], phenotypic plasticity [34], edge diversity in games on graphs [35, 36], and noise in continuous and discrete-time replicator dynamics [37–40]. Furthermore, mutations of players were introduced to the replicator dynamics via the replicator-mutator dynamics [28, 41], where each type has its own mutation rate but these mutations do not occur simultaneously. However, behavioural stochasticity at the moment of interaction was not considered in these studies.
An attempt to generalise players’ behavioural mistakes via the notion of incompetence was made in classic game theory [42]. Later, the concept of evolutionary games under incompetence was suggested to model such social problems of species in biological settings [22]. The notion of incompetence proposes a general framework for modelling behavioural mistakes with the underlying assumption that only one of the n non-cooperative strategies can be executed. That is, with a certain probability, individuals might execute a strategy different from the one they chose. In these settings, both players are prone to making mistakes, which makes the payoffs of all involved individuals stochastic and alters the overall fitness of the population.
Here, we consider the following scenario. Imagine that each randomly chosen individual finds itself in a pairwise interaction with another randomly chosen individual. Both of them choose a strategy to play. However, the chance that they will play their chosen strategies depends on two factors: the overall level of behavioural plasticity in the population and the distribution of behavioural mistakes. If the population is completely homogeneous, then all interactions among the individuals are deterministic (λ = 1, see Fig 1A). However, if the population’s behaviour is plastic (λ < 1), then individuals may make mistakes when executing their chosen strategies. The probabilities of playing one or another strategy are determined both by the degree of plasticity, λ, and by the maximal probabilities of mistakes captured in matrix S (attained at λ = 0). The latter results in behavioural plasticity that perturbs the game outcome (see Fig 1B). In some games, execution errors mean that organisms are able to execute strategies required by the environmental conditions even when they make a wrong choice. That is, species execute, by mistake, strategies that are required for their survival in the environment. We do not assume that they carry out this execution consciously. However, this random characteristic may be crucial when we consider changing environments where adaptation becomes particularly important and depends strongly on the interplay between behavioural patterns and fitness.


A schematic representation of behavioural mistakes in a rock-paper-scissors game.
(A) Rock-paper-scissors dynamics with pure strategies is described by a fitness matrix such that the cyclic relationship between the three strategies is promoted. (B) The effect of execution errors on the example of one interaction: here individual 1 has chosen strategy paper and individual 2 has chosen strategy rock. Without mistakes, individual 1 would win this instance of the contest. However, a mistake in the execution leads to mixed strategies being played for both individuals resulting in different possible outcomes of the interaction. Hence, the outcome of the game is no longer deterministic but stochastic and depends on the probability distribution of mistakes.
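The single interaction described in panel (B) can be sketched numerically. The snippet below is only an illustration: it assumes that each individual's executed strategy is drawn from a mixture of the chosen pure strategy (weight λ) and a limiting mistake distribution (weight 1 − λ). The payoff values `a` and `b`, the matrix `S` and all function names are hypothetical and do not come from the original model specification.

```python
import numpy as np

STRATEGIES = ("rock", "paper", "scissors")

# Hypothetical payoffs (assumed values): a win earns b, a loss costs a.
a, b = 2.0, 1.0
R = np.array([[0.0, -a, b],
              [b, 0.0, -a],
              [-a, b, 0.0]])

# Hypothetical limiting mistake distributions.
# Row i: probabilities of the strategy actually executed when i is chosen.
S = np.array([[0.6, 0.2, 0.2],
              [0.3, 0.5, 0.2],
              [0.1, 0.3, 0.6]])


def executed_distribution(chosen, lam):
    """Mix the chosen pure strategy (weight lam) with its mistake row (weight 1 - lam)."""
    pure = np.eye(len(STRATEGIES))[chosen]
    return lam * pure + (1.0 - lam) * S[chosen]


def interaction(chosen_1, chosen_2, lam):
    """Return the joint distribution over executed pairs and the expected payoff to individual 1."""
    q1 = executed_distribution(chosen_1, lam)
    q2 = executed_distribution(chosen_2, lam)
    joint = np.outer(q1, q2)        # joint[k, l] = P(1 executes k, 2 executes l)
    expected_payoff = q1 @ R @ q2   # average of R[k, l] over the joint distribution
    return joint, expected_payoff


paper, rock = 1, 0
joint, payoff = interaction(paper, rock, lam=0.5)
```

With `lam=1.0` the joint distribution collapses onto the pair (paper, rock) and the deterministic payoff is recovered; for smaller values the outcome is spread over all nine executed pairs.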
In low-dimensional games this interplay can be captured and analysed in detail. Unfortunately, it becomes challenging as the dimensionality of a game grows, because even small perturbations may then affect the evolutionary outcome. However, under a natural assumption that behavioural mistakes are completely random, we can describe game behaviour for general n dimensions. We show that in such settings, strategies (or behavioural types) leverage their fitness advantage. This in turn might lead to only one strategy dominating. Further, we assume that mistakes do not have to be completely random. We consider a symmetric case of an unstable RPS game where no choice of strategies yields a fitness advantage. Such games lead to a heteroclinic orbit where none of the strategies dominates. We choose such settings precisely because it is challenging to induce stability in these games. By contrast, an initially stable version of the RPS game can promote biodiversity even in finite-population settings [43], and even very small perturbations can stabilise a classic version of the RPS game [44]. We show that behavioural mistakes bring asymmetry to the game, breaking the cyclic relationship and potentially leading to dominance of one of the strategies. That is, the structure of execution errors may technically imply the existence of an evolutionary stable interior point.
In this paper we focus on the RPS dynamics. Hence, we shall mostly work with the general form of R given by

In classic games, there is an underlying assumption that players are able to execute the chosen actions perfectly. We assume that actions selected by players may not coincide with the executed actions. Such behavioural stochasticity results in executing unintended strategies and is captured in matrix Q(λ) from [21] defined as



It is sufficient to consider the following simpler canonical form of the fitness matrix



In the evolutionary sense, behavioural mistakes lead to perturbations in the fitness that populations obtain over time. This might be due to populations’ migration to new and unexplored environments or due to changing environments. Then, interacting individuals obtain a finite number, n, of available behavioural strategies. In the absence of mistakes, both interacting individuals make their strategic choices, which lead to some payoff according to the fitness matrix R. However, mistakes from matrix Q(λ) perturb the outcome of the interaction twice, as both interacting individuals are prone to execution errors. Hence, the population dynamics now depend on the degree of plasticity, that is, the competency of individuals, according to replicator equations [46] defined as



The model proposed here was first referred to as a “game with incompetence of players” [21, 42]. That is, the matrix Q consisted of the probabilities of players’ mistakes, when they intended to execute strategy i but played strategy j instead. Such a model was inspired by an analogy with tennis players, where less experienced players are prone to hitting a different shot from the one they initially intended. Here, players have a set of n possible shots to hit. Given the complexity level of the shot as well as players’ talents, those probabilities of mistakes will not be uniform. Moreover, players learn while training and, hence, reduce their incompetence. This was captured in the parameter λ, with the level of mistakes decreasing as λ → 1.
This concept was next considered in the evolutionary settings as a modelling approach to adaptation to a new environment [22]. First, it was assumed that a population is immersed into a new environment, which can happen either due to migration of animals or changing environmental conditions. It is assumed that there are n behavioural types or strategies available to individuals. Then, new conditions might increase stress levels and force individuals’ behaviour to deviate from the one in the old environment. Such deviations are then captured in the matrix S. As time passes by, animals learn and adapt to their new environmental conditions, which is then reflected in the parameter λ. In such settings, one can also assume some form of learning dynamics, λ(t) [47].
Another possible way to think about this model is to apply it at a genetic level [48]. That is, we would construct a game between n pure types, for instance, genes in microbes. The time-dependent process of λ(t) evolving from 1 to 0 can then be considered more as environmental stimuli dynamics and have various functional forms reflecting environmental fluctuations. Matrices S and Q would represent levels of phenotypic plasticity, where each phenotype would allow some mixing between n genes that depends on the level of environmental stimuli. Then, natural selection would drive the evolution, which might result in extinction of one type or another. This also depends on the assumption concerning the exact form of environmental fluctuations.
Here, we focus on the more general interpretation of λ as the strength of behavioural plasticity. For this general approach we do not impose any time-dependence on λ. Instead, we study all possible equilibria for each of the values of λ in the interval [0, 1]. Every pure strategy i has an assigned probability distribution captured in the matrix Q(λ). When λ = 0, the population utilises a limiting distribution of mistakes S and has maximal plasticity. When λ = 1, the population’s behaviour is deterministic and no plasticity is observed. This can be interpreted as an approach to modelling behavioural heterogeneity or noise in interactions. Specifically, in the settings of phenotypic plasticity, it is natural to assume a complete randomisation in the strategy execution corresponding to S being comprised of uniformly distributed probability vectors. However, in terms of adaptations to new environmental conditions, probability of mistakes may differ depending on the strategy being chosen. Thus, we shall assume a general form of matrix S. Next, we shall first demonstrate this model on some examples.
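The two limits described above (λ = 1 deterministic, λ = 0 governed by S) can be made concrete with a short sketch. It assumes the linear interpolation Q(λ) = (1 − λ)S + λI and the perturbed fitness matrix R(λ) = Q(λ) R Q(λ)ᵀ, a form consistent with those limits and with both individuals being prone to mistakes. The specific matrices `R` and `S` and the function names are hypothetical.

```python
import numpy as np


def incompetence_matrix(S, lam):
    """Assumed form Q(lam) = (1 - lam) * S + lam * I; rows remain probability vectors."""
    S = np.asarray(S, dtype=float)
    return (1.0 - lam) * S + lam * np.eye(S.shape[0])


def perturbed_fitness(R, S, lam):
    """Assumed form R(lam) = Q R Q^T: mistakes act on both interacting individuals."""
    Q = incompetence_matrix(S, lam)
    return Q @ np.asarray(R, dtype=float) @ Q.T


# Hypothetical symmetric RPS payoffs with a > b.
a, b = 2.0, 1.0
R = np.array([[0.0, -a, b],
              [b, 0.0, -a],
              [-a, b, 0.0]])

# Hypothetical limiting distribution of mistakes (each row sums to 1).
S = np.array([[0.6, 0.2, 0.2],
              [0.3, 0.5, 0.2],
              [0.1, 0.3, 0.6]])

for lam in (0.0, 0.5, 1.0):
    Q = incompetence_matrix(S, lam)
    print(f"lambda={lam}: row sums of Q = {Q.sum(axis=1)}")
```

Treating λ as a plain argument rather than a function of time mirrors the choice made here of studying equilibria separately for each value of λ in [0, 1].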
First, consider phenotypic behavioural plasticity as an interpretation of the model. In such settings, it is natural to assume that “execution errors” are symmetric and equally likely. That is, let us assume that if λ = 0, then individuals are completely random in their strategic choice. Then, all components of matrix S are equal and are given by

Game flows for different values of λ are depicted in Fig 2. For λ = 1, the game possesses an unstable mixed equilibrium


Game flow for the unstable RPS game with uniform mixed strategies for different values of λ.
Here, a stable fixed point is denoted by a red circle and an unstable fixed point is denoted by a white circle. The colour in the interior of the simplex indicates the rate of change: from slow (blue) to fast (red). In this example, completely random execution errors lead to the dominance of the rock strategy. We use the Wolfram Mathematica project [49] to produce these phase planes.
Since the stable equilibrium is a strict Nash equilibrium, it is an evolutionary stable strategy (ESS) [4]. However, for any given λ and strategy choice,

The assumption that individuals make mistakes completely at random is somewhat limiting. In some cases, more freedom in the definition of individuals’ plasticity is required. For instance, if we assume that λ is interpreted as an adaptation process to new environmental conditions, then some behavioural choices may have different distributions of mistakes. For instance, let us consider an example where the fitness matrix R is given as follows

Note that the determinant of R is negative, which implies that this game possesses an interior fixed point (
The exact probability distributions captured in S would depend on the particular situation and species under consideration. Let us demonstrate the influence of execution errors on the following example of matrix S. Assume that at the highest level of execution errors (λ = 0) individuals play each of their chosen strategies with probability not less than

Game flows for different values of λ are depicted in Fig 3. As λ varies from 1 to 0, the game dynamics go through several transitions (see panel A for the overview). The first transition happens at


Game transitions under execution errors from example 2.
(A) Frequencies of each strategy in the interior equilibrium as functions of λ. Here, x1 represents the rock frequency, x2 the paper frequency and x3 the scissors frequency. The interior equilibrium exists for most values of λ but (
These examples demonstrate that execution errors might break the heteroclinic orbit by introducing a stable equilibrium in the game. That is, stochasticity induced by mistakes might stabilise dynamics that were unstable before. In addition, in the case of players executing only mixed strategies, the game might obtain a stable interior point altering its original dynamics (see Figs 2C and 3J). In the following analysis we shall examine possible transitions in unstable RPS games. We aim to define conditions under which we can secure existence of a stable equilibrium.
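Game flows of this kind can be explored with a small numerical integration of the replicator dynamics. The sketch below is not the procedure used to produce the figures (those were generated in Mathematica); it is a minimal fixed-step Runge-Kutta integrator, assuming R(λ) = Q(λ) R Q(λ)ᵀ with Q(λ) = (1 − λ)S + λI. The matrices, step size and function names are hypothetical choices.

```python
import numpy as np


def replicator_rhs(x, A):
    """Replicator vector field: x_i * ((A x)_i - x . A x)."""
    fitness = A @ x
    return x * (fitness - x @ fitness)


def simulate(A, x0, dt=0.01, steps=2000):
    """Integrate the replicator dynamics with classical RK4, renormalising onto the simplex."""
    x = np.asarray(x0, dtype=float)
    trajectory = np.empty((steps + 1, x.size))
    trajectory[0] = x
    for t in range(steps):
        k1 = replicator_rhs(x, A)
        k2 = replicator_rhs(x + 0.5 * dt * k1, A)
        k3 = replicator_rhs(x + 0.5 * dt * k2, A)
        k4 = replicator_rhs(x + dt * k3, A)
        x = x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        x = np.clip(x, 0.0, None)   # guard against tiny negative round-off
        x = x / x.sum()
        trajectory[t + 1] = x
    return trajectory


# Hypothetical unstable RPS game (a > b) and mistake matrix.
a, b = 2.0, 1.0
R = np.array([[0.0, -a, b],
              [b, 0.0, -a],
              [-a, b, 0.0]])
S = np.array([[0.6, 0.2, 0.2],
              [0.3, 0.5, 0.2],
              [0.1, 0.3, 0.6]])

lam = 0.5
Q = (1.0 - lam) * S + lam * np.eye(3)
trajectory = simulate(Q @ R @ Q.T, x0=[0.5, 0.3, 0.2])
```

Running `simulate(R, ...)` with the unperturbed matrix gives the outward-spiralling reference case, against which trajectories for λ < 1 can be compared.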
Let us first consider the case when behavioural mistakes are completely random. Such settings can be interpreted as either a form of phenotypic plasticity or just noise in the interactions. Then, the matrix S is such that any strategy obtains the same probability of mistakes, that is,

Result 1. Let
In other words, if in a row-sum-constant game everyone is making mistakes with the same probabilities, then population dynamics are invariant under these mistakes. However, diversity in fitness advantages between the strategies might help one of the groups to benefit from behavioural heterogeneity of the population by leveraging its fitness advantage. We can calculate the interior fixed point in a general row-sum case as follows:
Result 2. Let

Note that, the point
Note that Result 2 holds for any number of strategies n and any game. For a general form of the result see S1 File.
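The invariance statement for row-sum-constant games under uniform mistakes can be checked numerically. The sketch below assumes Q(λ) = (1 − λ)S + λI with uniform S and R(λ) = Q R Qᵀ. Under these assumptions, R(λ) equals λ²R plus terms that are constant within each column, and such terms do not affect the replicator vector field, so the field is only rescaled by λ². The payoff values, test point and function names are hypothetical.

```python
import numpy as np


def replicator_rhs(x, A):
    """Replicator vector field: x_i * ((A x)_i - x . A x)."""
    fitness = A @ x
    return x * (fitness - x @ fitness)


# Hypothetical row-sum-constant RPS game: every row sums to b - a.
a, b = 2.0, 1.0
R = np.array([[0.0, -a, b],
              [b, 0.0, -a],
              [-a, b, 0.0]])

n = R.shape[0]
S_uniform = np.full((n, n), 1.0 / n)   # completely random mistakes

x = np.array([0.5, 0.3, 0.2])          # arbitrary interior point of the simplex
field_is_rescaled = []
for lam in (0.2, 0.5, 0.9):
    Q = (1.0 - lam) * S_uniform + lam * np.eye(n)
    R_lam = Q @ R @ Q.T
    # Compare the perturbed field with lam**2 times the unperturbed field.
    same = np.allclose(replicator_rhs(x, R_lam), lam**2 * replicator_rhs(x, R))
    field_is_rescaled.append(same)
```

A rescaling of the vector field by a positive constant changes only the speed along orbits, not the orbits themselves, which is the sense in which the dynamics are unchanged in this sketch.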
Result 2 implies that the interior equilibrium of the original game is shifted by behavioural heterogeneity and drives less fit strategies to extinction. However, the observed strategy will remain the same for any dominating pure strategy due to the symmetry in mistakes distributions. Hence, uniform S introduces evolutionary stability in the games with heteroclinic cycles. Moreover, for the extreme case of behavioural plasticity (λ ≈ 0), this equilibrium will be close to a completely mixed equilibrium (
Next we address the question: What if behavioural mistakes of individuals are not necessarily uniformly distributed? For instance, if we treat the parameter λ as some form of adaptation or learning, then the probabilities of mistakes might be different for different strategies. In such a case, we consider the general form of matrix S as in Eq (3). In order to study the effect of the limiting distribution of mistakes (as λ → 0), we shall focus on the form of a RPS game, where no strategy gains a fitness advantage. That is, we assume a row-sum-constant fitness matrix with an unstable equilibrium in the centre of the simplex by letting a1 = a2 = a3 = a and b1 = b2 = b3 = b in the matrix (1). The condition a > b ensures instability of the interior fixed point (
In three dimensions (see (6)), transitions in a game are caused by either the elements
Note that for a homogeneous population (λ = 1) the interior fixed point,

Hence, depending on the probabilities of mistakes, we can describe possible transitions in the game dynamics induced by the changes in the strength of behavioural plasticity, λ. For instance, the game might possess an unstable interior equilibrium for any λ. However, the stability of the vertices will be disturbed as the entries of

Various game transitions of the unstable RPS game as λ varies between [0, 1].
The components of the interior fixed point are plotted as functions of λ. Further, the coloured bar at the top of the plot indicates stability intervals of λ for different vertices (a stable vertex is indicated on top of the bar). (A) The interior fixed point exists for all λ but vertices interchange their stability. In the limit of mistakes (λ → 0), two vertices are stable. (B) The interior fixed point exists for a sub-interval and vertices interchange their stability. As λ → 0, two vertices are stable. (C) The interior fixed point exists for two sub-intervals of (0, 1). In the limit of mistakes (λ → 0), only vertex 1 is stable. (D) The interior fixed point exists for almost all values of λ. In the limit of mistakes (λ → 0), all three vertices are stable. Generally, the exact equilibria transitions and the existence of an interior equilibrium are determined by the limiting distribution of mistakes, S. We found that for almost all matrices S there is a high chance that at least one of the pure strategies will become dominant.
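Curves of this type can be traced in outline by solving for the interior fixed point on a grid of λ values. The sketch below uses the standard characterisation of an interior rest point of the replicator dynamics (all strategies earn the same fitness, and frequencies sum to one), and assumes R(λ) = Q(λ) R Q(λ)ᵀ with Q(λ) = (1 − λ)S + λI. The matrices and function names are hypothetical and do not correspond to any particular panel.

```python
import numpy as np


def interior_fixed_point(A):
    """Solve A x = c * 1, sum(x) = 1 for (x, c); return x if strictly positive, else None."""
    n = A.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = A
    M[:n, n] = -1.0     # the unknown common fitness c
    M[n, :n] = 1.0      # normalisation of frequencies
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    try:
        solution = np.linalg.solve(M, rhs)
    except np.linalg.LinAlgError:
        return None
    x = solution[:n]
    return x if np.all(x > 0.0) else None


# Hypothetical symmetric unstable RPS game and mistake matrix.
a, b = 2.0, 1.0
R = np.array([[0.0, -a, b],
              [b, 0.0, -a],
              [-a, b, 0.0]])
S = np.array([[0.6, 0.2, 0.2],
              [0.3, 0.5, 0.2],
              [0.1, 0.3, 0.6]])

interior_by_lambda = {}
for lam in np.linspace(0.0, 1.0, 11):
    Q = (1.0 - lam) * S + lam * np.eye(3)
    interior_by_lambda[round(float(lam), 1)] = interior_fixed_point(Q @ R @ Q.T)
```

Entries equal to `None` mark values of λ for which no interior fixed point exists in this hypothetical example.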
Generally, components of the interior equilibrium are rational functions whose numerators and denominators are fourth-order polynomials in λ. Consequently, there is a variety of possible behaviours. However, we can determine strict conditions for a vertex to be stable, based on its behaviour for a mixed strategy profile captured in the corresponding rows of the matrix S. Specifically, we can determine those conditions in the following result (see S1 File for more details).
Result 3. Let λc ∈ (0, 1) be such that

This result follows from the fact that, as the population becomes more plastic (λ → 0), R(λ) → SRSᵀ and the canonical form of the fitness matrix is reduced to

Note that for λ = 0, conditions (10) imply stability of vertex j for any number of strategies n and any game.
Note that since the stable equilibrium is pure, it is a strict Nash equilibrium. Hence, the original replicator dynamics obtains an evolutionary stable point under the assumptions of our model. In fact, by Eq (7), according to the strategy execution of individuals, we obtain a stable point in the interior that corresponds to si, where i is a stable vertex. A schematic representation of such a transformation can be found in Fig 5.
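The link between strict Nash equilibria and stable vertices can be illustrated with a short check. The sketch below applies the standard criterion that vertex j is a strict Nash equilibrium of a matrix game A when A[j, j] > A[i, j] for all i ≠ j. This is offered as a generic test, not as a restatement of the specific conditions in Result 3, and the matrices and function names are hypothetical.

```python
import numpy as np


def strict_nash_vertices(A):
    """Indices j with A[j, j] > A[i, j] for every i != j (strict Nash pure strategies)."""
    n = A.shape[0]
    return [j for j in range(n)
            if all(A[j, j] > A[i, j] for i in range(n) if i != j)]


# Hypothetical symmetric unstable RPS game: no pure strategy is a strict Nash equilibrium.
a, b = 2.0, 1.0
R = np.array([[0.0, -a, b],
              [b, 0.0, -a],
              [-a, b, 0.0]])

# Hypothetical limiting distribution of mistakes.
S = np.array([[0.6, 0.2, 0.2],
              [0.3, 0.5, 0.2],
              [0.1, 0.3, 0.6]])

stable_by_lambda = {}
for lam in np.linspace(0.0, 1.0, 11):
    Q = (1.0 - lam) * S + lam * np.eye(3)   # assumed form of Q(lambda)
    stable_by_lambda[round(float(lam), 1)] = strict_nash_vertices(Q @ R @ Q.T)
```

Scanning λ in this way gives a rough picture of the stability intervals of the vertices for a given S, under the stated assumptions.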


A schematic representation of a possible influence of execution errors on the RPS dynamics.
The original game possesses an unstable equilibrium
As demonstrated in Examples 1 and 2, while λ decreases from 1 to 0, the dynamics can experience several bifurcations where an equilibrium can emerge on one of the edges. An edge-equilibrium is characterised by exactly one of the components of the equilibrium being 0, that is,
Overall, when strategies are initially equivalent in their fitness advantages in the non-plastic game, the asymmetry in matrix Q(λ) introduces asymmetry in the game
Much research has been devoted to describing behavioural mistakes of organisms and how those mistakes affect the outcome of the evolutionary competition. In addition, the RPS game itself received a lot of attention due to its ability to describe cyclic competitive interactions. However, such cycles are rarely observed in nature. We propose that behavioural heterogeneity or noise can induce stabilisation of communities driving them to evolutionary stable outcomes. Our model introduces behavioural mistakes in the context of a cyclic RPS game. Here, behavioural mistakes imply that individuals might execute a strategy different from the intended one. We encode all probabilities of mistakes in a matrix Q(λ) and allow individuals to play either a mixed or pure strategy. The degree of plasticity is captured by the parameter λ varying from 1 (no plasticity) to 0 (maximum plasticity).
We then explore the influence of the limiting distribution of mistakes captured in matrix S on the evolution of social behaviour of species. This matrix captures mistake probabilities for the limiting case of λ = 0. Depending on the matrix S, different pure strategies might benefit from those mistakes. We analyse the interplay of learning and fitness advantages and define conditions under which strategies can prevail. For example, in the case with completely random mistakes, the most beneficial strategy is the strategy with the highest relative fitness advantage (see Result 2). However, it does not change the outcome of the evolution since in this case it will be a completely mixed interior point.
One can also interpret our model as adaptation to new environmental conditions. Then, it is natural to expect that specific environments require different strategies to be adopted. For instance, in the case with an RPS game with the interior equilibrium (
Interestingly, at λ = 0, strategies are leveraging the advantage they can gain from mistakes from maximum plasticity. For instance, in the case with a general form of limiting probability distribution, stability of a pure strategy is determined by its plastic response to itself (see Result 3). For a strategy to become stable, it is necessary to be uninvadable by the other two plastic strategies.
Overall, our results suggest that species might benefit from behavioural heterogeneity or plasticity, captured here through execution noise. The ability of our model to induce a stable equilibrium in the unstable game might help in explaining why such unstable RPS dynamics are not observed in wild communities. That is, plasticity in behaviour might help to stabilise the evolutionary outcome and sometimes enable one of the strategies to become dominant.
The authors would like to thank Christian Hilbe and Martin Nowak for their inspiring and very helpful feedback on the manuscript.