PEtab—Interoperable specification of parameter estimation problems in systems biology

PLoS Computational Biology


Leonard Schmiester, Yannik Schälte, Frank T. Bergmann, Tacio Camba, Erika Dudkin, Janine Egert, Fabian Fröhlich, Lara Fuhrmann, Adrian L. Hauber, Svenja Kemmer, Polina Lakrisenko, Carolin Loos, Simon Merkt, Wolfgang Müller, Dilan Pathirana,


The authors have declared that no competing interests exist.

‡JH and DW also contributed equally to this work.

https://doi.org/10.1371/journal.pcbi.1008646, Volume: 17, Issue: 1, Pages: 1-10

Article Type: Research Article

Publisher: Public Library of Science


Table of Contents

Introduction
Design and implementation
Results
Availability and future directions
Supporting information

Abstract

Reproducibility and reusability of the results of data-based modeling studies are essential. Yet, so far, there has been no broadly supported format for the specification of parameter estimation problems in systems biology. Here, we introduce PEtab, a format which facilitates the specification of parameter estimation problems using Systems Biology Markup Language (SBML) models and a set of tab-separated value files describing the observation model and experimental data as well as parameters to be estimated. We have already implemented PEtab support in eight well-established model simulation and parameter estimation toolboxes with hundreds of users in total. We provide a Python library for validation and modification of a PEtab problem and currently 20 example parameter estimation problems based on recent studies.

Parameter estimation is a common and crucial task in modeling, as many models depend on unknown parameters that need to be inferred from data. Various tools exist for tasks such as model development, model simulation, optimization, or uncertainty analysis, each with different capabilities and strengths. To combine tools easily in an interoperable manner, and to make results accessible and reusable for other researchers, it is valuable to define parameter estimation problems in a standardized form. Here, we introduce PEtab, a parameter estimation problem definition format which integrates with established systems biology standards for model and data specification. As the novel format is already supported by eight software tools with hundreds of users in total, we expect it to be of great use and impact in the community, both for modeling and algorithm development.

Schmiester, Schälte, Bergmann, Camba, Dudkin, Egert, Fröhlich, Fuhrmann, Hauber, Kemmer, Lakrisenko, Loos, Merkt, Müller, Pathirana, Raimúndez, Refisch, Rosenblatt, Stapor, Städter, Wang, Wieland, Banga, Timmer, Villaverde, Sahle, Kreutz, Hasenauer, Weindl, and Schneidman-Duhovny: PEtab—Interoperable specification of parameter estimation problems in systems biology

Introduction

Dynamical modeling is central to systems biology, providing insights into the underlying mechanisms of complex phenomena [1]. It enables the integration of heterogeneous data, the testing and generation of hypotheses, and experimental design. However, to achieve this, the unknown model parameters commonly need to be inferred from experimental observations.

Numerous software tools for simulating models and inferring parameters exist [2–10], implementing a variety of methods and algorithms. Many of these tools support community standards for model specification to facilitate reproducibility, interoperability and reusability. In particular, the Systems Biology Markup Language (SBML) [11], CellML [12] and the BioNetGen Language (BNGL) [13] are widely used.

The Simulation Experiment Description Markup Language (SED-ML) builds on top of such model definitions and allows for an XML-based, machine-readable description of simulation experiments [14]. More complex simulation experiments, such as parameter scans, can also be encoded, and the phraSED-ML format provides a human-readable adaptation [15]. Similarly, the XML-based Systems Biology Results Markup Language (SBRML) was designed to associate models with experimental data and to share simulation experiment results in a machine-readable way [16]. Like SED-ML, SBRML can also be used for parameter scans. Complementing these formats, SBtab is a set of table-based conventions for the definition of experimental data and models, designed to be human-readable and -writable [17].

However, parameter estimation has so far not been within the scope of any of the available formats, and information that is important for it, such as the definition of a noise model, is missing. Parameter estimation toolboxes usually use their own specific input formats, which makes it difficult for users to switch between tools to benefit from their complementary functionalities and hinders reusability and reproducibility.

Based on our experience with parameter estimation and tool development for systems biology, we developed PEtab, a tabular format for specifying parameter estimation problems. This includes the specification of biological models, observation and noise models, experimental data and their mapping to the observation model, as well as parameters in an unambiguous way.

Design and implementation

Scope

The scope of PEtab is the full specification of parameter estimation problems in typical systems biology applications. In our experience, a typical setup of data-based modeling starts either with (i) the model of a biological system that is to be calibrated, or with (ii) experimental data that are to be integrated and analyzed using a computational model. Measurements are linked to the biological model by an observation and noise model. Often, measurements are taken after perturbations have been applied, which are modeled as modifications of a generic model (Fig 1A). Therefore, one goal was to specify such a setup with as little redundancy as possible. Furthermore, we wanted to establish an intuitive, modular, machine- and human-readable and -writable format that makes use of existing standards.

Fig 1

Specifying parameter estimation problems in PEtab.

(A) Example of a typical setup for data-based modeling. Usually, a model of a biological system is developed and calibrated based on measurements from perturbation experiments, which are linked to the biological model by an observation model. Different instances of a generic model are used to account for different perturbations or measurement setups. (B) Simplified illustration of how different entities from (A) map to different PEtab files (not all table columns are shown).

PEtab problem specification format

PEtab defines parameter estimation problems using a set of files that are outlined in Fig 2. A detailed specification of PEtab version 1 is provided in S1 File and at https://github.com/PEtab-dev/PEtab. Additionally, we created a tutorial illustrating how to set up a PEtab problem, covering the most common features (S2 File). Further example problems can be found at https://github.com/Benchmarking-Initiative/Benchmark-Models-PEtab. The different files specify the biological model, the observation model, experimental conditions, measurements, parameters and visualizations (Fig 1B). These files are described in more detail in the following.

Fig 2

Overview of PEtab files and the most important features.

PEtab consists of a model in the SBML format and several tab-separated value (TSV) files to specify measurements and link them to the model. A visualization file can be provided optionally. A YAML file can be used to group the aforementioned files unambiguously.

Model (SBML): File specifying the biological process using the established and well-supported SBML format [11]. Any existing SBML model can be used without modification. PEtab supports all versions of SBML, provided that the respective toolbox supports them.

Experimental conditions (TSV): File specifying the condition(s), such as drug stimuli or genetic backgrounds, under which the experimental data were collected. These experimental conditions specify the model properties that differ between conditions and allow for a hierarchical specification of model properties (Fig 3A). If simulation conditions are used for pre-equilibration, meaning that an experiment starts from the equilibrium reached under another condition, specific model states can be marked for re-initialization (Fig 3B).
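How a condition table overrides the generic model can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not code from the PEtab library: `conditionId` and `conditionName` are the column names we take from the PEtab v1 specification (S1 File is authoritative), while the entity columns `k_in` and `drug_dose`, the helper functions and all values are hypothetical. Only numeric overrides are handled here, although a condition table may also refer to estimated parameters by their IDs.

```python
import csv
import io

# Toy condition table in TSV form. Every column after conditionId and
# conditionName is assumed to be the ID of an SBML entity (hypothetical IDs).
CONDITIONS_TSV = (
    "conditionId\tconditionName\tk_in\tdrug_dose\n"
    "control\tUntreated\t1.0\t0\n"
    "treated\tDrug added\t1.0\t5\n"
)

# Generic parameter values, as they might be defined in the SBML model.
MODEL_DEFAULTS = {"k_in": 0.5, "k_out": 0.1, "drug_dose": 0.0}


def load_conditions(text):
    """Parse TSV text into {conditionId: {entityId: numeric value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    conditions = {}
    for row in reader:
        condition_id = row.pop("conditionId")
        row.pop("conditionName", None)  # human-readable label only
        conditions[condition_id] = {key: float(value) for key, value in row.items()}
    return conditions


def instantiate(defaults, overrides):
    """Model values for one condition: SBML defaults, then condition overrides."""
    params = dict(defaults)
    params.update(overrides)
    return params


conditions = load_conditions(CONDITIONS_TSV)
treated = instantiate(MODEL_DEFAULTS, conditions["treated"])
print(treated)
```

Any model property without a column in the table keeps the value defined in the SBML model, so only the differences between conditions need to be written down.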

Fig 3

Parameter hierarchy and pre-equilibration in PEtab.

(A) Illustration of possibilities and precedence of parameter overriding at different stages. The generic model parameter vector, as specified in the SBML model, can be overridden via the observable, measurement, condition and parameter tables, separately for each condition and measurement point, to account for different model inputs or observation model parameters. The parameters that are overridden in each step are indicated with thicker cell borders. Individual parameters can be set to specific values or marked to be estimated (as p1 here). (B) In a frequently encountered experimental setup, a biological system is kept under some “baseline” condition and is assumed to be at equilibrium (depicted here after 24 h of incubation) before a perturbation is applied. If the equilibrium state of the system is not known a priori, such a setup can be modeled by simulating the system until an apparent steady state is reached (pre-equilibration). To simulate the perturbation, a subset of the model states is then reinitialized.
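The pre-equilibration and partial reinitialization of Fig 3B can be illustrated with a toy model. Everything below is an assumption made for the sketch: the two-state model, its parameter values, the forward-Euler integrator and the steady-state criterion (all time derivatives below a tolerance). The supporting tools use dedicated ODE solvers and their own steady-state checks.

```python
# Toy two-state model (hypothetical):
#   dA/dt = k1 - k2 * A
#   dB/dt = k3 * A - k4 * B
def rhs(x, p):
    """Time derivatives of the states in dict x for parameters p."""
    return {
        "A": p["k1"] - p["k2"] * x["A"],
        "B": p["k3"] * x["A"] - p["k4"] * x["B"],
    }


def pre_equilibrate(x0, p, dt=0.01, tol=1e-10, max_steps=10**6):
    """Forward-Euler integration until all derivatives fall below tol."""
    x = dict(x0)
    for _ in range(max_steps):
        dx = rhs(x, p)
        if max(abs(v) for v in dx.values()) < tol:
            return x
        x = {name: x[name] + dt * dx[name] for name in x}
    raise RuntimeError("no apparent steady state within max_steps")


# Baseline condition, e.g. the incubation period before the perturbation.
baseline = {"k1": 1.0, "k2": 1.0, "k3": 2.0, "k4": 0.5}
steady = pre_equilibrate({"A": 0.0, "B": 0.0}, baseline)
# Analytically, A* = k1 / k2 = 1 and B* = k3 * A* / k4 = 4.

# Partial reinitialization: B is marked for reset to 0 (say, a medium change),
# while A keeps its pre-equilibrated value. The simulation of the perturbation
# condition would start from this state.
start_of_perturbation = {**steady, "B": 0.0}
```

Resetting only the marked states is what distinguishes this from simply starting the perturbation simulation at the model's default initial values.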

Observables (TSV): File linking model properties such as state variables and parameter values to measurement data via observation functions and noise models. Various noise models including normal and Laplace distributions are supported, and noise model parameters can be estimated. Observables can be on linear or logarithmic scale.
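A minimal sketch of how a single observation with a normal or Laplace noise model contributes to the negative log-likelihood, assuming `sigma` is the standard deviation (normal) or the scale parameter (Laplace) on the transformed scale. The transformation names `lin`, `log` and `log10` mirror the values we understand PEtab v1 to allow for observable transformations; the function names and data are hypothetical. The sketch deliberately omits the Jacobian term that a density of the untransformed measurement would carry, so S1 File should be consulted for the exact definition PEtab uses.

```python
import math


def transform(value, transformation):
    """Map a value onto the observable's scale: 'lin', 'log' or 'log10'."""
    if transformation == "lin":
        return value
    if transformation == "log":
        return math.log(value)
    if transformation == "log10":
        return math.log10(value)
    raise ValueError(f"unknown transformation: {transformation}")


def negative_log_likelihood(measured, simulated, sigma,
                            distribution="normal", transformation="lin"):
    """Negative log-density of one measurement on the transformed scale.

    sigma is the standard deviation (normal) or the scale parameter
    (Laplace). The Jacobian of the transformation is not included.
    """
    residual = transform(measured, transformation) - transform(simulated, transformation)
    if distribution == "normal":
        return 0.5 * math.log(2 * math.pi * sigma**2) + 0.5 * (residual / sigma) ** 2
    if distribution == "laplace":
        return math.log(2 * sigma) + abs(residual) / sigma
    raise ValueError(f"unknown noise distribution: {distribution}")


# Summing over independent measurements gives the total objective.
data = [(1.2, 1.0, 0.1), (0.9, 1.0, 0.1)]  # (measured, simulated, sigma), hypothetical
total = sum(negative_log_likelihood(y, h, s) for y, h, s in data)
```

The Laplace model penalizes residuals linearly rather than quadratically, which makes it less sensitive to outliers.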

Measurements (TSV): File specifying and linking experimental data to the experimental conditions and the observables via the respective identifiers. Optionally, simulation conditions for pre-equilibration can be defined (Fig 3B). Parameters that are relevant for the observation process of a given measurement, such as offsets or scaling parameters, can be provided along with the measured values. This allows for overriding generic output parameters in a measurement-specific manner (Fig 3A).
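In the observable table, measurement-specific parameters appear as placeholders that the measurement table fills in. The sketch below assumes the PEtab v1 naming scheme `observableParameter<n>_<observableId>` and `noiseParameter<n>_<observableId>`, with semicolon-separated values in the measurement table's `observableParameters` and `noiseParameters` columns, as we read the specification. The function `resolve_placeholders`, the observable `pErk`, the state `Erk_phos` and the parameter `scale_exp1` are hypothetical.

```python
import re


def resolve_placeholders(formula, observable_id, kind, overrides):
    """Replace <kind>Parameter<n>_<observableId> in formula by the n-th override.

    kind is "observable" or "noise"; overrides is the semicolon-separated
    string from the corresponding measurement-table column.
    """
    values = [v.strip() for v in overrides.split(";")] if overrides else []
    pattern = re.compile(rf"\b{kind}Parameter(\d+)_{re.escape(observable_id)}\b")

    def replace(match):
        index = int(match.group(1)) - 1  # placeholders are numbered from 1
        if index >= len(values):
            raise ValueError(f"no override given for {match.group(0)}")
        return values[index]

    return pattern.sub(replace, formula)


# Observable formula with a scaling and an offset placeholder (hypothetical).
formula = "observableParameter1_pErk * Erk_phos + observableParameter2_pErk"
# One measurement row overrides them with an estimated ID and a number.
print(resolve_placeholders(formula, "pErk", "observable", "scale_exp1;0.5"))
```

Because each measurement row carries its own override string, the same observable formula can use a different scaling parameter per experiment, which is the measurement-level overriding shown in Fig 3A.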

Parameters (TSV): File defining the parameters to be estimated, including lower and upper bounds as well as transformations (e.g., linear or logarithmic) to be used in parameter estimation. Furthermore, prior information on the parameters can be specified to inform starting points for parameter estimation, or to perform Bayesian inference.
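Estimation typically operates on the transformed parameter scale. The sketch below converts a hypothetical parameter-table row to bounds on that scale and draws a start point uniformly between them. The column names `parameterScale`, `lowerBound`, `upperBound` and `nominalValue` follow our reading of PEtab v1, and we assume bounds are given on linear scale; the uniform draw is one common choice of starting-point distribution, assumed here for illustration, and the helper names are hypothetical.

```python
import math
import random


def to_scale(value, scale):
    """Map a value from linear scale to the estimation scale."""
    if scale == "lin":
        return value
    if scale == "log":
        return math.log(value)
    if scale == "log10":
        return math.log10(value)
    raise ValueError(f"unknown parameter scale: {scale}")


def from_scale(value, scale):
    """Inverse of to_scale: map from the estimation scale back to linear scale."""
    if scale == "lin":
        return value
    if scale == "log":
        return math.exp(value)
    if scale == "log10":
        return 10.0 ** value
    raise ValueError(f"unknown parameter scale: {scale}")


# Hypothetical parameter-table row; bounds assumed to be on linear scale.
row = {"parameterId": "k_in", "parameterScale": "log10",
       "lowerBound": 1e-3, "upperBound": 1e3, "nominalValue": 0.5, "estimate": 1}


def scaled_bounds(row):
    """Lower and upper bound on the estimation scale."""
    scale = row["parameterScale"]
    return to_scale(row["lowerBound"], scale), to_scale(row["upperBound"], scale)


def sample_start(row, rng):
    """Start point drawn uniformly on the estimation scale."""
    low, high = scaled_bounds(row)
    return rng.uniform(low, high)


rng = random.Random(0)  # fixed seed for a reproducible draw
start = sample_start(row, rng)
print(scaled_bounds(row), from_scale(start, row["parameterScale"]))
```

Sampling on the log scale spreads start points evenly across orders of magnitude, whereas a uniform draw on linear scale between 10⁻³ and 10³ would almost always land above 1.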

Visualization (TSV): Optional file specifying how to combine data and simulations for plotting. Different plots, such as time-course or dose-response curves, can be created automatically from this file using the PEtab Python library described below. This allows, for example, quick visual inspection of parameter estimation results. A default visualization file can be generated automatically.
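As a rough illustration of what a plot specification has to resolve, the sketch below groups measurement rows into time-course series, one per pair of observable and simulation condition. The grouping rule is our assumption of a sensible default, not the algorithm of the PEtab library; rows are dicts keyed by PEtab v1 measurement-table column names, and the data values are hypothetical.

```python
from collections import defaultdict


def group_time_courses(measurements):
    """Return {(observableId, simulationConditionId): [(time, value), ...]}.

    Points within each series are sorted by time, ready to be drawn as a line.
    """
    series = defaultdict(list)
    for row in measurements:
        key = (row["observableId"], row["simulationConditionId"])
        series[key].append((float(row["time"]), float(row["measurement"])))
    return {key: sorted(points) for key, points in series.items()}


measurements = [
    {"observableId": "pErk", "simulationConditionId": "treated", "time": "10", "measurement": "0.8"},
    {"observableId": "pErk", "simulationConditionId": "treated", "time": "0", "measurement": "0.1"},
    {"observableId": "pErk", "simulationConditionId": "control", "time": "0", "measurement": "0.1"},
]
courses = group_time_courses(measurements)
for (observable, condition), points in courses.items():
    print(observable, condition, points)
```

Simulated values computed for the same keys could then be overlaid on each series to judge the fit.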

PEtab problem file (YAML): File linking all of the above-mentioned PEtab files together. This allows, e.g., multiple models or measurement files to be combined into a single parameter estimation problem, and individual files to be reused easily in different parameter estimation problems (e.g., for model selection). PEtab uses YAML version 1.2, the current version.
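The problem file is a small nested mapping. As an illustration, the sketch below mirrors a hypothetical PEtab v1 problem file as Python data (the key names are those we understand version 1 to use; S1 File is authoritative, and all file names are made up) and collects the files it references, showing how two measurement files end up in one problem. The helper `referenced_files` is hypothetical and not part of the PEtab library.

```python
# Python mirror of a PEtab v1 problem file. Parsing the YAML text
#
#   format_version: 1
#   parameter_file: parameters.tsv
#   problems:
#     - sbml_files: [model.xml]
#       condition_files: [conditions.tsv]
#       measurement_files: [measurements_timecourse.tsv,
#                           measurements_doseresponse.tsv]
#       observable_files: [observables.tsv]
#
# with a YAML parser would yield this structure.
PROBLEM = {
    "format_version": 1,
    "parameter_file": "parameters.tsv",
    "problems": [
        {
            "sbml_files": ["model.xml"],
            "condition_files": ["conditions.tsv"],
            "measurement_files": ["measurements_timecourse.tsv",
                                  "measurements_doseresponse.tsv"],
            "observable_files": ["observables.tsv"],
        }
    ],
}

REQUIRED_KEYS = ("sbml_files", "condition_files", "measurement_files", "observable_files")
OPTIONAL_KEYS = ("visualization_files",)


def referenced_files(problem):
    """List all files a problem file points to, checking required entries."""
    files = [problem["parameter_file"]]
    for index, subproblem in enumerate(problem["problems"]):
        missing = [key for key in REQUIRED_KEYS if not subproblem.get(key)]
        if missing:
            raise ValueError(f"problem {index} lacks {', '.join(missing)}")
        for key in REQUIRED_KEYS + OPTIONAL_KEYS:
            files.extend(subproblem.get(key, []))
    return files


print(referenced_files(PROBLEM))
```

Swapping one entry of `measurement_files` while keeping the rest is all it takes to reuse the same model and observables for a different data set.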

We designed PEtab to cover the features commonly needed for parameter estimation. Each TSV file has a set of mandatory columns, which together provide all information necessary to define an objective function such as the χ² or likelihood function. However, some methods tailored to specific problems require additional information to estimate the unknown parameters. To accommodate this, we allow for optional application-specific extensions in addition to the required columns in the PEtab files, e.g., if some parameters can be calculated analytically using hierarchical optimization approaches [18].
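For the common case of independent, normally distributed measurement noise on linear scale, the two objective functions mentioned above can be written as follows. The notation is ours, not the specification's: ȳ_i is the i-th of N measurements, h_i(θ) the corresponding simulated observable for the parameter vector θ, and σ_i its noise standard deviation.

```latex
\chi^2(\theta) = \sum_{i=1}^{N} \left( \frac{\bar{y}_i - h_i(\theta)}{\sigma_i} \right)^2

-\log \mathcal{L}(\theta) = \sum_{i=1}^{N} \left[ \frac{1}{2} \log\left(2\pi\sigma_i^2\right)
  + \frac{1}{2} \left( \frac{\bar{y}_i - h_i(\theta)}{\sigma_i} \right)^2 \right]

-2 \log \mathcal{L}(\theta) = \chi^2(\theta) + \sum_{i=1}^{N} \log\left(2\pi\sigma_i^2\right)
```

The last line shows that, with fixed σ_i, minimizing χ² and maximizing the likelihood yield the same estimate; once noise parameters are estimated, the logarithmic term is no longer constant and the two objectives differ.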

PEtab library

To facilitate its use, PEtab (https://github.com/PEtab-dev/PEtab) comes with detailed documentation that describes the format of each of the different files concisely yet comprehensively. Additionally, we provide a Python library that can be used to read, modify, write, and validate PEtab problems. Furthermore, the PEtab library can package PEtab files into COMBINE archives [19]. After parameter estimation, the modeler usually investigates how well the model fits the experimental data. To support this, the PEtab library provides various visualization routines for analyzing data and parameter estimation results.
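The validation performed by the PEtab library is more extensive than anything shown here; the following is a deliberately reduced, library-free illustration of one kind of consistency check, namely cross-referencing identifiers between tables. The function name `find_reference_errors` and the data are hypothetical, the column names follow our reading of the PEtab v1 measurement table, and we assume a time of `inf` is permitted (Python's `float` accepts it).

```python
def find_reference_errors(measurements, observable_ids, condition_ids):
    """Cross-check measurement rows against observable and condition IDs.

    Returns a list of messages; an empty list means these (limited)
    checks found no error.
    """
    errors = []
    for n, row in enumerate(measurements, start=1):
        observable = row.get("observableId", "")
        if observable not in observable_ids:
            errors.append(f"row {n}: unknown observableId {observable!r}")
        # The simulation condition is required; pre-equilibration is optional.
        for column, required in (("simulationConditionId", True),
                                 ("preequilibrationConditionId", False)):
            condition = row.get(column, "")
            if (condition or required) and condition not in condition_ids:
                errors.append(f"row {n}: unknown {column} {condition!r}")
        # Time and measured value must parse as numbers ('inf' parses too).
        for column in ("time", "measurement"):
            try:
                float(row[column])
            except (KeyError, ValueError):
                errors.append(f"row {n}: missing or non-numeric {column}")
    return errors


rows = [
    {"observableId": "pErk", "simulationConditionId": "treated",
     "time": "10", "measurement": "0.8"},
    {"observableId": "pAkt", "simulationConditionId": "treated",
     "preequilibrationConditionId": "control", "time": "inf", "measurement": "n/a"},
]
errors = find_reference_errors(rows, {"pErk"}, {"control", "treated"})
print(errors)
```

Checks of this kind catch typos in identifiers before any simulation is run, which is where mismatches between hand-edited tables usually surface first.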

Results

PEtab support in established tools

We have implemented PEtab support in eight systems biology toolboxes so far, namely COPASI [2], AMICI [6], pyPESTO [20], pyABC [21], Data2Dynamics [5], dMod [10], parPE [18], and MEIGO [4]. These toolboxes provide a broad range of distinct features for model creation, model simulation, parameter inference, and uncertainty quantification (Table 1). Combining different tools with complementary features is often desirable. In practice, however, this has so far been hampered by the substantial overhead of tedious and error-prone re-implementation of the parameter estimation problem in the specific format required by the respective tool. With all of these tools now supporting PEtab, users can more easily combine different tools and make use of their specific strengths. For example, one can use COPASI for model creation and testing, AMICI for efficient simulation of large models, pyPESTO for multi-start local optimization and sampling, MEIGO for global scatter searches, and Data2Dynamics or dMod for profiling. The ease of switching between tools also makes it easy to reproduce and verify results, e.g., by checking whether different tools yield similar results. Additionally, developers can compare the performance of newly developed methods with existing algorithms implemented in different toolboxes, independent of the programming language, to select the most appropriate one for a given setting.

Table 1

Non-exhaustive overview of the functionality offered by the different toolboxes currently supporting the PEtab format.

The list of supporting tools and functionality covered by the respective tools may increase over time. An updated overview is available on the PEtab website. Darker colors indicate more accurate, scalable, or broader functionality compared to basic implementations.

	COPASI	D2D	dMod	MEIGO	AMICI	parPE	pyABC	pyPESTO
Interface / Language	Graphical interface	MATLAB	R	MATLAB	C++, Python, MATLAB	C++	Python	Python
Model construction	Advanced	Basic	Basic	No	No	No	No	No
Model simulation	Accurate	Accurate, Scalable	Accurate, Scalable	Uses AMICI	Accurate, Scalable	Uses AMICI	Uses AMICI	Uses AMICI
Gradient computation	Approximative	Accurate	Accurate	Uses AMICI	Accurate, Scalable	Uses AMICI	No	Uses AMICI
Gradient-free (global) parameter estimation	Multiple algorithms	Basic	No	Metaheuristic algorithms	No	No	No	Basic
Gradient-based parameter estimation	Basic	Multiple local optimizers	Multiple local optimizers	Metaheuristic algorithms	No	Multiple local optimizers	No	Multiple local optimizers
Parameter profile likelihood	Basic	Advanced	Advanced	No	No	No	No	Advanced
Prediction profile likelihood	Basic	Advanced	Advanced	No	No	No	No	No
Parameter sampling	No	Basic	No	No	No	No	Adaptive SMC algorithms	Multiple MCMC algorithms
Simulation of stochastic models	Multiple algorithms	No	No	No	No	No	No	No
Parameter inference for stochastic models	Basic	No	No	Basic	No	No	Scalable	Basic
Particular strengths	Advanced modeling, strong all-rounder, graphical interface	Powerful gradient-based optimization, advanced profiling, strong all-rounder	Powerful gradient-based optimization, strong all-rounder	Powerful metaheuristic optimization	Highly scalable simulation and gradient computation	Highly scalable optimization for large-scale models	Scalable likelihood-free inference	All-rounder, multiple MCMC sampling methods

PEtab test suite and examples

Along with introducing PEtab support to different tools, we have set up a test suite with various toy problems and reference values that other tool developers can use to assess and verify PEtab support in their software packages. The current status of PEtab support in the different tools is provided in Table 2 and continuously updated on the PEtab GitHub page. The test cases are based on SBML Level 2 Version 4, which is supported by all considered toolboxes.

Table 2

Overview of supported PEtab features in different tools, based on passed test cases of the PEtab test suite.

The first character indicates whether computing simulated data is supported and the simulations are correct (✓) or not (-). The second character indicates whether computing χ² values of the residuals is supported and correct (✓) or not (-). The third character indicates whether computing likelihoods is supported and correct (✓) or not (-). An up-to-date overview of supported features is maintained on the PEtab GitHub page.

Test case	AMICI	COPASI	D2D	dMod	MEIGO	parPE	pyABC	pyPESTO
Basic simulation	✓✓✓	✓--	✓✓✓	✓✓✓	✓✓✓	--✓	✓✓✓	✓✓✓
Multiple simulation conditions	✓✓✓	✓--	✓✓✓	✓✓✓	✓✓✓	--✓	✓✓✓	✓✓✓
Numeric initial compartment sizes in condition table	---	✓--	✓✓✓	✓✓✓	✓✓✓	---	---	---
Numeric initial concentration in condition table	✓✓✓	✓--	✓✓✓	✓✓✓	✓✓✓	--✓	✓✓✓	✓✓✓
Numeric noise parameter overrides in measurement table	✓✓✓	✓--	✓✓✓	✓✓✓	✓✓✓	--✓	✓✓✓	✓✓✓
Numeric observable parameter overrides in measurement table	✓✓✓	✓--	✓✓✓	✓✓✓	✓✓✓	--✓	✓✓✓	✓✓✓
Observable transformations to log scale	✓-✓	✓--	✓✓✓	✓✓-	✓✓✓	--✓	✓-✓	✓-✓
Observable transformations to log10 scale	✓-✓	✓--	✓✓✓	✓✓-	✓✓✓	--✓	✓-✓	✓-✓
Parametric initial concentrations in condition table	✓✓✓	✓--	✓✓✓	✓✓✓	✓✓✓	--✓	✓✓✓	✓✓✓
Parametric noise parameter overrides in measurement table	✓✓✓	✓--	✓✓✓	✓✓✓	✓✓✓	--✓	✓✓✓	✓✓✓
Parametric observable parameter overrides in measurement table	✓✓✓	✓--	✓✓✓	✓✓✓	✓✓✓	--✓	✓✓✓	✓✓✓
Parametric overrides in condition table	✓✓✓	✓--	✓✓✓	✓✓✓	✓✓✓	--✓	✓✓✓	✓✓✓
Partial pre-equilibration	✓✓✓	---	✓✓✓	✓✓✓	✓✓✓	--✓	✓✓✓	✓✓✓
Pre-equilibration	✓✓✓	✓--	✓✓✓	✓✓✓	✓✓✓	--✓	✓✓✓	✓✓✓
Replicate measurements	✓✓✓	✓--	✓✓✓	✓✓✓	✓✓✓	--✓	✓✓✓	✓✓✓
Time-point specific overrides in the measurement table	---	---	✓✓✓	✓✓✓	✓✓✓	---	---	---
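The three-character codes of Table 2 can be derived from a tool's output and the reference values of a test case. The following is a sketch under explicit assumptions: results are given as a dict with simulated values, χ² and log-likelihood, and the dict layout, key names, the function `status_string` and the tolerances `rtol`/`atol` are all hypothetical, not those of the actual PEtab test suite.

```python
import math


def _close(a, b, rtol, atol):
    """Relative-or-absolute closeness check."""
    return math.isclose(a, b, rel_tol=rtol, abs_tol=atol)


def status_string(result, reference, rtol=1e-3, atol=1e-8):
    """Encode one test-case outcome as three characters, as in Table 2.

    Positions: simulations, chi2, likelihood. A value the tool does not
    provide (missing or None) counts as unsupported and is shown as '-'.
    """
    sims = result.get("simulations")
    ref_sims = reference["simulations"]
    sim_ok = (
        sims is not None
        and len(sims) == len(ref_sims)
        and all(_close(a, b, rtol, atol) for a, b in zip(sims, ref_sims))
    )
    scalar_ok = [
        result.get(key) is not None and _close(result[key], reference[key], rtol, atol)
        for key in ("chi2", "llh")
    ]
    return "".join("✓" if ok else "-" for ok in (sim_ok, *scalar_ok))


# Hypothetical reference values of one test case.
reference = {"simulations": [1.0, 2.0], "chi2": 3.5, "llh": -4.2}
print(status_string({"simulations": [1.0, 2.0]}, reference))
```

A tool that only returns simulations would thus score "✓--", the pattern seen for COPASI in most rows of Table 2.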

To demonstrate the various features and the broad applicability of PEtab, we provide a growing collection of currently 20 example parameter estimation problems in the PEtab format largely based on a previously published benchmark collection [22]. These models can be used as templates for creating new PEtab problems and for method development and testing.

Availability and future directions

PEtab complements existing standards for model definition by facilitating the specification of complex estimation problems using tabular text files, defining experimental measurements and linking model entities and measurements via observables and a noise model.

The specification of the PEtab format, the PEtab Python library, as well as links to examples, a web-based validation tool, and all supporting software are available at https://github.com/PEtab-dev/PEtab. A snapshot is available at https://doi.org/10.5281/zenodo.3732958. PEtab and all original content presented here are available under permissive licenses. For questions or requests related to PEtab, we encourage interested users to contact us via the Issues function of the aforementioned GitHub repository, or via the respective tool repositories for more specific queries.

We developed PEtab to cover the most common features needed for parameter estimation in the context of dynamic modeling. However, because multiple model formats and a multitude of tailored parameter estimation methods exist, each requiring different information, we could not cover every aspect. Although, at the time of writing, PEtab only allows for models defined in the SBML format, the format is general enough to be integrated with other model specification formats such as CellML and rule-based formats [13] in the future. Additionally, other formats such as SBtab [17] or Antimony [23] provide converters to SBML and can therefore be used indirectly together with PEtab. Recently, new methods have been developed to estimate parameters in a hierarchical manner [18], including from qualitative data [24, 25]. PEtab could be extended to also allow for these types of measurements. To cover the most important needs, we invite users and developers to suggest new features to be supported by PEtab. We formed a maintainer team comprising developers of all supporting toolboxes to facilitate long-term support and improvement of PEtab. We encourage additional toolbox developers to implement support for PEtab. As an example, since the preprint publication of this manuscript, PEtab has already been adopted as the input format for a newly developed tool, SBML2Julia [26].

As PEtab is already supported by software tools with hundreds of users in total, we envisage that it will facilitate reusability, reproducibility and interoperability. We expect that a common specification format will prove helpful for users as well as developers of parameter estimation tools and methods in systems biology.

Acknowledgements

We thank Dagmar Waltemath for helpful discussions.

References

[1] Kitano H. Computational Systems Biology. Nature. 2002;420(6912):206–210. 10.1038/nature01254

[2] Hoops S, Sahle S, Gauges R, Lee C, Pahle J, Simus N, et al. COPASI—a COmplex PAthway SImulator. Bioinformatics. 2006;22(24):3067–3074. 10.1093/bioinformatics/btl485

[3] Balsa-Canto E, Banga JR. AMIGO, a toolbox for advanced model identification in systems biology using global optimization. Bioinformatics. 2011;27(16):2311–2313. 10.1093/bioinformatics/btr370

[4] Egea JA, Henriques D, Cokelaer T, Villaverde AF, MacNamara A, Danciu DP, et al. MEIGO: An open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics. BMC Bioinformatics. 2014;15:136. 10.1186/1471-2105-15-136

[5] Raue A, Steiert B, Schelker M, Kreutz C, Maiwald T, Hass H, et al. Data2Dynamics: a modeling environment tailored to parameter estimation in dynamical systems. Bioinformatics. 2015;31(21):3558–3560. 10.1093/bioinformatics/btv405

[6] Fröhlich F, Kaltenbacher B, Theis FJ, Hasenauer J. Scalable Parameter Estimation for Genome-Scale Biochemical Reaction Networks. PLoS Comput Biol. 2017;13(1):e1005331. 10.1371/journal.pcbi.1005331

[7] Choi K, Medley JK, König M, Stocking K, Smith L, Gu S, et al. Tellurium: An extensible python-based modeling environment for systems and synthetic biology. Biosystems. 2018;171:74–79. 10.1016/j.biosystems.2018.07.006

[8] Stapor P, Weindl D, Ballnus B, Hug S, Loos C, Fiedler A, et al. PESTO: Parameter EStimation TOolbox. Bioinformatics. 2018;34(4):705–707. 10.1093/bioinformatics/btx676

[9] Mitra ED, Suderman R, Colvin J, Ionkov A, Hu A, Sauro HM, et al. PyBioNetFit and the Biological Property Specification Language. iScience. 2019;19:1012–1036. 10.1016/j.isci.2019.08.045

[10] Kaschek D, Mader W, Fehling-Kaschek M, Rosenblatt M, Timmer J. Dynamic Modeling, Parameter Estimation, and Uncertainty Analysis in R. J Stat Softw. 2019;88(10). 10.18637/jss.v088.i10

[11] Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, et al. The systems biology markup language (SBML): A medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19(4):524–531. 10.1093/bioinformatics/btg015

[12] Cuellar AA, Lloyd CM, Nielsen PF, Bullivant DP, Nickerson DP, Hunter PJ. An Overview of CellML 1.1, a Biological Model Description Language. Simulation. 2003;79(12):740–747. 10.1177/0037549703040939

[13] Harris LA, Hogg JS, Tapia JJ, Sekar JAP, Gupta S, Korsunsky I, et al. BioNetGen 2.2: advances in rule-based modeling. Bioinformatics. 2016;32(21):3366–3368. 10.1093/bioinformatics/btw469

[14] Waltemath D, Adams R, Bergmann FT, Hucka M, Kolpakov F, Miller AK, Moraru II, et al. Reproducible computational biology experiments with SED-ML—The Simulation Experiment Description Markup Language. BMC Syst Biol. 2011;5:198. 10.1186/1752-0509-5-198

[15] Choi K, Smith LP, Medley JK, Sauro HM. phraSED-ML: A paraphrased, human-readable adaptation of SED-ML. J Bioinform Comput Biol. 2016;14(06):1650035. 10.1142/S0219720016500359

[16] Dada JO, Spasić I, Paton NW, Mendes P. SBRML: a markup language for associating systems biology data with models. Bioinformatics. 2010;26:932–938. 10.1093/bioinformatics/btq069

[17] Lubitz T, Hahn J, Bergmann FT, Noor E, Klipp E, Liebermeister W. SBtab: a flexible table format for data exchange in systems biology. Bioinformatics. 2016;32(16):2559–2561. 10.1093/bioinformatics/btw179

[18] Schmiester L, Schälte Y, Fröhlich F, Hasenauer J, Weindl D. Efficient parameterization of large-scale dynamic models based on relative measurements. Bioinformatics. 2019;36(2):594–602.

[19] Bergmann FT, Adams R, Moodie S, Cooper J, Glont M, Golebiewski M, et al. COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics. 2014;15:369. 10.1186/s12859-014-0369-z

[20] Schälte Y, Fröhlich F, Stapor P, Wang D, Weindl D, Schmiester L, et al. ICB-DCM/pyPESTO: pyPESTO 0.0.11; 2020. Available from: 10.5281/zenodo.3715448.

[21] Klinger E, Rickert D, Hasenauer J. pyABC: distributed, likelihood-free inference. Bioinformatics. 2018;34(20):3591–3593. 10.1093/bioinformatics/bty361

[22] Hass H, Loos C, Raimúndez-Álvarez E, Timmer J, Hasenauer J, Kreutz C. Benchmark problems for dynamic modeling of intracellular processes. Bioinformatics. 2019;35(17):3073–3082. 10.1093/bioinformatics/btz020

[23] Smith LP, Bergmann FT, Chandran D, Sauro HM. Antimony: a modular model definition language. Bioinformatics. 2009;25(18):2452–2454. 10.1093/bioinformatics/btp401

[24] Mitra ED, Dias R, Posner RG, Hlavacek WS. Using both qualitative and quantitative data in parameter identification for systems biology models. Nat Commun. 2018;9(1):3901. 10.1038/s41467-018-06439-z

[25] Schmiester L, Weindl D, Hasenauer J. Parameterization of mechanistic models from qualitative data using an efficient optimal scaling approach. J Math Biol. 2020;81(2):603–623. 10.1007/s00285-020-01522-w

[26] Lang PF, Shin S, Zavala VM. SBML2Julia: interfacing SBML with efficient nonlinear Julia modelling and solution tools for parameter optimization. arXiv preprint arXiv:2011.02597. 2020.