The MobiDB database (URL: https://mobidb.org/) provides predictions and annotations for intrinsically disordered proteins. Here, we report recent developments implemented in MobiDB version 4, regarding the database format, with novel types of annotations and an improved update process. The new website includes a re-designed user interface, a more effective search engine and advanced API for programmatic access. The new database schema gives more flexibility for the users, as well as simplifying the maintenance and updates. In addition, the new entry page provides more visualisation tools including customizable feature viewer and graphs of the residue contact maps. MobiDB v4 annotates the binding modes of disordered proteins, whether they undergo disorder-to-order transitions or remain disordered in the bound state. In addition, disordered regions undergoing liquid-liquid phase separation or post-translational modifications are defined. The integrated information is presented in a simplified interface, which enables faster searches and allows large customized datasets to be downloaded in TSV, Fasta or JSON formats. An alternative advanced interface allows users to drill deeper into features of interest. A new statistics page provides information at database and proteome levels. The new MobiDB version presents state-of-the-art knowledge on disordered proteins and improves data accessibility for both computational and experimental users.
Intrinsically disordered regions (IDRs) of proteins do not adopt a highly populated structure in isolation, but sample a wide range of conformations. IDR-mediated interactions are implicated in a variety of cellular processes from signal transduction and liquid-liquid phase transition (1–5). IDRs are subjected to extensive pre- and post-translational regulation to modulate protein function in response to cellular stimuli (6–9). Many functions of IDRs, such as entropic springs, flexible linkers or spacers are directly associated with their structural attributes (10,11).
Proteins containing intrinsically disordered regions are present in a considerable fraction of the proteome of eukaryotic organisms (e.g. 43.6% in humans based on MobiDB-lite predictions). In the human proteome, for example, IDRs are highly enriched in interaction interfaces and post-translational modification sites, which serve as regulatory switches of a variety of biochemical pathways (12). Short linear motifs in the IDRs of viruses lead to functional promiscuity and enable compact viral proteomes to extensively rewire their host cells (13). Given the lack of a requirement for a stable globular fold, interfaces in IDRs also appear to have higher evolutionary plasticity allowing them to proliferate in a proteome by ex nihilo evolution. As a result, IDR interfaces are possibly more widespread than ordered interfaces (14). Despite their central roles in key cellular regulatory processes, only a small fraction of disordered regions have been experimentally characterised (15,16). While difficulties in protein expression, purification and structural characterisation hamper experimental characterization, assigning functional modules to dynamic conformational ensembles presents a technical problem for a database.
Manually curated databases, such as DisProt (17,18) and IDEAL (19), collect annotations from the literature and standardise that information. These databases assign the position of IDRs in the sequence, and aim to assemble disorder-related functional annotations. Other databases focus on functional features: DIBS (15) and MFIB (20) collect regions undergoing folding upon binding, and ELM (16) annotates linear motifs. IDRs forming fuzzy complexes are collected in the FuzDB database (21), which also aims to establish links between structural properties and function. Proteins reported to undergo liquid-liquid phase separation (LLPS) typically contain IDRs, which are available in PhasePro (22), PhaSepDB (23) and LLPSDB (24). Altogether, these specialized datasets exhibit a limited overlap, which presents a bottleneck for comprehensive characterisation of the IDRs contained therein (25).
Large-scale identification of IDRs often depends on primary deposition databases of structural data, like the PDB (26), which can be used to indirectly derive disorder information from missing and mobile residues. Although annotation based on this information is less reliable compared to curated resources, it provides a larger set of examples (27,28). A complementary source of information is provided by sequence based predictors, which typically exploit amino acid biases in regions of proteins to identify IDRs (29). Deriving information from these methods requires a critical assessment of the different techniques, which has been implemented in independent blind tests and in the Critical Assessment of Intrinsic protein Disorder (CAID). These efforts, in particular CAID, enable integrating large-scale predicted information on disordered regions into other core data resources such as InterPro (30), UniProtKB (31) and PDBe (26) via MobiDB-lite (32).
In contrast to structural data, functional annotations related to disordered regions are underrepresented in public databases and ontologies (33). This presents a bottleneck for large-scale functional assignment of disordered regions. Functional predictions are in essence limited to regions that fold upon binding, e.g. ANCHOR (34), DISOPRED3 (35) and MoRFCHiBi (66). Recently, prediction of regions that remain disordered upon binding (36) and of regions that exhibit fuzzy binding (37) became available. A new generation of disorder function predictors is also being developed (36) based on the Critical Assessment of Function Annotation challenge (CAFA) (37) and DisProt-based Disorder Ontology (17).
The objective of the new MobiDB (version 4) is to provide stable and sustainable access to data on disordered regions and proteins. As an ELIXIR resource, this is in line with the overall ELIXIR mission and complies with the goals described in the ELIXIR IDP community whitepaper (38). Sustainability has been improved primarily by optimizing the database schema, and by improving the update process and simplifying the data representation formats. The new version incorporates additional curated data from specialized databases and presents novel annotation features including the binding modes derived from processing PDB data as well as an extended list of predictors. It is now possible to download search results and entire genomes with the possibility to specify the format and selected features. A novel statistics page allows the users to compare annotation coverage of different types of annotations and compare the annotation content at the proteome level for all reference proteomes.
MobiDB data serves both experimental scientists, who are interested in different aspects of disordered protein regions as well as bioinformaticians, who develop softwares for analysis and prediction of disordered regions. In the new version of the database, several new data resources and features were added to increase the coverage and usability of information about protein disorder. Figure 1 gives an overview of the annotations and services provided by MobiDB. All curated disordered regions are mapped across homologs obtained from GeneTree alignments (39) with >80% of sequence similarity and alignment lengths of 10 residues. This expands the curated set of proteins ten fold. Structural and functional properties of disordered regions are based on third party databases and a set of prediction methods.


MobiDB data and services. The MobiDB pipeline (left) includes the input source, output features and software used to generate MobiDB data (center). Different background colors indicate different levels of annotation quality. White background in the pipeline (left) indicates novel features, databases and software integrated into MobiDB v4. MobiDB data can be accessed through a website and an API (right). MobiDB predictions can be generated using the MobiDB-lite software which is available both as a Docker container and as a Python package from GitHub. MobiDB web pages are decorated with BioSchemas profiles which allows external search engines to retrieve disorder annotations exploiting structured data.
The full list of software and resources used in MobiDB is provided in Tables 1 and 2. The different types of annotations and predictions are assembled to provide the user a comprehensive view of properties of disordered regions at the residue level. As compared to the previous releases, MobiDB v4 provides annotations on novel functional aspects of disordered proteins, such as binding modes (36), involvement in post-translational modifications and phase separation (3,4,7).

| Method | Input | Features | Reference |
|---|---|---|---|
| Mobi | Structure | Binding modes, missing residues, mobile residues, high temperature residues | (43) |
| FLIPPER | Structure | Linear interacting peptides | Unpublished |
| RING | Structure | Residue Interaction Network | (40) |
| MobiDB-lite | Sequence | Disorder, cysteine rich, proline rich, polar, negative polyelectrolyte, positive polyelectrolyte, polyampholyte | (32) |
| ESpritz-DisProt | Sequence | Disorder | (54) |
| ESpritz-NMR | Sequence | Disorder | (54) |
| ESpritz-Xray | Sequence | Disorder | (54) |
| IUPred-Long | Sequence | Disorder | (55) |
| IUPred-Sort | Sequence | Disorder | (55) |
| VSL2b | Sequence | Disorder | (56) |
| DisEMBL-465 | Sequence | Disorder | (57) |
| DisEMBL-HotLoops | Sequence | Disorder | (57) |
| GlobPlot | Sequence | Disorder | (58) |
| JRONN | Sequence | Disorder | (59) |
| ANCHOR | Sequence | Linear interacting peptides | (42) |
| FeSS | Sequence | Secondary structure | (60) |
| DynaMine | Sequence | Backbone rigidity | (61) |
| Pfilt | Sequence | Low complexity | (62) |
| SEG | Sequence | Low complexity | (63) |
| Gene3D (*) | Sequence | Conserved domains | (64) |
| Pfam (*) | Sequence | Conserved domains | (65) |

| Database | Features | URL |
|---|---|---|
| CoDNaS | Conformational diversity | http://ufq.unq.edu.ar/codnas/ |
| DIBS | Linear interacting peptides | http://dibs.enzim.ttk.mta.hu/ |
| DisProt | Disorder, linear interacting peptides | https://www.disprot.org/ |
| ELM | Linear interacting peptides | http://elm.eu.org/ |
| FuzDB | Binding modes | http://protdyn-database.org/ |
| IDEAL | Disorder, linear interacting peptides | https://www.ideal-db.org/ |
| MFIB | Linear interacting peptides | http://mfib.enzim.ttk.mta.hu/ |
| PDBe | Protein structures | https://www.ebi.ac.uk/pdbe/ |
| PhasePro | Phase separation | https://phasepro.elte.hu/ |
| UniProtKB | Disorder, transmembrane, coiled coil, signal peptide | https://www.uniprot.org/ |
Disorder predictions are provided via the MobiDB-lite software (32) over the entire UniProtKB set of protein sequences. In addition, MobiDB calculates annotations from PDB structures, based on manually curated annotations from specific third party databases and propagating manual curation by homology. MobiDB-lite is also integrated for example into InterProScan (30) which propagates its predictions onto several other EBI resources like UniProtKB, InterPro and PDBe.
Different levels of reliability and different features are reported as different and independent annotations. Each MobiDB annotation is identified by an ‘evidence-feature-source’ triplet which uniquely identifies one level of quality, type and source of annotation. Possible values for ‘evidence’ are: (i) ‘curated’, the annotation comes from a manually curated database; (ii) ‘derived’, the annotation is inferred automatically processing primary data, for example PDB structures are processed to detect missing, mobile and lip residues, (iii) ‘homology’, annotations are obtained aligning curated data and (iv) ‘prediction’, annotations are automatically extracted from the protein sequence. The ‘feature’ element of the triplet represents the type (or flavour) of the annotation, for example ‘disorder’, ‘missing_residues’, ‘binding_mode’, ‘lip’, etc. The ‘source’ element represents the annotation source which can be the name of a software, a third party database or the type of consensus. Possible values are: ‘mobidb_lite’, ‘disprot’, ‘merged’, ‘th_90’, etc. The full list and explanation of all available triplets is provided on the about page of the website.
The main difference in MobiDB v4 as compared to the previous releases of MobiDB is the addition of novel functional aspects related to disordered regions, which are listed below and with a white background on the first block of Figure 1. These contribute to establishing links between conformational ensembles and their biological roles.
Binding modes of disordered regions refer to the conformational transitions of IDRs upon interacting with specific partners. IDRs can undergo disorder-to-order transitions and fold upon the template, or remain disordered (disorder-to-disorder transition) in a partner-bound form (36). Some IDRs exhibit both behaviors, and exhibit context-dependent binding with different partners or cellular conditions (37). These binding modes were assigned from collecting all bound-state experimental evidence from PDB, as defined in (36).
MobiDB implementation identifies inter-chain binding regions and the mono-/multimeric state using the RING software (40). RING generates residue level contact-networks by identifying all types of non-covalent interactions at atomic level in a protein structure (PDB). The residue interaction network generated by RING is combined with data on disorder derived from missing residues. Comparison of disordered residues in free and bound form of the protein is used to define the different binding modes: disorder-to-order, disorder-to-disorder and context-dependent binding.
Interactions of disordered regions are realized via short peptide motifs (41). These linear motifs can be collected from the sequence within the Eukaryotic Linear Motif database (16) or predicted using automatic methods such as ANCHOR (42). MobiDB provides a different approach, by exploiting a new software, Fast Linear Interacting Peptides Predictor (FLIPPER), which is able to detect linear interacting peptides (LIPs) from protein structures (PDB). FLIPPER is a random forest classifier which exploits geometrical and physicochemical properties of linear interacting peptides. These properties include linearity, solvent accessibility difference in the bound and unbound state and the ratio of inter- vs. intra-chain contacts. The algorithm has been trained on the same examples used by ANCHOR and validated against DIBS and PixelDB examples. The software is a reimplementation of a published component of the previous MobiDB pipeline (43) and it is freely available from GitHub (https://github.com/BioComputingUP/FLIPPER). After processing the entire PDB, FLIPPER recognized about 6800 different proteins with at least one LIP. Approximately 4.5% of the residues in these proteins are contained in LIPs (see MobiDB statistics) which provide a bona fide dataset for the implementation of novel sequence based predictors and statistical/functional analyses.
Posttranslational modifications of disordered regions serve as regulatory points of many biochemical pathways (44). Therefore annotating PTM sites is crucial for functional annotation of disordered regions. Data on all types of PTMs are derived from UniProtKB which propagates a small set of well annotated modifications by sequence similarity. Additional phosphorylation sites are provided from the Scop3P database. The Scop3P sites are extracted by large scale re-processing of 36 projects from the PRIDE database (45) with the ionbot approach (https://ionbot.cloud/).
Regions forming dynamic liquid droplets are a new facet of disordered proteins (46). Recent data indicates that weak interactions involving disordered regions often govern protein phase separation (47). Although the general biological roles of phase separation remain to be elucidated, systematic annotation of regions undergoing LLPS will contribute to elucidating the underlying sequence-codes.
Regions associated with phase separation processes in MobiDB are derived from the PhasePro database (22). This dataset not only assembles proteins, which were reported to phase separate in vitro or in vivo, but also specifies those regions, which mediate this process.
From the technical point of view, the major change is the new format of the entry document in the database. Multi-class annotations, e.g. secondary structure, have been converted to binary classifications by splitting classes into different fields. The grouping of different features on a common field is deprecated since it encodes an arbitrary interpretation of the type of prediction directly in the entry document. Now all predictions appear at the same level and have the same format. It is now possible to integrate new annotations- without refactoring the entry document. This strategy provides a greater flexibility simplifying updates and database maintenance. This enables us to keep MobiDB annotations up to date with UniProtKB releases, with a minimal delay of a few weeks. For new UniProtKB releases MobiDB generates predictions for new sequences by running MobiDB-lite and other tools, updates protein metadata (UniRef IDs, taxonomy, etc.) and removes obsolete entries.
The MobiDB website has been completely redesigned in order to improve user experience and satisfy both general use and detailed computational analyses. The graphical user interface retrieves data from a public API which is now well documented and will remain stable for several years.
The entry page has been completely renewed as compared to MobiDB v3 and is now divided in two views: simple and advanced. The simple view of the entry page offers an overview of the entry, focusing on a few annotations. Switching to the advanced view, the user accesses the full collection of data associated with an entry, with dedicated tools for their visualization. In particular, both views revolve around the feature–viewer (48) to organize and dynamically report annotations along the sequence. An interactable network map and a Mol* instance (49) allows an integrated in-depth exploration of residue contacts and structural features.
The feature–viewer combines annotations into a tree of tracks. Parent tracks summarize the information of children tracks by different consensus strategies. Some tracks, e.g. MobiDB-lite tracks are automatically expanded on page load to highlight disordered sub-regions. For each track a squared icon indicates the evidence quality level of the annotation. When a track has children its name gets bold on mouse over and the evidence icon is outlined. On the advanced view, it is possible to customize the feature viewer by hiding specific tracks. By clicking on a region, it is possible to visualize the corresponding annotation in the PDB structure and on the contacts network.
MobiDB v4 has a new statistics page which provides information about all types of annotations for the entire database and for all proteomes separately. The statistics page provides coverage of a given feature both at the database and proteome level. A search bar on the top allows searching for an organism by providing the identifier of the reference proteome or a string which is evaluated against the organism name and kingdom.
The browse page is designed upon the MobiDB server API. MobiDB provides programmatic access to perform a search through a RESTful web service API. A single entry can be retrieved by using UniProtKB identifiers, while database searches can be performed by specifying query fields directly as URL parameters in the HTTP request. Protein metadata (UniProtKb accession, gene, organism, NCBI taxon ID, etc.) can be searched for exact matches, whereas the disorder content and sequence length accept a range of values. In the case of a free text search (free_text parameter) the server uses the EBI Proteins API (50) to retrieve proteins containing the input string in the protein name. In that case the result order is provided by the UniProtKB annotation score (https://www.uniprot.org/help/annotation_score). Both from the browse page and from the API it is possible to download the full search result in different formats, TSV, FASTA and JSON. Entire genomes can be searched and downloaded in minutes.
To increase MobiDB interoperability and findability, key web pages are decorated with BioSchemas markup (https://bioschemas.org) (51) developed within ELIXIR, the European infrastructure for biological data. BioSchemas extends Schema.org (https://schema.org) for life sciences making the data contained in the database indexable by search engines and other services such as the Google dataset search tool. MobiDB uses DataCatalog and Dataset profiles in the main page, and DataRecord for the entry pages. The DataRecord includes Protein and SequenceAnnotation profiles. These profiles were recently implemented in MobiDB which makes the database, together with DisProt, among the first resources exposing sequence region information using BioSchemas.org.
The rapidly accumulating experimental data on the structure and function of disordered protein regions highlights the importance of systematic analysis of protein functions related to protein disorder. MobiDBv4 provides a major improvement in this respect as compared to previous releases by adding descriptions of functional aspects of disorder, such detailed information about different binding modes, including both disorder-to-order and fuzzy binding; posttranslational modifications; and regions associated with phase separation processes. Interpretation of residue interaction networks is also facilitated by visualising contact maps in an interactive manner. The database data and programming infrastructure has also been upgraded, facilitating systematic analysis, searches and updates. The new website provides more comprehensive information of each entry in a convenient format for both experimental and computational users.
MobiDB aims to provide gold standard disorder predictions, learning from the results of the Critical Assessment of Intrinsic protein Disorder (CAID, http://idpcentral.org/caid) and the Disorder Ontology prediction challenge at the Critical Assessment of protein Function Annotation (CAFA, https://www.biofunctionprediction.org/cafa). These initiatives will further contribute to the improvement of the disorder predictions available in MobiDB. MobiDB predictions are also integrated into several EBI resources including InterPro, UniProtKB and PDBe. MobiDB is a central resource for the IDP community in ELIXIR.
Thanks to the statistics page, for the first time MobiDB allows to answer questions like for example which is the fraction of disordered residues in a given organism, which is the fraction of residues annotated by the majority of predictors or how many proteins have at least one disordered region.
Further developments of the database aim to improve predictions related to protein disorder. On the one hand, the sequence-based predictions of disordered regions have to be standardised and improved, including the predictions of regions, which can undergo liquid–liquid phase separation. These predictions can be facilitated by evolutionary analysis of disordered regions and vice versa prediction of disordered regions can facilitate evolutionary analysis of proteins with low degrees of sequence similarities (52,53).
The other major line of development is functional predictions for disordered regions, including their roles in interaction networks, conformational switches and organisers of liquid droplets. The novel annotations implemented in MobiDB v4 contribute to these efforts. These functional annotations will also be integrated into CAID and CAFA challenges, and will be related to GO classifications. In the present form, MobiDB provides up-to-date knowledge on disordered proteins and regions, which can be widely used by both experimental and computational experts as well as by third party services.
We acknowledge ELIXIR-IIB (elixir-italy.org), the Italian Node of the European ELIXIR infrastructure (elixir-europe.org), for supporting the development and maintenance of MobiDB. MobiDB is a service of the ELIXIR IDP community.
European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie [778247]; Italian Ministry of University and Research (MIUR), PRIN [2017483NH8 to S.T.]; P.R. and W.V. acknowledge funding by the Research Foundation Flanders (FWO) [G.0328.16N]; N.D. acknowledges funding by a Cancer Research UK Senior Cancer Research Fellowship [C68484/A28159]; G.P. is a CONICET researcher; Universidad Nacional de Quilmes [PUNQ 1004/11]; ANPCyT [PICT-2014-3430]. Funding for open access charge: IDPfun (Marie Skłodowska-Curie) [778247].
Conflict of interest statement. None declared.
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
52.
53.
54.
55.
56.
57.
58.
59.
60.
61.
62.
63.
64.
65.