cropPAL provides a powerful tool for investigating subcellular localisation in a growing number of crop species. It unifies disparate datasets and provides web services through an accessible interface. Users can construct powerful queries or interrogate their own protein sets, making cropPAL a one-stop shop for protein localisation and protein location relationships. The cropPAL compendium houses large-scale proteomic (MS/MS) and fluorescent protein (FP) localisation data, as well as protein-protein interaction (PPI) data inferred from experimental Arabidopsis PPI data. The cropPAL2 compendium also contains precompiled bioinformatic predictions of protein subcellular localisation and a consensus call (winner location) that takes both predictive and experimental information into account. The cropPAL2 search interface provides flexible options for refining or interrogating protein data sets by location, interactions, protein properties and bibliographic information. For bulk downloads of cropPAL releases, please visit Research Data Australia.
Why cropPAL?
Subcellular localisation information can contribute to our understanding of protein function, protein redundancy and biological relationships. A variety of technologies are currently used to determine the subcellular location of proteins, but much of this information is not available in an integrated form. To get a clearer picture of existing experimental data and to better understand subcellular partitioning in general, we have brought together and expanded various data sources to build cropPAL2. The database has a web-accessible interface that allows advanced combinatorial queries on the data as well as downloads for downstream applications.
The resources in cropPAL2
cropPAL2 species annotation information
cropPAL2 is updated about once a year, so the experimental and computed data grow with every update. The current version, cropPAL2, is built on Ensembl Plants Mart version 34, as described in the Gramene release 52 notes. Version 2 of cropPAL expanded the curated experimental and precomputed data to 11 species. The data were linked to the proteome annotations listed below.
|Glycine max (soybean)||Glyma1.0||Glyma1.1|
|Hordeum vulgare (barley)||030312v2||IBSC_1.0|
|Musa acuminata (banana)||MA1||2012-08-Cirad|
|Oryza sativa japonica||IRGSP-1.0||MSU 7.0|
|Solanum lycopersicum (tomato)||SL2.50||ITAG2.3|
|Solanum tuberosum (potato)||3.0||SolTub_3.0|
|Vitis vinifera||IGGP 12x||2012-07-CRIBI|
|Zea mays (corn)||B73_RefGen_v4||MAKER-CSHL|
cropPAL2 experimental data
An overview of the experimental data in cropPAL2 as of June 2017 is shown below. We have identified additional experimental studies that are currently being curated, and these data will be integrated at our next update. When experimental data are captured, obsolete or non-Ensembl Plants protein IDs are either cross-referenced to current IDs, or the sequences belonging to these IDs are retrieved and BLASTed against the current proteome. This helps retain valuable experimental data and links it to current genome standards as closely as possible. We retain all linking paths and intermediate IDs or steps. While not all of this information is available through the web interface, questions about data linkage for specific proteins or experimental data sets can be answered by contacting us directly. This may be helpful when working with a different annotation or, in the future, when transferring cropPAL data analyses from decommissioned cropPAL versions.
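The ID-linking step described above can be pictured as a lookup with a fallback. The following is a minimal Python sketch, not cropPAL's actual pipeline: `resolve_legacy_id`, the `xrefs` cross-reference table and the `blast_best_hit` map are hypothetical names, and the BLAST search is assumed to have been run already so that each legacy ID maps to its best hit in the current proteome. The returned path mirrors the practice of retaining every linking step.

```python
from typing import Optional


def resolve_legacy_id(
    legacy_id: str,
    xrefs: dict[str, str],
    blast_best_hit: dict[str, str],
) -> tuple[Optional[str], list[str]]:
    """Map an obsolete or non-Ensembl Plants ID to a current protein ID.

    Hypothetical sketch: try a cross-reference first, then fall back to a
    precomputed BLAST best hit. Returns (current ID or None, linking path).
    """
    path = [legacy_id]
    if legacy_id in xrefs:
        # Direct cross-reference to a current ID.
        current = xrefs[legacy_id]
        path += ["xref", current]
        return current, path
    if legacy_id in blast_best_hit:
        # Sequence of the legacy ID was BLASTed against the current proteome.
        current = blast_best_hit[legacy_id]
        path += ["blast", current]
        return current, path
    # Neither route linked the ID; keep a record of the failed attempt.
    path.append("unresolved")
    return None, path
```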
|Glycine max (soybean)||0||9830||74969|
|Hordeum vulgare (barley)||116||436||16428|
|Musa acuminata (banana)||5||140||61945|
|Oryza sativa japonica||675||5688||23335|
|Solanum lycopersicum (tomato)||12||8104||21595|
|Solanum tuberosum (potato)||1||1457||22412|
|Zea mays (corn)||229||8951||50471|
There are 10 predictors integrated into cropPAL, which use distinct training data sets, input variables and prediction methods. These have been reviewed and compared for their contribution to the SUBA consensus call in our recent study on SUBAcon. Predictors vary in their accuracy for each subcellular compartment. Information on the performance of individual predictors in specific subcellular compartments in Arabidopsis can be found on the SUBA4 about page.
cropPAL2 Winner locations
The winner-takes-all call is a calculated output that attempts to unify predictions with available experimental data. This differs from a classifier, which is built on a training set and has a tested performance. The winner-takes-all call is generated by counting each predicted location as 1 vote (e.g. 2 x mitochondrial predictions = 1 vote for mitochondrion) and adding 1 vote for each experimental verification (e.g. 2 x plastidal GFP localisations = 2 votes for plastid). The votes are summed, and the location with the most votes becomes the exclusive location suggestion for this protein. In the example, the votes would be 2:1 in favour of plastid, giving a final call of plastid. This strategy ensures that experimental verifications have a stronger influence when they are available.
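As a minimal sketch of one reading of this voting rule (all predictions for the same location together count as one vote, and each experimental localisation counts as one vote), the Python function below reproduces the 2:1 plastid example. `winner_takes_all` is a hypothetical name, and the tie-breaking behaviour is an assumption, since ties are not described here: this sketch simply returns all tied locations.

```python
from collections import Counter


def winner_takes_all(predicted: list[str], experimental: list[str]) -> list[str]:
    """Return the location(s) with the most votes.

    Each distinct predicted location contributes one vote, however many
    predictors agree on it; each experimental localisation adds one vote.
    """
    votes = Counter(set(predicted))  # one vote per distinct predicted location
    votes.update(experimental)       # one vote per experimental observation
    if not votes:
        return []
    top = max(votes.values())
    # Assumption: ties are returned together, sorted for a stable order.
    return sorted(loc for loc, n in votes.items() if n == top)
```

For the example in the text, `winner_takes_all(["mitochondrion", "mitochondrion"], ["plastid", "plastid"])` returns `["plastid"]`.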
cropPAL2 Protein-Protein Interaction Data
Direct experimental evidence of protein-protein interactions in crop species is still sparse. For Arabidopsis, SUBA4 collated over 26,000 experiments showing the interaction of two or more proteins, covering a third of the proteome. Sequence similarity between Arabidopsis PPI pairs and proteins in other species may help identify PPI partners in crop species for hypothesis-driven research. In cropPAL2, we have taken all evidence-based Arabidopsis PPI data and linked it to homologous proteins in crop species. This yielded over 800,000 suggested PPI partners across the 11 crop species. The homology linking was performed using TreeBeST homology linking as available through Ensembl Plants Mart version 34 (see details below). The identity cut-off was at least 10% for both PPI partners, in both directions between species. If you need more information, a different cut-off or some other linking of PPI data with crops, please contact us directly by email: [email protected]
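The homology-based transfer of PPI data can be sketched as follows. This is an illustrative Python outline, not the cropPAL2 code: `transfer_ppi`, `passing_homologs` and the input layout are assumptions. `homologs` is assumed to hold, for a single crop species, each Arabidopsis protein's homologues with the percent identity in both directions (similar to the two identity columns an Ensembl homology export provides). A crop pair is suggested only when both partners pass the cut-off both ways.

```python
from itertools import product

# Hypothetical layout for ONE crop species:
# Arabidopsis ID -> list of (crop ID, % identity At->crop, % identity crop->At)
Homologs = dict[str, list[tuple[str, float, float]]]


def passing_homologs(at_id: str, homologs: Homologs, min_identity: float) -> list[str]:
    """Crop homologues of at_id whose identity meets the cut-off in both directions."""
    return [
        crop_id
        for crop_id, id_fwd, id_rev in homologs.get(at_id, [])
        if id_fwd >= min_identity and id_rev >= min_identity
    ]


def transfer_ppi(
    at_pairs: list[tuple[str, str]], homologs: Homologs, min_identity: float = 10.0
) -> set[tuple[str, str]]:
    """Suggest crop PPI pairs from experimentally supported Arabidopsis pairs."""
    suggested: set[tuple[str, str]] = set()
    for at_a, at_b in at_pairs:
        partners_a = passing_homologs(at_a, homologs, min_identity)
        partners_b = passing_homologs(at_b, homologs, min_identity)
        # Every combination of passing homologues becomes a suggested pair;
        # sorting makes (x, y) and (y, x) count as the same pair.
        for crop_a, crop_b in product(partners_a, partners_b):
            suggested.add(tuple(sorted((crop_a, crop_b))))
    return suggested
```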
cropPAL2 Homology Linking
There are two strategies in cropPAL2 for linking crop data to other species. One approach, used to determine the best match, is a reciprocal BLAST of all crop proteins against Arabidopsis. This generates a link to the nearest Arabidopsis match and its description in SUBA4.
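A reciprocal best-hit search can be sketched in Python as below, assuming BLAST was run in both directions with tabular output (`-outfmt 6`), whose 12th column is the bit score. The function names are hypothetical, and picking the best hit by bit score is an assumption; the exact criteria used for cropPAL2 are not described here.

```python
from typing import Iterable


def best_hits(lines: Iterable[str]) -> dict[str, str]:
    """Return {query: best subject} from BLAST tabular (-outfmt 6) lines."""
    best: dict[str, tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # Keep the subject with the highest bit score for each query.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(crop_vs_at: Iterable[str], at_vs_crop: Iterable[str]) -> dict[str, str]:
    """Pair crop and Arabidopsis proteins that are each other's best hit."""
    forward = best_hits(crop_vs_at)
    reverse = best_hits(at_vs_crop)
    return {crop: at for crop, at in forward.items() if reverse.get(at) == crop}
```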
The second type of homology linking uses the Ensembl Plants gene tree as provided by Gramene. The tree was generated using TreeBeST and was described in more detail in the journal Database in 2016. TreeBeST homology linking connects all cropPAL2 species to each other as well as to Arabidopsis. The gene tree is a more conservative measure than the reciprocal BLAST.