The authors have declared that no competing interests exist.
Conceived and designed the experiments: XFS XJG JHS. Performed the experiments: TC TZ. Analyzed the data: BH HYY. Wrote the paper: TC XFS XJG.
Protein ubiquitination is an important post-translational modification in which ubiquitin is attached to specific lysine (K) residues in target proteins, and it plays important regulatory roles in many cellular processes. Recent studies indicate that abnormal protein ubiquitination is implicated in many diseases through the degradation of key regulatory proteins, including tumor suppressors, oncoproteins, and cell cycle regulators. Detailed information on protein ubiquitination sites helps scientists investigate the mechanisms of many cellular activities and related diseases.
In this study we established mUbiSiDa, the mammalian Ubiquitination Site Database, which provides the scientific community with a comprehensive, high-quality, and freely accessible resource of mammalian protein ubiquitination sites. In mUbiSiDa, we deposited 35,494 experimentally validated ubiquitinated proteins with 110,976 ubiquitination sites from five species. mUbiSiDa also provides a BLAST function to predict novel protein ubiquitination sites in other species by searching the query sequence against the sequences deposited in mUbiSiDa. mUbiSiDa was designed with a user-friendly interface to serve biologists and biomedical researchers, and to facilitate further research on protein ubiquitination, biological networks, and functional proteomics. The mUbiSiDa database is freely available at
Protein ubiquitination, an important post-translational modification in which ubiquitin tags target proteins for subsequent degradation by the ATP-dependent ubiquitin proteasome system (UPS), plays an important role in cell activity
Protein ubiquitination occurs when ubiquitin binds to a lysine residue of a target protein. The location, number, and distribution of ubiquitination sites are important information for scientists investigating the mechanism of the UPS and relevant diseases
Consequently, in this study we constructed a user-friendly database, mUbiSiDa, which meets the above requirements. The data in mUbiSiDa were mainly collected from published papers. In total, we searched and obtained 104 references containing 35,494 experimentally validated mammalian ubiquitinated proteins from 5 species. Over 95% of these sites are from human and mouse. This comprehensive database enables not only the retrieval of information on protein ubiquitination sites, but also the study of cross-regulation between post-translational modifications
In conclusion, the aim of the mUbiSiDa database is to provide a user-friendly web interface for browsing, searching, retrieving, and updating information on mammalian protein ubiquitination sites, and to promote further research on protein ubiquitination, biological networks, and functional proteomics.
Our data were collected from two main resources. The first is the published literature in PubMed. Through keyword searches in PubMed, we obtained a large body of related literature, from which experimentally identified mammalian ubiquitination sites were extracted. Several of the main references are listed in the References section.
mUbiSiDa was constructed and configured on a typical LAMP (Linux + Apache + MySQL + PHP) platform. Apache 5.0.51b was first used to build the web server. All data were stored in MySQL 5.0, and the web interface was implemented with PHP scripts (PHP version 5.2) on Linux, served by Apache. Web pages were designed with HTML and JavaScript. PHP was used to connect the website to the database and to implement all functions.
All functions of mUbiSiDa are shown on the Home page (
In the text field of the Search page, users can enter query strings such as a protein ID or protein name, and the result page lists the protein entries that match. Keywords matching the query strings are highlighted in yellow in the result pages. Users can view detailed information about a protein of interest by clicking its protein ID in the left column. Combined search is also available for further retrieval, but the current version supports at most two query strings combined by ‘+’. In addition, on each result page, mUbiSiDa provides a customization function that reorganizes the feature list of protein entries according to the user's interests.
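The simple search behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the PHP/MySQL code behind mUbiSiDa: the in-memory `ENTRIES` list, the function names, the use of `<mark>` tags in place of yellow highlighting, and the reading of ‘+’ as a logical AND are all assumptions made for the example.

```python
import re

# Hypothetical example records; the real database stores its entries in MySQL.
ENTRIES = [
    {"id": "P04637", "name": "Cellular tumor antigen p53", "organism": "Homo sapiens"},
    {"id": "P02340", "name": "Cellular tumor antigen p53", "organism": "Mus musculus"},
    {"id": "P38398", "name": "Breast cancer type 1 susceptibility protein", "organism": "Homo sapiens"},
]


def parse_query(query, max_terms=2):
    """Split a query on '+' and enforce the two-term limit of combined search."""
    terms = [t.strip() for t in query.split("+") if t.strip()]
    if not terms:
        raise ValueError("empty query")
    if len(terms) > max_terms:
        raise ValueError(f"at most {max_terms} terms may be combined with '+'")
    return terms


def search(query, entries=ENTRIES):
    """Return entries in which every term occurs in some field (case-insensitive).

    Treating '+' as AND is an assumption of this sketch.
    """
    terms = [t.lower() for t in parse_query(query)]
    hits = []
    for entry in entries:
        text = " ".join(entry.values()).lower()
        if all(t in text for t in terms):
            hits.append(entry)
    return hits


def highlight(text, terms):
    """Wrap each matched term in <mark> tags as a stand-in for yellow highlighting."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", text)
```

For example, `search("p53 + Homo sapiens")` returns only the human p53 record, and a query with three ‘+’-joined terms is rejected with a `ValueError`.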
There are three options in Advanced Retrieval: (1) Advanced Search, (2) Protein Name Search, and (3) Sequence Blast, which give users efficient access to the information they need. In the “Advanced Search” query page, users can obtain specific protein ubiquitination information by entering more restrictive terms in six text fields combined with ‘AND’, ‘OR’ and ‘BUT’. We designed the “Protein Name Search” query page so that users who know a protein's name can find it conveniently. It has four text fields for query strings, and some or all of them can be used for a combined search. When the ID text field is empty, the other three text fields perform fuzzy queries. When the ID text field is not empty, a precise search is performed. The ID is expected to be a UniProt ID.
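The difference between the precise ID search and the fuzzy search on the remaining fields can be illustrated with a parameterised SQL query. The sketch below uses Python's built-in `sqlite3` module rather than the MySQL/PHP stack of mUbiSiDa. The table layout, the column names (`protein_id`, `name`, `gene`, `organism`), and the choice to keep applying the other filled fields as substring filters when an ID is given are all assumptions, since the source does not name the three non-ID fields.

```python
import sqlite3


def build_query(protein_id="", name="", gene="", organism=""):
    """Build a parameterised query: exact match on ID, substring match elsewhere."""
    clauses, params = [], []
    if protein_id.strip():
        clauses.append("protein_id = ?")  # precise search on the ID
        params.append(protein_id.strip())
    for column, value in (("name", name), ("gene", gene), ("organism", organism)):
        if value.strip():
            clauses.append(f"{column} LIKE ?")  # fuzzy (substring) search
            params.append(f"%{value.strip()}%")
    if not clauses:
        raise ValueError("fill in at least one field")
    sql = ("SELECT protein_id FROM protein WHERE "
           + " AND ".join(clauses) + " ORDER BY protein_id")
    return sql, params


# Hypothetical table with a few example rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein (protein_id TEXT, name TEXT, gene TEXT, organism TEXT)")
conn.executemany("INSERT INTO protein VALUES (?, ?, ?, ?)", [
    ("P04637", "Cellular tumor antigen p53", "TP53", "Homo sapiens"),
    ("P02340", "Cellular tumor antigen p53", "Tp53", "Mus musculus"),
    ("P38398", "Breast cancer type 1 susceptibility protein", "BRCA1", "Homo sapiens"),
])


def protein_name_search(**fields):
    """Run the query built from the filled-in fields and return matching IDs."""
    sql, params = build_query(**fields)
    return [row[0] for row in conn.execute(sql, params)]
```

With these rows, a partial name such as `name="tumor antigen"` matches both p53 entries, whereas a partial ID such as `protein_id="P0463"` matches nothing because the ID is compared exactly.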
“Blast search” was designed to let users predict potential ubiquitination sites of a new protein from another mammalian species when it is an ortholog or paralog of a protein deposited in the database. As a tool for comparing primary biological sequence information, Sequence Blast compares a query sequence with all sequences in mUbiSiDa and identifies those that resemble the query above a certain threshold. In this way, the one or few sequences most similar to the input sequence are returned. Sequence Blast is therefore an efficient tool for retrieving possible ubiquitinated lysine sites of user-supplied proteins. Both the protein ID and the related references are shown in the results. Users can view detailed information on a result by clicking the protein ID. This tool extends mUbiSiDa to the prediction of protein ubiquitination sites for most mammalian species.
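One way to turn a BLAST hit into candidate sites on the query is to walk the pairwise alignment and keep the known ubiquitinated lysines of the hit that align to a lysine in the query. The following Python sketch illustrates that idea under stated assumptions: the function name, its arguments, and the rule that both aligned residues must be K are hypothetical and are not taken from the mUbiSiDa implementation.

```python
def transfer_sites(aligned_query, aligned_hit, hit_sites, query_start=1, hit_start=1):
    """Map 1-based ubiquitinated lysine positions from a database hit onto the query.

    aligned_query / aligned_hit are the two gapped rows of a pairwise alignment
    ('-' marks a gap); query_start / hit_start are the 1-based positions at which
    the aligned region begins in each full-length sequence.
    """
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned rows must have equal length")
    known = set(hit_sites)
    q_pos, h_pos = query_start - 1, hit_start - 1
    candidates = []
    for q_res, h_res in zip(aligned_query, aligned_hit):
        # Advance each coordinate only when that row holds a residue, not a gap.
        if q_res != "-":
            q_pos += 1
        if h_res != "-":
            h_pos += 1
        # Keep a site only if both aligned residues are lysine and the hit
        # position is a known ubiquitination site.
        if q_res == "K" and h_res == "K" and h_pos in known:
            candidates.append(q_pos)
    return candidates
```

For instance, `transfer_sites("MSAKLVKE", "M-AKLVRE", [3])` maps the hit's site at position 3 to position 4 of the query, because the query carries one extra residue before the aligned lysine. Sites obtained this way are predictions by homology and would still need experimental confirmation.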
The dataset in mUbiSiDa can be browsed in four ways: (1) Browse data by organism, (2) Browse data by biological process, (3) Browse data by cellular component, and (4) Browse data by molecular function.
In the “Browse Data by Organism” page, the whole dataset in mUbiSiDa is grouped by organism and listed in a table of the 5 organisms covered by mUbiSiDa. The number of entries for each organism is also listed. Users can view all entries associated with an organism by clicking the number on the right. This function is useful for viewing all relevant information for a particular species.
The whole dataset in mUbiSiDa can also be browsed by Gene Ontology (GO). The result pages list the GO IDs, classified as biological process, cellular component, or molecular function, for the proteins deposited in mUbiSiDa. The GO terms and the corresponding numbers of protein entries are also listed. Users can click a GO ID to view all proteins annotated with it. This gives users a convenient way to view ubiquitinated proteins grouped by GO term, which can support research on the regulation of protein function or biological processes.
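The grouping behind the GO browse pages amounts to indexing proteins first by GO aspect and then by GO ID, and counting the entries in each group. A minimal Python sketch is given below. The function names and the `(protein_id, go_id, aspect)` tuple layout are assumptions for illustration, and the protein identifiers in the usage example are placeholders rather than records from mUbiSiDa.

```python
from collections import defaultdict


def group_by_go(annotations):
    """Index proteins by GO aspect and GO ID.

    annotations: iterable of (protein_id, go_id, aspect) tuples, where aspect is
    e.g. "biological_process", "cellular_component" or "molecular_function".
    """
    groups = defaultdict(lambda: defaultdict(set))
    for protein_id, go_id, aspect in annotations:
        groups[aspect][go_id].add(protein_id)
    return groups


def entry_counts(groups, aspect):
    """Return (go_id, number_of_proteins) pairs for one aspect, largest group first."""
    counts = [(go_id, len(proteins)) for go_id, proteins in groups[aspect].items()]
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


# Placeholder annotations (PROT_A and PROT_B are hypothetical identifiers).
example = [
    ("PROT_A", "GO:0005634", "cellular_component"),  # nucleus
    ("PROT_B", "GO:0005634", "cellular_component"),
    ("PROT_A", "GO:0005737", "cellular_component"),  # cytoplasm
    ("PROT_A", "GO:0005515", "molecular_function"),  # protein binding
]
groups = group_by_go(example)
```

Calling `entry_counts(groups, "cellular_component")` then yields the per-term entry numbers shown beside each GO ID, and `groups["cellular_component"]["GO:0005634"]` gives the set of proteins listed after clicking that ID.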
All data in mUbiSiDa can be downloaded from the “Resources” web page.
To maintain an up-to-date and comprehensive resource, we designed a submission page through which users can submit their own data to mUbiSiDa. The submission page asks users to provide the following items: protein name, ID, organism, ubiquitination site(s), PubMed ID, GO ID, sequence, and other information.
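A submitted record with these items lends itself to simple automatic checks before manual curation. The Python sketch below shows one possible set of checks. The field names, the function name, and the rules themselves (every site must fall on a lysine in the submitted sequence, the PubMed ID must be numeric, the GO ID must have the form `GO:` plus seven digits) are assumptions for illustration, and the source does not state that the submission page performs them.

```python
import re

# Hypothetical field names for the submission items listed above.
REQUIRED = ("protein_name", "protein_id", "organism", "sites",
            "pubmed_id", "go_id", "sequence")


def validate_submission(record):
    """Return a list of problems found in a submitted record (empty if none)."""
    problems = [f"missing field: {f}" for f in REQUIRED if not record.get(f)]
    if problems:
        return problems
    sequence = record["sequence"].upper()
    for pos in record["sites"]:  # 1-based residue positions
        if not 1 <= pos <= len(sequence):
            problems.append(f"site {pos} is outside the sequence")
        elif sequence[pos - 1] != "K":
            problems.append(f"site {pos} is {sequence[pos - 1]}, not lysine (K)")
    if not re.fullmatch(r"\d+", str(record["pubmed_id"])):
        problems.append("PubMed ID must be numeric")
    if not re.fullmatch(r"GO:\d{7}", record["go_id"]):
        problems.append("GO ID must look like GO:0000000")
    return problems
```

A record whose sites all point at K residues and whose identifiers are well formed returns an empty list, while any other record returns human-readable messages that could be shown to the submitter.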
Users can access the detailed information of each entry by clicking the ID in the left column of the result page. All information was carefully checked manually to ensure the accuracy of each entry. Each entry page is divided into six sections. The first section (
mUbiSiDa is a comprehensive mammalian ubiquitination site database that aims to systematically aggregate all experimentally verified ubiquitinated proteins and to help biologists analyze protein stability and the mechanisms of protein degradation. Most data in this database were collected from human and mouse. It should be emphasized that potential ubiquitinated lysine sites in other organisms can be predicted from the results of BLAST SEARCH or Multi-species Alignment Analysis. As a tool designed for biologists and researchers, mUbiSiDa will be continually improved and updated to ensure the convenience and utility of the service, the accuracy of the information, and innovation in style and function.
To obtain an overview of the biological features of human and mouse ubiquitinated proteins, we first classified these proteins according to Gene Ontology (See
In future work, the efficiency of the search process needs to be improved and new search styles added to satisfy different needs. A function for predicting mammalian ubiquitination sites will be added to a future version of mUbiSiDa. With continuous improvement, mUbiSiDa is expected to contribute to research on the regulation of protein function or biological processes.