Online Lectures on Bioinformatics
- DATABASE :
- Biological Databases
- NCBI Home
Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.
- Entrez Search and Retrieval System
Entrez Programming Utilities are tools that provide access to Entrez data outside of the regular web query interface and may be helpful for retrieving search results for future use in another environment.
- KEGG: Kyoto Encyclopedia of Genes and Genomes
A grand challenge in the post-genomic era is a complete computer representation of the cell and the organism, which will enable computational prediction of higher-level complexity of cellular processes and organism behaviors from genomic information. Towards this end, we have been developing a bioinformatics resource named KEGG, Kyoto Encyclopedia of Genes and Genomes, as part of the research projects in the Kanehisa Laboratory of Kyoto University Bioinformatics Center.
- TIGR Gene Indices
The TIGR Gene Index Project is supported in part by funding from the US Department of Energy, Grant #DE-FG02-99ER62852, and the US National Science Foundation, Grant #DBI-9983070. Additional funds are provided by the US National Science Foundation through grants #DBI-9813392 and #DBI-9975866.
- Gramene: A Comparative Mapping Resource for Grains
Gramene is a curated, open-source, Web-accessible data resource for comparative genome analysis in the grasses. Our goal is to facilitate the study of cross-species homology relationships using information derived from public projects involved in genomic and EST sequencing, protein structure and function analysis, genetic and physical mapping, interpretation of biochemical pathways, gene and QTL localization and descriptions of phenotypic characters and mutations.
- MaizeDB
The goals of this project are to provide a central repository for public maize information, to present it in a way that creates intuitive biological connections for the researcher with minimal effort, and to provide a series of computational tools that directly address the questions of the biologist in an easy-to-use form.
- Barley Genomics
Areas of research: Barley Genome Mapping, Map-Based Cloning, Molecular Breeding, Mutant Isolation & Characterization, Functional Genomics, BAC Address Calculator, Developmental Mutants
- EMBL European Bioinformatics Institute
The European Bioinformatics Institute (EBI) is a non-profit academic organisation that forms part of the European Molecular Biology Laboratory (EMBL). The EBI is a centre for research and services in bioinformatics. The Institute manages databases of biological data, including nucleic acid and protein sequences and macromolecular structures.
- A Catalog of Genes for Plant Glycerol Lipid Biosynthesis
The current version of this catalog contains more than 2600 sequence files, many of them with annotation and results of our analysis. This version was updated in August 1999 and includes essentially all publicly available genomic, cDNA, EST and GSS sequences for 62 plant polypeptides involved in lipid metabolism in higher plant species. An important feature of the catalog is the set of multiple alignments of amino acid sequences deduced from genomic and EST sequences. This version of the dataset accounts for approximately 70% of the Arabidopsis genome.
- Grain Genes: A Small Grains and Sugarcane Database
GBrowse, developed by the GMOD group, is a genome browser that provides a wealth of genome annotation for maps in the GrainGenes collection. Users can easily manipulate the view of the chromosome and the type of data displayed.
- PathDB Pathways
PathDB is a beta-level research tool for scientists interested in analyzing their experimental or computational data in the context of biological pathways and networks.
- Enzymes and Metabolic Pathways Database
The Enzymes and Metabolic Pathways database, EMP, is a unique and most comprehensive electronic source of biochemical data. It covers all aspects of enzymology and metabolism and represents the whole factual content of original journal publications.
- Boehringer Mannheim Biochemical Pathways
Roche Applied Science: LightCycler, MagNA Pure LC, Lumi-Imager, PCR
- ExPASy Molecular Biology Server
The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE.
- Nucleic Acids Research: 2000 Biological Database Issue
Nucleic Acids Research (NAR) publishes the results of leading-edge research into physical, chemical, biochemical and biological aspects of nucleic acids and proteins involved in nucleic acid metabolism and/or interactions. It enables the rapid publication of papers under the following categories: chemistry, computational biology, genomics, molecular biology, RNA and structural biology. A Survey and Summary section provides a format for brief reviews. The first issue of each year is devoted to biological databases, and an issue in July is devoted to papers describing web-based software resources of value to the biological community.
- Yeast Protein Database HOME PAGE
Six database volumes of biological information about proteins comprise Incyte's Proteome BioKnowledge Library. Each volume focuses on a different organism important in pharmaceutical research.
- Saccharomyces Genome Database
SGD™ is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast.
- The Breast Cancer Gene Database
A database of genes involved in breast cancer. It is similar to the Tumor Gene Database (below) but limited in scope to genes involved in human breast cancer, and can therefore go into greater depth. The criteria for a gene to be included in this database are that it has been shown to be involved in human breast cancer (rather than only in an animal model) and that there is some evidence that it plays a functional role in the induction or progression of breast cancer.
- The Mammary Transgene Interactive Database
This is an interactive database of literature on research designed to target transgene proteins to the mammary gland. Current emphasis is on biotechnology applications. Addition of tumor model and developmental model literature is planned.
- The Small RNA database
Small RNAs are broadly defined as the RNAs not directly involved in protein synthesis. These are grouped under three categories: 1) capped small RNAs; 2) noncapped small RNAs; and 3) viral small RNAs. Sequences and references are included, and you can run WAIS searches by keyword.
- The Tumor Gene Database
A database of genes associated with tumorigenesis and cellular transformation. This database includes oncogenes, proto-oncogenes, tumor suppressor genes/anti-oncogenes, regulators and substrates of the above, regions believed to contain such genes (such as tumor-associated chromosomal break points and viral integration sites), and other genes and chromosomal regions that seem relevant.
- UW E. coli Genome Project
- Protein Model Database
- PDBREPORT database
- Crystallography software
POPULAR DATA REDUCTION PACKAGES:
- Molecular modeling software, covering molecular mechanics, quantum chemistry, intermolecular interactions and docking, solid state and surface simulations, bioinformatics, electron scattering, electrostatics calculations, benchmarks for computational chemistry applications, molecular visualization and editing, molecular surface computations, protein structure prediction and analysis, cheminformatics, parallelization, I/O and distributed programming in computational chemistry, molecular format converters and other utilities, and other lists of computational software
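The Entrez Programming Utilities listed under NCBI above are plain HTTP endpoints, so retrieving search results for use in another environment can be scripted. A minimal sketch using only the Python standard library; it assumes the ESearch endpoint (`esearch.fcgi`) with its `db`, `term`, `retmax` and `retmode=json` parameters, and the helper names `esearch_url` and `esearch_ids` are illustrative, not part of any NCBI library:

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Base URL of the NCBI Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that asks for matching UIDs in JSON form."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urllib.parse.urlencode(params)


def esearch_ids(db: str, term: str, retmax: int = 20) -> List[str]:
    """Run the search (needs network access) and return the list of UIDs."""
    with urllib.request.urlopen(esearch_url(db, term, retmax)) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]


if __name__ == "__main__":
    # Only builds the URL; no request is sent here.
    print(esearch_url("pubmed", "drug addiction[Title]", retmax=10))
```

Only `esearch_ids` touches the network. NCBI asks clients to keep request rates low (no more than three requests per second without an API key), so large retrievals should be batched.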
- Drug Database :
- DrugBank database
- ZINC, a free database of commercially-available compounds for virtual screening
- Super Drug Database
- Drug Data Base - (Drug Activity Classification)
- PDTD [Potential Drug Target Database]
- Drug Database : a list of freeware that provides databases of information and electronic publications related to prescription drugs.
- TB Drug Target Database
- RxNav : A Semantic Navigation Tool for Clinical Drugs
- TTD: Therapeutic Target Database
- LIGAND Information
PROTEIN-PROTEIN INTERACTION:
Other Bioinformatics Tools
SAVS : Structural Analysis and Verification Server
- ClustalW (http://bioweb.pasteur.fr/seqanal/interfaces/clustalw.html)
Perform a multiple-sequence or profile-profile alignment with ClustalW. Open the website and paste in all or a selection of your class II tRNA synthetase sequences to run the program. ClustalW is one of the most widely used tools in bioinformatics for multiple sequence alignment.
- Psipred (http://bioinf.cs.ucl.ac.uk/psiform.html)
Predict the secondary structure of one of your class II tRNA synthetases with the Psipred Protein Structure Prediction Server. Paste your sequence into the input sequence window and provide your email address; after a few minutes you will receive a secondary structure prediction for your chosen tRNA synthetase. Sequence and structural alignments, together with secondary structure predictions, form the framework for a successful modeling project.
- 3D PSSM (http://www.sbg.bio.ic.ac.uk/~3dpssm/)
A web-based method for protein fold recognition using sequence profiles coupled with secondary structure information.
- TMpred (http://www.ch.embnet.org/software/TMPRED_form.html)
A method that uses scores derived from a database of transmembrane proteins to predict the transmembrane portions of membrane proteins.
- TMHMM (http://www.cbs.dtu.dk/services/TMHMM-2.0/)
A hidden Markov model-based method to predict the transmembrane portions of membrane proteins.
- European Bioinformatics Inst. (http://www.ebi.ac.uk/services/index.html)
An up-to-date and well-organized collection of links to bioinformatics tools, databases, and resources. The site advises on the best or most popular tools in each category and provides short descriptions of all entries.
- ExPASy Molecular Biology Server (http://ca.expasy.org/)
Another well-organized directory of online analysis tools, databases, and other resources, with a greater focus on proteins. "The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures..." With this server you can start your own homology modeling project for a class II tRNA synthetase of unknown structure, namely alanyl-tRNA synthetase. You can obtain its sequence in FASTA format from the SwissProt database, which can be accessed directly from the ExPASy server, using the entry name SYA_ECOLI. As a structural template, choose one of the provided catalytic domain structures of class II tRNA synthetases. You can also model the other domains, for which you need to find appropriate templates among the provided PDB structures.
- SwissModel (http://swissmodel.expasy.org/)
For model generation, use SwissModel, where you can thread your sequence onto one or several of your chosen templates. SwissModel provides an online tutorial and will refine the initial models you submit to its server.
- Dynamic Programming in Java (http://www.dkfz-heidelberg.de/tbi/bioinfo/PracticalSection/AliApplet/index.html)
This is an alternative Smith-Waterman tutorial that provides a web-based interface for dynamic programming, an animated version of the paper-and-pencil exercise in section 5.
- Biology WorkBench (http://workbench.sdsc.edu)
This website allows you to search popular protein and nucleic acid sequence databases. Sequence retrieval is integrated with access to a variety of analysis tools, for example the multiple sequence alignment program ClustalW. The advantage of the Biology Workbench is that all analysis tools are interconnected, eliminating the tedious file conversion that is often needed when using tools hosted at different locations.
- CASP5 (http://predictioncenter.llnl.gov/casp5/Casp5.html)
Every two years a community-wide protein structure prediction contest takes place, in which groups compete to predict unpublished protein structures. You can check how well our Resource did in last year's contest: just search for the Zan Schulten Group results on this site.
- PATHWAY ANALYZING TOOLS :
- g:Profiler: a web-based toolset for functional profiling of gene lists from large-scale experiments; an easy-to-use web server
- KOBAS server, used e.g. for elucidating pathways in addiction
- takes both FASTA files and lists of genes
- caveats
- excise the gi number from a typical NCBI FASTA entry to get unique IDs
- only about 1/3 of genes will get annotated in the first step
- Li, Chuan-Yun, Xizeng Mao, and Liping Wei. “Genes and (Common) Pathways Underlying Drug Addiction.” PLoS Computational Biology 4, no. 1 (2008).
- GSEA "Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states"
- objections (Damian D, Gorfine M. Statistical concerns about the GSEA procedure): http://www.nature.com/ng/journal/v36/n7/full/ng0704-663a.html and reply: http://www.nature.com/ng/journal/v36/n7/full/ng0704-663b.html
- ErmineJ "ErmineJ performs analyses of gene sets in expression microarray data. A typical goal is to determine whether particular biological pathways are "doing something interesting" in the data. The software is designed to be used by biologists with little or no informatics background."
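g:Profiler, KOBAS and ErmineJ each use their own statistics, but the basic question of whether a gene list hits a pathway more often than chance is commonly framed as a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). A minimal sketch, assuming a fixed background of annotated genes; it is not the exact procedure of any tool listed here, and GSEA in particular uses a rank-based running-sum statistic instead. The function name `hypergeom_sf` and the numbers in the example are illustrative:

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """One-sided over-representation p-value P(X >= k).

    X ~ Hypergeometric: N query genes are drawn from a background of M genes,
    n of which are annotated to the pathway; k query genes hit the pathway.
    """
    total = comb(M, N)
    # math.comb returns 0 when asked for more items than exist, so
    # impossible outcomes simply contribute nothing to the tail sum.
    tail = sum(comb(n, i) * comb(M - n, N - i)
               for i in range(max(k, 0), min(n, N) + 1))
    return tail / total  # exact big-integer ratio converted to float


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, a 100-gene pathway,
    # and a 200-gene query list with 10 pathway hits (about 1 expected).
    print(f"p = {hypergeom_sf(10, 20000, 100, 200):.3g}")
```

Because one such p-value is computed for every gene set tested, the results need a multiple-testing correction before they are interpreted.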
- BIOTECHNOLOGY :
FREE JOURNAL :
Interesting links for Structural Genomics
Proteins
NR
All non-redundant GenBank CDS translations + PDB + SwissProt + PIR
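The KOBAS caveat above about excising the gi number from an NCBI FASTA entry can be handled with a small parser. This sketch assumes the legacy `>gi|<number>|<db>|<accession>|` header layout found in older NR-style downloads; newer NCBI files may carry only accession.version identifiers, in which case the helper returns `None`. The function names and the example record are hypothetical:

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs; the '>' is dropped."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gi_number(header: str) -> Optional[str]:
    """Return the gi number from a 'gi|<number>|...' identifier, else None."""
    tokens = header.split()
    if not tokens:
        return None
    fields = tokens[0].split("|")  # identifier token, split on pipes
    if len(fields) >= 2 and fields[0] == "gi":
        return fields[1]
    return None


if __name__ == "__main__":
    # Hypothetical record in the legacy NCBI header style.
    demo = ">gi|123456|ref|NP_000000.1| hypothetical protein\nMKTAY\nIAKQR\n"
    for header, seq in parse_fasta(demo):
        print(gi_number(header), seq)  # prints: 123456 MKTAYIAKQR
```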
OWL
A non-redundant composite of 4 publicly-available primary sources: SWISS-PROT, PIR (1-3), GenBank (translation) and NRL-3D.
SWISSPROT
A curated protein sequence database
trEMBL
A supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT
PIR
A comprehensive, annotated, and non-redundant set of protein sequence databases in which entries are classified into family groups and alignments of each group are available.
PDB
An archive of experimentally determined three-dimensional structures of biological macromolecules
UNIGENE
An experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters.
dbEST
A division of GenBank that contains sequence data and other information on "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms.
Families
PIR/MIPS
Classification by protein (super)family and homology domains
Proclass
A non-redundant protein database organized according to family relationships as defined collectively by ProSite patterns and PIR superfamilies.
prodom
A protein domain database consisting of an automatic compilation of homologous domains from SWISS-PROT 36, TrEMBL and TrEMBL updates.
DOMO
A protein domain database consisting of an automatic compilation of domains from SwissProt and PIR.
SBASE
A protein cluster database
protomap
A classification of all proteins in the SwissProt database into clusters of related proteins.
pfam
A large collection of multiple sequence alignments and hidden Markov models covering many common protein domains.
Picasso
PSSP (Protein Sequence Space Partitioning) is derived from nrdb90 (March 1998).
SYSTERS
A clustering of the PIR1 (Rel. 51) and SWISS-PROT (Rel. 34) databases.
Molecular Sequence Megaclassification
A server that provides access to a non-redundant molecular sequence collection that has been classified by different research groups.
BLOCKS
Multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins.
PROSITE
A database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs.
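PROSITE patterns use a compact syntax that maps directly onto regular expressions; for example, the P-loop (Walker A) motif is commonly written `[AG]-x(4)-G-K-[ST]`. A minimal sketch of that translation, covering only the common elements named in the docstring; alternatives that include a terminus, such as `[G>]`, are not handled, and the function name `prosite_to_regex` is illustrative:

```python
import re

# One PROSITE element: a residue spec plus an optional (n) or (n,m) repeat.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into a Python regular expression.

    Supports: x (any residue), [..] (allowed residues), {..} (forbidden
    residues), (n) and (n,m) repeats, < (N-terminus), > (C-terminus),
    and the trailing period.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]   # anchored at the N-terminus
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # anchored at the C-terminus
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        spec, lo, hi = m.groups()
        if spec == "x":
            rx = "."
        elif spec.startswith("{"):
            rx = "[^" + spec[1:-1] + "]"  # {P} means "anything but P"
        else:
            rx = spec  # a single residue or a [..] class is already valid regex
        if lo is not None:
            rx += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + rx + suffix)
    return "".join(parts)


if __name__ == "__main__":
    p_loop = prosite_to_regex("[AG]-x(4)-G-K-[ST].")
    print(p_loop)                                      # [AG].{4}GK[ST]
    print(re.search(p_loop, "MGESGSGKSTLL").group())  # GESGSGKS
```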
prints
A compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family; its diagnostic power is refined by iterative scanning of OWL.
HSSP
A database of homology-derived secondary structure of proteins.
COG
Clusters of Orthologous Groups (COGs) were delineated by comparing protein sequences encoded in 8 complete genomes, representing 6 major phylogenetic lineages.
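The COG procedure starts from genome-specific best hits and looks for consistent triangles of them across three genomes; triangles sharing a side are then merged into clusters. A much simplified sketch, assuming best hits have already been computed (for example with BLAST) and stored in a dictionary; it only finds triangles of mutual best hits and leaves out the merging step and the handling of in-genome paralogs. All names and the example data are hypothetical:

```python
from itertools import combinations
from typing import Dict, List, Tuple

# best_hit[(protein, genome)] -> that protein's best hit in the given genome
BestHits = Dict[Tuple[str, str], str]


def mutual_best(best_hit: BestHits, genome_of: Dict[str, str], a: str, b: str) -> bool:
    """True if a and b are each other's best hit in the other's genome."""
    return (best_hit.get((a, genome_of[b])) == b
            and best_hit.get((b, genome_of[a])) == a)


def best_hit_triangles(best_hit: BestHits,
                       genome_of: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Triples of proteins from three genomes that are pairwise mutual best hits."""
    triangles = []
    for a, b, c in combinations(sorted(genome_of), 3):
        if len({genome_of[a], genome_of[b], genome_of[c]}) < 3:
            continue  # a triangle must span three different genomes
        if (mutual_best(best_hit, genome_of, a, b)
                and mutual_best(best_hit, genome_of, b, c)
                and mutual_best(best_hit, genome_of, a, c)):
            triangles.append((a, b, c))
    return triangles


if __name__ == "__main__":
    genomes = {"e1": "E", "h1": "H", "m1": "M"}
    hits = {("e1", "H"): "h1", ("e1", "M"): "m1", ("h1", "E"): "e1",
            ("h1", "M"): "m1", ("m1", "E"): "e1", ("m1", "H"): "h1"}
    print(best_hit_triangles(hits, genomes))  # [('e1', 'h1', 'm1')]
```

The brute-force loop over all triples is only meant for small toy inputs.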
Structure Classification
Dali/FSSP
A network service for comparing protein structures in 3D.
SCOP
Structural Classification of Proteins.
CATH
A novel hierarchical classification of protein domain structures, which clusters proteins at four major levels, class(C), architecture(A), topology(T) and homologous superfamily (H).
Genome
SGD
A scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae
YPD
A protein database with emphasis on the physical and functional properties of the yeast proteins.
MIPS
The Yeast Genome database
Yeast Gene Duplications
This Web site contains data on duplicated genes in the yeast (Saccharomyces cerevisiae) genome.
atDB
Arabidopsis thaliana Genome Database
Haemophilus influenzae
Genome information for Haemophilus influenzae
FlyBase
A Database of the Drosophila Genome
ACEDB
A Database of the C. elegans Genome
MDG
Mouse Genome Informatics
TIGR Microbial Database
A listing of microbial genomes and chromosomes completed and in progress
Human
GDB
The official central repository for genomic mapping data resulting from the Human Genome Initiative.
HGMD
Human Gene Mutation Database
OMIM
Online Mendelian Inheritance in Man. A catalog of human genes and genetic disorders
CGAP
An interdisciplinary program to establish the information and technological tools needed to decipher the molecular anatomy of a cancer cell.
GeneCard
A database of human genes, their products and their involvement in diseases.
HUGO
Human Gene Nomenclature Committee
TGDB
The Tumor Gene Database
Functions
WIT
An environment for interpreting sequenced genomes, supporting metabolic reconstruction.
KEGG
Kyoto Encyclopedia of Genes and Genomes
DIP
Database of Interacting Proteins
Yeast Expression Database
This website contains the complete data sets for the experiments in the paper DeRisi et al., Science 278: 680-686, as well as the images of the whole-genome microarrays.
Signaling
HIC-Up
A resource for structural biologists dealing with hetero-compounds.
ReliBase
A database system for analysing receptor/ligand complexes deposited in the Brookhaven Protein Databank.
Prediction
TMpred
A program that predicts membrane-spanning regions and their orientation.
TMAP
Transmembrane protein fragment prediction program
DAS
Transmembrane protein fragment prediction program
SOSUI
Transmembrane protein fragment prediction program
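The transmembrane predictors listed above use their own, more refined methods, but the simplest underlying idea is a hydropathy sliding window. A sketch assuming the Kyte-Doolittle hydropathy scale with the commonly cited 19-residue window and 1.6 threshold; it is an illustration of the idea, not the algorithm of any program listed here, and the function names and test sequences are hypothetical:

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full-length window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge overlapping above-threshold windows into 0-based, end-exclusive spans."""
    segments: List[Tuple[int, int]] = []
    for start, score in enumerate(hydropathy_profile(seq, window)):
        if score <= threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # extend the current span
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    toy = "S" * 20 + "L" * 20 + "S" * 20  # hypothetical hydrophobic stretch
    print(candidate_tm_segments(toy))      # [(11, 49)]
```

The reported span is wider than the 20-residue hydrophobic stretch because every window whose average clears the threshold is merged in; residues outside the 20 standard amino acids raise a `KeyError`.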
COILS
Coiled coil fragment prediction program
Paircoil
Coiled coil fragment prediction program
The PredictProtein server
PHDsec, PHDacc, PHDhtm, PHDtopology, TOPITS, MaxHom, EvalSec
PREDATOR
A secondary structure prediction program
GOR IV
A secondary structure prediction program
NNPREDICT
A secondary structure prediction program
SSPRED
A secondary structure prediction program
123D
A threading program to use residue-residue contact potentials for checking the compatibility of 3D structures with a sequence (1D).
UCLA-DOE
A threading protein structure prediction server. Besides threading, it also integrates other sequence and structure prediction and analysis software from around the world.
Threader
A threading protein structure prediction program
Swiss-Model
An Automated Comparative Protein Modelling Server
MODELLER
A program for homology protein structure modelling by satisfaction of spatial restraints.
Calculations
Peptide Mass
Computes peptide masses
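The core of a peptide mass calculation can be sketched as summing residue masses and adding one water for the free termini. This assumes unmodified linear peptides and standard monoisotopic residue masses; dedicated tools typically also handle modifications, average masses and enzymatic digests. The function names are illustrative:

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added back for the N- and C-termini
PROTON = 1.007276   # mass of a proton


def peptide_mass(seq: str) -> float:
    """Monoisotopic neutral mass [M] of an unmodified linear peptide."""
    return sum(MONO_RESIDUE[aa] for aa in seq.upper()) + WATER


def mz(mass: float, charge: int) -> float:
    """m/z of the protonated ion [M + zH]^z+."""
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    m = peptide_mass("GG")  # diglycine
    print(round(m, 4), round(mz(m, 1), 4))
```

Cysteines are treated as unmodified here; alkylated cysteines would need an extra fixed mass.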
Compute pI/Mw tool
Computes the theoretical isoelectric point (pI) and molecular weight (Mw) of a protein sequence
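The pI half of such a calculation can be sketched by estimating the net charge with the Henderson-Hasselbalch equation and bisecting on pH until the charge crosses zero. The pKa values below are illustrative assumptions; published pKa sets differ (the ExPASy tool uses its own set), so the numbers will not match any particular server. Function names are hypothetical:

```python
# Illustrative pKa values (an assumption; published sets differ).
PKA = {"Nterm": 8.0, "Cterm": 3.1, "K": 10.0, "R": 12.0, "H": 6.5,
       "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
POSITIVE = ("K", "R", "H")
NEGATIVE = ("D", "E", "C", "Y")


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of a peptide's net charge at a given pH."""
    seq = seq.upper()
    charge = 1 / (1 + 10 ** (ph - PKA["Nterm"]))   # free amino terminus
    charge -= 1 / (1 + 10 ** (PKA["Cterm"] - ph))  # free carboxy terminus
    for aa in POSITIVE:
        charge += seq.count(aa) / (1 + 10 ** (ph - PKA[aa]))
    for aa in NEGATIVE:
        charge -= seq.count(aa) / (1 + 10 ** (PKA[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH in [0, 14]; the net charge falls as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(round(isoelectric_point("ACDEFGHIKLMNPQRSTVWY"), 2))
```

Bisection works because the net charge decreases monotonically with pH, so there is exactly one zero crossing to find.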
Translate tool
A tool that translates a nucleotide (DNA/RNA) sequence into a protein sequence.
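Translation itself is a codon lookup. A minimal sketch assuming the standard genetic code, written as the usual 64-letter string in TCAG order; `*` marks a stop codon and `X` marks an unreadable codon (an illustrative choice). Only one forward reading frame is translated, and the function name is hypothetical:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate DNA or RNA in one forward frame (0, 1 or 2)."""
    seq = seq.upper().replace("U", "T")  # accept RNA input
    protein = []
    for i in range(frame, len(seq) - 2, 3):  # complete codons only
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCATTGTAATGGGCCGCTGA"))  # MAIVMGR*
```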
CLUSTALW
A multiple sequence alignment program
MSA
A multiple sequence alignment program
Multalin
A multiple sequence alignment program
ALIGN
A multiple sequence alignment program
AMAS
A multiple sequence alignment program
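Pairwise alignment by dynamic programming underlies many alignment tools; its local form, the Smith-Waterman algorithm mentioned in the Dynamic Programming in Java entry, can be sketched in a few lines. Assumptions: a simple match/mismatch score and a linear gap penalty with illustrative values, instead of the substitution matrices (such as BLOSUM62) and affine gap penalties used in practice. The function name is hypothetical:

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best local alignment score and one optimal alignment (linear gaps)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(smith_waterman("TTACGTT", "GGACGGG"))  # (6, 'ACG', 'ACG')
```

The quadratic table is fine for teaching-sized sequences; database search tools such as BLAST use heuristics to avoid filling it for every pair.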
NCBI BLAST programs
NCBI's sequence similarity search tool designed to support analysis of nucleotide and protein databases.
GCG
Software for the Analysis of Genes and Proteins
GeneQuiz
A system that provides automated analysis of biological sequences.
Others
PRESAGE
A database of proteins for structural genomics; it has both experimental and theoretical prediction information.
PSI
Protein Structure Initiative Database. A database that helps in selecting and tracking protein targets.
PubMed
A literature reference database
ENZYME
A repository of information relating to the nomenclature of enzymes.
TUTORIAL
Terry Gaasterland's tutorial: The Role of Computational Biology in High-Throughput Structure Determination