INAMBIOINFORMATICS

Modeller Manual

Retrieving Sequence from NCBI

The sequence is retrieved from NCBI using the keyword "Human Erythrocyte Membrane Protein band 4.2". The protein sequence is downloaded in FASTA format and saved in a plain-text file (for example, in Notepad).

Template searching using Protein Blast

The band 4.2 protein sequence in FASTA format was submitted to BLAST to find a template for the target sequence. Suitable templates for the protein are then identified based on sequence similarity (>= 40% identity) using BLASTP (Basic Local Alignment Search Tool for proteins).
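The 40% identity cutoff can also be applied with a short script. The following is only a sketch: it assumes the BLAST results were saved in tabular form (tab-separated, subject ID in column 2 and percent identity in column 3, as in the BLAST+ "outfmt 6" layout). The function name filter_hits and the hit names in the example are hypothetical.

```python
def filter_hits(tabular_text, min_identity=40.0):
    """Return (subject_id, percent_identity) pairs at or above the cutoff.

    Assumes tab-separated BLAST output with the subject ID in column 2
    and the percent identity in column 3.
    """
    hits = []
    for line in tabular_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject_id = fields[1]
        identity = float(fields[2])
        if identity >= min_identity:
            hits.append((subject_id, identity))
    # Highest identity first
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Hypothetical rows for illustration only (not real BLAST results)
example = (
    "epb42\thit_a\t41.5\t600\n"
    "epb42\thit_b\t52.0\t650\n"
    "epb42\thit_c\t33.1\t400\n"
)

for subject_id, identity in filter_hits(example):
    print(subject_id, identity)
```

Sorting by identity puts the most similar candidate templates first; other criteria, such as alignment coverage, still have to be checked by hand.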

Retrieve template 3D structure from PDB

Choose the templates that have high similarity to the target sequence, and obtain their 3D structures from the RCSB Protein Data Bank.

Modeling the target sequence using Modeller 9v7

Modeller 9v7 is used to predict the structure of the target sequence by homology modeling in three steps. The template 3D structure file in PDB format and the target sequence file in FASTA format are needed to run Modeller 9v7. These files must be in the Modeller folder, along with three basic files: the sequence in .ali format, the alignment script in .py format, and the model script in .py format.
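The sequence file in .ali (PIR) format can be typed by hand, but a small helper saves time. The following is a minimal sketch, assuming a FASTA file with a single record. The function name fasta_to_pir and the 75-character line width are choices made for this illustration, not part of Modeller.

```python
def fasta_to_pir(fasta_text, code, width=75):
    """Convert a single-record FASTA string to a Modeller-style PIR record."""
    lines = [ln.strip() for ln in fasta_text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    # Join the residue lines, drop any existing terminator, then add one '*'
    residues = "".join(lines[1:]).rstrip("*") + "*"
    # Wrap the sequence into fixed-width lines
    chunks = [residues[i:i + width] for i in range(0, len(residues), width)]
    header = ">P1;" + code
    # Ten colon-separated fields; only the type and the code are filled in
    description = "sequence:" + code + ":::::::0.00: 0.00"
    return "\n".join([header, description] + chunks) + "\n"


# Short made-up record, just to show the output layout
print(fasta_to_pir(">example\nMGQGEP\nSQRSTG\n", "epb42"))
```

The returned text can be written to a file such as epb42.ali with the usual open(...).write(...) call.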

The steps are as follows:

1. The target sequence must be converted to PIR format, and the file must be saved with the .ali extension.

>P1;epb42
sequence:epb42:::::::0.00: 0.00
MGQGEPSQRSTGLAGLYAAPAASPVFIKGSGMDALGIKSCDFQAARNNEEHHTKALSSRRLFVRRGQPFTIILYFRAPVRAFLPALKKVALTAQTGEQPSKINRTQATFPISSLGDRKWWSAVVEERDAQSWTISVTTPADAVIGHYSLLLQVSGRKQLLLGQFTLLFNPWNREDAVFLKNEAQRMEYLLNQNGLIYLGTADCIQAESWDFGQFEGDVIDLSLRLLSKDKQVEKWSQPVHVARVLGALLHFLKEQRVLPTPQTQATQEGALLNKRRGSVPILRQWLTGRGRPVYDGQAWVLAAVACTVLRCLGIPARVVTTFASAQGTGGRLLIDEYYNEEGLQNGEGQRGRIWIFQTSTECWMTRPALPQGYDGWQILHPSAPNGGGVLGSCDLVPVRAVKEGTLGLTPAVSDLFAAINASCVVWKCCEDGTLELTDSNTKYVGNNISTKGVGSDRCEDITQNYKYPEGSLQEKEVLERVEKEKMEREKDNGIRPPSLETASPLYLLLKAPSSLPLRGDAQISVTLVNHSEQEKAVQLAIGVQAVHYNGVLAAKLWRKKLHLTLSANLEKIITIGLFFSNFERNPPENTFLRLTAMATHSESNLSCFAQEDIAICRPHLAIKMPE
KAEQYQPLTASVSLQNSLDAPMEDCVISILGRGLIHRERSYRFRSVWPENTMCAKFQFTPTHVGLQRLTVEVDCNMFQNLTNYKSVTVVAPELSA*

2. Next, the align.py Python script is used to create the .pap and .ali alignment files.
align.py script:
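from modeller import *
env = environ()
aln = alignment(env)
# Read chain A of the template structure
mdl = model(env, file='1KV3A', model_segment=('FIRST:A','LAST:A'))
aln.append_model(mdl, align_codes='1KV3A', atom_files='1KV3A.pdb')
# Add the target sequence from the PIR file
aln.append(file='epb42.ali', align_codes='epb42')
# Align the target sequence to the template structure
aln.align2d()
# Write the alignment in PIR (.ali) and PAP (.pap) formats
aln.write(file='epb42-1KV3A.ali', alignment_format='PIR')
aln.write(file='epb42-1KV3A.pap', alignment_format='PAP')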
The resulting .ali file contains the alignment of the target sequence with the template sequence. The output is the epb42-1KV3A.ali file.

3. Using the epb42-1KV3A.ali file, the models are built by the model.py script. This script creates three models.
model.py script:
from modeller import *
from modeller.automodel import *
env = environ()
# Build models of epb42 from the template 1KV3A using the alignment file
a = automodel(env, alnfile='epb42-1KV3A.ali',
              knowns='1KV3A', sequence='epb42',
              assess_methods=(assess.DOPE, assess.GA341))
# Generate three models (indices 1 to 3)
a.starting_model = 1
a.ending_model = 3
a.make()
After these models are created, they undergo validation analysis. In advanced modeling, loop refinement can additionally be used to improve the loop regions of the models.
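Because model.py requests DOPE and GA341 assessment, the three models can also be ranked by score before they are inspected visually. The following is a sketch, assuming the per-model results are available as a list of dictionaries with the keys 'name', 'failure' and 'DOPE score' (the layout of a.outputs after a.make() in Modeller). The helper best_by_dope and the scores in the example are hypothetical.

```python
def best_by_dope(outputs):
    """Return the successfully built model with the lowest DOPE score."""
    # Keep only models that were built without an error
    ok_models = [m for m in outputs if m["failure"] is None]
    if not ok_models:
        raise ValueError("no model was built successfully")
    # A lower (more negative) DOPE score indicates a better model
    return min(ok_models, key=lambda m: m["DOPE score"])


# Made-up scores for illustration only; real values come from Modeller
example_outputs = [
    {"name": "epb42.B99990001.pdb", "failure": None, "DOPE score": -100.0},
    {"name": "epb42.B99990002.pdb", "failure": None, "DOPE score": -150.0},
    {"name": "epb42.B99990003.pdb", "failure": None, "DOPE score": -120.0},
]

print(best_by_dope(example_outputs)["name"])
```

Inside model.py, the same function could be called as best_by_dope(a.outputs) after a.make(). A score ranking complements the visual check and does not replace it.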

Loop Refining

After the 3D structures of the modeled protein were completed, they were analyzed in the PyMOL viewer to find the best one. In the best model, the loop region is refined using the amino acid sequence from an existing coordinate file. Finally, we obtained 3 different, independently optimized loop conformations by setting the loop.ending_model parameter to 3. The next image shows the superimposition of the 3 models on the existing coordinate file. The best model is then selected based on its RMSD value. The script is saved with the .py extension.
# Loop refinement of an existing model
from modeller import *
from modeller.automodel import *
log.verbose()
env = environ()
# directories for input atom files
env.io.atom_files_directory = './:../atom_files'
# Create a new class based on 'loopmodel' so that we can redefine
# select_loop_atoms (necessary)
class MyLoop(loopmodel):
    # This routine picks the residues to be refined by loop modeling
    def select_loop_atoms(self):
        # refine residues 73 through 154
        return selection(self.residue_range('73', '154'))
m = MyLoop(env,
           inimodel='EPB42.pdb',  # initial model of the target
           sequence='EPB42')      # code of the target
m.loop.starting_model = 1           # index of the first loop model
m.loop.ending_model = 3             # index of the last loop model
m.loop.md_level = refine.very_fast  # loop refinement method; this yields
                                    # models quickly but of low quality;
                                    # use refine.slow for better models
m.make()

Modelled Structure Validation
The models were validated using several criteria, such as the RMSD value and the PROCHECK results. Many tools and online servers provide validation functions for protein models. We considered the RMSD value, the PROCHECK results, and the number of amino acids in disallowed regions.
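For reference, the RMSD value is the square root of the mean squared distance between matched atoms. The following is a minimal sketch of that calculation, assuming the two structures are already superimposed (for example, in PyMOL) and that the coordinates are given as equally long lists of matched (x, y, z) tuples, such as the C-alpha atoms. The function name rmsd is chosen for this illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        # Squared distance between the two matched atoms
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))


# Two toy points displaced by 1 unit each (not real structure data)
print(rmsd([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
           [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]))
```

This sketch does not perform the superimposition itself; it only measures the deviation between coordinates that are already aligned.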

Monday, March 07, 2016
