1 Resources

  1. PDB; SCOP 2; CATH
  2. DALI
  3. Secondary structure predictions
  4. PSI-BLAST: simply use the BLASTP server, but check the option “PSI-BLAST” under “Program selection”.
  5. PFAM

2 Exercises

2.1 Protein structures and structural domains

  1. Go to PDB and find the structure 1m3q. What does this protein do? Where does it come from? Fool around the page to get to know it a bit.
  2. Go to 3d view, if possible select the “NGL” viewer (bottom right of the window) and play with the settings. What do you see? Which structural elements can you identify? Try the following:
    • Color structure by chain
    • Check the views: licorice, surface, backbone
    • Color by: structure, hydrophobicity (if you are using the default viewer, click on the three dots next to the polymer, then click “Cartoon representation” or “add/remove representation”. The NGL viewer is by far easier to use)
  3. In a new browser tab, open the structure 1xqo. Compare both structures. Do you find similarities? Differences?
  4. Go to the DALI server and choose “Pairwise”, for pairwise structural alignments. Structural alignments not only look at a sequence, but also on its structure to align two proteins. Align 1m3q and 1xqo (use the “+” button to enter the second sequence), but use the identifiers 1m3qA and 1xqoA to indicate that you want to align the A chains of the structures. When the results appear, click on the “interactive” page.
  5. What do upper and lower case letters mean in the alignment?
  6. Select the 1xqo sequence and click on “Structural alignment”. What do you see?
  7. Now click on “3D superimposition”. What do you see? What is that? Can you guess? (we will discuss it in a minute)

2.2 Simple secondary structure predictions

You will use the following four sequences: a leucine zipper, isomerase and rhodopsin.

>leucine zipper
MKDPAALKRARNTEAARRSRARKLQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER
>isomerase
APRKFFVGGNWKMNGDKKSLGELIHTLNGAKLSADTEVVCGAPSIYLDFARQKLDAKIGVAAQNCYKVPK
GAFTGEISPAMIKDIGAAWVILGHSERRHVFGESDELIGQKVAHALAEGLGVIACIGEKLDEREAGITEK
VVFEQTKAIADNVKDWSKVVLAYEPVWAIGTGKTATPQQAQEVHEKLRGWLKTHVSDAVAQSTRIIYGGS
VTGGNCKELASQHDVDGFLVGGASLKPEFVDIINAKH
>rhodopsin
MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRT
PLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVC
KPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVV
HFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQG
SDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA
  1. Use the “Predict protein” server on all of the three sequences (best open them in three tabs). What differences do you see? What can you learn about the structure of these proteins?

  2. In another tab, go to the Protscale server and enter the rhodopsin sequence. Select “Kyte-Doolittle”. How does it compare to the predictions made with “Predict protein” about the transmembrane segments (section “Topology”)?

  3. Go to the COILS server and compare the coiled-coil predictions for the sequences of the isomerase and the leucine zipper. What do you see?

  4. Go to PDB and view the records 1YSA, 1LN6, 8TIM. How do these three structures relate to the predictions you have made?

2.3 PSI-BLAST, Hmmer, Pfam

In the following, use the following sequence.

>bHLH
MNSSSANITYASRKRRKPVQKTVKPIPAEGIKSNPSKRHRDRLNTELDRL
ASLLPFPQDVINKLDKLSVLRLSVSYLRAKSFFDVALKSSPTERNGGQDN
CRAANFREGLNLQEGEFLLQALNGFVLVVTTDALVFYASSTIQDYLGFQQ
SDVIHQSVYELIHTEDRAEFQRQLHWALNPSQCTESGQGIEEATGLPQTV
VCYNPDQIPPENSPLMERCFICRLRCLLDNSSGFLAMNFQGKLKYLHGQK
KKGKDGSILPPQLALFAIATPLQPPSILEIRTKNFIFRTKHKLDFTPIGC
DAKGRIVLGYTEAELCTRGSGYQFIHAADMLYCAESHIRMIKTGESGMIV
FRLLTKNNRWTWVQSNARLLYKNGRPDYIIVTQRPLTDEEGTEHLRKRNT
KLPFMFTTGEAVLYEATNPFPAIMDPLPLRTKNGTSGKDSATTSTLSKDS
LNPSSLLAAMMQQDESIYLYPASSTSSTAPFENNFFNESMNECRNWQDNT
APMGNDTILKHEQIDQPQDVNSFAGGHPGLFQDSKNSDLYSIMKNLGIDF
EDIRHMQNEKFFRNDFSGEVDFRDIDLTDEILTYVQDSLSKSPFIPSDYQ
QQQSLALNSSCMVQEHLHLEQQQQHHQKQVVVEPQQQLCQKMKHMQVNGM
FENWNSNQFVPFNCPQQDPQQYNVFTDLHGISQEFPYKSEMDSMPYTQNF
ISCNQPVLPQHSKCTELDYPMGSFEPSPYPTTSSLEDFVTCLQLPENQKH
GLNPQSAIITPQTCYAGAVSMYQCQPEPQHTHVGQMQYNPVLPGQQAFLN
KFQNGVLNETYPAELNNINNTQTTTHLQPLHHPSEARPFPDLTSSGFL
  1. Go to PFAM and use the “Sequence search”, enter the sequence above. (if it takes too long, use the identifier AHR_HUMAN).

  2. Once you have a hit, analyse the result. What domains does the protein contain? What is PAS? What is HLH? What does that protein do?

    • Click on the “TreeFam” view (if it does not display, click on “Wormbase”, then click on the default “Model” again). What do you see? How many homologs of AHR_HUMAN are there in zebrafish? (what is zebrafish anyway?) Do they all have the same domains?
    • One of the homologs is called “AHRR”. Find out what it does.
  3. Click on the HLH and PAS domains. You will get to the records describing the domains. For the HLH domain, click on “architectures” (top of the page). What do you find?

    • What is an “architecture”?
    • How many architectures are there for HLH and how many for PAS? What does it indicate?
  4. (additional) Go to chaper 10 of the Comparative Genomics book on NCBI pages and read section 3, “Problem 2” (“Problem 1” was shown by me). Use instructions from 2.2. Remember to set the E value threshold to 10!

3 Homework

No homework today! Yay!