Biomedical Informatics: Mutation Assignments
Task:
These assignments are intended to familiarize course participants with some popular tools and resources. Hints are given to guide searches through the (sometimes extensive) information provided by the web interfaces. Treat these hints as suggestions rather than unique solutions, because there are sometimes other ways to reach the same answer. Send the answers to a.p.goultiaev@umail.leidenuniv.nl before October 18, 2016, 23:59.
Important: please do not send attachments or hyperlinks in your emails. A brief description of what was done and what result was obtained is sufficient (the questions in each assignment make clear what is expected). Exception: for assignment 5, a Word file with a short description of the results may be submitted. The relative weights of the assignments are given below (total 10 points). The final course grade is calculated as exam (80%) + assignments (20%).
1. Using the NCBI dbSNP database (www.ncbi.nlm.nih.gov), retrieve the data on the SNP with accession rs7412. Is it clinically significant? What is the name of the mutated gene? Is this SNP a silent mutation or not? If it changes the amino acid sequence of the encoded protein, give the wild-type and mutated codons and amino acids. Give some examples of diseases that are possibly associated with this mutation.
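Deciding whether a coding SNP is silent amounts to translating the wild-type and mutated codons with the standard genetic code and comparing the resulting amino acids. The Python sketch below does this; it assumes both codons are given on the coding (sense) strand, so if dbSNP reports the alleles on the opposite strand, take the reverse complement first. The function name and the example codons are illustrative only and are not the answer for rs7412.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). The string lists amino acids
# for codons ordered T, C, A, G at each position, first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(wt_codon: str, mut_codon: str) -> tuple[str, str, str]:
    """Translate both sense-strand codons and label the amino acid change."""
    # Accept DNA or RNA spelling (U is mapped back to T).
    wt_aa = CODON_TABLE[wt_codon.upper().replace("U", "T")]
    mut_aa = CODON_TABLE[mut_codon.upper().replace("U", "T")]
    if wt_aa == mut_aa:
        kind = "silent"
    elif mut_aa == "*":
        kind = "nonsense"
    elif wt_aa == "*":
        kind = "stop-loss"
    else:
        kind = "missense"
    return wt_aa, mut_aa, kind


# Illustrative codons only (not the answer for rs7412).
for wt, mut in [("GCC", "GCT"), ("GAG", "GTG"), ("TGG", "TAG")]:
    print(wt, "->", mut, classify_substitution(wt, mut))
```

A missense result means the codon change alters the protein, which is the case in which the question asks for both codons and both amino acids.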
2. Using the National Human Genome Research Institute (NHGRI) Catalog of Published Genome-Wide Association Studies (GWAS) [NHGRI-EBI Catalog] (www.ebi.ac.uk/gwas), determine how many genome-wide association studies have identified a trait association with the SNP from the previous assignment (rs7412).
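One point worth keeping in mind is that the catalog lists associations, and a single study can report several associations for the same SNP, so the number of studies is the number of distinct study identifiers, not the number of rows. The sketch below counts distinct studies in a tab-separated table such as the catalog's associations download. The column names `SNPS` and `STUDY ACCESSION` are assumptions about that file's header and should be checked against the file you actually obtain; the toy table uses hypothetical accessions.

```python
import csv
import io
import re


def count_studies_for_snp(tsv_text: str, rsid: str,
                          snp_column: str = "SNPS",
                          study_column: str = "STUDY ACCESSION") -> int:
    """Count distinct studies whose association rows mention `rsid`.

    The default column names are an assumption about the GWAS Catalog
    associations download; check them against the header of your file.
    """
    # QUOTE_NONE: treat quote characters in free-text fields as ordinary text.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    studies = set()
    for row in reader:
        # One SNPS cell can list several rsIDs (haplotypes, interactions),
        # so extract whole rsID tokens instead of doing a substring test.
        if rsid in re.findall(r"rs\d+", row.get(snp_column) or ""):
            studies.add(row[study_column])
    return len(studies)


# Toy table with hypothetical accessions, only to show the counting logic.
toy = (
    "SNPS\tSTUDY ACCESSION\tDISEASE/TRAIT\n"
    "rs7412\tGCST_A\ttrait 1\n"
    "rs7412\tGCST_A\ttrait 2\n"
    "rs429358; rs7412\tGCST_B\ttrait 3\n"
    "rs74120\tGCST_C\ttrait 4\n"
)
print(count_studies_for_snp(toy, "rs7412"))  # 2
```

Matching whole rsID tokens matters: a plain substring test would wrongly count `rs74120` as a hit for `rs7412`.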
3. The enzyme EZH2 is involved in the development of cancers such as lymphoma. Using the Ensembl genome browser (www.ensembl.org), determine the location (chromosome, nucleotide positions) of the EZH2 gene (when searching by the name EZH2, choose the “Best gene match” in the search results). How many splice-variant transcripts are known for this gene? In the transcript table, how many variations are known in the transcript encoding the largest protein? How many variations are known for residue 615 of this protein? Are they located at the same nucleotide position in the transcript or not? According to their SIFT scores, are these mutations deleterious or not? What do the dbSNP/ClinVar databases report about the clinical significance of these mutations (click on the dbSNP accessions “rs…” in the “Variation ID” column)?
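The question about residue 615 rests on simple arithmetic: residue n of a protein is encoded by coding-sequence (CDS) positions 3n - 2 to 3n, so several variants at the same residue may still sit at different nucleotides of that codon. The sketch below computes the codon span and, optionally, shifts it into transcript (cDNA) coordinates. The `cds_start` parameter (the transcript position of the first base of the start codon) is an assumption you would read from the transcript record; Ensembl variant tables often show CDS positions directly.

```python
def codon_cds_span(residue: int) -> tuple[int, int]:
    """Return the 1-based CDS positions (first, last) of the codon for `residue`."""
    if residue < 1:
        raise ValueError("residue numbers start at 1")
    first = 3 * (residue - 1) + 1
    return first, first + 2


def cds_to_transcript(cds_pos: int, cds_start: int) -> int:
    """Shift a CDS coordinate to transcript (cDNA) coordinates.

    `cds_start` is the transcript position of the first base of the start
    codon, i.e. the 5' UTR length plus 1 (read it from the transcript record).
    """
    return cds_pos + cds_start - 1


print(codon_cds_span(615))  # (1843, 1845)
```

Two variants reported for residue 615 are therefore at the same nucleotide only if they share one of CDS positions 1843, 1844 or 1845.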
4. A mutation located in the gene with GeneID 136371 has been identified in a patient and is suspected to be important. The C>T mutation maps to transcript NM_080871 and is shown in the sequence region below:
TGCCGAGGCCACCACCGCCCGCTGCCT wild type
TGCCGAGGCCACCACTGCCCGCTGCCT mutation
Using the MutationTaster server (www.mutationtaster.org), determine the potential importance of the mutation, using the data provided above as input. Is it a silent mutation, or is there a change in the amino acid sequence (“AA changes”)? Follow the link to the SNP database (reference ID: rs…). Using the information provided there, determine whether this SNP is pathogenic and which disease it is associated with. Using the dbSNP and ClinVar links, try to determine the putative molecular mechanism (“Functional consequence”) underlying the effect of this mutation.
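Before filling in the MutationTaster form it helps to confirm exactly where the two sequence fragments differ. The sketch below compares the wild-type and mutant strings shown above position by position; it assumes they are already aligned and of equal length (a substitution, not an insertion or deletion). The reported position is relative to this 27-nt fragment, not to the transcript.

```python
def point_differences(wild_type: str, mutant: str) -> list[tuple[int, str, str]]:
    """List (1-based position, wild-type base, mutant base) for each mismatch.

    Assumes both sequences are aligned and of equal length, as in the
    region shown above (a substitution, not an indel).
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have equal length")
    return [
        (i + 1, wt, mt)
        for i, (wt, mt) in enumerate(zip(wild_type.upper(), mutant.upper()))
        if wt != mt
    ]


WT = "TGCCGAGGCCACCACCGCCCGCTGCCT"
MUT = "TGCCGAGGCCACCACTGCCCGCTGCCT"
print(point_differences(WT, MUT))  # [(16, 'C', 'T')]
```

A single mismatch confirms that the fragment carries only the stated C>T change.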
5. Toll-like receptors (TLRs) play an important role in signaling innate immune responses to pathogens. A number of studies suggest associations of variations in TLRs with changes in susceptibility and/or resistance to pathogens and in the severity of some diseases. TLRs share a similar structure with three major domains: an ectodomain consisting of multiple leucine-rich repeats (LRRs), a transmembrane helix, and a Toll/interleukin-1 receptor (TIR) domain. Using bioinformatic approaches, analyze some SNPs in TLR genes.
5.1. Possible associations of the following SNPs with lung function were studied in a group of patients: rs5743618 (TLR1 gene), rs5743708 (TLR2), rs3775291 (TLR3), rs4986790 (TLR4), rs5743810 (TLR6), rs179008 (TLR7), rs2407992 (TLR8), rs4129009 (TLR10). Determine the following features of these SNPs:
(a) Is the substitution synonymous or non-synonymous? What does the dbSNP database report about its clinical significance?
(b) For a non-synonymous variant, determine the amino acid substitution and the domain containing it.
(c) SNP frequency according to the 1000 Genomes Project (if available); 1000G data can be viewed in Ensembl (www.ensembl.org). Note the minor allele frequency (MAF) values for all populations and the largest and lowest MAF values among the 5 main 1000G population groups.
(d) Determine the predicted effect according to the SIFT score (shown in the Ensembl transcript table).
(e) The study of these SNPs found that rs4986790, rs5743810 and rs179008 might be significant, while no associations were found for the others. Is there any correlation with the SNPs' domain locations, SIFT scores or MAF values?
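For item (c), the minor allele of a biallelic site is simply the less frequent of the two alleles, so MAF = min(f, 1 - f). Comparing populations then means reading the frequency of the same allele in each of the five 1000 Genomes super-populations (AFR, AMR, EAS, EUR, SAS) and picking the extremes. The sketch below assumes you have typed those per-population frequencies into a dictionary; the numbers in the example are hypothetical and serve only to show the logic.

```python
SUPER_POPULATIONS = ("AFR", "AMR", "EAS", "EUR", "SAS")


def minor_allele_frequency(allele_freq: float) -> float:
    """For a biallelic site, the MAF is the smaller of the two allele frequencies."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("frequency must be in [0, 1]")
    return min(allele_freq, 1.0 - allele_freq)


def extreme_populations(freqs: dict[str, float]) -> tuple[tuple[str, float], tuple[str, float]]:
    """Return (population, frequency) pairs with the largest and lowest value.

    `freqs` should hold the frequency of the same allele (e.g. the global
    minor allele) in each population; entries other than the five
    super-populations (such as "ALL") are ignored.
    """
    subset = {p: freqs[p] for p in SUPER_POPULATIONS if p in freqs}
    if not subset:
        raise ValueError("no 1000 Genomes super-population found")
    highest = max(subset.items(), key=lambda item: item[1])
    lowest = min(subset.items(), key=lambda item: item[1])
    return highest, lowest


# Hypothetical numbers for illustration only; copy the real ones from Ensembl.
example = {"ALL": 0.07, "AFR": 0.02, "AMR": 0.10, "EAS": 0.00, "EUR": 0.15, "SAS": 0.08}
print(extreme_populations(example))  # (('EUR', 0.15), ('EAS', 0.0))
```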
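Items (b), (d) and (e) can be organized as one small table: for each SNP, map the substituted residue onto the domain boundaries of its protein, turn the SIFT score into a prediction, and split the SNPs by the study's outcome. The sketch below assumes the commonly used SIFT cut-off of 0.05 (scores below it are called deleterious). The record fields, domain boundaries, IDs and scores in the usage part are hypothetical placeholders to be replaced with the values found in Ensembl and dbSNP; for the real analysis, the significant set would be rs4986790, rs5743810 and rs179008, as stated in (e).

```python
from dataclasses import dataclass
from typing import Optional

SIFT_CUTOFF = 0.05  # usual SIFT threshold: scores below it are predicted deleterious


@dataclass
class SnpRecord:
    rsid: str
    residue: Optional[int]  # None for synonymous or non-coding variants
    sift: Optional[float]   # None when Ensembl shows no SIFT score
    maf: Optional[float]    # global MAF from the 1000 Genomes data


def sift_label(score: Optional[float]) -> str:
    if score is None:
        return "n/a"
    return "deleterious" if score < SIFT_CUTOFF else "tolerated"


def domain_of(residue: Optional[int], domains: dict[str, tuple[int, int]]) -> str:
    """Map a residue number onto inclusive (start, end) domain boundaries."""
    if residue is None:
        return "n/a"
    for name, (start, end) in domains.items():
        if start <= residue <= end:
            return name
    return "outside annotated domains"


def group_by_association(records, domains_by_rsid, significant):
    """Annotate each SNP and split the rows into 'significant' / 'no association'."""
    groups = {"significant": [], "no association": []}
    for rec in records:
        row = (rec.rsid, domain_of(rec.residue, domains_by_rsid[rec.rsid]),
               sift_label(rec.sift), rec.maf)
        groups["significant" if rec.rsid in significant else "no association"].append(row)
    return groups


# Placeholder IDs, boundaries and scores; replace them with the values you find.
toy_domains = {"ectodomain (LRRs)": (1, 600), "transmembrane": (601, 625), "TIR": (626, 800)}
records = [SnpRecord("rsX1", 250, 0.01, 0.06), SnpRecord("rsX2", 700, 0.40, 0.30)]
result = group_by_association(records, {r.rsid: toy_domains for r in records}, {"rsX1"})
for group, rows in result.items():
    print(group, rows)
```

Laying the two groups side by side makes it easier to see whether the significant SNPs share a domain, a SIFT prediction or a MAF range.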
5.2. Toll-like receptor TLR3 is activated upon infection by flaviviruses such as Zika and dengue viruses. TLR3 is a receptor for double-stranded RNA. Analyze the polymorphisms of TLR3 amino acids involved in RNA binding [these amino acid positions were identified by Liu et al. (2008), Science 320:379-381].
Identify SNPs affecting amino acids that are essential for RNA binding, and report their SIFT scores, MAF values and the dbSNP reports on clinical significance.
TLR3 activation during early brain development, triggered by Zika virus infection during pregnancy, has been suggested to be associated with disorders in children. Is it possible to suggest that polymorphism in the TLR3 RNA-binding sites could play a role in this process?
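The core step of 5.2 is an intersection: keep only those TLR3 variants whose residue number is one of the RNA-binding positions. A minimal sketch follows; both `binding_sites` and `tlr3_variants` are hypothetical placeholders, to be filled with the positions from Liu et al. (2008) and the entries of the Ensembl variant table for the TLR3 transcript. It assumes both use the same residue numbering (full-length precursor).

```python
def variants_at_sites(variants: list[tuple[str, int, str]],
                      sites: set[int]) -> list[tuple[str, int, str]]:
    """Keep (rsid, residue, change) entries whose residue is in `sites`, sorted by residue."""
    return sorted((v for v in variants if v[1] in sites), key=lambda v: (v[1], v[0]))


# Hypothetical placeholders: fill `binding_sites` from Liu et al. (2008) and
# `tlr3_variants` from the Ensembl variant table of the TLR3 transcript.
binding_sites = {100, 200, 300}
tlr3_variants = [("rsP1", 200, "X/Y"), ("rsP2", 150, "X/Z"), ("rsP3", 100, "X/W")]
hits = variants_at_sites(tlr3_variants, binding_sites)
print(hits)
```

Each retained variant is then looked up for its SIFT score, MAF and dbSNP clinical-significance report.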
6. (1 point) Mutations in the transthyretin gene can lead to a number of disorders caused by protein misfolding. The transthyretin precursor (NP_000362) contains 147 amino acids, the first 20 of which form a signal peptide. One of the mutations in this gene is the deletion of valine at position 122 of the mature peptide (corresponding to SNP rs121918096). Using the SWISS-MODEL homology modelling server (swissmodel.expasy.org), predict the structure of this mutant protein. It is advisable to start by searching for templates (this can take a couple of minutes; meanwhile, it is informative to watch the names of the procedures and algorithms run by the server). When the search is finished, you can simply use the template at the top of the list (the default) for modelling. Judging from the model-template alignment, would you consider this prediction reliable? Does the Val122 deletion occur in a secondary structure element or in a coil region? (NB: SWISS-MODEL may report that your browser cannot display the model itself. Viewing the model is not actually necessary for modelling or for inspecting the alignment to the template, so you do not need to change anything on your computer.)
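Because mature-peptide numbering skips the 20-residue signal peptide, Val122 of the mature protein is residue 122 + 20 = 142 of the 147-residue precursor, and the mature peptide itself is 147 - 20 = 127 residues long. The sketch below encodes that offset and builds the deletion mutant from a sequence string; the helper names are illustrative, and the NP_000362 sequence itself is not reproduced here.

```python
SIGNAL_PEPTIDE_LENGTH = 20  # precursor residues 1-20 of NP_000362 (from the text)


def mature_to_precursor(mature_pos: int) -> int:
    """Convert a mature-peptide residue number to precursor numbering."""
    return mature_pos + SIGNAL_PEPTIDE_LENGTH


def delete_residue(sequence: str, position: int) -> str:
    """Return `sequence` with the residue at 1-based `position` removed."""
    if not 1 <= position <= len(sequence):
        raise IndexError("position outside the sequence")
    return sequence[:position - 1] + sequence[position:]


print(mature_to_precursor(122))     # 142
print(147 - SIGNAL_PEPTIDE_LENGTH)  # 127 residues in the mature peptide
```

With the precursor sequence pasted into a string `precursor`, checking `precursor[141] == "V"` before calling `delete_residue(precursor, 142)` guards against an off-by-one error; the result should be 146 residues long.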
7. (2 points) Human MALAT1 (Metastasis-Associated Lung Adenocarcinoma Transcript 1) is a long (>8 kb) noncoding RNA (lncRNA) implicated in a number of cancers. Using the UCSC genome browser (genome.ucsc.edu), determine the location (chromosome, positions) of its gene. The 3′-proximal part (about 500 bp) is known to contain conserved functional RNA secondary structures, so select the region corresponding to the 3′-proximal 500 bp of MALAT1 in the browser and retrieve the MultiZ alignment (click on the MultiZ Align bar): the output consists of blocks of the MultiZ alignment. What is the size of the block with the longest projection onto the human genome sequence? Try to predict the consensus structure in this alignment block using a few diverse sequences, for instance human, mouse and chicken; the program RNAalifold is suggested for this task. Retrieve the DNA sequences of these three species from the block of interest (click on the corresponding “D” in the list). Make a FASTA file of the three corresponding RNA fragments and submit it to Clustal Omega (www.ebi.ac.uk/Tools/msa/clustalo/) to produce a Clustal-formatted multiple alignment (save the result as a plain-text file). Use this file for the RNAalifold prediction (http://rna.tbi.univie.ac.at). Is any conserved structure predicted? Are there base covariations supporting it?
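Two small preparation steps in this assignment are easy to get wrong: which end of the gene is "3′-proximal" depends on the strand, and the species fragments must be turned into RNA before alignment. The sketch below assumes 1-based inclusive coordinates as displayed in the UCSC browser; for a plus-strand gene the 3′ end is the larger coordinate, for a minus-strand gene the smaller one. It also writes FASTA records with T replaced by U and alignment gaps removed (Clustal Omega computes its own alignment). The coordinates in the example are made up, not MALAT1's.

```python
def three_prime_window(start: int, end: int, strand: str, length: int = 500) -> tuple[int, int]:
    """Inclusive 1-based coordinates of the 3'-proximal `length` bp of a gene."""
    if strand == "+":
        return max(start, end - length + 1), end
    if strand == "-":
        return start, min(end, start + length - 1)
    raise ValueError("strand must be '+' or '-'")


def to_rna_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format DNA fragments as RNA FASTA (upper case, T -> U, gaps removed)."""
    lines = []
    for name, seq in records.items():
        rna = seq.upper().replace("-", "").replace(".", "").replace("T", "U")
        lines.append(f">{name}")
        lines.extend(rna[i:i + width] for i in range(0, len(rna), width))
    return "\n".join(lines) + "\n"


# Made-up coordinates and sequences, only to show the behaviour.
print(three_prime_window(1001, 9000, "+"))  # (8501, 9000)
print(three_prime_window(1001, 9000, "-"))  # (1001, 1500)
print(to_rna_fasta({"human": "ACGT-acgt", "mouse": "ACGA"}), end="")
```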
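The last question asks whether base covariations support the predicted structure. The idea is that a helix is well supported when a base pair is maintained across species even though both of its nucleotides have changed (for example G-C in one species and A-U in another). The sketch below is a simplified check of that idea, not RNAalifold's own covariance scoring: it reads base pairs from a dot-bracket string (plain parentheses only), skips sequences with a gap at either position, requires every remaining pair to be canonical (Watson-Crick or G-U), and reports pairs where at least two sequences differ at both positions. The toy alignment is invented.

```python
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pair_table(structure: str) -> list[tuple[int, int]]:
    """Return 0-based (i, j) base pairs from a dot-bracket string."""
    stack, pairs = [], []
    for k, ch in enumerate(structure):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {k}")
            pairs.append((stack.pop(), k))
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(pairs)


def covarying_pairs(alignment: list[str], structure: str) -> list[tuple[int, int]]:
    """Pairs that stay canonical in all ungapped sequences and show a double change."""
    if any(len(seq) != len(structure) for seq in alignment):
        raise ValueError("sequences and structure must have the same length")
    result = []
    for i, j in pair_table(structure):
        pairs = [(seq[i] + seq[j]).upper().replace("T", "U") for seq in alignment]
        pairs = [p for p in pairs if "-" not in p and "." not in p]
        if not pairs or any(p not in CANONICAL for p in pairs):
            continue
        # Compensatory change: two sequences differ at both paired positions.
        if any(a[0] != b[0] and a[1] != b[1] for a in pairs for b in pairs):
            result.append((i, j))
    return result


toy_alignment = ["GGGAAACCC", "GAGAAACUC", "GGAAAAUCC"]
print(covarying_pairs(toy_alignment, "(((...)))"))  # [(1, 7), (2, 6)]
```

Here the outer pair is G-C in all three sequences (conserved but not covarying), while the two inner pairs switch between G-C and A-U, which is the kind of evidence the question is asking about.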