class: center, middle, inverse, title-slide

.title[
# Practicals 01: Sequence search (intro)
]
.subtitle[
## Bioinformatics SS 24
]
.author[
### January Weiner
]
.date[
### 2024-04-12
]

---

## Sequence alignments ✪

Examples of alignments of the sequences `ACTTGA` and `ACCTGGA`

```
ACTTG-A   ACTT-GA   AC-TTG-A   ---ACTTGA   ACTTGA-------
|| || |   || | ||   || | | |        | ||
ACCTGGA   ACCTGGA   ACCT-GGA   ACC--TGGA   ------ACCTGGA
```

This is not a correct alignment, because one column pairs a gap with a gap (but it may make sense in a different context):

```
ACTTG--A
|| ||  |
ACCTG-GA
```

---

## Do not confuse alignments with complementary sequences

.pull-left[
Alignment:

```
5'-ACTTG-A-3'
   || || |
5'-ACCTGGA-3'
```
]

.pull-right[
Complementary sequence:

```
5'-ACTTGGA-3'
   |||||||
3'-TGAACCT-5'
```
]

---

## Dot plots ✪

```
  A C G G T A
A *         *
C   *
G     * *
T         *
A *         *
```

---

## Dot plots

```
                 Duplication
  A C G G T A      A C G G T A
A *         *    A *         *
C   *            C   *
G     * *        G     * *
T         *      T         *
A *         *    A *         *
                 C   *
                 G     * *
                 T         *
                 A *         *
```

---

## Dot plots

```
                 Duplication      Insertion / Deletion
  A C G G T A      A C G G T A      A C G G G G G G T A
A *         *    A *         *    A *                 *
C   *            C   *            C   *
G     * *        G     * *        G     * * * * * *
T         *      T         *      T                 *
A *         *    A *         *    A *                 *
                 C   *
                 G     * *
                 T         *
                 A *         *
```

---

## The BLAST algorithm

BLAST – Basic Local Alignment Search Tool

---

## The BLAST algorithm

* Advantage: `\(O(n)\)` – linear time complexity
* Compromise between speed and sensitivity
* Heuristic, not exact
* Primary output: HSPs, high-scoring segment pairs (possibly several per sequence pair)

---

## Important BLAST parameters

* Type of BLAST (which program)
* Database (more on that tomorrow)
* Word size (smaller word size = slower but more sensitive)
* Filter for low-complexity regions (e.g. repeats)

---

## BLAST E-value

* Expected number of HSPs that reach, by chance alone, a score equal to or better than that of the given hit

`\(E = m\cdot n\cdot 2^{-S'}\)`

where `\(S'\)` is the normalized score and `\(m\)` and `\(n\)` are the lengths of the query and of the database.
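
The E-value formula above can be tried out with a few lines of Python. This is only a minimal sketch: the function name `blast_evalue` and all numbers are made up for illustration, and it assumes that `\(m\)` is the query length and `\(n\)` the total database length. Real BLAST additionally corrects both lengths for edge effects, which is ignored here.

```python
def blast_evalue(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance HSPs scoring at least `bit_score`.

    Implements E = m * n * 2^(-S') with m = query_len, n = db_len
    (assumption: no effective-length correction).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical values, chosen only to show how E reacts to each term
e_short = blast_evalue(bit_score=40.0, query_len=100, db_len=1_000_000)
e_big_db = blast_evalue(bit_score=40.0, query_len=100, db_len=10_000_000)
e_better = blast_evalue(bit_score=50.0, query_len=100, db_len=1_000_000)

print(f"score 40, small database: E = {e_short:.2e}")
print(f"score 40, 10x database:   E = {e_big_db:.2e}")
print(f"score 50, small database: E = {e_better:.2e}")
```

Two things to notice: the same hit becomes ten times less significant in a ten times larger database, and every additional 10 bits of score lower the E-value by a factor of `\(2^{10} = 1024\)`.
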
---

## BLAST programs

|Program |Description                                      |Query      |Database   |
|:-------|:------------------------------------------------|:----------|:----------|
|blastn  |Nucleotide query against nucleotide database     |Nucleotide |Nucleotide |
|blastp  |Protein query against protein database           |Protein    |Protein    |
|blastx  |Translate the query in all six reading frames    |Nucleotide |Protein    |
|tblastn |Translate the database in all six reading frames |Protein    |Nucleotide |

---

## Databases

* [NCBI GenBank](https://www.ncbi.nlm.nih.gov/genbank/)
* RefSeq (reference sequences, curated)
* nr – non-redundant (almost all known sequences)
* Environmental sequences
* Entrez Gene – gene-centered, provides Entrez identifiers
* [Swiss-Prot / UniProt](https://uniprot.org) – protein sequences
* [PDB – Protein Data Bank](https://www.wwpdb.org) – protein structure database
* [Ensembl](https://ensembl.org) – genome browser and database