Sequence logo plots have become a standard graphical tool for visualizing sequence motifs in DNA, RNA or protein sequences. However, standard logo plots primarily highlight enrichment of symbols, and may fail to highlight interesting depletions. Current alternatives that try to highlight depletion often produce visually cluttered logos. We introduce a new sequence logo plot, the EDLogo plot, that highlights both enrichment and depletion while minimizing visual clutter. We provide an easy-to-use and highly customizable R package, Logolas, to produce a range of logo plots, including EDLogo plots. The software allows elements in the logo plot to be strings of characters rather than single characters, extending the range of applications beyond the usual DNA, RNA or protein sequences. It also includes new Empirical Bayes methods to stabilize estimates of enrichment and depletion, and thus better highlight the most significant patterns in the data. We illustrate our methods and software on applications to transcription factor binding site motifs, protein sequence alignments and cancer mutation signature profiles. Our new EDLogo plots and flexible software implementation can help data analysts visualize both enrichment and depletion of characters (DNA sequence bases, amino acids, etc.) across a wide range of applications.

Since their introduction in the early 1990s by Schneider and Stephens, sequence logo plots have become widely used for visualizing short conserved patterns, known as sequence motifs, in multiple alignments of DNA, RNA and protein sequences. At each position in the alignment, the standard logo plot represents the relative frequency of each character (base, amino acid, etc.) by stacking characters on top of each other, with the height of each character proportional to its relative frequency. The characters are ordered by their relative frequency, and the total height of the stack is determined by the information content of the position. The visualization is so appealing that methods to produce logo plots are now implemented in many software packages (e.g. seqLogo, RWebLogo, ggseqlogo) and web servers (e.g. WebLogo, Seq2Logo, iceLogo).

Because the standard logo plot scales the height of each character in proportion to its relative frequency, it tends to visually highlight characters that are enriched, that is, characters that occur at higher than expected frequency. In many applications such enrichments may be the main features of interest, and the standard logo plot serves these applications well. However, it can sometimes be equally interesting to identify depletions: characters that occur less often than expected. One example, highlighted in earlier work, involves glycosylation: N-linked glycosylation sites in proteins are known to have the motif N-X-S/T, where X is any amino acid apart from proline (P). Another example involves the distribution of histone modifications across the genome: for example, Koch et al. note depletion of the histone marks H4ac and H3K4me1 at the gene start and gene end regions in lymphoblastoid cell lines. The standard logo plot represents strong depletion(s) by the absence of character(s), which can be difficult to discern visually.

To better highlight depletions in amino acid motifs, Thomsen et al. suggest several alternatives to the standard logo plot. However, we have found that the resulting plots sometimes suffer from visual clutter: too many symbols, which distract from the main patterns of enrichment and depletion. Here we suggest a simple solution to this problem, producing a new sequence logo plot, the Enrichment Depletion Logo (EDLogo) plot, that highlights both enrichment and depletion while minimizing visual clutter. The key idea is to explicitly represent depletions using characters that occupy the negative part of the y axis. In addition, we extend the applicability of logo plots to new settings by (i) allowing each “character” in the plot to be an arbitrary alphanumeric string (potentially including user-defined symbols) and (ii) allowing a different “alphabet” of permitted strings at each position.
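The following Python sketch illustrates how a single column of a standard logo plot is built. For one DNA alignment column, it computes each base's relative frequency and the column's information content. Information content is the log2 of the alphabet size minus the Shannon entropy, in bits. Each base's height is its frequency times that information content, so the stack's total height equals the information content. The function name `column_heights` and the toy alignment are hypothetical. The small-sample correction used by Schneider and Stephens is deliberately omitted.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def column_heights(column, alphabet=ALPHABET):
    """Return (character, height) pairs for one alignment column, tallest first."""
    counts = Counter(column)
    n = len(column)
    freqs = {a: counts.get(a, 0) / n for a in alphabet}
    # Shannon entropy in bits; absent characters (p = 0) contribute nothing.
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    # Information content = maximum possible entropy minus observed entropy
    # (no small-sample correction in this sketch).
    info = math.log2(len(alphabet)) - entropy
    # Each character's height is its relative frequency times the column's
    # information content, so the stack's total height equals `info`.
    heights = [(a, p * info) for a, p in freqs.items() if p > 0]
    # Order characters by relative frequency, tallest first.
    return sorted(heights, key=lambda ah: ah[1], reverse=True)

# Hypothetical toy alignment of four sequences; zip(*...) yields its columns.
alignment = ["ACGT", "ACGA", "ACTT", "AGGT"]
for i, column in enumerate(zip(*alignment)):
    print(i, column_heights(column))
```

A fully conserved column such as "AAAA" gets the maximum height of 2 bits. A column where all four bases are equally frequent gets zero height, which is why the standard plot tends to emphasize enriched characters.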
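The glycosylation example is a motif defined partly by a depletion: proline is excluded at the middle position. The Python sketch below scans a protein sequence for the N-X-S/T pattern with X not equal to P, using a regular expression. The lookahead makes overlapping candidate sites all be reported. This is an illustration only, assuming uppercase one-letter amino acid codes (input is upper-cased first). The name `find_sequons` is hypothetical, and the sketch ignores any structural context that determines whether a site is actually glycosylated.

```python
import re

# N, then any residue except proline (P), then S or T.
# The zero-width lookahead lets overlapping sites (e.g. in "NNST") both match.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(protein):
    """Return 0-based start positions of candidate N-X-S/T sites (X != P)."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]

print(find_sequons("MNGSANPTQNIT"))  # [1, 9]
```

The N at position 5 is skipped because it is followed by P. In an alignment of such sites, that absence of P at the X position is exactly the kind of depletion a standard logo shows only as a missing character.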
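To make the key EDLogo idea concrete, the Python sketch below scores each character at one position by the log2 ratio of its observed frequency to a background frequency. Characters with positive scores would stack above the axis (enrichment), and those with negative scores below it (depletion). This is a simplified sketch under stated assumptions, not the Logolas implementation. A fixed pseudocount stands in for the Empirical Bayes shrinkage the software uses, and the exact scaling EDLogo applies may differ. The function name `ed_heights` and the example frequencies are hypothetical.

```python
import math

def ed_heights(freqs, background, pseudocount=0.01):
    """Score characters at one position for enrichment (above 0) or depletion (below 0).

    freqs, background: dicts mapping character -> probability.
    Returns (above, below), each a list of (character, score).
    """
    # Log2 ratio of observed to expected frequency. The pseudocount (a simple
    # stand-in for shrinkage) keeps characters with zero count finite.
    scores = {
        a: math.log2((freqs.get(a, 0.0) + pseudocount) / (background[a] + pseudocount))
        for a in background
    }
    # Enriched characters, largest score first, to stack upward from y = 0.
    above = sorted(((a, s) for a, s in scores.items() if s > 0),
                   key=lambda t: t[1], reverse=True)
    # Depleted characters, most depleted first, to stack downward from y = 0.
    below = sorted(((a, s) for a, s in scores.items() if s < 0),
                   key=lambda t: t[1])
    return above, below

site = {"A": 0.7, "C": 0.1, "G": 0.2, "T": 0.0}   # hypothetical position
uniform = {a: 0.25 for a in "ACGT"}
above, below = ed_heights(site, uniform)
print([a for a, _ in above], [a for a, _ in below])  # ['A'] ['T', 'C', 'G']
```

In this example, the absent base T gets the largest negative score. A standard logo would show that depletion only as a missing letter.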