genemodel Tutorial

This package provides a simple way to make positionally accurate plots of gene models similar to those found on The Arabidopsis Information Resource (TAIR). These plots are suitable for presentations and publications where showing gene models helps to communicate genetic research.

Installation

To install the package:

# install_bitbucket() is provided by the devtools package, which must be installed first
devtools::install_bitbucket("greymonroe/genemodel")
library(genemodel)

Example

Now let's look at an example gene that we are going to model. This gene is AT5G62640. The original gene model information can be found here: http://www.arabidopsis.org/servlets/TairObject?type=gene&id=1000654517. It is stored in the Gene Feature table on that page.

Once this table is extracted and saved as a .csv or .txt file, it can be loaded into R as a data.frame object. The genemodel package already includes this data, which can be loaded like this:
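As a minimal sketch of loading a saved table, assuming it was stored as a two-column CSV with a header row: the file below is a temporary stand-in written by the example itself, not a file shipped with the package, and the names feature_file and my_gene are made up for illustration.

```r
# Write a small stand-in for a saved Gene Feature table (hypothetical file).
feature_file <- tempfile(fileext = ".csv")
writeLines(c("type,coordinates",
             "ORF,191-2958",
             "5' utr,1-190",
             "coding_region,191-271"),
           feature_file)

# Read it back as a data.frame, keeping both columns as character strings.
my_gene <- read.csv(feature_file, stringsAsFactors = FALSE)
str(my_gene)
```

read.csv treats only the double quote as a quoting character by default, so the apostrophe in "5' utr" is read as plain text.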

data("AT5G62640")

When we look at the structure of this data.frame,

head(AT5G62640, 15)
##             type coordinates
## 1            ORF    191-2958
## 2         5' utr       1-190
## 3  coding_region     191-271
## 4  coding_region     551-625
## 5  coding_region     689-782
## 6  coding_region    959-1029
## 7  coding_region   1155-1210
## 8  coding_region   1321-1372
## 9  coding_region   1449-1530
## 10 coding_region   1631-2004
## 11 coding_region   2124-2633
## 12 coding_region   2731-2958
## 13          exon       1-271
## 14        intron     272-550
## 15          exon     551-625

we see that it is a two-column data.frame: the first column, “type”, contains the name of the feature type, and the second, “coordinates”, specifies the coordinate position within the gene model that the feature occupies.
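To make the "start-end" coordinate format concrete, here is a small sketch that splits the strings into numeric columns. It is not part of genemodel; the columns start, end and length are added only for illustration, and the length calculation assumes both endpoints are inclusive (which the adjacent ranges 1-190 and 191-271 suggest).

```r
# A few rows in the same two-column layout as AT5G62640.
model <- data.frame(
  type = c("5' utr", "coding_region", "intron"),
  coordinates = c("1-190", "191-271", "272-550"),
  stringsAsFactors = FALSE)

# Split each "start-end" string on the hyphen and convert to numbers.
parts <- strsplit(model$coordinates, "-", fixed = TRUE)
model$start <- as.numeric(sapply(parts, `[`, 1))
model$end   <- as.numeric(sapply(parts, `[`, 2))

# Feature length in base pairs, assuming both endpoints are inclusive.
model$length <- model$end - model$start + 1
print(model)
```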

The feature types that TAIR provides in the Gene Feature table for gene models are:

  • ORF - the Open Reading Frame, which extends from the start of the first coding region to the end of the last
  • Coding Region - segments of the gene that code for protein
  • 5' UTR - Untranslated Region
  • Exon
  • Intron
  • 3' UTR

genemodel.plot function

Before we can plot the gene, we need to extract some other information from the TAIR gene model description. First we need the start and stop base pair positions for the gene. These can be found in the Map Locations section of the TAIR gene model page and correspond to the Coordinates of Map Type ‘nuc_sequence’. Next we need the direction of transcription, which can also be found in the Map Locations section under Orientation. Again we want the value corresponding to the Map Type ‘nuc_sequence’. See image below from TAIR.

We now have the information necessary to plot the gene with genemodel.plot

genemodel.plot(model=AT5G62640, start=25149433, bpstop=25152541, orientation="reverse", xaxis=TRUE)

genemodel.plot automatically recognizes the types of gene features found in TAIR gene models and plots them in accurate positions and orientation. By default, UTRs are colored light blue, exons are colored dark blue, and introns are indicated by the bent line. The direction of transcription is also marked, in a way consistent with TAIR notation, by the pointed end of the gene model. Because it plots only UTRs, coding regions and introns, genemodel.plot ignores the ‘ORF’ and ‘exon’ feature types, as they are redundant.
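To see which rows of a model carry the plotted information, the redundant feature types can be filtered out by hand. This is only an illustration of the idea, not the package's internal code; the object names model and drawn are made up for the example.

```r
# A few rows in the same layout as AT5G62640, including redundant types.
model <- data.frame(
  type = c("ORF", "5' utr", "coding_region", "exon", "intron"),
  coordinates = c("191-2958", "1-190", "191-271", "1-271", "272-550"),
  stringsAsFactors = FALSE)

# Keep only the feature types that are described as being drawn.
drawn <- model[!(model$type %in% c("ORF", "exon")), ]
print(drawn)
```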

Alternative splicing

With a little creativity, it is easy to imagine using genemodel to plot such things as alternative splicing. For example, an exon and its neighboring introns can be removed and replaced by a single intron to create a plot showing a different splice variant.

spl1<-data.frame(
  type=c("5' utr", "coding_region", "intron", "coding_region", "intron", "coding_region","3' utr"), 
  coordinates=c("1-50", "50-100", "100-150", "150-200", "200-250", "250-300","300-350"))

spl2<-data.frame(
  type=c("5' utr", "coding_region", "intron","coding_region","3' utr"), 
  coordinates=c("1-50", "50-100", "100-250", "250-300","300-350"))
par(mfrow=c(2,1))
genemodel.plot(model=spl1, start=1, bpstop=350, orientation="reverse", xaxis=TRUE)
genemodel.plot(model=spl2, start=1, bpstop=350, orientation="reverse", xaxis=FALSE)
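The second variant can also be derived from the first instead of being typed out. The sketch below uses a hypothetical helper, skip_exon, which is not part of genemodel; it assumes the exon to skip sits directly between two intron rows and that coordinates use the "start-end" format shown earlier.

```r
spl1 <- data.frame(
  type = c("5' utr", "coding_region", "intron", "coding_region",
           "intron", "coding_region", "3' utr"),
  coordinates = c("1-50", "50-100", "100-150", "150-200",
                  "200-250", "250-300", "300-350"),
  stringsAsFactors = FALSE)

# Hypothetical helper: drop the exon in row i together with the introns in
# rows i - 1 and i + 1, and put one intron spanning all three in their place.
skip_exon <- function(model, i) {
  stopifnot(i > 1, i < nrow(model),
            model$type[i - 1] == "intron", model$type[i + 1] == "intron")
  left  <- strsplit(model$coordinates[i - 1], "-", fixed = TRUE)[[1]][1]
  right <- strsplit(model$coordinates[i + 1], "-", fixed = TRUE)[[1]][2]
  merged <- data.frame(type = "intron",
                       coordinates = paste(left, right, sep = "-"),
                       stringsAsFactors = FALSE)
  rows <- seq_len(nrow(model))
  out <- rbind(model[rows < i - 1, ], merged, model[rows > i + 1, ])
  rownames(out) <- NULL
  out
}

# Skip the middle exon (row 4) to get the shorter splice variant.
spl2_derived <- skip_exon(spl1, 4)
print(spl2_derived)
```

The result has the same rows as spl2 and can be passed to genemodel.plot in the same way.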

mutation.plot function

The next function we will look at is mutation.plot, which plots mutations at the correct positions on an already plotted gene model.

genemodel.plot(model=AT5G62640, start=25149433, bpstop=25152541, orientation="reverse", xaxis=TRUE)
mutation.plot(25150214, 25150214, text="P->S", col="black", drop=-.15, haplotypes=c("red", "blue"))
mutation.plot(25150659, 25150659, text="V->S", col="black", drop=-.15, haplotypes=c("red"))
mutation.plot(25150639, 25150639, text="L->P", col="black", drop=-.35, haplotypes=c("blue"))

mutation.plot adds mutations to a preexisting gene model plot. In this example, amino acid substitutions are shown at exact positions. The colored dots correspond to the haplotype groups that have this mutation. The drop parameter can be used to offset the positioning of closely spaced mutations for easier visualization.
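When there are many mutations, the drop values can be computed rather than chosen by eye. The following is a rough sketch under an assumed rule that is not part of genemodel: a label gets the base drop unless the previously placed mutation lies within min_gap base pairs, in which case it is pushed one step lower. The names base_drop, step and min_gap, and their values, are arbitrary choices for illustration.

```r
# Mutation positions and labels taken from the example above.
muts <- data.frame(
  pos = c(25150214, 25150659, 25150639),
  label = c("P->S", "V->S", "L->P"),
  stringsAsFactors = FALSE)

# Assumed spacing rule (illustrative values, not package defaults).
base_drop <- -0.15
step <- -0.20
min_gap <- 100

# Walk through the mutations from the highest position to the lowest and
# lower the label whenever the previous mutation is closer than min_gap.
muts <- muts[order(muts$pos, decreasing = TRUE), ]
muts$drop <- base_drop
for (k in seq_len(nrow(muts))[-1]) {
  if (abs(muts$pos[k] - muts$pos[k - 1]) < min_gap) {
    muts$drop[k] <- muts$drop[k - 1] + step
  }
}
print(muts)

# With genemodel loaded and a gene model already drawn, each row could then
# be passed on, for example:
# mutation.plot(muts$pos[k], muts$pos[k], text = muts$label[k],
#               col = "black", drop = muts$drop[k], haplotypes = "red")
```

With these settings only L->P, which lies 20 bp from V->S, is moved to the lower position, matching the hand-picked values in the example.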