This package provides a simple way to make positionally accurate plots of gene models similar to those found on The Arabidopsis Information Resource (TAIR). These plots are suitable for presentations and publications where showing a gene model helps to communicate genetic research.
Now let's look at the example gene that we are going to model: AT5G62640. The original gene model information can be found at http://www.arabidopsis.org/servlets/TairObject?type=gene&id=1000654517. It is stored in the Gene Feature table on that page.
…
Once this table is extracted and saved as a .csv or .txt file, it can be loaded into R as a data.frame object. The genemodel package already ships with this data, so it can also be loaded directly from the package.
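As a minimal sketch of both routes: the first part reads a Gene Feature table from CSV text using base R, and the second part loads the bundled data. The two sample rows are copied from the AT5G62640 table in this document. The dataset name passed to `data()` is an assumption based on the object name used in the plotting calls later on, and `csv_text` and `gene_features` are illustrative names only.

```r
# Two rows of a Gene Feature table as they might look once saved as CSV
# (values copied from the AT5G62640 table in this document).
csv_text <- "type,coordinates
5' utr,1-190
coding_region,191-271"

# For a file on disk, pass the file path instead of `text = csv_text`.
# read.csv() only treats double quotes as quoting characters by default,
# so the apostrophe in "5' utr" is read as ordinary text.
gene_features <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(gene_features)

# Assumption: the bundled dataset is named AT5G62640, as in the plotting calls.
if (requireNamespace("genemodel", quietly = TRUE)) {
  data("AT5G62640", package = "genemodel")
}
```

Keeping the coordinates as text ("1-190") preserves the two-column layout that the rest of this document works with.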
When we look at the structure of this data.frame,
## type coordinates
## 1 ORF 191-2958
## 2 5' utr 1-190
## 3 coding_region 191-271
## 4 coding_region 551-625
## 5 coding_region 689-782
## 6 coding_region 959-1029
## 7 coding_region 1155-1210
## 8 coding_region 1321-1372
## 9 coding_region 1449-1530
## 10 coding_region 1631-2004
## 11 coding_region 2124-2633
## 12 coding_region 2731-2958
## 13 exon 1-271
## 14 intron 272-550
## 15 exon 551-625
we see that it is a two-column data.frame: the first column, “type”, contains the name of the feature type, and the second, “coordinates”, specifies the position within the gene model that the feature occupies.
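Because each coordinate is stored as "start-end" text, it can be handy to split it into numeric columns, for example to check feature lengths. The following is a minimal base R sketch; `split_coordinates` is a hypothetical helper and not part of genemodel.

```r
# Hypothetical helper (not part of genemodel): split "start-end" strings
# into numeric start and end columns.
split_coordinates <- function(model) {
  parts <- strsplit(as.character(model$coordinates), "-", fixed = TRUE)
  data.frame(
    type  = as.character(model$type),
    start = as.numeric(vapply(parts, `[`, character(1), 1)),
    end   = as.numeric(vapply(parts, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
}

# A few rows copied from the table above
features <- data.frame(
  type = c("ORF", "5' utr", "coding_region"),
  coordinates = c("191-2958", "1-190", "191-271"),
  stringsAsFactors = FALSE
)

parsed <- split_coordinates(features)
parsed$length <- parsed$end - parsed$start + 1  # inclusive length in bp
print(parsed)
```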
The feature types that TAIR provides in the Gene Feature table for gene models include ORF, 5' utr, 3' utr, coding_region, exon and intron.
Before we can plot the gene, we need to extract some other information from the TAIR gene model description. First, we need the start and stop base pair positions for the gene. These can be found in the Map Locations section of the TAIR gene model page and correspond to the Coordinates of Map Type ‘nuc_sequence’. Next, we need the direction of transcription, which can also be found in the Map Locations section under Orientation. Again, we want the value corresponding to the Map Type that equals ‘nuc_sequence’. See the image below from TAIR.
We now have the information necessary to plot the gene with genemodel.plot.
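A minimal sketch of that call, using the start, stop and orientation values that appear for AT5G62640 elsewhere in this document. The variable names are illustrative, and loading the dataset by name with `data()` is an assumption; the plotting step is skipped if genemodel is not installed.

```r
# Values used for AT5G62640 elsewhere in this document
gene_start  <- 25149433   # start coordinate for Map Type 'nuc_sequence'
gene_stop   <- 25152541   # stop coordinate for Map Type 'nuc_sequence'
orientation <- "reverse"  # direction of transcription

# Assumption: the bundled dataset can be loaded by name with data()
if (requireNamespace("genemodel", quietly = TRUE)) {
  library(genemodel)
  data("AT5G62640", package = "genemodel")
  genemodel.plot(model = AT5G62640, start = gene_start, bpstop = gene_stop,
                 orientation = orientation, xaxis = TRUE)
}
```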
genemodel.plot automatically recognizes the types of gene features found in TAIR gene models and plots them in accurate positions and orientation. By default, UTRs are colored light blue, exons are colored dark blue, and introns are indicated by a bent line. The direction of transcription is also marked in a way consistent with TAIR notation, by the pointed end of the gene model. Because it plots only UTRs, coding regions and introns, genemodel.plot ignores the ‘ORF’ and ‘exon’ feature types, as they are redundant.
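To see which rows of a model actually contribute to the drawing, the redundant feature types can be filtered out by hand. This is only a base R illustration of the behavior described above, not the package's internal code; the sample rows are copied from the AT5G62640 table and the object names are illustrative.

```r
# Sample rows copied from the AT5G62640 table
model <- data.frame(
  type = c("ORF", "5' utr", "coding_region", "exon", "intron"),
  coordinates = c("191-2958", "1-190", "191-271", "1-271", "272-550"),
  stringsAsFactors = FALSE
)

# Drop the feature types described as redundant for plotting
redundant <- c("ORF", "exon")
plotted <- model[!model$type %in% redundant, ]
print(plotted)
```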
With a little creativity, it is easy to imagine using genemodel to plot such things as alternative splicing. For example, an exon and its neighboring introns can be removed and replaced by a single intron to create a plot showing a different splice variant.
# Full model: three coding regions separated by two introns
spl1 <- data.frame(
  type = c("5' utr", "coding_region", "intron", "coding_region", "intron", "coding_region", "3' utr"),
  coordinates = c("1-50", "50-100", "100-150", "150-200", "200-250", "250-300", "300-350"))

# Splice variant: the middle coding region and its flanking introns become one intron
spl2 <- data.frame(
  type = c("5' utr", "coding_region", "intron", "coding_region", "3' utr"),
  coordinates = c("1-50", "50-100", "100-250", "250-300", "300-350"))

# Stack the two plots and draw the x axis only on the first
par(mfrow = c(2, 1))
genemodel.plot(model = spl1, start = 1, bpstop = 350, orientation = "reverse", xaxis = TRUE)
genemodel.plot(model = spl2, start = 1, bpstop = 350, orientation = "reverse", xaxis = FALSE)
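The second model above was written out by hand, but the same transformation can be done programmatically. A minimal base R sketch follows; `skip_exon` is a hypothetical helper, not part of genemodel, and it assumes the coding region to remove has an intron row directly before and after it.

```r
# Hypothetical helper (not part of genemodel): remove the coding region in
# row `exon_row` and merge its two flanking introns into a single intron.
skip_exon <- function(model, exon_row) {
  model$type <- as.character(model$type)
  model$coordinates <- as.character(model$coordinates)
  stopifnot(exon_row > 1, exon_row < nrow(model),
            model$type[exon_row - 1] == "intron",
            model$type[exon_row + 1] == "intron")
  left  <- strsplit(model$coordinates[exon_row - 1], "-", fixed = TRUE)[[1]]
  right <- strsplit(model$coordinates[exon_row + 1], "-", fixed = TRUE)[[1]]
  # New intron runs from the start of the left intron to the end of the right one
  merged <- data.frame(type = "intron",
                       coordinates = paste(left[1], right[2], sep = "-"),
                       stringsAsFactors = FALSE)
  rows <- seq_len(nrow(model))
  out <- rbind(model[rows < exon_row - 1, ], merged, model[rows > exon_row + 1, ])
  rownames(out) <- NULL
  out
}

full_model <- data.frame(
  type = c("5' utr", "coding_region", "intron", "coding_region", "intron",
           "coding_region", "3' utr"),
  coordinates = c("1-50", "50-100", "100-150", "150-200", "200-250",
                  "250-300", "300-350"),
  stringsAsFactors = FALSE
)

variant <- skip_exon(full_model, exon_row = 4)
print(variant)
```

The result has the same rows as the hand-written splice variant, so it can be passed to genemodel.plot in the same way.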
The next function we will look at is mutation.plot, which plots mutations at the correct positions on an already plotted gene model.
# Plot the gene model, then add each mutation at its base pair position
genemodel.plot(model = AT5G62640, start = 25149433, bpstop = 25152541, orientation = "reverse", xaxis = TRUE)
mutation.plot(25150214, 25150214, text = "P->S", col = "black", drop = -0.15, haplotypes = c("red", "blue"))
mutation.plot(25150659, 25150659, text = "V->S", col = "black", drop = -0.15, haplotypes = c("red"))
# Larger drop so this label does not collide with the neighboring mutation
mutation.plot(25150639, 25150639, text = "L->P", col = "black", drop = -0.35, haplotypes = c("blue"))
mutation.plot adds mutations to a preexisting gene model plot. In this example, amino acid substitutions are shown at their exact positions. The colored dots correspond to the haplotype groups that carry each mutation. The drop parameter can be used to offset the position of closely spaced mutations for easier visualization.
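When there are many mutations, the drop values could be chosen automatically instead of by eye. The sketch below is one possible approach in base R; `stagger_drops` is a hypothetical helper, not part of genemodel, and the 100 bp threshold and the step size are arbitrary assumptions for illustration.

```r
# Hypothetical helper (not part of genemodel): choose a drop value for each
# mutation, stepping further down whenever a mutation lies within `min_gap`
# base pairs of the previous one (in position order).
stagger_drops <- function(positions, min_gap = 100, base = -0.15, step = -0.20) {
  ord <- order(positions)
  drops <- numeric(length(positions))
  for (i in seq_along(ord)) {
    idx <- ord[i]
    if (i > 1 && positions[idx] - positions[ord[i - 1]] < min_gap) {
      drops[idx] <- drops[ord[i - 1]] + step  # too close: move further down
    } else {
      drops[idx] <- base                      # enough room: default offset
    }
  }
  drops
}

# Mutation positions from the example above
positions <- c(25150214, 25150659, 25150639)
drops <- stagger_drops(positions)
print(drops)
```

Each value would then be passed as the drop argument of the corresponding mutation.plot call. Which of two neighboring mutations receives the larger offset is arbitrary; here it is the one further along the sequence.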