Visualizing Data

There are a variety of ways to visualize data that you may have seen in an introductory statistics course. We will go over some of those visualization techniques today. We will not cover how to interpret them; instead, we will cover how to produce these pictures in R. Additionally, we will ask questions about visualizing and interpreting pictures of data that we provide.

If you want more examples and tutorials on creating plots in R, we go over some additional topics in our Intro to R course on Vimeo: https://vimeo.com/ondemand/rintro

Boxplots

Below is a video on boxplots. If you prefer to read, then just skip it, matey!

Boxplots, or box-and-whisker plots, can be easily produced from data using the following function:

boxplot()

However, the picture that the boxplot command draws does not display the exact values of the five-number summary; it only visualizes the data. There are three options for this function that we would like to discuss. They change the color of the boxplot, the main title, and the labels of the individual boxplots. Here is the general format of the boxplot function in its standard form.

boxplot(x, ..., col=NULL, main=NULL, names)

col changes the color, main changes the main title, and names changes the label under each boxplot, since this function allows for multiple boxplots in one image. R comes with many different colors built in.

Let us go over some examples to get a better understanding of how to use this function and its options.
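Two of the points above can be checked directly in the console: the five-number summary that the picture does not print, and the list of built-in color names. The following is a minimal sketch on simulated stand-in data; the vector x and the seed are assumptions for illustration, not the course file.

```r
# Simulated stand-in for the normal column (assumption: not the course file)
set.seed(1)
x <- rnorm(100, mean = 13, sd = 2)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
print(five)

# boxplot() also returns the numbers it would draw; plot = FALSE skips the drawing
b <- boxplot(x, plot = FALSE)
print(b$stats)  # whisker ends, hinges, and median used for the box
print(b$out)    # points that would be drawn individually beyond the whiskers

# Built-in color names, including the 'navy' used in the examples
all_colors <- colors()
print(head(all_colors))
print("navy" %in% all_colors)
```

Note that the whisker ends in b$stats can differ from the minimum and maximum in fivenum() when some points are far enough out to be drawn individually.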

Example I

We have 100 observations from a normal distribution with mean 13 and standard deviation 2. The data are in the file visualizing_data.csv. The first column is from the normal distribution we just described. The second column is from a gamma distribution, which we will use in later examples. Create a boxplot for the normal data and save the image. Then create another boxplot with the main title “N(13,2)”, change the color to navy, and save that image as a file. Make sure to comment each line of your code.

Answer I

The code, output, and final figures are provided below. To save these images, click on the image. Then go to the top left hand corner and click on “File”. Then click “Save As”, and then click on your desired image file. I typically go with PNG.

mydata<-read.table('visualizing_data.csv', sep=',') #loading in the data
mydata<-as.matrix(mydata) #converting the data into a matrix

boxplot(mydata[,1]) #creating a boxplot of the normal data

boxplot(mydata[,1], main="N(13,2)", col='navy') #applying changes
Figure 1
Figure 2
Figure 3
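Besides the menu steps described above, R can write a figure straight to a file by opening a file device, drawing, and then closing the device. The sketch below is one way to do that, assuming simulated stand-in data and a hypothetical file name in a temporary folder; it falls back to PDF if this R installation cannot write PNG files.

```r
# Simulated stand-in for the normal column (assumption: not the course file)
set.seed(1)
x <- rnorm(100, mean = 13, sd = 2)

# Hypothetical output file in a temporary folder; change the path to keep the image
use_png <- capabilities("png")
out_file <- file.path(tempdir(),
                      if (use_png) "boxplot_normal.png" else "boxplot_normal.pdf")

# Open the file as the drawing device (PNG when available, otherwise PDF)
if (use_png) png(out_file) else pdf(out_file)

boxplot(x, main = "N(13,2)", col = "navy")  # drawn into the file, not on screen

dev.off()  # close the device so the file is fully written

print(file.exists(out_file))
```

Forgetting dev.off() is the usual reason a saved image turns out empty or unreadable, so it is worth keeping the open, draw, close pattern together.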

Example II

We also have 100 observations from a gamma distribution with both parameters equal to 1. Create a figure with 2 boxplots, one for the normal observations and one for the gamma observations. Save that image. Then replace the 1 and 2 on the x axis with the name of the distribution each boxplot came from. Save that image as a different file.

Answer II

The code, output, and final figures are provided below.

mydata<-read.table('visualizing_data.csv', sep=',') #loading in the data
mydata<-as.matrix(mydata) #converting the data into a matrix

boxplot(mydata[,1], mydata[,2]) #boxplots of both columns

boxplot(mydata[,1], mydata[,2], names=c('Normal','Gamma'), main='Boxplots', col='light blue') #adding changes; note how names must have the labels in a vector
Figure 4
Figure 5
Figure 6
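If the data file is not at hand, a similar picture can be produced from simulated data. The sketch below is an illustration under stated assumptions: the matrix sim, the seed, and reading "both parameters equal to 1" as shape = 1 and rate = 1 are choices made here, not part of the course file. It also shows that when a matrix has column names, boxplot() uses them as labels without the names option.

```r
# Simulated stand-ins for the two columns (assumptions: seed, shape/rate reading)
set.seed(1)
sim <- cbind(Normal = rnorm(100, mean = 13, sd = 2),
             Gamma  = rgamma(100, shape = 1, rate = 1))

# Passing the whole matrix draws one boxplot per column,
# labelled with the column names
b <- boxplot(sim, main = "Boxplots", col = "lightblue")

print(b$names)  # the labels placed on the x axis
```

Passing names=c('Normal','Gamma') as in the answer gives the same labels; column names are simply a convenient way to carry them along with the data.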

Dotplots

Below is a video on dot plots. If you prefer to read, then just skip it!

Dotplots can be easily produced with the following function, which draws each observation as a dot against its position in the data:

plot()

We can also add titles, change the color of the dots, and adjust other settings. We will go over how to add a main title, change the y axis title, and change the color of the dots. Those options for the function are, respectively,

plot(data, main='', ylab='', col='black')

Let us see an example.

Example III

Provide a dotplot of the N(13,2) data without changing the standard settings. Save that image. Then provide an image with the main title “N(13,2)” and a y axis title of “Values”. Change the color of the dots to green.

Answer III

The code, output, and final figures are provided below.

mydata<-read.table('visualizing_data.csv', sep=',') #loading in the data
mydata<-as.matrix(mydata) #converting the data into a matrix

plot(mydata[,1]) #a dotplot of the normal data

plot(mydata[,1], main="N(13,2)", col='green', ylab='Values') #applying changes
Figure 7
Figure 8
Figure 9
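It can help to see what plot() does with a single column: it pairs each value with its position in the data, so the horizontal axis is just 1, 2, ..., n. The sketch below checks this on simulated stand-in data (the vector x and the seed are assumptions) and, for comparison, draws the same values with base R's stripchart(), which places the dots along one axis instead.

```r
# Simulated stand-in for the normal column (assumption: not the course file)
set.seed(1)
x <- rnorm(100, mean = 13, sd = 2)

# xy.coords() is the helper plot() uses to decide what goes on each axis
xy <- xy.coords(x)
print(head(xy$x))  # positions 1, 2, 3, ... on the horizontal axis
print(head(xy$y))  # the data values on the vertical axis

# The plot from the answer: value against position
plot(x, main = "N(13,2)", ylab = "Values", col = "green")

# A one-dimensional alternative: dots along a single axis, jittered to reduce overlap
stripchart(x, method = "jitter", main = "N(13,2)", xlab = "Values",
           col = "green", pch = 16)
```

Which picture is preferable depends on the question: the first shows any pattern over the order of the observations, while the second shows only how the values are spread.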

Histograms

Below is a video on histograms. If you prefer to read, then just skip it!

Histograms can be created by using the following function:

hist()

You can also change a variety of settings, but we will be going over how to increase the number of columns the histogram uses, change the main title, change the x axis title, and change the color of the columns and the borders. The options for the function are, respectively,

hist(data, breaks='Sturges', main=paste('Histogram of', xname), xlab=xname, col=NULL, border=NULL)

Let us see an example of where this is used.

Example IV

Create a histogram of the N(13, 2) data. Do not change the settings. Save the image. Then create a histogram with the main title “N(13, 2)”, the x axis title “Values”, the number of breaks, or columns, set to 20, the column color set to navy, and the border set to light blue.

Answer IV

The code, output, and final figures are provided below.

mydata<-read.table('visualizing_data.csv', sep=',') #loading in the data
mydata<-as.matrix(mydata) #converting the data into a matrix

hist(mydata[,1]) #histogram of the normal data

hist(mydata[,1], breaks=20, main="N(13,2)", col='navy', border='light blue', xlab='Values') #applying changes

Figure 10
Figure 11
Figure 12
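One detail worth knowing about the breaks option: when it is given a single number, R treats it as a suggestion and rounds to convenient cut points, so the histogram may not have exactly that many columns. The sketch below, on simulated stand-in data (the vector x, the seed, and the cuts vector are assumptions for illustration), compares the suggested count with an exact set of cut points.

```r
# Simulated stand-in for the normal column (assumption: not the course file)
set.seed(1)
x <- rnorm(100, mean = 13, sd = 2)

# A single number for breaks is only a suggestion; plot = FALSE returns
# the computed histogram without drawing it
h_suggested <- hist(x, breaks = 20, plot = FALSE)
print(length(h_suggested$counts))  # number of columns R actually chose
print(h_suggested$breaks)          # the cut points it settled on

# Passing the cut points themselves fixes the number of columns:
# 21 evenly spaced edges covering the data give exactly 20 columns
cuts <- seq(floor(min(x)), ceiling(max(x)), length.out = 21)
h_exact <- hist(x, breaks = cuts, plot = FALSE)
print(length(h_exact$counts))  # 20
```

Dropping plot = FALSE and adding main, xlab, col, and border as in the answer draws the corresponding picture.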

Creating Multiple Figures in One Image

It is possible to put multiple figures in one image with the following function:

par()

There are two main options that we will discuss: the number of figures in the image (mfrow) and the shape of the plotting region (pty). The plotting region can be either square or maximal. Square keeps each plotting region square, so its shape does not change when the image is resized. Maximal lets each plotting region stretch to fill the space available, so its shape changes as the image is resized.

For the layout of the figures, this function treats the image as a grid. You set how many rows and columns the grid has, and each new figure fills the next open spot, row by row, until all the spots are filled. The options in the function for establishing the grid and adjusting the shape of the plotting region are, respectively,

par(mfrow=c(1,2), pty='s')

The first number for mfrow sets the number of rows, while the second number sets the number of columns. “s” stands for square, while “m” stands for maximal. After you set the layout and the shape of the plotting region, you can simply call the plotting functions for the figures that you want in the image. Let us see an example.

Example V

Create a boxplot, histogram, and dotplot for both sets of data. Color code them. Make appropriate titles for each figure. Save the final image.

Answer V

The code, output, and final figures are provided below. The normal data is navy and the gamma data is orange.

mydata<-read.table('visualizing_data.csv', sep=',') #loading in the data
mydata<-as.matrix(mydata) #converting the data into a matrix

par(mfrow=c(3, 2), pty='m') #setting up the grid with 3 rows and 2 columns; pty is 'm' here, but 's' is fine too

h1<-hist(mydata[,1], col='navy', border='light blue', main="N(13,2)", xlab='Values')  #hist of normal; color will be navy
h2<-hist(mydata[,2], col='orange', border='red', main="Gamma (1,1)", xlab='Values') #hist of gamma; color will be orange

p1<-plot(mydata[,1], col='navy', ylab='Values') #dotplot of normal
p2<-plot(mydata[,2], col='orange', ylab='Values') #dotplot of gamma

w1<-boxplot(mydata[,1], col='navy') #boxplot of normal
w2<-boxplot(mydata[,2], col='orange') #boxplot of gamma
Figure 13
Figure 15
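A grid set with par() stays in effect for later plots on the same device, so a single plot drawn afterwards would still land in one cell of the 3 by 2 grid. A common habit, sketched below, is to keep the old settings that par() returns and put them back when done. The off-screen device and the variable names are assumptions made so the sketch runs anywhere.

```r
pdf(NULL)  # off-screen device so the sketch runs anywhere; omit to draw on screen

# par() returns the previous values of the settings it changes
old <- par(mfrow = c(3, 2), pty = "m")
during <- par("mfrow")
print(during)  # the 3 by 2 grid is now active

# ... the six figures from the answer would be drawn here ...

par(old)       # restore the settings that were in place before
after <- par("mfrow")
print(after)   # back to a single figure per image

dev.off()
```

Closing the graphics window (or calling dev.off()) also discards the grid, since these settings belong to the device they were set on.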

Remember that many of these functions have additional properties that we did not cover at this time. To find further documentation and instruction on using these functions, type a question mark followed by the function’s name in the R console. For example, to find out more about the hist() function, type the following into the R console

?hist
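For a quick look at the available options without opening the help page, the argument names can also be read off the function itself. The sketch below is one way to do that; the variable name hist_args is an assumption, and the commented lines are meant to be typed into an interactive console.

```r
# Argument names of the default hist() method, taken from the function itself
hist_args <- names(formals(getS3method("hist", "default")))
print(hist_args)

# Other ways to reach the documentation (run these in an interactive console):
# help("hist")      # same as ?hist
# example("hist")   # runs the examples from the help page
# ??histogram       # searches the installed help pages for a keyword
```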