What is the sum of the first 1000 positive integers?
We can use the formula n(n+1)/2 to quickly compute this quantity.
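As a quick check, the formula can be evaluated directly in R and compared against a brute-force sum. This is only a minimal sketch; the variable names n, formula_sum, and direct_sum are chosen here for illustration.

```r
# Number of positive integers to add up
n <- 1000

# Closed-form formula n(n+1)/2
formula_sum <- n * (n + 1) / 2

# Brute-force check: add the integers 1 through n
direct_sum <- sum(1:n)

formula_sum  # 500500
direct_sum   # 500500
```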
Understanding Basic Data Types and Data Structures in R
To get the most out of the R language, you’ll need a strong understanding of the basic data types and data structures and how to operate on them.
Data structures are very important to understand because these are the objects you will manipulate on a day-to-day basis in R. Dealing with object conversions is one of the most common sources of frustration for beginners.
Everything in R is an object.
R has 6 basic data types. (In addition to the five listed below, there is also raw, which will not be discussed in this workshop.)
character
numeric (real or decimal)
integer
logical
complex
Elements of these data types may be combined to form data structures, such as atomic vectors. When we call a vector atomic, we mean that the vector only holds data of a single data type. Below are examples of atomic character vectors, numeric vectors, integer vectors, etc.
character: "a", "swc"
numeric: 2, 15.5
integer: 2L (the L tells R to store this as an integer)
logical: TRUE, FALSE
complex: 1+4i (complex numbers with real and imaginary parts)
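The examples above can be turned into atomic vectors with c(). The sketch below is illustrative only: the variable names are made up for this example, and the last vector shows what happens when types are mixed, namely that R coerces every element to a single common type so the vector stays atomic.

```r
# One atomic vector per basic data type
chr <- c("a", "swc")
num <- c(2, 15.5)
int <- c(2L, 4L)
lgl <- c(TRUE, FALSE)
cpl <- c(1+4i, 2-1i)

typeof(chr)  # "character"
typeof(num)  # "double"
typeof(int)  # "integer"
typeof(lgl)  # "logical"
typeof(cpl)  # "complex"

# Mixing types forces coercion to one common type (here, character)
mixed <- c(2, "a", TRUE)
typeof(mixed)  # "character"
```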
R provides many functions to examine features of vectors and other objects, for example:
class() - what kind of object is it (high-level)?
typeof() - what is the object’s data type (low-level)?
length() - how long is it? What about two dimensional objects?
attributes() - does it have any metadata?
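A minimal sketch of these four functions applied to a small named vector and to a matrix follows. The objects x and m are hypothetical and exist only for this illustration. For a two-dimensional object such as a matrix, length() counts all elements, while dim() reports the number of rows and columns.

```r
# A named numeric vector
x <- c(a = 1.5, b = 2, c = 3.25)

class(x)       # "numeric"
typeof(x)      # "double"
length(x)      # 3
attributes(x)  # a list holding the names "a", "b", "c"

# A two-dimensional object: 2 rows by 3 columns
m <- matrix(1:6, nrow = 2)

length(m)  # 6, the total number of elements
dim(m)     # 2 3
```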
R has many data structures. Those used in this section include atomic vectors, data frames, and factors.
Storing data in R with data frames
Data frames are tables
Rows represent observations
Different variables are in columns
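To make the row and column layout concrete, here is a small sketch that builds a data frame by hand with data.frame(). The object states and all of its values are invented for illustration and are not taken from any real dataset.

```r
# Each argument becomes a column (a variable);
# each position across the columns forms a row (an observation)
states <- data.frame(
  state      = c("Alpha", "Beta", "Gamma"),
  population = c(1200000, 850000, 430000),
  region     = factor(c("North", "South", "North"))
)

class(states)      # "data.frame"
nrow(states)       # 3 observations
ncol(states)       # 3 variables
states$population  # one column, returned as a vector
```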
Code in R basics
# loading the dslabs package and the murders dataset
library(dslabs)
data(murders)

# determining that the murders dataset is of the "data frame" class
class(murders)
# finding out more about the structure of the object
str(murders)
# showing the first 6 lines of the dataset
head(murders)

# using the accessor operator to obtain the population column
murders$population
# displaying the variable names in the murders dataset
names(murders)
# determining how many entries are in a vector
pop <- murders$population
length(pop)
# vectors can be of class numeric and character
class(pop)
class(murders$state)

# logical vectors are either TRUE or FALSE
z <- 3 == 2
z
class(z)

# factors are another type of class
class(murders$region)
# obtaining the levels of a factor
levels(murders$region)
The function c(), which stands for concatenate, is useful for creating vectors.
Another useful function for creating vectors is the seq() function, which generates sequences.
Subsetting lets us access specific elements of a vector by using square brackets.
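The seq() function mentioned above can be sketched as follows. The variable names are chosen for this example only; the by and length.out arguments are standard arguments of base R's seq().

```r
# Consecutive integers from 1 to 10
one_to_ten <- seq(1, 10)
one_to_ten  # 1 2 3 4 5 6 7 8 9 10

# The colon operator is a shorthand for the same sequence
1:10

# Step through the range in increments of 2
by_two <- seq(1, 10, by = 2)
by_two  # 1 3 5 7 9

# Ask for a fixed number of evenly spaced values instead of a step size
quarters <- seq(0, 1, length.out = 5)
quarters  # 0.00 0.25 0.50 0.75 1.00
```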
Vectors
# We may create vectors of class numeric or character with the concatenate function
codes <- c(380, 124, 818)
country <- c("italy", "canada", "egypt")

# We can also name the elements of a numeric vector
# Note that the two lines of code below have the same result
codes <- c(italy = 380, canada = 124, egypt = 818)
codes <- c("italy" = 380, "canada" = 124, "egypt" = 818)

# We can also name the elements of a numeric vector using the names() function
codes <- c(380, 124, 818)
country <- c("italy","canada","egypt")
names(codes) <- country

# Using square brackets is useful for subsetting to access specific elements of a vector
codes
codes[c(1,3)]
codes[1:2]

# If the entries of a vector are named, they may be accessed by referring to their name
codes["canada"]
codes[c("egypt","italy")]