
R for Data Science Notes

What is the sum of the first 1000 positive integers?


We can use the formula n(n+1)/2 to quickly compute this quantity.
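As a quick sanity check, here is a minimal base R sketch that compares the formula with a brute-force sum. The variable names `n`, `brute_force` and `formula_result` are just illustrative choices.

```r
# Number of positive integers to add up
n <- 1000

# Brute force: build the vector 1, 2, ..., n and add up its elements
x <- seq(1, n)
brute_force <- sum(x)

# Closed form: n(n+1)/2
formula_result <- n * (n + 1) / 2

brute_force      # [1] 500500
formula_result   # [1] 500500
```

Both approaches agree. The formula needs a single multiplication and division, while the brute-force version has to build and sum a 1000-element vector.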


Understanding Basic Data Types and Data Structures in R


To make the most of the R language, you’ll need a strong understanding of the basic data types and data structures and how to operate on them.


Data structures are very important to understand because these are the objects you will manipulate on a day-to-day basis in R. Dealing with object conversions is one of the most common sources of frustration for beginners.


Everything in R is an object.


Data Types:


R has six basic data types. In addition to the five listed below, there is also raw, which is not discussed in these notes.

  • character

  • numeric (real or decimal)

  • integer

  • logical

  • complex

Elements of these data types may be combined to form data structures, such as atomic vectors. When we call a vector atomic, we mean that the vector only holds data of a single data type. Below are examples of atomic character vectors, numeric vectors, integer vectors, etc.

  • character: "a", "swc"

  • numeric: 2, 15.5

  • integer: 2L (the L tells R to store this as an integer)

  • logical: TRUE, FALSE

  • complex: 1+4i (complex numbers with real and imaginary parts)
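One way to see these types is to call typeof() on the example values above. This is a small sketch; the vector name `mixed` is a made-up name used only for illustration. It also shows what "atomic" means in practice: if you combine values of different types with c(), R coerces every element to a single common type.

```r
# typeof() reports the low-level type R uses to store each value
typeof("swc")   # "character"
typeof(15.5)    # "double"  (numeric values are stored as doubles)
typeof(2)       # "double"  (a plain 2 is not an integer)
typeof(2L)      # "integer" (the L suffix asks for integer storage)
typeof(TRUE)    # "logical"
typeof(1+4i)    # "complex"

# An atomic vector holds a single type, so mixing types coerces
# every element to the most flexible type present (here, character)
mixed <- c(1, "a", TRUE)
typeof(mixed)   # "character"
mixed           # "1" "a" "TRUE"
```

This silent coercion is one of the object conversions that often catches beginners by surprise.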

R provides many functions to examine features of vectors and other objects, for example:

  • class() - what kind of object is it (high-level)?

  • typeof() - what is the object’s data type (low-level)?

  • length() - how long is it? What about two dimensional objects?

  • attributes() - does it have any metadata?
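Here is a short sketch that applies these four functions to a plain numeric vector and to a matrix. The matrix answers the question about two-dimensional objects: length() counts every element, and the dimensions are stored separately in the "dim" attribute. The names `x` and `m` are arbitrary.

```r
x <- c(2, 15.5, 7)
class(x)        # "numeric"  (high-level)
typeof(x)       # "double"   (low-level storage type)
length(x)       # 3
attributes(x)   # NULL: a plain vector carries no metadata

# For a two-dimensional object, length() counts all elements;
# the dimensions live in the "dim" attribute
m <- matrix(1:6, nrow = 2)
length(m)       # 6
dim(m)          # 2 3
attributes(m)   # a list with one element, $dim, equal to 2 3
```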


R has many data structures. These include:

  • atomic vector

  • list

  • matrix

  • data frame

  • factors
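To contrast two of these structures with the atomic vector, here is a minimal sketch using a list and a matrix. The names `my_list` and `m` are placeholders. A list can hold elements of different types. A matrix is essentially an atomic vector with dimensions, so all of its entries share one type.

```r
# A list, unlike an atomic vector, can mix element types
my_list <- list(1, "a", TRUE)
class(my_list)           # "list"
sapply(my_list, class)   # "numeric" "character" "logical"

# A matrix is an atomic vector plus a dim attribute,
# so every entry has the same type
m <- matrix(1:6, nrow = 2, ncol = 3)
is.matrix(m)             # TRUE
typeof(m)                # "integer"
m[2, 3]                  # 6 (row 2, column 3; matrices fill column by column)
```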


Objects:

  • variables

  • functions

  • data structures
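Each of these kinds of object can be checked with class(). The short sketch below is for illustration only; `x` and `add_up` are hypothetical names. It shows that a function is an object like any other and can even be assigned to a new name.

```r
x <- 42                      # a variable
class(x)                     # "numeric"

class(sum)                   # "function": functions are objects too
add_up <- sum                # so they can be bound to a new name
add_up(1:4)                  # 10

class(data.frame(a = 1:2))   # "data.frame": a data structure
```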


Classes:


Storing data in R with data frames


Data frames are tables:

  • Rows represent observations

  • Columns represent variables
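To make the rows-and-columns idea concrete, here is a minimal sketch that builds a small data frame by hand. It reuses the country codes from the vector examples in these notes, and the name `df` is arbitrary. `stringsAsFactors = FALSE` keeps the text column as character on older R versions. Since R 4.0 that is already the default.

```r
# One row per country (observation), one column per variable
df <- data.frame(
  country = c("italy", "canada", "egypt"),
  code    = c(380, 124, 818),
  stringsAsFactors = FALSE
)
class(df)     # "data.frame"
nrow(df)      # 3 observations
ncol(df)      # 2 variables
names(df)     # "country" "code"
df$code       # 380 124 818

# Each column is its own vector, so columns can hold different types
sapply(df, class)   # country: "character", code: "numeric"
```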


Code in R basics

# loading the dslabs package and the murders dataset
library(dslabs)
data(murders)

# determining that the murders dataset is of the "data frame" class
class(murders)
# finding out more about the structure of the object
str(murders)
# showing the first 6 rows of the dataset
head(murders)

# using the accessor operator to obtain the population column
murders$population
# displaying the variable names in the murders dataset
names(murders)
# determining how many entries are in a vector
pop <- murders$population
length(pop)
# vectors can be of class numeric or character
class(pop)
class(murders$state)

# logical vectors are either TRUE or FALSE
z <- 3 == 2
z
class(z)

# factors are another type of class
class(murders$region)
# obtaining the levels of a factor
levels(murders$region)
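Factors look like character vectors, but they are stored differently. The sketch below uses a small hypothetical factor with made-up values, so it runs without dslabs. It is only meant to resemble `murders$region`, not to reproduce it. Under the hood, R keeps a set of levels plus an integer code for each entry.

```r
# A hypothetical factor with made-up region values
region <- factor(c("South", "West", "South", "Northeast"))
class(region)        # "factor"
levels(region)       # "Northeast" "South" "West" (sorted alphabetically by default)
typeof(region)       # "integer": entries are stored as integer codes
as.integer(region)   # 2 3 2 1 (each code indexes into levels(region))
table(region)        # number of entries per level
```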


Vectors

  • The function c(), which stands for concatenate, is useful for creating vectors.

  • Another useful function for creating vectors is the seq() function, which generates sequences.

  • Subsetting lets us access specific elements of a vector by placing their positions or names inside square brackets.
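Here is a short sketch of common ways to call seq(). The `by` and `length.out` arguments are part of base R's seq(). seq_along() is a related base function, added here as an extra illustration.

```r
# seq(from, to) steps by 1, like the colon operator from:to
seq(1, 10)                   # 1 2 3 4 5 6 7 8 9 10

# the by argument sets the step size
seq(1, 10, by = 2)           # 1 3 5 7 9

# length.out asks for a fixed number of evenly spaced values
seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# seq_along() returns the position of every element of a vector
country <- c("italy", "canada", "egypt")
seq_along(country)           # 1 2 3
```

seq_along() is handy for subsetting or looping because it still behaves sensibly when a vector is empty.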

Code for vectors

# We may create vectors of class numeric or character with the concatenate function
codes <- c(380, 124, 818)
country <- c("italy", "canada", "egypt")

# We can also name the elements of a numeric vector
# Note that the two lines of code below have the same result
codes <- c(italy = 380, canada = 124, egypt = 818)
codes <- c("italy" = 380, "canada" = 124, "egypt" = 818)

# We can also name the elements of a numeric vector using the names() function
codes <- c(380, 124, 818)
country <- c("italy","canada","egypt")
names(codes) <- country

# Using square brackets is useful for subsetting to access specific elements of a vector
codes[2]
codes[c(1,3)]
codes[1:2]

# If the entries of a vector are named, they may be accessed by referring to their name
codes["canada"]
codes[c("egypt","italy")]


©2020 by Arturo Devesa.