R for Data Science Notes

What is the sum of the first 1000 positive integers?

We can use the formula n(n+1)/2 to quickly compute this quantity.
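As a quick check, the formula can be compared with a direct summation. This is a minimal sketch; the variable names `n`, `formula_sum`, and `direct_sum` are chosen here for illustration only.

```r
# Sum of the first n positive integers, computed two ways
n <- 1000

# Closed-form formula n(n+1)/2
formula_sum <- n * (n + 1) / 2

# Direct summation over the sequence 1, 2, ..., n
direct_sum <- sum(1:n)

formula_sum  # 500500
direct_sum   # 500500
```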

Understanding Basic Data Types and Data Structures in R

To make the most of the R language, you’ll need a strong understanding of the basic data types and data structures and how to operate on them.

Data structures are very important to understand because these are the objects you will manipulate on a day-to-day basis in R. Dealing with object conversions is one of the most common sources of frustration for beginners.

Everything in R is an object.

Data Types:

R has six basic data types: the five listed below, plus raw, which is not discussed in this workshop.

  • character

  • numeric (real or decimal)

  • integer

  • logical

  • complex

Elements of these data types may be combined to form data structures, such as atomic vectors. When we call a vector atomic, we mean that the vector only holds data of a single data type. Below are examples of atomic character vectors, numeric vectors, integer vectors, etc.

  • character: "a", "swc"

  • numeric: 2, 15.5

  • integer: 2L (the L tells R to store this as an integer)

  • logical: TRUE, FALSE

  • complex: 1+4i (complex numbers with real and imaginary parts)
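The types listed above can be checked with `typeof()`. The sketch below collects the results in a named vector called `types` (a name chosen for illustration) and also shows what happens when values of different types are combined: because an atomic vector holds a single type, R coerces the elements to a common one.

```r
# typeof() reports the low-level storage type of each value
types <- c(
  character = typeof("swc"),
  numeric   = typeof(15.5),   # numeric values are stored as "double"
  integer   = typeof(2L),
  logical   = typeof(TRUE),
  complex   = typeof(1 + 4i)
)
print(types)

# Mixing types in c() coerces everything to one common type
mixed <- c(1, "a", TRUE)
mixed          # "1" "a" "TRUE"
typeof(mixed)  # "character"
```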

R provides many functions to examine features of vectors and other objects, for example

  • class() - what kind of object is it (high-level)?

  • typeof() - what is the object’s data type (low-level)?

  • length() - how long is it? What about two-dimensional objects?

  • attributes() - does it have any metadata?
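A short sketch of these four functions on a small named vector, followed by `length()` on two-dimensional objects. The objects `x`, `m`, and `df` are made up for this example.

```r
# A named numeric vector
x <- c(a = 1.5, b = 2, c = 3.25)

class(x)       # "numeric"
typeof(x)      # "double"
length(x)      # 3
attributes(x)  # a list holding the names: "a" "b" "c"

# For a matrix, length() counts all elements; dim() gives rows and columns
m <- matrix(1:6, nrow = 2)
length(m)  # 6
dim(m)     # 2 3

# For a data frame, length() counts the columns
df <- data.frame(id = 1:3, label = c("a", "b", "c"))
length(df)  # 2
nrow(df)    # 3
```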

R has many data structures. These include

  • atomic vector

  • list

  • matrix

  • data frame

  • factor
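As a rough illustration, each of the structures listed above can be created with a base R constructor. The values and the names `v`, `l`, `m`, `df`, and `f` are invented for this sketch.

```r
# Atomic vector: every element has the same type
v <- c(1, 2, 3)

# List: elements may have different types
l <- list(1, "a", TRUE)

# Matrix: a two-dimensional arrangement of values of one type
m <- matrix(1:6, nrow = 2, ncol = 3)

# Data frame: a table whose columns may differ in type
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Factor: categorical data with a fixed set of levels
f <- factor(c("low", "high", "low"))

is.atomic(v)      # TRUE
is.list(l)        # TRUE
is.matrix(m)      # TRUE
is.data.frame(df) # TRUE
levels(f)         # "high" "low" (sorted alphabetically by default)
```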




Storing data in R with data frames

Data frames are tables.

Rows represent observations.

Columns represent the different variables.
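To make the row and column picture concrete, here is a minimal sketch of a data frame built by hand. The table `scores` and its contents are invented purely for illustration.

```r
# Each row is one observation (a student); each column is one variable
scores <- data.frame(
  student = c("ana", "ben", "cy", "dee"),
  age     = c(21, 25, 23, 22),
  passed  = c(TRUE, FALSE, TRUE, TRUE)
)

nrow(scores)   # 4 observations
ncol(scores)   # 3 variables
names(scores)  # "student" "age" "passed"

# A single column is extracted as a vector with the $ accessor
scores$age     # 21 25 23 22
```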

Code in R basics

# loading the dslabs package and the murders dataset
library(dslabs)
data(murders)

# determining that the murders dataset is of the "data frame" class
class(murders)
# finding out more about the structure of the object
str(murders)
# showing the first 6 lines of the dataset
head(murders)

# using the accessor operator to obtain the population column
murders$population
# displaying the variable names in the murders dataset
names(murders)
# determining how many entries are in a vector
pop <- murders$population
length(pop)
# vectors can be of class numeric and character
class(pop)
class(murders$state)

# logical vectors are either TRUE or FALSE
z <- 3 == 2
z
class(z)

# factors are another type of class
class(murders$region)
# obtaining the levels of a factor
levels(murders$region)


  • The function c(), which stands for concatenate, is useful for creating vectors.

  • Another useful function for creating vectors is the seq() function, which generates sequences.

  • Subsetting with square brackets lets us access specific elements of a vector.
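The seq() function mentioned above can be sketched as follows, together with a small subsetting example. The names `one_to_ten`, `odds`, and `quarters` are chosen here for illustration.

```r
# seq(from, to) counts up in steps of 1
one_to_ten <- seq(1, 10)

# The by argument sets the step size
odds <- seq(1, 10, by = 2)
odds  # 1 3 5 7 9

# length.out asks for a given number of evenly spaced values
quarters <- seq(0, 1, length.out = 5)
quarters  # 0.00 0.25 0.50 0.75 1.00

# Square brackets pick out elements by position
odds[2]        # 3
odds[c(1, 5)]  # 1 9
```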


# We may create vectors of class numeric or character with the concatenate function
codes <- c(380, 124, 818)
country <- c("italy", "canada", "egypt")

# We can also name the elements of a numeric vector
# Note that the two lines of code below have the same result
codes <- c(italy = 380, canada = 124, egypt = 818)
codes <- c("italy" = 380, "canada" = 124, "egypt" = 818)

# We can also name the elements of a numeric vector using the names() function
codes <- c(380, 124, 818)
country <- c("italy","canada","egypt")
names(codes) <- country

# Using square brackets is useful for subsetting to access specific elements of a vector
codes[2]
codes[c(1, 3)]

# If the entries of a vector are named, they may be accessed by referring to their name
codes["canada"]
codes[c("egypt", "italy")]


©2020 by Arturo Devesa.