### Pivot Tables in R – Basic Pivot table, columns and metrics

## Creating basic pivot tables in R with different metrics (measures)

follow the step by step below or download the R file and load into R studio from github to create basic pivot tables in R:

in this tutorial we’ll create basic pivot tables in R (summary tables), looking at it from an Excel pivot perspective first, then sql, then R:

# clean up the environment

rm(list = ls())

# we’ll need dplyr and tidyr for this, by far easier than with base R

library(dplyr)

library(tidyr)

# read in the Canadian SuperStore dataset from this dropbox address:

df <- read.csv(“https://www.dropbox.com/s/kj9yioc24iq4pdb/superstore.csv?dl=1”)

write.csv(df, “superstore.csv”)

let’s have a look in excel; we’ll create a pivot table on the region, product category and customer segment as attributes and the sales (sum) as metric:

# pivot tables in R takes a very sql-like approach with the dplyr package: let’s say we wish to

# pivot so we see the total sales per product category, region and customer segment.

# in sql, it would be something like:

# SELECT Product.Category, Region, Customer.Segment, SUM(Sales)

# FROM df

# GROUP BY Product.Category, Region, Customer.Segment

# in dplyr, we follow a similar logic:

# first select the columns

pivot <- df %>%

select(Product.Category, Region, Customer.Segment, Sales)head(pivot)

# then group by

pivot <- df %>%

select(Product.Category, Region, Customer.Segment, Sales)%>%

group_by(Product.Category, Region, Customer.Segment)

# then tell R how to sum the sales (sum, count, average etcs):

pivot <- df %>%

select(Product.Category, Region, Customer.Segment, Sales)%>%

group_by(Product.Category, Region, Customer.Segment) %>%

summarise(TotalSales = sum(Sales))head(pivot)

# and so there’s our pivot “select” statement, so to speak.

# let’s add a few things, as we would in Excel pivot or sql – more computed columns,

# adding avg sales and number (count) of sales transactions:

Excel – drag the sales into the values grid a couple more times, changing the metric to average and count instead of sum:

In SQL, add a couple of aggregate columns:

and the R equivalent:

pivot <- df %>%

select(Product.Category, Region, Customer.Segment, Sales)%>%

group_by(Product.Category, Region, Customer.Segment) %>%

summarise(TotalSales = sum(Sales), AvgSales = mean(Sales,na.rm = TRUE),

NumSales = length(!is.na(Sales)))head(pivot)