Iteration 2
In this session, we will discuss:
purrr::map() to read a bunch of files
purrr::walk() to write a bunch of files
For coding, we will use the programming-r-exercises repository: R/iteration-02-01-reading-files.R, etc.
Using {purrr} to iterate can help you perform many tasks repeatably and reproducibly.
Read Excel files from a directory, then combine into a single data-frame.
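When you first call here::here() (simplified):
it looks for a directory, at or above the working directory, that contains an .Rproj file
it uses the directory containing the .Rproj file as the reference-path
here::here() prepends the reference-path to its argument.
If the project is in /Users/ian/important-project/: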
here("data/file.csv")
"/Users/ian/important-project/data/file.csv"
In the programming-r-exercises repository: iteration-02-01-reading-files.R
Here’s our starting code:
data1952 <- read_excel(here("data/gapminder/1952.xlsx"))
data1957 <- read_excel(here("data/gapminder/1957.xlsx"))
data1962 <- read_excel(here("data/gapminder/1952.xlsx"))
data1967 <- read_excel(here("data/gapminder/1967.xlsx"))
data_manual <- bind_rows(data1952, data1957, data1962, data1967)
What problems do you see?
(I see two real problems, and one philosophical problem)
Run this example code, discuss with your neighbor.
I see this as a two-step problem:
Let’s work together to improve this code to read data:
data <-
  paths |>
  # read each file from excel, into data frame
  # keep only non-null elements
  # set list-names as column `year`
  # bind into single data-frame
  # convert year to number
  print()
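One way the commented steps could be filled in, as a minimal sketch rather than the workshop solution. To stay self-contained it reads two small CSV files from a temporary directory instead of the gapminder Excel files; the directory, the file contents, and the use of readr::read_csv() in place of read_excel() are all assumptions.

```r
library(purrr)
library(readr)
library(dplyr)

# hypothetical stand-in data: two small CSV files named after their year
demo_dir <- tempfile("gapminder-demo")
dir.create(demo_dir)
write_csv(data.frame(country = c("A", "B"), pop = c(1, 2)), file.path(demo_dir, "1952.csv"))
write_csv(data.frame(country = c("A", "B"), pop = c(3, 4)), file.path(demo_dir, "1957.csv"))

# named character vector: each name is the file name without its extension
paths <- list.files(demo_dir, pattern = "[.]csv$", full.names = TRUE)
paths <- set_names(paths, tools::file_path_sans_ext(basename(paths)))

data <-
  paths |>
  # read each file into a data frame
  map(\(path) read_csv(path, show_col_types = FALSE)) |>
  # keep only non-null elements
  keep(\(d) !is.null(d)) |>
  # set list-names as column `year`, bind into single data frame
  list_rbind(names_to = "year") |>
  # convert year to number
  mutate(year = as.numeric(year))
```

Because the file list comes from the directory, the copy-paste error in the starting code (reading 1952.xlsx twice) cannot happen here.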
If we have a failure, we may not want to stop everything.
Function operators:
poss_read_csv("not/a/file.csv")
Error: 'not/a/file.csv' does not exist in current working directory ('/home/runner/work/programming-r/programming-r').
NULL
poss_read_csv(I("a, b\n 1, 2"), col_types = "dd")
# A tibble: 1 × 2
      a     b
  <dbl> <dbl>
1     1     2
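The definition of poss_read_csv() is not shown above. A minimal sketch, assuming it was built with purrr::possibly() and quiet = FALSE, which would explain why the error message is still printed before the NULL:

```r
library(purrr)
library(readr)

# assumption: wrap read_csv() so that a failure returns NULL instead of stopping
poss_read_csv <- possibly(read_csv, otherwise = NULL, quiet = FALSE)

# a failing call prints the error message, then returns NULL
missing_file <- poss_read_csv("not/a/file.csv")
is.null(missing_file)

# a successful call returns the usual tibble
ok <- poss_read_csv(I("a, b\n 1, 2"), col_types = "dd")
```

possibly() is a function operator: it takes a function and returns a modified function, so the wrapped version can be dropped into map() wherever the original was used.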
In the programming-r-exercises repository: data/gapminder_party/
Create a new function:
possibly_read_excel <- possibly() # we do the rest
Use this function in your script.
Functional programming has three fundamental operations:
filter() - like spaghetti, not coffee: purrr::keep()
map() - do this to each element: purrr::map()
reduce() - combine into new thing: purrr::reduce()
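The three operations can be chained on a plain list. This is a small illustrative sketch; the list x and the squaring step are made up for the example.

```r
library(purrr)

# hypothetical input: a list with one missing (NULL) element
x <- list(1, NULL, 3, 4)

total <-
  x |>
  keep(\(v) !is.null(v)) |>  # filter: drop the NULL element
  map(\(v) v^2) |>           # map: square each remaining element
  reduce(`+`)                # reduce: add everything up

total
```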
Implement list_rbind() using functional-programming techniques:
list_rbind2 <- function(df, names_to) {
  df |>
    purrr::keep(\(x) !is.null(x)) |>
    purrr::imap(\(d, name) dplyr::mutate(d, "{names_to}" := name)) |>
    purrr::reduce(rbind)
}
Drop NULL values: purrr::keep()
Add the names_to column: purrr::imap()
Bind the rows together: purrr::reduce()
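A quick way to try list_rbind2() is a small named list with a NULL element. The input dfs and the column name "id" are made up for this sketch; the function is repeated so the example runs on its own.

```r
# repeated from above so the example is self-contained
list_rbind2 <- function(df, names_to) {
  df |>
    purrr::keep(\(x) !is.null(x)) |>
    purrr::imap(\(d, name) dplyr::mutate(d, "{names_to}" := name)) |>
    purrr::reduce(rbind)
}

# hypothetical input: element "b" stands in for a file that failed to read
dfs <- list(
  a = data.frame(x = 1:2),
  b = NULL,
  c = data.frame(x = 3L)
)

combined <- list_rbind2(dfs, names_to = "id")
combined
```

The NULL element is dropped, and each remaining row is labelled with the name of the list element it came from.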
Goal: write out a csv file for each value of clarity within ggplot2’s diamonds dataset.
When we read “for each”, we might think of using map():
Writing out a file is a side effect.
We aren’t interested in the return value.
{purrr} has a function for that: walk() (and friends).
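The difference in return value can be seen with a small sketch. The vector x and the message() side effect are stand-ins chosen for illustration.

```r
library(purrr)

x <- c(a = 1, b = 2)

# map() collects the return values: here, a list of NULLs we do not care about
res_map <- map(x, \(v) message("value: ", v))

# walk() performs the same side effect, then returns its input invisibly
res_walk <- walk(x, \(v) message("value: ", v))
```

Since walk() hands back its input, it can also sit in the middle of a pipe without disturbing the data flowing through.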
iteration-02-02-writing-files.R
# ?dplyr::group_nest(), ?stringr::str_glue()
# from diamonds, create tibble with columns: clarity, data, filename
by_clarity_csv <-
  diamonds |>
  # nest by clarity
  # create column for filename
  print()
# ?readr::write_csv()
# using the data and filename, write out csv files
walk2(
  by_clarity_csv$data,
  by_clarity_csv$filename,
  \(data, filename) NULL # replace with actual code
)
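One possible completion, offered as a sketch rather than the workshop solution. It writes to a temporary directory (out_dir is an assumption) so that running it does not touch the project's data folder.

```r
library(dplyr)
library(purrr)
library(readr)
library(stringr)

# assumption: a throwaway output directory
out_dir <- tempfile("clarity-csv")
dir.create(out_dir)

by_clarity_csv <-
  ggplot2::diamonds |>
  # nest by clarity: one row per clarity, with a list-column `data`
  group_nest(clarity) |>
  # create column for filename
  mutate(filename = file.path(out_dir, str_glue("clarity-{clarity}.csv")))

# using the data and filename, write out csv files
walk2(
  by_clarity_csv$data,
  by_clarity_csv$filename,
  \(data, filename) write_csv(data, filename)
)

list.files(out_dir)
```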
Goal: Save histogram for carat for each value of clarity within diamonds dataset.
Create a plot column, where each element is a ggplot. This will be a list-column.
You can use map() within mutate(), with all the tidy-eval goodness!
mutate(
  plot = map(data, histogram, carat)
)
equivalent to
plot[[1]] = histogram(data[[1]], carat)
plot[[2]] = histogram(data[[2]], carat)
...
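The same equivalence can be checked with an ordinary function. In this sketch scale_by() is a hypothetical stand-in for histogram(), and the extra argument 10 plays the role of carat.

```r
library(purrr)

# hypothetical helper: takes the element first, then one extra argument
scale_by <- function(x, factor) x * factor

data <- list(1:3, 4:6)

# extra arguments after the function are passed along to every call
via_map <- map(data, scale_by, 10)

# the same thing written out by hand
by_hand <- list(
  scale_by(data[[1]], 10),
  scale_by(data[[2]], 10)
)

identical(via_map, by_hand)
```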
# from diamonds, create tibble with columns: clarity, data, plot, filename
by_clarity_plot <-
  diamonds |>
  # nest by clarity
  group_nest(clarity) |>
  # create columns for plot, filename
  mutate(
    filename = str_glue("clarity-{clarity}.png")#,
    #plot = map(),
  ) |>
  print()
# ?ggplot2::ggsave()
ggsave_local <- function(filename, plot) {
}
# using filename and plot, write out plots to png files
walk2(
  by_clarity_plot$filename,
  by_clarity_plot$plot,
  # write plot file to data/clarity directory
  ggsave_local
)
library("tidyverse")
library("palmerpenguins")
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  scale_color_discrete(labels = tolower) # tolower is a function
Three fundamental operations in functional programming
Given a list and a function:
We can use map(), filter(), reduce() to “implement”, using purrr:
I claim it’s possible, I don’t claim it’s a good idea.
dpurrr_filter()
dpurrr_filter <- function(df, predicate) {
  df |>
    as.list() |>
    purrr::list_transpose(simplify = FALSE) |>
    purrr::keep(predicate) |>
    purrr::list_transpose() |>
    as.data.frame()
}
dpurrr_filter(mtcars, \(d) d$gear == 3) |> head()
   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
2 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
3 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
4 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
5 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
6 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
dpurrr_mutate()
dpurrr_mutate <- function(df, mapper) {
  df |>
    as.list() |>
    purrr::list_transpose(simplify = FALSE) |>
    purrr::map(\(d) c(d, mapper(d))) |>
    purrr::list_transpose() |>
    as.data.frame()
}
   mpg cyl disp  hp drat    wt  qsec vs am gear carb    wt_kg
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 1190.909
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 1306.818
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 1054.545
4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 1461.364
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 1563.636
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 1572.727
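The call that produced this output is not shown. The wt_kg values are consistent with converting wt (in 1000 lbs) to kilograms at 2.2 lb per kg, so a call along these lines should reproduce it. The mapper is an assumption; the function is repeated so the sketch runs on its own.

```r
# repeated from above so the example is self-contained
dpurrr_mutate <- function(df, mapper) {
  df |>
    as.list() |>
    purrr::list_transpose(simplify = FALSE) |>
    purrr::map(\(d) c(d, mapper(d))) |>
    purrr::list_transpose() |>
    as.data.frame()
}

# assumed mapper: returns a named list holding the new column for one row
with_kg <- dpurrr_mutate(mtcars, \(d) list(wt_kg = d$wt * 1000 / 2.2))

head(with_kg)
```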
dpurrr_summarise()
dpurrr_summarise <- function(df, reducer, .init) {
  df |>
    as.list() |>
    purrr::list_transpose(simplify = FALSE) |>
    purrr::reduce(reducer, .init = .init) |>
    as.data.frame()
}
First, a little prep work:
mtcars |>
  split(mtcars$gear) |>
  purrr::map(summariser) |>
  ireduce(
    reducer = \(acc, x, y) rbind(acc, c(list(gear = y), x)),
    .init = data.frame()
  )
  gear wt_min wt_max
1    3  2.465  5.424
2    4  1.615  3.440
3    5  1.513  3.570
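The definitions of summariser() and ireduce() are not shown above. The following is one guess at what the prep work might look like, not the original code: summariser() is assumed to reduce a data frame to its minimum and maximum wt, and ireduce() is assumed to be a thin wrapper around purrr::reduce2() that passes each element's name to the reducer.

```r
library(purrr)

# repeated from above so the example is self-contained
dpurrr_summarise <- function(df, reducer, .init) {
  df |>
    as.list() |>
    list_transpose(simplify = FALSE) |>
    reduce(reducer, .init = .init) |>
    as.data.frame()
}

# hypothetical: fold the rows of one data frame into min and max weight
summariser <- function(df) {
  dpurrr_summarise(
    df,
    reducer = \(acc, row) list(
      wt_min = min(acc$wt_min, row$wt),
      wt_max = max(acc$wt_max, row$wt)
    ),
    .init = list(wt_min = Inf, wt_max = -Inf)
  )
}

# hypothetical: like reduce(), but the reducer also receives each element's name
ireduce <- function(x, reducer, .init) {
  reduce2(x, names(x), reducer, .init = .init)
}

by_gear <-
  mtcars |>
  split(mtcars$gear) |>
  map(summariser) |>
  ireduce(
    reducer = \(acc, x, y) rbind(acc, c(list(gear = y), x)),
    .init = data.frame()
  )

by_gear
```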
We can agree this presents no danger to dplyr.
In JavaScript, data frames are often arrays of objects (lists), so you’ll see formulations like this (e.g. tidyjs).
purrr::map() to read a bunch of files
purrr::walk() to write a bunch of files
purrr::keep() to filter a list
Functional programming comes up a lot in JavaScript
Please go to pos.it/conf-workshop-survey.
Your feedback is crucial!
Data from the survey informs curriculum and format decisions for future conf workshops, and we really appreciate you taking the time to provide it.