Raw Data Retrieval, Working with

Databases, and Automation


Section Agenda

  • Downloading data from sources
  • Writing production data to a database
  • Automating the pipeline
  • Email alerting, part 1

Project Overview

Download raw data from an API endpoint

Your first instinct for retrieving raw data may be to download the data as a csv file.

πŸ’‘ There’s a better way!

  • Many data sources offer an API endpoint for downloading data
  • {httr2} provides wrapper functions around curl requests for easier querying and handling of responses

{httr2} is a ground-up rewrite of {httr} that is pipeable! It implements an explicit request object that solves common problems such as built-in rate-limiting, retries, OAuth, secure secrets, and more.

Reading API Documentation

https://wsdot.wa.gov/traffic/api/

Example httr2 request

Here’s how we will query the vesselverbose endpoint at

https://www.wsdot.wa.gov/ferries/api/vessels/rest/vesselverbose?apiaccesscode={WSDOT_ACCESS_CODE}

base_url <- "https://www.wsdot.wa.gov"
endpoint <- "ferries/api/vessels/rest/vesselverbose"

# Compose the API request 
req <- request(base_url) |> 
  req_url_path_append(endpoint) |> 
  req_url_query(apiaccesscode = Sys.getenv("WSDOT_ACCESS_CODE")) 
  
# perform the request
response <- req |> 
  req_perform()

# convert the body of the response to a tibble
response_body <- response |> 
  resp_body_string() |> 
  jsonlite::fromJSON() |> 
  as_tibble()


Activity Time!

Activity

πŸ‘‰ Activity objective: Download the vesselverbose data from the WSDOT API endpoint.

  • Open the project materials/01-raw-data-write/01-raw-data-write.Rproj
  • Open the file 01-raw-data-write.qmd
  • πŸ›‘ We will work through Task 1 - Task 2 only

Where should I put the data?

πŸ“£ Production data belongs in a database.

Database Connection Essentials

2 parts to a database connection:

  1. A driver (on your system or as part of a R package)
  2. A means to interact with the driver (e.g., {DBI} package)

Best case: one of the DBI-compliant db-specific packages

Alternative: odbc::odbc() + an ODBC driver installed on your system

πŸ“£ When possible, use a database-specific R package (e.g., RPostgres, RMariaDB, RSQLite, bigrquery, etc.) instead of odbc + a driver. In many cases, they are more performant (especially in writing data) and may have more translations available for query types.

Interacting with Databases

  • What tables are in a database? DBI::dbListTables(con)

  • Use dplyr to interact with the database table in the same manner you would a local data frame

    df <- dplyr::tbl(con, "my_table")
    df |> filter(...) |> mutate(...) |> group_by(...) |> summarise(...)
  • πŸ“£ Do as much work as possible in the database to save time and resources before bringing the table into local memory.

  • Use dplyr::collect() to bring the table into memory. Try to use collect as late as possible in your queries / transformations

    df |> ... |> collect()


Activity Time!

Activity

πŸ‘‰ Activity objective: Write the vesselverbose data to the database.

  • Return to the file 01-raw-data-write.qmd
  • πŸ›‘ We will work through Task 3 - Task 5 only

How do I automate this process?

🧰 Deploy and schedule your ETL and reports on Posit Connect


  • Connect renders your notebook / runs your code
  • Render on demand or on a schedule
  • Can email the rendered report or notebook to you, collaborators, and/or viewers

Introduction to Emailing with Posit Connect

Posit Connect can send an email when a report or notebook renders

🚫

πŸ₯± This email is non-informative

πŸ”Š Creates noise in the inbox

πŸ™ˆ Does not compel anyone to look at it

Introduction to Emailing with and Connect

Create richly-formatted emails as part of your Quarto notebook

+

=

🧰
Customized emails

A Basic Quarto Email

---
title: Something wonderful 
format: email
---

Some of the content in this document makes its way to the rendered HTML
and some of it is destined for a custom email message. The 'email' block
customizes the subject and the body of the email message, while the
rendered HTML contains only this paragraph.

:::: {email}

This email was sent from Quarto! 

::: {subject}
subject
:::

::::

A Basic Quarto Email with Conditional Send

---
title: Something wonderful 
format: email
---

Some of the content in this document makes its way to the rendered HTML
and some of it is destined for a custom email message. The 'email' block
customizes the subject and the body of the email message, while the
rendered HTML contains only this paragraph.

```{r}
send_email <- sample(c("TRUE", "FALSE"), 1)
```

:::: {email}

:::{email-scheduled}
```{r}
send_email
```
:::

This email was occasionally sent from Quarto! 

::: {subject}
subject
:::

::::


Activity Time!

Activity, Part 1

πŸ‘‰ Activity objective: Work with a simple Quarto email document and preview the email locally

  • Return to the project 01-raw-data-write.Rproj
  • Open the file simple-quarto-email/_simple-quarto-email.qmd
  • We will render this document and preview the email locally

Activity, Part 2

πŸ‘‰ Activity objective: deploy and schedule our production notebook on Posit Connect so it runs automatically and sends an email if there’s something amiss with the raw data.

  • Return to the file 01-raw-data-write.qmd
  • πŸ›‘ We will work Task 6 to the end of the document

Recap of Alerting, Part 1

Where are we