Data Exploration, Validation and Alerting


Section Agenda

  • Ad-hoc data exploration
  • Data validation
  • Emailing, part 2: Condition-based alerts

What is this Data?

Our first introduction to pointblank


pointblank provides data quality assessment and metadata reporting for data frames and database tables. https://github.com/rstudio/pointblank

🧰 The pointblank::scan_data() function produces an HTML report that summarizes the input data to help you understand it.

Sample data scan

pointblank::scan_data(palmerpenguins::penguins)
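
A full scan can be slow on wide tables. The sketch below limits the scan to a few report sections and saves the result to a file. The choice of sections ("O" for overview, "V" for variables, "S" for sample) and the output filename are illustrative assumptions.

```r
library(pointblank)

# Scan only the Overview, Variables, and Sample sections (assumed choice;
# the default also includes interactions, correlations, and missing values)
scan <- scan_data(palmerpenguins::penguins, sections = "OVS")

# Save the report as a standalone HTML file (filename is an assumption)
export_report(scan, filename = "penguins-scan.html")
```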



Activity Time!

Activity

👉 Activity objective: exploring our data

  • Open the project materials/02-data-exploration/02-data-exploration.Rproj
  • Open the file 02-data-exploration.qmd
  • We will work through all the tasks in the notebook

How can I ensure the data in my pipeline is quality data?


🧰 pointblank for data validation

The pointblank data quality workflow


Data validation example

Agent Interrogation

Agent Validation Report

Pointblank data validation report
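
A minimal sketch of the agent workflow named above (create an agent, add validation steps, interrogate, read the report), assuming the palmerpenguins data from the earlier scan. The specific validation steps, the value ranges, and the 5% and 10% thresholds are illustrative assumptions, not the rules used in the workshop notebook.

```r
library(pointblank)

# Assumed thresholds: warn when 5% of rows fail a step, stop at 10%
al <- action_levels(warn_at = 0.05, stop_at = 0.10)

agent <- create_agent(
  tbl = palmerpenguins::penguins,
  tbl_name = "penguins",
  label = "Penguins validation sketch",
  actions = al
) |>
  # Each call below adds one validation step to the plan
  col_vals_not_null(columns = vars(sex)) |>
  col_vals_between(columns = vars(bill_length_mm), left = 30, right = 60, na_pass = TRUE) |>
  col_vals_gt(columns = vars(body_mass_g), value = 0, na_pass = TRUE) |>
  # Run every step against the table
  interrogate()

# Printing the agent displays the validation report
agent

# Split the table into rows that passed every step and rows that failed any
passed <- get_sundered_data(agent, type = "pass")
failed <- get_sundered_data(agent, type = "fail")

# TRUE only if every validation step passed for every row
all_passed(agent)
```

Keeping `passed` for downstream use and `failed` for review is one way to remove non-compliant records without discarding them.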


Activity Time!

Activity

👉 Activity objective: Use pointblank to validate data, remove non-compliant records, and explore validation results.

  • Open the project materials/03-data-clean-validate/03-data-clean-validate.Rproj
  • Open the file _simple-validation.qmd
  • 🛑 We will work through the entire notebook together

There’s much more to pointblank


Create a multiagent to summarize repeated validations to monitor data quality over time.
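
As a rough sketch, a multiagent can combine agents that ran the same checks on different snapshots of a table. Here the snapshots are simulated by splitting the penguins data by year; the helper `validate_snapshot()` and the validation steps are hypothetical.

```r
library(pointblank)

# Hypothetical helper: run the same validation plan on one snapshot
validate_snapshot <- function(tbl, label) {
  create_agent(tbl = tbl, tbl_name = "penguins", label = label) |>
    col_vals_not_null(columns = vars(sex)) |>
    col_vals_gt(columns = vars(body_mass_g), value = 0, na_pass = TRUE) |>
    interrogate()
}

penguins <- palmerpenguins::penguins

# Treat each year as a separate run of the pipeline (an assumption)
agent_2007 <- validate_snapshot(penguins[penguins$year == 2007, ], "2007 snapshot")
agent_2008 <- validate_snapshot(penguins[penguins$year == 2008, ], "2008 snapshot")

# Combine the interrogated agents and report on them side by side
multiagent <- create_multiagent(agent_2007, agent_2008)
get_multiagent_report(multiagent, display_mode = "wide")
```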


Use a YAML file to define validations that can be applied across projects and kept under version control.
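
A hedged sketch of the YAML round trip: write a validation plan to a file, then interrogate directly from that file. It assumes a recent pointblank version in which `tbl` accepts a table-prep formula (`~ ...`); the filename and validation steps are illustrative.

```r
library(pointblank)

# The table is given as a formula so the YAML file records how to fetch it
agent <- create_agent(
  tbl = ~ palmerpenguins::penguins,
  tbl_name = "penguins",
  label = "Penguins validation sketch"
) |>
  col_vals_not_null(columns = vars(sex)) |>
  col_vals_gt(columns = vars(body_mass_g), value = 0, na_pass = TRUE)

# Write the plan to YAML (filename is an assumption)
yaml_write(agent, filename = "penguins-validation.yml")

# Later, or in another project: read the plan and interrogate in one step
agent_from_yaml <- yaml_agent_interrogate(filename = "penguins-validation.yml")
```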

Try the comprehensive pointblank test drive on Posit Cloud: https://posit.cloud/project/3411822

Conditional Emails, Part 2

Let’s put together the information from data validation to send conditional emails.

What might happen?

  1. The columns and schema of the upstream raw data may change
  2. More records may fail validation than we are expecting
  3. Everything could be fine
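
The three outcomes above can be mapped to an email variant with a small helper. This is only a sketch: the function name `choose_email_variant()`, the variant numbering, and the 5% failure threshold are all hypothetical.

```r
# Hypothetical helper: map validation outcomes to an email variant.
#   1 = upstream schema changed
#   2 = more records failed validation than expected
#   3 = everything is fine
choose_email_variant <- function(schema_ok, fail_fraction, max_fail_fraction = 0.05) {
  if (!schema_ok) {
    return(1L)
  }
  if (fail_fraction > max_fail_fraction) {
    return(2L)
  }
  3L
}

# Example: schema is intact and 2% of records failed validation
choose_email_variant(schema_ok = TRUE, fail_fraction = 0.02)
```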

Creating Condition-Based Emails

  1. Quarto emails are defined by content divs
  2. Include conditional content divs with .content-visible when-meta
  3. Set the metadata by injecting YAML into the document at render time. Quarto will interpret YAML blocks no matter where they appear in the document.

Example for step 1, a basic email div:

:::: {email}

This email was sent from Quarto! 

::: {subject}
subject
:::

::::

Example for step 2, conditional content divs:

::::: {email}

This email was sent using condition `r variant`

:::: {.content-visible when-meta="is_email_variant_1"}
rest of email body 1

::: {subject}
subject 1
:::
::::

:::: {.content-visible when-meta="is_email_variant_2"}
rest of email body 2

::: {subject}
subject 2
:::
::::

:::::

Example for step 3, injecting the metadata:

---
title: Something wonderful
format: email
---

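```{r}
# Pick an email variant; random here, but in practice derived from validation results
variant <- sample(1:2, 1)
```

```{r}
#| output: asis
# Emit a YAML block as raw output so Quarto picks up the metadata flag
cat(
  "---",
  paste0("is_email_variant_", variant, ": true"),
  "---",
  sep = "\n"
)
```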

When variant is 1, this chunk writes the following YAML metadata in place:

---
is_email_variant_1: true
---


Activity Time!

Activity

👉 Activity objective: See the whole workflow of data validation and conditional emails put together.

  • Return to the project materials/03-data-clean-validate/
  • Open the file 03-data-clean-validate.qmd
  • Set the environment variable CONDITION_OVERRIDE locally
  • 🛑 We will work through Task 1 only
  • Publish the report to Posit Connect and use the CONDITION_OVERRIDE to send yourself the different emails
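
The notebook's own handling of CONDITION_OVERRIDE is not shown on this slide. As a rough sketch of how such an override could work, the hypothetical helper below prefers the environment variable when it is set and falls back to the computed condition otherwise.

```r
# Hypothetical helper: let CONDITION_OVERRIDE force a specific email variant.
# Assumes the variable, when set, holds a variant number such as "1" or "2".
resolve_variant <- function(computed_variant) {
  override <- Sys.getenv("CONDITION_OVERRIDE", unset = "")
  if (nzchar(override)) {
    as.integer(override)
  } else {
    as.integer(computed_variant)
  }
}

# Locally, set the variable before rendering to preview a given email
Sys.setenv(CONDITION_OVERRIDE = "2")
resolve_variant(computed_variant = 1)
Sys.unsetenv("CONDITION_OVERRIDE")
```

On Posit Connect, the same variable can be set in the content's environment variable settings so that each render sends the chosen variant.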

Be aware…

⚠️ Common mistakes when creating emails

  • There is no interactive runtime for an email
  • JavaScript-dependent content will generally not render when emailed because of how email clients process HTML

🧰 Best practices for embedding objects in an email

  • {ggplot2} output can be included in the email
  • Create nicely formatted tables with the {gt} package. (Just remember, no interactivity!)
  • If you’d like to include a rendering of a widget (e.g., a dial or info box), use the {webshot2} package to take a capture of the widget and embed it as an image
  • If your email recipient wants more information or interactivity, direct them to a report or dashboard deployed to Connect
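
A short sketch of email-safe objects, assuming the penguins data used earlier. The plot, the table contents, and the file names are illustrative; the widget lines are commented out because `widget` is a hypothetical object.

```r
library(ggplot2)
library(gt)

penguins <- palmerpenguins::penguins

# A static ggplot2 chart renders as an image, so it survives email clients
p <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point(na.rm = TRUE)

# A gt table renders as plain HTML, with no JavaScript required
summary_tbl <- gt(head(penguins, 5))

# For an htmlwidget, save it to HTML and capture a screenshot to embed:
# htmlwidgets::saveWidget(widget, "widget.html", selfcontained = TRUE)
# webshot2::webshot("widget.html", file = "widget.png")
```

Printing `p` and `summary_tbl` from code chunks inside the email div places them in the message body.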

Other Alerting Approaches

Send alerts to a Slack channel or MS Teams, or via text message: https://rviews.rstudio.com/2020/06/18/how-to-have-r-notify-you/