Data Science Workflows
with Posit Tools

R Focus

Instructors:
Ryan Johnson
Katie Masiello

TA:
Trevor Nederlof


Introduction

Logistics

🛜 WiFi credentials:

  • Network: Posit Conf 2024

  • Password: conf2024

  • Important locations:

    • Bathrooms: There are gender-neutral bathrooms on levels 3, 4, 5, 6 & 7
    • Meditation/prayer room: 503 (available Mon & Tues 7am - 7pm, and Wed 7am - 5pm)
    • Mothers’ room: 509 (same hours as above)

Logistics

  • Participants who do not wish to be photographed have red lanyards; please note everyone’s lanyard colors before taking a photo and respect their choices.
  • The Code of Conduct and COVID policies can be found at https://posit.co/code-of-conduct/. Please review them carefully. You can report Code of Conduct violations in person, by email, or by phone. Please see the policy linked above for contact information.

Code of Conduct

  • Everyone who comes to learn and enjoy the experience should feel welcome at posit::conf. Posit is committed to providing a professional, friendly and safe environment for all participants at its events, regardless of gender, sexual orientation, disability, race, ethnicity, religion, national origin or other protected class.

  • This code of conduct outlines the expectations for all participants, including attendees, sponsors, speakers, vendors, media, exhibitors, and volunteers. Posit will actively enforce this code of conduct throughout posit::conf.

https://posit.co/code-of-conduct/

Meet the Team!

Ryan Johnson

Data Science Advisor @ Posit

Katie Masiello

Solutions Engineer @ Posit

Trevor Nederlof

Solutions Engineer @ Posit

Meet your Neighbor!

Agenda

Time | Activity
~9:00 - 10:30 | Workshop Introduction; Reading, Cleaning, Writing and Validating Data
10:30 - 11:00 | Coffee break ☕
~11:00 - 12:30 | Creating, Delivering, and Monitoring a model using Vetiver
12:30 - 1:30 | Lunch break 🥪
~1:30 - 3:00 | Delivery
3:00 - 3:30 | Coffee break ☕
~3:30 - 5:00 | Advancing your Workflow

The Sticky Situation

“I’m lost / need help”

“I’m done and ready to move along”



👨‍💻 Put your sticky note on the back of your laptop screen 👩‍💻

Workshop approach

We will use an end-to-end real-world project to demonstrate workflows and best practices using open source packages and Posit professional tools.


Conventions

🧰 Add this to your toolbox.
📣 I will stand on my soapbox and profess this until I am blue in the face.

Detour warning. We could get really into this, but there isn’t time today.

Asking Questions

👉 Submit questions and respond to polls on GitHub Discussions


https://github.com/posit-conf-2024/ds-workflows-r/discussions


You are always welcome to raise your hand! 🙋

Go to the Discussion now and respond to the question!

Getting help (R Functions)

Functions are the 🍞 and 🧈 of R programming!


If you want to access any function’s help page:

# Method 1
help(function_name_here)

# Method 2
?function_name_here

# Method 3
# Highlight the function and press F1 🤯
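
As a concrete illustration of Methods 1 and 2, the sketch below looks up help for base R's paste() function. paste() is only an example choice here; any function from an attached package works the same way. The args() call is an extra, quicker way to check a function's arguments.

```r
# Method 1: help() accepts the function name, quoted or unquoted
help("paste")

# Method 2: the ? shortcut opens the same help page
?paste

# Quick look at a function's arguments without opening the help page
args(paste)
```

In RStudio, the help page opens in the Help pane; in a plain R console it is shown as text.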

⛴️ Ready to set sail? 🌊


Washington State Ferry System


⛴️ WSF is the largest operating public ferry system in the US! 🤯

🐋 21 ferries across Puget Sound and the Salish Sea

Meet the Ferries

Washington State Ferry Departure Delays Project

The Question

Can we predict departure delay for a given route and date?

Our Approach

Use the historical (validated) delay, location, and weather data to create a model that will predict the likelihood of delays!
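
To make "delay" concrete, here is a minimal sketch of how a delay value and a binary delayed/not-delayed label could be derived from scheduled and actual departure times. The column names, the toy timestamps, and the 3-minute threshold are all assumptions made up for illustration; they are not taken from the workshop data or code.

```r
# Toy sailings with hypothetical column names; the values are made up
sailings <- data.frame(
  scheduled_depart = as.POSIXct(c("2024-08-12 09:00:00", "2024-08-12 10:30:00"), tz = "UTC"),
  actual_depart    = as.POSIXct(c("2024-08-12 09:04:00", "2024-08-12 10:29:00"), tz = "UTC")
)

# Delay in minutes: positive means the ferry left late, negative means early
sailings$delay_min <- as.numeric(
  difftime(sailings$actual_depart, sailings$scheduled_depart, units = "mins")
)

# One possible binary target for a "likelihood of delay" model
# (the threshold is an assumption, not the workshop's definition)
delay_threshold_min <- 3
sailings$delayed <- sailings$delay_min > delay_threshold_min

sailings
```

A classification model would then be trained on a label like delayed, using route, date, and weather features as predictors.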

Project Data

This workshop will use data from two primary sources: Washington State Ferries data from the WSDOT API and historical weather data from Open-Meteo.

Project Data Details


⛴️ Ferry data (https://wsdot.wa.gov/traffic/api/)

Data Set | Description | API
Vessel verbose | Details about each ferry in the fleet, including name, model, and capacity | https://www.wsdot.wa.gov/ferries/api/vessels/rest/vesselverbose?apiaccesscode={WSDOT_ACCESS_CODE}
Vessel history | Historical sailings, including scheduled and actual departure times | https://www.wsdot.wa.gov/ferries/api/vessels/rest/vesselhistory/{VESSELNAME}/{DATESTART}/{DATEEND}?apiaccesscode={WSDOT_ACCESS_CODE}
Terminal locations | Terminal names and locations, including latitude and longitude | https://www.wsdot.wa.gov/ferries/api/terminals/rest/terminallocations?apiaccesscode={WSDOT_ACCESS_CODE}
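
The placeholders in the vessel history URL can be filled in with a small helper. The sketch below uses base R only and does not call the API. The function name vessel_history_url(), the YYYY-MM-DD date format, and reading the access code from an environment variable named WSDOT_ACCESS_CODE are assumptions for illustration; check the WSDOT API documentation for the exact formats it expects.

```r
# Hypothetical helper that fills in the vessel history URL template.
# Assumes dates are passed as YYYY-MM-DD and the access code lives in the
# WSDOT_ACCESS_CODE environment variable (both are assumptions).
vessel_history_url <- function(vessel_name, date_start, date_end,
                               access_code = Sys.getenv("WSDOT_ACCESS_CODE")) {
  sprintf(
    "https://www.wsdot.wa.gov/ferries/api/vessels/rest/vesselhistory/%s/%s/%s?apiaccesscode=%s",
    utils::URLencode(vessel_name, reserved = TRUE),  # escape spaces and other reserved characters
    format(as.Date(date_start), "%Y-%m-%d"),
    format(as.Date(date_end), "%Y-%m-%d"),
    access_code
  )
}

# Illustrative call with a placeholder access code
vessel_history_url("Tacoma", "2024-08-01", "2024-08-07", access_code = "YOUR_CODE")
```

Keeping the access code in an environment variable rather than in the script avoids committing it to version control.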

Project Data Details


🌤️ Weather data (https://open-meteo.com/en/docs/historical-weather-api/)

Endpoint | Description | API
Historical weather | Historical hourly weather at a specified latitude and longitude over a date range | https://archive-api.open-meteo.com/v1/archive?{params}
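
As a rough sketch of what {params} might contain, the example below assembles a query string in base R without calling the API. The coordinates, dates, and the two hourly variables are illustrative placeholders; the parameter names follow the Open-Meteo documentation linked above as we understand it, so confirm them there before relying on this.

```r
# Illustrative query parameters for the archive endpoint (values are placeholders)
weather_params <- list(
  latitude   = 47.6,
  longitude  = -122.33,
  start_date = "2024-08-01",
  end_date   = "2024-08-07",
  hourly     = "temperature_2m,precipitation"
)

# Join name=value pairs with "&" to form the query string
query <- paste0(
  names(weather_params), "=",
  vapply(weather_params, as.character, character(1)),
  collapse = "&"
)

weather_url <- paste0("https://archive-api.open-meteo.com/v1/archive?", query)
weather_url
```

Requesting weather at each terminal's latitude and longitude is what lets the weather data be joined to the ferry data later.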

Project Objective

  • Provide ferry travelers with a self-service tool that predicts the likelihood of a ferry departure delay.


Project Requirements

  • 🤖 Automate the pipeline
  • ⚠️ Receive alerts if there are issues in the pipeline
  • 🔄 Project is easy to maintain and iterate upon
  • Work is reusable by other teams, even if they don’t use R (Lookin’ at you )

Project Overview


Get Your Environment Set Up

Your Tools


Access Your Tools

Visit 🧚 https://ferryland.posit.team 🧚 to access:

Connect Setup // Step 1

Visit: https://pub.ferryland.posit.team

Make sure you have a GitHub account!

Connect Setup // Step 2

Workbench Setup // Step 1

Visit: https://dev.ferryland.posit.team

You do NOT need to re-authenticate with GitHub!

Workbench Setup // Step 2

  1. Click New Session
  2. Start an RStudio Pro session
  3. Create a new project from a version control repository
  4. Select Git
  5. Add the workshop’s GitHub repo URL: https://github.com/posit-conf-2024/ds-workflows-r.git
  6. Call the project directory ds-workflows-r
  7. Leave everything else as default. Select Create Project.


Project and Activity Navigation

We will work exclusively within the 📁 materials directory and associated subfolders.

.
└── materials
    ├── 01-raw-data-write
    │   ├── ...
    │   ├── 01-raw-data-write.Rproj
    │   └── 01-raw-data-write.qmd
    ├── 02-data-exploration
    │   ├── ...
    │   ├── 02-data-exploration.Rproj
    │   └── 02-data-exploration.qmd
    └── ...
        └── ...

💡 Within each directory, there is a .Rproj file.

  1. For an activity, open the respective .Rproj file in the activity folder
  2. Run renv::restore() in each activity’s project to install the packages it needs
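
To see which activity projects are available before opening one, a small helper like the one below can list them. find_activity_projects() is a hypothetical helper written for this illustration, not part of the workshop repository, and the commented renv::restore() call assumes each activity folder ships an renv.lock file.

```r
# Hypothetical helper: list the .Rproj file for each activity under a root folder
find_activity_projects <- function(root = "materials") {
  sort(list.files(root, pattern = "\\.Rproj$", recursive = TRUE, full.names = TRUE))
}

# Run from the top of the cloned repository; returns character(0) if nothing is found
find_activity_projects()

# After opening one of these projects, restore its package library
# (requires the renv package and, by assumption, an renv.lock in that folder):
# renv::restore()
```

Each activity having its own project and lockfile keeps package versions isolated between activities.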

Saving your work 💾

  • All source material can be found on the GitHub page: https://github.com/posit-conf-2024/ds-workflows-r
  • The environment we’re working in will stay up for a few days after conf, but that’s it!
  • If you would like to save your work, we recommend:
    • Exporting any source code to your local machine.
    • Forking the project to a personal GitHub repo.