8  Working with LLMs in Databricks

Catch up

library(dplyr)
library(dbplyr)
library(DBI)

con <- dbConnect(
  odbc::databricks(),
  HTTPPath = "/sql/1.0/warehouses/300bd24ba12adf8e"
)
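odbc::databricks() can pick up the workspace host and a personal access token from environment variables, which is why dbConnect() above only needs the HTTPPath. A minimal sketch of that setup (the host and token values are placeholders, not real credentials):

```r
# Assumed setup (placeholder values): odbc::databricks() reads the
# workspace URL and personal access token from these environment
# variables, so they never need to appear in the script itself
Sys.setenv(
  DATABRICKS_HOST  = "https://my-workspace.cloud.databricks.com",
  DATABRICKS_TOKEN = "dapiXXXXXXXXXXXXXXXX"
)
```

In practice these are usually set in .Renviron rather than in the script.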

8.1 Accessing AI functions

Use the sentiment classification function

  1. Create a quick review table using the following code:
reviews <- tribble(
  ~name,     ~review,
  "adam",    "This is the best toaster I have ever bought",
  "berry",   "Toaster arrived broken, waiting for replacement",
  "charles", "The washing machine is as advertised, can't wait to use it",
  "dan",     "Not sure how to feel about this television, nice brightness but bad definition"
) |> 
  select(review) 
  2. Copy the reviews data frame to your SQL session, and assign it to a variable called tbl_reviews
tbl_reviews <- copy_to(con, reviews, overwrite = TRUE)
  3. Create a new field called “sentiment” by using ai_analyze_sentiment() to analyze the “review” field
tbl_reviews |> 
  mutate(sentiment = ai_analyze_sentiment(review))
#> # Source:   SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                             sentiment
#>   <chr>                                                              <chr>    
#> 1 This is the best toaster I have ever bought                        positive 
#> 2 Toaster arrived broken, waiting for replacement                    negative 
#> 3 The washing machine is as advertised, can't wait to use it         positive 
#> 4 Not sure how to feel about this television, nice brightness but b… mixed
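The ai_* functions are not R functions at all; they work because dbplyr passes functions it does not recognize through to the backend untranslated. A way to preview this without a live connection is translate_sql() with a simulated backend (simulate_hive() below is a stand-in for the real Spark SQL connection):

```r
library(dbplyr)

# dbplyr leaves unknown functions untranslated, so ai_analyze_sentiment()
# reaches Databricks verbatim; no warehouse is needed to preview the SQL
sql <- translate_sql(ai_analyze_sentiment(review), con = simulate_hive())
sql
```

The printed SQL contains the ai_analyze_sentiment() call exactly as written, which is what the warehouse then executes.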

8.2 Specify array

Using array() to run the classification function

  1. Use ai_classify() to find out if we need to follow up with the customer. The two options should be ‘order complete’ and ‘need follow up’. Use array() as you would the c() function. Name the new field “follow_up”
tbl_reviews |> 
  mutate(
    follow_up = ai_classify(review, array("order complete", "need follow up"))
    )
#> # Source:   SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                        follow_up     
#>   <chr>                                                         <chr>         
#> 1 This is the best toaster I have ever bought                   order complete
#> 2 Toaster arrived broken, waiting for replacement               need follow up
#> 3 The washing machine is as advertised, can't wait to use it    order complete
#> 4 Not sure how to feel about this television, nice brightness … need follow up
  2. Add a step that keeps only those orders that need follow up
tbl_reviews |> 
  mutate(
    follow_up = ai_classify(review, array("order complete", "need follow up"))
    ) |> 
  filter(follow_up == "need follow up")
#> # Source:   SQL [2 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                        follow_up     
#>   <chr>                                                         <chr>         
#> 1 Toaster arrived broken, waiting for replacement               need follow up
#> 2 Not sure how to feel about this television, nice brightness … need follow up
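The classify-then-filter pattern above is ordinary dplyr; here is a local sketch with precomputed labels (ai_classify() itself only runs inside the warehouse, so the follow_up values below are made up for illustration):

```r
library(dplyr)

# Hypothetical labels standing in for ai_classify() output
local_reviews <- tibble(
  review    = c("This is the best toaster I have ever bought",
                "Toaster arrived broken, waiting for replacement"),
  follow_up = c("order complete", "need follow up")
)

# Keep only the orders that need follow up
local_reviews |>
  filter(follow_up == "need follow up")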

8.3 Process complex output

Working with STRUCT output from an ‘ai’ function

  1. Use ai_extract() to pull the type of product being referred to in the review. Pass ‘product’ as the extraction argument, wrapped inside an array() call. Name the new field “product”
tbl_reviews |> 
  mutate(product = ai_extract(review, array("product")))
#> # Source:   SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                  product             
#>   <chr>                                                   <chr>               
#> 1 This is the best toaster I have ever bought             "{\"product\":\"toa…
#> 2 Toaster arrived broken, waiting for replacement         "{\"product\":\"Toa…
#> 3 The washing machine is as advertised, can't wait to us… "{\"product\":\"was…
#> 4 Not sure how to feel about this television, nice brigh… "{\"product\":\"tel…
  2. Append a compute() step, and assign the result to a new variable called tbl_review
tbl_review <- tbl_reviews |> 
  mutate(product = ai_extract(review, array("product"))) |> 
  compute()
  3. Preview tbl_review
tbl_review 
#> # Source:   table<`dbplyr_MjqcmuPoJy`> [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                  product             
#>   <chr>                                                   <chr>               
#> 1 This is the best toaster I have ever bought             "{\"product\":\"toa…
#> 2 Toaster arrived broken, waiting for replacement         "{\"product\":\"Toa…
#> 3 The washing machine is as advertised, can't wait to us… "{\"product\":\"was…
#> 4 Not sure how to feel about this television, nice brigh… "{\"product\":\"tel…
  4. Pass tbl_review to show_query() to confirm that it is pulling from a new temporary table
tbl_review |> 
  show_query()
#> <SQL>
#> SELECT *
#> FROM `dbplyr_MjqcmuPoJy`
  5. Coerce “product” to character
tbl_review |> 
  mutate(product = as.character(product))
#> # Source:   SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                     product          
#>   <chr>                                                      <chr>            
#> 1 This is the best toaster I have ever bought                {toaster}        
#> 2 Toaster arrived broken, waiting for replacement            {Toaster}        
#> 3 The washing machine is as advertised, can't wait to use it {washing machine}
#> 4 Not sure how to feel about this television, nice brightne… {television}
  6. Wrap the as.character() call inside a tolower() call
tbl_review |> 
  mutate(product = tolower(as.character(product)))
#> # Source:   SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                     product          
#>   <chr>                                                      <chr>            
#> 1 This is the best toaster I have ever bought                {toaster}        
#> 2 Toaster arrived broken, waiting for replacement            {toaster}        
#> 3 The washing machine is as advertised, can't wait to use it {washing machine}
#> 4 Not sure how to feel about this television, nice brightne… {television}
  7. Add a count() step that breaks down the reviews by product
tbl_review |> 
  mutate(product = tolower(as.character(product))) |> 
  count(product)
#> # Source:   SQL [3 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   product                 n
#>   <chr>             <int64>
#> 1 {toaster}               2
#> 2 {washing machine}       1
#> 3 {television}            1
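The braces left in “product” come from printing the STRUCT as a string. Once the values look like '{"product":"toaster"}', the bare product name could also be pulled out after collecting the data locally, for example with a base-R regex; a local sketch (the sample strings below mirror the output above):

```r
# Sample strings shaped like the ai_extract() STRUCT output above
extracted <- c(
  '{"product":"toaster"}',
  '{"product":"Toaster"}',
  '{"product":"washing machine"}',
  '{"product":"television"}'
)

# Pull out the value of the "product" key and normalize the case
products <- tolower(sub('.*"product":"([^"]+)".*', "\\1", extracted))

# Break down the reviews by product, mirroring count(product)
table(products)
```

This keeps the heavy ai_extract() call in the warehouse and does only the light string cleanup in R.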

8.4 Introducing chattr

Request a simple example from the LLM

  1. Load the chattr library
library(chattr)
  2. Call chattr_app() and then select the first model (DBRX)
chattr_app()
  3. In the app, type and run: “show me a simple example of a ggplot using the mtcars data set”

  4. Copy the code into a script and test it
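The model's reply will vary from run to run; one answer in the spirit of that prompt might look like the following (illustrative only, not the actual DBRX output):

```r
library(ggplot2)

# A simple ggplot using the mtcars data set: weight vs. fuel efficiency
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

p
```

Running the pasted code yourself is the quickest check that the generated example actually works.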