8 Working with LLMs in Databricks

Catch up

library(dplyr)
library(dbplyr)
library(DBI)

con <- dbConnect(
  odbc::databricks(),
  HTTPPath = "/sql/1.0/warehouses/300bd24ba12adf8e"
)

8.1 Accessing AI functions

Use the sentiment classification function

Create a quick review table using the following code:

reviews <- tribble(
  ~name,     ~review, 
  "adam",    "This is the best toaster I have ever bought",
  "berry",   "Toaster arrived broken, waiting for replancement",
  "charles", "The washing machine is as advertised, can't wait to use it",
  "dan",     "Not sure how to feel about this tevelision, nice brightness but bad definition"
) |> 
  select(review)

Copy the reviews data frame to your SQL session. Assign it to a variable called tbl_reviews

tbl_reviews <- copy_to(con, reviews, overwrite = TRUE)

Create a new field called “sentiment”, use ai_analyze_sentiment() to analyze the “review” field

tbl_reviews |> 
  mutate(sentiment = ai_analyze_sentiment(review))
#> # Source:   SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                             sentiment
#>   <chr>                                                              <chr>    
#> 1 This is the best toaster I have ever bought                        positive 
#> 2 Toaster arrived broken, waiting for replancement                   negative 
#> 3 The washing machine is as advertised, cant wait to use it          positive 
#> 4 Not sure how to feel about this tevelision, nice brightness but b… mixed

8.2 Specify array

Using array() to run the classification function

Use ai_classify() to find out if we need to follow up with customer. The two options should be: ‘order complete’, and ‘need follow up’. Use array() as if you would be using the c() function. Name the new field “follow_up”

tbl_reviews |> 
  mutate(
    follow_up = ai_classify(review, array("order complete", "need follow up"))
    )
#> # Source:   SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                        follow_up     
#>   <chr>                                                         <chr>         
#> 1 This is the best toaster I have ever bought                   order complete
#> 2 Toaster arrived broken, waiting for replancement              need follow up
#> 3 The washing machine is as advertised, cant wait to use it     order complete
#> 4 Not sure how to feel about this tevelision, nice brightness … need follow up

Add a step that keeps only those orders that need follow up

tbl_reviews |> 
  mutate(
    follow_up = ai_classify(review, array("order complete", "need follow up"))
    ) |> 
  filter(follow_up == "need follow up")
#> # Source:   SQL [2 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                        follow_up     
#>   <chr>                                                         <chr>         
#> 1 Toaster arrived broken, waiting for replancement              need follow up
#> 2 Not sure how to feel about this tevelision, nice brightness … need follow up

8.3 Process complex output

Working STRUCT output from an ‘ai’ function

Use ai_extract() to pull the type of product being referred to in the review. Pass ‘product’ as the extract argument, and pass it inside an array() call. Name the new field “product”

tbl_reviews |> 
  mutate(product = ai_extract(review, array("product")))
#> # Source:   SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                  product             
#>   <chr>                                                   <chr>               
#> 1 This is the best toaster I have ever bought             "{\"product\":\"toa…
#> 2 Toaster arrived broken, waiting for replancement        "{\"product\":\"Toa…
#> 3 The washing machine is as advertised, cant wait to use… "{\"product\":\"was…
#> 4 Not sure how to feel about this tevelision, nice brigh… "{\"product\":\"tel…

Append a compute() step, and assign to a new variable called tbl_review

tbl_review <- tbl_reviews |> 
  mutate(product = ai_extract(review, array("product"))) |> 
  compute()

Preview tbl_review

tbl_review 
#> # Source:   table<`dbplyr_MjqcmuPoJy`> [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                  product             
#>   <chr>                                                   <chr>               
#> 1 This is the best toaster I have ever bought             "{\"product\":\"toa…
#> 2 Toaster arrived broken, waiting for replancement        "{\"product\":\"Toa…
#> 3 The washing machine is as advertised, cant wait to use… "{\"product\":\"was…
#> 4 Not sure how to feel about this tevelision, nice brigh… "{\"product\":\"tel…

Pass tbl_review to show_query() to confirm that it is pulling from a new temporary table

tbl_review |> 
  show_query()
#> <SQL>
#> SELECT *
#> FROM `dbplyr_MjqcmuPoJy`

Coerce “product” to a character

tbl_review |> 
  mutate(product = as.character(product))
#> # Source:   SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                     product          
#>   <chr>                                                      <chr>            
#> 1 This is the best toaster I have ever bought                {toaster}        
#> 2 Toaster arrived broken, waiting for replancement           {Toaster}        
#> 3 The washing machine is as advertised, cant wait to use it  {washing machine}
#> 4 Not sure how to feel about this tevelision, nice brightne… {television}

Wrap the as.character() call, inside a tolower() call

tbl_review |> 
  mutate(product = tolower(as.character(product)))
#> # Source:   SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   review                                                     product          
#>   <chr>                                                      <chr>            
#> 1 This is the best toaster I have ever bought                {toaster}        
#> 2 Toaster arrived broken, waiting for replancement           {toaster}        
#> 3 The washing machine is as advertised, cant wait to use it  {washing machine}
#> 4 Not sure how to feel about this tevelision, nice brightne… {television}

Add a count step, that breaks down the reviews by product

tbl_review |> 
  mutate(product = tolower(as.character(product))) |> 
  count(product)
#> # Source:   SQL [3 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#>   product                 n
#>   <chr>             <int64>
#> 1 {toaster}               2
#> 2 {washing machine}       1
#> 3 {television}            1

8.4 Introducing `chattr`

Request a simple example from LLM

Load the chattr library

library(chattr)

Call chattr_app() and then select the first model (DBRX)

chattr_app()

In the app type, and run: “show me a simple example of a ggplot using the mtcars data set”
Copy the code into a script and test

Catch up

8.1 Accessing AI functions

8.2 Specify array

8.3 Process complex output

8.4 Introducing chattr

8.4 Introducing `chattr`