library(dplyr)
library(dbplyr)
library(DBI)
<- dbConnect(
con ::databricks(),
odbcHTTPPath = "/sql/1.0/warehouses/300bd24ba12adf8e"
)
8 Working with LLMs in Databricks
Catch up
8.1 Accessing AI functions
Use the sentiment classification function
- Create a quick review table using the following code:
<- tribble(
reviews ~name, ~review,
"adam", "This is the best toaster I have ever bought",
"berry", "Toaster arrived broken, waiting for replancement",
"charles", "The washing machine is as advertised, can't wait to use it",
"dan", "Not sure how to feel about this tevelision, nice brightness but bad definition"
|>
) select(review)
- Copy the
reviews
data frame to your SQL session. Assign it to a variable calledtbl_reviews
<- copy_to(con, reviews, overwrite = TRUE) tbl_reviews
- Create a new field called “sentiment”, use
ai_analyze_sentiment()
to analyze the “review” field
|>
tbl_reviews mutate(sentiment = ai_analyze_sentiment(review))
#> # Source: SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#> review sentiment
#> <chr> <chr>
#> 1 This is the best toaster I have ever bought positive
#> 2 Toaster arrived broken, waiting for replancement negative
#> 3 The washing machine is as advertised, cant wait to use it positive
#> 4 Not sure how to feel about this tevelision, nice brightness but b… mixed
8.2 Specify array
Using array() to run the classification function
- Use
ai_classify()
to find out if we need to follow up with customer. The two options should be: ‘order complete’, and ‘need follow up’. Usearray()
as if you would be using thec()
function. Name the new field “follow_up”
|>
tbl_reviews mutate(
follow_up = ai_classify(review, array("order complete", "need follow up"))
)#> # Source: SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#> review follow_up
#> <chr> <chr>
#> 1 This is the best toaster I have ever bought order complete
#> 2 Toaster arrived broken, waiting for replancement need follow up
#> 3 The washing machine is as advertised, cant wait to use it order complete
#> 4 Not sure how to feel about this tevelision, nice brightness … need follow up
- Add a step that keeps only those orders that need follow up
|>
tbl_reviews mutate(
follow_up = ai_classify(review, array("order complete", "need follow up"))
|>
) filter(follow_up == "need follow up")
#> # Source: SQL [2 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#> review follow_up
#> <chr> <chr>
#> 1 Toaster arrived broken, waiting for replancement need follow up
#> 2 Not sure how to feel about this tevelision, nice brightness … need follow up
8.3 Process complex output
Working STRUCT output from an ‘ai’ function
- Use
ai_extract()
to pull the type of product being referred to in the review. Pass ‘product’ as the extract argument, and pass it inside anarray()
call. Name the new field “product”
|>
tbl_reviews mutate(product = ai_extract(review, array("product")))
#> # Source: SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#> review product
#> <chr> <chr>
#> 1 This is the best toaster I have ever bought "{\"product\":\"toa…
#> 2 Toaster arrived broken, waiting for replancement "{\"product\":\"Toa…
#> 3 The washing machine is as advertised, cant wait to use… "{\"product\":\"was…
#> 4 Not sure how to feel about this tevelision, nice brigh… "{\"product\":\"tel…
- Append a
compute()
step, and assign to a new variable calledtbl_review
<- tbl_reviews |>
tbl_review mutate(product = ai_extract(review, array("product"))) |>
compute()
- Preview
tbl_review
tbl_review #> # Source: table<`dbplyr_MjqcmuPoJy`> [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#> review product
#> <chr> <chr>
#> 1 This is the best toaster I have ever bought "{\"product\":\"toa…
#> 2 Toaster arrived broken, waiting for replancement "{\"product\":\"Toa…
#> 3 The washing machine is as advertised, cant wait to use… "{\"product\":\"was…
#> 4 Not sure how to feel about this tevelision, nice brigh… "{\"product\":\"tel…
- Pass
tbl_review
toshow_query()
to confirm that it is pulling from a new temporary table
|>
tbl_review show_query()
#> <SQL>
#> SELECT *
#> FROM `dbplyr_MjqcmuPoJy`
- Coerce “product” to a character
|>
tbl_review mutate(product = as.character(product))
#> # Source: SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#> review product
#> <chr> <chr>
#> 1 This is the best toaster I have ever bought {toaster}
#> 2 Toaster arrived broken, waiting for replancement {Toaster}
#> 3 The washing machine is as advertised, cant wait to use it {washing machine}
#> 4 Not sure how to feel about this tevelision, nice brightne… {television}
- Wrap the
as.character()
call, inside atolower()
call
|>
tbl_review mutate(product = tolower(as.character(product)))
#> # Source: SQL [4 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#> review product
#> <chr> <chr>
#> 1 This is the best toaster I have ever bought {toaster}
#> 2 Toaster arrived broken, waiting for replancement {toaster}
#> 3 The washing machine is as advertised, cant wait to use it {washing machine}
#> 4 Not sure how to feel about this tevelision, nice brightne… {television}
- Add a count step, that breaks down the reviews by product
|>
tbl_review mutate(product = tolower(as.character(product))) |>
count(product)
#> # Source: SQL [3 x 2]
#> # Database: Spark SQL 3.1.1[token@Spark SQL/hive_metastore]
#> product n
#> <chr> <int64>
#> 1 {toaster} 2
#> 2 {washing machine} 1
#> 3 {television} 1
8.4 Introducing chattr
Request a simple example from LLM
- Load the
chattr
library
library(chattr)
- Call
chattr_app()
and then select the first model (DBRX)
chattr_app()
In the app type, and run: “show me a simple example of a ggplot using the mtcars data set”
Copy the code into a script and test