library(DBI)
<- dbConnect(
con ::odbc(),
odbcDriver = "/Library/simba/spark/lib/libsparkodbc_sb64-universal.dylib",
Host = Sys.getenv("DATABRICKS_HOST"),
Port = 443,
AuthMech = 3,
HTTPPath = "/sql/1.0/warehouses/300bd24ba12adf8e",
Protocol = "https",
ThriftTransport = 2,
SSL = 1,
UID = "token",
PWD = Sys.getenv("DATABRICKS_TOKEN")
)
1 Connecting and interacting
1.1 Connect to a database
Click on the
Connections
tabClick on the
New Connection
buttonSelect
warehouse
Click OK
1.2 Explore the database using the RStudio IDE
Expand the
hive_metastore
catalogExpand the
defaults
schemaExpand the
cars
tableClick on the table icon to the right of the
cars
table(Optional) Expand and explore the other tables
Click on the disconnect icon to close the connection
1.3 Connecting using odbc::odbc()
https://solutions.posit.co/connections/db/databases/databricks/#using-the-odbcodbc-function
- Use the following code to start a new connection
- Disconnect from the database using
dbDisconnect()
dbDisconnect(con)
1.4 Connecting using odbc::databricks()
https://solutions.posit.co/connections/db/databases/databricks/#using-the-new-odbcdatabricks-function
- Use the newer
odbc::databricks()
as the driver, which automatically sets the defaults for you. Also, it will automatically look for the Databricks Host and Token, so you won’t to specify it in the code.
<- dbConnect(
con ::databricks(),
odbcHTTPPath = "/sql/1.0/warehouses/300bd24ba12adf8e"
)
1.5 Interact with database
- Use
dbListTables()
to retrieve a list of tables
dbListTables(con)[1:4]
#> [1] "allergies" "cars" "conditions" "covid_stats"
- Use
dbListFields()
to get the names of the fields of a given table
dbListFields(con, "cars")
#> [1] "row_names" "mpg" "cyl" "disp" "hp" "drat"
#> [7] "wt" "qsec" "vs" "am" "gear" "carb"
- Use
dbGetQuery()
to run a quick query
dbGetQuery(con, "SELECT * FROM cars LIMIT 5")
#> row_names mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
1.6 Using knitr
chunks
- Use the SQL chunk
SELECT * FROM cars LIMIT 5
row_names | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
- Use the
output.var
option to load results to a variable
SELECT * FROM cars LIMIT 5
- Test the variable
sql_top5#> row_names mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
- Disconnect from the database using
dbDisconnect()
dbDisconnect(con)
1.7 RStudio SQL Script
Try out the new SQL Script support in RStudio
Open the example-connecting.sql file
Click the Preview button. It is located in the top-right area of the script
In the script, change customer to orders
Click on Preview again