1  Connecting and interacting

1.1 Connect to a database

  1. Click on the Connections tab

  2. Click on the New Connection button

  3. Select warehouse

  4. Click OK

1.2 Explore the database using the RStudio IDE

  1. Expand the hive_metastore catalog

  2. Expand the defaults schema

  3. Expand the cars table

  4. Click on the table icon to the right of the cars table

  5. (Optional) Expand and explore the other tables

  6. Click on the disconnect icon to close the connection

1.3 Connecting using odbc::odbc()

https://solutions.posit.co/connections/db/databases/databricks/#using-the-odbcodbc-function

  1. Use the following code to start a new connection
library(DBI)

con <- dbConnect(
  odbc::odbc(),
  Driver = "/Library/simba/spark/lib/libsparkodbc_sb64-universal.dylib",
  Host = Sys.getenv("DATABRICKS_HOST"),
  Port = 443,
  AuthMech = 3,
  HTTPPath = "/sql/1.0/warehouses/300bd24ba12adf8e",
  Protocol = "https",
  ThriftTransport = 2,
  SSL = 1,
  UID = "token",
  PWD = Sys.getenv("DATABRICKS_TOKEN")
)
  1. Disconnect from the database using dbDisconnect()
dbDisconnect(con)

1.4 Connecting using odbc::databricks()

https://solutions.posit.co/connections/db/databases/databricks/#using-the-new-odbcdatabricks-function

  1. Use the newer odbc::databricks() as the driver, which automatically sets the defaults for you. Also, it will automatically look for the Databricks Host and Token, so you won’t to specify it in the code.
con <- dbConnect(
  odbc::databricks(),
  HTTPPath = "/sql/1.0/warehouses/300bd24ba12adf8e"
)

1.5 Interact with database

  1. Use dbListTables() to retrieve a list of tables
dbListTables(con)[1:4]
#> [1] "allergies"   "cars"        "conditions"  "covid_stats"
  1. Use dbListFields() to get the names of the fields of a given table
dbListFields(con, "cars")
#>  [1] "row_names" "mpg"       "cyl"       "disp"      "hp"        "drat"     
#>  [7] "wt"        "qsec"      "vs"        "am"        "gear"      "carb"
  1. Use dbGetQuery() to run a quick query
dbGetQuery(con, "SELECT * FROM cars LIMIT 5")
#>           row_names  mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

1.6 Using knitr chunks

  1. Use the SQL chunk
SELECT * FROM cars LIMIT 5
5 records
row_names mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
  1. Use the output.var option to load results to a variable
SELECT * FROM cars LIMIT 5
  1. Test the variable
sql_top5
#>           row_names  mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
  1. Disconnect from the database using dbDisconnect()
dbDisconnect(con)

1.7 RStudio SQL Script

Try out the new SQL Script support in RStudio

  1. Open the example-connecting.sql file

  2. Click the Preview button. It is located in the top-right area of the script

  3. In the script, change customer to orders

  4. Click on Preview again