Snowflake

Create a Snowflake Data Source

  1. Navigate to: Resources > Data Sources
  2. Select the Snowflake icon

Data Source Inputs

Name (required): Once set, this cannot be edited
The following inputs can be edited by any user on the Zepl platform:
  • Account (required):
    • Enter your Snowflake account URL without the '.snowflakecomputing.com' suffix (for example, for a hypothetical account URL xy12345.us-east-1.snowflakecomputing.com, enter xy12345.us-east-1)
  • Warehouse (optional - can be specified at the beginning of the session)
  • Database (optional - can be specified in each query)
  • Schema (optional - can be specified in each query)
  • Credential Type:
The following inputs must be entered by each user, unless a Shared Credential Type is selected:
  • Username / Password: See Configure Authentication below for configuration details
  • SSO: See Configure Authentication below for configuration details
  • Role (optional):
    • This value must correspond to a Snowflake Role that has been assigned to the entered user credential. If nothing is entered, the user's DEFAULT role will be used.

Using the Snowflake Data Source

Read from Snowflake

Examples are provided for Python, Scala, PySpark, R, and SQL:
%python
import pandas as pd
# Create a Snowflake Cursor Object
cur = z.getDatasource("TPCH_SF10")
# Create a Snowflake Connection Object. Append '_con' to your data source name to get a Connection object
con = z.getDatasource("TPCH_SF10_con")
# Un-comment if warehouse is not specified in the Data Source
# cur.execute("USE WAREHOUSE <warehouse>")
# execute query
cur.execute("SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10.CUSTOMER LIMIT 100")
# convert datasource into pandas dataframe
df = cur.fetch_pandas_all()
# print dataframe as table
z.show(df)

Considerations while reading from Snowflake

z.getDatasource("TPCH_SF10") returns a Snowflake Cursor object; append '_con' to the data source name (z.getDatasource("TPCH_SF10_con")) to return a Connection object instead (see the additional sketch below).
If the Snowflake Warehouse is not specified in the data source object, you will need to specify this in the code. Uncomment the line of code that specifies warehouse in each example below.
In the Python example, uncomment this line of code: cur.execute("USE WAREHOUSE <warehouse>")
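For reference, the same read can also be performed through the Connection object rather than the Cursor. The following is a minimal sketch in Python, using the same 'TPCH_SF10' data source and sample query as the example above:
%python
# Append '_con' to the data source name to get a Snowflake Connection object
con = z.getDatasource("TPCH_SF10_con")
# Create a cursor from the connection
cur = con.cursor()
# Un-comment if warehouse is not specified in the Data Source
# cur.execute("USE WAREHOUSE <warehouse>")
# Execute the query and fetch the result set into a pandas DataFrame
cur.execute("SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10.CUSTOMER LIMIT 100")
df = cur.fetch_pandas_all()
# Print the DataFrame as a table
z.show(df)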
%spark
// Un-comment if warehouse is not specified in the Data Source
// When un-commented, add .options(sfOptions) to connection string below
// var sfOptions = Map(
// "sfDatabase" -> "SNOWFLAKE_SAMPLE_DATA",
// "sfSchema" -> "TPCH_SF10_CUSTOMER",
// "sfWarehouse" -> "COMPUTE_WH"
// )
val df = z.getDatasource("TPCH_SF10")
.asInstanceOf[org.apache.spark.sql.DataFrameReader]
.option("query", "SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10.CUSTOMER LIMIT 100")
.load()
z.show(df)
%pyspark
# Un-comment if warehouse is not specified in the Data Source
# When un-commented, add .options(**sfOptions) to connection string below
# sfOptions = {
# "sfDatabase": "SNOWFLAKE_SAMPLE_DATA",
# "sfSchema": "TPCH_SF10",
# "sfWarehouse": "DEMO_WH" # Must match warehouse set in datasource
# }
df = z.getDatasource("TPCH_SF10")\
.option("query", "SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10.CUSTOMER LIMIT 100")\
.load()
z.show(df)
%r
# conn is a DBIConnection object returned by dbConnect() from the DBI library
conn <- z.getDatasource("TPCH_SF10")
# Un-comment if warehouse is not specified in the Data Source
# dbGetQuery(conn$con, "USE WAREHOUSE COMPUTE_WH")
# use dbGetQuery() from the DBI library to run the query
res <- dbGetQuery(conn$con, "SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10.CUSTOMER LIMIT 100")
# z.show() works for the results returned from dbGetQuery()
z.show(res)
%datasource.TPCH_SF10
SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10.CUSTOMER LIMIT 100

Write to Snowflake

Examples are provided for Python, Scala, PySpark, and R:
%python
from snowflake.connector.pandas_tools import write_pandas
import pandas as pd
import numpy as np
# Append '_con' to your data source name to get a Connection object
con = z.getDatasource("TPCH_SF10_con")
# Uncomment if Database and Schema are not included in the data source inputs
# cur = con.cursor()
# cur.execute("USE SCHEMA DB.SCHEMA")
# Create a random pandas DataFrame to write to Snowflake (15 rows x 4 columns)
df = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))
# Write data from the DataFrame to a table named "TRIPS_FORECAST".
# The table must already exist in Snowflake with a schema that matches the DataFrame (4 columns)
# Column names are case sensitive. Snowflake expects UPPER CASE column names, so convert them first:
# df = df.rename(lambda x: x.upper(), axis=1)
success, nchunks, nrows, _ = write_pandas(con, df, "TRIPS_FORECAST")
# Print Results
print("Success: ", success)
print("Number of records added: ", nrows)

Considerations while writing to Snowflake

Use z.getDatasource("TPCH_SF10_con") (the data source name with '_con' appended) to return the Snowflake Connection object that write_pandas() expects.
Requirements to consider when using write_pandas(con, pandas.DataFrame, table_name) (see the Snowflake Python Documentation):
  • Before invoking this method, the table must exist in Snowflake
  • Both the pandas.DataFrame and the table_name referenced in the function call must have the same schema
  • The pandas.DataFrame referenced in the function call must have all UPPER CASE column names. Snowflake will insert NULL values in your table if this condition is not met. Use this code snippet to convert column names to upper case before invoking the write function (see the sketch after these notes): df = df.rename(lambda x: x.upper(), axis=1)
If the Snowflake Warehouse is not specified in the data source object, you will need to specify this in the code. Uncomment the line of code that specifies warehouse in each example below.
In the Python example, create a cursor from the Connection object and run: con.cursor().execute("USE WAREHOUSE <warehouse>")
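Putting these requirements together, the following is a minimal Python sketch that creates a hypothetical TRIPS_FORECAST table with four integer columns (if it does not already exist), upper-cases the DataFrame column names, and writes the data with write_pandas(). It assumes the database and schema are set in the data source, or via USE statements:
%python
from snowflake.connector.pandas_tools import write_pandas
import pandas as pd
import numpy as np
# Append '_con' to the data source name to get a Connection object
con = z.getDatasource("TPCH_SF10_con")
cur = con.cursor()
# Un-comment if warehouse is not specified in the Data Source
# cur.execute("USE WAREHOUSE <warehouse>")
# Create the (hypothetical) target table if it does not exist; its schema must match the DataFrame
cur.execute("CREATE TABLE IF NOT EXISTS TRIPS_FORECAST (A INTEGER, B INTEGER, C INTEGER, D INTEGER)")
# Build a sample 15 x 4 DataFrame and convert its column names to UPPER CASE
df = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('abcd'))
df = df.rename(lambda x: x.upper(), axis=1)
# Write the DataFrame to the table and report how many rows were inserted
success, nchunks, nrows, _ = write_pandas(con, df, "TRIPS_FORECAST")
print("Success:", success, "- rows written:", nrows)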
%spark
val SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
var sfOptions = Map(
"sfURL" -> "<account_name>.snowflakecomputing.com",
"sfUser" -> "<user_name>",
"sfPassword" -> "<password>",
"sfDatabase" -> "<database>",
"sfSchema" -> "<schema>",
"sfWarehouse" -> "<warehouse>"
)
df.write
.format(SNOWFLAKE_SOURCE_NAME)
.options(sfOptions)
.option("dbtable", "<TABLE NAME TO WRITE TO>")
.mode("append")
.save()
%pyspark
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
sfOptions = {
"sfURL": "<account_name>.snowflakecomputing.com",
"sfUser": "<user_name>",
"sfPassword": "<password>",
"sfDatabase": "<database>",
"sfSchema": "<schema>",
"sfWarehouse": "<warehouse>"
}
df.write\
.format(SNOWFLAKE_SOURCE_NAME)\
.options(**sfOptions)\
.option("dbtable", "<TABLE NAME TO WRITE TO>")\
.mode("append")\
.save()
%r
# conn is a DBIConnection object returned by dbConnect() from the DBI library
conn <- z.getDatasource("TPCH_SF10")
# Un-comment if NO warehouse has been set in the data source above
# dbGetQuery(conn$con, "USE WAREHOUSE DEMO_WH")
# Write the data frame 'res' (e.g., the result from the read example above) to Snowflake. "SAMPLE_DB.PUBLIC.TRIPS_FORECAST" will be created by R
dbWriteTable(conn$con, "SAMPLE_DB.PUBLIC.TRIPS_FORECAST", res)

Configure Authentication

Username / Password

By default, Snowflake is configured to use usernames and passwords for authentication. Each user enters his or her Snowflake username and password into the data source inputs. This is only required once per user; however, if a user does not enter his or her credential information, they will not be able to access the specified Snowflake database, warehouse, or schema.
Any user Role that is entered into Zepl's Role field must be a Role that is assigned to the Snowflake user account in Snowflake. If nothing is entered in the Role field, Zepl will use the Snowflake user's default role.
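To check which roles have been granted to a user before entering one in the Role field, you can run Snowflake's SHOW GRANTS TO USER command from a notebook. The following is a minimal sketch, assuming the 'TPCH_SF10' data source from the examples above and a hypothetical user named JSMITH (the executing credential needs sufficient privileges to view that user's grants):
%python
# Get a Snowflake Cursor object from an existing data source
cur = z.getDatasource("TPCH_SF10")
# List the roles granted to a (hypothetical) user
cur.execute("SHOW GRANTS TO USER JSMITH")
# Each returned row includes the granted role name
for row in cur.fetchall():
    print(row)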

SSO

If your Snowflake instance is configured to log in through an external authentication provider, use the SSO Data Source configuration option. Your Snowflake administrator will need to complete the steps below to enable connectivity between Zepl and Snowflake.

Configure SSO Authentication in Snowflake

Step 1. Create an OAuth integration

To create an OAuth integration, execute the following CREATE SECURITY INTEGRATION command in Snowflake with an account that has sufficient privilege.
CREATE SECURITY INTEGRATION ZEPL_SSO_INTEGRATION
TYPE = OAUTH
ENABLED = TRUE
OAUTH_CLIENT = CUSTOM
OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
OAUTH_REDIRECT_URI = 'https://app.zepl.com/callbacks/snowflake-datasource-sso'
OAUTH_ISSUE_REFRESH_TOKENS = TRUE
OAUTH_REFRESH_TOKEN_VALIDITY = 7776000
Only users with the ACCOUNTADMIN role, or a role with the global CREATE INTEGRATION privilege, can execute this SQL command.
For more information on customizing the configurations for an OAuth integration, please see OAuth Custom Client Parameters.
Please note that the OAuth integration should be specified as follows:
  • OAUTH_CLIENT_TYPE must be set to CONFIDENTIAL. It allows the client (Zepl) to store a secret.
  • OAUTH_REDIRECT_URI must be set to https://app.zepl.com/callbacks/snowflake-datasource-sso. After a user is authenticated, the web browser is redirected to this URI.
  • OAUTH_ISSUE_REFRESH_TOKENS must be set to TRUE. It allows the client (Zepl) to exchange a refresh token for an access token when the current access token has expired.
  • OAUTH_REFRESH_TOKEN_VALIDITY specifies how long an authentication remains valid (in seconds). The default value is 7776000 (90 days). Once this period has passed, you must re-authenticate on the data source's detail page to continue using the data source.
You can also authenticate through an external, SAML 2.0-compliant identity provider (IdP) by enabling federated authentication. See Configuring Snowflake to Use Federated Authentication for more details.

Step 2. Create a Snowflake data source in Zepl

After integrating OAuth, create a Snowflake data source in Zepl using SSO credentials:
  1. Execute the Snowflake function SELECT SYSTEM$SHOW_OAUTH_CLIENT_SECRETS('ZEPL_SSO_INTEGRATION') in Snowflake to retrieve the client ID and secret for the integration created in the previous step, and note them for a later step.
  2. In the Zepl application, go to the Data Sources page and choose Snowflake in the Add Data Sources section.
  3. Enter a unique name for the data source in the Name field and the account name provided by Snowflake in the Account field.
  4. In the My Credentials section, choose SSO.
  5. Click the Connect to Snowflake button. It will open a pop-up titled: Log in to Snowflake to continue to ZEPL_SSO_INTEGRATION. If the pop-up does not open, make sure that your browser is not blocking pop-ups. An error message is shown if the provided account name is invalid or does not match the client ID.
  6. In the pop-up, enter your Snowflake credentials and choose the Log In button. You can also sign in via your IdP if federated authentication is enabled in the provided Snowflake account. The pop-up will close and you will see a message confirming that you are successfully authenticated.
  7. Click the Save button. If the provided client secret is invalid or does not match the client ID, an error message is shown and the data source will not be saved.
By default, Snowflake blocks the use of ACCOUNTADMIN and SECURITYADMIN roles for OAuth sessions. Authentication will fail if either role is added to the Zepl Data Source.

Limitation

  • Expiration of the authentication: The integration uses a refresh token to authenticate with Snowflake. Refresh tokens generated by Snowflake have a limited lifetime of up to 90 days for security reasons. The expiration of the refresh token can be configured with the OAUTH_REFRESH_TOKEN_VALIDITY parameter. Every (up to) 90 days, the user who created the Snowflake data source with SSO must go to the data source's detail page and re-authenticate with a Snowflake account by clicking the Connect to Snowflake button in order to continue using the data source.
The OAuth integration (client ID and secret) does not expire, so a user does not have to re-create the OAuth integration.
  • JDBC interpreter is not supported.
  • Test connection is not supported.