MySQL

Create a MySQL Data Source

  1. Navigate to: Resources > Data Sources

  2. Select the MySQL Icon:

Data Source Inputs

  • Name (required)

  • Host (required):

    • Enter the hostname of your MySQL database.

    • Example (using MySQL in AWS RDS): zepl-db-test.sddks8d7d89s.us-west-1.rds.amazonaws.com

  • Port (required): 3306 is the default port for MySQL

  • Database (optional):

    • If this is filled in, Zepl will append the database name to your connection string.

    • Example: zepl-db-test.sddks8d7d89s.us-west-1.rds.amazonaws.com/<database name>

  • Credential Type: See Data Source Security

  • Username (required - can be entered later): Username used for connecting to MySQL

  • Password (required - can be entered later): Password used for connecting to MySQL
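The Host, Port, and Database inputs above are combined into the connection string for you. A minimal sketch of how that assembly might look (the helper name here is illustrative, not Zepl's API):

```python
# Hypothetical sketch of how the Host, Port, and Database inputs combine
# into a MySQL connection string (names are illustrative, not Zepl's API).
def build_connection_string(host, port=3306, database=None):
    conn = f"{host}:{port}"
    if database:  # Database is optional; appended only when present
        conn += f"/{database}"
    return conn

print(build_connection_string(
    "zepl-db-test.sddks8d7d89s.us-west-1.rds.amazonaws.com",
    3306,
    "employees",
))
# zepl-db-test.sddks8d7d89s.us-west-1.rds.amazonaws.com:3306/employees
```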

Use MySQL

Read

Python
%python
import pandas as pd
# Connect to MySQL using attached Data Source object called zepl_docs_mysql
conn = z.getDatasource("zepl_docs_mysql")
​
# Create cursor object
cursor = conn.cursor()
cursor.execute("USE employees")
​
# Query Table customers, fetch results, and return results to a list object (res)
cursor.execute('SELECT * FROM customers')
res = cursor.fetchall()
​
# Create Pandas DataFrame from query results
df = pd.DataFrame(data=res, columns=cursor.column_names)
​
# Display results
z.show(df)

z.getDatasource() returns a MySQL connection object created by mysql.connector.connect
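Because the returned connection follows the Python DB-API, the cursor/execute/fetchall pattern above is the standard one. The sketch below illustrates that same pattern using stdlib sqlite3 as a stand-in, so it runs without a MySQL server:

```python
import sqlite3

# sqlite3 stands in for the mysql.connector connection here, purely to
# illustrate the DB-API cursor pattern; in Zepl the connection would come
# from z.getDatasource(...) instead.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

cursor.execute("CREATE TABLE customers (name TEXT, address TEXT)")
cursor.execute("INSERT INTO customers VALUES (?, ?)",
               ("Sean Connery", "Hollywood"))
conn.commit()

# Same fetch pattern as the MySQL example above
cursor.execute("SELECT * FROM customers")
res = cursor.fetchall()
print(res)  # [('Sean Connery', 'Hollywood')]
```

Note one difference: mysql.connector uses `%s` placeholders for parameters, while sqlite3 uses `?`.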

Scala
%spark
var url = "jdbc:mysql://<host>:<port>/<database>"
var table_name = "<table name>"
var driver = "com.mysql.jdbc.Driver" // Do not change
var user = "<username>"
var pass = "<password>"
​
// Create JDBC connection to MySQL using spark.read operations
val df = spark.read
.format("jdbc")
.option("driver", driver)
.option("url", url)
.option("dbtable", table_name)
.option("user", user)
.option("password", pass)
.load()
​
// Display DataFrame
z.show(df)
PySpark

Option 1: Use Zepl's Data Source

Please follow guidance from the Python tab

Option 2: Use Spark's JDBC reader

%pyspark
​
url = "jdbc:mysql://<hostname>:<port>/<database_name>"
table_name = "<table_name>"
driver = "com.mysql.jdbc.Driver" # Do not change
user = "<username>"
password = "<password>"
​
# Create JDBC connection to MySQL using spark.read operations
df = spark.read\
.format("jdbc")\
.option("driver", driver)\
.option("url", url)\
.option("dbtable", table_name)\
.option("user", user)\
.option("password", password)\
.load()
​
# Display DataFrame
z.show(df)
R
%r
# TODO:
SQL
%datasource.<data source name here>
# Specify db to query from
USE employees;
​
# Read data through 'select' statement
SELECT * FROM customers;

Write

Python
%python
import pandas as pd
​
# Connect to MySQL using attached Data Source. Create connection object
conn = z.getDatasource("zepl_docs_mysql")
# Create cursor object
cursor = conn.cursor()
​
# Create Database
cursor.execute("CREATE DATABASE if not exists employees DEFAULT CHARACTER SET 'utf8'")
​
# Select Database - Not required if added to the Data Source connection
cursor.execute("USE employees")
​
# Create Table
cursor.execute("CREATE TABLE if not exists customers (name VARCHAR(255), address VARCHAR(255))")
​
# Create data to Insert
data=(['Sean Connery', '6925 Hollywood Blvd, Hollywood, CA 90028'], ['Dolly Parton', '6712 Hollywood Boulevard, Hollywood, CA 90028'])
df = pd.DataFrame(data, columns=("name", "address"))
​
# Insert Data
sql = "INSERT INTO customers(name, address) values(%s, %s)"
cursor.execute(sql, data[0])
cursor.execute(sql, data[1])
​
# Commit insert statements
conn.commit()
​
# Close cursor and connection
cursor.close()
conn.close()
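For more than a handful of rows, DB-API cursors also support executemany, which sends one parameterized statement with a list of row tuples. A sketch of that pattern, again with stdlib sqlite3 standing in for mysql.connector so it runs standalone:

```python
import sqlite3

# executemany batches parameterized inserts in one call; sqlite3 stands in
# for mysql.connector (which would use %s placeholders instead of ?).
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE customers (name TEXT, address TEXT)")

rows = [
    ("Sean Connery", "6925 Hollywood Blvd, Hollywood, CA 90028"),
    ("Dolly Parton", "6712 Hollywood Boulevard, Hollywood, CA 90028"),
]
cursor.executemany("INSERT INTO customers(name, address) VALUES (?, ?)", rows)
conn.commit()

cursor.execute("SELECT COUNT(*) FROM customers")
count = cursor.fetchone()[0]
print(count)  # 2
```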
Scala
%spark
import java.util.Properties
​
// Create Properties object for write function
val properties = new Properties();
​
// Create inputs
var url = "jdbc:mysql://<host>:<port>/<database>"
var table_name = "<table name>"
var driver = "com.mysql.jdbc.Driver" // Do not change
var user = "<username>"
var pass = "<password>"
​
// Set Properties
properties.setProperty("driver", driver);
properties.setProperty("user", user);
properties.setProperty("password", pass);
​
// Write an existing DataFrame (df) to the MySQL table
df.write.mode("append").jdbc(url, table_name, properties)
PySpark

Option 1: Use Zepl's Data Source

Please follow guidance from the Python tab

Option 2: Use Spark's JDBC writer

%pyspark
url = "jdbc:mysql://<hostname>:<port>/<database_name>"
table_name = "<table_name>"
driver = "com.mysql.jdbc.Driver" # Do not change
user = "<username>"
password = "<password>"
​
df.write.format('jdbc').options(
url=url,
driver=driver,
dbtable=table_name,
user=user,
password=password)\
.mode('append')\
.save()
R
%r
# TODO:
SQL
%datasource.zepl_docs_mysql
# Specify db to query from
USE employees;
​
# Write data through 'insert' statement
INSERT INTO customers(name, address) values("Chris Rock", "7021 Hollywood Boulevard, Hollywood, CA 90028");

Configure Authentication in MySQL

In your MySQL database, be sure these items are configured properly:

  • MySQL accepts connections from Zepl's whitelisted IP addresses

  • The user associated with your username and password has access to the desired databases and tables

  • That user has read and write permissions on those databases and tables
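A quick way to check the first item (network reachability from an allowed host) is a plain TCP probe of the MySQL port. A minimal sketch using only the stdlib; the local listener in the demo simply stands in for a reachable MySQL host:

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Local demo: a listening socket stands in for a reachable MySQL host.
server = socket.socket()
server.bind(("127.0.0.1", 0))  # let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
reachable = can_connect("127.0.0.1", port)
server.close()
print(reachable)  # True
```

In practice you would probe your actual host and port, e.g. `can_connect("zepl-db-test.sddks8d7d89s.us-west-1.rds.amazonaws.com", 3306)`. A False result suggests a firewall or whitelist problem rather than a credential problem.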
