
PostgreSQL

Create a PostgreSQL Data Source

  1. Navigate to: Resources > Data Sources
  2. Select the PostgreSQL icon

Data Source Inputs

  • Name (required)
  • Host (required):
    • Enter the hostname of your PostgreSQL database.
    • Example (using PostgreSQL in AWS RDS): postgresql-zepl-docs.si89zol2ksop.us-west-1.rds.amazonaws.com
  • Port (required): 5432 is the default port for PostgreSQL
  • Database (optional):
    • If this is filled in, Zepl will append the database name to your connection string.
    • If this is blank, the connection may use the default database set by your administrator.
    • Example: postgresql-zepl-docs.si89zol2ksop.us-west-1.rds.amazonaws.com/<database name>
  • Credential Type: See Data Source Security
  • Username (required - can be entered later): Username used for connecting to PostgreSQL
  • Password (required - can be entered later): Password used for connecting to PostgreSQL
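The effect of the optional Database field can be sketched as follows. This is an illustration only; `build_connection_string` is a hypothetical helper, not part of Zepl's API:

```python
# Illustrative only: how the Host, Port, and Database inputs combine into a
# connection string. Zepl performs this internally; this helper is hypothetical.
def build_connection_string(host, port, database=None):
    """Append the database name only when one is provided."""
    base = f"{host}:{port}"
    return f"{base}/{database}" if database else base

# With a database name, it is appended to the connection string
print(build_connection_string(
    "postgresql-zepl-docs.si89zol2ksop.us-west-1.rds.amazonaws.com", 5432, "zepl_docs"))
# Without one, the server-side default database is used
print(build_connection_string(
    "postgresql-zepl-docs.si89zol2ksop.us-west-1.rds.amazonaws.com", 5432))
```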

Use PostgreSQL

Read

Python
Scala
PySpark
R
SQL
%python
import pandas as pd
# Open a connection and create a cursor object to perform database operations
conn = z.getDatasource("zepl_docs_postgresql")
cur = conn.cursor()
### Option 1: Read into Python Pandas DataFrame object ###
df = pd.read_sql_query("SELECT * FROM zepl_doc;", conn)
### Option 2: Read into Python list object ###
cur.execute("SELECT * FROM zepl_doc;")
res = cur.fetchall()
# Display Results from dataframe (df) or list object (res)
z.show(df)
# Make the changes to the database persistent
conn.commit()
# Close communication with the database
cur.close()
conn.close()
Use z.getDatasource() to return a PostgreSQL connection object of type psycopg2.extensions.connection.
%spark
// Fill in your variables, replace <variablename>
var url = "jdbc:postgresql://<host>:<port>/<database>"
var table_name = "<table name>"
var driver = "org.postgresql.Driver" // Do not change
var user = "<username>"
var pass = "<password>"
// Create JDBC connection to PostgreSQL using spark.read operations
val df = spark.read
.format("jdbc")
.option("driver", driver)
.option("url", url)
.option("dbtable", table_name)
.option("user", user)
.option("password", pass)
.load()
// Display DataFrame
z.show(df)
This method does not use the Zepl data source. Please remove all PostgreSQL data sources attached to the notebook.

Option 1: Use Zepl's Data Source

Please follow the guidance in the Python tab.

Option 2: Using SparkSQL

%pyspark
# Fill in your variables, replace <variablename>
url = "jdbc:postgresql://<host>:<port>/<database>"
table_name = "<table name>"
driver = "org.postgresql.Driver" # Do not change
user = "<username>"
password = "<password>"
# Create a JDBC connection to PostgreSQL and read the table into a DataFrame
df = spark.read.format('jdbc').options(
url=url,
driver=driver,
dbtable=table_name,
user=user,
password=password)\
.load()
# Display DataFrame
z.show(df)
TODO:
%datasource.<data source name here>
# Query database table zepl_doc
SELECT * FROM zepl_doc

Write

Python
Scala
PySpark
R
%python
# Open a connection and create a cursor object to perform database operations
conn = z.getDatasource("zepl_docs_postgresql")
cur = conn.cursor()
# Create table called zepl_doc if not exists
cur.execute("CREATE TABLE IF NOT EXISTS zepl_doc (id serial PRIMARY KEY, num integer, data varchar);")
# Insert data (100, "xyz") into the table zepl_doc
cur.execute("INSERT INTO zepl_doc (num, data) VALUES (%s, %s)", (100, "xyz"))
# Persist changes to the database
conn.commit()
# Close communication with the database
cur.close()
conn.close()
%spark
import java.util.Properties
// Create Properties object for write function
val properties = new Properties();
// Create inputs
var url = "jdbc:postgresql://<host>:<port>/<database>"
var table_name = "<table name>"
var driver = "org.postgresql.Driver" // Do not change
var user = "<username>"
var pass = "<password>"
// Set Properties
properties.setProperty("driver", driver);
properties.setProperty("user", user);
properties.setProperty("password", pass);
// If the DataFrame contains the primary key field (in this case 'id'), drop it before calling 'df.write' to avoid a primary key collision
var data = df.drop("id")
// Write SparkSQL DataFrame to PostgreSQL table
data.write.mode("append").jdbc(url, table_name, properties)
This method does not use the Zepl data source. Please remove all PostgreSQL data sources from the notebook.

Option 1: Use Zepl's Data Source

Please follow the guidance in the Python tab.

Option 2: Using SparkSQL

%pyspark
# Fill in your variables, replace <variablename>
url = "jdbc:postgresql://<host>:<port>/<database>"
table_name = "<table name>"
driver = "org.postgresql.Driver" # Do not change
user = "<username>"
password = "<password>"
# If the DataFrame contains the primary key field (in this case 'id'), drop it before calling 'df.write' to avoid a primary key collision
data = df.drop("id")
data.write.format('jdbc').options(
url=url,
driver=driver,
dbtable=table_name,
user=user,
password=password)\
.mode('append')\
.save()
TODO:

Configure Authentication in PostgreSQL

In your PostgreSQL database, be sure that these items are configured properly:
  • PostgreSQL is accessible from Zepl's whitelisted IP addresses
  • The user associated with your username and password has access to the desired databases and tables
  • The user has read and write permissions on those databases and tables
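The second and third items can be granted in PostgreSQL along these lines. This is a sketch; `zepl_user`, `zepl_docs`, and the `public` schema are placeholder names to replace with your own:

```sql
-- Placeholder names: replace zepl_user, zepl_docs, and zepl_doc with your own
GRANT CONNECT ON DATABASE zepl_docs TO zepl_user;
GRANT USAGE ON SCHEMA public TO zepl_user;
-- Read and write permissions on the table used in the examples above
GRANT SELECT, INSERT, UPDATE, DELETE ON zepl_doc TO zepl_user;
-- Needed for serial primary keys such as zepl_doc.id
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO zepl_user;
```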