PostgreSQL

Create a PostgreSQL Data Source

  1. Navigate to: Resources > Data Sources
  2. Select the PostgreSQL icon

Data Source Inputs

  • Name (required)
  • Host (required):
    • Enter the hostname of your PostgreSQL database.
    • Example (using PostgreSQL in AWS RDS): postgresql-zepl-docs.si89zol2ksop.us-west-1.rds.amazonaws.com
  • Port (required): 5432 is the default port for PostgreSQL
  • Database (optional):
    • If this is filled in, Zepl will append the database name to your connection string.
    • If this is blank, the connection may use the default database set by your administrator.
    • Example: postgresql-zepl-docs.si89zol2ksop.us-west-1.rds.amazonaws.com/<database name>
  • Credential Type: See Data Source Security
  • Username (required - can be entered later): Username used for connecting to PostgreSQL
  • Password (required - can be entered later): Password used for connecting to PostgreSQL
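
For orientation, these fields map onto the arguments of a standard PostgreSQL client connection. Here is a minimal sketch using psycopg2, with the RDS host from the example above and placeholders for the rest; inside Zepl you would normally let z.getDatasource() build this connection for you rather than connecting manually:

%python
import psycopg2

# Sketch only: how the data source form fields map to a psycopg2 connection.
# In a Zepl notebook, use z.getDatasource() instead of connecting manually.
conn = psycopg2.connect(
    host="postgresql-zepl-docs.si89zol2ksop.us-west-1.rds.amazonaws.com",  # Host
    port=5432,                  # Port (PostgreSQL default)
    dbname="<database name>",   # Database (optional; omit to use the default)
    user="<username>",          # Username
    password="<password>"       # Password
)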

Use PostgreSQL

Read

Python
Scala
PySpark
R
SQL
%python
import pandas as pd

# Open a connection and create a cursor object to perform database operations
conn = z.getDatasource("zepl_docs_postgresql")
cur = conn.cursor()

### Option 1: Read into a pandas DataFrame ###
df = pd.read_sql_query("SELECT * FROM zepl_doc;", conn)

### Option 2: Read into a Python list ###
cur.execute("SELECT * FROM zepl_doc;")
res = cur.fetchall()

# Display results from the DataFrame (df) or the list (res)
z.show(df)

# Close communication with the database
cur.close()
conn.close()
z.getDatasource() returns a PostgreSQL connection object of type psycopg2.extensions.connection.
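
Because this is an ordinary psycopg2 connection, standard psycopg2 patterns work with it. For example, a parameterized query (a sketch assuming the zepl_doc table used above):

%python
# Sketch: parameterized read with the connection from z.getDatasource();
# psycopg2 substitutes the %s placeholder with proper quoting and escaping
conn = z.getDatasource("zepl_docs_postgresql")
cur = conn.cursor()

cur.execute("SELECT * FROM zepl_doc WHERE num > %s;", (50,))
res = cur.fetchall()

cur.close()
conn.close()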
%spark
// Fill in your variables, replace <variablename>
var url = "jdbc:postgresql://<host>:<port>/<database>"
var table_name = "<table name>"
var driver = "org.postgresql.Driver" // Do not change
var user = "<username>"
var pass = "<password>"

// Create JDBC connection to PostgreSQL using spark.read
val df = spark.read
  .format("jdbc")
  .option("driver", driver)
  .option("url", url)
  .option("dbtable", table_name)
  .option("user", user)
  .option("password", pass)
  .load()

// Display the DataFrame
z.show(df)
This method does not use the Zepl data source. Please remove all PostgreSQL data sources attached to the notebook.

Please follow the guidance from the Python tab.

%pyspark
# Fill in your variables, replace <variablename>
url = "jdbc:postgresql://<host>:<port>/<database>"
table_name = "<table name>"
driver = "org.postgresql.Driver" # Do not change
user = "<username>"
password = "<password>"

# Create JDBC connection to PostgreSQL and read the table into a DataFrame
df = spark.read.format('jdbc').options(
    url=url,
    driver=driver,
    dbtable=table_name,
    user=user,
    password=password).load()

# Display the DataFrame
z.show(df)
%datasource.<data source name here>

-- Query database table zepl_doc
SELECT * FROM zepl_doc
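
When reading over JDBC, you can also push a query down to PostgreSQL instead of loading an entire table. Here is a sketch using Spark's JDBC "query" option (available in Spark 2.4 and later); the connection values are placeholders as in the examples above:

%pyspark
# Sketch: push a filter down to PostgreSQL rather than loading the whole table.
# Uses the JDBC "query" option (Spark 2.4+); values are placeholders.
df = spark.read.format('jdbc') \
    .option("url", "jdbc:postgresql://<host>:<port>/<database>") \
    .option("driver", "org.postgresql.Driver") \
    .option("query", "SELECT id, num, data FROM zepl_doc WHERE num > 50") \
    .option("user", "<username>") \
    .option("password", "<password>") \
    .load()

# Display the DataFrame
z.show(df)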

Write

Python
Scala
PySpark
R
%python
# Open a connection and create a cursor object to perform database operations
conn = z.getDatasource("zepl_docs_postgresql")
cur = conn.cursor()

# Create the table zepl_doc if it does not exist
cur.execute("CREATE TABLE IF NOT EXISTS zepl_doc (id serial PRIMARY KEY, num integer, data varchar);")

# Insert the row (100, 'xyz') into the table zepl_doc
cur.execute("INSERT INTO zepl_doc (num, data) VALUES (%s, %s)", (100, "xyz"))

# Persist changes to the database
conn.commit()

# Close communication with the database
cur.close()
conn.close()
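
To insert several rows in one call, psycopg2's executemany works the same way. A sketch assuming the zepl_doc table created above:

%python
# Sketch: batch insert with executemany (same zepl_doc table as above)
conn = z.getDatasource("zepl_docs_postgresql")
cur = conn.cursor()

rows = [(101, "abc"), (102, "def"), (103, "ghi")]
cur.executemany("INSERT INTO zepl_doc (num, data) VALUES (%s, %s)", rows)

# Persist changes and close the connection
conn.commit()
cur.close()
conn.close()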
%spark
import java.util.Properties

// Create a Properties object for the write function
val properties = new Properties();

// Create inputs
var url = "jdbc:postgresql://<host>:<port>/<database>"
var table_name = "<table name>"
var driver = "org.postgresql.Driver" // Do not change
var user = "<username>"
var pass = "<password>"

// Set properties
properties.setProperty("driver", driver);
properties.setProperty("user", user);
properties.setProperty("password", pass);

// If the DataFrame contains the primary key column (here, 'id'), drop it
// before calling 'df.write' to avoid a primary key collision
var data = df.drop("id")

// Write the Spark DataFrame to the PostgreSQL table
data.write.mode("append").jdbc(url, table_name, properties)
This method does not use the Zepl data source. Please remove all PostgreSQL data sources from the notebook.

Please follow the guidance from the Python tab.

%pyspark
# Fill in your variables, replace <variablename>
url = "jdbc:postgresql://<host>:<port>/<database>"
table_name = "<table name>"
driver = "org.postgresql.Driver" # Do not change
user = "<username>"
password = "<password>"

# If the DataFrame contains the primary key column (here, 'id'), drop it
# before calling 'df.write' to avoid a primary key collision
data = df.drop("id")

# Write the Spark DataFrame to the PostgreSQL table
data.write.format('jdbc').options(
    url=url,
    driver=driver,
    dbtable=table_name,
    user=user,
    password=password)\
    .mode('append')\
    .save()

Configure Authentication in PostgreSQL

In your PostgreSQL database, be sure that these items are configured properly:
  • PostgreSQL is accessible from Zepl's whitelisted IP addresses
  • The user associated with your username and password has access to the desired databases and tables
  • That user has read and write permissions on those databases and tables
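
For example, an administrator might grant the Zepl user access with statements like the following. This is a sketch only: the admin data source name is hypothetical, the role, database, and table names are placeholders, and the exact grants depend on your schema.

%python
# Sketch: grants an administrator might run so the Zepl user can read and
# write the example table. "zepl_docs_postgresql_admin" is a hypothetical
# data source backed by an admin account; role/table names are placeholders.
admin_conn = z.getDatasource("zepl_docs_postgresql_admin")
cur = admin_conn.cursor()

cur.execute('GRANT CONNECT ON DATABASE "<database name>" TO "<username>";')
cur.execute('GRANT SELECT, INSERT, UPDATE, DELETE ON zepl_doc TO "<username>";')

admin_conn.commit()
cur.close()
admin_conn.close()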