PostgreSQL

Create a PostgreSQL Data Source

  1. Navigate to: Resources > Data Sources
  2. Select the PostgreSQL icon

Data Source Inputs

  • Name (required)
  • Host (required):
    • Enter the hostname of your PostgreSQL database (see the connection sketch after this list for how these inputs fit together).
    • Example (using PostgreSQL in AWS RDS): postgresql-zepl-docs.si89zol2ksop.us-west-1.rds.amazonaws.com
  • Port (required): 5432 is the default port for PostgreSQL
  • Database (optional):
    • If this is filled in, Zepl will append the database name to your connection string.
    • If this is left blank, the connection may use the default database set by your administrator.
    • Example: postgresql-zepl-docs.si89zol2ksop.us-west-1.rds.amazonaws.com/<database name>
  • Credential Type: See Data Source Security
  • Username (required - can be entered later): The username used to connect to PostgreSQL
  • Password (required - can be entered later): The password used to connect to PostgreSQL
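These inputs map one-to-one onto a standard PostgreSQL client connection. As an illustration only (not part of Zepl's data source mechanism), here is a minimal psycopg2 sketch that opens the same connection directly; the host reuses the RDS example above, and all angle-bracket values are placeholders:

%python
import psycopg2

# Hypothetical values mirroring the data source inputs above
conn = psycopg2.connect(
    host="postgresql-zepl-docs.si89zol2ksop.us-west-1.rds.amazonaws.com",  # Host
    port=5432,                 # Port (PostgreSQL default)
    dbname="<database name>",  # Database (omit this argument to use the server default)
    user="<username>",         # Username
    password="<password>"      # Password
)
conn.close()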

Use PostgreSQL

Read

Python
%python
import pandas as pd

# Open a connection and create a cursor object to perform database operations
conn = z.getDatasource("zepl_docs_postgresql")
cur = conn.cursor()

### Option 1: Read into a pandas DataFrame object ###
df = pd.read_sql_query("SELECT * FROM zepl_doc;", conn)

### Option 2: Read into a Python list object ###
cur.execute("SELECT * FROM zepl_doc;")
res = cur.fetchall()

# Display results from the DataFrame (df) or the list object (res)
z.show(df)

# Make any changes to the database persistent
conn.commit()

# Close communication with the database
cur.close()
conn.close()
z.getDatasource() returns a PostgreSQL connection object of type psycopg2.extensions.connection.
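Because the connection is a standard psycopg2 object, the list of tuples from Option 2 can also be turned into a DataFrame by reading column names off the cursor. A minimal sketch, assuming the cur and res objects from the block above (run this before cur.close()):

%python
import pandas as pd

# Column names are the first field of each cur.description entry (DB-API standard)
columns = [desc[0] for desc in cur.description]

# Build a DataFrame from the row tuples returned by fetchall()
df_from_list = pd.DataFrame(res, columns=columns)
z.show(df_from_list)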
Scala

%spark
// Fill in your variables, replace <variable name>
var url = "jdbc:postgresql://<host>:<port>/<database>"
var table_name = "<table name>"
var driver = "org.postgresql.Driver" // Do not change
var user = "<username>"
var pass = "<password>"

// Create a JDBC connection to PostgreSQL using spark.read
val df = spark.read
  .format("jdbc")
  .option("driver", driver)
  .option("url", url)
  .option("dbtable", table_name)
  .option("user", user)
  .option("password", pass)
  .load()

// Display the DataFrame
z.show(df)
This method does not use the Zepl data source. Please remove all PostgreSQL data sources attached to the notebook.

PySpark

Option 1: Use Zepl's Data Source

Please follow the guidance in the Python tab.

Option 2: Using SparkSQL

%pyspark
# Fill in your variables, replace <variable name>
url = "jdbc:postgresql://<host>:<port>/<database>"
table_name = "<table name>"
driver = "org.postgresql.Driver"  # Do not change
user = "<username>"
password = "<password>"

# Create a JDBC connection to PostgreSQL and read the table into a DataFrame
df = spark.read.format('jdbc').options(
    url=url,
    driver=driver,
    dbtable=table_name,
    user=user,
    password=password).load()

# Display the DataFrame
z.show(df)
R

TODO:
SQL

%datasource.<data source name here>

-- Query the database table zepl_doc
SELECT * FROM zepl_doc
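When a read needs to filter on user-supplied values, let the driver interpolate parameters rather than formatting the SQL string yourself. A minimal sketch, assuming the same zepl_docs_postgresql data source and zepl_doc table used above:

%python
# Open a connection and cursor as in the Python read example
conn = z.getDatasource("zepl_docs_postgresql")
cur = conn.cursor()

# psycopg2 substitutes the tuple values safely for the %s placeholders
cur.execute("SELECT * FROM zepl_doc WHERE num > %s;", (50,))
rows = cur.fetchall()

cur.close()
conn.close()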

Write

Python
%python
# Open a connection and create a cursor object to perform database operations
conn = z.getDatasource("zepl_docs_postgresql")
cur = conn.cursor()

# Create the table zepl_doc if it does not exist
cur.execute("CREATE TABLE IF NOT EXISTS zepl_doc (id serial PRIMARY KEY, num integer, data varchar);")

# Insert the row (100, 'xyz') into the table zepl_doc
cur.execute("INSERT INTO zepl_doc (num, data) VALUES (%s, %s)", (100, "xyz"))

# Persist changes to the database
conn.commit()

# Close communication with the database
cur.close()
conn.close()
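To insert many rows in one call, psycopg2's executemany can stand in for the single-row execute above. A minimal sketch, assuming the zepl_doc table created in the previous block:

%python
conn = z.getDatasource("zepl_docs_postgresql")
cur = conn.cursor()

# Insert several (num, data) rows with one statement
rows = [(101, "abc"), (102, "def"), (103, "ghi")]
cur.executemany("INSERT INTO zepl_doc (num, data) VALUES (%s, %s)", rows)

conn.commit()
cur.close()
conn.close()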
Scala

%spark
import java.util.Properties

// Create a Properties object for the write function
val properties = new Properties()

// Create inputs
var url = "jdbc:postgresql://<host>:<port>/<database>"
var table_name = "<table name>"
var driver = "org.postgresql.Driver" // Do not change
var user = "<username>"
var pass = "<password>"

// Set properties
properties.setProperty("driver", driver)
properties.setProperty("user", user)
properties.setProperty("password", pass)

// If the DataFrame contains the primary key field (in this case 'id'), it must be dropped before calling 'df.write' to avoid a primary key collision
var data = df.drop("id")

// Write the Spark DataFrame to the PostgreSQL table
data.write.mode("append").jdbc(url, table_name, properties)
This method does not use the Zepl data source. Please remove all PostgreSQL data sources from the notebook.

PySpark

Option 1: Use Zepl's Data Source

Please follow the guidance in the Python tab.

Option 2: Using SparkSQL

%pyspark
# Fill in your variables, replace <variable name>
url = "jdbc:postgresql://<host>:<port>/<database>"
table_name = "<table name>"
driver = "org.postgresql.Driver"  # Do not change
user = "<username>"
password = "<password>"

# If the DataFrame contains the primary key field (in this case 'id'), it must be dropped before calling 'df.write' to avoid a primary key collision
data = df.drop("id")

# Write the Spark DataFrame to the PostgreSQL table
data.write.format('jdbc').options(
    url=url,
    driver=driver,
    dbtable=table_name,
    user=user,
    password=password)\
    .mode('append')\
    .save()
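The .mode('append') call adds rows to the existing table. Spark's JDBC writer also accepts 'overwrite', 'ignore', and 'error' (the default) as save modes; note that 'overwrite' drops and recreates the table by default. A minimal sketch of the alternative, assuming the same variables as the block above:

%pyspark
# 'overwrite' replaces the table contents instead of appending
data.write.format('jdbc').options(
    url=url,
    driver=driver,
    dbtable=table_name,
    user=user,
    password=password)\
    .mode('overwrite')\
    .save()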
R

TODO:

Configure Authentication in PostgreSQL

In your PostgreSQL database, be sure that these items are configured properly:
  • PostgreSQL is accessible from Zepl's whitelisted IP addresses
  • The user associated with your username and password has access to the desired databases and tables
  • The user has read and write permissions on the desired databases and tables (see the sketch after this list)
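As an illustration, a database administrator could grant those permissions with statements like the following. This is a minimal sketch run through the same data source; the role name zepl_user and the zepl_doc table are hypothetical, so adjust them to your own schema:

%python
# Assumes the connected user has privileges to grant permissions
conn = z.getDatasource("zepl_docs_postgresql")
cur = conn.cursor()

# Hypothetical role and table names; replace with your own
cur.execute("GRANT SELECT, INSERT, UPDATE, DELETE ON zepl_doc TO zepl_user;")

conn.commit()
cur.close()
conn.close()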