Google BigQuery

Create a BigQuery Data Source

  1. Navigate to: Resources > Data Sources
  2. Select the BigQuery icon:

Data Source Inputs

  • Name (required)
  • Google Cloud Project ID (required): Retrieve this from your GCP console
  • Credential Type: Select Shared. Zepl recommends this type to reduce the number of users that require access to the service account key. See Data Source Security
  • Query Priority: For more information, see Google's Documentation
    • Interactive: This is the default value and counts toward your concurrent rate limit and your daily limit
    • Batch: This method queues each batch query on your behalf and starts the query as soon as idle resources are available in the BigQuery shared resource pool
  • Service account key (JSON):
    • Copy the entire JSON key located in the service account credential file downloaded from BigQuery (a sketch of the key's shape follows this list)
    • See Configure Authentication for more details
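The exact contents come from the credential file Google generates when the key is created, and Zepl expects the whole JSON object to be pasted in. As a rough sketch with placeholder values (the real file also contains a few token and certificate URL fields), a service account key file is shaped like this:

{
  "type": "service_account",
  "project_id": "<your-project-id>",
  "private_key_id": "<key-id>",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "<service-account-name>@<your-project-id>.iam.gserviceaccount.com",
  "client_id": "<client-id>",
  ...
}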

Use BigQuery

Read

Python
%python
from google.cloud import bigquery
import pandas as pd

# Construct a BigQuery client object
client = z.getDatasource("BigQuery_Sample")

# Create query
query = """
SELECT name, SUM(number) as total_people
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = 'TX'
GROUP BY name, state
ORDER BY total_people DESC
LIMIT 20
"""

# Execute query and return results
query_job = client.query(query).result()

# Convert the query result object into a pandas DataFrame
df = query_job.to_dataframe()

z.show(df)
Use z.getDatasource() to return a BigQuery client object (google.cloud.bigquery.client.Client).
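The Query Priority selected on the data source applies to queries issued through it. Because z.getDatasource() returns a standard google-cloud-bigquery client, a single query can also be given its own job configuration. A minimal sketch, assuming the same BigQuery_Sample data source as above:

%python
from google.cloud import bigquery

client = z.getDatasource("BigQuery_Sample")

# Run one query with batch priority instead of the default interactive priority
job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)
query_job = client.query("SELECT 1 AS one", job_config=job_config)

# Batch queries start when idle resources are available, so result() may wait longer
z.show(query_job.result().to_dataframe())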

Write

Python

%python
import pandas as pd

# Create data for the DataFrame
data = {"names": ["Edward Norton", "Nicholas Cage", "Sean Connery", "Ed Harris"],
        "Row1": [5, 6, 7, 8],
        "Row2": [9, 10, 11, 12],
        "Row3": [13, 14, 15, 16],
        "Row4": [17, 18, 19, 20],
        "Row5": [21, 22, 23, 24]}

# Create pandas DataFrame
df = pd.DataFrame.from_dict(data)

%python
from google.cloud import bigquery
from google.cloud.exceptions import NotFound

# Create BigQuery client object
client = z.getDatasource("BigQuery_Sample")

# Get project ID
project_id = client.project

# Create BigQuery dataset reference name (ID)
dataset_id = "write_dataset4"
dataset_id_full = "{}.{}".format(project_id, dataset_id)

# Create BigQuery dataset object - check if it already exists
dataset = bigquery.Dataset(dataset_id_full)
try:
    client.get_dataset(dataset_id)  # Make an API request.
    print("Dataset {} already exists".format(dataset_id))
except NotFound:
    # Create the new BigQuery dataset.
    dataset = client.create_dataset(dataset)

# Create BigQuery table reference name (ID). load_table_from_dataframe accepts
# this fully qualified table ID directly as the destination.
table_id = "write_df_to_table"
table_id_full = "{}.{}.{}".format(project_id, dataset_id, table_id)

# Create LoadJobConfig to specify the table schema and write characteristics
job_config = bigquery.LoadJobConfig(
    # Specify a (partial) schema. All columns are always written to the table.
    # The schema is used to assist in data type definitions.
    schema=[
        # Specify the type of columns whose type cannot be auto-detected. For
        # example, the "names" column uses pandas dtype "object", so its data
        # type is ambiguous.
        bigquery.SchemaField("names", bigquery.enums.SqlTypeNames.STRING)
    ],
    # Optionally, set the write disposition. BigQuery appends loaded rows to an
    # existing table by default, but with WRITE_TRUNCATE write disposition it
    # replaces the table with the loaded data.
    write_disposition="WRITE_TRUNCATE",
)

%python
# Execute write
job = client.load_table_from_dataframe(
    df, table_id_full, job_config=job_config
)
job.result()

# Display results if loaded successfully
table = client.get_table(table_id_full)
print("Loaded {} rows and {} columns to {}".format(table.num_rows, len(table.schema), table_id_full))
Documentation:
If support is needed for Scala, PySpark, or R, please contact [email protected]

Configure Authentication

  1. Log in to Google Cloud Platform
  2. Select or create a project
  3. If not already enabled, select "Enable APIs and Services" and enable the BigQuery APIs for this project

  1. Navigate to "APIs & Services" > Credentials
  2. Create Credentials > Service Account
  3. Service account details > Name (can be changed later): "zepl-bigquery-service-account"
  4. Grant this service account access to project (optional) > Role: Scroll down to BigQuery and select the desired roles to grant access to
  5. Grant User (optional) ...
  6. Edit the service account just created > Select Add Key > Create new Key > JSON > Create. This is the key that will be entered into Zepl's BigQuery data source. DO NOT LOSE THIS KEY. (A sketch for sanity-checking the key locally follows these steps.)
  7. Save
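If you want to sanity-check the key before pasting it into Zepl, the same file can be used to build a client directly. A minimal sketch in plain Python, run wherever the downloaded file is available, assuming the google-cloud-bigquery and google-auth packages and a hypothetical local path to the key:

from google.cloud import bigquery
from google.oauth2 import service_account

# Hypothetical path to the JSON key downloaded in step 6
key_path = "/path/to/zepl-bigquery-key.json"

# Build credentials from the key file and construct a BigQuery client with them
credentials = service_account.Credentials.from_service_account_file(key_path)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

# A trivial query confirms the service account can reach BigQuery
print(list(client.query("SELECT 1 AS ok").result()))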