Google BigQuery

Create a BigQuery Data Source

  1. Navigate to: Resources > Data Sources
  2. Select the BigQuery icon

Data Source Inputs

  • Name (required)
  • Google Cloud Project ID (required): Retrieve this from your GCP console
  • Credential Type: Select Shared. This is the type recommended by Zepl because it reduces the number of users who need access to the service account key. See Data Source Security.
  • Query Priority: For more information, see Google's documentation. (A per-query example appears in the Read section below.)
    • Interactive: This is the default value. Interactive queries count towards your concurrent rate limit and your daily limit.
    • Batch: This method queues each batch query on your behalf and starts the query as soon as idle resources are available in the BigQuery shared resource pool.
  • Service account key (JSON):
    • Copy the entire JSON key located in the service account credential file downloaded from Google Cloud; a placeholder example is shown below.
    • See Configure Authentication for more details.
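For reference, the value you paste into the Service account key field is the full contents of the JSON file downloaded from Google Cloud. The sketch below only illustrates its shape; every value is a placeholder, and your real file will contain your own project and key details.

{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "<key id>",
  "private_key": "-----BEGIN PRIVATE KEY-----\n<key material>\n-----END PRIVATE KEY-----\n",
  "client_email": "zepl-bigquery-service-account@your-project-id.iam.gserviceaccount.com",
  "client_id": "<client id>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  ...
}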

Use BigQuery

Read

Python
SQL
%python
from google.cloud import bigquery
import pandas as pd

# Construct a BigQuery client object.
client = z.getDatasource("BigQuery_Sample")

# Create Query
query = """
SELECT name, SUM(number) as total_people
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = 'TX'
GROUP BY name, state
ORDER BY total_people DESC
LIMIT 20
"""

# Execute Query and return Results
query_job = client.query(query).result()

# Convert query result object into a pandas dataframe
df = query_job.to_dataframe()

z.show(df)
Use z.getDatasource() to return a BigQuery client object: google.cloud.bigquery.client.Client
// TODO: After Robert completes this
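Because z.getDatasource() returns a standard google.cloud.bigquery client, you can also request batch priority for an individual query with a QueryJobConfig, rather than relying only on the Query Priority selected on the data source. The sketch below is a minimal example, assuming the same BigQuery_Sample data source and public dataset used above.

%python
from google.cloud import bigquery

client = z.getDatasource("BigQuery_Sample")

# Ask BigQuery to run this query with BATCH priority instead of the default
# INTERACTIVE priority. Batch queries wait for idle resources and do not count
# against the concurrent interactive rate limit.
job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)

query = """
SELECT COUNT(*) AS row_count
FROM `bigquery-public-data.usa_names.usa_1910_2013`
"""

# result() blocks until the queued job finishes.
df = client.query(query, job_config=job_config).result().to_dataframe()
z.show(df)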

Write

Python
SQL

Create Sample Pandas DataFrame for Writing

%python
import pandas as pd

# Create data frame
data = {"names": ["Edward Norton", "Nicholas Cage", "Sean Connery", "Ed Harris"],
        "Row1": [5, 6, 7, 8],
        "Row2": [9, 10, 11, 12],
        "Row3": [13, 14, 15, 16],
        "Row4": [17, 18, 19, 20],
        "Row5": [21, 22, 23, 24]}

# Create Pandas DataFrame
df = pd.DataFrame.from_dict(data)

Create Client, Dataset, Table, and LoadJobConfig

%python
from google.cloud import bigquery
from google.cloud.exceptions import NotFound

# Create BigQuery Client Object
client = z.getDatasource("BigQuery_Sample")

# Get Project ID
project_id = client.project

# Create BigQuery Dataset Reference Name (ID)
dataset_id = "write_dataset4"
dataset_id_full = "{}.{}".format(project_id, dataset_id)

# Create BigQuery Dataset Object - Check if it already exists
dataset = bigquery.Dataset(dataset_id_full)
try:
    client.get_dataset(dataset_id)  # Make an API request.
    print("Dataset {} already exists".format(dataset_id))
except NotFound:
    # Create the new BigQuery dataset.
    dataset = client.create_dataset(dataset)

# Create BigQuery Table Reference Name (ID)
table_id = "write_df_to_table"
table_id_full = "{}.{}.{}".format(project_id, dataset_id, table_id)

# Create LoadJobConfig to specify the table schema and write characteristics
job_config = bigquery.LoadJobConfig(
    # Specify a (partial) schema. All columns are always written to the table.
    # The schema is used to assist in data type definitions.
    schema=[
        # Specify the type of columns whose type cannot be auto-detected. For
        # example, the "names" column uses pandas dtype "object", so its data
        # type is ambiguous.
        bigquery.SchemaField("names", bigquery.enums.SqlTypeNames.STRING),
    ],
    # Optionally, set the write disposition. BigQuery appends loaded rows to an
    # existing table by default, but with WRITE_TRUNCATE write disposition it
    # replaces the table with the loaded data.
    write_disposition="WRITE_TRUNCATE",
)

Write DataFrame to Table

%python
# Execute write
job = client.load_table_from_dataframe(
    df, table_id_full, job_config=job_config
)
job.result()

# Display results if loaded successfully
table = client.get_table(table_id_full)
print("Loaded {} rows and {} columns to {}".format(table.num_rows, len(table.schema), table_id_full))
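After the load job completes, a quick way to sanity-check the write is to query the new table back into a DataFrame. This is a minimal sketch that reuses the client and table_id_full variables defined in the cells above.

%python
# Read the newly written table back to confirm its contents.
verify_query = "SELECT * FROM `{}` ORDER BY names".format(table_id_full)

df_check = client.query(verify_query).result().to_dataframe()
z.show(df_check)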
//TODO: After Robert completes
Documentation:
If support is needed for Scala, PySpark, and R, please contact [email protected]

Configure Authentication

Enable BigQuery API

  1. Log in to Google Cloud Platform
  2. Select or create a project
  3. If not already enabled, select "Enable APIs and Services" and enable the BigQuery API for this project

Create a Service Account Credential for Authentication from Zepl

  1. Navigate to "APIs & Services" > Credentials
  2. Create Credentials > Service Account
  3. Service account details > Name (can be changed later): "zepl-bigquery-service-account"
  4. Grant this service account access to project (optional) > Role: Scroll down to BigQuery and select the roles you want to grant
  5. Grant User (optional) ...
  6. Edit the service account just created > Select Add Key > Create new Key > JSON > Create. This is the key that will be entered into Zepl's BigQuery Data Source. DO NOT LOSE THIS KEY. (An optional way to verify the key is sketched after this list.)
  7. Save