Google BigQuery
- 1. Navigate to: Resources > Data Sources
- 2. Select the BigQuery icon and complete the following fields:
- Name (required)
- Google Cloud Project ID (required): Retrieve this from your GCP console
- Credential Type: Select Shared. This is the type recommended by Zepl, as it reduces the number of users who need access to the service account key. See Data Source Security.
- Interactive: The default value. Interactive queries count towards your concurrent rate limit and your daily limit.
- Batch: Queues each query on your behalf and starts it as soon as idle resources are available in the BigQuery shared resource pool (see the batch-priority sketch after the read example below).
- Service account key (JSON):
- Copy the entire JSON key located in the service account credential file downloaded from BigQuery.
%python
from google.cloud import bigquery
import pandas as pd
# Construct a BigQuery client object.
client = z.getDatasource("BigQuery_Sample")
# Create Query
query = """
SELECT name, SUM(number) as total_people
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = 'TX'
GROUP BY name, state
ORDER BY total_people DESC
LIMIT 20
"""
# Execute the query and wait for the results
results = client.query(query).result()
# Convert the result rows into a pandas DataFrame
df = results.to_dataframe()
z.show(df)
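The Interactive and Batch options above correspond to BigQuery's query priorities. The sketch below shows how the same query could be submitted with batch priority, assuming the object returned by z.getDatasource behaves like a standard google.cloud.bigquery Client:
%python
from google.cloud import bigquery
# Assumption: z.getDatasource returns a standard google.cloud.bigquery Client
client = z.getDatasource("BigQuery_Sample")
# Request batch priority; BigQuery starts the job once idle resources are available
job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)
query = """
SELECT name, SUM(number) as total_people
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = 'TX'
GROUP BY name, state
ORDER BY total_people DESC
LIMIT 20
"""
# result() blocks until the batch job has finished
df = client.query(query, job_config=job_config).result().to_dataframe()
z.show(df)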
%python
import pandas as pd
# Define sample data as a dictionary
data = {"names": ["Edward Norton", "Nicholas Cage", "Sean Connery", "Ed Harris"],
"Row1": [5, 6, 7, 8],
"Row2": [9, 10, 11, 12],
"Row3": [13, 14, 15, 16],
"Row4": [17, 18, 19, 20],
"Row5": [21, 22, 23, 24]}
# Create Pandas DataFrame
df = pd.DataFrame.from_dict(data)
%python
from google.cloud import bigquery
from google.cloud.exceptions import NotFound
# Create BigQuery Client Object
client = z.getDatasource("BigQuery_Sample")
# Get Project ID
project_id = client.project
# Create BigQuery Dataset Reference Name (ID)
dataset_id = "write_dataset4"
dataset_id_full = "{}.{}".format(project_id, dataset_id)
# Create BigQuery Dataset Object - Check if it already exists
dataset = bigquery.Dataset(dataset_id_full)
try:
client.get_dataset(dataset_id) # Make an API request.
print("Dataset {} already exists".format(dataset_id))
except NotFound:
# Create the new BigQuery dataset.
dataset = client.create_dataset(dataset)
# Create BigQuery Table Reference Name (ID)
table_id = "write_df_to_table"
table_id_full = "{}.{}.{}".format(project_id, dataset_id, table_id)
# The destination table is passed directly to load_table_from_dataframe below as table_id_full
# Create LoadJobConfig to specify the table schema and write characteristics
job_config = bigquery.LoadJobConfig(
# Specify a (partial) schema. All columns are always written to the table. The schema is used to assist in data type definitions
schema=[
# Specify the type of columns whose type cannot be auto-detected. For example, the "names" column uses pandas dtype "object", so its data type is ambiguous
bigquery.SchemaField("names", bigquery.enums.SqlTypeNames.STRING)
],
# Optionally, set the write disposition. BigQuery appends loaded rows to an existing table by default, but with WRITE_TRUNCATE write disposition it replaces the table with the loaded data
write_disposition="WRITE_TRUNCATE",
)
%python
# Execute write
job = client.load_table_from_dataframe(
df, table_id_full, job_config=job_config
)
job.result()
# Display results if loaded successfully
table = client.get_table(table_id_full)
print( "Loaded {} rows and {} columns to {}".format(table.num_rows, len(table.schema), table_id_full))
Documentation:
- 1. Log in to Google Cloud Platform
- 2. Select or create a project
- 3. If not already enabled, select "Enable APIs and Services" and enable the BigQuery APIs for this project
- 1. Navigate to "APIs & Services" > Credentials
- 2. Create Credentials > Service Account
- 3. Service account details > Name (can be changed later): "zepl-bigquery-service-account"
- 4. Grant this service account access to project (optional) > Role: Scroll down to BigQuery and select the desired roles to grant access to
- 5. Grant User (optional) ...
- 6. Edit the service account just created > Select Add Key > Create new Key > JSON > Create. This is the key that will be entered into Zepl's BigQuery Data Source. DO NOT LOSE THIS KEY.
- 7. Save
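Before pasting the key into the data source form, it can be verified locally with the Python client. A minimal sketch, assuming a hypothetical path to the downloaded key file:
from google.cloud import bigquery
from google.oauth2 import service_account
# Assumption: hypothetical local path to the JSON key downloaded above
key_path = "zepl-bigquery-key.json"
credentials = service_account.Credentials.from_service_account_file(key_path)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)
# A trivial query confirms the service account can reach BigQuery
print(list(client.query("SELECT 1").result()))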