Navigate to: Resources > Data Sources
Select the BigQuery Icon:
Name (required)
Google Cloud Project ID (required): Retrieve this from your GCP console
Credential Type: Select Shared. Zepl recommends this type because it reduces the number of users who need access to the service account key. See Data Source Security
Query Priority: For more information, see Google's Documentation
Interactive: This is the default value and will count towards your concurrent rate limit and your daily limit
Batch: This method queues each batch query on your behalf and starts the query as soon as idle resources are available in the BigQuery shared resource pool. Batch queries do not count towards your concurrent rate limit (see the sketch after this list)
Service account key (JSON):
Copy the entire JSON key located in the service account credential file downloaded from BigQuery.
See Configure Authentication for more details
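Zepl applies the selected Query Priority when it runs queries through this data source. For reference, if you build the job configuration yourself with the client library, the equivalent setting looks roughly like the sketch below (the data source name and query are placeholders):
%python
from google.cloud import bigquery

# Get the BigQuery client object from the data source (name is an example)
client = z.getDatasource("BigQuery_Sample")

# Equivalent of selecting "Batch" in the data source form
job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)

# The query is queued and starts once idle resources are available
query_job = client.query("SELECT 1 AS ok", job_config=job_config)
print(query_job.result().to_dataframe())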
%python
from google.cloud import bigquery
import pandas as pd

# Construct a BigQuery client object.
client = z.getDatasource("BigQuery_Sample")

# Create Query
query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""

# Execute Query and return Results
query_job = client.query(query).result()

# Convert query result object into a pandas dataframe
df = query_job.to_dataframe()
z.show(df)
Use z.getDatasource() to return a BigQuery client object: google.cloud.bigquery.client.Client
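Because the returned object is the standard client from the google-cloud-bigquery library, any of that client's methods can be called on it. A small sketch (assuming the data source is named BigQuery_Sample, as in the examples above):
%python
# Get the BigQuery client object from the Zepl data source
client = z.getDatasource("BigQuery_Sample")

# List the datasets in the project the service account can see
for dataset in client.list_datasets():
    print(dataset.dataset_id)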
%python
import pandas as pd

# Create data frame
data = {
    "names": ["Edward Norton", "Nicholas Cage", "Sean Connery", "Ed Harris"],
    "Row1": [5, 6, 7, 8],
    "Row2": [9, 10, 11, 12],
    "Row3": [13, 14, 15, 16],
    "Row4": [17, 18, 19, 20],
    "Row5": [21, 22, 23, 24]
}

# Create Pandas DataFrame
df = pd.DataFrame.from_dict(data)
%python
from google.cloud import bigquery
from google.cloud.exceptions import NotFound

# Create BigQuery Client Object
client = z.getDatasource("BigQuery_Sample")

# Get Project ID
project_id = client.project

# Create BigQuery Dataset Reference Name (ID)
dataset_id = "write_dataset4"
dataset_id_full = "{}.{}".format(project_id, dataset_id)

# Create BigQuery Dataset Object - Check if it already exists
dataset = bigquery.Dataset(dataset_id_full)
try:
    client.get_dataset(dataset_id)  # Make an API request.
    print("Dataset {} already exists".format(dataset_id))
except NotFound:
    # Create the new BigQuery dataset.
    dataset = client.create_dataset(dataset)

# Create BigQuery Table Reference Name (ID)
table_id = "write_df_to_table"
table_id_full = "{}.{}.{}".format(project_id, dataset_id, table_id)

# Create LoadJobConfig to specify the table schema and write characteristics
job_config = bigquery.LoadJobConfig(
    # Specify a (partial) schema. All columns are always written to the table.
    # The schema is used to assist in data type definitions.
    schema=[
        # Specify the type of columns whose type cannot be auto-detected. For example,
        # the "names" column uses pandas dtype "object", so its data type is ambiguous.
        bigquery.SchemaField("names", bigquery.enums.SqlTypeNames.STRING)
    ],
    # Optionally, set the write disposition. BigQuery appends loaded rows to an existing
    # table by default, but with WRITE_TRUNCATE the table is replaced with the loaded data.
    write_disposition="WRITE_TRUNCATE",
)
%python
# Execute write
job = client.load_table_from_dataframe(df, table_id_full, job_config=job_config)
job.result()

# Display results if loaded successfully
table = client.get_table(table_id_full)
print(
    "Loaded {} rows and {} columns to {}".format(table.num_rows, len(table.schema), table_id_full)
)
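To confirm the load, the new table can be read back into a DataFrame. A minimal check, reusing client and table_id_full from the paragraphs above:
%python
# Read the newly written table back to verify the load
verify_query = "SELECT * FROM `{}` LIMIT 10".format(table_id_full)
verify_df = client.query(verify_query).result().to_dataframe()
z.show(verify_df)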
Documentation:
If support is needed for Scala, PySpark, or R, please contact [email protected]
Log in to Google Cloud Platform
Select or Create a Project
If not already enabled, select "Enable APIs and Services". Enable the BigQuery APIs for this project
Navigate to "APIs & Services" > Credentials
Create Credentials > Service Account
Service account details > Name (can be changed later): "zepl-bigquery-service-account"
Grant this service account access to project (optional) > Role: Scroll down to BigQuery and select the roles you want to grant
Grant User (optional) ...
Edit the service account just created > Select Add Key > Create new Key > JSON > Create. This is the key that will be entered into Zepl's BigQuery Data Source. DO NOT LOSE THIS KEY. (A quick way to check the key locally is sketched after these steps.)
Save
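Before pasting the key into Zepl, it can be worth confirming in any Python environment with the google-cloud-bigquery package installed that the key authenticates and the granted roles allow querying. A minimal sketch, assuming the downloaded file is saved as key.json:

from google.cloud import bigquery

# Assumption: key.json is the JSON key downloaded in the step above
client = bigquery.Client.from_service_account_json("key.json")

# A trivial query confirms the key authenticates and the roles permit querying
rows = client.query("SELECT 1 AS ok").result()
print(list(rows))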