DataRobot

Connect Zepl to DataRobot

DataRobot provides client libraries that make it easy to connect your Python and R code to your DataRobot account. To connect Zepl to DataRobot, check that the library for your language is installed, then bring your DataRobot API key into Zepl.

Install client libraries

What libraries do I need?

Place this code at the beginning of your notebook and ensure it runs every time your container starts up.

Python
%python
!pip install datarobot
R
%r
install.packages("datarobot")
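Because the install cell runs again every time the container starts, you may want it to skip pip when the package is already importable. The following is a minimal Python sketch of that check, not a Zepl or DataRobot feature. It uses importlib.util.find_spec to test whether a module can be imported and runs pip only when it cannot. ensure_installed and its installer hook are hypothetical helpers; the hook exists only so the logic can be tested without installing anything.

```python
import importlib
import importlib.util
import subprocess
import sys


def is_installed(module_name):
    """Return True if a top-level module can be imported in this interpreter."""
    return importlib.util.find_spec(module_name) is not None


def ensure_installed(module_name, package_name=None, installer=None):
    """Run pip for package_name only when module_name is not importable.

    Returns True if an install was attempted, False if the module was present.
    installer is a hypothetical test hook that receives the pip command list;
    when it is None, the command is run with subprocess.
    """
    if is_installed(module_name):
        return False
    cmd = [sys.executable, "-m", "pip", "install", package_name or module_name]
    if installer is None:
        subprocess.check_call(cmd)
    else:
        installer(cmd)
    # Make the freshly installed package visible to the import system
    importlib.invalidate_caches()
    return True
```

In a notebook cell you would call ensure_installed("datarobot"). The Python package and the module share the name datarobot, so package_name can be omitted.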

Authorize Zepl to access DataRobot

In your DataRobot account:

Follow these steps to find your API key in your DataRobot account: https://community.datarobot.com/t5/resources/where-can-i-find-my-api-key/ta-p/4648

In your Zepl Org:

There are two ways to bring your DataRobot API key to Zepl:

  1. Paste your DataRobot API key in your notebook code

  2. (Recommended) Create and attach a secret store data source

Your secret store will look something like this:

Getting Started Notebook


Connect to DataRobot

To use Zepl with DataRobot, you first need to connect your Zepl notebook to your DataRobot instance. The quickest way is to paste your DataRobot API key into your code as a string. The more secure way is to read the key from a Zepl secret store data source.

Python
%python
import datarobot as dr

# Enter your API key here, or use the more secure secret store method below
dr.Client(token='addyourAPIkey', endpoint='https://app.datarobot.com/api/v2')

# Uncomment to read your API key securely from the Zepl secret store. See https://new-docs.zepl.com/docs/connect-to-data/secret-store
# token = z.getDatasource("datarobot_api")['token']
# dr.Client(token=token, endpoint='https://app.datarobot.com/api/v2')
R
%r
# Load the DataRobot library
library(datarobot)

# Enter your API key here, or use the more secure secret store method below
datarobot::ConnectToDataRobot(token = 'addyourAPIkey', endpoint = 'https://app.datarobot.com/api/v2')

# Uncomment to read your API key securely from the Zepl secret store. See https://new-docs.zepl.com/docs/connect-to-data/secret-store
# token <- z.getDatasource("datarobot_api")[["token"]]
# datarobot::ConnectToDataRobot(token = token, endpoint = 'https://app.datarobot.com/api/v2')
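If you want one Python cell that works with or without an attached secret store, you can wrap the token lookup in a small function. The following is a minimal sketch. It assumes, as the commented-out code above does, that z.getDatasource("datarobot_api") returns a mapping with a 'token' key. The resolve_api_token helper and its environment-variable fallback (including the variable name DATAROBOT_API_TOKEN) are assumptions of this sketch, not documented Zepl behavior.

```python
import os


def resolve_api_token(get_datasource=None, datasource_name="datarobot_api",
                      env_var="DATAROBOT_API_TOKEN"):
    """Return a DataRobot API token without hardcoding it in the notebook.

    1. If get_datasource is given (for example Zepl's z.getDatasource), read the
       'token' field of the named data source (mapping shape assumed).
    2. Otherwise, or if that yields nothing, fall back to an environment
       variable whose name is an assumption; use whatever your org uses.
    """
    if get_datasource is not None:
        try:
            token = get_datasource(datasource_name)["token"]
        except (LookupError, TypeError):
            # Missing 'token' key, or the data source returned something
            # that cannot be indexed (such as None)
            token = None
        if token:
            return token
    token = os.environ.get(env_var)
    if token:
        return token
    raise RuntimeError(
        f"No API token found in data source {datasource_name!r} "
        f"or environment variable {env_var}"
    )
```

In the notebook you would pass the result to the client, for example dr.Client(token=resolve_api_token(z.getDatasource), endpoint='https://app.datarobot.com/api/v2'). With this approach the key never appears in the notebook source.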

Creating a Project

For classification, regression, and multiclass classification projects, starting a project (and modeling) takes a single call: the datarobot.Project.start method in Python, or StartProject in R.

If you open a new window and log in to your DataRobot account, you can watch the project start up. This might take some time, depending on the number of workers available.

Python
%python
# You can point sourcedata at a file path or URL, or pass a pandas DataFrame instead
url_to_data = "https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes.csv"

# Start the project using all available workers (worker_count=-1)
project = dr.Project.start(sourcedata=url_to_data,
                           project_name='00_Zepl_Starter_NB_Python',
                           target='readmitted',
                           worker_count=-1)

# Block the Python kernel until DataRobot has finished modeling, before running the next commands
project.wait_for_autopilot()
R
%r
# You can point dataSource at a file path or URL, or pass a data frame instead
url_to_data <- "https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes.csv"

# Start the project using all available workers (workerCount = -1)
project <- StartProject(dataSource = url_to_data,
                        projectName = '00_Zepl_Starter_NB_R',
                        target = 'readmitted',
                        workerCount = -1)

# Block the R kernel until DataRobot has finished modeling, before running the next commands
WaitForAutopilot(project = project)
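project.wait_for_autopilot() and WaitForAutopilot() block until modeling finishes. If you would rather poll a status check of your own with an explicit deadline, the usual pattern is a bounded polling loop, shown in the Python sketch below. It is not the DataRobot client's implementation. wait_until and the is_done callable are hypothetical, and sleep and clock are injectable so the loop can be tested without actually waiting.

```python
import time


def wait_until(is_done, timeout_s=3600.0, poll_interval_s=30.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll is_done() until it returns True or timeout_s seconds elapse.

    Returns True if the condition was met, False on timeout. The last sleep
    is shortened so the loop never overshoots the deadline.
    """
    deadline = clock() + timeout_s
    while True:
        if is_done():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(poll_interval_s, remaining))
```

is_done would wrap whatever status check your client version provides. That wiring is left out because it depends on the client version.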

Model deployment

To deploy a model, use the Deployment.create_from_learning_model method (CreateDeployment in R). You also need a prediction server to host the deployment. You can list the available prediction servers with the PredictionServer.list method (ListPredictionServers in R).

Python
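%python
# Pick the model to deploy. This assumes get_models() returns models in
# Leaderboard order (best validation score first); check the order in your project.
most_accurate_model = project.get_models()[0]

# Use the first available prediction server
prediction_server = dr.PredictionServer.list()[0]

# Create a deployment
deployment = dr.Deployment.create_from_learning_model(
    most_accurate_model.id,
    label='New Deployment',
    description='A new deployment',
    default_prediction_server_id=prediction_server.id)

# Verify the deployment was created successfully
deployment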
R
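%r
# Pick the model to deploy. This assumes ListModels() returns models in
# Leaderboard order (best validation score first); check the order in your project.
most_accurate_model <- ListModels(project)[[1]]

# Use the first available prediction server
prediction_server <- ListPredictionServers()[[1]]

# Create a deployment
deployment <- CreateDeployment(model = most_accurate_model,
                               label = 'New Deployment (R)',
                               description = 'A new deployment',
                               defaultPredictionServerId = prediction_server$id)

# Verify the deployment was created successfully
deployment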
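Which model counts as "most accurate" depends on the direction of the project metric. Some metrics are minimized, such as LogLoss, and others are maximized, such as AUC. The following is a minimal Python sketch of that choice over plain dictionaries. The 'id' and 'validation_score' keys are hypothetical stand-ins for whatever Leaderboard fields you retrieve from the client, and pick_most_accurate is not a DataRobot function.

```python
from operator import itemgetter


def pick_most_accurate(models, higher_is_better):
    """Return the entry with the best validation score.

    models is a list of dicts with hypothetical keys 'id' and
    'validation_score'. Entries whose score is None (for example, models
    that were never validated) are skipped.
    """
    scored = [m for m in models if m.get("validation_score") is not None]
    if not scored:
        raise ValueError("no model has a validation score")
    by_score = itemgetter("validation_score")
    return max(scored, key=by_score) if higher_is_better else min(scored, key=by_score)
```

With a minimized metric such as LogLoss you would call pick_most_accurate(models, higher_is_better=False). With AUC you would pass True.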

Model scoring

Now that the model is deployed, let's score data with DataRobot’s Batch Prediction API. This is only one of several ways to score data.

Python
%python
import pandas as pd
import datarobot as dr

# Read the first 100 rows of the scoring dataset into a DataFrame
scoring = pd.read_csv('https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes_scoring.csv', nrows=100)

# Write the DataFrame to a CSV file on the Zepl container
scoring.to_csv('scoring.csv', index=False)

# Score the file and write the results to a new file named predicted.csv
dr.BatchPredictionJob.score_to_file(
    deployment.id,
    'scoring.csv',
    './predicted.csv')
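After scoring, you may want each prediction next to the input row it came from. The following is a minimal, standard-library-only sketch. It assumes that predicted.csv contains exactly one row per row of scoring.csv, in the same order. The sketch checks the row counts but cannot verify the order, so confirm this for your deployment before relying on it. attach_predictions and the pred_ prefix are hypothetical.

```python
import csv
import io


def attach_predictions(input_file, predictions_file, prefix="pred_"):
    """Append each prediction row's columns to the matching input row.

    ASSUMPTION: predictions_file has exactly one row per input row, in the
    same order. Row counts are checked; the order is not.
    Both arguments are text-mode file objects (open(..., newline="") or
    io.StringIO). The prefix keeps prediction columns from overwriting inputs.
    """
    inputs = list(csv.DictReader(input_file))
    predictions = list(csv.DictReader(predictions_file))
    if len(inputs) != len(predictions):
        raise ValueError(
            f"row count mismatch: {len(inputs)} inputs vs {len(predictions)} predictions"
        )
    merged = []
    for row, pred in zip(inputs, predictions):
        combined = dict(row)
        combined.update({prefix + name: value for name, value in pred.items()})
        merged.append(combined)
    return merged


if __name__ == "__main__":
    # Tiny in-memory demo; the column names are made up for illustration
    rows = attach_predictions(io.StringIO("age,race\n50,A\n61,B\n"),
                              io.StringIO("prediction\n0.7\n0.2\n"))
    print(rows[0])
```

In the notebook you would open scoring.csv and predicted.csv with newline='' and pass both file objects to attach_predictions. With pandas, the same positional join is pd.concat([scoring, predictions], axis=1), under the same ordering assumption.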

Additional Documentation: