DataRobot supports several client libraries that make it easy to connect your Python and R code to your DataRobot (DR) account. To connect Zepl with DataRobot, check that the library you need is installed and bring your DR API key to Zepl.
What libraries do I need?
Place this code at the beginning of your notebook and ensure it runs every time your container starts up.
%python
!pip install datarobot
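To confirm that the library is actually available before the rest of the notebook runs, a small check like the one below can help. This is a minimal sketch using only the Python standard library; the helper name `is_installed` is made up for illustration and is not part of Zepl or DataRobot.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be imported in this environment."""
    # find_spec returns None when no importable module with that name exists.
    return importlib.util.find_spec(package_name) is not None


# Warn early instead of failing later on `import datarobot`.
if not is_installed("datarobot"):
    print("datarobot is not installed; run `!pip install datarobot` first")
```

In a Zepl notebook this would go in a `%python` paragraph right after the install step.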
Follow these steps to access your API key in your DataRobot account: https://community.datarobot.com/t5/resources/where-can-i-find-my-api-key/ta-p/4648
There are two ways to bring your DataRobot API key to Zepl:
Paste your DataRobot API key in your notebook code
(Recommended) Create and attach a secret store data source
Your secret store will look something like this:
To use Zepl with DataRobot, you first need to establish a connection between your machine and the DataRobot instance. The fastest way to do that is by pasting your DataRobot API key as a string in your code. If you want to do this in a more secure way, use Zepl's secret store data source.
%python
import datarobot as dr

# Enter your API Key here or use the secure secret store method below
dr.Client(token='addyourAPIkey', endpoint='https://app.datarobot.com/api/v2')

# Uncomment to use the secret store to securely access your API key in Zepl.
# Follow the documentation here: https://new-docs.zepl.com/docs/connect-to-data/secret-store
# token = z.getDatasource("datarobot_api")['token']
# dr.Client(token=token, endpoint='https://app.datarobot.com/api/v2')
%r
# import library
library(datarobot)

# Enter your API Key here or use the secure secret store method below
datarobot::ConnectToDataRobot(token = 'addyourAPIkey', endpoint = 'https://app.datarobot.com/api/v2')

# Uncomment to use the secret store to securely access your API key in Zepl.
# Follow the documentation here: https://new-docs.zepl.com/docs/connect-to-data/secret-store
# token <- z.getDatasource("datarobot_api")[["token"]]
# datarobot::ConnectToDataRobot(token = token, endpoint = 'https://app.datarobot.com/api/v2')
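A common mistake with the paste-in approach is running the cell with the placeholder string still in place. The sketch below guards against that. The helper `resolve_token` and the environment variable name `DATAROBOT_API_TOKEN` are assumptions made for this example; use whatever name you actually set, or the secret store lookup shown above.

```python
import os

PLACEHOLDER = "addyourAPIkey"  # the placeholder string used in the snippets above


def resolve_token(pasted_token=PLACEHOLDER, environ=None):
    """Pick an API key from the environment or a pasted value, rejecting the placeholder."""
    environ = os.environ if environ is None else environ
    # Assumption: the key may have been exported under this variable name.
    token = environ.get("DATAROBOT_API_TOKEN") or pasted_token
    if not token or token == PLACEHOLDER:
        raise ValueError(
            "No DataRobot API key found: paste your key or set DATAROBOT_API_TOKEN"
        )
    return token


# Usage (requires the datarobot package and a real key):
# import datarobot as dr
# dr.Client(token=resolve_token('your-real-key'),
#           endpoint='https://app.datarobot.com/api/v2')
```

Failing with a clear message here is easier to diagnose than an authentication error raised later by the first API call.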
For Classification, Regression and Multiclass Classification, the process of starting a project (and modeling) is very straightforward. All you have to do is use the Project.start method (StartProject in R).
If you open a new window and log in to your DataRobot account, you can watch the project start up. This might take some time, depending on the number of workers available.
%python
# I can link directly to my data (file, url) or I can also pass a pandas dataframe to the sourcedata variable
url_to_data = "https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes.csv"

# Start project with all available workers (worker_count = -1)
project = dr.Project.start(sourcedata=url_to_data,
                           project_name='00_Zepl_Starter_NB_Python',
                           target='readmitted',
                           worker_count=-1)

# Force our Python Kernel to wait until DataRobot has finished modeling before executing the next series of commands.
project.wait_for_autopilot()
%r
# I can link directly to my data (file, url) or I can also pass a dataframe to the dataSource variable
url_to_data <- "https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes.csv"

# Start project with all available workers (workerCount = -1)
project <- StartProject(dataSource = url_to_data,
                        projectName = '00_Zepl_Starter_NB_R',
                        target = 'readmitted',
                        workerCount = -1)

# Force our R Kernel to wait until DataRobot has finished modeling before executing the next series of commands.
WaitForAutopilot(project = project)
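The deployment step further down uses a variable named `most_accurate_model`, which this section never defines. One possible way to obtain it once Autopilot has finished is to compare the models' scores on the project metric. The sketch below is an illustration under assumptions, not the guide's own method: `pick_best_model` is a hypothetical helper, and it assumes each model exposes a `metrics` dictionary shaped like `{'LogLoss': {'validation': 0.58}}`. Stand-in objects are used so the logic can be read and run without a DataRobot connection.

```python
from types import SimpleNamespace


def pick_best_model(models, metric, partition="validation", lower_is_better=True):
    """Return the model with the best score for `metric` on `partition`."""
    # Skip models that have no score for this metric/partition.
    scored = [
        m for m in models
        if (m.metrics.get(metric) or {}).get(partition) is not None
    ]
    if not scored:
        raise ValueError(f"No model has a {partition} score for {metric}")

    def score(m):
        return m.metrics[metric][partition]

    return min(scored, key=score) if lower_is_better else max(scored, key=score)


# Stand-in objects that mimic the assumed shape of a model's `metrics` attribute.
demo_models = [
    SimpleNamespace(id="a", metrics={"LogLoss": {"validation": 0.62}}),
    SimpleNamespace(id="b", metrics={"LogLoss": {"validation": 0.58}}),
    SimpleNamespace(id="c", metrics={"LogLoss": {"validation": None}}),
]
best = pick_best_model(demo_models, metric="LogLoss")
print(best.id)  # b

# With a real project (assumption: error metrics such as LogLoss are lower-is-better):
# most_accurate_model = pick_best_model(project.get_models(), metric=project.metric)
```

For metrics where a higher value is better, such as AUC, pass `lower_is_better=False`.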
If you wish to deploy a model, all you have to do is use the Deployment.create_from_learning_model method. You also need a prediction server to host the deployment. Available prediction servers can be retrieved using the PredictionServer.list method (ListPredictionServers in R).
%python
# Get the list of prediction servers and use the first one
prediction_server = dr.PredictionServer.list()[0]

# Create a deployment
# (most_accurate_model must already hold the model you want to deploy)
deployment = dr.Deployment.create_from_learning_model(
    most_accurate_model.id,
    label='New Deployment',
    description='A new deployment',
    default_prediction_server_id=prediction_server.id)

# Verify deployment was created successfully
deployment
%r
# Get the list of prediction servers and use the first one
prediction_server <- ListPredictionServers()[[1]]

# Create a deployment
# (most_accurate_model must already hold the model you want to deploy)
deployment <- CreateDeployment(model = most_accurate_model,
                               label = 'New Deployment (R)',
                               description = 'A new deployment',
                               defaultPredictionServerId = prediction_server$id)

# Verify deployment was created successfully
deployment
Now that we have deployed the model, let's score using DataRobot's Batch Prediction API. Note that there are multiple ways to score data and this is just one of them.
%python
import pandas as pd

# Create dataframe
scoring = pd.read_csv('https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes_scoring.csv', nrows=100)

# Write dataframe to CSV on the Zepl container
scoring.to_csv('scoring.csv', index=False)

# Score the data and write the results to a new file named predicted.csv
dr.BatchPredictionJob.score_to_file(deployment.id, 'scoring.csv', './predicted.csv')
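Once scoring has finished, it can be useful to look at the first few rows of the output file. The sketch below uses only the standard library and makes no assumption about the column names DataRobot writes; `preview_csv` is a hypothetical helper name chosen for this example.

```python
import csv
import itertools
import os


def preview_csv(path, n_rows=5):
    """Return the header and the first `n_rows` data rows of a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows
        rows = list(itertools.islice(reader, n_rows))
    return header, rows


# Only preview if the scoring step has already produced the file.
if os.path.exists("predicted.csv"):
    header, rows = preview_csv("predicted.csv")
    print(header)
    for row in rows:
        print(row)
```

Since pandas is already imported in the scoring snippet, `pd.read_csv('predicted.csv').head()` would serve the same purpose.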