Upload Files

This section describes how to upload data files from your local filesystem, load those files into a container, and delete them when you are done.

Uploading Files

If you have data files on your local machine that you want to analyze with Zepl, you can upload a file by opening the right-hand menu bar in your notebook and choosing the Upload file button. Uploaded files are accessible only from the notebook into which they were uploaded; to use the same file in a different notebook, upload it to that notebook separately.

Common file formats uploaded include:

  • .CSV: Load small sample data sets

  • .PARQUET: Load columnar sample data files

  • .PKL: Bring pre-trained (pickled) models into your Zepl notebooks

Zepl supports files up to 25MB in size, and the files in each notebook may not exceed 100MB in total.

Accessing Data

Once a file is uploaded to the notebook, you can access it through the following URL (where <file-name> is the name of the file):

http://zdata/<file-name>

Direct Access

Use the examples below to load your data directly from the filesystem endpoint (URL) where the data is located. This method loads the data directly into a data object in the language of your choice.

Python
%python

import pandas as pd
pandas_df = pd.read_csv('http://zdata/titanic3.csv', sep=';', header='infer')
Scala
%spark

import org.apache.spark.SparkFiles

sc.addFile("http://zdata/titanic3.csv")
val sparkDF = spark.read.format("csv")
  .option("delimiter", ";")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(SparkFiles.get("titanic3.csv"))
PySpark
%spark.pyspark

from pyspark import SparkFiles

sc.addFile('http://zdata/titanic3.csv')
sparkDF = spark.read.format('csv') \
    .options(delimiter=';', header='true', inferSchema='true') \
    .load(SparkFiles.get('titanic3.csv'))
R
%r
table <- read.table("http://zdata/titanic3.csv", header = TRUE, sep = ";", dec = ".")
SparkR
%spark.r

spark.addFile("http://zdata/titanic3.csv")
sparkDF <- read.df(path = spark.getSparkFiles("titanic3.csv"), source = "csv", delimiter = ";", header = "true", inferSchema = "true")
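The same pattern applies to uploaded .pkl files. Below is a minimal sketch of the pickle save/load round trip; the file name model.pkl and the toy object are hypothetical, and in a Zepl notebook you would first fetch the uploaded file from the zdata endpoint (e.g. with urllib) rather than writing it locally:

```python
import pickle

# In a Zepl notebook, an uploaded pickle would first be fetched from
# the zdata endpoint (hypothetical file name):
#   import urllib.request
#   urllib.request.urlretrieve('http://zdata/model.pkl', 'model.pkl')

# Self-contained stand-in: pickle a small object and reload it, the
# same round trip a pre-trained model file goes through.
model = {'weights': [0.1, 0.2, 0.3], 'bias': 0.5}
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

with open('model.pkl', 'rb') as f:
    restored = pickle.load(f)

print(restored['bias'])  # the reloaded object matches the original
```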

Loading Files to the Container Filesystem

Zepl also allows users to download data files directly to the container's filesystem. This method is often used when the original file needs to be modified during notebook execution.

1. Load the file

%python
!wget http://zdata/<file-name>

2. (Optional) List the downloaded file

%python
!ls <file-name>

3. Load the data into a data object in the language of your choice

Python
%python

import pandas as pd
pandas_df = pd.read_csv('titanic3.csv', sep=';', header='infer')
Scala
%spark

val sparkDF = spark.read.format("csv")
  .option("delimiter", ";")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("titanic3.csv")
PySpark
%spark.pyspark

sparkDF = spark.read.format('csv') \
    .options(delimiter=';', header='true', inferSchema='true') \
    .load('titanic3.csv')
R
%r

table <- read.table("titanic3.csv", header = TRUE, sep = ";", dec = ".")
SparkR
%spark.r

sparkDF <- read.df(path = "titanic3.csv", source = "csv", delimiter = ";", header = "true", inferSchema = "true")
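The main reason to load a file onto the container filesystem is that it can then be modified in place. Below is a minimal pandas sketch of that workflow; the file name sample.csv and its contents are stand-ins created locally so the example is self-contained (in a notebook, the file would come from the `!wget` step above):

```python
import pandas as pd

# Stand-in for a file fetched with `!wget http://zdata/<file-name>`:
# write a small semicolon-delimited CSV to the container filesystem.
pd.DataFrame({'name': ['a', 'b'], 'fare': [7.25, 71.28]}).to_csv(
    'sample.csv', sep=';', index=False)

# Load, modify, and write the file back in place.
df = pd.read_csv('sample.csv', sep=';')
df['fare_doubled'] = df['fare'] * 2
df.to_csv('sample.csv', sep=';', index=False)

print(pd.read_csv('sample.csv', sep=';').columns.tolist())
# → ['name', 'fare', 'fare_doubled']
```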

Editing Data

You cannot edit data directly within Zepl, but you can overwrite the data file by uploading a file with the same name.

Overwritten data cannot be recovered.

Deleting Data

To delete data, click the red "x" button next to the data file in the Files tab in your notebook.

Deleted data cannot be recovered.