Data Visualization

Overview

Zepl offers a UI-driven, language-agnostic visualization engine that is invoked with a single line of code: z.show(DataFrame). This reduces the amount of code users need to write to create charts and graphs.

Supported Charting Types:

  • Table

  • Bar

  • Line

  • Pie

  • Area

  • Scatter

  • Heat Map

  • Radar

  • Sankey

  • Plotly Editor: Additional chart types are available through our Plotly editor

Requirements

Calling Built in Plotting Function

  1. Create a tabular object in Python (Pandas DataFrame), Scala (Spark DataFrame), R (list object), or SQL

  2. Use the z.show(df) function to render the visualization options

Visualization Example:

Python
%python
import pandas as pd
​
df = pd.read_csv("https://s3-datasource-tutorial.s3.amazonaws.com/titanic3.csv")
​
z.show(df)
Spark
%spark
​
val df = spark.read.option("header",true).csv("./titanic3.csv")
​
z.show(df)
R
%r
data <- read.csv("./titanic3.csv")
​
z.show(data)
SQL
%datasource.DATA_SOURCE_NAME
​
SELECT * FROM sample_data_table

Setting Plot Limits

By default, Zepl's UI Editor visualizes the first 1000 data points passed to the z.show() function. For example, if your Pandas DataFrame (df) contains 2000 rows, only the first 1000 rows will be displayed in the visualization editor. This setting can be increased by any user with the Organization Owner Security Policy.
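
If a larger DataFrame needs to be previewed without raising the limit, one option is to reduce it before calling z.show(). The sketch below is illustrative only: limit_rows is a hypothetical helper (it is not part of Zepl), and the 1000-row cap simply mirrors the default described above. It takes a reproducible random sample so that the chart reflects the whole dataset rather than only its first rows.

```python
import pandas as pd


def limit_rows(df, max_rows=1000, seed=0):
    """Hypothetical helper: return df as-is if it fits, else a random sample."""
    if len(df) <= max_rows:
        return df
    # A fixed seed keeps the sample stable between paragraph runs;
    # sort_index() restores the original row order.
    return df.sample(n=max_rows, random_state=seed).sort_index()


# Stand-in data: 2000 rows, twice the default display limit.
df = pd.DataFrame({"x": range(2000), "y": [i * 2 for i in range(2000)]})

preview = limit_rows(df)
print(len(preview))

# In a Zepl paragraph, the reduced frame would then be rendered with:
# z.show(preview)
```

Sampling is only one choice; df.head(1000) or an aggregation may suit other datasets better.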

Increase limit

Increasing this setting may cause notebooks to load more slowly or crash at load time. We recommend not raising this value above 5000.

If you are experiencing slow notebook load times, please contact [email protected]​

  1. Navigate to Resources > Interpreters

  2. Select the interpreter. This limit value is set per interpreter. For example, if the visualizations requiring additional data points are generated using Python, then select %python

  3. In the text field labeled "Max number of dataframe rows to display", enter the new limit

  4. Select Apply

Plotly Charting Editor

How to access the Plotly Editor

  1. Run a notebook paragraph with this function z.show(df)

  2. Select the last symbol in the charting list called "Plotly Chart"

  3. Select the "Plotly Chart Editor" button ​

Create a Chart

  1. Select the "+ Trace" button and choose a trace "Type" from the list of charting options.

  2. Select the input values for your charting type. This will be the columns from the DataFrame that was passed into the z.show(df) function.

Create a Transformation

Transformations allow users to Filter, Split, Aggregate, and Sort their data.

  1. Select "+ Transform" and choose your transformation

  2. Select the "Target" or "By" value. This variable will be used to transform the dataset in your graph.

  3. Multiple transformations (for example, Filter and Split) can be applied to the same dataset.

For large datasets, Zepl recommends transforming the data before passing it to z.show(df). This results in better performance and a smoother visualization experience.
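
As a minimal sketch of transforming data in code first, the pandas example below filters, aggregates, and sorts a small stand-in DataFrame so that only a compact summary would be handed to z.show(). The column names (pclass, survived, fare) and the values are assumptions made up for illustration, and the named-aggregation syntax assumes pandas 0.25 or later.

```python
import pandas as pd

# Small stand-in for a large dataset; names and values are illustrative only.
df = pd.DataFrame({
    "pclass":   [1, 1, 2, 3, 3, 3, 3],
    "survived": [1, 0, 1, 0, 0, 1, 0],
    "fare":     [80.0, 60.0, 20.0, 8.0, 7.5, 9.0, 0.0],
})

summary = (
    df[df["fare"] > 0]                      # Filter: drop rows with no fare
    .groupby("pclass")                      # Split: one group per class
    .agg(                                   # Aggregate: one row per group
        passengers=("survived", "size"),
        survival_rate=("survived", "mean"),
        mean_fare=("fare", "mean"),
    )
    .reset_index()
    .sort_values("survival_rate", ascending=False)  # Sort
)
print(summary)

# In a Zepl paragraph, only the summary would be rendered:
# z.show(summary)
```

The summary has one row per group, so the visualization editor receives a few rows instead of the full dataset.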

Style Options

General:

  1. Defaults: Background color, color scales, and fonts

  2. Title: Set your titles and font type here

  3. Modebar

  4. Size and Margins: Relative Size and Graph position

  5. Interactions: Drag, Click, and Hover drill down behaviors

  6. Meta Text: Pass data to your graph's titles and text outputs

Traces: The "Traces" section is specific to the type of graph selected above. Each trace provides options specific to its trace type (Bar, Line, etc.).

Axes:

  1. Titles: Set the title for each axis here. This may also be done by clicking the text along each axis on the interactive graph on the right, which reads "Click to enter X axis title"

  2. Range: Set the Range scale (Linear, Log, Date, Categorical, Multicategorical) and range of the axis. This is done automatically by Plotly when rendering the graph. Enable or disable Zoom.

  3. Lines: Show or hide the axis line, grid lines, or zero line

  4. Tick Labels: Position where axis labels appear and set font size/type

  5. Tick Markers: Enable or disable vertical markers for each axis label

  6. Range Slider: Enables horizontal zoom. Best used for time series charts

  7. Spike Lines: Show data spikes

Legend: The "Legend" section allows users to change the size, position, text font and color of the graph legend.

Annotate Options

Text: Add text based overlays on each graph

Shapes: Add colored shape based overlays on each graph

Images: Add images as an overlay on each graph

Zepl & Zeppelin Visualizations

Zepl also offers Zeppelin compatible visualizations (Table, Bar, Pie, Area, Line, and Scatter). Zepl has also expanded the capabilities from the Zeppelin notebook visualizations to support additional chart types, such as Heatmap, Radar, Sankey, and Plotly (see above).

Each Zepl and Zeppelin visualization contains separate settings that let the user drag, drop, and select the appropriate values for the desired chart. Each chart's settings contain two sections, Charts and Parameters, each with chart-specific options. To access these settings, first select a chart (for example, Bar Chart), then select the Settings button.

Creating Charts

The Chart section allows users to drag and drop column names from the DataFrame that was passed into the z.show(DataFrame) function. When you expand this section, you will see a list of column names and chart-specific options.

Below is an example specific to the Bar Chart chart type:

  • Chart specific options

  • Available Columns: Drag and drop values into the boxes below

  • X-Axis:

    • Specifies variable for x-axis

    • This can have 1 or many values

  • Y-Axis: Select this value to aggregate on a SUM or a COUNT of this column value

    • Specifies variable for y-axis

    • This can have 1 or many values

  • Category (split by): Split the Y-Axis value by this variable
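
As a rough mental model only (a pandas sketch, not Zepl's implementation), the Bar Chart settings above resemble a pivot: the X-Axis column becomes the index, the Category column becomes the series, and the Y-Axis column is aggregated with SUM or COUNT. The DataFrame and its column names (day, channel, sales) are assumptions invented for this illustration.

```python
import pandas as pd

# Illustrative data; column names are hypothetical.
df = pd.DataFrame({
    "day":     ["Mon", "Mon", "Tue", "Tue", "Tue"],
    "channel": ["web", "store", "web", "web", "store"],
    "sales":   [10, 5, 7, 3, 8],
})

# X-Axis = day, Category (split by) = channel, Y-Axis = SUM of sales
bar_sum = df.pivot_table(
    index="day", columns="channel", values="sales",
    aggfunc="sum", fill_value=0,
)

# Same layout, but with the Y-Axis aggregated as a COUNT of rows
bar_count = df.pivot_table(
    index="day", columns="channel", values="sales",
    aggfunc="count", fill_value=0,
)

print(bar_sum)
print(bar_count)
```

Each row of the result corresponds to one position on the x-axis, and each column to one bar series within that position.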

Selecting Parameters

The Parameters section contains individual chart parameters. Traditionally these values are configured in code; this section lets the user adjust them through the user interface instead. The available options vary by chart type. The example below continues with the Bar Chart selected above.

Frequently used parameters:

  • mainTitle: Set the title of this chart

  • xAxisName and yAxisName: Set label for x and y axis

  • xAxisUnit and yAxisUnit: Set a text based label to appear next to x and y axis values

  • colorSet: Set graph color options

  • enableTooltip: Display or hide interactive tooltip
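
For comparison, the sketch below shows how similar settings are typically written in code with matplotlib. The mapping in the comments is an approximate analogy rather than an exact equivalence with Zepl's parameters, and the data values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook; omit in Zepl
import matplotlib.pyplot as plt

# Illustrative values only.
categories = ["1st", "2nd", "3rd"]
counts = [3, 5, 2]

fig, ax = plt.subplots()
ax.bar(categories, counts, color="#69b3a2")  # roughly colorSet
ax.set_title("Passengers by class")          # roughly mainTitle
ax.set_xlabel("Class")                       # roughly xAxisName
ax.set_ylabel("Passengers (count)")          # roughly yAxisName with yAxisUnit

# In a notebook paragraph, display the figure with:
# plt.show()
```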

Common Visualization Libraries

Zepl supports the use of visualization libraries specific to the programming language of your choice. Each paragraph in the Zepl Notebook can render charts and graphs to the paragraph output. Below are several of the most commonly used visualization libraries for Python, Spark, and R.

All examples below can be found here: Open In Zepl​

Matplotlib

Matplotlib v3.1.1 is already installed in Zepl's General Purpose Image. Using matplotlib in the Zepl notebook is the same process as in any other notebook or Python environment. See the example below for using matplotlib:

Plotting Pandas DataFrame
%python
# Reference Documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
​
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()
Plotting Series
%python
# Reference Documentation: https://matplotlib.org/tutorials/introductory/pyplot.html
import matplotlib.pyplot as plt
​
plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
plt.show()

Seaborn

seaborn v0.9.0 is already installed in Zepl's General Purpose Image. Using seaborn in the Zepl notebook is the same process as in any other notebook or Python environment. See the example below for using seaborn:

Boxplot
%python
# Reference Documentation: https://seaborn.pydata.org/examples/grouped_boxplot.html
import seaborn as sns
# Themes are supported in later versions of seaborn (v0.11.0). If required stop your container and update seaborn `!pip install -U seaborn`
# sns.set_theme(style="ticks", palette="pastel")
​
# Load the example tips dataset
tips = sns.load_dataset("tips")
​
# Draw a nested boxplot to show bills by day and time
sns.boxplot(x="day", y="total_bill",
hue="smoker", palette=["m", "g"],
data=tips)
sns.despine(offset=10, trim=True)
Multiple Facets
%python
# Reference Documentation: https://seaborn.pydata.org/examples/many_facets.html
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
​
# sns.set_theme() requires seaborn v0.11.0 or later; sns.set() also works on the preinstalled v0.9.0
sns.set(style="ticks")
​
# Create a dataset with many short random walks
rs = np.random.RandomState(4)
pos = rs.randint(-1, 2, (20, 5)).cumsum(axis=1)
pos -= pos[:, 0, np.newaxis]
step = np.tile(range(5), 20)
walk = np.repeat(range(20), 5)
df = pd.DataFrame(np.c_[pos.flat, step, walk],
columns=["position", "step", "walk"])
​
# Initialize a grid of plots with an Axes for each walk
grid = sns.FacetGrid(df, col="walk", hue="walk", palette="tab20c",
col_wrap=4, height=1.5)
​
# Draw a horizontal line to show the starting point
grid.map(plt.axhline, y=0, ls=":", c=".5")
​
# Draw a line plot to show the trajectory of each random walk
grid.map(plt.plot, "step", "position", marker="o")
​
# Adjust the tick positions and labels
grid.set(xticks=np.arange(5), yticks=[-3, 3],
xlim=(-.5, 4.5), ylim=(-3.5, 3.5))
​
# Adjust the arrangement of the plots
grid.fig.tight_layout(w_pad=1)

Plotly

While Zepl's Plotly Charting Editor is great for building UI-driven visuals, there may be a need to programmatically generate Plotly charts, themes, and parameters. Plotly v4.2.1 is already installed in Zepl's General Purpose Image.

Example of using Plotly library and displaying figures to a paragraph output:

Plotly Example - Throws Error
%python
# Reference Documentation: https://plotly.com/python/creating-and-updating-figures/
import plotly.graph_objects as go
​
fig = go.Figure(
data=[go.Bar(x=[1, 2, 3], y=[1, 3, 2])],
layout=go.Layout(
title=go.layout.Title(text="A Figure Specified By A Graph Object")
)
)
​
fig.show()

ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

To resolve this error, add the following function to any notebook that needs to render plotly charts:

%python
import plotly.offline

# Create plot div
def plot(plot_def, **kwargs):
    kwargs['output_type'] = 'div'
    plot_str = plotly.offline.plot(plot_def, **kwargs)
    print('%%angular <div>%s</div>' % plot_str)

Final result would look like this:

Display Plotly Chart in Paragraph Output
%python
# Reference Documentation: https://plotly.com/python/creating-and-updating-figures/
import plotly.graph_objects as go
import plotly.offline

# Create plot div
def plot(plot_def, **kwargs):
    kwargs['output_type'] = 'div'
    plot_str = plotly.offline.plot(plot_def, **kwargs)
    print('%%angular <div>%s</div>' % plot_str)
​
fig = go.Figure(
data=[go.Bar(x=[1, 2, 3], y=[1, 3, 2])],
layout=go.Layout(
title=go.layout.Title(text="A Figure Specified By A Graph Object")
)
)
​
plot(fig)

ggplot2

ggplot2 v3.3.2 is already installed in Zepl's General Purpose Image. Using ggplot2 in the Zepl notebook is the same process as in any other R environment. See the example below for using ggplot2:

Scatter Plot Example
%r
# Reference Documentation: http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Jitter%20Plot
​
# load package and data
options(scipen=999) # turn-off scientific notation like 1e+48
library(ggplot2)
theme_set(theme_bw()) # pre-set the bw theme.
data("midwest", package = "ggplot2")
# midwest <- read.csv("http://goo.gl/G1K41K") # bkup data source
​
# Scatterplot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) +
geom_point(aes(col=state, size=popdensity)) +
geom_smooth(method="loess", se=F) +
xlim(c(0, 0.1)) +
ylim(c(0, 500000)) +
labs(subtitle="Area Vs Population",
y="Population",
x="Area",
title="Scatterplot",
caption = "Source: midwest")
​
plot(gg)

HTML / CSS / JavaScript

Zepl paragraphs can render any HTML, CSS, or JavaScript. This approach is commonly used to pass processed data into flexible front end charting libraries like Highcharts or D3.

Starter Code: To render HTML, CSS, and JavaScript from any language, use the following starter code:

Python
%python
print("""%html ... Your HTML / CSS / JS here ...""")
Scala
%spark
println("""%html ... Your HTML / CSS / JS here ...""")
​
PySpark
%pyspark
print("""%html ... Your HTML / CSS / JS here ...""")
R
%r
print('%html ... Your HTML / CSS / JS here ... ')
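
To illustrate passing processed data into the HTML output, the sketch below builds a '%html' payload containing a simple inline SVG bar chart from a Python dictionary. html_bar_chart is a hypothetical helper written for this example (it is not a Zepl function), it assumes non-negative numeric values, and it uses only the Python standard library.

```python
from html import escape


def html_bar_chart(values, title, width=300, bar_height=18):
    """Hypothetical helper: return a '%html' string with an inline SVG bar chart."""
    if not values:
        raise ValueError("values must contain at least one entry")
    peak = max(values.values()) or 1  # avoid dividing by zero when all values are 0
    gap = 4
    parts = []
    for i, (label, value) in enumerate(values.items()):
        y = i * (bar_height + gap)
        bar_width = round(width * value / peak)  # scale bars relative to the largest value
        parts.append(
            f'<rect x="0" y="{y}" width="{bar_width}" height="{bar_height}" fill="#69b3a2"></rect>'
            f'<text x="{bar_width + gap}" y="{y + bar_height - gap}">{escape(str(label))}: {value}</text>'
        )
    svg_height = len(values) * (bar_height + gap)
    return (
        f'%html <h4>{escape(title)}</h4>'
        f'<svg width="{width + 150}" height="{svg_height}">{"".join(parts)}</svg>'
    )


# Labels are escaped so that data containing markup cannot break the page.
payload = html_bar_chart({"first": 3, "second": 6, "<third>": 0}, "Rows per group")
print(payload)
```

Printing the payload from a %python paragraph follows the same starter pattern shown above: the output begins with %html, followed by the generated markup.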

Extended examples:

Highcharts
%python
# Part of the code is from
# http://www.highcharts.com/maps/demo/map-drilldown
print('''%html
<div id="highchart_container" style="height: 500px; min-width: 310px; max-width: 800px; margin: 0 auto"></div>
<h6><a href="https://shop.highsoft.com/highmaps" target="_blank">highmap</a></h6>
<script>
function myChartRendering() {
/*
TODO:
- Check data labels after drilling. Label rank? New positions?
- Not US Mainland text
- Separators
*/
var data = Highcharts.geojson(Highcharts.maps['countries/us/us-all']),
// Some responsiveness
small = $('#highchart_container').width() < 400;
// Set drilldown pointers
$.each(data, function (i) {
this.drilldown = this.properties['hc-key'];
this.value = i; // Non-random bogus data
});
// Instantiate the map
Highcharts.mapChart('highchart_container', {
chart: {
events: {
drilldown: function (e) {
if (!e.seriesOptions) {
var chart = this,
mapKey = 'countries/us/' + e.point.drilldown + '-all',
// Handle error, the timeout is cleared on success
fail = setTimeout(function () {
if (!Highcharts.maps[mapKey]) {
chart.showLoading('<i class="icon-frown"></i> Failed loading ' + e.point.name);
fail = setTimeout(function () {
chart.hideLoading();
}, 1000);
}
}, 3000);
// Show the spinner
chart.showLoading('<i class="icon-spinner icon-spin icon-3x"></i>'); // Font Awesome spinner
// Load the drilldown map
$.getScript('https://code.highcharts.com/mapdata/' + mapKey + '.js', function () {
data = Highcharts.geojson(Highcharts.maps[mapKey]);
// Set a non-random bogus value
$.each(data, function (i) {
this.value = i;
});
// Hide loading and add series
chart.hideLoading();
clearTimeout(fail);
chart.addSeriesAsDrilldown(e.point, {
name: e.point.name,
data: data,
dataLabels: {
enabled: true,
format: '{point.name}'
}
});
});
}
this.setTitle(null, { text: e.point.name });
},
drillup: function () {
this.setTitle(null, { text: 'USA' });
}
}
},
title: {
text: 'Highcharts Map Drilldown'
},
subtitle: {
text: 'USA',
floating: true,
align: 'right',
y: 50,
style: {
fontSize: '16px'
}
},
legend: small ? {} : {
layout: 'vertical',
align: 'right',
verticalAlign: 'middle'
},
colorAxis: {
min: 0,
minColor: '#E6E7E8',
maxColor: '#005645'
},
mapNavigation: {
enabled: true,
buttonOptions: {
verticalAlign: 'bottom'
}
},
plotOptions: {
map: {
states: {
hover: {
color: '#EEDD66'
}
}
}
},
series: [{
data: data,
name: 'USA',
dataLabels: {
enabled: true,
format: '{point.properties.postal-code}'
}
}],
drilldown: {
activeDataLabelStyle: {
color: '#FFFFFF',
textDecoration: 'none',
textOutline: '1px #000000'
},
drillUpButton: {
relativeTo: 'spacingBox',
position: {
x: 0,
y: 60
}
}
}
});
}
jQuery.getScript('//code.jquery.com/jquery-3.1.1.min.js',
function() {
$("#highchart_container").text("loading... jquery");
jQuery.getScript('//code.highcharts.com/maps/highmaps.js',
function() {
$("#highchart_container").text("loading... Highmaps");
jQuery.getScript('//code.highcharts.com/maps/modules/data.js',
function() {
$("#highchart_container").text("loading... Highmaps modules");
jQuery.getScript('//code.highcharts.com/maps/modules/drilldown.js',
function() {
$("#highchart_container").text("loading... drilldown");
jQuery.getScript('//code.highcharts.com/mapdata/countries/us/us-all.js',
function() {
$("#highchart_container").text("loading... us map data...");
jQuery("#highchart_container").ready,
jQuery.Deferred(function( deferred ){
$("#highchart_container").text("loading... ready...");
jQuery( deferred.resolve );
if (!!Highcharts) {
myChartRendering();
}
})
})
})
})
})
});
</script>
''')
D3
%python
# Reference Documentation: https://www.d3-graph-gallery.com/graph/ridgeline_basic.html
print("""%html
​
<!-- Code from d3-graph-gallery.com -->
<meta charset="utf-8">
​
<!-- Load d3.js -->
<script src="https://d3js.org/d3.v4.js"></script>
​
<!-- Create a div where the graph will take place -->
<div id="my_dataviz"></div>
​
<script>
​
// set the dimensions and margins of the graph
var margin = {top: 60, right: 30, bottom: 20, left:110},
width = 460 - margin.left - margin.right,
height = 400 - margin.top - margin.bottom;
​
// append the svg object to the body of the page
var svg = d3.select("#my_dataviz")
.append("svg")
.attr("width", width + margin.left + margin.right)
.attr("height", height + margin.top + margin.bottom)
.append("g")
.attr("transform",
"translate(" + margin.left + "," + margin.top + ")");
​
//read data
d3.csv("https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv", function(data) {
​
// Get the different categories and count them
var categories = data.columns
var n = categories.length
​
// Add X axis
var x = d3.scaleLinear()
.domain([-10, 140])
.range([ 0, width ]);
svg.append("g")
.attr("transform", "translate(0," + height + ")")
.call(d3.axisBottom(x));
​
// Create a Y scale for densities
var y = d3.scaleLinear()
.domain([0, 0.4])
.range([ height, 0]);
​
// Create the Y axis for names
var yName = d3.scaleBand()
.domain(categories)
.range([0, height])
.paddingInner(1)
svg.append("g")
.call(d3.axisLeft(yName));
​
// Compute kernel density estimation for each column:
var kde = kernelDensityEstimator(kernelEpanechnikov(7), x.ticks(40)) // increase this 40 for more accurate density.
var allDensity = []
for (i = 0; i < n; i++) {
key = categories[i]
density = kde( data.map(function(d){ return d[key]; }) )
allDensity.push({key: key, density: density})
}
​
// Add areas
svg.selectAll("areas")
.data(allDensity)
.enter()
.append("path")
.attr("transform", function(d){return("translate(0," + (yName(d.key)-height) +")" )})
.datum(function(d){return(d.density)})
.attr("fill", "#69b3a2")
.attr("stroke", "#000")
.attr("stroke-width", 1)
.attr("d", d3.line()
.curve(d3.curveBasis)
.x(function(d) { return x(d[0]); })
.y(function(d) { return y(d[1]); })
)
​
})
​
// This is what I need to compute kernel density estimation
function kernelDensityEstimator(kernel, X) {
return function(V) {
return X.map(function(x) {
return [x, d3.mean(V, function(v) { return kernel(x - v); })];
});
};
}
function kernelEpanechnikov(k) {
return function(v) {
return Math.abs(v /= k) <= 1 ? 0.75 * (1 - v * v) / k : 0;
};
}
​
</script>
""")
​

​