Zepl offers a UI-driven, language-agnostic visualization engine that is invoked with a single line of code: z.show(DataFrame). This reduces the amount of code users need to write to create charts and graphs.
Supported Charting Types:
Table
Bar
Line
Pie
Area
Scatter
Heat Map
Radar
Sankey
Plotly Editor: Additional chart types are available through our Plotly editor
Create a tabular object in Python (Pandas DataFrame), Scala (Spark DataFrame), R (list object), or SQL
Use the z.show(df) function to render the visualization options
%python
import pandas as pd
df = pd.read_csv("https://s3-datasource-tutorial.s3.amazonaws.com/titanic3.csv")
z.show(df)

%spark
val df = spark.read.option("header",true).csv("./titanic3.csv")
z.show(df)

%r
data <- read.csv("./titanic3.csv")
z.show(data)

%datasource.DATA_SOURCE_NAME
SELECT * FROM sample_data_table
By default, Zepl's UI Editor visualizes the first 1000 data points passed to the z.show() function. For example, if your Pandas DataFrame (df) contains 2000 rows, only the first 1000 rows will be displayed in the visualization editor. This setting can be increased by any user with the Organization Owner Security Policy.
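Because only the first rows are charted, it can help to decide yourself which rows reach z.show(). The following is a minimal sketch, assuming a pandas DataFrame. The helper name `preview_rows`, the sample data, and the constant `DEFAULT_LIMIT` (set to the default limit of 1000 described above) are illustrative and not part of Zepl's API.

```python
import pandas as pd

DEFAULT_LIMIT = 1000  # assumed to match the default display limit described above


def preview_rows(df: pd.DataFrame, limit: int = DEFAULT_LIMIT, seed: int = 0) -> pd.DataFrame:
    """Return at most `limit` rows, sampled from across the whole DataFrame.

    Hypothetical helper: a random sample keeps the chart representative
    instead of showing only the first `limit` rows.
    """
    if len(df) <= limit:
        return df
    # sample() picks rows at random; sort_index() restores the original row order
    return df.sample(n=limit, random_state=seed).sort_index()


# Illustrative data with more rows than the default limit
df = pd.DataFrame({"row_id": range(2000), "value": [i % 7 for i in range(2000)]})
preview = preview_rows(df)
print(len(df), len(preview))

# In a Zepl notebook, pass the result to the visualization engine:
# z.show(preview)
```

Sampling is only one option. Sorting or filtering before the call works the same way, as long as the rows you care about fall within the limit.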
Increase limit
Increasing this setting may cause notebooks to load more slowly or crash at load time. We recommend not increasing this value beyond 5000.
If you are experiencing slow notebook load times, please contact [email protected]
Navigate to Resources > Interpreters
Select the interpreter. This limit value is set per interpreter. For example, if the visualizations requiring additional data points are generated using Python, then select %python
In the text field labeled "Max number of dataframe rows to display", enter the new limit
Select Apply
Run a notebook paragraph that calls z.show(df)
Select the last symbol in the charting list called "Plotly Chart"
Select the "Plotly Chart Editor" button
Select the "+ Trace" button and select your trace "Type". This should show a list of charting options:
Select the input values for your charting type. These will be the columns from the DataFrame that was passed into the z.show(df) function.
Transformations allow users to Filter, Split, Aggregate, and Sort their data.
Select "+ Transform" and choose your transformation
Next, select the "Target" or "By" value. This variable will be used to transform the data set in your graph.
Below is an example of using multiple Transformations (Filter and Split) on the same dataset:
For large datasets, Zepl recommends that users transform their data before passing it to z.show(df). This gives the best performance and visualization experience.
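As a rough illustration of transforming data before charting it, the sketch below filters and aggregates a pandas DataFrame so that only a small summary is passed to z.show(). The column names (`pclass`, `sex`, `fare`) and the values are made up for the example and are assumptions, not taken from a specific dataset.

```python
import pandas as pd

# Illustrative data; in practice this would be your full dataset.
raw = pd.DataFrame(
    {
        "pclass": [1, 1, 2, 3, 3, 3],
        "sex": ["female", "male", "female", "male", "male", "female"],
        "fare": [211.3, 151.6, 26.0, 7.9, 8.1, None],
    }
)

# Filter: drop rows without a fare.
# Aggregate: one row per (pclass, sex) with a row count and the mean fare.
summary = (
    raw.dropna(subset=["fare"])
    .groupby(["pclass", "sex"], as_index=False)
    .agg(passengers=("fare", "size"), mean_fare=("fare", "mean"))
    .sort_values(["pclass", "sex"])
)

print(summary)

# In a Zepl notebook, chart the small summary instead of the raw rows:
# z.show(summary)
```

The summary has one row per group, so it stays well under the display limit regardless of how large the raw data is.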
General:
Defaults: Background color, color scales, and fonts
Title: Set your titles and font type here
Modebar
Size and Margins: Relative Size and Graph position
Interactions: Drag, Click, and Hover drill down behaviors
Meta Text: Pass data to your graph's titles and text outputs
Traces: The "Traces" section is specific to the type of graph selected above. Each trace provides options specific to its trace type (Bar, Line, etc.)
Axes:
Titles: Set the title for each axis here. This may also be done by clicking the text along each axis on the interactive graph on the right, which reads "Click to enter X axis title"
Range: Set the Range scale (Linear, Log, Date, Categorical, Multicategorical) and range of the axis. This is done automatically by Plotly when rendering the graph. Enable or disable Zoom.
Lines: Hide or show the Axis, Grid, or Zero Line
Tick Labels: Position where axis labels appear and set font size/type
Tick Markers: Enable or disable vertical markers for each axis label
Range Slider: Enables horizontal zoom. Best used for time series charts
Spike Lines: Show data spikes
Legend: The "Legend" section allows users to change the size, position, text font and color of the graph legend.
Text: Add text-based overlays on each graph
Shapes: Add colored, shape-based overlays on each graph
Images: Add images as an overlay on each graph
Zepl also offers Zeppelin-compatible visualizations (Table, Bar, Pie, Area, Line, and Scatter). Zepl has expanded on the Zeppelin notebook visualizations to support additional chart types, such as Heatmap, Radar, Sankey, and Plotly (see above).
Each Zepl and Zeppelin visualization has its own settings, where the user can drag, drop, and select the appropriate values for the desired chart. Each chart has two sections of chart-specific options: Charts and Parameters. To access these settings, first select a chart (for example, Bar Chart), then select the Settings button.
The Chart section allows users to drag and drop column names from the DataFrame that was passed into the z.show(DataFrame) function. When you expand this section you will see a list of column names and chart-specific options.
Below is an example specific to the Bar Chart chart type:
Chart specific options
Available Columns: Drag and drop values into below boxes
X-Axis:
Specifies variable for x-axis
This can have 1 or many values
Y-Axis: Select this value to aggregate on a SUM or a COUNT of this column value
Specifies variable for y-axis
This can have 1 or many values
Category (split by): Split the Y-Axis value by this variable
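The Bar Chart settings above are roughly analogous to a pivot in pandas. The sketch below is only an analogy to help reason about what the chart displays, not a description of how Zepl computes it. The column names and values are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "embarked": ["S", "S", "C", "C", "S", "Q"],  # plays the role of X-Axis
        "sex": ["male", "female", "female", "male", "male", "female"],  # Category (split by)
        "fare": [7.25, 53.10, 71.28, 30.00, 8.05, 7.75],  # Y-Axis
    }
)

# SUM of the Y-Axis column for each X-Axis value, one column per category
bar_sum = df.pivot_table(
    index="embarked", columns="sex", values="fare", aggfunc="sum", fill_value=0
)

# COUNT of rows instead of SUM
bar_count = df.pivot_table(
    index="embarked", columns="sex", values="fare", aggfunc="count", fill_value=0
)

print(bar_sum)
print(bar_count)
```

Each row of the pivot corresponds to one position on the x-axis, and each column corresponds to one bar within that position.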
The Parameters section contains individual chart parameters. Traditionally these values are configured in code; this section lets the user change them through the user interface. The available parameters vary by chart type. The example below continues with the Bar Chart selected above.
Frequently used parameters:
mainTitle: Set the title of this chart
xAxisName and yAxisName: Set label for x and y axis
xAxisUnit and yAxisUnit: Set a text based label to appear next to x and y axis values
colorSet: Set graph color options
enableTooltip: Display or hide interactive tooltip
Zepl supports the use of visualization libraries specific to the programming language of your choice. Each paragraph in the Zepl Notebook can render charts and graphs to the paragraph output. Below are several of the most commonly used visualization libraries for Python, Spark, and R.
All examples below can be found here: Open In Zepl
Matplotlib v3.1.1 is already installed in Zepl's General Purpose Image. Using matplotlib in a Zepl notebook works the same way as in any other notebook or Python environment. See the examples below:
%python
# Reference Documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()

%python
# Reference Documentation: https://matplotlib.org/tutorials/introductory/pyplot.html
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
plt.show()
seaborn v0.9.0 is already installed in Zepl's General Purpose Image. Using seaborn in a Zepl notebook works the same way as in any other notebook or Python environment. See the examples below:
%python
# Reference Documentation: https://seaborn.pydata.org/examples/grouped_boxplot.html
import seaborn as sns

# Themes are supported in later versions of seaborn (v0.11.0). If required, stop your container and update seaborn: `!pip install -U seaborn`
# sns.set_theme(style="ticks", palette="pastel")

# Load the example tips dataset
tips = sns.load_dataset("tips")

# Draw a nested boxplot to show bills by day and time
sns.boxplot(x="day", y="total_bill",
            hue="smoker", palette=["m", "g"],
            data=tips)
sns.despine(offset=10, trim=True)

%python
# Reference Documentation: https://seaborn.pydata.org/examples/many_facets.html
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# set_theme requires seaborn v0.11.0 or later; on v0.9.0 use sns.set(style="ticks") instead
sns.set_theme(style="ticks")

# Create a dataset with many short random walks
rs = np.random.RandomState(4)
pos = rs.randint(-1, 2, (20, 5)).cumsum(axis=1)
pos -= pos[:, 0, np.newaxis]
step = np.tile(range(5), 20)
walk = np.repeat(range(20), 5)
df = pd.DataFrame(np.c_[pos.flat, step, walk],
                  columns=["position", "step", "walk"])

# Initialize a grid of plots with an Axes for each walk
grid = sns.FacetGrid(df, col="walk", hue="walk", palette="tab20c",
                     col_wrap=4, height=1.5)

# Draw a horizontal line to show the starting point
grid.map(plt.axhline, y=0, ls=":", c=".5")

# Draw a line plot to show the trajectory of each random walk
grid.map(plt.plot, "step", "position", marker="o")

# Adjust the tick positions and labels
grid.set(xticks=np.arange(5), yticks=[-3, 3],
         xlim=(-.5, 4.5), ylim=(-3.5, 3.5))

# Adjust the arrangement of the plots
grid.fig.tight_layout(w_pad=1)
While Zepl's Plotly Charting Editor is great for building UI-driven visuals, you may need to generate Plotly charts, themes, and parameters programmatically. Plotly v4.2.1 is already installed in Zepl's General Purpose Image.
Example of using the Plotly library and displaying a figure in the paragraph output:
%python
# Reference Documentation: https://plotly.com/python/creating-and-updating-figures/
import plotly.graph_objects as go

fig = go.Figure(
    data=[go.Bar(x=[1, 2, 3], y=[1, 3, 2])],
    layout=go.Layout(
        title=go.layout.Title(text="A Figure Specified By A Graph Object")
    )
)
fig.show()
The example above may fail with the following error:
ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed
To resolve this error, add the following function to any notebook that needs to render plotly charts:
%python
import plotly

# Create plot div
def plot(plot_def, **kwargs):
    kwargs['output_type'] = 'div'
    plot_str = plotly.offline.plot(plot_def, **kwargs)
    print('%%angular <div>%s</div>' % plot_str)
Final result would look like this:
%python
import plotly
import plotly.graph_objects as go

# Create plot div
def plot(plot_def, **kwargs):
    kwargs['output_type'] = 'div'
    plot_str = plotly.offline.plot(plot_def, **kwargs)
    print('%%angular <div>%s</div>' % plot_str)

# Reference Documentation: https://plotly.com/python/creating-and-updating-figures/
fig = go.Figure(
    data=[go.Bar(x=[1, 2, 3], y=[1, 3, 2])],
    layout=go.Layout(
        title=go.layout.Title(text="A Figure Specified By A Graph Object")
    )
)
plot(fig)
ggplot2 v3.3.2 is already installed in Zepl's General Purpose Image. Using ggplot2 in a Zepl notebook works the same way as in any other R environment. See the example below:
%r
# Reference Documentation: http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Jitter%20Plot
# load package and data
options(scipen=999)  # turn-off scientific notation like 1e+48
library(ggplot2)
theme_set(theme_bw())  # pre-set the bw theme.
data("midwest", package = "ggplot2")
# midwest <- read.csv("http://goo.gl/G1K41K")  # bkup data source

# Scatterplot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) +
  geom_point(aes(col=state, size=popdensity)) +
  geom_smooth(method="loess", se=F) +
  xlim(c(0, 0.1)) +
  ylim(c(0, 500000)) +
  labs(subtitle="Area Vs Population",
       y="Population",
       x="Area",
       title="Scatterplot",
       caption = "Source: midwest")

plot(gg)
Zepl paragraphs can render any HTML, CSS, or JavaScript. This approach is commonly used to pass processed data into flexible front-end charting libraries like Highcharts or D3.
Starter code: To render HTML, CSS, and JS from any language, print a string that begins with %html:
%python
print("""%html ... Your HTML / CSS / JS here ...""")

%spark
println("""%html ... Your HTML / CSS / JS here ...""")

%pyspark
print("""%html ... Your HTML / CSS / JS here ...""")

%r
print('%html ... Your HTML / CSS / JS here ... ')
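To pass processed data from Python into the rendered page, one option is to serialize it as JSON and embed it in the script. The following is a minimal sketch using only the Python standard library. The helper name `html_paragraph_output`, the container id, and the sample rows are hypothetical, and it assumes the paragraph output runs ordinary browser JavaScript as in the starter code above.

```python
import html
import json


def html_paragraph_output(rows, container_id="chart_data"):
    """Build a '%html' string that exposes `rows` to JavaScript as JSON.

    Hypothetical helper; the container id is arbitrary.
    """
    # json.dumps produces a valid JavaScript literal. Escaping "</" prevents a
    # value such as "</script>" from closing the script tag early.
    payload = json.dumps(rows).replace("</", "<\\/")
    return (
        "%html\n"
        f'<div id="{html.escape(container_id)}"></div>\n'
        "<script>\n"
        f"  var rows = {payload};\n"
        f"  document.getElementById({json.dumps(container_id)}).textContent =\n"
        '    rows.length + " rows received from Python";\n'
        "</script>"
    )


# Illustrative data standing in for the result of your processing
rows = [{"state": "CA", "value": 12}, {"state": "NY", "value": 7}]
output = html_paragraph_output(rows)
print(output)
```

In a real chart, the `rows` variable in the script would be handed to the charting library instead of being summarized as text.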
Extended examples:
%python
# Part of the code is from
# http://www.highcharts.com/maps/demo/map-drilldown
print('''%html
<div id="highchart_container" style="height: 500px; min-width: 310px; max-width: 800px; margin: 0 auto"></div>
<h6><a href="https://shop.highsoft.com/highmaps" target="_blank">highmap</a></h6>
<script>
function myChartRendering() {
    /*
    TODO:
    - Check data labels after drilling. Label rank? New positions?
    - Not US Mainland text
    - Separators
    */
    var data = Highcharts.geojson(Highcharts.maps['countries/us/us-all']),
        // Some responsiveness
        small = $('#highchart_container').width() < 400;

    // Set drilldown pointers
    $.each(data, function (i) {
        this.drilldown = this.properties['hc-key'];
        this.value = i; // Non-random bogus data
    });

    // Instantiate the map
    Highcharts.mapChart('highchart_container', {
        chart: {
            events: {
                drilldown: function (e) {
                    if (!e.seriesOptions) {
                        var chart = this,
                            mapKey = 'countries/us/' + e.point.drilldown + '-all',
                            // Handle error, the timeout is cleared on success
                            fail = setTimeout(function () {
                                if (!Highcharts.maps[mapKey]) {
                                    chart.showLoading('<i class="icon-frown"></i> Failed loading ' + e.point.name);
                                    fail = setTimeout(function () {
                                        chart.hideLoading();
                                    }, 1000);
                                }
                            }, 3000);

                        // Show the spinner
                        chart.showLoading('<i class="icon-spinner icon-spin icon-3x"></i>'); // Font Awesome spinner

                        // Load the drilldown map
                        $.getScript('https://code.highcharts.com/mapdata/' + mapKey + '.js', function () {
                            data = Highcharts.geojson(Highcharts.maps[mapKey]);

                            // Set a non-random bogus value
                            $.each(data, function (i) {
                                this.value = i;
                            });

                            // Hide loading and add series
                            chart.hideLoading();
                            clearTimeout(fail);
                            chart.addSeriesAsDrilldown(e.point, {
                                name: e.point.name,
                                data: data,
                                dataLabels: {
                                    enabled: true,
                                    format: '{point.name}'
                                }
                            });
                        });
                    }
                    this.setTitle(null, { text: e.point.name });
                },
                drillup: function () {
                    this.setTitle(null, { text: 'USA' });
                }
            }
        },
        title: {
            text: 'Highcharts Map Drilldown'
        },
        subtitle: {
            text: 'USA',
            floating: true,
            align: 'right',
            y: 50,
            style: {
                fontSize: '16px'
            }
        },
        legend: small ? {} : {
            layout: 'vertical',
            align: 'right',
            verticalAlign: 'middle'
        },
        colorAxis: {
            min: 0,
            minColor: '#E6E7E8',
            maxColor: '#005645'
        },
        mapNavigation: {
            enabled: true,
            buttonOptions: {
                verticalAlign: 'bottom'
            }
        },
        plotOptions: {
            map: {
                states: {
                    hover: {
                        color: '#EEDD66'
                    }
                }
            }
        },
        series: [{
            data: data,
            name: 'USA',
            dataLabels: {
                enabled: true,
                format: '{point.properties.postal-code}'
            }
        }],
        drilldown: {
            activeDataLabelStyle: {
                color: '#FFFFFF',
                textDecoration: 'none',
                textOutline: '1px #000000'
            },
            drillUpButton: {
                relativeTo: 'spacingBox',
                position: {
                    x: 0,
                    y: 60
                }
            }
        }
    });
}

// Load jQuery, Highmaps, its modules, and the US map data in sequence, then render
jQuery.getScript('//code.jquery.com/jquery-3.1.1.min.js', function() {
    $("#highchart_container").text("loading... jquery");
    jQuery.getScript('//code.highcharts.com/maps/highmaps.js', function() {
        $("#highchart_container").text("loading... Highmaps");
        jQuery.getScript('//code.highcharts.com/maps/modules/data.js', function() {
            $("#highchart_container").text("loading... Highmaps.. modules");
            jQuery.getScript('//code.highcharts.com/maps/modules/drilldown.js', function() {
                $("#highchart_container").text("loading... drilldown");
                jQuery.getScript('//code.highcharts.com/mapdata/countries/us/us-all.js', function() {
                    $("#highchart_container").text("loading... us map data...");
                    jQuery("#highchart_container").ready,
                    jQuery.Deferred(function( deferred ){
                        $("#highchart_container").text("loading... ready...");
                        jQuery( deferred.resolve );
                        if (!!Highcharts) {
                            myChartRendering();
                        }
                    })
                })
            })
        })
    })
});
</script>
''')
%python
# Reference Documentation: https://www.d3-graph-gallery.com/graph/ridgeline_basic.html
print("""%html
<!-- Code from d3-graph-gallery.com -->
<meta charset="utf-8">

<!-- Load d3.js -->
<script src="https://d3js.org/d3.v4.js"></script>

<!-- Create a div where the graph will take place -->
<div id="my_dataviz"></div>

<script>
// set the dimensions and margins of the graph
var margin = {top: 60, right: 30, bottom: 20, left:110},
    width = 460 - margin.left - margin.right,
    height = 400 - margin.top - margin.bottom;

// append the svg object to the body of the page
var svg = d3.select("#my_dataviz")
  .append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
  .append("g")
    .attr("transform",
          "translate(" + margin.left + "," + margin.top + ")");

//read data
d3.csv("https://raw.githubusercontent.com/zonination/perceptions/master/probly.csv", function(data) {

  // Get the different categories and count them
  var categories = data.columns
  var n = categories.length

  // Add X axis
  var x = d3.scaleLinear()
    .domain([-10, 140])
    .range([ 0, width ]);
  svg.append("g")
    .attr("transform", "translate(0," + height + ")")
    .call(d3.axisBottom(x));

  // Create a Y scale for densities
  var y = d3.scaleLinear()
    .domain([0, 0.4])
    .range([ height, 0]);

  // Create the Y axis for names
  var yName = d3.scaleBand()
    .domain(categories)
    .range([0, height])
    .paddingInner(1)
  svg.append("g")
    .call(d3.axisLeft(yName));

  // Compute kernel density estimation for each column:
  var kde = kernelDensityEstimator(kernelEpanechnikov(7), x.ticks(40)) // increase this 40 for more accurate density.
  var allDensity = []
  for (i = 0; i < n; i++) {
    key = categories[i]
    density = kde( data.map(function(d){ return d[key]; }) )
    allDensity.push({key: key, density: density})
  }

  // Add areas
  svg.selectAll("areas")
    .data(allDensity)
    .enter()
    .append("path")
      .attr("transform", function(d){return("translate(0," + (yName(d.key)-height) +")" )})
      .datum(function(d){return(d.density)})
      .attr("fill", "#69b3a2")
      .attr("stroke", "#000")
      .attr("stroke-width", 1)
      .attr("d", d3.line()
          .curve(d3.curveBasis)
          .x(function(d) { return x(d[0]); })
          .y(function(d) { return y(d[1]); })
      )
})

// This is what I need to compute kernel density estimation
function kernelDensityEstimator(kernel, X) {
  return function(V) {
    return X.map(function(x) {
      return [x, d3.mean(V, function(v) { return kernel(x - v); })];
    });
  };
}
function kernelEpanechnikov(k) {
  return function(v) {
    return Math.abs(v /= k) <= 1 ? 0.75 * (1 - v * v) / k : 0;
  };
}
</script>
""")