Chartify is a new plotting library that was recently open-sourced by Spotify Labs. You can read their announcement article here. Chartify is intended to make it easy for Python users to create standard chart types, including line, bar and area charts, and is built on top of Bokeh. As a Bokeh core contributor, I quickly experimented with Chartify to see what it’s like.
tl;dr I’m impressed. Chartify offers a clean API to ingest tidy data and generate a variety of visually pleasing charts, while also exposing the underlying Bokeh figure for further customization. I’m excited about this addition to the Python data visualization ecosystem.
(taken from https://github.com/spotify/chartify/blob/master/examples/Examples.ipynb)
Why does Chartify build on Bokeh?
Bokeh is a tool for creating web-based, interactive visualizations and offers a lot of primitives (like lines and circles) that users combine into highly customized visualizations. However, using primitives means that users may be required to do extra data manipulation to create their desired plot. For example, there’s no `from bokeh import StackedBarChart`. Users can certainly create such a chart using Bokeh, but doing so requires figuring out how to transform their data into beginning and end positions for the stacked bars. Chartify aims to abstract away this data transformation step for users making standard chart types.
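To make that transformation concrete, here is a minimal sketch in plain Python of the bookkeeping a stacked bar chart needs: each series sits on top of the running total of the series before it. The `stack_positions` helper and the sample numbers are illustrative assumptions, not part of Bokeh or Chartify.

```python
# Illustrative quantities per country for three fruits (sample values only).
countries = ["BR", "CA", "GB"]
quantities = {
    "Apple": [57, 144, 177],
    "Banana": [30, 222, 113],
    "Grape": [54, 86, 59],
}


def stack_positions(series, n_bars):
    """Return {name: (bottoms, tops)} for bars stacked in dict order.

    Hypothetical helper: each series starts where the previous one ended.
    """
    running = [0] * n_bars
    positions = {}
    for name, values in series.items():
        tops = [base + value for base, value in zip(running, values)]
        positions[name] = (running, tops)
        running = tops  # the next series starts at these heights
    return positions


positions = stack_positions(quantities, len(countries))
for name, (bottoms, tops) in positions.items():
    print(name, bottoms, tops)
```

Each `(bottoms, tops)` pair is the kind of input a bar glyph such as Bokeh’s `vbar`, which takes `bottom` and `top` values, would need for one series. This is the step a higher-level charting API can hide.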
Using Tidy Data:
Chartify consumes tidy data, a data formatting concept that originated in the R ecosystem. You can read the whole explanation here, but the synopsis is that a tidy dataset is structured so that:
- Each variable forms a column
- Each observation forms a row
- Each type of observational unit forms a table
To fully understand, it might be easier to look at an example of each:
Tidy Data Example:
 | date | country | fruit | unit_price | quantity | total_price |
---|---|---|---|---|---|---|
0 | 2017-10-21 | US | Banana | 0.303711 | 4 | 1.214846 |
1 | 2017-05-30 | JP | Banana | 0.254109 | 4 | 1.016436 |
2 | 2017-05-21 | CA | Banana | 0.268635 | 4 | 1.074539 |
3 | 2017-09-18 | BR | Grape | 2.215277 | 2 | 4.430554 |
4 | 2017-12-08 | US | Banana | 0.308337 | 5 | 1.541687 |
Untidy Data Example:
fruit / country | BR | CA | GB | JP | US |
---|---|---|---|---|---|
Apple | 57 | 144 | 177 | 65 | 165 |
Banana | 30 | 222 | 113 | 232 | 479 |
Grape | 54 | 86 | 59 | 52 | 81 |
Orange | 74 | 207 | 97 | 75 | 409 |
You can see that each row in the tidy dataset contains a unique observation, composed of values for each variable. In the untidy dataset, each row corresponds to the summary of a different type of fruit and not unique observations.
Data analysis tools like Pandas are generally designed to consume data that matches this standard. Since Chartify is a Python library, you can read about tidying data in Pandas from Pandas core contributor Tom Augspurger here. This is especially relevant because Chartify ingests tidy Pandas DataFrames for plotting, which is hugely valuable because users don’t have to do any special data transformation in order to create visualizations.
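As a rough illustration of tidying in Pandas, the sketch below reshapes the untidy fruit-by-country table from above into a tidy one with `DataFrame.melt`. The variable names and the `quantity` column name are my own choices for the example. Note that the result is tidy at the level of fruit/country totals, not the per-sale rows of the first table.

```python
import pandas as pd

# Wide ("untidy") table copied from the example above:
# one row per fruit, one column per country.
untidy = pd.DataFrame(
    {
        "BR": [57, 30, 54, 74],
        "CA": [144, 222, 86, 207],
        "GB": [177, 113, 59, 97],
        "JP": [65, 232, 52, 75],
        "US": [165, 479, 81, 409],
    },
    index=pd.Index(["Apple", "Banana", "Grape", "Orange"], name="fruit"),
)

# melt() folds the country columns into a single "country" variable,
# giving one row per (fruit, country) pair.
tidy = untidy.reset_index().melt(
    id_vars="fruit", var_name="country", value_name="quantity"
)

print(tidy.head())
```

After the reshape, every variable (fruit, country, quantity) has its own column, which is the shape a tidy-data plotting API expects.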
The Chartify API
Chartify users create a `chartify.Chart` object and specify one of a few enumerated axis types for the x and y axes. The resulting `Chart` object will contain a set of appropriate plotting methods for your axis pair type. For example, using a `"datetime"` x-axis and `linear` y-axis means that a line chart is a good idea and a bar chart is not, because bar charts are typically intended for categorical data. I think this is great - Bokeh tries very hard to help users make effective visualizations by having nice defaults, and I think these opinionated guardrails are good.
The allowed axis types:
x_axis_type (enum, str):
- ‘linear’:
- ‘log’:
- ‘datetime’: Use for datetime formatted data.
- ‘categorical’:
- ‘density’
y_axis_type (enum, str):
- ‘linear’:
- ‘log’:
- ‘categorical’:
- ‘density’
As of release 2.3.5, Chartify offers the following chart types for the corresponding x and y axis types:
X Axis Below/Y Axis Right | linear/log/datetime | categorical | density |
---|---|---|---|
linear/log/datetime | line, scatter, text, area | bar, lollipop, parallel | kde, histogram |
categorical | bar, lollipop, parallel | heatmap | kde, histogram |
density | kde, histogram | kde, histogram | hexbin |
(Note: both `area` and `bar` include stacked area and bar charts)
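One way to picture these guardrails is as a lookup from an axis-type pair to the chart types that make sense for it. The sketch below just encodes the table above as a dictionary. It is a hypothetical illustration (the names `axis_family`, `CHARTS`, and `available_charts` are mine), not how Chartify is implemented internally.

```python
# Hypothetical lookup mirroring the table above; not Chartify's internals.
NUMERIC = {"linear", "log", "datetime"}


def axis_family(axis_type):
    """Collapse an axis type into the family used by the table."""
    if axis_type in NUMERIC:
        return "numeric"
    if axis_type in {"categorical", "density"}:
        return axis_type
    raise ValueError(f"unknown axis type: {axis_type!r}")


# Keys are (x-axis family, y-axis family).
CHARTS = {
    ("numeric", "numeric"): ["line", "scatter", "text", "area"],
    ("numeric", "categorical"): ["bar", "lollipop", "parallel"],
    ("numeric", "density"): ["kde", "histogram"],
    ("categorical", "numeric"): ["bar", "lollipop", "parallel"],
    ("categorical", "categorical"): ["heatmap"],
    ("categorical", "density"): ["kde", "histogram"],
    ("density", "numeric"): ["kde", "histogram"],
    ("density", "categorical"): ["kde", "histogram"],
    ("density", "density"): ["hexbin"],
}


def available_charts(x_axis_type, y_axis_type):
    """Return the chart types the table lists for this axis pair."""
    return CHARTS[(axis_family(x_axis_type), axis_family(y_axis_type))]


print(available_charts("datetime", "linear"))
print(available_charts("categorical", "linear"))
```

With a datetime x-axis and a linear y-axis the lookup offers line, scatter, text, and area, but no bar chart, which matches the reasoning above.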
While there’s endless potential to add more, I think Chartify more than covers the necessary charts for general report generation.
Using the “chartify.Chart.plot” methods
Users pass their tidy dataframe into their chosen plotting method and specify which column names correspond to visualization properties using keyword arguments. In this case, I created a grouped bar chart by specifying the `"country"` and `"fruit"` columns for the groupings and the `"quantity"` column for the data value. Additionally, I passed in optional kwargs to set the bar colors and ordering.
import chartify

# tidy_data is the tidy DataFrame shown earlier.
# Sum the quantity sold for each fruit/country pair.
quantity_by_fruit_and_country = (
    tidy_data.groupby(['fruit', 'country'])['quantity'].sum().reset_index())

ch = chartify.Chart(blank_labels=True, x_axis_type='categorical', y_axis_type='linear')
ch.set_title("Fruit by Country")
ch.set_subtitle("Change categorical order with 'categorical_order_by'.")
ch.plot.bar(
    data_frame=quantity_by_fruit_and_country,
    categorical_columns=['country', 'fruit'],
    numeric_column='quantity',
    color_column='country',  # optional
    categorical_order_by='labels',  # optional
    categorical_order_ascending=True  # optional
)
ch.axes.set_xaxis_tick_orientation('vertical')
ch.show()
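To see what the aggregation step at the top of that snippet produces, here is the same `groupby` applied to just the five sample rows from the tidy data example. The `tidy_sample` and `summed` names are mine, and only the columns needed for the sum are included.

```python
import pandas as pd

# The five sample rows from the tidy data example (only the columns used here).
tidy_sample = pd.DataFrame(
    {
        "country": ["US", "JP", "CA", "BR", "US"],
        "fruit": ["Banana", "Banana", "Banana", "Grape", "Banana"],
        "quantity": [4, 4, 4, 2, 5],
    }
)

# One row per (fruit, country) pair, with the quantities summed.
summed = (
    tidy_sample.groupby(["fruit", "country"])["quantity"].sum().reset_index()
)

print(summed)
```

The two US banana rows collapse into a single row, so the frame handed to the bar plot still has one observation per row and stays tidy.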
I’m very excited that Chartify exposes the Bokeh Figure object that it creates on the chart’s `.figure` property. This means users get the wonderful functionality of a nice charting API while also being able to drop down to Bokeh-level APIs to further customize their plots. In this example, I modified a Chartify scatter plot to add a custom HoverTool and make the figure size responsive. (You can test this by hovering over the plot and dragging the browser window larger and smaller.)
import chartify
from bokeh.models import HoverTool

ch = chartify.Chart(blank_labels=True, x_axis_type='datetime', y_axis_type='linear')
ch.plot.scatter(
    data_frame=tidy_data,
    x_column="date",
    y_column="total_price",
    size_column='quantity',
    color_column='fruit')

# Tooltip fields refer to DataFrame columns with the "@column" syntax.
hover = HoverTool(tooltips=[
    ("Total Price (M $)", "@total_price"),
    ("Quantity Sold (M Units)", "@quantity"),
])

# Access the underlying Bokeh Figure object.
ch.figure.add_tools(hover)
ch.figure.sizing_mode = 'scale_width'
ch.show()
Beyond the `Chart.plot` methods and accessing the Bokeh figure via `Chart.figure`, Chartify also offers interfaces to modify plot styles, add annotations, and format the axes.
The `chartify.Chart` interfaces:
- Styling (.style)
- Plotting (.plot)
- Callouts (.callout)
- Axes (.axes)
- Bokeh figure (.figure)
You can view more demonstrations of these in Chartify’s examples notebook here.
Summation
Chartify offers a pleasant high-level interface for ingesting tidy data and generating a variety of visually pleasing charts, while also exposing the underlying Bokeh object for further customization. I’m excited about this addition to the Python data visualization ecosystem.