Using Draco for Visualization Design Space Exploration
This demonstration applies general guidelines for verifying, debugging, and tuning Draco's recommendation results.
In this example we will use Draco to explore the visualization design space for the Seattle weather dataset. Starting with nothing but a raw dataset, we are going to use the reusable building blocks that Draco provides to generate a wide space of recommendations, and we will investigate the produced designs using the debugger module.
# Display utilities
import json

import numpy as np
from IPython.display import Markdown, display


# Handles serialization of common numpy datatypes
class NpEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        else:
            return super(NpEncoder, self).default(obj)


def md(markdown: str):
    display(Markdown(markdown))


def pprint(obj):
    md(f"```json\n{json.dumps(obj, indent=2, cls=NpEncoder)}\n```")
Loading the Data
We will use the Seattle weather dataset from the Vega Datasets for this example.
import altair as alt
import pandas as pd
from vega_datasets import data as vega_data
import draco as drc
# Loading data to be explored
df: pd.DataFrame = vega_data.seattle_weather()
df.head()
| | date | precipitation | temp_max | temp_min | wind | weather |
|---|---|---|---|---|---|---|
| 0 | 2012-01-01 | 0.0 | 12.8 | 5.0 | 4.7 | drizzle |
| 1 | 2012-01-02 | 10.9 | 10.6 | 2.8 | 4.5 | rain |
| 2 | 2012-01-03 | 0.8 | 11.7 | 7.2 | 2.3 | rain |
| 3 | 2012-01-04 | 20.3 | 12.2 | 5.6 | 4.7 | rain |
| 4 | 2012-01-05 | 1.3 | 8.9 | 2.8 | 6.1 | rain |
We can use the schema_from_dataframe function to generate the schema of the dataset, including the data types of each column and their statistical properties.
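To make the statistical properties less opaque, here is a minimal standard-library sketch of two of them: the number of unique values and the entropy. It assumes that the entropy is the Shannon entropy in nats, scaled by 1000 and rounded. That assumption is consistent with the date field printed below (1461 distinct values, ln 1461 ≈ 7.287), but it is not taken from Draco's source. The helper name `column_stats` is hypothetical.

```python
import math
from collections import Counter


def column_stats(values):
    """Return the distinct-value count and the Shannon entropy (in nats)."""
    counts = Counter(values)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return {"unique": len(counts), "entropy": entropy}


# A column in which every value is distinct, like `date` with its 1461 rows
stats = column_stats(range(1461))
scaled_entropy = round(stats["entropy"] * 1000)  # assumed scaling, see above
print(stats["unique"], scaled_entropy)
```

For a column with all-distinct values the entropy reduces to the logarithm of the row count, which is why a single number summarizes how evenly the values are spread.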
data_schema = drc.schema_from_dataframe(df)
pprint(data_schema)
{
"number_rows": 1461,
"field": [
{
"name": "date",
"type": "datetime",
"unique": 1461,
"entropy": 7287,
"span_seconds": 126144000
},
{
"name": "precipitation",
"type": "number",
"unique": 111,
"entropy": 2422,
"min": 0,
"max": 55,
"std": 6,
"skew": 3
},
{
"name": "temp_max",
"type": "number",
"unique": 67,
"entropy": 3934,
"min": -1,
"max": 35,
"std": 7,
"skew": 0
},
{
"name": "temp_min",
"type": "number",
"unique": 55,
"entropy": 3596,
"min": -7,
"max": 18,
"std": 5,
"skew": 0
},
{
"name": "wind",
"type": "number",
"unique": 79,
"entropy": 3950,
"min": 0,
"max": 9,
"std": 1,
"skew": 0
},
{
"name": "weather",
"type": "string",
"unique": 5,
"entropy": 1201,
"freq": 714,
"min_length": 3,
"max_length": 7
}
]
}
We transform the data schema into a set of facts that Draco can use to reason about the data when generating recommendations. We use the dict_to_facts function to do so, which takes a dictionary and returns a list of facts.
The output list of facts encodes the same information as the input dictionary. It is just a different representation that we can feed into Clingo under the hood.
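The mapping from a nested dictionary to entity and attribute facts can be sketched in a few lines of plain Python. The function below, `sketch_dict_to_facts`, is a hypothetical and simplified re-implementation written only to illustrate the idea; it is not Draco's actual dict_to_facts and it ignores edge cases such as explicit identifiers.

```python
from itertools import count


def sketch_dict_to_facts(data, parent="root", kind=None, ids=None):
    """Illustrative conversion of a nested dict into ASP-style facts."""
    ids = count() if ids is None else ids
    facts = []
    for key, value in data.items():
        if isinstance(value, list):
            # Every list item becomes an entity with a fresh numeric id
            for item in value:
                ident = next(ids)
                facts.append(f"entity({key},{parent},{ident}).")
                facts.extend(sketch_dict_to_facts(item, ident, key, ids))
        else:
            # Scalars become attributes of the enclosing entity
            name = key if kind is None else f"({kind},{key})"
            facts.append(f"attribute({name},{parent},{value}).")
    return facts


schema = {
    "number_rows": 1461,
    "field": [
        {"name": "date", "type": "datetime"},
        {"name": "wind", "type": "number"},
    ],
}
facts = sketch_dict_to_facts(schema)
print("\n".join(facts))
```

Each list item turns into an entity fact that names its parent, and each scalar turns into an attribute fact attached to that entity, which is the pattern visible in the output that follows.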
data_schema_facts = drc.dict_to_facts(data_schema)
pprint(data_schema_facts)
[
"attribute(number_rows,root,1461).",
"entity(field,root,0).",
"attribute((field,name),0,date).",
"attribute((field,type),0,datetime).",
"attribute((field,unique),0,1461).",
"attribute((field,entropy),0,7287).",
"attribute((field,span_seconds),0,126144000).",
"entity(field,root,1).",
"attribute((field,name),1,precipitation).",
"attribute((field,type),1,number).",
"attribute((field,unique),1,111).",
"attribute((field,entropy),1,2422).",
"attribute((field,min),1,0).",
"attribute((field,max),1,55).",
"attribute((field,std),1,6).",
"attribute((field,skew),1,3).",
"entity(field,root,2).",
"attribute((field,name),2,temp_max).",
"attribute((field,type),2,number).",
"attribute((field,unique),2,67).",
"attribute((field,entropy),2,3934).",
"attribute((field,min),2,-1).",
"attribute((field,max),2,35).",
"attribute((field,std),2,7).",
"attribute((field,skew),2,0).",
"entity(field,root,3).",
"attribute((field,name),3,temp_min).",
"attribute((field,type),3,number).",
"attribute((field,unique),3,55).",
"attribute((field,entropy),3,3596).",
"attribute((field,min),3,-7).",
"attribute((field,max),3,18).",
"attribute((field,std),3,5).",
"attribute((field,skew),3,0).",
"entity(field,root,4).",
"attribute((field,name),4,wind).",
"attribute((field,type),4,number).",
"attribute((field,unique),4,79).",
"attribute((field,entropy),4,3950).",
"attribute((field,min),4,0).",
"attribute((field,max),4,9).",
"attribute((field,std),4,1).",
"attribute((field,skew),4,0).",
"entity(field,root,5).",
"attribute((field,name),5,weather).",
"attribute((field,type),5,string).",
"attribute((field,unique),5,5).",
"attribute((field,entropy),5,1201).",
"attribute((field,freq),5,714).",
"attribute((field,min_length),5,3).",
"attribute((field,max_length),5,7)."
]
Iterating the partial specification query
Generating recommendations from a minimal input
We start by defining input_spec_base, which is a list of facts including the data schema, a single view and a single mark.
This is the minimal set of facts that Draco needs to generate recommendations which can be rendered into charts.
We instantiate a Draco object, using the default knowledge base, and an AltairRenderer object which will be used to render the recommendations into Vega-Lite charts.
from draco.renderer import AltairRenderer
input_spec_base = data_schema_facts + [
    "entity(view,root,v0).",
    "entity(mark,v0,m0).",
]
d = drc.Draco()
renderer = AltairRenderer()
We can now use the complete_spec method of the Draco object to generate recommendations from incomplete specifications.
The function below is a reusable utility for this example, responsible for generating, rendering and displaying the recommendations.
def recommend_charts(
    spec: list[str], draco: drc.Draco, num: int = 5, labeler=lambda i: f"CHART {i + 1}"
) -> dict[str, tuple[list[str], dict]]:
    # Dictionary to store the generated recommendations, keyed by chart name
    chart_specs = {}
    for i, model in enumerate(draco.complete_spec(spec, num)):
        chart_name = labeler(i)
        spec = drc.answer_set_to_dict(model.answer_set)
        chart_specs[chart_name] = drc.dict_to_facts(spec), spec
        print(chart_name)
        print(f"COST: {model.cost}")
        chart = renderer.render(spec=spec, data=df)
        # Adjust column-faceted chart size
        if (
            isinstance(chart, alt.FacetChart)
            and chart.facet.column is not alt.Undefined
        ):
            chart = chart.configure_view(continuousWidth=130, continuousHeight=130)
        display(chart)
    return chart_specs
We are using input_spec_base as the starting point for our exploration, that is, we are only specifying the data schema, and that we want the recommendations to have at least one view and one mark.
input_spec = input_spec_base
initial_recommendations = recommend_charts(spec=input_spec, draco=d)
CHART 1
COST: [3]
CHART 2
COST: [4]
CHART 3
COST: [4]
CHART 4
COST: [5]
CHART 5
COST: [6]
While the above recommendations are valid, they are not very diverse. We can also observe that the first two recommendations are represented by seemingly identical Vega-Lite specifications, yet they have different costs. We explore this behavior below by inspecting the Draco specifications of the first two charts.
chart_1_key, chart_2_key = "CHART 1", "CHART 2"
(_, chart_1), (_, chart_2) = (
    initial_recommendations[chart_1_key],
    initial_recommendations[chart_2_key],
)
md(f"**Draco Specification of {chart_1_key}**")
pprint(chart_1)
md(f"**Draco Specification of {chart_2_key}**")
pprint(chart_2)
Draco Specification of CHART 1
{
"number_rows": 1461,
"task": "summary",
"field": [
{
"name": "date",
"type": "datetime",
"unique": 1461,
"entropy": 7287,
"span_seconds": 126144000
},
{
"name": "precipitation",
"type": "number",
"unique": 111,
"entropy": 2422,
"min": 0,
"max": 55,
"std": 6,
"skew": 3
},
{
"name": "temp_max",
"type": "number",
"unique": 67,
"entropy": 3934,
"min": -1,
"max": 35,
"std": 7,
"skew": 0
},
{
"name": "temp_min",
"type": "number",
"unique": 55,
"entropy": 3596,
"min": -7,
"max": 18,
"std": 5,
"skew": 0
},
{
"name": "wind",
"type": "number",
"unique": 79,
"entropy": 3950,
"min": 0,
"max": 9,
"std": 1,
"skew": 0
},
{
"name": "weather",
"type": "string",
"unique": 5,
"entropy": 1201,
"freq": 714,
"min_length": 3,
"max_length": 7
}
],
"view": [
{
"coordinates": "cartesian",
"mark": [
{
"type": "bar",
"encoding": [
{
"channel": "x",
"aggregate": "count"
}
]
}
],
"scale": [
{
"type": "linear",
"channel": "x",
"zero": "true"
}
]
}
]
}
Draco Specification of CHART 2
{
"number_rows": 1461,
"task": "summary",
"field": [
{
"name": "date",
"type": "datetime",
"unique": 1461,
"entropy": 7287,
"span_seconds": 126144000
},
{
"name": "precipitation",
"type": "number",
"unique": 111,
"entropy": 2422,
"min": 0,
"max": 55,
"std": 6,
"skew": 3
},
{
"name": "temp_max",
"type": "number",
"unique": 67,
"entropy": 3934,
"min": -1,
"max": 35,
"std": 7,
"skew": 0
},
{
"name": "temp_min",
"type": "number",
"unique": 55,
"entropy": 3596,
"min": -7,
"max": 18,
"std": 5,
"skew": 0
},
{
"name": "wind",
"type": "number",
"unique": 79,
"entropy": 3950,
"min": 0,
"max": 9,
"std": 1,
"skew": 0
},
{
"name": "weather",
"type": "string",
"unique": 5,
"entropy": 1201,
"freq": 714,
"min_length": 3,
"max_length": 7
}
],
"view": [
{
"coordinates": "cartesian",
"mark": [
{
"type": "bar",
"encoding": [
{
"channel": "y",
"aggregate": "count"
}
]
}
],
"scale": [
{
"zero": "true",
"channel": "y",
"type": "linear"
}
]
}
]
}
Taking a good look at the specifications printed above, we can see that both carry "task": "summary" and that they differ only in the positional channel: CHART 1 places the count aggregate and its linear scale on x, while CHART 2 places them on y. Thanks to the constraints in the default Draco knowledge base, the logical solver assigns slightly different costs to the two specifications. Since the two charts use the same mark, aggregate and scale type, they convey the same information, a single bar showing the record count, once horizontally and once vertically.
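Reading two long JSON documents side by side is error-prone, so a small recursive diff can help with this kind of comparison. The sketch below uses only the standard library; `diff_specs` is a hypothetical helper and not part of Draco. The two dictionaries are abridged by hand from the specifications printed above, with the field list omitted.

```python
def diff_specs(a, b, path=""):
    """Yield (path, value_a, value_b) for every leaf on which a and b differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(a.keys() | b.keys()):
            child = f"{path}.{key}" if path else key
            yield from diff_specs(a.get(key), b.get(key), child)
    elif isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        for i, (x, y) in enumerate(zip(a, b)):
            yield from diff_specs(x, y, f"{path}[{i}]")
    elif a != b:
        yield path, a, b


def abridged_spec(channel):
    # Same structure as the printed specifications, minus the field list
    return {
        "task": "summary",
        "view": [
            {
                "coordinates": "cartesian",
                "mark": [
                    {
                        "type": "bar",
                        "encoding": [{"channel": channel, "aggregate": "count"}],
                    }
                ],
                "scale": [{"type": "linear", "channel": channel, "zero": "true"}],
            }
        ],
    }


differences = list(diff_specs(abridged_spec("x"), abridged_spec("y")))
for path, left, right in differences:
    print(f"{path}: {left!r} != {right!r}")
```

Because dictionaries are compared key by key, the ordering of keys in the printed JSON does not affect the result.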
We can extend the input specification to better specify the design space we want to see recommendations for, to get more diverse results.
Let’s say we want the fields date and temp_max of the weather dataset to be encoded in the charts.
Also, we specify that we want the chart to be a faceted chart.
Note that we are not specifying the mark type, the encoding channels for the fields, or the field used for the facet. We leave this to Draco to decide, based on its underlying knowledge base.
input_spec = input_spec_base + [
    # We want to encode the `date` field
    "entity(encoding,m0,e0).",
    "attribute((encoding,field),e0,date).",
    # We want to encode the `temp_max` field
    "entity(encoding,m0,e1).",
    "attribute((encoding,field),e1,temp_max).",
    # We want the chart to be a faceted chart
    "entity(facet,v0,f0).",
    "attribute((facet,channel),f0,col).",
]
recommendations = recommend_charts(spec=input_spec, draco=d, num=5)
CHART 1
COST: [22]
CHART 2
COST: [22]
CHART 3
COST: [23]
CHART 4
COST: [23]
CHART 5
COST: [23]
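When iterating on partial specifications like the one above, writing the facts by hand quickly becomes repetitive. The following sketch shows one possible way to generate them. The helpers `encoding_facts` and `facet_facts` are hypothetical conveniences that only format strings; they are not part of Draco.

```python
def encoding_facts(fields, mark="m0"):
    """One encoding entity per field, attached to the given mark."""
    facts = []
    for i, field in enumerate(fields):
        facts.append(f"entity(encoding,{mark},e{i}).")
        facts.append(f"attribute((encoding,field),e{i},{field}).")
    return facts


def facet_facts(channel, view="v0", facet_id="f0"):
    """A single facet entity on the given view, using the given channel."""
    return [
        f"entity(facet,{view},{facet_id}).",
        f"attribute((facet,channel),{facet_id},{channel}).",
    ]


partial_spec = encoding_facts(["date", "temp_max"]) + facet_facts("col")
print("\n".join(partial_spec))
```

Appending `partial_spec` to input_spec_base would yield the same facts as the hand-written list used for the recommendations above.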
Inspecting the Knowledge Base
Debugging the recommendations
We can use the DracoDebug class to investigate the recommendations generated by Draco and whether they violate any of the soft constraints.
We start by instantiating a DracoDebug object, passing the recommendations and the Draco object used to generate them.
Its chart_preferences attribute is a DataFrame containing the recommendations and the soft constraints that they violate, as well as the weights associated with each constraint.
# Parameterized helper to avoid code duplication as we iterate on designs
def display_debug_data(draco: drc.Draco, specs: dict[str, tuple[list[str], dict]]):
    debugger = drc.DracoDebug(
        specs={chart_name: fact_spec for chart_name, (fact_spec, _) in specs.items()},
        draco=draco,
    )
    chart_preferences = debugger.chart_preferences
    display(Markdown("**Raw debug data**"))
    display(chart_preferences.head())
    display(Markdown("**Number of violated preferences**"))
    num_violations = len(
        set(chart_preferences[chart_preferences["count"] != 0]["pref_name"])
    )
    num_all = len(set(chart_preferences["pref_name"]))
    display(
        Markdown(
            f"*{num_violations} preferences are violated out of a total of {num_all} preferences (soft constraints)*"
        )
    )
    display(
        Markdown(
            "Using `DracoDebugPlotter` to visualize the debug `DataFrame` produced by `DracoDebug`:"
        )
    )
    plotter = drc.DracoDebugPlotter(chart_preferences)
    plot_size = (600, 300)
    chart = plotter.create_chart(
        cfg=drc.DracoDebugChartConfig.SORT_BY_COUNT_SUM,
        violated_prefs_only=True,
        plot_size=plot_size,
    )
    display(chart)
display_debug_data(draco=d, specs=recommendations)
Raw debug data
| | chart_name | pref_name | pref_description | count | weight |
|---|---|---|---|---|---|
| 0 | CHART 1 | cartesian_coordinate | Cartesian coordinates. | 1 | 0 |
| 1 | CHART 1 | summary_point | Point mark for summary tasks. | 1 | 0 |
| 2 | CHART 1 | linear_y | Linear scale with y channel. | 1 | 0 |
| 3 | CHART 1 | linear_x | Linear scale with x channel. | 1 | 0 |
| 4 | CHART 1 | c_c_point | Continuous by continuous for point mark. | 1 | 0 |
Number of violated preferences
20 preferences are violated out of a total of 151 preferences (soft constraints)
Using DracoDebugPlotter to visualize the debug DataFrame produced by DracoDebug:
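The count and weight columns of the debug data hint at how a chart's cost comes about. The sketch below assumes that the cost is the sum of count × weight over all preferences, which is an assumption about the cost model rather than something stated in this example. The rows are illustrative values made up for the sketch, not real Draco output, and the chart names are placeholders.

```python
from collections import defaultdict

# (chart_name, pref_name, count, weight); illustrative values only
rows = [
    ("CHART A", "cartesian_coordinate", 1, 0),
    ("CHART A", "aggregate_max", 1, 4),
    ("CHART A", "ordinal_x", 2, 1),
    ("CHART B", "cartesian_coordinate", 1, 0),
    ("CHART B", "ordinal_x", 1, 1),
    ("CHART B", "aggregate_min", 0, 4),
]

# Assumed cost model: every violation contributes its preference's weight
costs = defaultdict(int)
for chart_name, _pref_name, n, weight in rows:
    costs[chart_name] += n * weight

# Same counting logic as display_debug_data: distinct preferences with count != 0
violated = {pref for _chart, pref, n, _w in rows if n != 0}
all_prefs = {pref for _chart, pref, _n, _w in rows}

print(dict(costs))
print(f"{len(violated)} of {len(all_prefs)} preferences are violated")
```

Note that a preference with weight 0 still counts as violated in the summary even though it adds nothing to the cost, which matches how several zero-weight rows appear in the table above.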
Generating Input Specifications Programmatically
Exploring more possibilities within the design space
To get a better impression of the space of possible visualizations and to produce examples that might be covered by more soft constraints, we can programmatically generate further input specifications. We define a list of possible values for the mark type, fields and encoding channels that we want to be used in the recommendations and combine them using a nested list comprehension. We also filter out designs with fewer than 3 encodings and exclude multi-layer designs for now.
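The combination step can be sketched in isolation with the standard library. The value lists below are the ones used later in this example; the rest is plain Python showing that the nested comprehension enumerates the Cartesian product of the three lists, so it matches `itertools.product`.

```python
from itertools import product

marks = ["point", "bar", "line", "rect"]
fields = ["weather", "temp_min", "date"]
encoding_channels = ["color", "shape", "size"]

# Nested comprehension: the leftmost `for` varies slowest
nested = [
    (mark, field, enc_ch)
    for mark in marks
    for field in fields
    for enc_ch in encoding_channels
]

# Equivalent formulation with itertools.product
combos = list(product(marks, fields, encoding_channels))

print(len(combos), combos[0], combos[-1])
```

Each of the resulting tuples becomes one partial specification. Not every combination has to yield a chart, since the solver may find no valid design for some of them.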
We set off by creating the helper function rec_from_generated_spec to avoid code duplication as we iterate on designs.
def rec_from_generated_spec(
    marks: list[str],
    fields: list[str],
    encoding_channels: list[str],
    draco: drc.Draco,
    num: int = 1,
) -> dict[str, tuple[list[str], dict]]:
    input_specs = [
        (
            (mark, field, enc_ch),
            input_spec_base
            + [
                f"attribute((mark,type),m0,{mark}).",
                "entity(encoding,m0,e0).",
                f"attribute((encoding,field),e0,{field}).",
                f"attribute((encoding,channel),e0,{enc_ch}).",
                # filter out designs with fewer than 3 encodings
                ":- {entity(encoding,_,_)} < 3.",
                # exclude multi-layer designs
                ":- {entity(mark,_,_)} != 1.",
            ],
        )
        for mark in marks
        for field in fields
        for enc_ch in encoding_channels
    ]
    recs = {}
    for cfg, spec in input_specs:

        def labeler(i):
            # The label must be returned; otherwise every chart is named None
            return f"CHART {i + 1} ({' | '.join(cfg)})"

        recs = recs | recommend_charts(spec=spec, draco=draco, num=num, labeler=labeler)
    return recs
recommendations = rec_from_generated_spec(
    marks=["point", "bar", "line", "rect"],
    fields=["weather", "temp_min", "date"],
    encoding_channels=["color", "shape", "size"],
    draco=d,
)
None
COST: [25]
None
COST: [28]
None
COST: [30]
None
COST: [27]
None
COST: [41]
None
COST: [28]
None
COST: [42]
None
COST: [19]
None
COST: [25]
None
COST: [27]
None
COST: [28]
None
COST: [45]
None
COST: [47]
None
COST: [48]
None
COST: [73]
None
COST: [39]
None
COST: [42]
It is no secret that some of the above recommendations are not very useful when it comes to communicating the data. Nevertheless, they are valid visualizations from the space of possibilities. Following the already introduced workflow, we can use DracoDebug to investigate the soft constraint violations of the generated recommendations. If there are recommendations we are not happy with, we can extend the knowledge base to cover them so that they do not appear in the future.
display_debug_data(draco=d, specs=recommendations)
Raw debug data
| | chart_name | pref_name | pref_description | count | weight |
|---|---|---|---|---|---|
| 0 | None | cartesian_coordinate | Cartesian coordinates. | 1 | 0 |
| 1 | None | summary_rect | Rect mark for summary tasks. | 1 | 0 |
| 2 | None | aggregate_min | Min as aggregate op. | 1 | 4 |
| 3 | None | ordinal_y | Ordinal scale with y channel. | 1 | 0 |
| 4 | None | ordinal_x | Ordinal scale with x channel. | 1 | 1 |
Number of violated preferences
15 preferences are violated out of a total of 151 preferences (soft constraints)
Using DracoDebugPlotter to visualize the debug DataFrame produced by DracoDebug:
Adjusting the Knowledge Base
Filtering out suboptimal designs by creating a new soft constraint and tuning its weight
As apparent from the above-generated recommendations, there are some visualizations that are valid but not as expressive as we would desire. As a concrete example, the recommendations CHART 1 (rect | weather | color) and CHART 1 (rect | date | color) used row-faceting and no rules in our knowledge base penalised them for doing so.
We demonstrate how we can extend the knowledge base with a design rule (soft constraint) to discourage using faceting with rect mark and color encoding and how to tune its weight to achieve more desirable recommendations.
We start by creating the helper function draco_with_updated_kb, which returns a Draco instance with the updated knowledge base.
We extend the knowledge base with a new preference (soft constraint) called rect_color_facet to discourage faceting with rect mark and color encoding. To explore how the recommendations change as we assign different weights to the new soft constraint, we parameterize this function to accept pref_weight as an argument. This weight will be associated with the rect_color_facet preference we extend the knowledge base with.
def draco_with_updated_kb(pref_weight: int) -> drc.Draco:
    # Custom soft constraint to discourage faceting with rect mark and color encoding
    rect_color_facet_pref = """
% @soft(rect_color_facet) Faceting with rect mark and color encoding.
preference(rect_color_facet,Fa) :-
    attribute((mark,type),_,rect),
    attribute((encoding,channel),_,color),
    attribute((facet,channel),Fa,_).
""".strip()
    rect_color_facet_pref_weight = pref_weight
    # Update the default soft constraint knowledge base (program)
    soft_updated = drc.Draco().soft + f"\n\n{rect_color_facet_pref}\n\n"
    # Assign the weight to the new soft constraint
    weights_updated = drc.Draco().weights | {
        "rect_color_facet_weight": rect_color_facet_pref_weight
    }
    return drc.Draco(soft=soft_updated, weights=weights_updated)
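Two naming conventions in the function above are easy to get wrong: the `% @soft(<name>) <description>` comment that precedes the rule, and the `<name>_weight` key in the weights dictionary. The sketch below checks both with the standard library. It assumes that the annotation comment is what supplies the preference name and description seen in the debug data; the regular expression and the helper `soft_annotations` are hypothetical and not Draco's own parser.

```python
import re

# Assumed annotation shape: "% @soft(<name>) <description>"
SOFT_ANNOTATION = re.compile(r"%\s*@soft\((?P<name>\w+)\)[ \t]*(?P<description>.*)")


def soft_annotations(program: str) -> dict[str, str]:
    """Map every annotated preference name to its one-line description."""
    return {
        m["name"]: m["description"].strip()
        for m in SOFT_ANNOTATION.finditer(program)
    }


program = """
% @soft(rect_color_facet) Faceting with rect mark and color encoding.
preference(rect_color_facet,Fa) :-
    attribute((mark,type),_,rect),
    attribute((encoding,channel),_,color),
    attribute((facet,channel),Fa,_).
""".strip()

annotations = soft_annotations(program)
weight_keys = [f"{name}_weight" for name in annotations]
print(annotations)
print(weight_keys)
```

A mismatch between the name in the annotation, the name in the rule head and the weight key is a likely reason for a new preference not showing up as expected, so a check like this can be run before building the updated Draco instance.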
As opposed to the previous example, we only generate specifications for the rect mark, the weather and date fields and the color encoding channel, since we observed the undesired faceted recommendations for these configurations. We explore how the recommendations change as we assign a higher weight value (that is, a higher penalty) to the new soft constraint.
Verifying That the Knowledge Base Got Updated
First, to validate that the rect_color_facet soft constraint we created got registered properly in our knowledge base, we start with a weight of 0. We expect to obtain the same faceted recommendations, but we also expect to see in the plot created in display_debug_data by DracoDebugPlotter that the faceted recommendations violate the design preference we defined.
weight = 0
display(Markdown(f"**Weight for `rect_color_facet` preference: {weight}**"))
updated_draco = draco_with_updated_kb(pref_weight=weight)
recommendations = rec_from_generated_spec(
    marks=["rect"],
    fields=["weather", "date"],
    encoding_channels=["color"],
    draco=updated_draco,
)
display_debug_data(draco=updated_draco, specs=recommendations)
Weight for rect_color_facet preference: 0
None
COST: [73]
None
COST: [42]
Raw debug data
| | chart_name | pref_name | pref_description | count | weight |
|---|---|---|---|---|---|
| 0 | None | cartesian_coordinate | Cartesian coordinates. | 1 | 0 |
| 1 | None | summary_rect | Rect mark for summary tasks. | 1 | 0 |
| 2 | None | aggregate_max | Max as aggregate op. | 1 | 4 |
| 3 | None | ordinal_y | Ordinal scale with y channel. | 1 | 0 |
| 4 | None | ordinal_x | Ordinal scale with x channel. | 1 | 1 |
Number of violated preferences
15 preferences are violated out of a total of 152 preferences (soft constraints)
Using DracoDebugPlotter to visualize the debug DataFrame produced by DracoDebug:
As expected, our debug plot indicates in the heatmap’s 6th column that both CHART 1 (rect | weather | color) and CHART 1 (rect | date | color) violate the rect_color_facet preference we introduced. Now we can work on tuning the weight associated with this rule, so that we actually penalize the usage of faceting when we have a rect mark and color as the encoding channel.
Weight Tuning
We increase the weight from 0 to 10. We expect this penalty to be sufficient for the Clingo solver to find a model with a lower cost that does not violate the rect_color_facet design rule we extended our knowledge base with.
weight = 10
display(Markdown(f"**Weight for `rect_color_facet` preference: {weight}**"))
updated_draco = draco_with_updated_kb(pref_weight=weight)
recommendations = rec_from_generated_spec(
    marks=["rect"],
    fields=["weather", "date"],
    encoding_channels=["color"],
    draco=updated_draco,
)
display_debug_data(draco=updated_draco, specs=recommendations)
Weight for rect_color_facet preference: 10
None
COST: [73]
None
COST: [42]
Raw debug data
| | chart_name | pref_name | pref_description | count | weight |
|---|---|---|---|---|---|
| 0 | None | cartesian_coordinate | Cartesian coordinates. | 1 | 0 |
| 1 | None | summary_rect | Rect mark for summary tasks. | 1 | 0 |
| 2 | None | aggregate_max | Max as aggregate op. | 1 | 4 |
| 3 | None | ordinal_y | Ordinal scale with y channel. | 1 | 0 |
| 4 | None | ordinal_x | Ordinal scale with x channel. | 1 | 1 |
Number of violated preferences
15 preferences are violated out of a total of 152 preferences (soft constraints)
Using DracoDebugPlotter to visualize the debug DataFrame produced by DracoDebug:
Just as expected, thanks to the higher weight assigned to the newly added rect_color_facet rule, we don’t see recommendations using faceting when a rect mark and color encoding are used. One can use the very same process to tailor the knowledge base and fine-tune the constraint weights to obtain more expressive visualization recommendations.
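Why raising a weight changes which design wins can be illustrated with a toy cost model. The sketch below assumes that the solver picks the candidate with the lowest weighted sum of violations. The two candidates, the preference named `base` and all numbers are hypothetical and chosen only to show the tipping point; they are not taken from the runs above.

```python
def cost(violations, weights):
    """Weighted sum of violation counts (assumed cost model)."""
    return sum(n * weights[pref] for pref, n in violations.items())


def best_design(candidates, weights):
    """Name of the candidate with the lowest cost."""
    return min(candidates, key=lambda name: cost(candidates[name], weights))


# Hypothetical candidates: the faceted one is cheaper unless faceting is penalized
candidates = {
    "faceted": {"base": 42, "rect_color_facet": 1},
    "unfaceted": {"base": 47},
}

winners = {}
for w in (0, 5, 10):
    weights = {"base": 1, "rect_color_facet": w}
    winners[w] = best_design(candidates, weights)

print(winners)
```

In this toy setting the faceted design stays optimal until the penalty exceeds the cost gap between the two candidates (at a tie, `min` keeps the first candidate), which is why a weight of 0 only makes the violation visible while a sufficiently large weight changes the recommendation.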