Using Draco for Visualization Design Space Exploration

To help verify, debug, and tune the recommendation results, we provide general guidelines and apply them, together with Draco's debugging features, in the following demonstration.

In this example we will use Draco to explore the visualization design space for the Seattle weather dataset. Starting with nothing but a raw dataset, we are going to use the reusable building blocks that Draco provides to generate a wide space of recommendations, and we will investigate the produced designs using the debugger module.

# Display utilities
import json

import numpy as np
from IPython.display import Markdown, display


# Handles serialization of common numpy datatypes
class NpEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        else:
            return super(NpEncoder, self).default(obj)


def md(markdown: str):
    display(Markdown(markdown))


def pprint(obj):
    md(f"```json\n{json.dumps(obj, indent=2, cls=NpEncoder)}\n```")

Loading the Data

We will use the Seattle weather dataset from the Vega Datasets for this example.

import altair as alt
import pandas as pd
from vega_datasets import data as vega_data

import draco as drc

# Loading data to be explored
df: pd.DataFrame = vega_data.seattle_weather()
df.head()
         date  precipitation  temp_max  temp_min  wind  weather
0  2012-01-01            0.0      12.8       5.0   4.7  drizzle
1  2012-01-02           10.9      10.6       2.8   4.5     rain
2  2012-01-03            0.8      11.7       7.2   2.3     rain
3  2012-01-04           20.3      12.2       5.6   4.7     rain
4  2012-01-05            1.3       8.9       2.8   6.1     rain

We can use the schema_from_dataframe function to generate the schema of the dataset, including the data types of each column and their statistical properties.

data_schema = drc.schema_from_dataframe(df)
pprint(data_schema)
{
  "number_rows": 1461,
  "field": [
    {
      "name": "date",
      "type": "datetime",
      "unique": 1461,
      "entropy": 7287
    },
    {
      "name": "precipitation",
      "type": "number",
      "unique": 111,
      "entropy": 2422,
      "min": 0,
      "max": 55,
      "std": 6
    },
    {
      "name": "temp_max",
      "type": "number",
      "unique": 67,
      "entropy": 3934,
      "min": -1,
      "max": 35,
      "std": 7
    },
    {
      "name": "temp_min",
      "type": "number",
      "unique": 55,
      "entropy": 3596,
      "min": -7,
      "max": 18,
      "std": 5
    },
    {
      "name": "wind",
      "type": "number",
      "unique": 79,
      "entropy": 3950,
      "min": 0,
      "max": 9,
      "std": 1
    },
    {
      "name": "weather",
      "type": "string",
      "unique": 5,
      "entropy": 1201,
      "freq": 714
    }
  ]
}
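As a sanity check on the statistics above, the `entropy` reported for `date` (7287) is consistent with a natural-log Shannon entropy that has been multiplied by 1000 and rounded, since ln(1461) is roughly 7.287 for 1461 distinct values. This scaling is inferred from the printed numbers, not taken from Draco's source, and `scaled_entropy` below is a hypothetical helper rather than a Draco function.

```python
import math
from collections import Counter


def scaled_entropy(values) -> int:
    """Natural-log Shannon entropy of the value distribution, times 1000, rounded.

    Hypothetical helper; the x1000 scaling is an assumption inferred from the
    schema output, not a documented Draco behavior.
    """
    counts = Counter(values)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return round(entropy * 1000)


# 1461 distinct values, like one date per row in the Seattle weather data
print(scaled_entropy(range(1461)))  # 7287
```

A column whose values are all distinct reaches the maximum entropy for its length, which is why `date` has the highest value in the schema.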

We transform the data schema into a set of facts that Draco can use to reason about the data when generating recommendations. To do so, we use the dict_to_facts function, which takes a dictionary and returns a list of facts. The output list encodes the same information as the input dictionary; it is just a different representation that we can feed into Clingo under the hood.

data_schema_facts = drc.dict_to_facts(data_schema)
pprint(data_schema_facts)
[
  "attribute(number_rows,root,1461).",
  "entity(field,root,0).",
  "attribute((field,name),0,date).",
  "attribute((field,type),0,datetime).",
  "attribute((field,unique),0,1461).",
  "attribute((field,entropy),0,7287).",
  "entity(field,root,1).",
  "attribute((field,name),1,precipitation).",
  "attribute((field,type),1,number).",
  "attribute((field,unique),1,111).",
  "attribute((field,entropy),1,2422).",
  "attribute((field,min),1,0).",
  "attribute((field,max),1,55).",
  "attribute((field,std),1,6).",
  "entity(field,root,2).",
  "attribute((field,name),2,temp_max).",
  "attribute((field,type),2,number).",
  "attribute((field,unique),2,67).",
  "attribute((field,entropy),2,3934).",
  "attribute((field,min),2,-1).",
  "attribute((field,max),2,35).",
  "attribute((field,std),2,7).",
  "entity(field,root,3).",
  "attribute((field,name),3,temp_min).",
  "attribute((field,type),3,number).",
  "attribute((field,unique),3,55).",
  "attribute((field,entropy),3,3596).",
  "attribute((field,min),3,-7).",
  "attribute((field,max),3,18).",
  "attribute((field,std),3,5).",
  "entity(field,root,4).",
  "attribute((field,name),4,wind).",
  "attribute((field,type),4,number).",
  "attribute((field,unique),4,79).",
  "attribute((field,entropy),4,3950).",
  "attribute((field,min),4,0).",
  "attribute((field,max),4,9).",
  "attribute((field,std),4,1).",
  "entity(field,root,5).",
  "attribute((field,name),5,weather).",
  "attribute((field,type),5,string).",
  "attribute((field,unique),5,5).",
  "attribute((field,entropy),5,1201).",
  "attribute((field,freq),5,714)."
]
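To make the mapping from dictionary to facts more concrete, here is a minimal sketch that reproduces the pattern visible in the output above: scalar values become `attribute` facts, and each list item becomes an `entity` fact with a numeric id, followed by the attributes of that item. The function `sketch_dict_to_facts` is a simplified illustration written for this example; it is not Draco's implementation, and the real `dict_to_facts` may handle cases this sketch ignores.

```python
from itertools import count


def sketch_dict_to_facts(data: dict) -> list[str]:
    """Flatten a nested dictionary into entity/attribute fact strings.

    Illustrative only: mimics the output format shown above for simple inputs.
    """
    ids = count()  # running counter used as entity identifiers
    facts: list[str] = []

    def walk(node: dict, parent_id, entity_type):
        for key, value in node.items():
            if isinstance(value, list):
                # Each list item is an entity that belongs to the current parent
                for child in value:
                    child_id = next(ids)
                    facts.append(f"entity({key},{parent_id},{child_id}).")
                    walk(child, child_id, key)
            else:
                # Top-level attributes use the bare key, nested ones a (type,key) pair
                name = key if entity_type is None else f"({entity_type},{key})"
                facts.append(f"attribute({name},{parent_id},{value}).")

    walk(data, "root", None)
    return facts


tiny_schema = {
    "number_rows": 1461,
    "field": [{"name": "date", "type": "datetime"}],
}
for fact in sketch_dict_to_facts(tiny_schema):
    print(fact)
```

Running the sketch on `tiny_schema` prints the same four leading facts as the real output above, minus the statistics we left out.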

Iterating the partial specification query

Generating recommendations from a minimal input

We start by defining input_spec_base which is a list of facts including the data schema, a single view and a single mark. This is the minimal set of facts that Draco needs to generate recommendations which can be rendered into charts.

We instantiate a Draco object, using the default knowledge base, and an AltairRenderer object which will be used to render the recommendations into Vega-Lite charts.

from draco.renderer import AltairRenderer

input_spec_base = data_schema_facts + [
    "entity(view,root,v0).",
    "entity(mark,v0,m0).",
]
d = drc.Draco()
renderer = AltairRenderer()

We can now use the complete_spec method of the Draco object to generate recommendations from incomplete specifications. The function below is a reusable utility for this example, responsible for generating, rendering and displaying the recommendations.

def recommend_charts(
    spec: list[str], draco: drc.Draco, num: int = 5, labeler=lambda i: f"CHART {i+1}"
) -> dict[str, tuple[list[str], dict]]:
    # Dictionary to store the generated recommendations, keyed by chart name
    chart_specs = {}
    for i, model in enumerate(draco.complete_spec(spec, num)):
        chart_name = labeler(i)
        # Use a separate name so the `spec` parameter is not shadowed
        chart_spec = drc.answer_set_to_dict(model.answer_set)
        chart_specs[chart_name] = drc.dict_to_facts(chart_spec), chart_spec

        print(chart_name)
        print(f"COST: {model.cost}")
        chart = renderer.render(spec=chart_spec, data=df)
        # Adjust column-faceted chart size
        if (
            isinstance(chart, alt.FacetChart)
            and chart.facet.column is not alt.Undefined
        ):
            chart = chart.configure_view(continuousWidth=130, continuousHeight=130)
        display(chart)

    return chart_specs

We are using input_spec_base as the starting point for our exploration, that is, we are only specifying the data schema, and that we want the recommendations to have at least one view and one mark.

input_spec = input_spec_base
initial_recommendations = recommend_charts(spec=input_spec, draco=d)
CHART 1
COST: [3]
CHART 2
COST: [4]
CHART 3
COST: [4]
CHART 4
COST: [4]
CHART 5
COST: [5]

While the above recommendations are valid, they are not very diverse. The first two recommendations also look nearly identical, yet they have different costs. We explore this behavior below by inspecting the Draco specifications of the first two charts.

chart_1_key, chart_2_key = "CHART 1", "CHART 2"
(_, chart_1), (_, chart_2) = (
    initial_recommendations[chart_1_key],
    initial_recommendations[chart_2_key],
)

md(f"**Draco Specification of {chart_1_key}**")
pprint(chart_1)

md(f"**Draco Specification of {chart_2_key}**")
pprint(chart_2)

Draco Specification of CHART 1

{
  "number_rows": 1461,
  "task": "summary",
  "field": [
    {
      "name": "date",
      "type": "datetime",
      "unique": 1461,
      "entropy": 7287
    },
    {
      "name": "precipitation",
      "type": "number",
      "unique": 111,
      "entropy": 2422,
      "min": 0,
      "max": 55,
      "std": 6
    },
    {
      "name": "temp_max",
      "type": "number",
      "unique": 67,
      "entropy": 3934,
      "min": -1,
      "max": 35,
      "std": 7
    },
    {
      "name": "temp_min",
      "type": "number",
      "unique": 55,
      "entropy": 3596,
      "min": -7,
      "max": 18,
      "std": 5
    },
    {
      "name": "wind",
      "type": "number",
      "unique": 79,
      "entropy": 3950,
      "min": 0,
      "max": 9,
      "std": 1
    },
    {
      "name": "weather",
      "type": "string",
      "unique": 5,
      "entropy": 1201,
      "freq": 714
    }
  ],
  "view": [
    {
      "coordinates": "cartesian",
      "mark": [
        {
          "type": "bar",
          "encoding": [
            {
              "channel": "x",
              "aggregate": "count"
            }
          ]
        }
      ],
      "scale": [
        {
          "type": "linear",
          "channel": "x",
          "zero": "true"
        }
      ]
    }
  ]
}

Draco Specification of CHART 2

{
  "number_rows": 1461,
  "task": "summary",
  "field": [
    {
      "name": "date",
      "type": "datetime",
      "unique": 1461,
      "entropy": 7287
    },
    {
      "name": "precipitation",
      "type": "number",
      "unique": 111,
      "entropy": 2422,
      "min": 0,
      "max": 55,
      "std": 6
    },
    {
      "name": "temp_max",
      "type": "number",
      "unique": 67,
      "entropy": 3934,
      "min": -1,
      "max": 35,
      "std": 7
    },
    {
      "name": "temp_min",
      "type": "number",
      "unique": 55,
      "entropy": 3596,
      "min": -7,
      "max": 18,
      "std": 5
    },
    {
      "name": "wind",
      "type": "number",
      "unique": 79,
      "entropy": 3950,
      "min": 0,
      "max": 9,
      "std": 1
    },
    {
      "name": "weather",
      "type": "string",
      "unique": 5,
      "entropy": 1201,
      "freq": 714
    }
  ],
  "view": [
    {
      "coordinates": "cartesian",
      "mark": [
        {
          "type": "bar",
          "encoding": [
            {
              "channel": "y",
              "aggregate": "count"
            }
          ]
        }
      ],
      "scale": [
        {
          "zero": "true",
          "channel": "y",
          "type": "linear"
        }
      ]
    }
  ]
}

Taking a good look at the specifications printed above, we can see that both charts share the same "task" value ("summary"), the same bar mark and the same count aggregate. They differ only in the positional channel: CHART 1 places the count on the x channel, while CHART 2 places it on the y channel, and the channel of the linear scale changes accordingly. The constraints in the default Draco knowledge base weigh these two choices slightly differently, which is why the logical solver assigns slightly different costs to the two specifications.

We can extend the input specification to better describe the design space we want to see recommendations for, and thereby get more diverse results. Let’s say we want the fields date and temp_max of the weather dataset to be encoded in the charts. We also specify that we want a column-faceted chart. Note that we specify neither the mark type, nor the encoding channels for the two fields, nor the field to facet by. We leave these decisions to Draco, based on its underlying knowledge base.

input_spec = input_spec_base + [
    # We want to encode the `date` field
    "entity(encoding,m0,e0).",
    "attribute((encoding,field),e0,date).",
    # We want to encode the `temp_max` field
    "entity(encoding,m0,e1).",
    "attribute((encoding,field),e1,temp_max).",
    # We want the chart to be a faceted chart
    "entity(facet,v0,f0).",
    "attribute((facet,channel),f0,col).",
]
recommendations = recommend_charts(spec=input_spec, draco=d, num=5)
CHART 1
COST: [16]
CHART 2
COST: [16]
CHART 3
COST: [17]
CHART 4
COST: [17]
CHART 5
COST: [17]
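When the list of fields to encode grows, writing the `entity`/`attribute` pairs by hand becomes repetitive. The following sketch generates them from a list of field names. The helper `encoding_facts` and its `mark_id` parameter are hypothetical names introduced for this illustration; they are not part of Draco's API.

```python
def encoding_facts(fields: list[str], mark_id: str = "m0") -> list[str]:
    """Build one encoding entity per field, attached to the given mark.

    Hypothetical helper: only the field is fixed, the channel is left open
    so that the solver can choose it.
    """
    facts = []
    for i, field in enumerate(fields):
        enc_id = f"e{i}"
        facts.append(f"entity(encoding,{mark_id},{enc_id}).")
        facts.append(f"attribute((encoding,field),{enc_id},{field}).")
    return facts


for fact in encoding_facts(["date", "temp_max"]):
    print(fact)
```

The printed facts match the four encoding facts written out manually in the input specification above, so the result could be appended to input_spec_base in the same way.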

Inspecting the Knowledge Base

Debugging the recommendations

We can use the DracoDebug class to investigate the recommendations generated by Draco and to see which soft constraints they violate. We start by instantiating a DracoDebug object, passing the recommendations and the Draco object used to generate them. Its chart_preferences attribute is a DataFrame that lists, for each recommendation and each soft constraint, how often the constraint is violated (count) and the weight associated with it.

# Parameterized helper to avoid code duplication as we iterate on designs
def display_debug_data(draco: drc.Draco, specs: dict[str, tuple[list[str], dict]]):
    debugger = drc.DracoDebug(
        specs={chart_name: fact_spec for chart_name, (fact_spec, _) in specs.items()},
        draco=draco,
    )
    chart_preferences = debugger.chart_preferences
    display(Markdown("**Raw debug data**"))
    display(chart_preferences.head())

    display(Markdown("**Number of violated preferences**"))
    num_violations = len(
        set(chart_preferences[chart_preferences["count"] != 0]["pref_name"])
    )
    num_all = len(set(chart_preferences["pref_name"]))
    display(
        Markdown(
            f"*{num_violations} preferences are violated out of a total of {num_all} preferences (soft constraints)*"
        )
    )

    display(
        Markdown(
            "Using `DracoDebugPlotter` to visualize the debug `DataFrame` produced by `DracoDebug`:"
        )
    )
    plotter = drc.DracoDebugPlotter(chart_preferences)
    plot_size = (600, 300)
    chart = plotter.create_chart(
        cfg=drc.DracoDebugChartConfig.SORT_BY_COUNT_SUM,
        violated_prefs_only=True,
        plot_size=plot_size,
    )
    display(chart)

display_debug_data(draco=d, specs=recommendations)

Raw debug data

   chart_name  pref_name             pref_description                          count  weight
0  CHART 1     cartesian_coordinate  Cartesian coordinates.                    1      0
1  CHART 1     summary_point         Point mark for summary tasks.             1      0
2  CHART 1     linear_y              Linear scale with y channel.              1      0
3  CHART 1     linear_x              Linear scale with x channel.              1      0
4  CHART 1     c_c_point             Continuous by continuous for point mark.  1      0

Number of violated preferences

18 preferences are violated out of a total of 147 preferences (soft constraints)

Using DracoDebugPlotter to visualize the debug DataFrame produced by DracoDebug:

Generating Input Specifications Programmatically

Exploring more possibilities within the design space

To get a better impression of the space of possible visualizations and to produce examples that might be covered by more soft constraints, we can programmatically generate further input specifications. We define a list of possible values for the mark type, fields and encoding channels that we want to be used in the recommendations and combine them using a nested list comprehension. We also filter out designs with fewer than 3 encodings and exclude multi-layer designs for now.
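To get a feeling for how many input specifications such a combination produces, the sketch below enumerates the configurations with `itertools.product`, which yields the same tuples, in the same order, as three nested `for` clauses in a comprehension. The value lists are the ones used later in this example. Not every configuration has to lead to a chart, since the solver may find no valid design for some of them.

```python
from itertools import product

marks = ["point", "bar", "line", "rect"]
fields = ["weather", "temp_min", "date"]
encoding_channels = ["color", "shape", "size"]

# Cartesian product: one configuration per (mark, field, channel) triple
configs = list(product(marks, fields, encoding_channels))
labels = [f"CHART 1 ({' | '.join(cfg)})" for cfg in configs]

print(len(configs))  # 36
print(labels[0])  # CHART 1 (point | weather | color)
```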

We set off by creating the helper function rec_from_generated_spec to avoid code duplication as we iterate on designs.

def rec_from_generated_spec(
    marks: list[str],
    fields: list[str],
    encoding_channels: list[str],
    draco: drc.Draco,
    num: int = 1,
) -> dict[str, tuple[list[str], dict]]:
    input_specs = [
        (
            (mark, field, enc_ch),
            input_spec_base
            + [
                f"attribute((mark,type),m0,{mark}).",
                "entity(encoding,m0,e0).",
                f"attribute((encoding,field),e0,{field}).",
                f"attribute((encoding,channel),e0,{enc_ch}).",
                # filter out designs with fewer than 3 encodings
                ":- {entity(encoding,_,_)} < 3.",
                # exclude multi-layer designs
                ":- {entity(mark,_,_)} != 1.",
            ],
        )
        for mark in marks
        for field in fields
        for enc_ch in encoding_channels
    ]
    recs = {}
    for cfg, spec in input_specs:

        def labeler(i):
            # Label each chart with the configuration that generated it
            return f"CHART {i + 1} ({' | '.join(cfg)})"

        recs = recs | recommend_charts(spec=spec, draco=draco, num=num, labeler=labeler)

    return recs


recommendations = rec_from_generated_spec(
    marks=["point", "bar", "line", "rect"],
    fields=["weather", "temp_min", "date"],
    encoding_channels=["color", "shape", "size"],
    draco=d,
)
None
COST: [25]
None
COST: [28]
None
COST: [30]
None
COST: [27]
None
COST: [41]
None
COST: [28]
None
COST: [42]
None
COST: [19]
None
COST: [25]
None
COST: [27]
None
COST: [28]
None
COST: [45]
None
COST: [47]
None
COST: [48]
None
COST: [71]
None
COST: [39]
None
COST: [40]

It is no secret that some of the above recommendations are not very useful when it comes to communicating the data. Nevertheless, they are valid visualizations from the space of possibilities. Following the already introduced workflow, we can use DracoDebug to investigate the soft constraint violations of the generated recommendations. If there are recommendations we are not happy with, we can extend the knowledge base to cover them so that they do not appear in the future.

display_debug_data(draco=d, specs=recommendations)

Raw debug data

   chart_name  pref_name             pref_description                  count  weight
0  None        cartesian_coordinate  Cartesian coordinates.            1      0
1  None        summary_rect          Rect mark for summary tasks.      1      0
2  None        ordinal_y             Ordinal scale with y channel.     1      0
3  None        ordinal_x             Ordinal scale with x channel.     1      1
4  None        linear_color          Linear scale with color channel.  1      10

Number of violated preferences

17 preferences are violated out of a total of 147 preferences (soft constraints)
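The count and weight columns can be related to the cost reported for each chart. The sketch below assumes that the cost is the sum of count times weight over all preferences, which is our reading of how the solver aggregates soft constraint violations rather than something stated in this example. It uses plain dictionaries instead of a DataFrame, and only the five rows shown above, so the result is a partial contribution and not the full cost of any chart.

```python
# The five rows from the raw debug data shown above
rows = [
    {"pref_name": "cartesian_coordinate", "count": 1, "weight": 0},
    {"pref_name": "summary_rect", "count": 1, "weight": 0},
    {"pref_name": "ordinal_y", "count": 1, "weight": 0},
    {"pref_name": "ordinal_x", "count": 1, "weight": 1},
    {"pref_name": "linear_color", "count": 1, "weight": 10},
]


def partial_cost(rows: list[dict]) -> int:
    """Sum of count * weight; assumes costs are additive over preferences."""
    return sum(row["count"] * row["weight"] for row in rows)


def violated(rows: list[dict]) -> list[str]:
    """Names of preferences with a non-zero count, as in display_debug_data."""
    return sorted({row["pref_name"] for row in rows if row["count"] != 0})


print(partial_cost(rows))  # 11
print(len(violated(rows)))  # 5
```

Note that a preference with weight 0 still counts as violated here: it shows up in the debug data but does not contribute to the cost.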

Using DracoDebugPlotter to visualize the debug DataFrame produced by DracoDebug:

Adjusting the Knowledge Base

Filtering out suboptimal designs by creating a new soft constraint and tuning its weight

As is apparent from the recommendations generated above, some visualizations are valid but not as expressive as we would like. As a concrete example, the recommendations CHART 1 (rect | weather | color) and CHART 1 (rect | date | color) used row-faceting, and no rule in our knowledge base penalized them for doing so.

We demonstrate how we can extend the knowledge base with a design rule (soft constraint) to discourage using faceting with rect mark and color encoding and how to tune its weight to achieve more desirable recommendations.

We start by creating the helper function draco_with_updated_kb, to return a Draco instance with the updated knowledge base. We extend the knowledge base with a new preference (soft constraint) called rect_color_facet to discourage faceting with rect mark and color encoding. To explore how the recommendations change as we assign different weights to the new soft constraint, we parameterize this function to accept pref_weight as an argument. This weight will be associated with the rect_color_facet preference we extend the knowledge base with.

def draco_with_updated_kb(pref_weight: int) -> drc.Draco:
    # Custom soft constraint to discourage faceting with rect mark and color encoding
    rect_color_facet_pref = """
    % @soft(rect_color_facet) Faceting with rect mark and color encoding.
    preference(rect_color_facet,Fa) :-
        attribute((mark,type),_,rect),
        attribute((encoding,channel),_,color),
        attribute((facet,channel),Fa,_).
    """.strip()
    rect_color_facet_pref_weight = pref_weight

    # Update the default soft constraint knowledge base (program)
    soft_updated = drc.Draco().soft + f"\n\n{rect_color_facet_pref}\n\n"
    # Assign the weight to the new soft constraint
    weights_updated = drc.Draco().weights | {
        "rect_color_facet_weight": rect_color_facet_pref_weight
    }
    return drc.Draco(soft=soft_updated, weights=weights_updated)
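The weight update above relies on the dictionary merge operator `|` (Python 3.9 and later), which returns a new dictionary and lets the right-hand operand win when a key exists on both sides. The sketch below shows this behavior on plain dictionaries. The entries in `default_weights` are a made-up subset for illustration, and `with_pref_weight` is a hypothetical helper that follows the `<preference name>_weight` key pattern used above.

```python
def with_pref_weight(weights: dict[str, int], pref_name: str, weight: int) -> dict[str, int]:
    """Return a copy of `weights` with the weight for `pref_name` set.

    Hypothetical helper; the original dictionary is left untouched.
    """
    return weights | {f"{pref_name}_weight": weight}


# Made-up subset standing in for a default weight dictionary
default_weights = {"linear_color_weight": 10, "ordinal_x_weight": 1}

registered = with_pref_weight(default_weights, "rect_color_facet", 0)
retuned = with_pref_weight(registered, "rect_color_facet", 10)

print(registered["rect_color_facet_weight"])  # 0
print(retuned["rect_color_facet_weight"])  # 10
print("rect_color_facet_weight" in default_weights)  # False
```

Because the merge never mutates its inputs, each call to a function like draco_with_updated_kb can start from a fresh set of default weights.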

As opposed to the previous example, we only generate specifications for the rect mark, the weather and date fields and the color encoding channel, since we observed the undesired faceted recommendations for these configurations. We explore how the recommendations change as we assign a higher weight value (that is, a higher penalty) to the new soft constraint.

Verifying That the Knowledge Base Got Updated

First, to validate that the rect_color_facet soft constraint we created is registered properly in our knowledge base, we start with a weight of 0. We expect to obtain the same faceted recommendations, but we also expect the plot that DracoDebugPlotter creates in display_debug_data to show that the faceted recommendations violate the design preference we defined.

weight = 0
display(Markdown(f"**Weight for `rect_color_facet` preference: {weight}**"))
updated_draco = draco_with_updated_kb(pref_weight=weight)
recommendations = rec_from_generated_spec(
    marks=["rect"],
    fields=["weather", "date"],
    encoding_channels=["color"],
    draco=updated_draco,
)
display_debug_data(draco=updated_draco, specs=recommendations)

Weight for rect_color_facet preference: 0

None
COST: [71]
None
COST: [40]

Raw debug data

   chart_name  pref_name             pref_description                             count  weight
0  None        rect_color_facet      Faceting with rect mark and color encoding.  1      0
1  None        cartesian_coordinate  Cartesian coordinates.                       1      0
2  None        summary_rect          Rect mark for summary tasks.                 1      0
3  None        ordinal_y             Ordinal scale with y channel.                1      0
4  None        ordinal_x             Ordinal scale with x channel.                1      1

Number of violated preferences

18 preferences are violated out of a total of 148 preferences (soft constraints)

Using DracoDebugPlotter to visualize the debug DataFrame produced by DracoDebug:

As expected, our debug plot indicates in the heatmap’s 6th column that both CHART 1 (rect | weather | color) and CHART 1 (rect | date | color) violate the rect_color_facet preference we introduced. Now we can work on tuning the weight associated with this rule, so that we actually penalize the usage of faceting when we have a rect mark and color as the encoding channel.

Weight Tuning

We increase the weight from 0 to 10. We expect this penalty to be sufficient for the Clingo solver to find a model that does not violate the rect_color_facet design rule we extended our knowledge base with, and that now has a lower cost than the penalized faceted design.

weight = 10
display(Markdown(f"**Weight for `rect_color_facet` preference: {weight}**"))
updated_draco = draco_with_updated_kb(pref_weight=weight)
recommendations = rec_from_generated_spec(
    marks=["rect"],
    fields=["weather", "date"],
    encoding_channels=["color"],
    draco=updated_draco,
)
display_debug_data(draco=updated_draco, specs=recommendations)

Weight for rect_color_facet preference: 10

None
COST: [73]
None
COST: [42]

Raw debug data

   chart_name  pref_name             pref_description               count  weight
0  None        cartesian_coordinate  Cartesian coordinates.         1      0
1  None        summary_rect          Rect mark for summary tasks.   1      0
2  None        aggregate_max         Max as aggregate op.           1      4
3  None        ordinal_y             Ordinal scale with y channel.  1      0
4  None        ordinal_x             Ordinal scale with x channel.  1      1

Number of violated preferences

15 preferences are violated out of a total of 148 preferences (soft constraints)

Using DracoDebugPlotter to visualize the debug DataFrame produced by DracoDebug:

Just as expected, thanks to the higher weight assigned to the newly added rect_color_facet rule, we no longer see recommendations that use faceting when a rect mark and a color encoding are used. One can use the very same process to tailor the knowledge base and fine-tune the constraint weights to obtain more expressive visualization recommendations.
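The costs printed in the two runs above also let us estimate how large the weight needs to be. With weight 0 the faceted designs cost 71 and 40, and with weight 10 the solver switched to designs costing 73 and 42. The sketch below assumes that costs are additive, that each faceted design violates rect_color_facet exactly once (as the count column suggests), and that 73 and 42 are the cheapest non-faceted alternatives. Under these assumptions the faceted design stops being optimal once the weight exceeds the cost difference of 2. The function names are hypothetical and only serve this illustration.

```python
def preferred_design(
    unfaceted_cost: int, faceted_base_cost: int, weight: int, violations: int = 1
) -> str:
    """Compare the penalized faceted cost with the unfaceted alternative.

    Assumes additive costs: faceted cost = base cost + weight * violations.
    """
    faceted_cost = faceted_base_cost + weight * violations
    if faceted_cost < unfaceted_cost:
        return "faceted"
    if faceted_cost > unfaceted_cost:
        return "unfaceted"
    return "tie"


def min_integer_weight(unfaceted_cost: int, faceted_base_cost: int) -> int:
    """Smallest integer weight for which the unfaceted design is strictly cheaper."""
    return max(0, unfaceted_cost - faceted_base_cost + 1)


# Costs taken from the runs above: (unfaceted, faceted base) per configuration
for label, unfaceted, faceted in [("weather", 73, 71), ("date", 42, 40)]:
    print(label, preferred_design(unfaceted, faceted, weight=0))
    print(label, preferred_design(unfaceted, faceted, weight=10))
    print(label, min_integer_weight(unfaceted, faceted))
```

For both configurations the sketch reports that a weight of 3 would already be enough, so the value of 10 used above leaves a comfortable margin.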