Read Data and Generate the Schema

This section covers how to load data and how to use the statistics Draco infers from it.

Available functions

The main functions derive a schema from a pandas DataFrame or from a file. Each returns the schema as a dictionary, which you can encode as Answer Set Programming (ASP) facts using our generic dict_to_facts encoder.

draco.schema.schema_from_dataframe(df, parse_data_type=dtype_to_field_type)

Read schema information from the given Pandas dataframe.

Parameters:

  • df (DataFrame) – DataFrame to generate the schema for.

  • parse_data_type – Function to parse data types.

Returns:

A dictionary representing the schema.
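The `parse_data_type` parameter is only described briefly, so the following is a minimal sketch of what a custom parser might look like. It assumes the callable receives a pandas dtype (or anything whose `str()` is a dtype name) and returns a field type string such as the `'number'`, `'string'`, and `'datetime'` values seen in the usage example. The function name `dtype_name_to_field_type` and the `'boolean'` return value are hypothetical.

```python
def dtype_name_to_field_type(dtype) -> str:
    """Hypothetical parser: map a dtype (or its name) to a field type string."""
    name = str(dtype).lower()
    if name.startswith("datetime"):
        return "datetime"
    if name.startswith(("int", "uint", "float")):
        return "number"
    if name.startswith("bool"):
        # Assumption: 'boolean' does not appear in the example output on this page.
        return "boolean"
    # Everything else (e.g. 'object') is treated as text in this sketch.
    return "string"


for dtype_name in ("int64", "float64", "datetime64[ns]", "object"):
    print(dtype_name, "->", dtype_name_to_field_type(dtype_name))
```

If that assumption about the callable holds, such a function could be passed as `schema_from_dataframe(df, parse_data_type=dtype_name_to_field_type)`.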

draco.schema.schema_from_file(file_path, parse_data_type=dtype_to_field_type)

Read schema information from the given CSV or JSON file.

Parameters:

  • file_path (Path) – Path to CSV or JSON file.

  • parse_data_type – Function to parse data types.

Raises:

ValueError – If the file has an unsupported data type.

Returns:

A dictionary representing the schema.
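As a rough sketch of how `schema_from_file` might be called, the snippet below writes a tiny CSV to a temporary directory and passes its `Path` to the function. The sample rows and the file name `weather_sample.csv` are made up for illustration. The import is guarded so the file-writing part still runs where Draco is not installed.

```python
import csv
import tempfile
from pathlib import Path

try:
    from draco.schema import schema_from_file  # function documented above
except ImportError:  # Draco not installed: only the CSV is written
    schema_from_file = None

# Made-up sample rows, loosely shaped like the weather data used on this page.
rows = [
    ("date", "temp_max", "weather"),
    ("2012-01-01", 12.8, "drizzle"),
    ("2012-01-02", 10.6, "rain"),
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "weather_sample.csv"
    with csv_path.open("w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    csv_text = csv_path.read_text()

    schema = None
    if schema_from_file is not None:
        try:
            schema = schema_from_file(csv_path)
        except ValueError as err:
            # Raised for unsupported files, per the description above.
            print(f"Could not read schema: {err}")

print(csv_text)
```

Catching `ValueError` around the call keeps an unsupported file from aborting a larger loading script.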

Usage Example

from draco import dict_to_facts, schema_from_dataframe

In this example, we use a weather dataset from Vega datasets, but this could be any pandas DataFrame.

from vega_datasets import data

df = data.seattle_weather()

We can then call schema_from_dataframe to get schema information from the pandas DataFrame. The result is a dictionary.

schema = schema_from_dataframe(df)
schema
{'number_rows': 1461,
 'field': [{'name': 'date',
   'type': 'datetime',
   'unique': 1461,
   'entropy': 7287},
  {'name': 'precipitation',
   'type': 'number',
   'unique': 111,
   'entropy': 2422,
   'min': 0,
   'max': 55,
   'std': 6},
  {'name': 'temp_max',
   'type': 'number',
   'unique': 67,
   'entropy': 3934,
   'min': -1,
   'max': 35,
   'std': 7},
  {'name': 'temp_min',
   'type': 'number',
   'unique': 55,
   'entropy': 3596,
   'min': -7,
   'max': 18,
   'std': 5},
  {'name': 'wind',
   'type': 'number',
   'unique': 79,
   'entropy': 3950,
   'min': 0,
   'max': 9,
   'std': 1},
  {'name': 'weather',
   'type': 'string',
   'unique': 5,
   'entropy': 1201,
   'freq': 714}]}
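Because the schema is a plain dictionary, it can be inspected with ordinary Python before it is encoded. The sketch below copies three fields from the output above into a literal, so it runs without Draco, and shows one way to look up fields by name and list the numeric ones. The names `schema_excerpt`, `fields_by_name`, and `numeric_fields` are illustrative and not part of Draco. It also checks an observation about these particular numbers: the integer `entropy` of the all-unique `date` column equals the natural-log entropy ln(1461), scaled by 1000 and rounded. That suggests how the statistic may be scaled, but it is an inference from the output, not a statement about Draco's implementation.

```python
import math

# Three fields copied from the schema output shown above.
schema_excerpt = {
    "number_rows": 1461,
    "field": [
        {"name": "date", "type": "datetime", "unique": 1461, "entropy": 7287},
        {"name": "temp_max", "type": "number", "unique": 67,
         "entropy": 3934, "min": -1, "max": 35, "std": 7},
        {"name": "weather", "type": "string", "unique": 5,
         "entropy": 1201, "freq": 714},
    ],
}

# Index the field list by name for direct lookup.
fields_by_name = {field["name"]: field for field in schema_excerpt["field"]}

# Select the names of all fields typed as numbers.
numeric_fields = [
    field["name"] for field in schema_excerpt["field"] if field["type"] == "number"
]

# A column with n distinct, equally frequent values has entropy ln(n).
scaled_log_entropy = round(math.log(schema_excerpt["number_rows"]) * 1000)

print(numeric_fields)
print(fields_by_name["temp_max"]["max"] - fields_by_name["temp_max"]["min"])
print(scaled_log_entropy == fields_by_name["date"]["entropy"])
```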

We can then use dict_to_facts to convert the schema dictionary into facts that Draco's constraint solver can use. The function returns a list of facts, which the solver can parse and take into account in the recommendation process.
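To give a feel for what flattening a nested dictionary into facts can look like, here is a toy encoder written with the standard library only. It is not Draco's `dict_to_facts`. The function name `toy_dict_to_facts` and the `entity(...)`/`attribute(...)` fact layout are assumptions made for illustration, and the real function's output may differ.

```python
from itertools import count


def toy_dict_to_facts(data, path=(), parent="root", ids=None):
    """Toy sketch: flatten a nested dict into ASP-style fact strings."""
    ids = ids if ids is not None else count()
    facts = []
    if isinstance(data, dict):
        # Descend into each key, extending the attribute path.
        for key, value in data.items():
            facts += toy_dict_to_facts(value, path + (key,), parent, ids)
    elif isinstance(data, list):
        # Each list item becomes an entity with a fresh numeric id.
        for item in data:
            entity_id = next(ids)
            facts.append(f"entity({path[-1]},{parent},{entity_id}).")
            facts += toy_dict_to_facts(item, (path[-1],), entity_id, ids)
    else:
        # Scalars become attributes attached to the current parent.
        name = path[0] if len(path) == 1 else "(" + ",".join(path) + ")"
        facts.append(f"attribute({name},{parent},{data}).")
    return facts


tiny_schema = {"number_rows": 3, "field": [{"name": "weather", "type": "string"}]}
for fact in toy_dict_to_facts(tiny_schema):
    print(fact)
# attribute(number_rows,root,3).
# entity(field,root,0).
# attribute((field,name),0,weather).
# attribute((field,type),0,string).
```

With Draco installed, the corresponding call on the schema from this example is `dict_to_facts(schema)`.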
