UI Configuration
You can pass arguments to the run
command to configure the Texture interface.
data: pd.DataFrame
: The dataframe to parse and visualize.schema: DatasetSchema
: a dataset schema (calculated automatically if none provided)load_tables: Dict[str, pd.DataFrame]
: A dictionary of tables to load.create_new_embedding_func
: A function that takes a string and returns a vector embedding
If only data
is provided, Texture will automatically calculate the schema and load tables. The schema and tables can also be calculated manually by calling the preprocess
function then passed to run
:
# automatically calculate schema and tables
texture.run(df)
# OR manually preprocess
schema, load_tables = texture.preprocess(df)
DatasetSchema
The schema is a DatasetSchema
object that describes the columns, types, and tables in the dataset. These column definitions are used to determine how the data will be visualized in the interface. Each schema has the following fields:
name
: The name of the dataset.columns
: A list ofColumn
objects that describe the columns in the dataset.primary_key
: AColumn
object that describes the primary key of the dataset.has_embeddings
: Whether the dataset has embeddings or not. If so, must contain a column named"vector"
with embedding arrayshas_projection
: Whether the dataset has projections or not. If so, must contain columns named"umap_x"
and"umap_y"
with 2d projection coordinates.
Column
Column
definition contains:
name
: The name of the column as it appears in the dataframetype
: The type of the column. Options aretext
,number
,categorical
, andboolean
.derivedSchema
: ADerivedSchema
object that describes how the column is derived from another table. Options are:is_segment
: Whether the column is a segment (e.g. word, phrase) or not (e.g. author, keyword).table_name
: The name of the table the column is in.derived_from
: The column in the main table that the column is derived from.derived_how
: How the column is derived from the derived table.
Embedding Functions
To support similarity search, Texture allows you to pass a function that will create a new embedding for a given string. This allows you to create embeddings on the fly for query strings. We recommend pre-computing embeddings for your data and passing them to Texture in the vector
column and then using the same model in this function to create embeddings for query strings.
For example, the following code creates a new embedding for a given string using the sentence-transformers
library:
def get_embedding(value: str):
import sentence_transformers
model = sentence_transformers.SentenceTransformer("all-mpnet-base-v2")
e = model.encode(value)
return e
Example Usage
See the demos for examples of how to configure the schema, columns, and embedding functions.