Study: Comparing high-dimensional embeddings

This study is no longer recruiting. During September 2021, we are conducting a follow-up study that you can read more about here.

We are building a tool for people who work with high-dimensional embedding spaces, a type of data common to fields such as natural language processing, computer vision, and computational biology. To establish how embedding users understand and use their data, we are interviewing embedding model experts across these domains. If you have worked extensively with embedding models in a recent project, we would love to hear about your experiences!

Many machine learning models work by generating a “latent space” that maps out (or “embeds”) the input data to better perform a downstream task, such as classification. Visualizing these embedding spaces is an important step in checking that the model has learned the desired attributes (e.g. correctly separating dogs from cats, or cancer cells from non-cancer cells). However, most existing visualizations are static and difficult to compare from one model to another. We’re building a tool that helps users visually compare and inspect multiple embedding spaces, opening up new possibilities for exploratory analysis. As a first step, we are interviewing embedding model experts across several fields about the approaches they take to understanding embedding spaces. After that, we plan to expand our system and invite some of these experts to try it on their own datasets.

Participate