Operationalize your embeddings with one simple tool.

Experience a comprehensive database designed to provide embedding functionality that, until now, required multiple platforms. Elevate your machine learning quickly and painlessly through Embeddinghub.

The Embeddinghub workflow

Embeddings are dense, numerical representations of real-world objects and relationships, expressed as vectors. They are often created by first defining a supervised machine learning problem, known as a "surrogate problem." Embeddings aim to capture the semantics of the inputs they were derived from; they can then be shared and reused to improve learning across machine learning models. Embeddinghub lets you achieve this in a streamlined, intuitive way.
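To make "capturing semantics" concrete, here is a minimal sketch using cosine similarity, the standard way to compare two embedding vectors. The vectors and names below are toy values invented for illustration; real embeddings are learned and typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors:
    # values near 1.0 mean the vectors point in the same direction,
    # i.e. the underlying objects are semantically similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional embeddings (hypothetical values).
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.15]
banana = [0.1, 0.2, 0.95]

print(cosine_similarity(king, queen))   # high: related concepts
print(cosine_similarity(king, banana))  # low: unrelated concepts
```

Because similar inputs land near each other in the vector space, a model trained on one surrogate problem can hand its embeddings to other models downstream.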

Durable storage with precise management

High-availability storage with full control over versioning and access, plus painless rollback.

Powerful embedding operations

Partitioning, sub-indices, averaging, and more.

Nearest neighbor approximation

Achieve high-similarity recommendations using the computationally efficient HNSW algorithm.
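For contrast with what HNSW approximates, here is an exact nearest-neighbor lookup done the naive way: an exhaustive scan over every stored vector. The index contents and key names are made up for illustration; HNSW builds a layered proximity graph to answer the same query in roughly logarithmic rather than linear time, trading a little recall for speed.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_nearest_neighbors(query, index, k=2):
    # Exhaustive scan: O(n) distance computations per query.
    # HNSW avoids this full scan by greedily navigating a
    # multi-layer graph of close vectors.
    ranked = sorted(index.items(), key=lambda kv: euclidean(query, kv[1]))
    return [key for key, _ in ranked[:k]]

# Hypothetical 2-dimensional item embeddings.
index = {
    "item_a": [0.1, 0.2],
    "item_b": [0.15, 0.22],
    "item_c": [0.9, 0.8],
}
print(exact_nearest_neighbors([0.12, 0.21], index, k=2))  # → ['item_a', 'item_b']
```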

Where embedding makes a difference

When it comes to natural language processing, recognizing context and intent is essential. Embeddings (such as those produced by BERT) allow applications like search engines to tokenize words, index them, and then analyze their vectors, comparing relevance and interdependence.

Modern recommender systems commonly use collaborative filtering when training their models, a technique that relies on embeddings. With SVD-style matrix factorization, the dot product of a user embedding and an item embedding yields a rating prediction.
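The rating prediction above reduces to a single dot product. The following sketch uses hypothetical 3-factor embeddings of the kind an SVD-style factorization might learn; the vectors and item names are invented for illustration.

```python
def predict_rating(user_embedding, item_embedding):
    # Matrix-factorization-style prediction: the dot product of the
    # user vector and the item vector approximates the rating.
    return sum(u * i for u, i in zip(user_embedding, item_embedding))

# Hypothetical learned factors (e.g. action-ness, romance-ness, pacing).
user = [0.8, 0.1, 0.5]          # strong affinity for factors 1 and 3
item_action = [0.9, 0.0, 0.6]   # strong on factors 1 and 3
item_romance = [0.1, 0.9, 0.2]  # strong on factor 2

print(predict_rating(user, item_action))   # higher predicted rating
print(predict_rating(user, item_romance))  # lower predicted rating
```

Because users and items live in the same latent space, one set of learned embeddings can serve ranking, retrieval, and candidate generation at once.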

Computer vision benefits enormously from embeddings. When training self-driving cars, images can be translated into embeddings and reused across contexts, in a process known as transfer learning. Taking a generated image from a game like Grand Theft Auto, turning it into an embedding in the same vector space, and training a driving model without using thousands of expensive, real-world images is exactly what companies like Tesla do today.


Current architecture

Embeddinghub's Alpha version supports single-node configurations only. It uses RocksDB to durably store embeddings and metadata, and HNSWLib to build approximate nearest neighbor indices. The related Python client can also use HNSWLib to build local embedding indices, but it does not currently handle durable storage. Embeddinghub's server communicates via gRPC, with a proto service file accessible here. All metadata is also stored in serialized protobuf form, as defined here.
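The architecture pairs two roles: a durable key-value store (RocksDB) and an approximate nearest-neighbor index (HNSWLib). The conceptual sketch below replaces both with naive in-memory stand-ins, so the class, method names, and data are illustrative only and not the Embeddinghub API.

```python
import math

class MiniEmbeddingStore:
    """Illustrative stand-in for the two-part architecture:
    a key-value store (RocksDB in Embeddinghub) paired with a
    nearest-neighbor index (HNSWLib in Embeddinghub). Both are
    modeled here with a plain dict and a brute-force scan."""

    def __init__(self):
        self._storage = {}  # stand-in for the durable store

    def set(self, key, embedding):
        self._storage[key] = embedding

    def get(self, key):
        return self._storage[key]

    def nearest_neighbors(self, key, num):
        # A real index answers this without scanning every vector;
        # here we scan for clarity.
        query = self._storage[key]
        others = [(k, v) for k, v in self._storage.items() if k != key]
        others.sort(key=lambda kv: math.dist(query, kv[1]))
        return [k for k, _ in others[:num]]

store = MiniEmbeddingStore()
store.set("a", [0.0, 0.0])
store.set("b", [0.1, 0.0])
store.set("c", [1.0, 1.0])
print(store.nearest_neighbors("a", num=1))  # → ['b']
```

In the real system, writes land in RocksDB and the HNSW index is maintained alongside it, so reads get durability and queries get sub-linear lookup.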

Ready to get started?

See what a virtual feature store means for your organization.