retrieval package¶

Submodules¶

retrieval.basics module¶

class retrieval.basics.Document(page_content: str, metadata: Dict | None = None)[source]¶

Bases: object

A simple class to hold text and metadata.

pretty_print(indent: int = 0)[source]¶

Return a formatted string representation of the Document instance with optional indentation.

Parameters:: indent (int) – The number of spaces to indent the output. Defaults to 0.
Returns:: A formatted string representation of the Document.
Return type:: str
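
As a rough illustration of the interface above, the sketch below defines a hypothetical stand-in for Document with the documented constructor and pretty_print signature. The class body and the output layout (one line per field, each prefixed by the indent) are assumptions; the real implementation may format its output differently.

```python
from typing import Dict, Optional


class Document:
    """Hypothetical stand-in mirroring the documented constructor signature."""

    def __init__(self, page_content: str, metadata: Optional[Dict] = None):
        self.page_content = page_content
        self.metadata = metadata or {}

    def pretty_print(self, indent: int = 0) -> str:
        # Assumed layout: one line per field, each prefixed by `indent` spaces.
        pad = " " * indent
        lines = [f"{pad}page_content: {self.page_content}"]
        for key, value in self.metadata.items():
            lines.append(f"{pad}{key}: {value}")
        return "\n".join(lines)


doc = Document("pgvector adds vector search to PostgreSQL.", {"source": "notes.md"})
formatted = doc.pretty_print(indent=2)
print(formatted)
```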

class retrieval.basics.VectorStore[source]¶

Bases: object

Base class for vector stores.

add_texts(texts: List[str], metadatas: List[Dict] | None = None) → List[str][source]¶

Add texts to the vector store with optional metadata.

Parameters:

texts (List[str]) – A list of texts to add.
metadatas (Optional[List[Dict]]) – A list of metadata dictionaries corresponding to the texts. Defaults to None.

Returns:

A list of IDs or keys associated with the added texts.

Return type:

List[str]

similarity_search(query: str, k: int = 4) → List[Document][source]¶

Perform a similarity search for the given query string.

Parameters:

query (str) – The query string to search for.
k (int) – The number of results to return. Defaults to 4.

Returns:

A list of Document instances that are most similar to the query.

Return type:

List[Document]

similarity_search_by_vector(embedding: List[float], k: int = 4) → List[Document][source]¶

Perform a similarity search using a precomputed embedding vector.

Parameters:

embedding (List[float]) – The embedding vector to search with.
k (int) – The number of results to return. Defaults to 4.

Returns:

A list of Document instances that are most similar to the embedding.

Return type:

List[Document]
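
To make the base-class contract concrete, here is a toy in-memory store that follows the same three method signatures. `InMemoryStore`, `toy_embed`, the sequential string IDs, and the choice of Euclidean distance are all illustrative assumptions and not part of the package; the `Document` class is a local stand-in so that the sketch runs on its own.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple


class Document:
    """Local stand-in for retrieval.basics.Document (text plus metadata)."""

    def __init__(self, page_content: str, metadata: Optional[Dict] = None):
        self.page_content = page_content
        self.metadata = metadata or {}


class InMemoryStore:
    """Hypothetical store that keeps (id, vector, document) rows in a list."""

    def __init__(self, embed: Callable[[str], List[float]]):
        self._embed = embed
        self._rows: List[Tuple[str, List[float], Document]] = []

    def add_texts(self, texts: List[str], metadatas: Optional[List[Dict]] = None) -> List[str]:
        metadatas = metadatas or [{} for _ in texts]
        ids = []
        for text, meta in zip(texts, metadatas):
            row_id = str(len(self._rows))  # assumed ID scheme: running counter
            self._rows.append((row_id, self._embed(text), Document(text, meta)))
            ids.append(row_id)
        return ids

    def similarity_search_by_vector(self, embedding: List[float], k: int = 4) -> List[Document]:
        # Rank stored rows by Euclidean distance to the query vector.
        ranked = sorted(self._rows, key=lambda row: math.dist(embedding, row[1]))
        return [doc for _, _, doc in ranked[:k]]

    def similarity_search(self, query: str, k: int = 4) -> List[Document]:
        # Embed the query, then delegate to the vector search.
        return self.similarity_search_by_vector(self._embed(query), k=k)


def toy_embed(text: str) -> List[float]:
    # Toy embedding: counts of the letters a, b and c.
    return [float(text.count(ch)) for ch in "abc"]


store = InMemoryStore(toy_embed)
ids = store.add_texts(["aaa", "bbb", "ccc", "aab"])
hits = store.similarity_search("aaa", k=2)
print([doc.page_content for doc in hits])
```

In this sketch, similarity_search is a thin wrapper that embeds the query and calls similarity_search_by_vector, which keeps the ranking logic in one place.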

retrieval.pgvector_store module¶

class retrieval.pgvector_store.CollectionStore(**kwargs)[source]¶

Bases: Base

Represents a collection in the database.

uuid¶

Primary key for the collection.

Type:: UUID

name¶

Name of the collection.

Type:: str

cmetadata¶

Metadata associated with the collection.

Type:: dict

embeddings¶

List of embeddings associated with the collection.

Type:: List[EmbeddingStore]

class retrieval.pgvector_store.EmbeddingStore(**kwargs)[source]¶

Bases: Base

Represents an embedding in the database.

uuid¶

Primary key for the embedding.

Type:: UUID

collection_id¶

Foreign key referencing the collection.

Type:: UUID

collection¶

Collection associated with the embedding.

Type:: CollectionStore

embedding¶

The embedding vector.

Type:: Vector

document¶

The document associated with the embedding.

Type:: str

cmetadata¶

Metadata associated with the embedding.

Type:: dict

custom_id¶

Custom ID for the embedding.

Type:: str

class retrieval.pgvector_store.PGVector(connection_string: str, embedding: Embeddings, collection_name: str = 'vectorsearch', pool_size: int = 5, **kwargs)[source]¶

Bases: VectorStore

A vector store implementation using PostgreSQL and pgvector.

_engine¶

SQLAlchemy engine for database connection.

Type:: sqlalchemy.engine.Engine

_Session¶

SQLAlchemy session maker.

Type:: sqlalchemy.orm.sessionmaker

_embedding¶

Embedding model used for generating embeddings.

Type:: Embeddings

_collection_name¶

Name of the collection in the database.

Type:: str

_collection¶

The collection associated with this vector store.

Type:: CollectionStore

add_texts(texts: List[str], metadatas: List[Dict] | None = None) → List[str][source]¶

Add texts to the vector store.

Parameters:

texts (List[str]) – The texts to add.
metadatas (Optional[List[Dict]], optional) – Metadata for each text. Defaults to None.

Returns:

The IDs of the added texts.

Return type:

List[str]

add_texts_with_embeddings(texts: List[str], embeddings: List[List[float]], metadatas: List[Dict] | None = None) → List[str][source]¶

Add texts with precomputed embeddings to the vector store.

Parameters:

texts (List[str]) – The texts to add.
embeddings (List[List[float]]) – Precomputed embeddings.
metadatas (Optional[List[Dict]], optional) – Metadata for each text. Defaults to None.

Returns:

The IDs of the added texts.

Return type:

List[str]

create_collection() → None[source]¶: Create a collection in the database.

create_tables_if_not_exists() → None[source]¶: Create tables in the database if they don’t exist.

delete_by_ids(ids: List[str])[source]¶: Delete documents by their IDs.

get_all_collection_metadata() → Dict[source]¶

Get all metadata from the collection.

Returns:: A dictionary containing all metadata.
Return type:: Dict

get_collection_metadata(key: str) → str | None[source]¶

Get a single metadata value from the collection by key.

Parameters:: key (str) – Metadata key.
Returns:: The metadata value if it exists, otherwise None.
Return type:: Optional[str]

set_collection_metadata(key: str, value: str)[source]¶

Set metadata for the collection.

Parameters:

key (str) – Metadata key.
value (str) – Metadata value.

similarity_search(query: str, k: int = 4) → Tuple[List[Document], List[float]][source]¶

Perform a similarity search for a query.

Parameters:

query (str) – The query to search for.
k (int, optional) – The number of results to return. Defaults to 4.

Returns:

A tuple containing the list of documents that match the query and their corresponding similarity scores.

Return type:

Tuple[List[Document], List[float]]
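
Because this method returns two parallel lists instead of a single list of documents, callers usually unpack the tuple and pair each document with its score. The sketch below shows that pattern with a hypothetical `fake_similarity_search` function that returns hard-coded placeholder values (plain strings instead of Document instances, and made-up scores), since a real call needs a database connection.

```python
from typing import List, Tuple


def fake_similarity_search(query: str, k: int = 4) -> Tuple[List[str], List[float]]:
    # Placeholder data standing in for a real call such as
    # `docs, scores = store.similarity_search(query, k=k)`.
    docs = ["first passage", "second passage", "third passage"]
    scores = [0.12, 0.34, 0.56]
    return docs[:k], scores[:k]


docs, scores = fake_similarity_search("example query", k=2)
paired = list(zip(docs, scores))  # one (document, score) pair per hit
for text, score in paired:
    print(f"{score:.2f}  {text}")
```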

similarity_search_by_vector(embedding: List[float], k: int = 4, distance_metric: str = 'cosine') → Tuple[List[Document], List[float]][source]¶

Perform a similarity search by vector with configurable distance metrics.

Parameters:

embedding (List[float]) – The embedding vector to search with.
k (int, optional) – The number of results to return. Defaults to 4.
distance_metric (str, optional) – The distance metric to use. Options are “l2” and “cosine”. Defaults to “cosine”. See https://github.com/pgvector/pgvector?tab=readme-ov-file#querying for more details.

Returns:

Documents and similarity scores.

Return type:

Tuple[List[Document], List[float]]
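
The two metrics can rank the same candidates differently. pgvector exposes L2 distance through the `<->` operator and cosine distance through `<=>`; how this class maps distance_metric onto those operators is an assumption here. The pure-Python sketch below computes both distances by hand to show the difference; the helper names are illustrative.

```python
import math
from typing import List


def l2_distance(a: List[float], b: List[float]) -> float:
    # Euclidean distance: sensitive to vector magnitude.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: List[float], b: List[float]) -> float:
    # 1 - cosine similarity: depends only on the angle between the vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # parallel to the query, but longer
orthogonal = [0.0, 1.0]      # same length as the query, different direction

for name, vec in [("same_direction", same_direction), ("orthogonal", orthogonal)]:
    print(name, l2_distance(query, vec), cosine_distance(query, vec))
```

Under L2 the orthogonal vector is nearer to the query than the parallel one, while under cosine distance the order is reversed, so the choice of metric matters whenever the stored embeddings are not normalized.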

retrieval.sqlitevec_store module¶

class retrieval.sqlitevec_store.SQLiteVec(table: str, db_file: str = 'vec.db', pool_size: int = 5, embedding: Embeddings | None = None)[source]¶

Bases: VectorStore

A vector store implementation using SQLite and the sqlite-vec extension.

add_texts(texts: List[str], metadatas: List[Dict] | None = None) → List[str][source]¶

Add texts to the vector store.

Parameters:

texts (List[str]) – The list of texts to add.
metadatas (Optional[List[Dict]], optional) – The list of metadata dictionaries. Defaults to None.

Returns:: The list of row IDs for the added texts.
Return type:: List[str]
Raises:: sqlite3.Error – If the addition of texts fails.

add_texts_with_embeddings(texts: List[str], embeddings: List[List[float]], metadatas: List[Dict] | None = None) → List[str][source]¶

Add texts with precomputed embeddings to the vector store.

Parameters:

texts (List[str]) – The list of texts to add.
embeddings (List[List[float]]) – The list of precomputed embeddings.
metadatas (Optional[List[Dict]], optional) – The list of metadata dictionaries. Defaults to None.

Returns:

The list of row IDs for the added texts.

Return type:

List[str]

Raises:

sqlite3.Error – If the addition of texts fails.

create_metadata_table()[source]¶: Create the metadata table if it does not exist.

create_table()[source]¶: Create the main table and the virtual table.

create_table_if_not_exists()[source]¶

Create tables if they don’t exist.

Raises:: sqlite3.Error – If the table creation fails.

delete_by_ids(ids: List[str])[source]¶: Delete documents by their row IDs.

drop_table()[source]¶: Drop the main table and the virtual table if they exist.

get_dimensionality() → int[source]¶

Get the dimensionality of the embeddings.

Returns:: The dimensionality of the embeddings.
Return type:: int

get_metadata(key: str) → str | None[source]¶

Get a metadata value by key.

Parameters:: key (str) – The key to retrieve the metadata value for.
Returns:: The metadata value if found, otherwise None.
Return type:: Optional[str]

static serialize_f32(vector: List[float]) → bytes[source]¶

Serialize a list of floats into bytes.

Parameters:: vector (List[float]) – The list of floats to serialize.
Returns:: The serialized bytes.
Return type:: bytes
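
One common way to do this kind of serialization is to pack the values as consecutive 32-bit floats with the standard `struct` module. The sketch below is a standalone function written under that assumption; it is not taken from the class, whose byte layout may differ.

```python
import struct
from typing import List


def serialize_f32(vector: List[float]) -> bytes:
    # Pack every value as a 4-byte float in native byte order (assumed layout).
    return struct.pack(f"{len(vector)}f", *vector)


blob = serialize_f32([0.5, -1.0, 2.25])
# Round trip: each float occupies 4 bytes, so the count is len(blob) // 4.
restored = list(struct.unpack(f"{len(blob) // 4}f", blob))
print(len(blob), restored)
```

Values that are not exactly representable in 32 bits lose precision on the round trip, which is normally acceptable for embeddings.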

set_metadata(key: str, value: str)[source]¶

Set a metadata key-value pair.

Parameters:

key (str) – The key to set.
value (str) – The value to associate with the key.

similarity_search(query: str, k: int = 4) → Tuple[List[Document], List[float]][source]¶

Perform a similarity search.

Parameters:

query (str) – The query string.
k (int, optional) – The number of results to return. Defaults to 4.

Returns:

A tuple containing the list of documents that match the query and their corresponding similarity scores.

Return type:

Tuple[List[Document], List[float]]

Raises:

Exception – If the similarity search fails.

similarity_search_by_vector(embedding: List[float], k: int = 4, distance_metric: str = 'cosine') → Tuple[List[Document], List[float]][source]¶

Perform a similarity search by vector with configurable distance metrics.

Parameters:

embedding (List[float]) – The embedding vector to search with.
k (int, optional) – The number of results to return. Defaults to 4.
distance_metric (str, optional) – The distance metric to use. Supported: ‘l2’ (Euclidean), ‘cosine’. Defaults to ‘cosine’. See https://alexgarcia.xyz/sqlite-vec/api-reference.html#distance for more details.

Returns:

Documents and similarity scores.

Return type:

Tuple[List[Document], List[float]]
