retrieval package

Submodules

retrieval.basics module

class retrieval.basics.Document(page_content: str, metadata: Dict | None = None)

Bases: object

A simple class to hold text and metadata.

pretty_print(indent: int = 0)

Return a formatted string representation of the Document instance with optional indentation.

Parameters:

indent (int) – The number of spaces to indent the output. Defaults to 0.

Returns:

A formatted string representation of the Document.

Return type:

str
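
As a rough illustration of how a Document might be constructed and printed, the sketch below uses a hypothetical stand-in class, `SketchDocument`, that mirrors the documented constructor. The output layout of `pretty_print` and the empty-dict fallback for `metadata` are assumptions made for this example, not the library's confirmed behavior.

```python
from typing import Dict, Optional


class SketchDocument:
    """Hypothetical stand-in mirroring the documented Document constructor."""

    def __init__(self, page_content: str, metadata: Optional[Dict] = None):
        self.page_content = page_content
        # Assumption: fall back to an empty dict so callers never see None.
        self.metadata = metadata if metadata is not None else {}

    def pretty_print(self, indent: int = 0) -> str:
        # Assumed layout: one line per field, each prefixed by `indent` spaces.
        pad = " " * indent
        return "\n".join([
            f"{pad}page_content: {self.page_content}",
            f"{pad}metadata: {self.metadata}",
        ])


doc = SketchDocument("pgvector stores embeddings in PostgreSQL.", {"source": "notes"})
print(doc.pretty_print(indent=2))
```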

class retrieval.basics.Embeddings

Bases: object

Base class for embedding models.

embed_documents(texts: List[str] | str | Document | List[Document]) -> List[List[float]]

Generate embeddings for a list of texts, a single text, a Document, or a list of Documents.

Parameters:

texts – Can be a list of strings, a single string, a Document, or a list of Documents.

Returns:

A list of embeddings, where each embedding is a list of floats.

Return type:

List[List[float]]

embed_query(text: str) -> List[float]

Generate an embedding for a single query text.

Parameters:

text (str) – The text to embed.

Returns:

The embedding vector.

Return type:

List[float]
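
The two-method interface above can be sketched with a toy model. `ToyHashEmbeddings` is a hypothetical class written for this example: it does not inherit from the real base class, it accepts only a list of strings, and its hash-based vectors carry no semantic meaning.

```python
import hashlib
from typing import List


class ToyHashEmbeddings:
    """Hypothetical model exposing the documented embed_* method names."""

    def __init__(self, dim: int = 8):
        # A SHA-256 digest has 32 bytes, so this sketch supports dim <= 32.
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32")
        self.dim = dim

    def embed_query(self, text: str) -> List[float]:
        # Map the first `dim` digest bytes to floats in [0, 1].
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255.0 for byte in digest[: self.dim]]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # One vector per input text, in input order.
        return [self.embed_query(text) for text in texts]


model = ToyHashEmbeddings(dim=8)
vectors = model.embed_documents(["first text", "second text"])
print(len(vectors), len(vectors[0]))
```

Because the same function backs both methods, a query and an identical document receive the same vector, which is what a similarity search relies on.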

class retrieval.basics.VectorStore

Bases: object

Base class for vector stores.

add_texts(texts: List[str], metadatas: List[Dict] | None = None) -> List[str]

Add texts to the vector store with optional metadata.

Parameters:
  • texts (List[str]) – A list of texts to add.

  • metadatas (Optional[List[Dict]]) – A list of metadata dictionaries corresponding to the texts. Defaults to None.

Returns:

A list of IDs or keys associated with the added texts.

Return type:

List[str]

similarity_search(query: str, k: int = 4) -> List[Document]

Perform a similarity search for the given query string.

Parameters:
  • query (str) – The query string to search for.

  • k (int) – The number of results to return. Defaults to 4.

Returns:

A list of Document instances that are most similar to the query.

Return type:

List[Document]

similarity_search_by_vector(embedding: List[float], k: int = 4) -> List[Document]

Perform a similarity search using a precomputed embedding vector.

Parameters:
  • embedding (List[float]) – The embedding vector to search with.

  • k (int) – The number of results to return. Defaults to 4.

Returns:

A list of Document instances that are most similar to the embedding.

Return type:

List[Document]
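
To make the store interface concrete, here is a minimal in-memory sketch. `InMemoryStore`, `letter_counts`, and the use of row positions as IDs are all hypothetical choices for this example; results are returned as `(text, metadata)` tuples instead of Document instances to keep the block self-contained.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical store using the documented method names."""

    def __init__(self, embed: Callable[[str], List[float]]):
        self._embed = embed
        self._rows: List[Tuple[str, Dict, List[float]]] = []

    def add_texts(self, texts: List[str],
                  metadatas: Optional[List[Dict]] = None) -> List[str]:
        if metadatas is None:
            metadatas = [{} for _ in texts]
        ids = []
        for text, meta in zip(texts, metadatas):
            ids.append(str(len(self._rows)))  # row position doubles as the ID
            self._rows.append((text, meta, self._embed(text)))
        return ids

    def similarity_search_by_vector(self, embedding: List[float],
                                    k: int = 4) -> List[Tuple[str, Dict]]:
        # Rank all rows by similarity to the query vector, best first.
        ranked = sorted(self._rows,
                        key=lambda row: cosine_similarity(embedding, row[2]),
                        reverse=True)
        return [(text, meta) for text, meta, _ in ranked[:k]]


def letter_counts(text: str) -> List[float]:
    # Toy embedding: how often 'a', 'b' and 'c' occur in the text.
    return [float(text.count(ch)) for ch in "abc"]


store = InMemoryStore(letter_counts)
ids = store.add_texts(["aaa", "bbb", "abc"], [{"n": 1}, {"n": 2}, {"n": 3}])
hits = store.similarity_search_by_vector(letter_counts("aa"), k=2)
print(ids, hits)
```

A text query can be served by embedding the query string first and passing the resulting vector to the by-vector search, as the last lines do.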

retrieval.pgvector_store module

class retrieval.pgvector_store.CollectionStore(**kwargs)

Bases: Base

Represents a collection in the database.

uuid

Primary key for the collection.

Type:

UUID

name

Name of the collection.

Type:

str

cmetadata

Metadata associated with the collection.

Type:

dict

embeddings

List of embeddings associated with the collection.

Type:

List[EmbeddingStore]

class retrieval.pgvector_store.EmbeddingStore(**kwargs)

Bases: Base

Represents an embedding in the database.

uuid

Primary key for the embedding.

Type:

UUID

collection_id

Foreign key referencing the collection.

Type:

UUID

collection

Collection associated with the embedding.

Type:

CollectionStore

embedding

The embedding vector.

Type:

Vector

document

The document associated with the embedding.

Type:

str

cmetadata

Metadata associated with the embedding.

Type:

dict

custom_id

Custom ID for the embedding.

Type:

str

class retrieval.pgvector_store.PGVector(connection_string: str, embedding: Embeddings, collection_name: str = 'vectorsearch', pool_size: int = 5, **kwargs)

Bases: VectorStore

A vector store implementation using PostgreSQL and pgvector.

_engine

SQLAlchemy engine for database connection.

Type:

sqlalchemy.engine.Engine

_Session

SQLAlchemy session maker.

Type:

sqlalchemy.orm.sessionmaker

_embedding

Embedding model used for generating embeddings.

Type:

Embeddings

_collection_name

Name of the collection in the database.

Type:

str

_collection

The collection associated with this vector store.

Type:

CollectionStore

add_texts(texts: List[str], metadatas: List[Dict] | None = None) -> List[str]

Add texts to the vector store.

Parameters:
  • texts (List[str]) – The texts to add.

  • metadatas (Optional[List[Dict]], optional) – Metadata for each text. Defaults to None.

Returns:

The IDs of the added texts.

Return type:

List[str]

add_texts_with_embeddings(texts: List[str], embeddings: List[List[float]], metadatas: List[Dict] | None = None) -> List[str]

Add texts with precomputed embeddings to the vector store.

Parameters:
  • texts (List[str]) – The texts to add.

  • embeddings (List[List[float]]) – Precomputed embeddings.

  • metadatas (Optional[List[Dict]], optional) – Metadata for each text. Defaults to None.

Returns:

The IDs of the added texts.

Return type:

List[str]

create_collection() -> None

Create a collection in the database.

create_tables_if_not_exists() -> None

Create tables in the database if they don’t exist.

delete_by_ids(ids: List[str])

Delete documents by their IDs.

get_all_collection_metadata() -> Dict

Get all metadata from the collection.

Returns:

A dictionary containing all metadata.

Return type:

Dict

get_collection_metadata(key: str) -> str | None

Get metadata from the collection.

Parameters:

key (str) – Metadata key.

Returns:

The metadata value if it exists, otherwise None.

Return type:

Optional[str]

set_collection_metadata(key: str, value: str)

Set metadata for the collection.

Parameters:
  • key (str) – Metadata key.

  • value (str) – Metadata value.

similarity_search(query: str, k: int = 4) -> Tuple[List[Document], List[float]]

Perform a similarity search for a query.

Parameters:
  • query (str) – The query to search for.

  • k (int, optional) – The number of results to return. Defaults to 4.

Returns:

A tuple containing the list of documents that match the query and their corresponding similarity scores.

Return type:

Tuple[List[Document], List[float]]

similarity_search_by_vector(embedding: List[float], k: int = 4, distance_metric: str = 'cosine') -> Tuple[List[Document], List[float]]

Perform a similarity search by vector with configurable distance metrics.

Parameters:
  • embedding (List[float]) – The embedding vector to search with.

  • k (int, optional) – The number of results to return. Defaults to 4.

  • distance_metric (str, optional) – The distance metric to use. Options are “l2” and “cosine”. Defaults to “cosine”. See https://github.com/pgvector/pgvector?tab=readme-ov-file#querying for more details.

Returns:

Documents and similarity scores.

Return type:

Tuple[List[Document], List[float]]
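
The choice of metric can change which result ranks first. The pure-Python sketch below computes both distances; pgvector documents `<->` as L2 distance and `<=>` as cosine distance, but how this class maps `distance_metric` onto those operators is not shown in this reference, so treat that mapping as an assumption. The helper names are hypothetical, and `cosine_distance` assumes non-zero vectors.

```python
import math
from typing import List


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance: sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity: depends only on direction (non-zero inputs)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction_far = [10.0, 0.0]   # parallel to the query, but much longer
different_direction_near = [1.0, 1.0]

for name, vec in [("far", same_direction_far), ("near", different_direction_near)]:
    print(name, l2_distance(query, vec), cosine_distance(query, vec))
```

Under L2 the nearby vector wins; under cosine distance the parallel vector wins, since its angle to the query is zero.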

retrieval.siliconflow_embeddings module

class retrieval.siliconflow_embeddings.SiliconFlowEmbeddings(api_key: str, api_base_url: str, model_name: str, org_id: str = None, **model_kwargs)

Bases: Embeddings

Embedding class for SiliconFlow BGE-M3 API.

embed_documents(texts: List[str] | str | Document | List[Document]) -> List[List[float]]

Generate embeddings for a list of texts or documents.

Parameters:

texts (Union[List[str], str, Document, List[Document]]) – The input texts or documents to embed.

Returns:

A list of embeddings for the input texts or documents.

Return type:

List[List[float]]

Raises:
  • ValueError – If the input list contains mixed types of strings and Documents.

  • TypeError – If the input is not a string, Document, list of strings, or list of Documents.
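
The accepted input shapes and the two documented error cases can be sketched as a normalization step that runs before any API call. `normalize_inputs` and `SketchDocument` are hypothetical names for this example; the real class may validate its inputs differently.

```python
from typing import List


class SketchDocument:
    """Hypothetical stand-in for Document, holding only the text."""

    def __init__(self, page_content: str):
        self.page_content = page_content


def normalize_inputs(texts) -> List[str]:
    """Reduce any accepted input shape to a plain list of strings."""
    if isinstance(texts, str):
        return [texts]
    if isinstance(texts, SketchDocument):
        return [texts.page_content]
    if isinstance(texts, list):
        is_str = [isinstance(item, str) for item in texts]
        is_doc = [isinstance(item, SketchDocument) for item in texts]
        if all(is_str):
            return list(texts)
        if all(is_doc):
            return [item.page_content for item in texts]
        if all(s or d for s, d in zip(is_str, is_doc)):
            # Only strings and Documents are present, but mixed together.
            raise ValueError("input list mixes strings and Documents")
    raise TypeError("expected str, Document, List[str] or List[Document]")


print(normalize_inputs("one"))
print(normalize_inputs([SketchDocument("two"), SketchDocument("three")]))
```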

embed_query(text: str) -> List[float]

Generate an embedding for a single query text.

Parameters:

text (str) – The query text to embed.

Returns:

The embedding of the query text.

Return type:

List[float]

retrieval.siliconflow_rerank module

class retrieval.siliconflow_rerank.SiliconFlowRerank(api_key: str, api_base_url: str = 'https://api.siliconflow.cn/v1', model_name: str = 'BAAI/bge-reranker-v2-m3', **model_kwargs)

Bases: Rerank

Rerank class for SiliconFlow BGE-Reranker API.

rerank(query: str, texts: List[str] | str | Document | List[Document], top_n: int = 4, return_documents: bool = False) -> List[Dict]

Rerank a list of texts or documents based on a query.

Parameters:
  • query (str) – The query to rerank texts or documents against.

  • texts (Union[List[str], str, Document, List[Document]]) – The input texts or documents to rerank.

  • top_n (int, optional) – Number of top documents to return. Defaults to 4.

  • return_documents (bool, optional) – Whether to return the full documents or just scores. Defaults to False.

Returns:

List of reranked documents or scores.

Return type:

List[Dict]

Raises:
  • ValueError – If the input list contains mixed types of strings and Documents.

  • TypeError – If the input is not a string, Document, list of strings, or list of Documents.

retrieval.sqlitevec_store module

class retrieval.sqlitevec_store.SQLiteVec(table: str, db_file: str = 'vec.db', pool_size: int = 5, embedding: Embeddings | None = None)

Bases: VectorStore

A vector store backed by SQLite with the sqlite-vec extension.

add_texts(texts: List[str], metadatas: List[Dict] | None = None) -> List[str]

Add texts to the vector store.

Parameters:
  • texts (List[str]) – The list of texts to add.

  • metadatas (Optional[List[Dict]], optional) – The list of metadata dictionaries. Defaults to None.

Returns:

The list of row IDs for the added texts.

Return type:

List[str]

Raises:

sqlite3.Error – If the addition of texts fails.

add_texts_with_embeddings(texts: List[str], embeddings: List[List[float]], metadatas: List[Dict] | None = None) -> List[str]

Add texts with precomputed embeddings to the vector store.

Parameters:
  • texts (List[str]) – The list of texts to add.

  • embeddings (List[List[float]]) – The list of precomputed embeddings.

  • metadatas (Optional[List[Dict]], optional) – The list of metadata dictionaries. Defaults to None.

Returns:

The list of row IDs for the added texts.

Return type:

List[str]

Raises:

sqlite3.Error – If the addition of texts fails.

create_metadata_table()

Create the metadata table if it does not exist.

create_table()

Create the main table and the virtual table.

create_table_if_not_exists()

Create tables if they don’t exist.

Raises:

sqlite3.Error – If the table creation fails.

delete_by_ids(ids: List[str])

Delete documents by their row IDs.

drop_table()

Drop the main table and the virtual table if they exist.

get_dimensionality() -> int

Get the dimensionality of the embeddings.

Returns:

The dimensionality of the embeddings.

Return type:

int

get_metadata(key: str) -> str | None

Get a metadata value by key.

Parameters:

key (str) – The key to retrieve the metadata value for.

Returns:

The metadata value if found, otherwise None.

Return type:

Optional[str]

static serialize_f32(vector: List[float]) -> bytes

Serialize a list of floats into bytes.

Parameters:

vector (List[float]) – The list of floats to serialize.

Returns:

The serialized bytes.

Return type:

bytes
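
Serializing a float list into a compact float32 blob is typically done with the standard `struct` module. The sketch below is one plausible way to do it, not necessarily this method's implementation; the function names are hypothetical and the explicit little-endian byte order is an assumption.

```python
import struct
from typing import List


def serialize_f32_sketch(vector: List[float]) -> bytes:
    # Pack each value as a 4-byte little-endian IEEE 754 single-precision float.
    return struct.pack(f"<{len(vector)}f", *vector)


def deserialize_f32_sketch(blob: bytes) -> List[float]:
    # Inverse operation, useful for checking what was stored.
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


blob = serialize_f32_sketch([1.0, 0.5, -2.0])
print(len(blob), deserialize_f32_sketch(blob))
```

Values that are not exactly representable in 32 bits (0.1, for example) come back slightly changed, so round-trip comparisons should use a tolerance.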

set_metadata(key: str, value: str)

Set a metadata key-value pair.

Parameters:
  • key (str) – The key to set.

  • value (str) – The value to associate with the key.
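
A key-value metadata table of this kind can be sketched with the standard `sqlite3` module. The table name `metadata`, its two-column schema, and the helper names are assumptions for this example; the actual schema used by this class is not shown in this reference.

```python
import sqlite3
from typing import Optional


def create_metadata_table(conn: sqlite3.Connection) -> None:
    # Assumed schema: one row per key, with the key as primary key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (key TEXT PRIMARY KEY, value TEXT)"
    )
    conn.commit()


def set_metadata(conn: sqlite3.Connection, key: str, value: str) -> None:
    # INSERT OR REPLACE overwrites any existing row with the same key.
    conn.execute(
        "INSERT OR REPLACE INTO metadata (key, value) VALUES (?, ?)", (key, value)
    )
    conn.commit()


def get_metadata(conn: sqlite3.Connection, key: str) -> Optional[str]:
    row = conn.execute(
        "SELECT value FROM metadata WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_metadata_table(conn)
set_metadata(conn, "dimensionality", "1024")
print(get_metadata(conn, "dimensionality"), get_metadata(conn, "missing"))
```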

similarity_search(query: str, k: int = 4) -> Tuple[List[Document], List[float]]

Perform a similarity search.

Parameters:
  • query (str) – The query string.

  • k (int, optional) – The number of results to return. Defaults to 4.

Returns:

A tuple containing the list of documents that match the query and their corresponding similarity scores.

Return type:

Tuple[List[Document], List[float]]

Raises:

Exception – If the similarity search fails.

similarity_search_by_vector(embedding: List[float], k: int = 4, distance_metric: str = 'cosine') -> Tuple[List[Document], List[float]]

Perform a similarity search by vector with configurable distance metrics.

Parameters:
  • embedding (List[float]) – The embedding vector to search with.

  • k (int, optional) – The number of results to return. Defaults to 4.

  • distance_metric (str, optional) – The distance metric to use. Supported values are “l2” (Euclidean) and “cosine”. Defaults to “cosine”. See https://alexgarcia.xyz/sqlite-vec/api-reference.html#distance for more details.

Returns:

Documents and similarity scores.

Return type:

Tuple[List[Document], List[float]]

Module contents