retrieval package¶
Submodules¶
retrieval.basics module¶
- class retrieval.basics.Document(page_content: str, metadata: Dict | None = None)[source]¶
Bases:
object
A simple class to hold text and metadata.
- class retrieval.basics.Embeddings[source]¶
Bases:
object
Base class for embedding models.
- embed_documents(texts: List[str] | str | Document | List[Document]) List[List[float]] [source]¶
Generate embeddings for a list of texts, a single text, a Document, or a list of Documents.
- Parameters:
texts – Can be a list of strings, a single string, a Document, or a list of Documents.
- Returns:
A list of embeddings, where each embedding is a list of floats.
- Return type:
List[List[float]]
- class retrieval.basics.VectorStore[source]¶
Bases:
object
Base class for vector stores.
- add_texts(texts: List[str], metadatas: List[Dict] | None = None) List[str] [source]¶
Add texts to the vector store with optional metadata.
- Parameters:
texts (List[str]) – A list of texts to add.
metadatas (Optional[List[Dict]]) – A list of metadata dictionaries corresponding to the texts. Defaults to None.
- Returns:
A list of IDs or keys associated with the added texts.
- Return type:
List[str]
- similarity_search(query: str, k: int = 4) List[Document] [source]¶
Perform a similarity search for the given query string.
- Parameters:
query (str) – The query string to search for.
k (int) – The number of results to return. Defaults to 4.
- Returns:
A list of Document instances that are most similar to the query.
- Return type:
List[Document]
- similarity_search_by_vector(embedding: List[float], k: int = 4) List[Document] [source]¶
Perform a similarity search using a precomputed embedding vector.
- Parameters:
embedding (List[float]) – The embedding vector to search with.
k (int) – The number of results to return. Defaults to 4.
- Returns:
A list of Document instances that are most similar to the embedding.
- Return type:
List[Document]
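The base-class contract above can be illustrated with a toy in-memory store. This is only a sketch under stated assumptions: `InMemoryStore` and `toy_embed` are hypothetical names that do not exist in the retrieval package, `Document` is a local stand-in for `retrieval.basics.Document`, and cosine similarity is chosen for illustration because the base class does not specify a metric.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Document:
    """Local stand-in for retrieval.basics.Document."""
    page_content: str
    metadata: Dict = field(default_factory=dict)


class InMemoryStore:
    """Hypothetical toy store mirroring the VectorStore method names."""

    def __init__(self, embed: Callable[[List[str]], List[List[float]]]):
        self._embed = embed
        self._rows: List[Tuple[str, List[float], Document]] = []

    def add_texts(self, texts: List[str],
                  metadatas: Optional[List[Dict]] = None) -> List[str]:
        # One metadata dict per text; default to empty dicts.
        metadatas = metadatas or [{} for _ in texts]
        if len(metadatas) != len(texts):
            raise ValueError("texts and metadatas must have the same length")
        ids = []
        for vec, text, meta in zip(self._embed(texts), texts, metadatas):
            new_id = str(len(self._rows))
            self._rows.append((new_id, vec, Document(text, meta)))
            ids.append(new_id)
        return ids

    def similarity_search_by_vector(self, embedding: List[float],
                                    k: int = 4) -> List[Document]:
        def cosine(a: List[float], b: List[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # Highest cosine similarity first, then keep the top k.
        ranked = sorted(self._rows, key=lambda r: cosine(embedding, r[1]), reverse=True)
        return [doc for _, _, doc in ranked[:k]]

    def similarity_search(self, query: str, k: int = 4) -> List[Document]:
        # Embed the query, then delegate to the vector search.
        return self.similarity_search_by_vector(self._embed([query])[0], k)


def toy_embed(texts: List[str]) -> List[List[float]]:
    """Hypothetical embedding: counts of the letters 'a' and 'b'."""
    return [[float(t.count("a")), float(t.count("b"))] for t in texts]


store = InMemoryStore(toy_embed)
ids = store.add_texts(["aaa", "bbb", "ab"], metadatas=[{"n": 1}, {"n": 2}, {"n": 3}])
top = store.similarity_search("aa", k=1)
```

Here `similarity_search` is a thin wrapper that embeds the query and calls `similarity_search_by_vector`, which is one plausible way to relate the two methods documented above.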
retrieval.pgvector_store module¶
- class retrieval.pgvector_store.CollectionStore(**kwargs)[source]¶
Bases:
Base
Represents a collection in the database.
- uuid¶
Primary key for the collection.
- Type:
UUID
- name¶
Name of the collection.
- Type:
str
- cmetadata¶
Metadata associated with the collection.
- Type:
dict
- embeddings¶
List of embeddings associated with the collection.
- Type:
List[EmbeddingStore]
- class retrieval.pgvector_store.EmbeddingStore(**kwargs)[source]¶
Bases:
Base
Represents an embedding in the database.
- uuid¶
Primary key for the embedding.
- Type:
UUID
- collection_id¶
Foreign key referencing the collection.
- Type:
UUID
- collection¶
Collection associated with the embedding.
- Type:
CollectionStore
- embedding¶
The embedding vector.
- Type:
Vector
- document¶
The document associated with the embedding.
- Type:
str
- cmetadata¶
Metadata associated with the embedding.
- Type:
dict
- custom_id¶
Custom ID for the embedding.
- Type:
str
- class retrieval.pgvector_store.PGVector(connection_string: str, embedding: Embeddings, collection_name: str = 'vectorsearch', pool_size: int = 5, **kwargs)[source]¶
Bases:
VectorStore
A vector store implementation using PostgreSQL and pgvector.
- _engine¶
SQLAlchemy engine for database connection.
- Type:
sqlalchemy.engine.Engine
- _Session¶
SQLAlchemy session maker.
- Type:
sqlalchemy.orm.sessionmaker
- _embedding¶
Embedding model used for generating embeddings.
- Type:
Embeddings
- _collection_name¶
Name of the collection in the database.
- Type:
str
- _collection¶
The collection associated with this vector store.
- Type:
CollectionStore
- add_texts(texts: List[str], metadatas: List[Dict] | None = None) List[str] [source]¶
Add texts to the vector store.
- Parameters:
texts (List[str]) – The texts to add.
metadatas (Optional[List[Dict]], optional) – Metadata for each text. Defaults to None.
- Returns:
The IDs of the added texts.
- Return type:
List[str]
- add_texts_with_embeddings(texts: List[str], embeddings: List[List[float]], metadatas: List[Dict] | None = None) List[str] [source]¶
Add texts with precomputed embeddings to the vector store.
- Parameters:
texts (List[str]) – The texts to add.
embeddings (List[List[float]]) – Precomputed embeddings.
metadatas (Optional[List[Dict]], optional) – Metadata for each text. Defaults to None.
- Returns:
The IDs of the added texts.
- Return type:
List[str]
- get_all_collection_metadata() Dict [source]¶
Get all metadata from the collection.
- Returns:
A dictionary containing all metadata.
- Return type:
Dict
- get_collection_metadata(key: str) str | None [source]¶
Get metadata from the collection.
- Parameters:
key (str) – Metadata key.
- Returns:
The metadata value if it exists, otherwise None.
- Return type:
Optional[str]
- set_collection_metadata(key: str, value: str)[source]¶
Set metadata for the collection.
- Parameters:
key (str) – Metadata key.
value (str) – Metadata value.
- similarity_search(query: str, k: int = 4) Tuple[List[Document], List[float]] [source]¶
Perform a similarity search for a query.
- Parameters:
query (str) – The query to search for.
k (int, optional) – The number of results to return. Defaults to 4.
- Returns:
A tuple containing the list of documents that match the query and their corresponding similarity scores.
- Return type:
Tuple[List[Document], List[float]]
- similarity_search_by_vector(embedding: List[float], k: int = 4, distance_metric: str = 'cosine') Tuple[List[Document], List[float]] [source]¶
Perform a similarity search by vector with configurable distance metrics.
- Parameters:
embedding (List[float]) – The embedding vector to search with.
k (int, optional) – The number of results to return. Defaults to 4.
distance_metric (str, optional) – The distance metric to use. Options are “l2” and “cosine”. Defaults to “cosine”. See https://github.com/pgvector/pgvector?tab=readme-ov-file#querying for more details.
- Returns:
A tuple containing the list of matching documents and their corresponding similarity scores.
- Return type:
Tuple[List[Document], List[float]]
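To make the two `distance_metric` options concrete, the sketch below computes L2 and cosine distance in plain Python. In pgvector these correspond to the `<->` and `<=>` operators respectively; how `PGVector` maps the `distance_metric` string onto those operators is not shown in this reference, so treat that mapping as an assumption. The helper names are hypothetical.

```python
import math
from typing import List


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance (pgvector operator: <->)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity (pgvector operator: <=>)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
candidates = {"same_direction": [3.0, 0.0], "nearby_point": [1.0, 1.0]}

# Smaller distance means a closer match under both metrics.
by_l2 = sorted(candidates, key=lambda name: l2_distance(query, candidates[name]))
by_cosine = sorted(candidates, key=lambda name: cosine_distance(query, candidates[name]))
```

The two metrics can disagree: L2 ranks `nearby_point` first because it is closer in space, while cosine ranks `same_direction` first because it ignores vector length. This is why the choice of `distance_metric` can change the order of the returned documents.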
retrieval.siliconflow_embeddings module¶
- class retrieval.siliconflow_embeddings.SiliconFlowEmbeddings(api_key: str, api_base_url: str, model_name: str, org_id: str = None, **model_kwargs)[source]¶
Bases:
Embeddings
Embedding class for SiliconFlow BGE-M3 API.
- embed_documents(texts: List[str] | str | Document | List[Document]) List[List[float]] [source]¶
Generate embeddings for a list of texts or documents.
- Parameters:
texts (Union[List[str], str, Document, List[Document]]) – The input texts or documents to embed.
- Returns:
A list of embeddings for the input texts or documents.
- Return type:
List[List[float]]
- Raises:
ValueError – If the input list mixes strings and Documents.
TypeError – If the input is not a string, Document, list of strings, or list of Documents.
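The accepted input shapes and the two error cases above can be sketched as a small normalization step. This is an illustration of the documented behavior, not the package's actual code: `normalize_inputs` is a hypothetical helper, and `Document` is a local stand-in for `retrieval.basics.Document`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Document:
    """Local stand-in for retrieval.basics.Document."""
    page_content: str
    metadata: Dict = field(default_factory=dict)


def normalize_inputs(
    texts: Union[List[str], str, Document, List[Document]]
) -> List[str]:
    """Hypothetical helper: reduce every accepted input shape to List[str]."""
    if isinstance(texts, str):
        return [texts]
    if isinstance(texts, Document):
        return [texts.page_content]
    if isinstance(texts, list):
        if all(isinstance(t, str) for t in texts):
            return list(texts)
        if all(isinstance(t, Document) for t in texts):
            return [t.page_content for t in texts]
        if all(isinstance(t, (str, Document)) for t in texts):
            # Only strings and Documents, but both kinds are present.
            raise ValueError("input list mixes strings and Documents")
    raise TypeError(
        "expected a string, a Document, a list of strings, or a list of Documents"
    )
```

Normalizing up front means the request-building code only ever deals with a flat list of strings, regardless of which of the four input shapes the caller used.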
retrieval.siliconflow_rerank module¶
- class retrieval.siliconflow_rerank.SiliconFlowRerank(api_key: str, api_base_url: str = 'https://api.siliconflow.cn/v1', model_name: str = 'BAAI/bge-reranker-v2-m3', **model_kwargs)[source]¶
Bases:
Rerank
Rerank class for SiliconFlow BGE-Reranker API.
- rerank(query: str, texts: List[str] | str | Document | List[Document], top_n: int = 4, return_documents: bool = False) List[Dict] [source]¶
Rerank a list of texts or documents based on a query.
- Parameters:
query (str) – The query to rerank texts or documents against.
texts (Union[List[str], str, Document, List[Document]]) – The input texts or documents to rerank.
top_n (int, optional) – Number of top documents to return. Defaults to 4.
return_documents (bool, optional) – Whether to return the full documents or just scores. Defaults to False.
- Returns:
List of reranked documents or scores.
- Return type:
List[Dict]
- Raises:
ValueError – If the input list mixes strings and Documents.
TypeError – If the input is not a string, Document, list of strings, or list of Documents.
retrieval.sqlitevec_store module¶
- class retrieval.sqlitevec_store.SQLiteVec(table: str, db_file: str = 'vec.db', pool_size: int = 5, embedding: Embeddings | None = None)[source]¶
Bases:
VectorStore
SQLite with Vec extension as a vector database.
- add_texts(texts: List[str], metadatas: List[Dict] | None = None) List[str] [source]¶
Add texts to the vector store.
- Parameters:
texts (List[str]) – The list of texts to add.
metadatas (Optional[List[Dict]], optional) – The list of metadata dictionaries. Defaults to None.
- Returns:
The list of row IDs for the added texts.
- Return type:
List[str]
- Raises:
sqlite3.Error – If the addition of texts fails.
- add_texts_with_embeddings(texts: List[str], embeddings: List[List[float]], metadatas: List[Dict] | None = None) List[str] [source]¶
Add texts with precomputed embeddings to the vector store.
- Parameters:
texts (List[str]) – The list of texts to add.
embeddings (List[List[float]]) – The list of precomputed embeddings.
metadatas (Optional[List[Dict]], optional) – The list of metadata dictionaries. Defaults to None.
- Returns:
The list of row IDs for the added texts.
- Return type:
List[str]
- Raises:
sqlite3.Error – If the addition of texts fails.
- create_table_if_not_exists()[source]¶
Create tables if they don’t exist.
- Raises:
sqlite3.Error – If the table creation fails.
- get_dimensionality() int [source]¶
Get the dimensionality of the embeddings.
- Returns:
The dimensionality of the embeddings.
- Return type:
int
- get_metadata(key: str) str | None [source]¶
Get metadata value by key.
- Parameters:
key (str) – The key to retrieve the metadata value for.
- Returns:
The metadata value if found, otherwise None.
- Return type:
Optional[str]
- static serialize_f32(vector: List[float]) bytes [source]¶
Serialize a list of floats into bytes.
- Parameters:
vector (List[float]) – The list of floats to serialize.
- Returns:
The serialized bytes.
- Return type:
bytes
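A minimal sketch of what such a serializer can look like, following the `struct.pack` pattern commonly used with sqlite-vec. The actual body of `SQLiteVec.serialize_f32` is not shown in this reference, so this is an assumption; `deserialize_f32` is a hypothetical inverse added only to demonstrate the round trip.

```python
import struct
from typing import List


def serialize_f32(vector: List[float]) -> bytes:
    """Pack floats as consecutive 32-bit values in native byte order."""
    return struct.pack(f"{len(vector)}f", *vector)


def deserialize_f32(blob: bytes) -> List[float]:
    """Hypothetical inverse: unpack 4-byte floats back into a list."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


blob = serialize_f32([1.0, 0.5, -2.0])
restored = deserialize_f32(blob)
```

Each element takes 4 bytes, so a vector of dimensionality d serializes to 4 × d bytes. Values that are not exactly representable as 32-bit floats (for example 0.1) lose precision in the round trip.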
- set_metadata(key: str, value: str)[source]¶
Set metadata key-value pair.
- Parameters:
key (str) – The key to set.
value (str) – The value to associate with the key.
- similarity_search(query: str, k: int = 4) Tuple[List[Document], List[float]] [source]¶
Perform a similarity search.
- Parameters:
query (str) – The query string.
k (int, optional) – The number of results to return. Defaults to 4.
- Returns:
A tuple containing the list of documents that match the query and their corresponding similarity scores.
- Return type:
Tuple[List[Document], List[float]]
- Raises:
Exception – If the similarity search fails.
- similarity_search_by_vector(embedding: List[float], k: int = 4, distance_metric: str = 'cosine') Tuple[List[Document], List[float]] [source]¶
Perform a similarity search by vector with configurable distance metrics.
- Parameters:
embedding (List[float]) – The embedding vector to search with.
k (int, optional) – The number of results to return. Defaults to 4.
distance_metric (str, optional) – The distance metric to use. Supported: “l2” (Euclidean) and “cosine”. Defaults to “cosine”. See https://alexgarcia.xyz/sqlite-vec/api-reference.html#distance for more details.
- Returns:
A tuple containing the list of matching documents and their corresponding similarity scores.
- Return type:
Tuple[List[Document], List[float]]