# Couchbase Embedding Retriever
## Class Overview
### `CouchbaseEmbeddingRetriever`
The `CouchbaseEmbeddingRetriever` retrieves documents from the `CouchbaseDocumentStore` by embedding similarity. How similarity is computed depends on the `vector_search_index` configured in the `CouchbaseDocumentStore` and on the metric chosen when that index was created (for example, dot product or L2 norm).
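To illustrate why the index metric matters, the sketch below ranks the same two candidate vectors against one query, first by dot product and then by L2 distance. It is plain Python with made-up vectors and does not call Couchbase; the function names `dot_product_score` and `l2_distance` are illustrative only and are not part of `couchbase_haystack`.

```python
import math

def dot_product_score(query, doc):
    # Dot product: a higher score means "more similar".
    return sum(q * d for q, d in zip(query, doc))

def l2_distance(query, doc):
    # Euclidean (L2) distance: a lower value means "more similar".
    return math.dist(query, doc)

query = [1.0, 0.0]
docs = {
    "long_vector": [3.0, 3.0],   # large magnitude, points away from the query
    "close_vector": [0.9, 0.1],  # small magnitude, very close to the query
}

# Pick the best match under each metric.
best_by_dot = max(docs, key=lambda name: dot_product_score(query, docs[name]))
best_by_l2 = min(docs, key=lambda name: l2_distance(query, docs[name]))

print(best_by_dot, best_by_l2)
```

The two metrics disagree here because dot product rewards vector magnitude while L2 distance does not, so the metric chosen at index creation can change which documents come back first.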
#### Initialization
```python
def __init__(
    self,
    *,
    document_store: CouchbaseDocumentStore,
    top_k: int = 10,
)
```
Input Parameters:
- `document_store` (CouchbaseDocumentStore): An instance of `CouchbaseDocumentStore` where the documents are stored.
- `top_k` (int): Maximum number of documents to return. Defaults to 10.
Raises:
- `ValueError`: If `document_store` is not an instance of `CouchbaseDocumentStore`.
Example Usage:
```python
import numpy as np
from haystack.utils.auth import Secret

# CouchbaseClusterOptions and CouchbasePasswordAuthenticator are assumed to be
# importable from couchbase_haystack alongside the store and retriever.
from couchbase_haystack import (
    CouchbaseClusterOptions,
    CouchbaseDocumentStore,
    CouchbaseEmbeddingRetriever,
    CouchbasePasswordAuthenticator,
)

store = CouchbaseDocumentStore(
    cluster_connection_string=Secret.from_env_var("CB_CONNECTION_STRING"),
    cluster_options=CouchbaseClusterOptions(),
    authenticator=CouchbasePasswordAuthenticator(),
    bucket="haystack_test_bucket",
    scope="scope_name",
    collection="collection_name",
    vector_search_index="vector_index",
)
retriever = CouchbaseEmbeddingRetriever(document_store=store)

# Random vector used only as a placeholder query embedding.
query_embedding = np.random.random(768).tolist()
results = retriever.run(query_embedding=query_embedding)
print(results["documents"])
```
#### `run`
```python
@component.output_types(documents=List[Document])
def run(
    self,
    query_embedding: List[float],
    top_k: Optional[int] = None,
    search_query: Optional[SearchQuery] = None,
    limit: Optional[int] = None,
) -> Dict[str, List[Document]]
```
Description:
- Retrieves documents from the `CouchbaseDocumentStore` based on the similarity of their embeddings to the provided query embedding.
Input Parameters:
- `query_embedding` (List[float]): A list of float values representing the query embedding. The dimensionality of this embedding must match the dimensionality of the embeddings stored in the `CouchbaseDocumentStore`.
- `top_k` (Optional[int]): The maximum number of documents to return. Overrides the value specified during initialization. Defaults to the value of `top_k` set during initialization.
- `search_query` (Optional[SearchQuery]): An optional search query to combine with the embedding query. The embedding query and search query are combined using an OR operation.
- `limit` (Optional[int]): The maximum number of documents to return from the Couchbase full-text search (FTS) query. Defaults to `top_k`.
Response:
- Returns a dictionary with a single key, `documents`, which maps to a list of `Document` objects that are most similar to the provided `query_embedding`.
Example Usage:
```python
query_embedding = [0.1, 0.2, 0.3, ...]  # Example embedding vector
results = retriever.run(query_embedding=query_embedding, top_k=5)
print(results["documents"])
```
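The documented defaults for `top_k` and `limit` chain together: `top_k` falls back to the value given at initialization, and `limit` falls back to the effective `top_k`. The sketch below spells out that fallback order. `resolve_limits` is a hypothetical helper written for this page; it mirrors the documented behaviour only and is not taken from the library's source.

```python
from typing import Optional, Tuple

def resolve_limits(
    init_top_k: int,
    top_k: Optional[int] = None,
    limit: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical helper: returns (effective_top_k, effective_limit)."""
    # A per-call top_k overrides the value set at initialization.
    effective_top_k = init_top_k if top_k is None else top_k
    # limit falls back to the effective top_k when not given.
    effective_limit = effective_top_k if limit is None else limit
    return effective_top_k, effective_limit

print(resolve_limits(10))                     # nothing overridden
print(resolve_limits(10, top_k=5))            # top_k overridden, limit follows it
print(resolve_limits(10, top_k=5, limit=20))  # both given explicitly
```

Comparing against `None` rather than relying on truthiness keeps an explicit `0` from being silently replaced by the default.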
#### `to_dict`
```python
def to_dict(self) -> Dict[str, Any]
```
Description:
- Serializes the `CouchbaseEmbeddingRetriever` instance into a dictionary format.
Response:
- Returns a dictionary containing the serialized state of the `CouchbaseEmbeddingRetriever` instance.
Example Usage:
```python
retriever_dict = retriever.to_dict()
```
#### `from_dict`
```python
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "CouchbaseEmbeddingRetriever"
```
Description:
- Deserializes a `CouchbaseEmbeddingRetriever` instance from a dictionary.
Input Parameters:
- `data` (Dict[str, Any]): A dictionary containing the serialized state of a `CouchbaseEmbeddingRetriever`.
Response:
- Returns a `CouchbaseEmbeddingRetriever` instance reconstructed from the provided dictionary.
Example Usage:
```python
retriever_instance = CouchbaseEmbeddingRetriever.from_dict(retriever_dict)
```
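The `to_dict` / `from_dict` pair is meant to round-trip: deserializing a serialized instance should give back an equivalent object. The sketch below shows that pattern with `ToyRetriever`, a hypothetical stand-in class that needs no Couchbase connection. The dictionary layout it uses (`type` plus `init_parameters`) is an assumption chosen for illustration and may not match what `CouchbaseEmbeddingRetriever.to_dict()` actually emits.

```python
from typing import Any, Dict

class ToyRetriever:
    """Hypothetical stand-in; NOT the real CouchbaseEmbeddingRetriever."""

    def __init__(self, *, top_k: int = 10):
        self.top_k = top_k

    def to_dict(self) -> Dict[str, Any]:
        # Assumed layout: class name plus the constructor arguments.
        return {"type": "ToyRetriever", "init_parameters": {"top_k": self.top_k}}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyRetriever":
        # Rebuild the instance from the stored constructor arguments.
        return cls(**data["init_parameters"])

original = ToyRetriever(top_k=5)
restored = ToyRetriever.from_dict(original.to_dict())
print(restored.top_k)
```

A quick round-trip check like this is a cheap way to confirm that a component's configuration survives being saved and reloaded.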
## Usage Example
```python
import numpy as np

# CouchbaseClusterOptions and CouchbasePasswordAuthenticator are assumed to be
# importable from couchbase_haystack alongside the store and retriever.
from couchbase_haystack import (
    CouchbaseClusterOptions,
    CouchbaseDocumentStore,
    CouchbaseEmbeddingRetriever,
    CouchbasePasswordAuthenticator,
)

store = CouchbaseDocumentStore(
    cluster_connection_string="couchbases://localhost",
    cluster_options=CouchbaseClusterOptions(),
    authenticator=CouchbasePasswordAuthenticator(),
    bucket="haystack_test_bucket",
    scope="scope_name",
    collection="collection_name",
    vector_search_index="vector_index",
)
retriever = CouchbaseEmbeddingRetriever(document_store=store)

# Random vector used only as a placeholder query embedding.
query_embedding = np.random.random(768).tolist()
results = retriever.run(query_embedding=query_embedding)
print(results["documents"])
```
This example retrieves the 10 most similar documents to a randomly generated query embedding from the `CouchbaseDocumentStore`. Note that the dimensionality of the `query_embedding` must match the dimensionality of the embeddings stored in the `CouchbaseDocumentStore`.
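Because a dimensionality mismatch only surfaces once the query reaches the index, it can help to check the vector length before calling `run`. The following is a minimal sketch of such a guard. `check_embedding_dim` and the expected dimension of 768 are assumptions made for this example and are not part of `couchbase_haystack`.

```python
from typing import List

def check_embedding_dim(query_embedding: List[float], expected_dim: int) -> List[float]:
    """Hypothetical guard: raise early if the query vector has the wrong length."""
    if len(query_embedding) != expected_dim:
        raise ValueError(
            f"Query embedding has {len(query_embedding)} dimensions, "
            f"but the index expects {expected_dim}."
        )
    return query_embedding

# 768 is assumed here to match the embeddings stored in the document store.
EXPECTED_DIM = 768
query_embedding = check_embedding_dim([0.0] * 768, EXPECTED_DIM)
print(len(query_embedding))
```

The checked vector can then be passed to `retriever.run(query_embedding=query_embedding)` as in the example above.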