Embedding Retrievers
CouchbaseSearchEmbeddingRetriever
Retrieves documents from the CouchbaseSearchDocumentStore by embedding similarity.
Uses a Search Vector Index (FTS-based) for hybrid searches that combine vector, full-text, and geospatial queries. See CouchbaseSearchDocumentStore for more information.
Usage example:
import numpy as np
from couchbase_haystack import (
    CouchbaseSearchDocumentStore,
    CouchbaseSearchEmbeddingRetriever,
    CouchbasePasswordAuthenticator,
)
from haystack.utils import Secret

store = CouchbaseSearchDocumentStore(
    cluster_connection_string=Secret.from_env_var("CB_CONNECTION_STRING"),
    authenticator=CouchbasePasswordAuthenticator(
        username=Secret.from_env_var("CB_USERNAME"),
        password=Secret.from_env_var("CB_PASSWORD")
    ),
    bucket="haystack_test_bucket",
    scope="scope_name",
    collection="collection_name",
    vector_search_index="vector_index"
)

retriever = CouchbaseSearchEmbeddingRetriever(document_store=store)

results = retriever.run(query_embedding=np.random.random(768).tolist())
print(results["documents"])
The example above retrieves the 10 most similar documents to a random query embedding from the CouchbaseSearchDocumentStore. Note that the dimensions of the query_embedding must match the dimensions of the embeddings stored in the CouchbaseSearchDocumentStore.
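In a typical pipeline the query embedding comes from a text embedder rather than from random numbers. The sketch below is illustrative only: it assumes the sentence-transformers dependency is installed, that store is the document store created above, and that the example model (all-mpnet-base-v2, 768-dimensional) matches the stored embeddings.

from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder

pipeline = Pipeline()
pipeline.add_component("embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-mpnet-base-v2"))
pipeline.add_component("retriever", CouchbaseSearchEmbeddingRetriever(document_store=store, top_k=5))
# The embedder's "embedding" output feeds the retriever's "query_embedding" input.
pipeline.connect("embedder.embedding", "retriever.query_embedding")

result = pipeline.run({"embedder": {"text": "What is Couchbase?"}})
print(result["retriever"]["documents"])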
Source code in src/couchbase_haystack/components/retrievers/embedding_retriever.py
__init__
Note: the filters option is currently not supported with embedding queries. Instead, you can provide a Couchbase search query when running the embedding query; the embedding query and the search query are combined with an OR operation, as shown in the sketch below.
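For illustration, a minimal sketch of combining the two. It assumes the Couchbase Python SDK's couchbase.search.QueryStringQuery and a hypothetical indexed field named genre, and reuses the retriever and np from the usage example above.

from couchbase.search import QueryStringQuery

# Hypothetical full-text condition; it is OR-combined with the vector query.
results = retriever.run(
    query_embedding=np.random.random(768).tolist(),
    search_query=QueryStringQuery("genre:fiction"),
    top_k=5,
)
print(results["documents"])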
Parameters:
- document_store (CouchbaseSearchDocumentStore) – An instance of CouchbaseSearchDocumentStore.
- top_k (int, default: 10) – Maximum number of Documents to return.
Raises:
- ValueError – If document_store is not an instance of CouchbaseSearchDocumentStore.
Source code in src/couchbase_haystack/components/retrievers/embedding_retriever.py
to_dict
Serializes the component to a dictionary.
Returns:
- Dict[str, Any] – Dictionary with serialized data.
Source code in src/couchbase_haystack/components/retrievers/embedding_retriever.py
from_dict (classmethod)
Deserializes the component from a dictionary.
Parameters:
- data (Dict[str, Any]) – Dictionary to deserialize from.
Returns:
- CouchbaseSearchEmbeddingRetriever – Deserialized component.
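As a quick illustration, the usual Haystack serialization round trip (the exact dictionary contents depend on the document store configuration):

# Serialize the retriever, including its document store settings, to a plain dictionary ...
data = retriever.to_dict()
# ... and rebuild an equivalent component from that dictionary, e.g. when loading a pipeline.
restored = CouchbaseSearchEmbeddingRetriever.from_dict(data)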
Source code in src/couchbase_haystack/components/retrievers/embedding_retriever.py
run
run(
    query_embedding: List[float],
    top_k: Optional[int] = None,
    filters: Optional[Dict[str, Any]] = None,
    search_query: Optional[SearchQuery] = None,
    limit: Optional[int] = None,
) -> Dict[str, List[Document]]
Retrieve documents from the CouchbaseSearchDocumentStore, based on the provided embedding similarity.
Parameters:
- query_embedding (List[float]) – Embedding of the query.
- top_k (Optional[int], default: None) – Maximum number of Documents to be returned from the vector query. Overrides the value specified at initialization.
- filters (Optional[Dict[str, Any]], default: None) – Optional dictionary of filters to apply before the vector search. Refer to the Haystack documentation for the filter structure (https://docs.haystack.deepset.ai/v2.0/docs/metadata-filtering).
- search_query (Optional[SearchQuery], default: None) – Couchbase search query that is combined with the vector query; the two are joined with an OR operation.
- limit (Optional[int], default: None) – Maximum number of Documents to be returned by the Couchbase FTS search request. Defaults to top_k.
Returns:
- Dict[str, List[Document]] – A dictionary with the following keys:
  - documents: List of Documents most similar to the given query_embedding.
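For illustration, a hedged sketch of overriding top_k and limit at query time, reusing the retriever from the usage example above; the random embedding stands in for one produced by a text embedder with matching dimensions.

query_embedding = np.random.random(768).tolist()

# Return the 3 most similar documents, while letting the underlying FTS request
# fetch up to 20 candidates.
results = retriever.run(query_embedding=query_embedding, top_k=3, limit=20)
for doc in results["documents"]:
    print(doc.id, doc.score)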
Source code in src/couchbase_haystack/components/retrievers/embedding_retriever.py
CouchbaseQueryEmbeddingRetriever
Retrieves documents from the CouchbaseQueryDocumentStore using vector similarity search.
Works with both Hyperscale Vector Index and Composite Vector Index. Supports ANN (approximate) and KNN (exact) search with various similarity metrics. See CouchbaseQueryDocumentStore for more details.
Usage example:
import numpy as np
from couchbase_haystack import (
    CouchbaseQueryDocumentStore,
    CouchbaseQueryEmbeddingRetriever,
    CouchbasePasswordAuthenticator,
    QueryVectorSearchType,
    CouchbaseQueryOptions,
    QueryVectorSearchSimilarity
)
from haystack.utils import Secret

# Assume a Couchbase GSI index named "vector_gsi_index" exists on the "embedding" field
# with dimension 768 and using cosine similarity.
store = CouchbaseQueryDocumentStore(
    cluster_connection_string=Secret.from_env_var("CB_CONNECTION_STRING"),
    authenticator=CouchbasePasswordAuthenticator(
        username=Secret.from_env_var("CB_USERNAME"),
        password=Secret.from_env_var("CB_PASSWORD")
    ),
    bucket="haystack_test_bucket",
    scope="scope_name",
    collection="collection_name",
    search_type=QueryVectorSearchType.ANN,  # or KNN, depending on the index
    similarity=QueryVectorSearchSimilarity.COSINE,  # or DOT, L2, EUCLIDEAN, L2_SQUARED, EUCLIDEAN_SQUARED
    nprobes=10,  # optional: number of probes for the ANN search
    query_options=CouchbaseQueryOptions()  # optional query options
)

retriever = CouchbaseQueryEmbeddingRetriever(document_store=store, top_k=5)

# Generate a random query embedding matching the dimension
random_embedding = np.random.rand(768).tolist()

# Example without filters
results_no_filter = retriever.run(query_embedding=random_embedding)
print("Documents found without filters:", results_no_filter["documents"])

# Example with filters
filters = {"field": "meta.genre", "operator": "==", "value": "fiction"}
results_with_filter = retriever.run(query_embedding=random_embedding, filters=filters)
print("Documents found with filters:", results_with_filter["documents"])
The example above retrieves the 5 most similar documents to a random query embedding from the
CouchbaseQueryDocumentStore. Note that the dimensions of the query_embedding must match the dimensions
configured in the query_vector_search_params of the CouchbaseQueryDocumentStore.
Filters are applied before the vector search.
Source code in src/couchbase_haystack/components/retrievers/embedding_retriever.py
__init__
Parameters:
- document_store (CouchbaseQueryDocumentStore) – An instance of CouchbaseQueryDocumentStore.
- top_k (int, default: 10) – Maximum number of Documents to return based on vector similarity.
Raises:
- ValueError – If document_store is not an instance of CouchbaseQueryDocumentStore.
Source code in src/couchbase_haystack/components/retrievers/embedding_retriever.py
to_dict
Serializes the component to a dictionary.
Returns:
- Dict[str, Any] – Dictionary with serialized data.
Source code in src/couchbase_haystack/components/retrievers/embedding_retriever.py
from_dict (classmethod)
Deserializes the component from a dictionary.
Parameters:
- data (Dict[str, Any]) – Dictionary to deserialize from.
Returns:
- CouchbaseQueryEmbeddingRetriever – Deserialized component.
Source code in src/couchbase_haystack/components/retrievers/embedding_retriever.py
run
run(
    query_embedding: List[float],
    top_k: Optional[int] = None,
    filters: Optional[Dict[str, Any]] = None,
    nprobes: Optional[int] = None,
) -> Dict[str, List[Document]]
Retrieve documents from the CouchbaseQueryDocumentStore based on embedding similarity using GSI.
Parameters:
- query_embedding (List[float]) – Embedding of the query.
- top_k (Optional[int], default: None) – Maximum number of Documents to be returned based on similarity score. Overrides the value specified at initialization.
- filters (Optional[Dict[str, Any]], default: None) – Optional dictionary of filters to apply before the vector search. Refer to the Haystack documentation for the filter structure (https://docs.haystack.deepset.ai/v2.0/docs/metadata-filtering).
- nprobes (Optional[int], default: None) – Number of probes for the ANN search. If None, uses the value set at index creation time.
Returns:
A dictionary with the following keys:
- documents: List of Documents most similar to the given query_embedding, potentially filtered.
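For illustration, a hedged sketch of overriding nprobes and applying a metadata filter at query time, reusing the retriever and np from the usage example above; the field name, value, and probe count are only examples.

query_embedding = np.random.rand(768).tolist()

# Restrict the search to fiction documents and probe more of the ANN index than the
# index default, trading latency for recall.
results = retriever.run(
    query_embedding=query_embedding,
    top_k=3,
    filters={"field": "meta.genre", "operator": "==", "value": "fiction"},
    nprobes=32,
)
for doc in results["documents"]:
    print(doc.id, doc.score, doc.meta.get("genre"))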