I am using the qdrant-client==0.11.5 Python client. Most of the time this piece of code works fine, but I have noticed that sometimes I get the
[Errno 32] Broken pipe
Code that I am using to do the upload:
client.upload_collection(collection_name=product_collection_name, vectors=text_embeddings, payload=text_payloads, ids=text_ids, batch_size=100)
Client creation code:
QdrantClient(host=app.config['VECTOR_DB_HOST'], port=app.config['VECTOR_DB_PORT'], timeout=Timeout(timeout=300))
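Until the root cause is found, a small client-side retry can paper over transient broken pipes. A minimal sketch (`upload_with_retry` is a hypothetical helper, not part of qdrant-client; pass your actual upload call as the callable):

```python
import errno
import time

def upload_with_retry(do_upload, max_attempts=3, backoff_s=2.0):
    """Call do_upload(), retrying when the server drops the connection.

    do_upload: a zero-argument callable performing the upload,
    e.g. lambda: client.upload_collection(...).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return do_upload()
        except OSError as exc:
            # errno 32 == EPIPE ("Broken pipe"): the other end closed the socket.
            if exc.errno != errno.EPIPE or attempt == max_attempts:
                raise
            time.sleep(backoff_s * attempt)  # simple linear backoff
```

This does not fix the underlying disconnect, but it keeps occasional prod failures from aborting a whole batch job.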
Is it reproducible? Could you please try to run it with parallel=1 and check the logs on the server side?
I was not able to reproduce this when I tried. I saw the same [Errno 32] Broken pipe error for 3 collections within the same ~30-second window.
Also, when we do an upload, do we get any logs on the Qdrant side that we can check?
And by default parallel would be 1, right?
By default parallel is 2, I think.
> Also, when we do an upload, do we get any logs on the Qdrant side that we can check?
Are there any WARN or ERROR log entries?
def upload_collection(self, collection_name: str,
                      vectors: Union[np.ndarray, Iterable[List[float]]],
                      payload: Optional[Iterable[dict]] = None,
                      ids: Optional[Iterable[types.PointId]] = None,
                      batch_size: int = 64,
                      parallel: int = 1):
    """Upload vectors and payload to the collection.

    This method will perform automatic batching of the data.
    If you need to perform a single update, use the `upload_records`
    method if you want to upload multiple vectors with a single payload.

    Args:
        collection_name: Name of the collection to upload to
        vectors: np.ndarray or an iterable over vectors to upload. Might be mmaped
        payload: Iterable of vectors payload, Optional, Default: None
        ids: Iterable of custom vectors ids, Optional, Default: None
        batch_size: How many vectors upload per-request, Default: 64
        parallel: Number of parallel processes of upload
    """
    batches_iterator = self._updater_class.iterate_batches(
        vectors=vectors, payload=payload, ids=ids, batch_size=batch_size
    )
    self._upload_collection(batches_iterator, collection_name, parallel)
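To make the automatic batching described in the docstring concrete, here is a standalone sketch (a hypothetical stand-in for the library's internal `iterate_batches`, assumed behaviour only): vectors, payloads and ids are grouped into aligned, fixed-size batches, with missing payloads/ids padded with None.

```python
from itertools import islice, repeat

def iterate_batches(vectors, payload=None, ids=None, batch_size=64):
    """Yield (ids, vectors, payloads) tuples of at most batch_size items each."""
    vec_it = iter(vectors)
    # If payload/ids are not provided, pair every vector with None instead.
    pay_it = iter(payload) if payload is not None else repeat(None)
    id_it = iter(ids) if ids is not None else repeat(None)
    while True:
        batch_vectors = list(islice(vec_it, batch_size))
        if not batch_vectors:
            return  # source exhausted
        n = len(batch_vectors)
        yield (list(islice(id_it, n)),
               batch_vectors,
               list(islice(pay_it, n)))
```

Because everything is consumed lazily through iterators, a large (possibly mmapped) vector array never needs to be materialised in full.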
I guess the default is 1, and I did not see any errors in the logs.
Hm. A broken pipe error is usually associated with an error in communication between processes. But with parallel=1 the qdrant client won't spawn any additional processes.
Can this be an issue if multiple upload_collection calls happen at the same time for different collections?
I had around 25 collections; only 3 of them gave this error, the others worked fine.
Shouldn't be, apart from higher latency.
Hm, yeah, very strange, as this happened only in the prod env and I was not able to replicate it in the lower env.
The "Broken Pipe" error (errno 32) typically occurs when a process tries to write to a pipe or socket that has already been closed by the other end (the server side).
That is what I understood from the error as well. I will monitor prod for some time, I guess, and check whether I get the same error. But are there any other changes you would suggest for the client creation and the usage of upload_collection?
Maybe use of gRPC might help to increase speed.
Sure, as I was reading the client code I thought of using that. Thanks, I will check with gRPC.
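For reference, switching to gRPC is a constructor option in qdrant-client. A configuration sketch (the host value and timeout are placeholders from this thread; 6334 is Qdrant's default gRPC port, adjust to your deployment):

```python
from qdrant_client import QdrantClient

# prefer_grpc routes bulk operations such as upload_collection over gRPC,
# which reduces per-request HTTP overhead on large uploads.
client = QdrantClient(
    host="your-qdrant-host",  # placeholder, e.g. app.config['VECTOR_DB_HOST']
    port=6333,                # REST port, still used for some calls
    grpc_port=6334,           # default gRPC port
    prefer_grpc=True,
)
```
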
@snayan06 and I are from the same team. We are running Qdrant on EFS, so could this be happening because of slowness from EFS?
Hi @Swastik K, is there a particular reason you chose EFS instead of EBS?
We usually keep a small disk (EBS) size on the instances, and since the size of collections can grow large, we decided to go with EFS.
We did not test EFS in our setups. We usually run on EBS and had quite good benchmark results
cool, will check more from our side. Thank you
@Swastik K If you are using EFS bursting mode, I would look at the other options. The baseline throughput is small. Depending on the IO operations being done, you will suffer if you are out of the burst period. Furthermore, when using EFS, the recommended way to use the full IOPS potential is multiple readers/writers.
Anyway, EFS is usually not recommended for databases. It's more for web servers and other file-sharing scenarios. EFS is like mounting an NFS disk in your local network (as an analogy): your usage patterns need to be adapted to get the best out of it, and for some workloads no amount of adaptation will produce good results.
Since Qdrant has built-in replication when using clusters, I wouldn't recommend using EFS with it. Qdrant is known for its low latency and great speed. Deploying it this way will produce bad results. There are many downsides without a real benefit.
I agree, @danielbichuetti. I'm also against using EFS for Qdrant, but it's not in my hands 😅.
Thank you for this valuable information🙏
Thanks for the information 🙌