
[Errno 32] Broken pipe from upload_collection with qdrant-client 0.11.5

Last active 2 months ago

23 replies

8 views

  • SN

    I am using the qdrant-client==0.11.5 Python client. Most of the time this piece of code works fine, but I have noticed that sometimes I get [Errno 32] Broken pipe.

    Code that I am using to do the upload:
    client.upload_collection(collection_name=product_collection_name, vectors=text_embeddings, payload=text_payloads, ids=text_ids, batch_size=100)

    Client creation code:
    QdrantClient(host=app.config['VECTOR_DB_HOST'], port=app.config['VECTOR_DB_PORT'], timeout=Timeout(timeout=300))

  • AN

    Is it reproducible? Could you please try to run it with parallel=1 and check the logs on the server side?
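
    Something like this, as a rough sketch (it reuses the variable names from your snippet and assumes they are already defined):

    from qdrant_client import QdrantClient

    # Rough sketch, not a definitive fix: the same upload, but with parallel=1
    # spelled out so the client does not spawn any extra worker processes.
    # All variable names come from the snippet above and are assumed to exist.
    client = QdrantClient(host=app.config['VECTOR_DB_HOST'],
                          port=app.config['VECTOR_DB_PORT'])
    client.upload_collection(collection_name=product_collection_name,
                             vectors=text_embeddings,
                             payload=text_payloads,
                             ids=text_ids,
                             batch_size=100,
                             parallel=1)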

  • SN

    I was not able to reproduce this when I tried. I saw the same [Errno 32] Broken pipe error for 3 collections within the same 30-second window.
    Also, when we do an upload, do we get any logs on the Qdrant side that we can check?

    Also, by default parallel would be 1, right?

  • AN

    By default parallel is 2, I think.

  • AN

    > also when we do some upload do we get any logs on qdrant side that we can check ??

    Are there any WARN or ERROR log entries?

  • SN

    def upload_collection(self,
                          collection_name: str,
                          vectors: Union[np.ndarray, Iterable[List[float]]],
                          payload: Optional[Iterable[dict]] = None,
                          ids: Optional[Iterable[types.PointId]] = None,
                          batch_size: int = 64,
                          parallel: int = 1):
        """Upload vectors and payload to the collection.
        This method will perform automatic batching of the data.
        If you need to perform a single update, use upsert method.
        Note: use upload_records method if you want to upload multiple vectors with single payload.

        Args:
            collection_name:  Name of the collection to upload to
            vectors: np.ndarray or an iterable over vectors to upload. Might be mmaped
            payload: Iterable of vectors payload, Optional, Default: None
            ids: Iterable of custom vectors ids, Optional, Default: None
            batch_size: How many vectors upload per-request, Default: 64
            parallel: Number of parallel processes of upload
        """
        batches_iterator = self._updater_class.iterate_batches(vectors=vectors,
                                                               payload=payload,
                                                               ids=ids,
                                                               batch_size=batch_size)
        self._upload_collection(batches_iterator, collection_name, parallel)
  • SN

    I guess the default is 1. And I did not see any errors.

  • AN

    Hm. A Broken pipe error is usually associated with an error in communication between processes. But with parallel=1 the qdrant client won't spawn any additional processes.

  • SN

    Could this be an issue if multiple upload_collection calls happen at the same time for different collections?

  • SN

    I had around 25 collections; only 3 of them gave this error, the others worked fine.

  • AN

    It shouldn't be, apart from higher latency.

  • SN

    Hm, yeah, very strange, as this happened only in the prod environment and I was not able to replicate it in the lower environments. The "Broken pipe" error (errno 32) typically occurs when a process is trying to write to a pipe or socket that has already been closed by the other end, i.e. the server side.
    That is what I understood from the error. I will monitor prod for some time and check whether I get the same error again. But are there any other changes you would suggest for the client creation and the usage of upload_collection?
    😅

  • AN

    Maybe using gRPC might help to increase speed.
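
    Roughly like this, as a minimal sketch (it assumes the default gRPC port 6334 is exposed on the server; adjust if yours is mapped differently):

    from qdrant_client import QdrantClient

    # Sketch only: prefer_grpc=True makes the client prefer the gRPC transport
    # for data operations such as upload_collection, which is usually faster
    # for bulk uploads. The httpx Timeout object from your snippet applies to
    # the REST transport only, so it is left out here.
    client = QdrantClient(host=app.config['VECTOR_DB_HOST'],
                          port=app.config['VECTOR_DB_PORT'],  # REST port
                          grpc_port=6334,                     # assumed default
                          prefer_grpc=True)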

  • SN

    Sure, I thought of using that while reading the client code. Thanks, I will check with gRPC.

  • SW

    @snayan06 and I are from the same team. We are running Qdrant on EFS, so could this be happening because of slowness from EFS?

  • FA

    Hi @Swastik K, is there a particular reason you chose EFS instead of EBS?

  • SW

    We usually keep a small disk (EBS) size on the instances, and since collections can grow quite large, we decided to go with EFS.

  • FA

    We did not test EFS in our setups. We usually run on EBS and have had quite good benchmark results.

  • SW

    cool, will check more from our side. Thank you

  • DA

    @Swastik K If you are using EFS bursting mode, I would look at the other options. The baseline throughput is small, and depending on the IO operations being done, you will suffer once you are out of the burst period. Furthermore, with EFS the recommended way to use the full IOPS potential is multiple readers/writers.

    Anyway, EFS is usually not recommended for databases. It's more for web servers and other file-sharing scenarios. EFS is like mounting an NFS disk in your local network (as an analogy): your usage patterns need to be adapted to get the best out of it, and for some workloads even adaptation won't produce good results.

  • DA

    Since Qdrant has built-in replication when using clusters, I wouldn't recommend using EFS with it. Qdrant is known for its low latency and great speed. Deploying it this way will produce bad results. There are many downsides without a real benefit.

  • SW

    Agreed, @danielbichuetti. I'm also against using EFS for Qdrant, but it's not in my hands 😅.

    Thank you for this valuable information 🙏

  • SN

    Thanks for the information 🙌
