

Last active 7 months ago

22 replies

11 views

  • TB

    Hello people. I also experience some timeout errors when deploying the qdrant helm chart and trying to upload a collection. The logs I do get are the following:
    qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 503 (Service Unavailable)
    Raw response content:
    b'upstream connect error or disconnect/reset before headers. reset reason: connection termination'

    qdrant-server == 0.10.1, client 0.10.2. This happens at random times, so it is not a data problem, and I believe it is not a resource problem (RAM, CPU) either.
    How can I debug it?
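Since the 503s arrive mid-upload at unpredictable points, one client-side mitigation while debugging is to upload in small batches and retry transient failures with exponential backoff, so a single dropped connection doesn't abort the whole run. A minimal sketch of that pattern; `flaky_upsert` here is a hypothetical stub standing in for the real `client.upsert` call so the retry logic is self-contained:

```python
import time

def upload_with_retry(upsert, batches, max_retries=5, base_delay=1.0):
    """Upload batches one by one, retrying each batch on transient
    errors with exponential backoff instead of failing the whole run."""
    for batch in batches:
        for attempt in range(max_retries):
            try:
                upsert(batch)
                break  # batch accepted, move on to the next one
            except Exception:
                if attempt == max_retries - 1:
                    raise  # persistent failure: give up and surface it
                time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Stub emulating the intermittent 503s from the thread: the first
# call fails, subsequent calls succeed.
calls = {"n": 0}
def flaky_upsert(batch):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("503 Service Unavailable")

upload_with_retry(flaky_upsert, [[1, 2], [3, 4]], base_delay=0.01)
print(calls["n"])  # first batch retried once, second succeeded: 3 calls
```

In a real run you would pass a wrapper around the qdrant client's upsert method instead of the stub, and log which batch index failed so the timeouts can be correlated with the server logs.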

  • AN

    Hi! Could you please describe your setup in more detail? What's the cluster size, how much data are you uploading, from where, etc.?

  • TB

    Hello, cluster mode is disabled, one replica, and I have defined resources as CPU requests 100m / limits 32, and RAM requests 4GB / limits 64GB.

  • TB

    Data points are 64M vectors of 384 dimensions, with on_disk_payload=True
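At 64M vectors of 384 float32 dimensions, the raw vector data alone is roughly 64M × 384 × 4 bytes ≈ 98 GB, so the upload has to be streamed in batches rather than materialized in memory. A small sketch of a batching generator (the batch size of 2 in the demo is purely illustrative):

```python
def batched(points, batch_size=256):
    """Yield lists of at most batch_size points, so the full
    dataset never has to sit in memory at once."""
    batch = []
    for point in points:
        batch.append(point)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # trailing partial batch

# Tiny demonstration with 5 fake (id, vector) points.
points = [(i, [0.0] * 4) for i in range(5)]
sizes = [len(b) for b in batched(points, batch_size=2)]
print(sizes)  # [2, 2, 1]
```

Each yielded batch would then be handed to the client's upsert call; smaller batches also make it easier to see exactly how far an upload got before a timeout.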

  • AN

    100m CPU means 10% of one CPU core

  • TB

    10% of 1 core, yes, but I thought that if it needed more it would burst up to the limit of 32 cores

  • TB

    Should I increase that and try reindexing, or do you see some additional problem?

  • AN

    assuming it is scheduled on a node that can actually provide those resources.

  • AN

    It looks like k8s kills the service because of a lack of resources.
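One way to confirm that theory is to check whether the pod has been restarted or OOM-killed. These are standard kubectl commands; the pod name `qdrant-0` is a placeholder for whatever the helm chart named the pod in your namespace:

```shell
# Restart count and the reason the last container instance terminated
kubectl get pod qdrant-0 -o jsonpath='{.status.containerStatuses[0].restartCount}'
kubectl get pod qdrant-0 \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

# Recent events for the pod (look for OOMKilled, Killing, FailedScheduling)
kubectl describe pod qdrant-0 | tail -n 20

# Live resource usage, to compare against requests/limits (needs metrics-server)
kubectl top pod qdrant-0
```

A `lastState.terminated.reason` of `OOMKilled`, or a restart count that climbs during uploads, would match the "upstream connect error ... connection termination" 503s seen from the client.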

  • TB

    I'll check and inform you of the resolution. Thank you.

  • TB

    I have updated to server 0.10.2, client 0.10.3. K8s resources:
    limits:
      cpu: "32"
      memory: 128Gi
    requests:
      cpu: "24"
      memory: 64Gi
    Cluster disabled, NFS storage, and I connect to qdrant by exposing the service as a NodePort.

  • TB

    Still getting timeouts.

  • TB

    [2022-10-04T09:48:02.358Z DEBUG eventual::core] Core::producerready; state=State[count=0; consuming=false; producing=false; lifecycle=New]
    [2022-10-04T09:48:02.358Z DEBUG eventual::core] - transitioned from State[count=0; consuming=false; producing=false; lifecycle=New] to State[count=0; consuming=false; producing=false; lifecycle=ProducerWait]
    [2022-10-04T09:48:02.358Z INFO wal::segment] Segment { path: "./storage/collections/aggas/0/wal/open-101", entries: 249, space: (33539088/33554432) }: renaming file to "./storage/collections/aggas/0/wal/closed-24903"
    [2022-10-04T09:48:02.358Z DEBUG wal::segment] Segment { path: "./storage/collections/aggas/0/wal/open-101", entries: 249, space: (33539088/33554432) }: async flushing byte range [0, 33539088)
    [2022-10-04T09:48:08.191Z DEBUG eventual::core] Core::complete; state=State[count=0; consuming=false; producing=false; lifecycle=New]; success=true; last=true
    [2022-10-04T09:48:08.191Z DEBUG eventual::core] - transitioned from State[count=0; consuming=false; producing=false; lifecycle=New] to State[count=0; consuming=false; producing=false; lifecycle=Ready]
    [2022-10-04T09:48:08.193Z DEBUG wal] Wal { path: "./storage/collections/aggas/0/wal", segment-count: 3, entries: [24654, 25152) }: open segment retired. startindex: 24903

  • AN

    what is the service uptime in k8s?

  • TB

    17 hours

  • AN

    so the service is up

  • TB

    Yes. I am connecting through a JupyterLab server directly to the node that the qdrant pod is on; whenever it fails, though, it outputs the logs I pasted.

  • AN

    How much data was uploaded before the timeout?

  • TB

    Random timeouts, anywhere between 700,000 and 4M points.

  • TB

    I am checking right now about installing the image on a dedicated server and trying to index everything there. If that succeeds, the issue should be either something k8s-related or storage-related.
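For the dedicated-server experiment, one straightforward way to run the same image outside k8s is via Docker, with a local (non-NFS) directory mounted for storage. The version tag matches the server version from the thread; adjust paths and tag as needed:

```shell
# Run qdrant v0.10.2 with local-disk storage and the REST port exposed
docker run -d --name qdrant \
  -p 6333:6333 \
  -v "$(pwd)/qdrant_storage:/qdrant/storage" \
  qdrant/qdrant:v0.10.2
```

If the full 64M-point upload completes in this setup, that isolates the problem to the k8s networking path or the NFS-backed storage rather than qdrant itself.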

  • AN

    NFS storage might be a problem as well.
