
Hi, guys, any plans to support vector clustering? It looks like this could be done pretty easily given the index architecture

Last active 4 months ago

29 replies

14 views

  • DM

    Hi, guys, any plans to support vector clustering? It looks like this could be done pretty easily given the index architecture

  • AN

    Hi @Dmitry! Could you please elaborate on how you see clustering working in the engine?

  • DM

    maybe, get k cluster centers

  • AN

    perform clustering on request?

  • DM

    yes

  • AN

    but clustering is a very time-consuming operation

  • DM

    I believe the index is a tree, so maybe I would be interested in accessing the tree nodes

  • AN

    The HNSW index is a graph, so there is no well-defined center. Also, the index is split into multiple segments, so each segment would have its own cluster.

  • AN

    Also which clustering algorithm do you have in mind?

  • DM

    Right now I take a tiny random subset of the data, compute pairwise distances, and perform hierarchical/agglomerative clustering

  • DM

    then I take the mean vector of each cluster to find approximate cluster centers
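
For readers who want to see the workflow DM describes, here is a minimal sketch in plain Python. It assumes Euclidean distance and average linkage, neither of which is stated in the thread, and the function names (`agglomerative`, `centroids`) are made up for illustration. The naive merge loop is only meant for a tiny subset.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(vectors, k):
    """Naive average-linkage clustering: merge the closest pair until k clusters remain.

    Returns a list of clusters, each a list of indices into `vectors`.
    """
    n = len(vectors)
    # Pairwise distance matrix, computed once up front.
    dist = [[euclidean(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def centroids(vectors, clusters):
    """Mean vector of each cluster, used as an approximate cluster center."""
    dim = len(vectors[0])
    return [
        [sum(vectors[i][d] for i in c) / len(c) for d in range(dim)]
        for c in clusters
    ]


# Toy subset standing in for the random sample (values are illustrative).
subset = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
found = agglomerative(subset, k=2)
centers = centroids(subset, found)
```

The pairwise matrix is the part that grows quadratically with the subset size, which is why the rest of the thread focuses on where that matrix gets computed.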

  • AN

    what Qdrant can do is calculate the distance matrix for this subset in one API call, which should be pretty fast

  • AN

    you can use a recommendation batch request with filters on the subset's point IDs

  • DM

    I am taking the random subset from Postgres

  • AN

    taking a random subset and calculating distances somewhere else might be slower, because you need to transfer the vectors over the network.

  • DM

    is there a way to get random things from qdrant?

  • AN

    I don't think so. You would need to generate the IDs of the subset externally

  • DM

    and query ids as a payload filter?

  • AN

    yes
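
A rough sketch of the idea just agreed on: pick the subset IDs outside Qdrant, then restrict requests to them with a filter. It assumes the full list of point IDs is already available externally (for example from Postgres, as DM mentions) and that the filter uses Qdrant's `has_id` condition. The helper names `sample_ids` and `has_id_filter` are hypothetical.

```python
import random


def sample_ids(known_ids, sample_size, seed=None):
    """Pick a random subset of point IDs on the client side."""
    ids = list(known_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(sample_size, len(ids)))


def has_id_filter(ids):
    """Filter body that restricts a request to the given point IDs."""
    return {"must": [{"has_id": list(ids)}]}


# Illustrative only: pretend the external store holds IDs 1..1000.
subset_ids = sample_ids(range(1, 1001), sample_size=50, seed=42)
subset_filter = has_id_filter(subset_ids)
```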

  • DM

    smart

  • DM

    but how would the recommendation API get me a distance matrix?

  • AN

    you would need to make a recommendation batch request

  • AN

    so it would be multiple recommendation requests in one api call

  • AN

    all with same filter

  • AN

    but a different positive ID

  • DM

    I am following you

  • AN

    https://qdrant.github.io/qdrant/redoc/index.html#tag/points/operation/recommendbatchpoints

  • AN

    internally, we have an optimization that groups requests with the same filter, so the filter will be reused between requests inside the batch
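
The batch approach AN outlines could be sketched roughly as follows: one recommendation per subset point, all sharing the same filter, each with a different positive ID. The body shape (`searches`, `positive`, `filter`, `limit`) follows the recommend batch endpoint linked above, but check it against the Qdrant version in use. The helper names and the fake response are made up for illustration. The sketch also assumes each hit carries `id` and `score`, and that a point used as the positive example is left out of its own results, so the diagonal stays empty. Scores follow the collection's configured metric, so they may be similarities rather than distances.

```python
def build_recommend_batch(ids):
    """One recommend request per point: same has_id filter, different positive ID."""
    id_filter = {"must": [{"has_id": list(ids)}]}
    return {
        "searches": [
            {"positive": [pid], "filter": id_filter, "limit": len(ids) - 1}
            for pid in ids
        ]
    }


def scores_to_matrix(ids, batch_result):
    """Arrange per-request hits into an n x n score matrix (None where no hit)."""
    index = {pid: i for i, pid in enumerate(ids)}
    n = len(ids)
    matrix = [[None] * n for _ in range(n)]
    # batch_result[i] is assumed to be the hit list for the i-th request.
    for row_id, hits in zip(ids, batch_result):
        for hit in hits:
            matrix[index[row_id]][index[hit["id"]]] = hit["score"]
    return matrix


ids = [11, 22, 33]
body = build_recommend_batch(ids)
# The body would be POSTed to /collections/<collection_name>/points/recommend/batch.

# Hand-written stand-in for the "result" field of a response (not real data).
fake_result = [
    [{"id": 22, "score": 0.9}, {"id": 33, "score": 0.1}],
    [{"id": 11, "score": 0.9}, {"id": 33, "score": 0.2}],
    [{"id": 22, "score": 0.2}, {"id": 11, "score": 0.1}],
]
matrix = scores_to_matrix(ids, fake_result)
```

Only IDs and scores travel over the network here, which is the advantage AN points out over pulling the vectors out and computing distances elsewhere.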

  • DM

    I will dig into it, thank you
