Is DiskANN/Vamana considered as an alternative to HNSW in the future? If so, how far away could that be?

Last active 4 months ago

11 replies

5 views

  • VI

    Is DiskANN/Vamana considered as an alternative to HNSW in the future? If so, how far away could that be?

  • AN

    As far as I can see from the paper, there are not many differences. The graph-building process is a bit different, but in the end it will be the same graph.

    Qdrant can already serve HNSW from disk. Which part of Vamana do you think we lack the most?

  • VI

    As I understand the third experiment (HNSW in MMAP) of your article "Minimal RAM you need to serve a million vectors", when only a fourth of the data fits in RAM (300 MB in this benchmark), a similarity search for one point takes about 1.1 seconds. I haven't thoroughly read the DiskANN paper, but I imagine from their claims that it would be much faster with Vamana. Don't you think so?

  • AN

    We did not measure how much data was actually read from disk. It would be interesting to test DiskANN under the same conditions 🤔

  • VI

    FWIW, they claim a ~6 ms latency for a 98.68% recall on a single node with 64 GB RAM and two 1 TB SSDs in RAID-0 on ANN_SIFT1B (1 B vectors of 128 uint8s).

  • VI

    Their index was 348 GB, so it seems they're able to impressively minimize the disk reads.

  • AN

    That's strange: the vectors are 120 GB, and the index is 348 GB?

  • AN

    Anyway, we should compare on the same dataset and the same machine.

  • VI

    From their paper: "On the disk, for each point, we store its full precision vector followed by the identities of its ≤ R neighbors."

  • VI

    "If the degree of a node is smaller than R, we pad with zeros."

  • VI

    Thanks for your replies on a Saturday and happy new year! 🙂
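
The size question raised in the thread can be checked with quick arithmetic. The sketch below assumes the fixed-width layout quoted from the paper (full-precision vector followed by R neighbor IDs, zero-padded), and additionally assumes 4-byte neighbor IDs, 1 GB = 10^9 bytes, and no per-node metadata or sector alignment. Those assumptions and the function names are illustrative and are not taken from the thread or confirmed by the paper.

```python
def disk_index_bytes(num_points, dim, bytes_per_component, max_degree, id_bytes=4):
    """Total size of a fixed-width layout: each node stores its full vector
    followed by max_degree neighbor IDs (padded, so size ignores actual degree)."""
    node_bytes = dim * bytes_per_component + max_degree * id_bytes
    return num_points * node_bytes


def implied_max_degree(index_bytes, num_points, dim, bytes_per_component, id_bytes=4):
    """Invert the formula above: how many neighbor slots per node would
    account for the reported index size (under the assumed id_bytes)."""
    node_bytes = index_bytes / num_points
    return (node_bytes - dim * bytes_per_component) / id_bytes


N = 1_000_000_000  # 1 B vectors of 128 uint8s, as quoted in the thread

# Raw vectors only (no neighbor slots): 128 bytes per point.
raw_bytes = disk_index_bytes(N, dim=128, bytes_per_component=1, max_degree=0)

# Neighbor slots per node that a 348 GB index would imply under these assumptions.
implied_r = implied_max_degree(348 * 10**9, N, dim=128, bytes_per_component=1)

print(raw_bytes / 10**9, "GB of raw vectors")
print(implied_r, "neighbor slots per node implied by 348 GB")
```

Under these assumptions, the raw vectors account for 128 GB and the remaining 220 bytes per node go to neighbor IDs, which is why an index that stores the graph next to the vectors can be a few times larger than the vectors alone.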
