
Last active 8 months ago

51 replies

20 views

  • SE

    That is going to be extremely time consuming for me to create. So I'm trying to avoid that. I understand that isn't your concern though.

  • AN

    Let's move this discussion to a thread.

  • AN

    @andrey.vasnetsov fyi

  • SE

    Thank you. Sorry for making so much noise.

  • SE

    I'm just really excited about this solution. The results I'm getting back are GREAT!

  • SE

    Just need to figure out this issue with the indexing.

  • AN

    No worries. We would like to help; it's just not simple without a complete view.

  • SE

    Completely understand.

  • AN

    does it make any difference if you upload with single vs multi thread?

  • AN

    Missing entries issue

  • SE

    I'm not sure how I would change that.

  • SE

    Right now I'm just doing an upsert.

  • SE

    client.upsert(
        collection_name=request.collection,
        points=[
            PointStruct(
                id=int(request.meta.get('id')),
                payload=request.meta,
                vector=vector.tolist(),
            )
        ],
    )
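
[Editor's note: a hypothetical sketch, not from the thread. When upserting many points one at a time, batching plus qdrant-client's `wait=True` flag (which makes `upsert` return only after the operation is applied) is one way to rule out dropped writes. The batching helper below is plain Python; the qdrant-client wiring is shown as comments because it needs a running server, and the names `client`, `points` are assumptions.]

```python
def batched(items, size):
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Assumed qdrant-client usage (illustrative, requires a running Qdrant):
# for chunk in batched(points, 100):
#     client.upsert(collection_name=request.collection, points=chunk, wait=True)
```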

  • AN

    I mean it would be ideal to find a single request which sent x points, but only y were stored

  • AN

    also I would double-check that ids are not overlapping

  • SE

    I built a script that goes over all of the records and performs a client.retrieve for the specific ID. If a record is not found then I print a log.

    It logs all the missing entries correctly.
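
[Editor's note: the verification loop described above could be sketched roughly as below. This is hypothetical; `retrieve` stands in for a `client.retrieve` call so the logic can run without a Qdrant instance, and all names are assumptions.]

```python
def find_missing(ids, retrieve):
    """Return the ids for which `retrieve` yields no stored record."""
    missing = []
    for point_id in ids:
        # e.g. retrieve = lambda ids: client.retrieve(collection_name=..., ids=ids)
        records = retrieve([point_id])
        if not records:
            missing.append(point_id)
    return missing
```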

  • SE

    If I then attempt to add those they get inserted into the collection correctly.

  • AN

    but the initial update request is different?

  • SE

    Exact same request payload.

  • AN

    The missing entries are different each time, right? Not the same vectors.

  • SE

    I'm not sure on that.

  • SE

    As the results are not consistent.

  • SE

    Sometimes 10k missing records. Sometimes 5k.

  • SE

    I use this exact same script and send more or less the exact same records to a different image-match service I have, a Flask API backed by Elasticsearch, and it works. So that's the strange part.

  • SE

    I think that I might have identified part of the problem. I switched this over to Flask and I'm able to see the threads spinning up in my debugger. For some reason it is creating a new thread for each request.

  • SE

    I'm assuming at some point this is just getting overloaded.

  • AN

    flask?

  • SE

    Yes.

  • AN

    do you run it with uvicorn?

  • SE

    This is just running via the VSCode Debug.

  • SE
            {
                "name": "Python: Flask",
                "type": "python",
                "request": "launch",
                "module": "flask",
                "env": {
                    "FLASK_APP": "api.py",
                    "FLASK_DEBUG": "1"
                },
                "args": ["run", "--no-debugger", "--no-reload"],
                "jinja": true,
                "justMyCode": true
            },
    
  • SE

    Seems like it is cleaning up the threads though.

  • AN

    uvicorn has a setting for the required number of workers: https://www.uvicorn.org/settings/
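
[Editor's note: for illustration, the flag from the settings page linked above; `api:app` is a placeholder module path.]

```shell
# Serve the app from a fixed pool of 4 worker processes
# instead of relying on per-request threading.
uvicorn api:app --workers 4
```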

  • SE

    Yeah. I've adjusted those with the FastAPI.

  • SE

    If I leave the ID field out of the upsert request will it generate a random ID?

  • AN

    no

  • AN

    if you need a random id, I would suggest using a random UUID generator

  • SE

    Hmm.

            id = int(uuid.uuid1())
            client.upsert(collection_name=data.get('collection'), points=[PointStruct(id=id, payload=data.get('meta'), vector=vector.tolist())])
    
  • AN

    no, that won't work.

  • AN

    str(uuid.uuid4()) - I usually use this

  • AN

    a UUID is 128-bit, it won't fit in a uint64
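
[Editor's note: this can be checked with a couple of lines of stdlib Python, illustrative only.]

```python
import uuid

# A UUID is a 128-bit value, so its integer form cannot fit an unsigned
# 64-bit id. uuid1() packs the clock timestamp and version bits into the
# high 64 bits, so its integer form always overflows uint64.
UINT64_MAX = 2**64 - 1

too_big = int(uuid.uuid1())
print(too_big > UINT64_MAX)  # True: overflows a uint64 id

# The canonical string form is what to pass as a point id instead:
point_id = str(uuid.uuid4())
print(len(point_id))  # 36: the 8-4-4-4-12 hex form
```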

  • SE

    Ok… hopefully this is the issue and the ids I was passing were just getting stored wrong.

  • AN

    you mean the request.meta.get('id') is actually a uuid?

  • SE

    Actually…. now that I'm thinking about it. I think this might have been the issue.

  • SE

    I think I was passing in a UUID and then converting it to an int.

  • SE

    Which I'm assuming could cause issues.

  • SE

    🫢 Well this is embarrassing.

  • SE

    Looking promising. Only 30 records off through 21k records.

  • AN

    Missing entries issue (solved)
