Last active 8 months ago
51 replies
20 views
- SE
That is going to be extremely time consuming for me to create. So I'm trying to avoid that. I understand that isn't your concern though.
- AN
Let's move this discussion to a thread.
- AN
@andrey.vasnetsov fyi
- SE
Thank you. Sorry for making so much noise.
- SE
I'm just really excited about this solution. The results I'm getting back are GREAT!
- SE
Just need to figure out this issue with the indexing.
- AN
No worries. We would like to help; it's just not simple without a complete view.
- SE
Completely understand.
- AN
does it make any difference if you upload with single vs multi thread?
- AN
Missing entries issue
- SE
I'm not sure how I would change that.
- SE
Right now I'm just doing an upsert.
- SE
client.upsert(
    collection_name=request.collection,
    points=[PointStruct(id=int(request.meta.get('id')), payload=request.meta, vector=vector.tolist())],
)
- AN
I mean it would be ideal to find a single request which has x points, but only y of them end up in the collection
- AN
also I would double-check that ids are not overlapping
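A quick way to check for overlapping ids, assuming the source records sit in a list of dicts called records (a hypothetical name):

from collections import Counter

# Count how often each id occurs in the batch; any count above 1 is an overlap,
# and a later upsert with the same id overwrites the earlier point.
ids = [int(r['id']) for r in records]
overlapping = [i for i, n in Counter(ids).items() if n > 1]
print(f"{len(overlapping)} overlapping ids: {overlapping[:10]}")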
- SE
I built a script that goes over all of the records and performs a
client.retrieve
for the specific ID. If a record is not found, I print a log line. It logs all the missing entries correctly.
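A minimal sketch of that kind of check, assuming the same client as above and an expected_ids list (illustrative names, not the actual script):

# retrieve returns an empty list when no point with the given id exists,
# so every id that comes back empty is logged as missing.
missing = []
for point_id in expected_ids:
    if not client.retrieve(collection_name=collection, ids=[point_id]):
        missing.append(point_id)
        print(f"missing entry: {point_id}")
print(f"{len(missing)} missing in total")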
- SE
If I then attempt to add those, they get inserted into the collection correctly.
- AN
but the initial update request is different?
- SE
Exact same request payload.
- AN
The missing entries are different each time, right? Not the same vectors.
- SE
I'm not sure on that.
- SE
As the results are not consistent.
- SE
Sometimes 10k missing records. Sometimes 5k.
- SE
I use this exact same script and send more or less the exact same records to a different image-match service I have, which exposes a Flask API with an Elasticsearch backend, and it works. So that's the strange part.
- SE
I think that I might have identified part of the problem. I switched this over to Flask, and I'm able to see the threads spinning up in my debugger. For some reason it is creating a new thread for each request.
- SE
I'm assuming at some point this is just getting overloaded.
- AN
flask?
- SE
Yes.
- AN
do you run it with uvicorn?
- SE
This is just running via the VSCode Debug.
- SE
{"name":"Python: Flask","type":"python","request":"launch","module":"flask","env":{"FLASK_APP":"api.py","FLASK_DEBUG":"1"},"args":["run","--no-debugger","--no-reload"],"jinja":true,"justMyCode":true},
- SE
Seems like it is cleaning up the threads though.
- AN
uvicorn has a setting for the required number of workers: https://www.uvicorn.org/settings/
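For reference, the worker count can also be set programmatically; a minimal sketch, assuming the app object lives in api.py:

import uvicorn

if __name__ == "__main__":
    # With workers > 1, uvicorn needs the app as an import string
    # so it can re-import it inside each worker process.
    uvicorn.run("api:app", host="127.0.0.1", port=8000, workers=4)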
- SE
Yeah. I've adjusted those with the FastAPI version.
- SE
If I leave the ID field out of the upsert request, will it generate a random ID?
- AN
no
- AN
if you need a random id, I would suggest using a random UUID generator
- SE
Hmm.
id = int(uuid.uuid1())
client.upsert(collection_name=data.get('collection'), points=[PointStruct(id=id, payload=data.get('meta'), vector=vector.tolist())])
- AN
no, that won't work.
- AN
str(uuid.uuid4())
I usually use this
- AN
uuid is 128-bit; it won't fit into a uint64
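So the earlier upsert would become something like this (same hedged names as above; Qdrant accepts either an unsigned integer or a UUID string as a point id):

import uuid
from qdrant_client.models import PointStruct

point_id = str(uuid.uuid4())  # a UUID string, not int(...), which would not fit a uint64
client.upsert(
    collection_name=request.collection,
    points=[PointStruct(id=point_id, payload=request.meta, vector=vector.tolist())],
)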
- SE
Ok… hopefully this is the issue and the ids I was passing were just getting stored wrong.
- AN
you mean the
request.meta.get('id')
is actually a uuid?
- SE
Actually… now that I'm thinking about it, I think this might have been the issue.
- SE
I think I was passing in a uuid and then converting it to an int
- SE
Which I'm assuming could cause issues.
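A quick illustration of why that conversion bites (plain Python, nothing Qdrant-specific):

import uuid

u = uuid.uuid1()
print(int(u).bit_length())  # typically close to 128 bits
print(int(u) > 2**64 - 1)   # True: always larger than an unsigned 64-bit id can hold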
- SE
🫢 Well this is embarrassing.
- SE
Looking promising. Only 30 records missing out of 21k.
- AN
Missing entries issue (solved)