I use RediSearch to index hashes, each hash having a field “tags” containing a list of tags. Certain queries with tags return mutually inconsistent results - here’s an example. (Sorry for the difficult-to-read escaping of the tags.)
This search yields two results - here are the tags of the first result:
Some more info on this: documents appear and disappear from these tag indexes continuously. When I first add a document via FT.ADDHASH, it works great immediately. But after a few minutes or hours, it becomes inaccessible through the same queries. Here’s another example that’s even stranger:
Let’s try to locate some documents via numerical constraints:
Can you please try increasing the TIMEOUT parameter to something like 10000 and check again?
You can do this by adding “TIMEOUT 10000” right after the loadmodule option when you start Redis.
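For reference, the loadmodule line might look roughly like this, either in redis.conf or on the command line (the module path is a placeholder for wherever your redisearch.so lives):

```shell
# redis.conf — load RediSearch with a 10-second query timeout
loadmodule /path/to/redisearch.so TIMEOUT 10000

# or equivalently on the command line:
redis-server --loadmodule /path/to/redisearch.so TIMEOUT 10000
```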
Let me know if it has any effect on the returned results.
It doesn’t seem to make a difference. I also upped MAXDOCTABLESIZE to 10,000,000 just in case - here’s my setup:
82069:M 11 Nov 2018 15:25:41.706 * RediSearch version 1.2.0 (Git=v1.2.0-179-gcc54f9b)
82069:M 11 Nov 2018 15:25:41.706 * concurrency: ON, gc: ON, prefix min length: 1, prefix max expansions: 200, query timeout (ms): 10000, timeout policy: return, cursor read size: 1000, cursor max idle (ms): 300000, max doctable size: 10000000, search pool size: 20, index pool size: 8,
82069:M 11 Nov 2018 15:25:41.707 * Initialized thread pool!
82069:M 11 Nov 2018 15:25:41.707 * Module 'ft' loaded from /Users/michaelmasouras/src/redis-5.0.0/redisearch.so
What are the names of the index keys so that I can query them directly?
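In the meantime, one way to poke around is to SCAN for the module’s internal keys - note that the prefix patterns below are my guess at the 1.x internal naming scheme, with “myidx” as a made-up index name:

```shell
# Sketch: list internal keys for a hypothetical index "myidx"
# (prefix patterns are an assumption about RediSearch 1.x internals)
redis-cli --scan --pattern 'ft:myidx/*'    # inverted-index term entries
redis-cli --scan --pattern 'nm:myidx/*'    # numeric-range entries
redis-cli --scan --pattern 'geo:myidx/*'   # geo sorted sets (queried via GEORADIUS)
redis-cli --scan --pattern 'tag:myidx/*'   # tag-field entries
```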
This has been happening sporadically for the last couple of months, but because I reprocess everything daily and I was in development mode, it wasn’t completely obvious. In the last couple of weeks, I was preparing for a beta launch, so I audited the search results and discovered this chaos.
Before I used FT.SEARCH, I used to keep my own indexes. They illustrate the problem pretty well:
After checking the RDB, I found out that you are correct: there is indeed a bug in RediSearch.
Please follow this issue for more details about the bug: https://github.com/RedisLabsModules/RediSearch/issues/534
The good news is that I have already submitted a fix and it’s currently under review; please follow the PR so you’ll know when the fix is merged to master: https://github.com/RedisLabsModules/RediSearch/pull/535
The bad news is that there is no workaround here; you must upgrade in order to get the fix. You can either wait for the 1.4.2 release (which is planned to ship soon) or get the fix directly from master. At least you do not have to re-index your data - just use the same RDB with the fixed version and it should work correctly.
Please let me know if you don’t see that discrepancy so I can debug further, but I have been trying this for a few hours now and I get consistent results across multiple production build machines vs. my MacBook.
Turns out the issue I just reported is a different one than the one at the start of the thread. The issue I reported, where different tag combinations produce inconsistent results, seems to be fixed everywhere (Mac/Linux).
However, this bug still stands - it’s just not a new one. I confirmed that I am getting the same results for this geo query as with my original RediSearch build, so based on this evidence I don’t think the fix introduced other problems.
Another interesting thing to note: even if I remove all tag constraints, the geo query always returns 47 results on Linux, even for a radius larger than the Earth:
Once I upped the timeout for these geo queries, I got consistent results back. After a little digging, I believe it’s the GEORADIUS Redis queries that are causing the slowdown:
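In case it helps anyone reproduce this, roughly how I narrowed it down was via the Redis slowlog (the 10 ms threshold below is arbitrary):

```shell
# Log any command taking longer than 10ms, then re-run the geo query
redis-cli CONFIG SET slowlog-log-slower-than 10000
redis-cli SLOWLOG RESET
# ... run the slow FT.SEARCH geo query here ...
redis-cli SLOWLOG GET 10   # look for GEORADIUS entries in the output
```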
Is this latency what you’d expect for such a small index? Is this some denormalized index, which would explain why smaller-radius queries are faster than larger ones (otherwise you’d have to calculate containment for every single document)?
(Minor improvement:) adding a constraint on a tag that has 0 documents (it doesn’t actually exist) doesn’t seem to speed things up:
RediSearch does not have a native geospatial index; instead, it offloads this to Redis using GEORADIUS.
It may be that the query engine is evaluating the GEORADIUS query before the tag query. It may be possible to optimize this in a future version by giving cheaper queries precedence over more expensive ones in a boolean query.
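Concretely, a geo-filtered search like the sketch below ends up issuing a plain Redis GEORADIUS against the internal geo key under the hood (the index name, field names, coordinates, and the internal key name are all made up for illustration):

```shell
# Hypothetical index "myidx" with a tag field "tags" and a geo field "location"
# (1.x syntax uses the GEOFILTER argument for the geo constraint)
FT.SEARCH myidx "@tags:{restaurant}" GEOFILTER location -122.41 37.77 5 km

# Internally, the geo part is resolved with something equivalent to:
GEORADIUS geo:myidx/location -122.41 37.77 5 km
```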
Mark Nunberg | Senior Software Engineer Redis Labs - home of Redis
If you’re updating/deleting documents, then older entries will not be removed from the geo index. I don’t believe there is anything we are inherently unable to do (it’s just checking whether the member in the set is a valid docid), but this may be the cause of the high cardinality.
By omission — in the beginning we didn’t really have any kind of garbage collection. Then we added GC for text indexes… then numeric indexes… and I guess soon, geo indexes.
It is safe to remove the items manually (that’s not the “official” way to do this, but for now it should work); however, you need to know the numeric document ID of the document you’ve deleted. I believe Meir has implemented a debug command that provides this info.
FT.DEBUG DOCIDTOID
Note that you must do this before you delete the document, otherwise you will get an error message. Also note that this is an undocumented debug command, so it may be changed or removed in future releases.
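Putting that together, the manual cleanup might look something like this - a sketch only, since DOCIDTOID is undocumented and its exact argument order may differ, and the internal geo key name and all the identifiers (“myidx”, “doc:1234”, “location”) are my assumptions:

```shell
# 1. Before deleting, map the document key to its internal numeric id
FT.DEBUG DOCIDTOID myidx doc:1234    # returns the internal id, say 57

# 2. Delete the document from the index
FT.DEL myidx doc:1234

# 3. Drop the stale member from the internal geo sorted set
#    (members appear to be internal doc ids, per the discussion above)
ZREM geo:myidx/location 57
```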