tag-based queries return inconsistent results

I use RediSearch to index hashes, each of which has a field "tags" containing a comma-separated list of tags. Certain queries with tags return mutually inconsistent results - here's an example. Sorry for the hard-to-read escaping of the tags :)
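
For context, each document is a plain hash that I then hand to RediSearch with FT.ADDHASH - roughly like this (a simplified sketch: the key, _t and tags values are lifted from one of the real documents further down, most fields are omitted, and 1.0 is just a placeholder score):

HSET p:212587 _t 1540247314 tags "w:couchride:t:dealer,w:couchride:t:new,t:image,w:couchride:t:iron,FL"
FT.ADDHASH il1 p:212587 1.0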

This search yields two results - here are the tags of the first one:

127.0.0.1:6379> FT.SEARCH il1 "@tags:{w\:couchride\:t\:altamotors}" LIMIT 0 1 SORTBY _t DESC RETURN 1 tags
1) (integer) 2
2) "p:222588"
3) 1) "tags"
   2) "w:couchride:t:dealer,w:couchride:t:new,t:image,w:couchride:t:altamotors,w:couchride:t:motocross,w:couchride:t:redshift,w:couchride,f:spaceportcyclesnew47,cf:d:132,cf:m:ariv2,FL"

Let's add one of those tags and re-try the query - we get 0 results:

127.0.0.1:6379> FT.SEARCH il1 "@tags:{w\:couchride\:t\:new} @tags:{w\:couchride\:t\:altamotors}" LIMIT 0 1 SORTBY _t DESC RETURN 1 tags
1) (integer) 0

Same for another tag:

FT.SEARCH il1 "@tags:{w\:couchride\:t\:dealer} @tags:{w\:couchride\:t\:altamotors}" LIMIT 0 1 SORTBY _t DESC RETURN 1 tags
1) (integer) 0

Other tag combinations can be consistent and return results.

Any pointers on further debugging would be appreciated.

Thanks,

Michael

Some more info on this: documents appear and disappear from these tag indexes continuously. When I first add a document via FT.ADDHASH it works great, immediately. But after a few minutes or hours, it becomes unreachable through the same queries. Here's another example that's even stranger:

Let’s try to locate some documents via numerical constraints:

127.0.0.1:6379> "FT.SEARCH" "il1" "@_t:[1540247314 1540247314]" "LIMIT" "0" "1" "SORTBY" "_t" "DESC" RETURN 1 tags
1) (integer) 3
2) "p:212587"
3) 1) "tags"
   2) "w:couchride:t:dealer,w:couchride:t:new,t:image,w:couchride:t:cruisers,w:couchride:t:cruiserfamily,w:couchride:t:harley-davidson,w:couchride:t:sportster,w:couchride:t:sportster1200,w:couchride:t:sportster1200iron,w:couchride:t:iron,w:couchride,f:adamecharley-davidson\xc2\xaenew,cf:d:59,cf:m:dsv7,FL"

Cool - now let's add a tag that's in the document above. Again, no results, just like my previous example:

127.0.0.1:6379> "FT.SEARCH" "il1" "@_t:[1540247314 1540247314] @tags:{w\:couchride\:t\:new}" "LIMIT" "0" "1" "SORTBY" "_t" "DESC" RETURN 1 tags
1) (integer) 0

But now let's add one more constraint - a free-text query string:

127.0.0.1:6379> "FT.SEARCH" "il1" "@_t:[1540247314 1540247314] XL 1200NS - Sportster iron @tags:{w\:couchride\:t\:new}" "LIMIT" "0" "1" "SORTBY" "_t" "DESC" RETURN 1 tags
1) (integer) 3
2) "p:212587"
3) 1) "tags"
   2) "w:couchride:t:dealer,w:couchride:t:new,t:image,w:couchride:t:cruisers,w:couchride:t:cruiserfamily,w:couchride:t:harley-davidson,w:couchride:t:sportster,w:couchride:t:sportster1200,w:couchride:t:sportster1200iron,w:couchride:t:iron,w:couchride,f:adamecharley-davidson\xc2\xaenew,cf:d:59,cf:m:dsv7,FL"

Ta-da!

I ran MONITOR while working on this and grepped for "p:212587" - these are all the operations that hit the server for that key:

1541898977.358744 [0 127.0.0.1:36488] "HGETALL" "p:212587"
1541899697.998675 [0 127.0.0.1:36564] "HGETALL" "p:212587"
1541900152.900435 [0 35.231.200.123:37024] "HGETALL" "p:212587"
1541900154.328735 [0 35.231.200.123:37024] "HGETALL" "p:212587:bl"
1541900418.142649 [0 127.0.0.1:36636] "HGETALL" "p:212587"

And here’s my FT.INFO just in case it’s useful:

127.0.0.1:6379> FT.INFO il1
 1) index_name
 2) il1
 3) index_options
 4) 1) "NOFREQS"
    2) "NOOFFSETS"
 5) fields
 6) 1) 1) ctitle
       2) type
       3) TEXT
       4) WEIGHT
       5) "1"
       6) NOSTEM
    2) 1) geo
       2) type
       3) GEO
    3) 1) _t
       2) type
       3) NUMERIC
       4) SORTABLE
    4) 1) tags
       2) type
       3) TAG
       4) SEPARATOR
       5) ,
    5) 1) price
       2) type
       3) NUMERIC
       4) SORTABLE
    6) 1) year
       2) type
       3) NUMERIC
       4) SORTABLE
    7) 1) mileage
       2) type
       3) NUMERIC
       4) SORTABLE
 7) num_docs
 8) "31667"
 9) max_doc_id
10) "3858017"
11) num_terms
12) "10505"
13) num_records
14) "1.8446744073683358e+19"
15) inverted_sz_mb
16) "17592186044297.982"
17) offset_vectors_sz_mb
18) "0"
19) doc_table_size_mb
20) "315.65262222290039"
21) sortable_values_size_mb
22) "336.76803588867188"
23) key_table_size_mb
24) "0.81483268737792969"
25) records_per_doc_avg
26) "582522628405265.75"
27) bytes_per_record_avg
28) "1"
29) offsets_per_term_avg
30) "0"
31) offset_bits_per_record_avg
32) "-nan"
33) gc_stats
34) 1) current_hz
    2) "68.944908142089844"
    3) bytes_collected
    4) "174798224"
    5) effectiv_cycles_rate
    6) "0.10952896031468519"
35) cursor_stats
36) 1) global_idle
    2) (integer) 0
    3) global_total
    4) (integer) 0
    5) index_capacity
    6) (integer) 128
    7) index_total
    8) (integer) 0
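
For reference, that corresponds to an index created roughly like this (reconstructed from the FT.INFO output above rather than the exact command I originally ran; WEIGHT 1 is simply the default):

FT.CREATE il1 NOOFFSETS NOFREQS SCHEMA ctitle TEXT NOSTEM geo GEO _t NUMERIC SORTABLE tags TAG SEPARATOR , price NUMERIC SORTABLE year NUMERIC SORTABLE mileage NUMERIC SORTABLE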

Thanks

Hey Michael,

Can you please try increasing the TIMEOUT parameter to something like 10000 and check again?
You can do this by adding "TIMEOUT 10000" right after the loadmodule option when you start Redis.
Let me know if it has any effect on the returned results.

It doesn't seem to make a difference. I also upped MAXDOCTABLESIZE to 10,000,000 just in case - here's my setup:

82069:M 11 Nov 2018 15:25:41.706 * RediSearch version 1.2.0 (Git=v1.2.0-179-gcc54f9b)

82069:M 11 Nov 2018 15:25:41.706 * concurrency: ON, gc: ON, prefix min length: 1, prefix max expansions: 200, query timeout (ms): 10000, timeout policy: return, cursor read size: 1000, cursor max idle (ms): 300000, max doctable size: 10000000, search pool size: 20, index pool size: 8,

82069:M 11 Nov 2018 15:25:41.707 * Initialized thread pool!

82069:M 11 Nov 2018 15:25:41.707 * Module 'ft' loaded from /Users/michaelmasouras/src/redis-5.0.0/redisearch.so
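
For reference, the loadmodule line now looks roughly like this (the same arguments work in redis.conf or on the redis-server command line):

loadmodule /Users/michaelmasouras/src/redis-5.0.0/redisearch.so TIMEOUT 10000 MAXDOCTABLESIZE 10000000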

What are the names of the index keys so that I can query them directly?

Some additional info:

  1. This has been happening sporadically for the last couple of months, but because I reprocess everything daily and was still in development mode, it wasn't completely obvious. In the last couple of weeks I've been preparing for a beta launch, so I audited the search results and discovered this chaos :(

  2. Before I used FT.SEARCH I kept my own indexes, and they illustrate the problem pretty well:

Great:

127.0.0.1:6379> ZCARD w:couchride:w:couchride:t:dealer:listings:live
(integer) 17526

127.0.0.1:6379> "FT.SEARCH" "il1" "@tags:{w\:couchride\:t\:dealer}" "LIMIT" "0" "1" "SORTBY" "_t" "DESC" RETURN 0
1) (integer) 17526

Great:

127.0.0.1:6379> ZCARD w:couchride:w:couchride:t:new:listings:live
(integer) 12187

127.0.0.1:6379> "FT.SEARCH" "il1" "@tags:{w\:couchride\:t\:new}" "LIMIT" "0" "1" "SORTBY" "_t" "DESC" RETURN 0
1) (integer) 12187

Oops!

127.0.0.1:6379> ZINTERSTORE dealernew 2 w:couchride:w:couchride:t:dealer:listings:live w:couchride:w:couchride:t:new:listings:live
(integer) 12187   // <-------- CORRECT (all new vehicles are offered by dealers, so the "dealer" set should be a superset of "new")

127.0.0.1:6379> "FT.SEARCH" "il1" "@tags:{w\:couchride\:t\:new} @tags:{w\:couchride\:t\:dealer}" "LIMIT" "0" "1" "SORTBY" "_t" "DESC" RETURN 0
1) (integer) 623   // <-------- INCORRECT
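
In case it helps with debugging, one way to narrow this down would be to sample document keys from my own (correct) intersection and probe them one at a time with INKEYS (assuming I'm reading the INKEYS option right) - a document that carries both tags but doesn't come back from such a query is one the tag index has lost:

127.0.0.1:6379> ZRANGE dealernew 0 4
127.0.0.1:6379> FT.SEARCH il1 "@tags:{w\:couchride\:t\:new} @tags:{w\:couchride\:t\:dealer}" INKEYS 1 p:212587 LIMIT 0 0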

Thanks

Michael

Hey Michael

After checking the RDB I found that you are correct and there is indeed a bug in RediSearch.
Please follow this issue for more details about the bug: https://github.com/RedisLabsModules/RediSearch/issues/534
The good news is that I have already submitted a fix and it's currently under review; please follow the PR so you will know when the fix is merged to master: https://github.com/RedisLabsModules/RediSearch/pull/535
The bad news is that there is no workaround - you must upgrade to get the fix. You can either wait for the 1.4.2 release (which is planned to ship soon) or take the fix directly from master. At least you do not have to re-index your data: just load the same RDB with the fixed version and it should work correctly.

Thanks for reporting the issue!

Meir

This is a huge relief, thank you for looking into it and fixing it so fast.

How soon do you think 1.4.2 will be out? I'd rather deploy a tested release everywhere than build from trunk.

Probably something like two weeks … maybe less.

I tried building from trunk. The fix resolves the issue on my Mac, but not on production Linux.

Both environments have:

  • the same RediSearch build from trunk

  • the same RDB file

  • identical FT.INFO output on both machines

Mac:

macbookpro:redis-5.0.0 $ uname -a
Darwin macbookpro.local 18.2.0 Darwin Kernel Version 18.2.0: Fri Oct 5 19:41:49 PDT 2018; root:xnu-4903.221.2~2/RELEASE_X86_64 x86_64

macbookpro:redis-5.0.0 $ redis-cli ft.search il1 "@tags:{w\:couchride\:t\:new} @tags:{w\:couchride\:t\:dealer} @geo:[-81.3792365 28.5383355 100 mi]" LIMIT 0 0
1) (integer) 7344

Linux (production Ubuntu) doesn't return the same count:

michaelmasouras@staging:/home/redis$ uname -a
Linux staging 4.15.0-1024-gcp #25-Ubuntu SMP Wed Oct 24 13:09:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

michaelmasouras@staging:/home/redis$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.1 LTS
Release:        18.04
Codename:       bionic

127.0.0.1:6379> ft.search il1 "@tags:{w\:couchride\:t\:new} @tags:{w\:couchride\:t\:dealer} @geo:[-81.3792365 28.5383355 100 mi]" LIMIT 0 0
1) (integer) 47
(1.86s)

Please let me know if you don't see that discrepancy so I can debug further - I have been trying this for a few hours now and I consistently get these results across multiple production machines vs. my MacBook.

Michael

Turns out the issue I just reported is different from the one at the start of the thread. The original issue, where different tag combinations produce inconsistent results, seems to be fixed everywhere (Mac and Linux).

However, this geo bug still stands - it's just not a new one. I confirmed that I get the same results for this geo query with my original RediSearch build, so based on that evidence I don't think the fix introduced any new problems.

Another interesting thing to note: even if I remove all tag constraints, the geo query always returns 47 results on Linux - even for a radius larger than the Earth:

Mac (correct):

127.0.0.1:6379> ft.search il1 "@geo:[-81.3792365 28.5383355 100000 mi]" LIMIT 0 0
1) (integer) 30498

Linux:

127.0.0.1:6379> ft.search il1 "@geo:[-81.3792365 28.5383355 100000 mi]" LIMIT 0 0
1) (integer) 47

Note that there are definitely more than 47 docs with a valid geo field in that index:

$ redis-cli ft.search il1 "a*" LIMIT 0 10000 return 1 geo | grep "-" | wc -l
2860

Michael

It seems these queries are just timing out. I am gathering some timings and will get back to this thread.

Once I upped the timeout for these geo queries, I got consistent results back. After a little digging, I believe it's the underlying GEORADIUS Redis queries that are causing the slowdown:

127.0.0.1:6379> "GEORADIUS" "geo:il1/geo" "-80.191790" "25.761680" "10.000000" "mi" COUNT 1
1) "100002"

127.0.0.1:6379> "GEORADIUS" "geo:il1/geo" "-80.191790" "25.761680" "100.000000" "mi" COUNT 1
1) "100002"
(0.95s)

127.0.0.1:6379> "GEORADIUS" "geo:il1/geo" "-80.191790" "25.761680" "100.000000" "mi" COUNT 1
1) "100002"
(0.91s)

127.0.0.1:6379> "GEORADIUS" "geo:il1/geo" "-80.191790" "25.761680" "1000.000000" "mi" COUNT 1
1) "100002"
(3.32s)

A few questions:

  1. Is this cardinality expected for an index that has 40K documents?

127.0.0.1:6379> ZCARD geo:il1/geo
(integer) 6918748

  2. Is this latency what you'd expect for such a small index? Is there some denormalized index that explains why smaller-radius queries are faster than larger ones (otherwise you'd have to compute containment for every single document)?

  3. (minor improvement) Adding a tag constraint that matches 0 documents (the tag doesn't actually exist) doesn't seem to speed things up:

127.0.0.1:6379> "FT.SEARCH" "il1" "@tag:{doesnotexist} @geo:[-80.1917902 25.7616798 1000 mi]" "LIMIT" "0" "0" "SORTBY" "_t" "DESC"
1) (integer) 0
(3.21s)

Michael

RediSearch does not have a native geospatial index; instead, it offloads this to Redis using GEORADIUS.

It may be that the query engine is evaluating the GEORADIUS query before the tag query. It may be possible to optimize this in a future version and give precedence to cheaper queries over more expensive ones in a boolean query.

Mark Nunberg | Senior Software Engineer
Redis Labs - home of Redis

Email: mark@redislabs.com

*in an intersection query.

Mark

Thanks for the reply - do you think it's expected for the geospatial key to have 7M items for an index of 37K documents?

If you're updating/deleting documents, then older entries are not removed from the geo index. I don't believe there is anything we are inherently unable to handle (it's just a matter of checking whether each member of the set is a valid doc id), but this may well be the cause of the high cardinality.

Regards,
Mark

So you mean that as I continue to add and remove documents from my index, this geo index will grow indefinitely? Is that by design or by omission?

Is it safe for me to remove items from the geo index manually?

Michael

By omission :) - in the beginning we didn't really have any kind of garbage collection. Then we added GC for text indexes… then numeric indexes… and I guess soon, geo indexes.

It is safe to remove the items manually (that's not the "official" way to do this… but for now it should work); however, you need to know the numeric document ID of the document you've deleted. I believe Meir has implemented a debug command that provides this info.

Mark

Hi Meir, what is the debug command for getting the numerical docid?

FT.DEBUG DOCIDTOID
Note that you must run it before you delete the document, otherwise you will get an error message. Also note that this is an undocumented debug command, so it might be changed or removed in future releases.
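
So the manual cleanup would look roughly like this (treat it as a sketch - in particular, double-check the exact arguments DOCIDTOID expects):

FT.DEBUG DOCIDTOID il1 p:212587   // while the document still exists: get its internal numeric id (argument order assumed)
ZREM geo:il1/geo <numeric-id>     // after deleting the document: remove that id from the geo sorted set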