FT.AGGREGATE performance problems

Hi,
We are using Redis Enterprise Cloud, and switching to FT.AGGREGATE from FT.SEARCH has introduced significant performance problems - our instance is using Redis 5.0.4/ Redisearch 1.4.6.

Here is a query that takes 3 seconds to return:

“FT.AGGREGATE” “trips_hash_index” “(@VisibleToCompanyIDs:{2}) (@TripMainTypeID:{6|4|1|3|2}) (@TripStatusID:{1|5|4|6|3})” “SORTBY” “8” “@CreatedDateTimeUTC” “DESC” “@TripStartDate” “ASC” “@TripEndDate” “ASC” “@TripID” “DESC” “LIMIT” “0” “10”

``

with FT.SEARCH it is quite quick:

“FT.SEARCH” “trips_hash_index” “(@VisibleToCompanyIDs:{2}) (@TripMainTypeID:{6|4|1|3|2}) (@TripStatusID:{1|5|4|6|3})” “SORTBY” “TripID” “DESC” “LIMIT” “0” “10”

``

And here is what the index looks like:

ft.info trips_hash_index

  1. index_name

  2. “trips_hash_index”

  3. fields

      1. PrimaryKey
  4. type

  5. TEXT

  6. WEIGHT

  7. “1”

  8. SORTABLE

    1. TripID
  9. type

  10. NUMERIC

  11. SORTABLE

    1. UserID
  12. type

  13. TAG

  14. SEPARATOR

  15. ,

    1. TripDescription
  16. type

  17. TEXT

  18. WEIGHT

  19. “1”

    1. TripStartDate
  20. type

  21. NUMERIC

  22. SORTABLE

    1. TripEndDate
  23. type

  24. NUMERIC

  25. SORTABLE

    1. TripTypeID
  26. type

  27. TAG

  28. SEPARATOR

  29. ,

    1. TripMainTypeID
  30. type

  31. TAG

  32. SEPARATOR

  33. ,

    1. TripStatusID
  34. type

  35. TAG

  36. SEPARATOR

  37. ,

    1. CreatedDateTimeUTC
  38. type

  39. NUMERIC

  40. SORTABLE

    1. VisibleToCompanyIDs
  41. type

  42. TAG

  43. SEPARATOR

  44. ,

    1. ParentTripID
  45. type

  46. TAG

  47. SEPARATOR

  48. ,

    1. Supplier
  49. type

  50. TAG

  51. SEPARATOR

  52. ,

    1. TripCarrierTypeID
  53. type

  54. TAG

  55. SEPARATOR

  56. ,

    1. ClientIDs
  57. type

  58. TAG

  59. SEPARATOR

  60. ,

  61. index_options

    1. “NOOFFSETS”
  62. gc_stats

    1. current_hz
  63. “3.0289804935455322”

  64. bytes_collected

  65. (integer) 68723

  66. effectiv_cycles_rate

  67. “0.036071268224674498”

  68. cursor_stats

    1. global_idle
  69. (integer) 0

  70. global_total

  71. (integer) 0

  72. index_capacity

  73. (integer) 128

  74. index_total

  75. (integer) 0

  76. num_docs

  77. (integer) 599715

  78. max_doc_id

  79. (integer) 600432

  80. num_terms

  81. (integer) 878752

  82. num_records

  83. (integer) 4937876

  84. inverted_sz_mb

  85. “4.6262814823090749e+18”

  86. offset_vectors_sz_mb

  87. “0”

  88. doc_table_size_mb

  89. “4.6329217579703337e+18”

  90. key_table_size_mb

  91. “4.6255695592675082e+18”

  92. records_per_doc_avg

  93. “8.2337043428962087”

  94. bytes_per_record_avg

  95. “4.2159641108849231”

  96. offsets_per_term_avg

  97. “0”

  98. offset_bits_per_record_avg

  99. “nan”

``

Any thoughts on what the problem might be?

Thanks!

Hi there Paul, specifically focusing on FT.AGGREGATE performance recommendations I would recommend:

Can you give it a try and let us know if that solved the issue?

Hi Filipe,
I am actually using MAX, but did not include it in the query - the difference between using it and not using is very negligible.

Secondly, CURSORS wouldn’t be acceptable in my case because the data changes all the time, so going from one page to the next could have different records because some may have got added, other deleted, other modified.

Do you have any other suggestions? What would be the reason search is so much faster? The only reason I need aggregate is to be able to sort by multiple fields? Is there a different way to accomplish this with search?

Hi again Paul, given that using the cursor API is not a possible solution here, I think that we should investigate further why the MAX parameter on SORTBY is not having the expected effect on performance improvement ( given that the query is the same on FT.AGGREGATE and FT.SEARCH ). With that in mind, is it possible for you to create an issue on:
https://github.com/RediSearch/RediSearch/issues
with what you’ve described here, + recreation instruction, and ( if possible ) an rdb so that we can test this for you and provide a solution? The more info we have the faster we can provide an explanation/solution.

Hi Filipe,
I created this issue: https://github.com/RediSearch/RediSearch/issues/1016

How would I go about giving you access to our QA environment where you could troubleshoot yourself? There are only about 165k records in that index, but FT.AGGREGATE still takes about a second to return with or without MAX.

Thanks for your help.