Redis Search: sortby is not correct when there are more than 1,000,000 docs in DB

Hello,

We are using Redis Search with RedisJSON to index a list of JSON transactions based on EventDatetime.

The index ticketIdx is created with
FT.CREATE ticketIdx ON JSON PREFIX 1 ticket: SCHEMA $.eventDateTime AS eventDateTime TEXT SORTABLE $.assetType AS assetType TEXT $.salesPersonUt AS salesPersonUt TEXT $.status AS status TEXT

As long as the total number of JSON documents (with or without prefix “ticket:”) are less then 1,000,000 everything is working well.
When it is above 1,000,000, we noticed that querying the index on the sortable field eventDateTime in descending order is working well when limiting the number of results to 10 or 100, but not for 1000. When limiting to 1000, the results are incorrect as they do not contains the latest items.
See below.

Would you know why? is there a configuration parameter to change? I already tried to change MAXDOCTABLESIZE to 2000000 but it did not help

FT.SEARCH ticketIdx “@assetType:(BOND|IRD|FX|EQY|RSF|eMTN|Generic VC|Generic Margin|Securitization
|Call Account|Money Market)” SORTBY eventDateTime desc limit 0 10

  1. “392681”
  2. “ticket:kdbfx:d0dc1f3b-4f6a-4fcd-96d0-4c7f84a4835c”
    1. “eventDateTime”
    2. 2023-09-13@09:35:15.057

FT.SEARCH ticketIdx “@assetType:(BOND|IRD|FX|EQY|RSF|eMTN|Generic VC|Generic Margin|Securitization
|Call Account|Money Market)” SORTBY eventDateTime desc limit 0 100

  1. “392673”
  2. “ticket:jsbond:TRADEWEB_EUGV_EUGV_20230913_5769_20230913”
    1. “eventDateTime”
    2. 2023-09-13@09:36:24.287

FT.SEARCH ticketIdx “@assetType:(BOND|IRD|FX|EQY|RSF|eMTN|Generic VC|Generic Margin|Securitization
|Call Account|Money Market)” SORTBY eventDateTime desc limit 0 1000

  1. “362299”
  2. “ticket:kdbfx:f4aaa082-fc4e-4185-ac7c-664df06e9632”
    1. “eventDateTime”
    2. 2023-09-13@04:22:58.741

Here is the configuration and index details:
RediSearch v. 2.6.9
RedisJSON v. 2.4.7

FT.CONFIG GET *

    1. “EXTLOAD”
    2. “null”
    1. “SAFEMODE”
    2. “true”
    1. “CONCURRENT_WRITE_MODE”
    2. “false”
    1. “NOGC”
    2. “false”
    1. “MINPREFIX”
    2. “2”
    1. “FORKGC_SLEEP_BEFORE_EXIT”
    2. “0”
    1. “MAXDOCTABLESIZE”
    2. “2000000”
    1. “MAXSEARCHRESULTS”
    2. “10000”
    1. “MAXAGGREGATERESULTS”
    2. “10000”
    1. “MAXEXPANSIONS”
  1. “200”
    1. “MAXPREFIXEXPANSIONS”
  2. “200”
    1. “TIMEOUT”
  3. “500”
    1. “INDEX_THREADS”
  4. “8”
    1. “SEARCH_THREADS”
  5. “20”
    1. “FRISOINI”
  6. “null”
    1. “ON_TIMEOUT”
  7. “return”
    1. “GCSCANSIZE”
  8. “100”
    1. “MIN_PHONETIC_TERM_LEN”
  9. “3”
    1. “GC_POLICY”
  10. “fork”
    1. “FORK_GC_RUN_INTERVAL”
  11. “30”
    1. “FORK_GC_CLEAN_THRESHOLD”
  12. “100”
    1. “FORK_GC_RETRY_INTERVAL”
  13. “5”
    1. “FORK_GC_CLEAN_NUMERIC_EMPTY_NODES”
  14. “true”
    1. “_FORK_GC_CLEAN_NUMERIC_EMPTY_NODES”
  15. “true”
    1. “_MAX_RESULTS_TO_UNSORTED_MODE”
  16. “1000”
    1. “UNION_ITERATOR_HEAP”
  17. “20”
    1. “CURSOR_MAX_IDLE”
  18. “300000”
    1. “NO_MEM_POOLS”
  19. “false”
    1. “PARTIAL_INDEXED_DOCS”
  20. “false”
    1. “UPGRADE_INDEX”
  21. “Upgrade config for upgrading”
    1. “_NUMERIC_COMPRESS”
  22. “false”
    1. “_FREE_RESOURCE_ON_THREAD”
  23. “true”
    1. “_PRINT_PROFILE_CLOCK”
  24. “true”
    1. “RAW_DOCID_ENCODING”
  25. “false”
    1. “_NUMERIC_RANGES_PARENTS”
  26. “0”
    1. “DEFAULT_DIALECT”
  27. “1”
    1. “VSS_MAX_RESIZE”
  28. “0”
    1. “MULTI_TEXT_SLOP”
  29. “100”

FT.INFO ticketIdx

  1. “index_name”
  2. “ticketIdx”
  3. “index_options”
  4. (empty list or set)
  5. “index_definition”
    1. “key_type”
    2. “JSON”
    3. “prefixes”
      1. “ticket:”
    4. “default_score”
    5. “1”
  6. “attributes”
      1. “identifier”
      2. “$.eventDateTime”
      3. “attribute”
      4. “eventDateTime”
      5. “type”
      6. “TEXT”
      7. “WEIGHT”
      8. “1”
      9. “SORTABLE”
      10. “UNF”
      1. “identifier”
      2. “$.assetType”
      3. “attribute”
      4. “assetType”
      5. “type”
      6. “TEXT”
      7. “WEIGHT”
      8. “1”
      1. “identifier”
      2. “$.salesPersonUt”
      3. “attribute”
      4. “salesPersonUt”
      5. “type”
      6. “TEXT”
      7. “WEIGHT”
      8. “1”
      1. “identifier”
      2. “$.status”
      3. “attribute”
      4. “status”
      5. “type”
      6. “TEXT”
      7. “WEIGHT”
      8. “1”
  7. “num_docs”
  8. “387918”
  9. “max_doc_id”
  10. “994493”
  11. “num_terms”
  12. “1490”
  13. “num_records”
  14. “3629904”
  15. “inverted_sz_mb”
  16. “29.600536346435547”
  17. “vector_index_sz_mb”
  18. “0”
  19. “total_inverted_index_blocks”
  20. “943874”
  21. “offset_vectors_sz_mb”
  22. “10.037736892700195”
  23. “doc_table_size_mb”
  24. “42.581501007080078”
  25. “sortable_values_size_mb”
  26. “17.387529373168945”
  27. “key_table_size_mb”
  28. “14.226670265197754”
  29. “records_per_doc_avg”
  30. “9.3573999404907227”
  31. “bytes_per_record_avg”
  32. “8.5507526397705078”
  33. “offsets_per_term_avg”
  34. “2.8996167182922363”
  35. “offset_bits_per_record_avg”
  36. “8”
  37. “hash_indexing_failures”
  38. “8”
  39. “total_indexing_time”
  40. “60379.194000000003”
  41. “indexing”
  42. “0”
  43. “percent_indexed”
  44. “1”
  45. “number_of_uses”
  46. “775656”
  47. “gc_stats”
    1. “bytes_collected”
  48. “31444555”
  49. “total_ms_run”
  50. “1710673”
  51. “total_cycles”
  52. “3247”
  53. “average_cycle_time_ms”
  54. “526.84724360948564”
  55. “last_run_time_ms”
  56. “738”
  57. “gc_numeric_trees_missed”
  58. “0”
  59. “gc_blocks_denied”
  60. “16178”
  61. “cursor_stats”
    1. “global_idle”
  62. “0”
  63. “global_total”
  64. “0”
  65. “index_capacity”
  66. “128”
  67. “index_total”
  68. “0”
  69. “dialect_stats”
    1. “dialect_1”
  70. “1”
  71. “dialect_2”
  72. “0”
  73. “dialect_3”
  74. “0”

Best regards,

Xavier Appe

Well, just after writing this post, I found the issue, it was the timeout parameter set to 500ms which was preventing the search to finish properly.