Working with huge datasets

Hi everyone,

I am experimenting with RediSearch on a large dataset (~500k documents). I have the following data structure, which is stored as JSON:

 {
        public Guid Id;
        public string InvoiceNumber;
        public int Items;
        public Instant? InvoiceDate;
}

For test purposes, InvoiceNumber = $"INV_{n}" and Items = n % 20, where n is in <0, 500000>, and I have the following index:

FT.CREATE invoice ON JSON PREFIX 1 "invoice:" SCHEMA $.InvoiceNumber AS InvoiceNumber TEXT SORTABLE $.Items AS Items NUMERIC SORTABLE
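
For anyone who wants to reproduce this, here is a minimal Python sketch of how such test documents could be generated. The helper names (make_invoice, make_dataset) and the "invoice:<Id>" key layout are my own assumptions for illustration; only the field names, the INV_{n} / n % 20 rule and the "invoice:" prefix come from the setup above.

```python
import json
import uuid


def make_invoice(n):
    """Build one test document following the rule described in the post."""
    return {
        "Id": str(uuid.uuid4()),
        "InvoiceNumber": f"INV_{n}",
        "Items": n % 20,
        "InvoiceDate": None,  # nullable in the original structure
    }


def make_dataset(count):
    """Return (key, json_text) pairs; the 'invoice:<Id>' key layout is assumed."""
    pairs = []
    for n in range(count):
        doc = make_invoice(n)
        pairs.append((f"invoice:{doc['Id']}", json.dumps(doc)))
    return pairs


if __name__ == "__main__":
    # Print a few sample documents instead of the full 500k.
    for key, payload in make_dataset(3):
        print(key, payload)
```

Each pair could then be written with JSON.SET <key> $ <payload>, so that the key matches the "invoice:" prefix of the index.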

I ran into a few issues with FT.SEARCH:

  1. Too generic queries
    For the following query:
ft.search invoice "@InvoiceNumber:INV_*" limit 0 20

RediSearch reports that 200 items are available.

ft.search invoice "@InvoiceNumber:INV_* @Items:[10 10]" limit 0 20

This one reports that only 10 items are available.

ft.search invoice " @Items:[10 10]" limit 0 20

This one reports that 25000 items are available.

Could you please tell me why MAXEXPANSIONS is applied only to TEXT fields?
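
For reference, the counts I would expect from an uncapped match on this dataset can be worked out with a few lines of Python. This is only arithmetic on the generation rule, not a RediSearch call. I assume n runs over 0 .. 499999; the function name expected_counts is made up for this sketch.

```python
TOTAL = 500_000  # assumed: n runs over 0 .. 499999


def expected_counts(total=TOTAL, items_value=10, modulus=20):
    """Count how many generated documents each of the three queries should match."""
    # Every InvoiceNumber starts with "INV_", so the prefix matches all documents.
    prefix_matches = total
    # Items = n % 20, so Items == 10 holds for one document in every 20.
    items_matches = sum(1 for n in range(total) if n % modulus == items_value)
    return {
        "prefix_only": prefix_matches,
        "prefix_and_items": items_matches,  # the prefix does not filter anything out
        "items_only": items_matches,
    }


if __name__ == "__main__":
    print(expected_counts())
```

So only the numeric-only query returns the count I expect (25000); both queries that contain the prefix return far fewer.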

  2. I changed the MAXEXPANSIONS value to 500k.
ft.search invoice "@InvoiceNumber:INV_*"

returns

1) (integer) 99
2) (empty array)
(1.52s)

The result is an empty array, but according to the documentation it should return the top results accumulated so far.

I am not really sure what is happening here. The only solution I have found (and probably not the most elegant one) is setting the timeout to 0.
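
The two settings mentioned above can be applied at runtime with FT.CONFIG SET. Below is a minimal Python sketch, assuming a client object that exposes execute_command(*args) the way redis-py does. The function name apply_search_config and the RecordingClient stub are hypothetical and exist only so the example runs without a server.

```python
def apply_search_config(client, max_expansions=500_000, timeout_ms=0):
    """Send the two FT.CONFIG SET commands discussed in the post.

    `client` is assumed to expose execute_command(*args), as redis-py does.
    """
    commands = [
        ("FT.CONFIG", "SET", "MAXEXPANSIONS", str(max_expansions)),
        ("FT.CONFIG", "SET", "TIMEOUT", str(timeout_ms)),
    ]
    return [client.execute_command(*cmd) for cmd in commands]


class RecordingClient:
    """Hypothetical stand-in that records commands instead of sending them."""

    def __init__(self):
        self.sent = []

    def execute_command(self, *args):
        self.sent.append(args)
        return "OK"


if __name__ == "__main__":
    stub = RecordingClient()
    apply_search_config(stub)
    for command in stub.sent:
        print(" ".join(command))
```

With a real connection, the stub would be replaced by the actual client. Note that these are module-wide settings, so they affect every query, not just this one.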

  3. With MAXEXPANSIONS set to 500k and the TIMEOUT change described above, LIMIT behaves really strangely: queries take a long time, and there is no difference between queries with and without the LIMIT parameter.

Thank you in advance for your help. I have been researching this for a few days, but I have not managed to find an answer.

Regards,
Mkrzyszc