Query efficiency of ft.aggregate

xinz · August 30, 2023, 3:47am

I am new to RediSearch and have run into a performance problem. I have an index with 7 fields, one of which is communityNumber, of type NUMERIC. The index holds about 2 million documents. When I query using:

127.0.0.1:6379> ft.aggregate userIdx "*" groupby 1 @communityNumber limit 0 0
1) (integer) 500
(4.80s)

it takes about 5 seconds. But when I put the same data into Elasticsearch and ran the equivalent aggregation, it took less than 500 milliseconds. Is this the normal performance for aggregation queries, or am I querying incorrectly? What should I do to make the aggregation query faster?

My current RediSearch version is 2.8.4.
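
For anyone who wants to reproduce the timing from a client rather than redis-cli, here is a minimal sketch in Python. It assumes the redis-py package is installed and that the server is reachable at 127.0.0.1:6379; the helper names (`build_groupby_args`, `time_call`, `run_against_server`) are made up for illustration. It measures the client-side round trip, so the number will include network and reply-parsing time and need not match what redis-cli reports.

```python
import time
from typing import Any, Callable, List, Tuple


def build_groupby_args(index: str, field: str) -> List[str]:
    """Build the argument list for the FT.AGGREGATE call shown above."""
    return ["FT.AGGREGATE", index, "*", "GROUPBY", "1", f"@{field}", "LIMIT", "0", "0"]


def time_call(execute: Callable[..., Any], args: List[str]) -> Tuple[Any, float]:
    """Call execute(*args) once and return (reply, elapsed seconds)."""
    start = time.perf_counter()
    reply = execute(*args)
    return reply, time.perf_counter() - start


def run_against_server() -> None:
    """Time the query against a live server (assumes redis-py and a local Redis)."""
    import redis  # assumption: redis-py is installed

    client = redis.Redis(host="127.0.0.1", port=6379)
    args = build_groupby_args("userIdx", "communityNumber")
    reply, elapsed = time_call(client.execute_command, args)
    print(reply, f"({elapsed:.2f}s)")
```

Calling `run_against_server()` sends the same command as above through `execute_command`, which passes the arguments to the server unchanged.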

Here is the ft.explain output:

127.0.0.1:6379> ft.explain userIdx "*" groupby 1 @communityNumber limit 0 0 
"<WILDCARD>}\n"

Below is the ft.info output:

127.0.0.1:6379> ft.info userIdx
 1) index_name
 2) userIdx
 3) index_options
 4) (empty array)
 5) index_definition
 6) 1) key_type
    2) JSON
    3) prefixes
    4) 1) user
    5) default_score
    6) "1"
 7) attributes
 8) 1) 1) identifier
       2) $.id
       3) attribute
       4) id
       5) type
       6) NUMERIC
    2) 1) identifier
       2) $.communityNumber
       3) attribute
       4) communityNumber
       5) type
       6) NUMERIC
    3) 1) identifier
       2) $.name
       3) attribute
       4) name
       5) type
       6) TAG
       7) SEPARATOR
       8) 
    4) 1) identifier
       2) $.age
       3) attribute
       4) age
       5) type
       6) NUMERIC
    5) 1) identifier
       2) $.createId
       3) attribute
       4) createId
       5) type
       6) NUMERIC
    6) 1) identifier
       2) $.createName
       3) attribute
       4) createName
       5) type
       6) TAG
       7) SEPARATOR
       8) 
    7) 1) identifier
       2) $.createTime
       3) attribute
       4) createTime
       5) type
       6) TAG
       7) SEPARATOR
       8) 
 9) num_docs
10) "2000000"
11) max_doc_id
12) "2000000"
13) num_terms
14) "0"
15) num_records
16) "14000000"
17) inverted_sz_mb
18) "31.286308288574219"
19) vector_index_sz_mb
20) "0"
21) total_inverted_index_blocks
22) "8118141"
23) offset_vectors_sz_mb
24) "0"
25) doc_table_size_mb
26) "147.71356201171875"
27) sortable_values_size_mb
28) "0"
29) key_table_size_mb
30) "55.313194274902344"
31) records_per_doc_avg
32) "7"
33) bytes_per_record_avg
34) "2.3432908058166504"
35) offsets_per_term_avg
36) "0"
37) offset_bits_per_record_avg
38) "-nan"
39) hash_indexing_failures
40) "0"
41) total_indexing_time
42) "53497.718999999997"
43) indexing
44) "0"
45) percent_indexed
46) "1"
47) number_of_uses
48) (integer) 20
49) gc_stats
50)  1) bytes_collected
     2) "0"
     3) total_ms_run
     4) "0"
     5) total_cycles
     6) "0"
     7) average_cycle_time_ms
     8) "-nan"
     9) last_run_time_ms
    10) "0"
    11) gc_numeric_trees_missed
    12) "0"
    13) gc_blocks_denied
    14) "0"
51) cursor_stats
52) 1) global_idle
    2) (integer) 0
    3) global_total
    4) (integer) 0
    5) index_capacity
    6) (integer) 128
    7) index_total
    8) (integer) 0
53) dialect_stats
54) 1) "dialect_1"
    2) (integer) 1
    3) "dialect_2"
    4) (integer) 0
    5) "dialect_3"
    6) (integer) 0

I’ve filed an issue on GitHub for this problem: https://github.com/RediSearch/RediSearch/issues/3805
