FT.AGGREGATE performance problems

Paul · December 16, 2019, 2:15pm

Hi,
We are using Redis Enterprise Cloud, and switching from FT.SEARCH to FT.AGGREGATE has introduced significant performance problems. Our instance is running Redis 5.0.4 / RediSearch 1.4.6.

Here is a query that takes 3 seconds to return:

"FT.AGGREGATE" "trips_hash_index" "(@VisibleToCompanyIDs:{2}) (@TripMainTypeID:{6|4|1|3|2}) (@TripStatusID:{1|5|4|6|3})" "SORTBY" "8" "@CreatedDateTimeUTC" "DESC" "@TripStartDate" "ASC" "@TripEndDate" "ASC" "@TripID" "DESC" "LIMIT" "0" "10"


With FT.SEARCH, a similar query (sorting on a single field) is quite quick:

"FT.SEARCH" "trips_hash_index" "(@VisibleToCompanyIDs:{2}) (@TripMainTypeID:{6|4|1|3|2}) (@TripStatusID:{1|5|4|6|3})" "SORTBY" "TripID" "DESC" "LIMIT" "0" "10"


And here is what the index looks like:

ft.info trips_hash_index

index_name: "trips_hash_index"
fields:
  PrimaryKey           TEXT     WEIGHT "1"  SORTABLE
  TripID               NUMERIC  SORTABLE
  UserID               TAG      SEPARATOR ,
  TripDescription      TEXT     WEIGHT "1"
  TripStartDate        NUMERIC  SORTABLE
  TripEndDate          NUMERIC  SORTABLE
  TripTypeID           TAG      SEPARATOR ,
  TripMainTypeID       TAG      SEPARATOR ,
  TripStatusID         TAG      SEPARATOR ,
  CreatedDateTimeUTC   NUMERIC  SORTABLE
  VisibleToCompanyIDs  TAG      SEPARATOR ,
  ParentTripID         TAG      SEPARATOR ,
  Supplier             TAG      SEPARATOR ,
  TripCarrierTypeID    TAG      SEPARATOR ,
  ClientIDs            TAG      SEPARATOR ,
index_options: "NOOFFSETS"
gc_stats:
  current_hz: "3.0289804935455322"
  bytes_collected: (integer) 68723
  effectiv_cycles_rate: "0.036071268224674498"
cursor_stats:
  global_idle: (integer) 0
  global_total: (integer) 0
  index_capacity: (integer) 128
  index_total: (integer) 0
num_docs: (integer) 599715
max_doc_id: (integer) 600432
num_terms: (integer) 878752
num_records: (integer) 4937876
inverted_sz_mb: "4.6262814823090749e+18"
offset_vectors_sz_mb: "0"
doc_table_size_mb: "4.6329217579703337e+18"
key_table_size_mb: "4.6255695592675082e+18"
records_per_doc_avg: "8.2337043428962087"
bytes_per_record_avg: "4.2159641108849231"
offsets_per_term_avg: "0"
offset_bits_per_record_avg: "nan"


Any thoughts on what the problem might be?

Thanks!

Filipe_C_Oliveira · December 16, 2019, 3:14pm

Hi Paul, focusing specifically on FT.AGGREGATE performance, I would recommend:

Using the MAX parameter within your SORTBY clauses (SORTBY … MAX n). MAX optimizes sorting by sorting only for the n largest elements. https://oss.redislabs.com/redisearch/Aggregations/#parameters_in_detail
Replacing LIMIT with the cursor API (the WITHCURSOR keyword). Cursors let you consume only part of the response and fetch additional results as needed. This is much quicker than using LIMIT with an offset, since the query is executed only once and its state is stored on the server. https://oss.redislabs.com/redisearch/Aggregations/#cursor_api

Can you give it a try and let us know if that solved the issue?
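
For readers following along, here is a minimal sketch of how the two suggestions above change the argument list of the query from the first post. It only builds the arguments and does not talk to a server. The helper names `build_sortby` and `build_aggregate` are hypothetical, and the argument order is an assumption based on the documented FT.AGGREGATE syntax, not something confirmed in this thread.

```python
from typing import List, Optional, Sequence, Tuple


def build_sortby(fields: Sequence[Tuple[str, str]],
                 max_n: Optional[int] = None) -> List[str]:
    """Build a SORTBY clause: SORTBY <nargs> @field ASC|DESC ... [MAX n]."""
    args: List[str] = []
    for name, direction in fields:
        direction = direction.upper()
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"unknown sort direction: {direction}")
        args += [f"@{name}", direction]
    # nargs counts the property names and the ASC/DESC tokens together,
    # which is why four sort fields give "8" in the original query.
    clause = ["SORTBY", str(len(args))] + args
    if max_n is not None:
        clause += ["MAX", str(max_n)]
    return clause


def build_aggregate(index: str,
                    query: str,
                    sort_fields: Sequence[Tuple[str, str]],
                    max_n: Optional[int] = None,
                    limit: Optional[Tuple[int, int]] = None,
                    cursor_count: Optional[int] = None) -> List[str]:
    """Assemble FT.AGGREGATE arguments (hypothetical helper, no I/O)."""
    cmd = ["FT.AGGREGATE", index, query]
    cmd += build_sortby(sort_fields, max_n)
    if limit is not None:
        offset, num = limit
        cmd += ["LIMIT", str(offset), str(num)]
    if cursor_count is not None:
        # Cursor variant: read results in chunks of cursor_count rows.
        cmd += ["WITHCURSOR", "COUNT", str(cursor_count)]
    return cmd


if __name__ == "__main__":
    sort_fields = [
        ("CreatedDateTimeUTC", "DESC"),
        ("TripStartDate", "ASC"),
        ("TripEndDate", "ASC"),
        ("TripID", "DESC"),
    ]
    query = ("(@VisibleToCompanyIDs:{2}) (@TripMainTypeID:{6|4|1|3|2}) "
             "(@TripStatusID:{1|5|4|6|3})")
    # Variant 1: keep LIMIT, add MAX to the SORTBY clause.
    print(build_aggregate("trips_hash_index", query, sort_fields,
                          max_n=10, limit=(0, 10)))
    # Variant 2: drop LIMIT and read through a cursor instead.
    print(build_aggregate("trips_hash_index", query, sort_fields,
                          cursor_count=10))
```

With a client such as redis-py, the resulting list could be passed to `execute_command(*cmd)`; that step is left out here so the sketch runs without a server.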

Paul · December 16, 2019, 3:53pm

Hi Filipe,
I am actually using MAX, but did not include it in the query above. The difference between using it and not using it is negligible.

Secondly, cursors wouldn't be acceptable in my case because the data changes all the time, so going from one page to the next could show different records: some may have been added, others deleted, others modified.

Do you have any other suggestions? What would be the reason search is so much faster? The only reason I need aggregate is to be able to sort by multiple fields. Is there a different way to accomplish this with search?
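
To make the multi-field requirement concrete, here is a small sketch of the ordering that the four-key SORTBY in the first post asks for, reproduced client-side on a handful of made-up rows. The sample data and the function name `sort_trips` are hypothetical. This is only an illustration of the sort semantics, not a suggested replacement for server-side sorting: it would require fetching every matching document first, which is unlikely to be practical on an index of this size.

```python
from typing import Dict, List

Row = Dict[str, int]


def sort_trips(rows: List[Row]) -> List[Row]:
    """Order rows like:
    SORTBY 8 @CreatedDateTimeUTC DESC @TripStartDate ASC
             @TripEndDate ASC @TripID DESC
    All four fields are NUMERIC in the index, so DESC can be
    expressed by negating the value inside a tuple key.
    """
    return sorted(
        rows,
        key=lambda r: (
            -r["CreatedDateTimeUTC"],  # DESC
            r["TripStartDate"],        # ASC
            r["TripEndDate"],          # ASC
            -r["TripID"],              # DESC
        ),
    )


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = [
        {"TripID": 1, "CreatedDateTimeUTC": 100, "TripStartDate": 10, "TripEndDate": 20},
        {"TripID": 2, "CreatedDateTimeUTC": 100, "TripStartDate": 10, "TripEndDate": 20},
        {"TripID": 3, "CreatedDateTimeUTC": 200, "TripStartDate": 50, "TripEndDate": 60},
        {"TripID": 4, "CreatedDateTimeUTC": 100, "TripStartDate": 5, "TripEndDate": 20},
    ]
    print([row["TripID"] for row in sort_trips(sample)])
```

The later keys only matter when the earlier ones tie, which is what a single-field FT.SEARCH SORTBY cannot express.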

Filipe_C_Oliveira · December 16, 2019, 4:09pm

Hi again Paul, given that the cursor API is not an option here, I think we should investigate further why the MAX parameter on SORTBY is not improving performance as expected (given that the query is the same for FT.AGGREGATE and FT.SEARCH). With that in mind, could you create an issue at:
https://github.com/RediSearch/RediSearch/issues
with what you've described here, plus reproduction instructions and (if possible) an RDB file, so that we can test this for you and provide a solution? The more info we have, the faster we can provide an explanation or solution.
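
One way to gather comparable numbers for such a report is a small timing harness like the sketch below. It is an assumption-laden illustration, not something requested in the thread: `time_command` and `summarize` are hypothetical helpers, and `execute` stands for any callable that sends a command and waits for the reply.

```python
import statistics
import time
from typing import Callable, List, Sequence


def time_command(execute: Callable[..., object],
                 args: Sequence[str],
                 repeats: int = 5) -> List[float]:
    """Run the same command several times; return latencies in ms."""
    timings_ms: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        execute(*args)  # reply is discarded, only latency matters here
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def summarize(label: str, timings_ms: Sequence[float]) -> str:
    """Format the median and worst-case latency for a report."""
    return (f"{label}: median {statistics.median(timings_ms):.1f} ms, "
            f"max {max(timings_ms):.1f} ms over {len(timings_ms)} runs")


if __name__ == "__main__":
    # Stand-in for a real client call so the sketch runs anywhere.
    def fake_execute(*args: str) -> None:
        time.sleep(0.001)

    search_args = ["FT.SEARCH", "trips_hash_index", "*", "LIMIT", "0", "10"]
    print(summarize("FT.SEARCH", time_command(fake_execute, search_args)))
```

Against a real server, `execute` could be the `execute_command` method of a redis-py client. These are client-side timings, so they include network round trips as well as query execution.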

Paul · December 16, 2019, 4:44pm

Hi Filipe,
I created this issue: https://github.com/RediSearch/RediSearch/issues/1016

How would I go about giving you access to our QA environment where you could troubleshoot yourself? There are only about 165k records in that index, but FT.AGGREGATE still takes about a second to return with or without MAX.

Thanks for your help.
