SLOWLOG - sorted set question

For some reason, this one has always seemed tricky to me -

An enterprise application is collecting time-series data in a Sorted Set. Going by the timestamps, the solution runs ZRANGEBYSCORE every hour and then deletes the last hour’s data via ZREMRANGEBYSCORE. How would you redesign this solution to avoid the slow execution?

What is the best approach?

If modules are an option, RedisTimeSeries would be worth checking out.


Thanks for the comment and suggestion. I was more interested in the actual solution/best approach to this problem in more detail as it has always been confusing to me.

I suspect the problem has a couple of dimensions.

  1. ZREMRANGEBYSCORE has a complexity of O(log(N)+M), where N is the number of elements in the sorted set and M is the number of elements removed. That is not great, especially the M bit - so if you’re removing 10k samples in one call, it will be slow.
  2. This key is being hit by both a frequent but light ZADD and an infrequent but heavy ZREMRANGEBYSCORE, which compounds the problem.

First, consider splitting the data along key lines - each hour gets its own ZSET. That way, if possible, you can just let the whole key expire instead of removing its members with ZREMRANGEBYSCORE. Your ZRANGEBYSCORE may become more complicated as you’re checking multiple keys, but it can be done in a deterministic way, since the key names follow from the timestamps.
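To make the per-hour key idea concrete, here is a rough Python sketch. It assumes `client` is a redis-py `redis.Redis` instance, that scores are Unix timestamps in seconds, and that members are unique per sample. The key layout (`series:YYYYMMDDHH`), the two-hour retention, and the helper names are all made up for illustration, not taken from the original application.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 3600          # one sorted set per hour
RETENTION_SECONDS = 2 * 3600   # assumption: each bucket lives two hours after its last write


def bucket_key(series, ts):
    """Return the key of the hourly bucket containing Unix timestamp ts."""
    hour_start = int(ts) - int(ts) % BUCKET_SECONDS
    stamp = datetime.fromtimestamp(hour_start, tz=timezone.utc).strftime("%Y%m%d%H")
    return f"{series}:{stamp}"


def add_sample(client, series, ts, member):
    """ZADD into the hour's bucket and (re)set its TTL, so no range delete is needed."""
    key = bucket_key(series, ts)
    pipe = client.pipeline()
    pipe.zadd(key, {member: ts})
    pipe.expire(key, RETENTION_SECONDS)
    pipe.execute()


def read_range(client, series, start_ts, end_ts):
    """Walk every hourly bucket overlapping [start_ts, end_ts] in order."""
    results = []
    ts = int(start_ts) - int(start_ts) % BUCKET_SECONDS
    while ts <= end_ts:
        results.extend(client.zrangebyscore(bucket_key(series, ts), start_ts, end_ts))
        ts += BUCKET_SECONDS
    return results
```

The hourly job then only has to call something like `read_range(client, "temp", hour_start, hour_start + 3599)`, and old buckets disappear through their TTL rather than through a ZREMRANGEBYSCORE on one big key.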

I will also say this is a perfect use for RedisTimeSeries, as mentioned above. It will likely be faster / smaller out-of-the-box, especially given the way you’re using zsets currently.
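For comparison, a minimal sketch of the RedisTimeSeries route, assuming the module is loaded on the server and `client` is a redis-py `redis.Redis` instance. It sends the raw `TS.ADD` and `TS.RANGE` commands through `execute_command`; the one-hour retention and the helper names are illustrative assumptions. With `RETENTION` set when the series is created, the module trims old samples itself, so there is no separate delete step.

```python
RETENTION_MS = 60 * 60 * 1000  # assumption: keep one hour of samples


def ts_add(client, key, ts_ms, value):
    """Append a sample; RETENTION is applied if TS.ADD has to create the series."""
    return client.execute_command("TS.ADD", key, ts_ms, value, "RETENTION", RETENTION_MS)


def ts_range(client, key, from_ms, to_ms):
    """Fetch the samples with timestamps in [from_ms, to_ms] (milliseconds)."""
    return client.execute_command("TS.RANGE", key, from_ms, to_ms)
```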

Thanks for the detailed explanation of how this solution would work. Much appreciated.