Repartitioning strategy for Redis Gears

AlexMikhalev · May 22, 2020, 10:43am

Hello all,
What is the best strategy to make sure keys and compute are distributed evenly across all shards?
In the previous thread I learned how to pin sets/streams/strings to shards using {} and hashtag(). I built a pipeline with three steps: intake, splitting into sentences, and spellcheck.
What I noticed is that only two out of three nodes are processing records.
Example of the stream-based reader:

bg = GearsBuilder('StreamReader')
bg.foreach(symspell_sentences)
bg.register('sentences_tospellcheck*', batch=1, mode="async_local", onRegistered=OnRegisteredSym, onFailedPolicy='continue', trimStream=True)

The stream is populated with:

execute('XADD', 'sentences_tospellcheck{%s}' % hashtag(), '*', 'sentence_key', f"{sentence_key}", 'content', f"{each_sent}")
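
For context on why the {...} part of a stream name matters: Redis Cluster hashes only the text inside the first non-empty pair of braces when it picks one of the 16384 slots. The sketch below is plain Python for illustration, not Gears code; key_slot is a hypothetical helper name and the tag "abc" is made up.

```python
import binascii

def key_slot(key: str) -> int:
    """Return the Redis Cluster slot (0..16383) for a key, honouring hash tags."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" falls back to hashing the whole key.
        if end > start + 1:
            data = data[start + 1:end]
    # Redis Cluster uses CRC16/XMODEM, which binascii.crc_hqx computes with seed 0.
    return binascii.crc_hqx(data, 0) % 16384

# Keys sharing a tag always land in the same slot, hence on the same shard.
print(key_slot("sentences_tospellcheck{abc}") == key_slot("other_stream{abc}"))
```

So every stream written with the same tag ends up on one shard, and how evenly streams spread depends on how many distinct tags are in use and which slots they hash to.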

I tried adding

bg.repartition(lambda x:len(x['value']))

to repartition based on the length of the value, but it doesn’t seem to have any impact.
All keys are sha256 file hashes - they can be uniformly distributed across all shards.
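
A rough way to check that claim offline, as a sketch only: hash some sample strings with sha256, map each hex digest to a cluster slot, and count how many fall on each shard. It assumes three masters owning equal contiguous slot ranges (the real ranges come from the cluster configuration), and shard_for and the "file-N" inputs are hypothetical.

```python
import binascii
import hashlib
from collections import Counter

NUM_SHARDS = 3   # assumption: three masters, as in the setup described here
SLOTS = 16384    # fixed number of hash slots in Redis Cluster

def shard_for(key: str) -> int:
    """Map a key to a shard index, assuming equal contiguous slot ranges."""
    slot = binascii.crc_hqx(key.encode(), 0) % SLOTS
    return slot * NUM_SHARDS // SLOTS

# Hypothetical sample data standing in for real file hashes.
keys = [hashlib.sha256(f"file-{i}".encode()).hexdigest() for i in range(3000)]
counts = Counter(shard_for(k) for k in keys)
print(sorted(counts.items()))
```

If the counts come out roughly equal, the keys themselves are not the cause of an idle shard, which points at the cluster topology or the registrations instead.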
I am currently trying this on the rgcluster Docker image, which is Redis 5.0.7. When I tried Redis 6 with manually compiled Gears, only one shard was occupied with keys and processing.
Any suggestions?

meirsh · May 22, 2020, 11:22am

Hey @AlexMikhalev

First, if you want to run your own cluster with Redis 6, you need to run RG.REFRESHCLUSTER on all the shards, as described here: https://oss.redislabs.com/redisgears/commands.html#rgrefreshcluster

Second, the repartition step is only relevant to global executions, so it is irrelevant here and you do not need it.

Can you share the output of RG.INFOCLUSTER (after you run RG.REFRESHCLUSTER) from all the shards, to make sure the slot ranges are distributed correctly across the cluster?

I guess another gear is writing to the stream based on a keyspace event. Is this assumption correct? Are you sure keys are written to all the shards?

Last, can you share the output of RG.DUMPREGISTRATIONS from each shard, to make sure the registrations reached all the shards?
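
The three commands above can be gathered from every shard in one pass. This is a minimal sketch, not an official tool: collect_gears_diagnostics is a hypothetical helper, and it only assumes each client object exposes execute_command(), as redis-py clients do.

```python
def collect_gears_diagnostics(shards):
    """Run the Gears cluster commands from this thread on every shard.

    `shards` maps a label to any client exposing execute_command(),
    for example a redis.Redis instance from redis-py (an assumption).
    """
    report = {}
    for label, client in shards.items():
        # Refresh first so the reported topology reflects the current cluster.
        client.execute_command("RG.REFRESHCLUSTER")
        report[label] = {
            "cluster": client.execute_command("RG.INFOCLUSTER"),
            "registrations": client.execute_command("RG.DUMPREGISTRATIONS"),
        }
    return report
```

With redis-py it might be called as collect_gears_diagnostics({"30001": redis.Redis(port=30001)}) with one entry per master; the label and port here are placeholders.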

AlexMikhalev · May 22, 2020, 12:47pm

Thank you, using Redis 6 and

redis-trib.py execute --addr ip:30001 --master-only RG.REFRESHCLUSTER

seems to have done the trick.
When are you planning to tag an rgcluster Docker image for Redis 6?

AlexMikhalev · May 22, 2020, 7:42pm

I managed to replicate the crash and will be writing a GitHub bug report. In the meantime, I would like to share a screenshot to illustrate (humour): the Shard F node seems to be pretty relaxed.

meirsh · May 22, 2020, 7:58pm

Thanks @AlexMikhalev. I think I already know why it crashed, but I will confirm using your reproduction.

Regardless, I hope the configuration change solved it and you can make progress. Let us know how it goes.
