Repartitioning strategy for RedisGears

Hello all,
What is the best strategy to make sure keys and compute are distributed evenly across all shards?
In the previous thread, I learned how to pin sets/streams/strings to shards using {} and hashtag(). I built a pipeline with three steps: intake, splitting into sentences, and spellchecking.
What I noticed is that only two out of three nodes are processing records.
Example of a stream-based reader:

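# Run symspell_sentences on every entry read from the matching streams
gb = GearsBuilder('StreamReader')
gb.foreach(symspell_sentences)
# async_local: the execution runs asynchronously on the shard where the entry was written
gb.register('sentences_tospellcheck*', batch=1, mode='async_local', onRegistered=OnRegisteredSym, onFailedPolicy='continue', trimStream=True)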

The stream is populated with:

execute('XADD', 'sentences_tospellcheck{%s}' % hashtag(), '*', 'sentence_key', f"{sentence_key}", 'content', f"{each_sent}")
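The `{%s}` part of the stream key above is a Redis Cluster hashtag: the cluster assigns each key to one of 16384 hash slots by taking CRC16 (XMODEM variant) of the key, or of only the text between the first `{` and the next `}` when that text is non-empty. A minimal Python sketch of that rule, handy for checking where a given key should land. `hash_tag` and `key_slot` are hypothetical helper names, and `CLUSTER KEYSLOT` on a live node remains the authoritative answer.

```python
from binascii import crc_hqx

HASH_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def hash_tag(key: str) -> str:
    """Return the part of the key that Redis Cluster actually hashes."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        # Only a non-empty {...} section counts as a hashtag.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """CRC16/XMODEM (crc_hqx with initial value 0) of the hashed part, mod 16384."""
    return crc_hqx(hash_tag(key).encode(), 0) % HASH_SLOTS


# Keys sharing a hashtag always map to the same slot, and therefore the same shard.
print(key_slot("sentences_tospellcheck{abc}"), key_slot("sentences{abc}"))
```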

I tried adding

gb.repartition(lambda x: len(x['value']))

to repartition based on the length of the value, but it doesn't seem to have any impact.
All keys are SHA-256 file hashes, so they should be distributed uniformly across all shards.
I am currently trying this on the rgcluster Docker image, which runs Redis 5.0.7. When I tried Redis 6 with a manually compiled RedisGears, only one shard held keys and did any processing.
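One way to sanity-check the expectation that SHA-256 keys spread evenly is a rough offline simulation: hash a few thousand synthetic file contents, compute each key's slot the way Redis Cluster does (CRC16 mod 16384), and count how many keys fall into each shard's slot range. The three ranges below are an assumption (the even split `redis-cli --cluster create` typically produces for three masters); substitute the real ranges from your cluster. All helper names are hypothetical.

```python
import hashlib
from binascii import crc_hqx
from collections import Counter

HASH_SLOTS = 16384

# Assumed layout: three masters with an even split of the slot space.
SHARD_RANGES = {
    "shard-1": (0, 5460),
    "shard-2": (5461, 10922),
    "shard-3": (10923, 16383),
}


def key_slot(key: str) -> int:
    # SHA-256 hex digests contain no braces, so the whole key is hashed.
    return crc_hqx(key.encode(), 0) % HASH_SLOTS


def shard_for_slot(slot: int) -> str:
    for shard, (low, high) in SHARD_RANGES.items():
        if low <= slot <= high:
            return shard
    raise ValueError(f"slot {slot} is not covered by any shard")


def simulate(num_keys: int = 3000) -> Counter:
    counts = Counter()
    for i in range(num_keys):
        # Synthetic stand-in for a real file hash.
        key = hashlib.sha256(f"file-{i}".encode()).hexdigest()
        counts[shard_for_slot(key_slot(key))] += 1
    return counts


print(simulate())
```

If the simulated split is roughly even while the real cluster leaves a shard idle, the keys themselves are unlikely to be the cause and the cluster configuration is the next thing to inspect.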
Any suggestions?

Hey @AlexMikhalev

First, if you want to run your own cluster with Redis 6, you need to run RG.REFRESHCLUSTER on all the shards, as described here: https://oss.redislabs.com/redisgears/commands.html#rgrefreshcluster.
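Because the command must reach every shard, not only the one you are connected to, it can be convenient to script it. A minimal sketch, assuming one client object per master that exposes `execute_command` (as `redis.Redis` from redis-py does); the `refresh_all_shards` helper and the host and ports in the usage comment are hypothetical.

```python
from typing import Any, Dict, Protocol


class CommandClient(Protocol):
    """Anything with an execute_command(*args) method, e.g. redis.Redis."""

    def execute_command(self, *args: Any) -> Any: ...


def refresh_all_shards(clients: Dict[str, CommandClient]) -> Dict[str, Any]:
    """Send RG.REFRESHCLUSTER to each shard and collect the reply per shard."""
    replies = {}
    for name, client in clients.items():
        replies[name] = client.execute_command("RG.REFRESHCLUSTER")
    return replies


# Usage with redis-py (hypothetical host and ports):
# import redis
# clients = {str(p): redis.Redis(host="127.0.0.1", port=p) for p in (30001, 30002, 30003)}
# print(refresh_all_shards(clients))
```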

Second, the repartition step is only relevant to global executions, so it is irrelevant here and you do not need it.

Can you share the output of RG.INFOCLUSTER from all the shards (after running RG.REFRESHCLUSTER), to make sure the slot ranges are distributed correctly across the cluster?
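Once each shard's slot range has been read out of the RG.INFOCLUSTER replies, checking that the ranges cover slots 0 to 16383 exactly once is mechanical. A sketch, assuming each shard owns one contiguous inclusive range (min slot, max slot); extracting those numbers from the reply is not shown because it depends on the reply layout of your RedisGears version, and `check_slot_coverage` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

HASH_SLOTS = 16384


def check_slot_coverage(ranges: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return a list of problems; an empty list means every slot is owned exactly once.

    `ranges` maps a shard id to its inclusive (min_slot, max_slot) range.
    """
    problems = []
    expected_next = 0
    # Walk the ranges in slot order, tracking the first slot not yet covered.
    for shard, (low, high) in sorted(ranges.items(), key=lambda item: item[1][0]):
        if low > expected_next:
            problems.append(f"gap: slots {expected_next}-{low - 1} unassigned")
        elif low < expected_next:
            problems.append(f"overlap: {shard} starts at {low}, before slot {expected_next}")
        expected_next = max(expected_next, high + 1)
    if expected_next < HASH_SLOTS:
        problems.append(f"gap: slots {expected_next}-{HASH_SLOTS - 1} unassigned")
    return problems


print(check_slot_coverage({"a": (0, 5460), "b": (5461, 10922), "c": (10923, 16383)}))
```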

I assume another gear is writing to the stream based on a keyspace event; is this assumption correct? Are you sure keys are being written to all the shards?

Last, please share the output of RG.DUMPREGISTRATIONS on each shard, to make sure the registrations reached all the shards.
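Comparing registrations across shards can also be scripted. Assuming each shard's RG.DUMPREGISTRATIONS reply has already been reduced to a set of identifying strings (for example reader type plus key prefix; that reduction is not shown), the check is a set difference. `missing_registrations` is a hypothetical helper.

```python
from typing import Dict, Set


def missing_registrations(per_shard: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each shard to the registrations seen on some other shard but absent on it."""
    all_regs: Set[str] = set().union(*per_shard.values())
    return {
        shard: all_regs - regs
        for shard, regs in per_shard.items()
        if all_regs - regs
    }


print(missing_registrations({
    "shard-1": {"StreamReader:sentences_tospellcheck*"},
    "shard-2": {"StreamReader:sentences_tospellcheck*"},
    "shard-3": set(),
}))
```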

Thank you. Using Redis 6 and running

redis-trib.py execute --addr ip:30001 --master-only RG.REFRESHCLUSTER

seems to have done the trick.
When are you planning to tag an rgcluster Docker image for Redis 6?

I managed to reproduce the crash and will write a GitHub bug report. In the meantime, I would like to share a screenshot for illustration (humour): the shard F node seems to be pretty relaxed.

Thanks @AlexMikhalev. I think I already know why it crashed, but I will confirm using your reproduction.

Regardless, I hope the configuration change solved it and you can make progress. Let us know how it goes.