Transactions across multiple master nodes

We are looking to execute transactions involving keys that are located across multiple hash slots, which in turn are owned by different master nodes. Does RedisGears provide that kind of capability in its current version, or is it planned for inclusion in a future version?

Related conversation from Stack Overflow

@siddhant171990 sure, RedisGears is fully cluster-aware and your gear function runs on all the shards (unless you ask differently… ). The idea is that the function starts running on all the shards and does as much logic as possible on the local shard (map, filter, foreach, and other operations that can be done locally). Then, when it reaches a step that requires moving data between shards, RedisGears reshuffles the data and continues. The reshuffling can happen multiple times in the function's life cycle.
So, for a simple example, if you want to count the number of keys on a cluster, you can create the following gear function:

127.0.0.1:6379> RG.PYEXECUTE "GB().aggregate(0, lambda a, x: 1 + a, lambda a, x: x + a).run()"

1) 1) "5000000"
2) (empty array)
(73.99s)

The aggregate step takes a local aggregate function that runs separately on each shard (in this example it counts the number of keys on the shard) and a global aggregate function that combines the results from all the shards (in this example it sums the key counts returned by the other shards).

Of course, a better approach here is to use the 'dbsize' command in the local aggregate function, which will be much faster than counting the keys, and then sum the results from all the shards.
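
For illustration, a sketch of that dbsize-based variant could look like the following (this assumes the ShardsIDReader, which emits a single record per shard, so 'dbsize' runs exactly once on each shard and only one number per shard is moved for the global sum):

127.0.0.1:6379> RG.PYEXECUTE "GB('ShardsIDReader').map(lambda x: execute('dbsize')).aggregate(0, lambda a, x: a + x, lambda a, x: a + x).run()"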

I'm a little confused here. Why do we need to move data between shards when the execution has to happen beyond the local shard?
On the other hand, the statement "a global aggregate that combines the results from all the shards" suggests that the execution occurs independently on all the shards and the results are then combined before being returned to the user.

Another thing, probably unrelated: when can we expect JVM support to be out for RedisGears? Is there any ETA we can look at?

RedisGears can provide cross-shard aggregations and scriptability, but if your goal is transactionality, then you should look into implementing the Saga pattern using Redis Streams.

More on Sagas: https://microservices.io/patterns/data/saga.html

Here is a sample application, written for the Redis Microservices for Dummies book, that implements reliable retry on top of Streams: https://github.com/RedisLabs/redis-microservices-for-dummies

Long story short: you keep track, in a stream key, of all the operations that need to be executed on other shards, and you have workers perform them. In the example code the workers are Python applications, but they could very well become Gears functions. A rough sketch of the idea follows.
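
Purely as an illustration of that pattern (the stream key, field names, and worker loop below are hypothetical and use redis-py; the real sample in the repository above also implements the retry and acknowledgement logic):

import redis

r = redis.Redis()

# Record an intended cross-shard operation in a stream that acts as a saga log.
r.xadd('saga:operations', {'op': 'set', 'key': 'user:42', 'value': 'active'})

# A worker drains the log and applies each operation. A production version
# would use consumer groups and acknowledge an entry only after it succeeds,
# so that failed operations are retried.
last_id = '0'
while True:
    replies = r.xread({'saga:operations': last_id}, count=10, block=5000)
    for _stream, messages in replies or []:
        for msg_id, fields in messages:
            r.execute_command(fields[b'op'].decode(),
                              fields[b'key'].decode(),
                              fields[b'value'].decode())
            last_id = msg_id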

@siddhant171990 Sorry if I was not clear, but you got it correctly: the idea is that only the locally aggregated data moves between shards. But it's totally possible to reshuffle the entire data set. For example, the following gear function will swap keys and values (each value will become a key, and each key will become a value; notice that such a function comes with a performance penalty because we are potentially moving all the data between the shards…):

GB().repartition(lambda x: x['value']).foreach(lambda x: execute('set', x['value'], x['key'])).run()

The repartition step makes sure the data is moved between the shards according to its value, and all that is left is to write the data back to Redis.

For more information and examples I suggest you start with the introduction (https://oss.redislabs.com/redisgears/intro.html).

Regarding JVM support: I cannot give you a final ETA yet. What I can promise is that we are working on it, and there is already a WIP PR: Gears plugins by MeirShpilraien (RedisGears/RedisGears#344 on GitHub).

Is there a performance impact from cross-shard aggregations that makes it best to avoid such operations when possible?

Yes, cross-shard aggregations are costly: they require serializing and deserializing the data that moves between the shards. It's best to do them as rarely as possible and with as little data as possible. This is what the local aggregate function is for: if you aggregate locally, you can move much less data between shards and still perform the aggregation (as described in the number-of-keys example).
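
To make that concrete, here is a rough sketch that sums the lengths of all string values; the local aggregate computes a per-shard partial sum, so only one number per shard crosses the network for the global aggregate (this assumes the default KeysReader record layout, with 'type' and 'value' fields):

GB().filter(
    lambda x: x['type'] == 'string'   # keep only string keys
).map(
    lambda x: len(x['value'])         # length of each value, computed locally
).aggregate(
    0,
    lambda a, x: a + x,               # local aggregate: partial sum on each shard
    lambda a, x: a + x                # global aggregate: sum of the per-shard sums
).run()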