Migration Script for Redis Enterprise

Hello, I have a script that I wrote in Python to migrate keys/values from one Redis instance to another. At my company we use Redis Enterprise, and as such the DUMP and RESTORE commands are not allowed, which forces me to use GET and SET in the script. I suspect that this use of GET/SET is what is hurting the script's performance. I have a few questions around this:

  1. Is it possible to write a high-performance migration script for Redis Enterprise using only regular Redis commands, i.e. one that does not require admin access?

  2. Is it true that using DUMP/RESTORE would be faster than GET/SET, and if so, why?

Thanks

The script:

import argparse
import redis
from itertools import zip_longest


def connect_redis_strict(conn_dict):
    conn = redis.StrictRedis(host=conn_dict['host'],
                             port=conn_dict['port'],
                             db=conn_dict['db'],
                             password=conn_dict['password'])
    return conn


def connect_redis(conn_dict):
    # Pass the password too; the original version silently ignored the
    # password parsed from the destination connection string.
    conn = redis.Redis(host=conn_dict['host'],
                       port=conn_dict['port'],
                       db=conn_dict['db'],
                       password=conn_dict['password'])
    return conn


def conn_string_type(string):
    # Parse a connection string of the form <host>:<port>:<password>/<db>
    format = '<host>:<port>:<password>/<db>'
    try:
        host, port, password_db = string.split(':')
        password, db = password_db.split('/')
        db = int(db)
    except ValueError:
        raise argparse.ArgumentTypeError('incorrect format, should be: %s' % format)
    return {'host': host,
            'port': int(port),
            'password': password,
            'db': db}


def migrate_redis(source, destination):
    src = connect_redis_strict(source)
    dst = connect_redis(destination)
    for keybatch in batcher(src.scan_iter(match='*', count=1000), 1000):
        process_batch(keybatch, src, dst)
    return


def process_batch(keybatch, src, dst):
    for key in keybatch:
        if key is None:  # zip_longest pads the last batch with None
            continue
        value = src.get(key)
        try:
            dst.set(key, value)
        except redis.exceptions.ResponseError:
            print("Failed to set key: %s" % key)


def batcher(iterable, n):
    # Group the iterable into batches of n; the last batch is padded
    # with None, which process_batch skips.
    args = [iter(iterable)] * n
    return zip_longest(*args)


def run():
    parser = argparse.ArgumentParser()
    parser.add_argument('source', type=conn_string_type)
    parser.add_argument('destination', type=conn_string_type)
    options = parser.parse_args()
    migrate_redis(options.source, options.destination)


if __name__ == '__main__':
    run()

Hi,
AFAIK Redis Enterprise does support DUMP and RESTORE. Which version are you using, and how did you reach that conclusion? Maybe you’re referring to an Active-Active database?

The main performance inefficiency of using GET and SET instead of DUMP and RESTORE is the lack of compression (DUMP returns the value in Redis's internal serialization format, which uses LZF compression).
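If DUMP/RESTORE does work on your database, the per-key copy becomes roughly the sketch below (assuming redis-py; migrate_key is a hypothetical helper, and PTTL carries the remaining TTL across):

def migrate_key(src, dst, key):
    # DUMP serializes the whole value (any type) in Redis's internal,
    # LZF-compressed format; RESTORE recreates it on the destination.
    data = src.dump(key)
    if data is None:  # key vanished between SCAN and DUMP
        return
    ttl = src.pttl(key)  # remaining TTL in milliseconds, -1 if none
    dst.restore(key, ttl if ttl > 0 else 0, data, replace=True)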

However, it seems to me that the main limitation of the migration script you wrote is the lack of pipelining, i.e. you’re spending two round trips on every single key (one GET against the source, one SET against the destination).
If you change it to use pipelines (preferably without transactions), batching, say, 10 keys at a time, it would do much better: two round trips per 10 keys rather than two per key.
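For example, process_batch from your script could be rewritten along these lines (a sketch, still assuming string values only, like the original):

def process_batch(keybatch, src, dst):
    keys = [k for k in keybatch if k is not None]  # drop zip_longest padding
    # One round trip to the source for the whole batch of GETs...
    src_pipe = src.pipeline(transaction=False)
    for key in keys:
        src_pipe.get(key)
    values = src_pipe.execute()
    # ...and one round trip to the destination for the whole batch of SETs.
    dst_pipe = dst.pipeline(transaction=False)
    for key, value in zip(keys, values):
        if value is not None:
            dst_pipe.set(key, value)
    dst_pipe.execute()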

But even better would be to keep a steady stream of GET commands in flight (something like 10 pending commands at any given moment) and forward each reply as a SET command as soon as it is received.
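One way to approximate that with redis-py is its asyncio client (redis.asyncio, available since redis-py 4.2); a rough sketch, where the semaphore caps the number of in-flight keys:

import asyncio
import redis.asyncio as aredis

async def migrate_stream(src_url, dst_url, concurrency=10):
    src = aredis.from_url(src_url)
    dst = aredis.from_url(dst_url)
    sem = asyncio.Semaphore(concurrency)  # at most N keys in flight

    async def copy_key(key):
        async with sem:
            value = await src.get(key)
            if value is not None:
                await dst.set(key, value)  # forward the reply as a SET

    # Note: this keeps one task per key; for a huge keyspace you would
    # want to gather in chunks instead.
    tasks = [asyncio.ensure_future(copy_key(key))
             async for key in src.scan_iter(match='*', count=1000)]
    await asyncio.gather(*tasks)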

Other limitations of this approach are that it only supports string-type keys (it doesn’t handle hashes, lists, etc.), and that it’s not atomic (you’re not getting a “point in time” consistent snapshot of the source).
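If you do need to stay on regular commands, the type limitation can be worked around by dispatching on TYPE, e.g. (a sketch; TTLs and very large collections are left out for brevity):

def copy_any(src, dst, key):
    ktype = src.type(key)  # b'string', b'hash', b'list', ...
    if ktype == b'string':
        dst.set(key, src.get(key))
    elif ktype == b'hash':
        dst.hset(key, mapping=src.hgetall(key))
    elif ktype == b'list':
        dst.rpush(key, *src.lrange(key, 0, -1))
    elif ktype == b'set':
        dst.sadd(key, *src.smembers(key))
    elif ktype == b'zset':
        dst.zadd(key, dict(src.zrange(key, 0, -1, withscores=True)))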

Maybe these tips will help you in some way:

  1. Redis Enterprise has a feature called “Replica-Of” that can replicate one database into another very efficiently, and you can enable it via the REST API (see the sketch after this list).
  2. If you’re looking to export the data out of Redis Enterprise, you can use the export feature (also via the REST API) and then import the resulting RDB file into the other database (there are many tools that can do that by converting the RDB file into a stream of RESTORE commands).
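For tip 1, enabling Replica-Of boils down to updating the destination database object over the cluster’s REST API. A rough sketch with requests; the uid, credentials and payload field names here are assumptions, so check them against the REST API reference for your version:

import requests

CLUSTER = 'https://cluster.example.com:9443'  # hypothetical cluster address

resp = requests.put(
    CLUSTER + '/v1/bdbs/1',  # 1 = uid of the destination database (assumed)
    auth=('admin@example.com', 'admin-password'),
    json={
        'replica_sync': 'enabled',
        'replica_sources': [
            {'uri': 'redis://:source-password@source-host:12000'},
        ],
    },
    verify=False,  # self-signed cluster cert; use a CA bundle in production
)
resp.raise_for_status()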