When I write data to RedisGraph, it serializes it in around 4 minutes, but it then takes 45 minutes to replicate to the replica. I use the replica for read queries.
I have read some of the vanilla Redis documentation on replication. I use the default repl-backlog-size and the default client-output-buffer-limit for replicas.
I use more aggressive AOF rewrite settings, as loading the AOF is relatively slow.
There are reads on the replica, but they are infrequent (about one every 5 seconds).
The master and the replica run on the same hardware. CPU is not capped.
I have looked at some historical data: if the replication delay is up to about 25K, the replica keeps up. If it is more than that, something weird happens: the number of commands processed on the replica drops to 15 ops/s, yet CPU utilization is 100%. The replication delay then decreases very slowly.
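One way to put a number on that delay is to read INFO replication on the master and subtract each replica's reported offset from master_repl_offset. The sketch below parses the raw INFO text (for example the output of redis-cli INFO replication) and returns the lag in bytes of the replication stream per replica. Whether the 25K figure above is measured in the same units is not stated, and the sample values used to exercise it are made up.

```python
import re


def parse_info(text):
    """Parse raw INFO output into a flat {field: value} dict, skipping section headers."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def replica_offset_lag(info_text):
    """Map each replica 'ip:port' to master_repl_offset minus that replica's offset (bytes)."""
    fields = parse_info(info_text)
    master_offset = int(fields["master_repl_offset"])
    lags = {}
    for key, value in fields.items():
        # Replica entries look like: slave0:ip=...,port=...,state=online,offset=...,lag=...
        if not re.fullmatch(r"slave\d+", key):
            continue
        attrs = dict(pair.split("=", 1) for pair in value.split(","))
        lags[f"{attrs['ip']}:{attrs['port']}"] = master_offset - int(attrs["offset"])
    return lags
```

Sampling this periodically and plotting it next to the replica's ops/s would show whether the 15 ops/s drop always coincides with the lag crossing a particular threshold.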
Replication currently works on RedisGraph V2.8.17 as follows: GRAPH.QUERY commands processed by the master that modify the underlying graph in any way (e.g. introducing a new node or deleting an edge) are replicated as-is to the replica. You can see the replicated commands if you run the MONITOR command on the replica.
Assuming the master and its replica have the same resources (hardware) and hold the same graph (no data inconsistency), the time to execute a given GRAPH.QUERY command should be the same on both, provided the replica isn't processing additional read commands.
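Since the replica re-executes each replicated query, its backlog behaves like a simple queue: queries arrive at the rate the master produced them and leave at the rate the replica can apply them. The toy model below (a discrete-time Lindley-style recursion) illustrates why, under the same-hardware assumption, the backlog should stay small, and how long a backlog takes to drain once the replica's rate drops. The arrival counts and service rates are hypothetical placeholders; a real replica's rate depends on the cost of each query, not on a fixed ops/s figure.

```python
import math


def simulate_backlog(arrivals_per_second, service_rate):
    """Return the queued-query backlog after each second.

    backlog[t] = max(0, backlog[t-1] + arrivals[t] - service_rate)
    """
    backlog = 0
    history = []
    for arrivals in arrivals_per_second:
        backlog = max(0, backlog + arrivals - service_rate)
        history.append(backlog)
    return history


def drain_seconds(backlog, service_rate):
    """Seconds needed to apply `backlog` queued queries at a constant service rate."""
    return math.ceil(backlog / service_rate)
```

For example, a replica stuck at 15 ops/s (the rate observed above) with a hypothetical backlog of 27,000 queries would need drain_seconds(27000, 15), i.e. 1,800 seconds or 30 minutes, to catch up, even though the master applied the same queries far faster.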
Would you be able to validate the above assumptions?
Thanks!
It seems the slave is unable to keep up.
Can you please share a few of the write queries you're issuing, accompanied by their execution plans (GRAPH.EXPLAIN <GRAPH-KEY> <QUERY>)?
I want to make sure both the master and the slave produce the same execution plan for a given query.
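A minimal sketch for collecting and comparing plans, assuming a redis-py style client that exposes execute_command and assuming GRAPH.EXPLAIN replies with a flat array of plan lines whose indentation encodes the operator tree. The helper names are hypothetical; trailing whitespace and blank lines are ignored, but leading indentation is kept because it carries the plan structure.

```python
def explain(client, graph_key, query):
    """Run GRAPH.EXPLAIN through a redis-py style client and return the plan lines as str."""
    reply = client.execute_command("GRAPH.EXPLAIN", graph_key, query)
    return [line.decode() if isinstance(line, bytes) else line for line in reply]


def normalize_plan(lines):
    """Drop blank lines and trailing spaces; keep indentation (it encodes the tree)."""
    return [line.rstrip() for line in lines if line.strip()]


def plans_match(plan_a, plan_b):
    """True if two EXPLAIN outputs describe the same operator tree."""
    return normalize_plan(plan_a) == normalize_plan(plan_b)
```

This only helps if EXPLAIN can actually be run against both nodes; it does not address whether the replica accepts the command.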
I am not sure, but it must be a lot of write queries. Redis metrics reported about 2k commands/s for a sustained period (around one hour).
I can implement a metric and come back with more exact data.
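One low-effort way to get that metric is to capture a short MONITOR session (for example redis-cli MONITOR redirected to a file) and count GRAPH.QUERY commands per second. The sketch assumes the documented MONITOR line format, a timestamp, then [db client], then the quoted command and arguments. Arguments are returned still escaped. Note that MONITOR itself noticeably reduces server throughput, so a brief sample is safer than leaving it running.

```python
import re
from collections import Counter

# Example MONITOR line:
# 1339518083.107412 [0 127.0.0.1:60866] "GRAPH.QUERY" "mygraph" "CREATE (:N)"
_LINE = re.compile(r'^(?P<ts>\d+\.\d+) \[(?P<db>\d+) (?P<client>[^\]]+)\] (?P<args>.*)$')
_ARG = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_monitor_line(line):
    """Return (timestamp, client, COMMAND, args) or None for lines that don't match (e.g. 'OK')."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    args = _ARG.findall(m.group("args"))
    if not args:
        return None
    return float(m.group("ts")), m.group("client"), args[0].upper(), args[1:]


def graph_queries_per_second(lines):
    """Count GRAPH.QUERY commands per whole second of MONITOR timestamps."""
    counts = Counter()
    for line in lines:
        parsed = parse_monitor_line(line)
        if parsed and parsed[2] == "GRAPH.QUERY":
            counts[int(parsed[0])] += 1
    return dict(counts)
```

Running the same count on a master capture and a replica capture over the same window would show directly how far the replica's apply rate falls behind.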
I don't think I can run EXPLAIN on the replica, as it is read-only (RO).
Is the replica's current slow performance by design, or just not yet optimized? I know it's a lot of data for the master, but the master seems to process it much more efficiently than the replica does through replication.
It seems that a full resync solves the issue immediately.
So one workaround I can use: after a large ingestion, once the master has processed the writes, I can go onto the slave and run SLAVEOF NO ONE and then SLAVEOF master_addr port. This seems to force a full resync.
I was also experimenting with client-output-buffer-limit slave, hoping that an oversized buffer for the replica would force a full resync, without much luck. I lowered the soft limit to 4mb and 600 seconds, and nothing happened.
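The workaround above can be scripted as two commands sent to the replica. A minimal sketch, assuming a redis-py style client with execute_command; REPLICAOF is the newer name for SLAVEOF (Redis 5 and later), and the host and port are hypothetical. That this sequence produces a full rather than partial resync is only what was observed here, not something the sketch guarantees.

```python
def resync_commands(master_host, master_port):
    """Commands that detach the replica and re-attach it to the master."""
    return [
        ("REPLICAOF", "NO", "ONE"),  # same as SLAVEOF NO ONE
        ("REPLICAOF", master_host, str(master_port)),  # same as SLAVEOF host port
    ]


def force_resync(replica_client, master_host, master_port):
    """Send the detach/re-attach sequence to the replica, in order."""
    for cmd in resync_commands(master_host, master_port):
        replica_client.execute_command(*cmd)
```

Between the two commands the replica briefly acts as a standalone master, so it is best run when no clients write to it.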
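The limit is enforced on the master, against the master's output buffer for the replica connection (e.g. CONFIG SET client-output-buffer-limit "replica 256mb 4mb 600" on the master). Per the Redis documentation, the connection is dropped as soon as the hard limit is reached, or when the soft limit is reached and stays reached continuously for the given number of seconds. The simplified model below, sampling the buffer once per second, shows one thing worth checking: any dip below 4mb restarts the 600-second timer. The buffer samples are hypothetical, and real Redis only checks the limits when it writes to the client.

```python
MB = 1024 * 1024


def disconnect_second(buffer_sizes, hard, soft, soft_seconds):
    """Return the first second at which the replica link would be dropped, or None.

    buffer_sizes[i] is the output-buffer size in bytes sampled at second i.
    A limit of 0 means "disabled", as in redis.conf.
    """
    above_since = None
    for t, size in enumerate(buffer_sizes):
        if hard and size >= hard:
            return t  # hard limit: immediate disconnect
        if soft and size >= soft:
            if above_since is None:
                above_since = t  # start the soft-limit timer
            if t - above_since > soft_seconds:
                return t  # over the soft limit for longer than soft_seconds
        else:
            above_since = None  # dropping below the soft limit resets the timer
    return None
```

With 4mb/600s, a buffer that hovers around 4mb and occasionally dips below it would never trigger the disconnect, and so never the full resync.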
I am still not sure if this is a bug or just a performance limitation of RedisGraph replication.
Do you want the EXPLAIN outputs while the replica is in this slow replication state? I can also wipe the AOF and restart the replica so it resyncs from a completely blank state.