RedisGraph replica replicating slowly

Hey,

I use RedisGraph version 2.8.17, or to be more exact the docker.io/redislabs/redisgraph@sha256:8db03866292e35f2b3e2eb1e2df6c56299510b837f90d146e84b9615d5ad5a0d Docker image.

When I write some data to RedisGraph, it serializes in around 4 minutes, but it then takes 45 minutes to replicate to the replica. I use the replica for read queries.
I have read some vanilla Redis documentation on replication. I use the default repl-backlog-size and the default client-output-buffer-limit for replicas.
I use more aggressive AOF rewrite settings, as reading the AOF back is relatively slow.
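For reference, this is roughly what my redis.conf looks like for those settings (the replication buffers are the Redis defaults; the AOF rewrite thresholds below are illustrative, not my exact values):

```
# Replication buffers left at the Redis defaults
repl-backlog-size 1mb
client-output-buffer-limit replica 256mb 64mb 60

# More aggressive AOF rewrite than the defaults
# (defaults are 100 / 64mb; exact values here are illustrative)
auto-aof-rewrite-percentage 50
auto-aof-rewrite-min-size 32mb
```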

Update 1 (to address the comment from SWilly22):

There are reads on the replica, but not often (one every 5 seconds).

The hardware of the master and the replica is the same. The CPU is not capped.

I have looked at some historical data: if the replication delay stays below 25K, the replica keeps up. If it goes above that, something weird happens: the number of commands processed on the replica drops to 15 ops/s while CPU utilization sits at 100%, and the replication delay shrinks only very slowly.

End of Update 1

Any ideas on how to optimize this?

Best Tim

The way replication currently works in RedisGraph v2.8.17 is as follows:
GRAPH.QUERY commands which modify the underlying graph in any way (e.g. introduce a new node, delete an edge) and are processed by the master are replicated as-is to the replica. You can see the replicated commands if you run the MONITOR command on the replica.
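For example, attaching MONITOR to the replica (hostname and graph key are placeholders, and the echoed line is only illustrative of the MONITOR output format) shows the write commands arriving verbatim:

```
# Watch the commands the replica receives from the master
redis-cli -h <replica-host> MONITOR
# 1671532827.877 [0 <master-addr>:6379] "GRAPH.QUERY" "mygraph" "MERGE (s:Node {id: 1})"
```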

Assuming both the master and its replica have the same resources (hardware) and both hold the same graph (no data inconsistency), the time to execute the same GRAPH.QUERY command should be the same, given that the replica isn’t processing additional read commands.

Will you be able to validate the above assumptions?
Thanks!

Hey, thanks for your questions. I hope I have addressed your assumptions in Update 1 in the original question.

Here is another example. I have ingested more data into the redis master.

And made the following observations:

  • It’s similar to the situation above
  • After a while there is a message on the slave
1:S 20 Dec 2022 10:40:27.877 # Closing client that reached max query buffer length: id=141 addr=170.36.6.4:6379 laddr=170.36.1.6:60932 fd=13 name= age=46279 idle=0 flags=Mb db=0 sub=0 psub=0 multi=-1 qbuf=1073746113 qbuf-free=268431157 argv-mem=406 obl=0 oll=0 omem=0 tot-mem=1342198190 events=r cmd=graph.QUERY user=(superuser) redir=-1 (qbuf initial bytes: "*3\r
$11\r
GRAPH.QUERY\r
$16\r
banana-split-serverless\r
$374\r
 MERGE (sourc") [log truncated]
  • where addr is the master’s IP
  • and laddr the slave’s
  • Is this the message indicating that the client query buffer filled up?
  • This happens twice
  • And it seems that a full sync improves the situation

Seems like the slave is unable to keep up.
Can you please share a number of the write queries you’re issuing, accompanied by their execution plans?
GRAPH.EXPLAIN <GRAPH-KEY> <QUERY>

I want to make sure both the master and the slave produce the same execution-plan for a given query.
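A concrete invocation would look like this (hostnames, graph key, and query are placeholders; run it against both instances and compare the output):

```
redis-cli -h <master-host>  GRAPH.EXPLAIN mygraph "MERGE (s:Node {id: 1})"
redis-cli -h <replica-host> GRAPH.EXPLAIN mygraph "MERGE (s:Node {id: 1})"
```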

I am not sure, but it must be a lot of write queries. Redis metrics report 2k commands/s for a while (about one hour).
I can implement a metric and come back with more exact data.

I don’t think I can run an explain on the replica, as it is read-only.

Is the current slow performance of the replica by design / just not yet optimized? I know it’s a lot of data for the master, but the master seems to process it much more efficiently than the replication does.

You can configure the replica to accept write commands by issuing the following command on the replica:
CONFIG SET slave-read-only no
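For instance, to run the explain on the replica and then restore the safe default (hostname, graph key, and query are placeholders):

```
redis-cli -h <replica-host> CONFIG SET slave-read-only no
redis-cli -h <replica-host> GRAPH.EXPLAIN mygraph "MERGE (s:Node {id: 1})"
redis-cli -h <replica-host> CONFIG SET slave-read-only yes
```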


Thanks for your time.

I had to fix the issue from yesterday and found a workaround for when the replica is stuck in the slow-replication state.

It seems that the full re-sync solves the issue immediately.
So one workaround strategy I can use: after a large ingestion, once the master has processed the writes, I can go onto the slave, run SLAVEOF NO ONE and then SLAVEOF master_addr port. That seems to force a full resync.
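In redis-cli terms, the workaround is (hostname and master address/port are placeholders):

```
# On the stuck replica: detach from the master...
redis-cli -h <replica-host> SLAVEOF NO ONE
# ...then re-attach, which triggers a full resync
redis-cli -h <replica-host> SLAVEOF <master-addr> <master-port>
```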

I was also playing with client-output-buffer-limit slave, hoping that too large a buffer would force the replica into a full resync, without much luck. I put the soft limit down to 4mb and 600 seconds; nothing happened.
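What I tried was along these lines (hard limit left at the 256mb default). Note that client-output-buffer-limit is set on the master, since it governs the master’s output buffer towards the replica; that may be why it had no effect on the replica-side "max query buffer length" message above:

```
# On the master: hard limit 256mb, soft limit 4mb, soft-limit window 600s
redis-cli -h <master-host> CONFIG SET client-output-buffer-limit "slave 268435456 4194304 600"
```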

I am still not sure if this is a bug or just a performance limitation of RedisGraph replication.

Do you want the explains while the replica is in this bad, slow replication state? I can also wipe the AOF and restart the replica so it resyncs from a completely blank state.