Performance problems

Hi,
I tried the graph500-22 benchmark with RedisGraph v2.2. However, the performance with both multiple clients and multiple OMP threads is not what I expected. For the 1-client case, my results look like this:

       omp_thread = 1        omp_thread = 24
q50    451.60382100000004    480.36615900000004
q99    3559.2315018          1001.8522311799999

This is quite different from your results shown in RedisGraph 2.0 Boosts Performance Up to 6x | Redis Labs.
I wonder how to explain it. Maybe I missed some settings? Here are the environment and database I used in my test:

graph-500-22 : LDBC Graphalytics
Machine parameters: 24 logical CPUs; 64254 MB RAM
script for test :

#!/bin/sh

date;
# rm -f *.result;
rm -f *.content;
rm -f *.temp;
seed_range=`cat "graph500-22/graph500-22.v" | wc -l`;
echo "Node Count: $seed_range";

for((i=1;i<=$1;i++))
do
{
  for((j=1;j<=$2;j++))
  do
  {
    rand=10#$(date +%N);
    line=$(($rand%${seed_range}+1));
    seed=`sed -n "${line}p" "graph500-22/graph500-22.v"`;
    echo "graph.query GRAPH1 \"MATCH(n:vertex)-[*$3]->(m) WHERE n.id_in_graph=$seed RETURN count(m)\"" >> cli${i}.temp;
  }
  done
  # cat cli${i}.temp >> $1-cli-$2-query-$3-hop.content;
}&
done
wait

for((i=1;i<=$1;i++))
do
{
  cat cli${i}.temp >> $1-cli-$2-query-$3-hop.content;
  cat cli${i}.temp | /usr/local/redis-6.0.8/src/redis-cli >> $1-cli-$2-query-$3-hop.result;
}&
done
wait

rm -f *.temp;
date;
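As an aside, the seed selection in the script above (a nanosecond-clock value reduced modulo the node count, then a sed lookup) can be sketched more directly in Python. This is only an illustrative sketch: the file layout (graph500-22/graph500-22.v, one node id per line) and the query template are assumptions carried over from the shell script.

```python
import random

def pick_seeds(vertex_file, n):
    """Pick n node ids uniformly at random from a vertex file with one
    id per line (the same lookup the sed call in the script performs)."""
    with open(vertex_file) as f:
        vertices = [line.strip() for line in f if line.strip()]
    return random.choices(vertices, k=n)

def build_queries(seeds, hops, graph="GRAPH1"):
    # Same query template as the shell script above.
    return [
        f'graph.query {graph} "MATCH(n:vertex)-[*{hops}]->(m) '
        f'WHERE n.id_in_graph={seed} RETURN count(m)"'
        for seed in seeds
    ]
```

Uniform sampling also avoids the bias of `date +%N`, whose low-order digits are not uniformly distributed across repeated calls in a tight loop.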

Thanks!

Hi there @CacaoGatto, IMHO you should not use redis-cli for any kind of benchmarking (piping data from disk will also ultimately limit your client performance). It was simply not designed for that.
Bottom line: in any kind of benchmark you should ensure that the client doing the measuring is not the bottleneck.

Now, focusing on benchmark tools for Redis: you can use redis-benchmark, memtier_benchmark, or any other purpose-built tool.

blog benchmark testing methodology

Specifically for this blog post we used memtier_benchmark, as clearly stated in the benchmark details:

To get steady-state results, we discarded the previous Python benchmark client in favor of the memtier_benchmark, which provides low overhead and full-latency-spectrum latency metrics.

For each tested version, we performed:

1-Hop 22M queries
2-Hop 220K queries
3-Hop 22K queries
6-Hop 22K queries

All queries were issued under a concurrent load of 22 clients. We reported the median (q50) latency and achievable throughput.

In order to ensure steady, stable results, we repeated each benchmark variation a minimum of 3 times.
All benchmark variations were run on Amazon Web Services instances, provisioned through our benchmark-testing infrastructure. Both the benchmarking client and database servers were running on separate c5.12xlarge instances. The tests were executed on a single-shard setup, with RedisGraph versions 1.2 and 2.0.5.
We’ve also added a public-facing RDB link:

To make it easy for anyone to replicate our results, here is a public link to the graph 500 dataset with scale 22, persisted in RDB format.

OMP threads showcase table

Concerning the OpenMP effect showcase table, you should start a standalone Redis instance as follows:

22 threads variation

OMP_NUM_THREADS=22 redis-server --protected-mode no --save "" --daemonize yes --port 6379 --appendonly no --loadmodule ./redisgraph.so THREAD_COUNT 1

1 thread variation

OMP_NUM_THREADS=1 redis-server --protected-mode no --save "" --daemonize yes --port 6379 --appendonly no --loadmodule ./redisgraph.so THREAD_COUNT 1

benchmark command

To run the benchmark (on a separate client VM), do as follows:

memtier_benchmark --server <### YOUR DB IP  ###> --port 6379 -n 1000 -c 1 -t 1 --hide-histogram --command="graph.query graph500_22 \"MATCH (s:graph500_22_unique_node_out)-[*6]->(t) WHERE ID(s)=__key__ RETURN count(t)\"" --key-maximum=100000 --distinct-client-seed --command-key-pattern G

Looking at our internal doc with the blog results, I can provide you the outputs used to produce the table:

OMP_NUM_THREADS=1

# for i in {1..3}; do memtier_benchmark --server 10.3.0.31 --port 6379 -n 1000 -c 1 -t 1 --hide-histogram --command="graph.query graph500_22 \"MATCH (s:graph500_22_unique_node_out)-[*6]->(t) WHERE ID(s)=__key__ RETURN count(t)\"" --key-maximum=100000 --distinct-client-seed --command-key-pattern G; sleep 60; done
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 167 secs]  0 threads:        1000 ops,       4 (avg:       5) ops/sec, 1.13KB/sec (avg: 1.34KB/sec), 202.67 (avg: 167.93) msec latency

1         Threads
1         Connections per thread
1000      Requests per client


ALL STATS
=======================================================================================================
Type              Ops/sec    Avg. Latency     q50 Latency     q99 Latency   q99.9 Latency       KB/sec
-------------------------------------------------------------------------------------------------------
Graph.querys         1.48       674.31900       539.64700      2191.35900      2396.15900         0.33
Totals               1.48       674.31900       539.64700      2191.35900      2396.15900         0.33
(...)
(...)

OMP_NUM_THREADS=22

# for i in {1..3}; do memtier_benchmark --server 10.3.0.31 --port 6379 -n 1000 -c 1 -t 1 --hide-histogram --command="graph.query graph500_22 \"MATCH (s:graph500_22_unique_node_out)-[*6]->(t) WHERE ID(s)=__key__ RETURN count(t)\"" --key-maximum=100000 --distinct-client-seed --command-key-pattern G; sleep 60; done
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 167 secs]  0 threads:        1000 ops,       4 (avg:       5) ops/sec, 1.13KB/sec (avg: 1.34KB/sec), 202.67 (avg: 167.93) msec latency

1         Threads
1         Connections per thread
1000      Requests per client


ALL STATS
=======================================================================================================
Type              Ops/sec    Avg. Latency     q50 Latency     q99 Latency   q99.9 Latency       KB/sec
-------------------------------------------------------------------------------------------------------
Graph.querys         5.95       167.92600       175.74300       353.02300       370.68700         1.34
Totals               5.95       167.92600       175.74300       353.02300       370.68700         1.34
(...)
(...)

Hope the above commands for starting Redis and running memtier are enough for you to replicate the results. Please do follow up with further questions, or acknowledge that these are now OK.
Kind regards,
Filipe


Thanks for your solution. I’ve tried memtier_benchmark, but I still have two questions:

  1. How do I solve the error “key placeholder can’t combined with other data”? My input is basically the same as yours. I checked the source code and found that the error is raised here:
// check arg type
        if (current_arg->data.find(KEY_PLACEHOLDER) != std::string::npos) {
            if (current_arg->data.length() != strlen(KEY_PLACEHOLDER)) {
                benchmark_error_log("error: key placeholder can't combined with other data\n");
                return false;
            }

            current_arg->type = key_type;

Will simply changing the return value help?
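For context, the check above rejects any command argument where `__key__` appears alongside other text; what is actually needed is to splice the generated key into the argument at send time. A minimal sketch of that substitution (illustrative only, not memtier’s actual implementation):

```python
KEY_PLACEHOLDER = "__key__"

def render_arg(template, key):
    """Substitute the generated key into an argument that also contains
    other data -- the case the memtier check above rejects. Sketch only;
    memtier's real fix does this inside its C++ command pipeline."""
    return template.replace(KEY_PLACEHOLDER, key)
```

So simply flipping the `return false` would silence the error but not produce correct commands: each request still needs its key substituted into the argument, which is what the behavior sketched here provides.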

  2. Why is the performance in this benchmark so much better than what I get by simply launching clients? I’ve read the RedisGraph source code, and here is the timing part. I think it only measures server-side execution and does not account for the client at all.
    QueryCtx_BeginTimer(); // Start query timing.
    /* Retrive the required execution items and information:
     * 1. AST
     * 2. Execution plan (if any)
     * 3. Whether these items were cached or not */
    AST *ast = NULL;
    bool cached = false;
    ExecutionPlan *plan = NULL;
    ExecutionCtx exec_ctx = ExecutionCtx_FromQuery(command_ctx->query);

    ast = exec_ctx.ast;
    plan = exec_ctx.plan;
    cached = exec_ctx.cached;
    ExecutionType exec_type = exec_ctx.exec_type;
    // See if there were any query compile time errors
    if(QueryCtx_EncounteredError()) {
        QueryCtx_EmitException();
        goto cleanup;
    }
    if(exec_type == EXECUTION_TYPE_INVALID) goto cleanup;

    bool readonly = AST_ReadOnly(ast->root);
    if(!readonly && _readonly_cmd_mode(command_ctx)) {
        QueryCtx_SetError("graph.RO_QUERY is to be executed only on read-only queries");
        QueryCtx_EmitException();
        goto cleanup;
    }

    // Set the query timeout if one was specified.
    if(command_ctx->timeout != 0) {
        if(!readonly) {
            // Disallow timeouts on write operations to avoid leaving the graph in an inconsistent state.
            QueryCtx_SetError("Query timeouts may only be specified on read-only queries");
            QueryCtx_EmitException();
            goto cleanup;
        }

        Query_SetTimeOut(command_ctx->timeout, plan);
    }

    bool compact = command_ctx->compact;
    ResultSetFormatterType resultset_format = (compact) ? FORMATTER_COMPACT : FORMATTER_VERBOSE;

    // Acquire the appropriate lock.
    if(readonly) {
        Graph_AcquireReadLock(gc->g);
    } else {
        Graph_WriterEnter(gc->g);  // Single writer.
        /* If this is a writer query we need to re-open the graph key with write flag
        * this notifies Redis that the key is "dirty" any watcher on that key will
        * be notified. */
        CommandCtx_ThreadSafeContextLock(command_ctx);
        {
            GraphContext_MarkWriter(ctx, gc);
        }
        CommandCtx_ThreadSafeContextUnlock(command_ctx);
    }
    lockAcquired = true;

    // Set policy after lock acquisition, avoid resetting policies between readers and writers.
    Graph_SetMatrixPolicy(gc->g, SYNC_AND_MINIMIZE_SPACE);
    result_set = NewResultSet(ctx, resultset_format);
    // Indicate a cached execution.
    if(cached) ResultSet_CachedExecution(result_set);

    QueryCtx_SetResultSet(result_set);
    if(exec_type == EXECUTION_TYPE_QUERY) {  // query operation
        ExecutionPlan_PreparePlan(plan);
        result_set = ExecutionPlan_Execute(plan);

        // Emit error if query timed out.
        if(ExecutionPlan_Drained(plan)) QueryCtx_SetError("Query timed out");

        ExecutionPlan_Free(plan);
        plan = NULL;
    } else if(exec_type == EXECUTION_TYPE_INDEX_CREATE ||
              exec_type == EXECUTION_TYPE_INDEX_DROP) {
        _index_operation(ctx, gc, ast, exec_type);
    } else {
        assert("Unhandled query type" && false);
    }
    QueryCtx_ForceUnlockCommit();
    ResultSet_Reply(result_set);    // Send result-set back to client.

Hi there @CacaoGatto, let me check the changes I’ve made on memtier to support it and reply back… (I had the idea that I had already submitted that change in a PR.)
I noticed you’ve also opened the issue on the memtier GitHub repo :+1:

Need a few hours to check and reply back.

Hi there @CacaoGatto, I took the original branch from my fork of memtier that I used to run the benchmark a year ago and prepared an updated PR to memtier (fixing the issue you opened): Enable key placeholder to be combined with other data within the same command argument by filipecosta90 · Pull Request #154 · RedisLabs/memtier_benchmark · GitHub
You can use the branch GitHub - filipecosta90/memtier_benchmark at key.placeholder to test right away.

I have tried this branch and it works well. Thanks for your help!
The issue on GitHub has been closed as well.

Hi there @fcosta_oliveira, these days I have been testing RedisGraph in different cases with your branch. When I sent 1K 6-hop requests from 22 clients with the following command (redis-server is set to 22 OMP threads and 22 thread-pool threads):

memtier_benchmark -c 1 -t 22 -n 1000 --hide-histogram --command="graph.query graph500_22 \"MATCH (s:graph500_22_unique_node_out)-[*6]->(t) WHERE ID(s)=__key__ RETURN count(t)\"" --key-maximum=100000 --distinct-client-seed --command-key-pattern G --key-prefix=""

the ALL STATS table I got looks like this:

22        Threads
1         Connections per thread
1000      Requests per client


ALL STATS
=======================================================================================================
Type              Ops/sec    Avg. Latency     p50 Latency     p99 Latency    p100 Latency       KB/sec 
-------------------------------------------------------------------------------------------------------
Graph.querys        12.66       252.50357        38.65500      1036.28700      1048.57500         3.19 
Totals              12.66       252.50357        38.65500      1036.28700      1048.57500         3.19 

Obviously the latency seems too low compared with the 3-hop case run with the same other parameters:

22        Threads
1         Connections per thread
1000      Requests per client


ALL STATS
=======================================================================================================
Type              Ops/sec    Avg. Latency     p50 Latency     p99 Latency    p100 Latency       KB/sec 
-------------------------------------------------------------------------------------------------------
Graph.querys        48.23       379.90358       299.00700      1032.19100      1048.57500        12.09 
Totals              48.23       379.90358       299.00700      1032.19100      1048.57500        12.09 

I checked the output and noticed that Ops/sec * Avg. Latency / 1000 does not equal 22 (22 clients on 22 threads). This happens in the 3-hop and 6-hop cases, but not in the 1-hop and 2-hop cases. However, the info printed during the run appears correct:

[RUN #1 100%, 1770 secs]  1 threads:       21998 ops,       5 (avg:      12) ops/sec, 1.52KB/sec (avg: 3.13KB/sec)
[RUN #1 100%, 1770 secs]  0 threads:       22000 ops,       5 (avg:      12) ops/sec, 1.52KB/sec (avg: 3.13KB/sec)
, 1569.84 (avg: 1770.28) msec latency
# 1770.28 * 12.66 / 1000 = 22.4
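The sanity check above is Little’s Law: in-flight requests ≈ throughput × mean latency. Plugging in the two different latencies reported for the same 6-hop run (numbers copied from the tables above) shows which one is consistent with 22 concurrent clients:

```python
def concurrency(ops_per_sec, avg_latency_ms):
    # Little's Law: in-flight requests = throughput * mean latency (seconds).
    return ops_per_sec * avg_latency_ms / 1000.0

# 6-hop run, latency from the ALL STATS table:
stats_table = concurrency(12.66, 252.50357)   # ~3.2, far from 22 clients
# 6-hop run, latency from the per-run progress line:
run_line = concurrency(12.66, 1770.28)        # ~22.4, consistent with 22 clients
```

This suggests the progress-line latency reflects the full end-to-end steady state of all 22 clients, while the ALL STATS latency in these runs does not, though I am not sure why the two diverge only for the deeper hops.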

I wonder which latency I should refer to, and why these two values turn out to be different.