RedisGraph 72k nodes, 400k relationships - slow query performance help

Hello,

I’ve got what I think is a fairly simple graph of Person nodes tagged with an id attribute and some nodes optionally tagged with a special attribute. There is a :contributesTo self-relation between Person nodes.

Here are two Cypher files for loading the nodes and relations:

I’m trying to find all the Person nodes who ultimately have an ancestral :contributesTo relationship to another Person with the special attribute set, decided recursively (so if a person contributes to a non-special person, but that person then contributes to a “special” person, that initial person would still show up in the result).

So I wrote the query like this:

MATCH (d)-[:contributesTo*1..]->(r) WHERE r.special = 'T' RETURN DISTINCT d.id

But that just seems to hang indefinitely.

So then I tested specific cutoff depths.

This took 2039.9ms according to SLOWLOG:

MATCH (d)-[:contributesTo*0..1]->(r) WHERE r.special = 'T' RETURN DISTINCT d.id

This took 6712ms:

MATCH (d)-[:contributesTo*0..2]->(r) WHERE r.special = 'T' RETURN DISTINCT d.id

And this took 6.4243e+05ms:

MATCH (d)-[:contributesTo*0..3]->(r) WHERE r.special = 'T' RETURN DISTINCT d.id

So it’s obviously getting slower with increasing depth.

I do have indexes on the id and special node attributes:

CREATE INDEX ON :Person(id)
CREATE INDEX ON :Person(special)

But I’m pretty new to graph databases so not sure what to do next to make this faster. Any help would be greatly appreciated!

Edit: I just realized my queries weren’t using labels so I tried that -

MATCH (d:Person)-[:contributesTo*1..]->(r:Person) WHERE r.special = 'T' RETURN DISTINCT d.id

but it doesn’t seem to have made a perceptible difference.

Edit2: Forgot to mention this is on Redis 6.2.1, RedisGraph compiled from git SHA 92046602a775dcba2396d493597b60043f0e39bb, on an Ubuntu 20.04 machine with 32GB RAM and an Intel i7 CPU. Also, the queries are still slow without using DISTINCT.

Hi @abe,

The bulk of your query’s time will be spent in expanding paths in the variable-length traversal. Given your sample graph, this is a rather large job! You have 13,731 source nodes to consider. When traversing 1 edge this expands to 152,291 destinations, and when traversing up to 2 edges this expands to 9,264,429 destinations (and so forth):

GRAPH.QUERY Donations "MATCH (d:Person)-[:contributesTo*..1]->(r:Person) WHERE r.special = 'T' RETURN COUNT(d)"
1) 1) "COUNT(d)"
2) 1) 1) (integer) 152291
3) 1) "Cached execution: 0"
   2) "Query internal execution time: 117.677467 milliseconds"

GRAPH.QUERY Donations "MATCH (d:Person)-[:contributesTo*..2]->(r:Person) WHERE r.special = 'T' RETURN COUNT(d)"
1) 1) "COUNT(d)"
2) 1) 1) (integer) 9264429
3) 1) "Cached execution: 0"
   2) "Query internal execution time: 59246.093526 milliseconds"

To improve performance, it would be best to introduce additional constraints to reduce the search space of your queries.