Design for quick repeat creation of relationships

Hey! Say I have a project with a set of nodes like this:

  • 250m persons
  • 2m companies
  • 200k locations

I use the bulk-loader to create those nodes once.

And then I essentially want to repeat the following:

  1. Create 2m to 10m relationships
  2. Perform some analysis, queries, etc.
  3. Delete all of the relationships (without disturbing the nodes)

What would be the most efficient way to accomplish this? Through an API?

Is this viable with that number of relationships?

Thanks for any help.

Hi Warren,

There currently is no API to perform large batch modifications on existing graphs, but we have heard this feature request a few times and hope to roll out a solution that will operate similarly to the bulk-loader within 2-3 months.

At present, you would have to create your relationships through a Cypher query, which at that scale I would expect to take at least 30-60 seconds to execute (very rough estimate). We are currently investigating some potential bottlenecks in relationship creation, but this is likely to be a rather expensive operation regardless.
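
As a rough illustration of creating relationships through Cypher in batches, here is a minimal Python sketch. The `Person`/`Company` labels, the `uid` property, and the `WORKS_AT` relationship type are assumptions made up for the example, and it assumes the deployed RedisGraph version supports `UNWIND` and list indexing. It only builds the query strings; sending them is left to whatever client you use.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical edge shape: (person uid, company uid). The `uid` property and the
# Person/Company/WORKS_AT names are illustrative, not taken from a real schema.
Edge = Tuple[int, int]


def chunked(edges: Iterable[Edge], size: int) -> Iterator[List[Edge]]:
    """Yield successive lists of at most `size` edges."""
    batch: List[Edge] = []
    for edge in edges:
        batch.append(edge)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def create_query(batch: List[Edge]) -> str:
    """Build one Cypher statement that creates one relationship per edge."""
    # int() keeps anything but plain integers out of the query text.
    rows = ", ".join(f"[{int(src)}, {int(dst)}]" for src, dst in batch)
    return (
        f"UNWIND [{rows}] AS row "
        "MATCH (p:Person), (c:Company) "
        "WHERE p.uid = row[0] AND c.uid = row[1] "
        "CREATE (p)-[:WORKS_AT]->(c)"
    )
```

With redis-py, each string could then be sent with something like `redis.Redis().execute_command("GRAPH.QUERY", "mygraph", query)`, where `mygraph` is a placeholder graph name. An index on the matched property would likely matter a great deal at this scale, and the right batch size is something to measure rather than guess.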

I’m curious about the use case you’re trying to address with this approach, though. Are we able to re-model the problem such that you can use persistent relationships rather than ephemeral ones? This would be dramatically more efficient than any approach that relies heavily on writing and deleting. I’d be happy to help you try to model this if you can provide additional information!

Jeff Lovitz

Jeff,

Hey! Thank you for the response. A batching feature similar to the bulk-loader sounds promising!

In this sample use case, I have many billions of relationships, more than can be stored on a single RedisGraph server, and analyzing subsets of 1m to 100m relationships at a time makes sense. That drives the desire to swap subsets of relationships in and out.
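
To make the swap-in/swap-out cycle concrete, here is a rough Python sketch of one round. Everything in it is an assumption for illustration: `send` stands for any function that runs a Cypher query against a graph (for example a thin wrapper around `GRAPH.QUERY`), and each subset is assumed to use a single relationship type so it can be removed with one `DELETE` that leaves the nodes in place.

```python
from typing import Any, Callable, Iterable, List

# Hypothetical signature: send(graph_name, cypher_query) -> query result.
Send = Callable[[str, str], Any]


def swap_cycle(
    send: Send,
    graph: str,
    create_queries: Iterable[str],
    analysis_queries: Iterable[str],
    rel_type: str,
) -> List[Any]:
    """Load one subset of relationships, analyze it, then remove it again."""
    if not rel_type.isidentifier():
        raise ValueError(f"unexpected relationship type: {rel_type!r}")
    results: List[Any] = []
    try:
        # Step 1: create the relationships for this subset.
        for query in create_queries:
            send(graph, query)
        # Step 2: run the analysis queries and collect their results.
        for query in analysis_queries:
            results.append(send(graph, query))
    finally:
        # Step 3: delete only the relationships; the nodes are not touched.
        send(graph, f"MATCH ()-[r:{rel_type}]->() DELETE r")
    return results
```

The `finally` clause is a design choice so that a failed analysis does not leave a half-loaded subset behind before the next round starts.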

Which leads to a related question. Would it be advantageous to represent some information as nodes, rather than as relationships, in order to make the matrix algebra behind those queries run faster? Or does it not make a speed difference whether something is a node or a relationship?