prod recommendation example

Hey Folks-

I was hoping I could get some guidance on a basic query for getting recommended products for a user where:

  • there are person nodes w/ transact edges to …

  • order nodes w/ contains edges to …

  • products (which have inbound edges from people for view and addtocart events, though they’re not relevant to this example)

A visualization of the graph is included as an attachment.

The query I’m trying to come up with is the following: for a given user, I’m trying to generate a set of products that are related to the products found in the users transacted orders.

A query like:

match (p:person)-[:transact]->(:order)-[:contains]->(purchasedProduct:product)<-[:contains]-(:order)-[:contains]->(reccomendedProduct:product) where p.id=3932 return distinct reccomendedProduct.id

… seems like it should work but returns no results. A few things:

  • the returned distinct recommendedProduct ids are not removing the products themselves from their list and they should (i.e. distinct reccomendedProduct - purchasedProduct) – it’s unclear whether or not this is possible because of the directionality of the edge (from order to product, but we’re going from product to order…)

  • the last point assumes that I can travel to the related products order node and then fetch all related, products – this shouldn’t be an issue as the relationship is directional from the order to the product (via the :contains edge)

Do you happen to have any pointers on what might be wrong?

This is from a post I wrote a while back: https://www.joshdurbin.net/posts/2020-1-redis-graph-product-recommendation-part-1-data-loading/.

Thanks.

Hi Joshua

  1. First I want to understand: The query returns no results in every scenario?

  2. The recommended products are products that the client didn’t but at all? They are products that not connected to the client by any of their orders?

  3. I didn’t quite understand the last point. Can you elaborate?

Hy Dvir-

I think things are working, actually, I’m not sure if its the most performant at this time, but this the query:

match (p:person)-[:transact]->(:order)-[:contain]->(:product)<-[:contain]-(:order)-[:contain]->(prd:product) where p.id=${personId} return distinct prd

… the edge was improperly defined in the query.

Howdy!

I have an iteration on the original question related to efficient filtering:

I’m having trouble removing or filtering entries from a query. The query is the one in the my last response that takes a person ID and returns all the products found in orders that share a common product with an order that person placed.

match (p:person { id: 493 })-[:transact]->(:order)-[:contain]->(:product)<-[:contain]-(:order)-[:contain]->(prd:product) return distinct pro

This query is very crude to begin with, but there are some glaring obvious problems. The first I’m trying to resolve is removing the products the person ordered themselves from the query. To do that I’ve come up with

match (p:person {id: 493 })-[:transact]->(:order)-[:contain]->(prod:product) match (prod)<-[:contain]-(:order)-[:contain]->(rec_prod:product) return distinct rec_prod

Great! So, these two queries return the same values, see (with a count):

127.0.0.1:6379> graph.query prodrec "match (p:person { id: 493 })-[:transact]->(:order)-[:contain]->(:product)<-[:contain]-(:order)-[:contain]->(prd:product) return distinct count(prd)"
1) 1) "count(prd)"
2) 1) 1) (integer) 838
3) 1) "Query internal execution time: 2.208400 milliseconds"
127.0.0.1:6379> graph.query prodrec "match (p:person {id: 493 })-[:transact]->(:order)-[:contain]->(prod:product) match (prod)<-[:contain]-(:order)-[:contain]->(rec_prod:product) return distinct count(rec_prod)"
1) 1) "count(rec_prod)"
2) 1) 1) (integer) 838
3) 1) "Query internal execution time: 2.217700 milliseconds"

The person used in this example, person 493, has purchased 19 products in this mock data set.

127.0.0.1:6379> graph.query prodrec "match (p:person { id: 493 })-[:transact]->(:order)-[:contain]->(prd:product) return distinct count(prd)"
1) 1) "count(prd)"
2) 1) 1) (integer) 19
3) 1) "Query internal execution time: 0.638100 milliseconds"

When I go, then, to exclude these 19 products, instead of getting 838-19=819 products I get 838*2+1=1675 (not sure about the source arithmetic – that is, if that’s where that value comes from, but it seems reasonable).

The query for this is:

127.0.0.1:6379> graph.query prodrec "match (p:person { id: 493 })-[:transact]->(:order)-[:contain]->(prod:product) match (prod)<-[:contain]-(:order)-[:contain]->(rec_prod:product) where not prod.id=rec_prod.id return distinct count(rec_prod)"
1) 1) "count(rec_prod)"
2) 1) 1) (integer) 1675
3) 1) "Query internal execution time: 4.138700 milliseconds"

The expected value of 819 is confirmed by the following query, though it is slow…

127.0.0.1:6379> graph.query prodrec "match (p:person { id:493 } )-[:transact]->(:order)-[:contain]->(prod:product)<-[:contain]-(:order)-[:contain]->(rec_prod:product) where not (p)-[:transact]->(:order)-[:contain]->(rec_prod) and p.id=493 return distinct count(rec_prod)"
1) 1) "count(rec_prod)"
2) 1) 1) (integer) 819
3) 1) "Query internal execution time: 29.235000 milliseconds"

What am I doing wrong with the query that returns 1675:

match (p:person { id: 493 })-[:transact]->(:order)-[:contain]->(prod:product) match (prod)<-[:contain]-(:order)-[:contain]->(rec_prod:product) where not prod.id=rec_prod.id return distinct count(rec_prod)?

Cheers!

Let’s assume there’s just a single person in the graph, that person has just a single order, the order contains N products, with the filter: “where not prod.id=rec_prod.id” each product (prod) will add N-1 products (rec_prod).
This is probably not what you’re after.

The filter on the second query: not §-[:transact]->(:order)-[:contain]->(rec_prod)
will exclude all rec_prod which are in an order with product p.

That makes sense re the n-1 query issue. The suggested query update is functionally identical to the one mentioned in my last post, but adds considerable latency. Expand Into seems to add 4-5x the latency on the query response compared to the query that doesn’t filter.

Non filtering:

127.0.0.1:6379> graph.explain prodrec "match (p:person { id: 339 })-[:transact]->(:order)-[:contain]->(prod:product) match (prod)<-[:contain]-(:order)-[:contain]->(rec_prod:product) return distinct rec_prod.id, rec_prod.name"
1) "Results"
2) "    Distinct"
3) "        Project"
4) "            Conditional Traverse | (p:person)->(rec_prod:product)"
5) "                Index Scan | (p:person)"

Filtering:

127.0.0.1:6379> graph.explain prodrec "match (p:person { id: 339 })-[:transact]->(:order)-[:contain]->(prod:product) match (prod)<-[:contain]-(:order)-[:contain]->(rec_prod:product) where not (p)-[:transact]->(:order)-[:contain]->(rec_prod) return distinct rec_prod.id, rec_prod.name"
1) "Results"
2) "    Distinct"
3) "        Project"
4) "            Anti Semi Apply"
5) "                Conditional Traverse | (p:person)->(rec_prod:product)"
6) "                    Index Scan | (p:person)"
7) "                Expand Into | (p:person)->(rec_prod:product)"
8) "                    Argument"

In those examples, person 339 had only a few results from the original query – maybe 30. If I grab the customer with the most orders and query for them the delta is much great as is the result set – a result set of 1349 elements filtering took: 56.2 milliseconds while the non-filtering took 12.4 milliseconds. If I grab the next highest ordering customer and run the non-filtering query first (to account for potential query caching in the last example) it takes 9.7ms for a result set of 1249 (comparable-ish to the former) and the filtering query takes 52.1ms.

It would seem like the traverse should be the most expensive part of the operation, but doesn’t appear to be, unless it happens during the traversal, adding complexity.

The filter in this case is a two hop traversal operation, performed by the Expand-Into operation (as both the source and the destination of the path are resolved)
This adds a significant amount of work to perform.