RediSearch tag query performance and use case

Hi everyone, I just finished a few days ago the RU201 RediSearch course on Redis University and i posted a question on their Discord channel. I’ve been told to post it on this forum, so here I am. Thanks to everyone in advance, every input in highly appreciated!

I’ll try to explain the scenario I’m facing in a simple way. I need to store some documents that have an attribute for a path-like string, e.g. path/to/something. This paths can also contain the special character ‘+’, e.g. home/+/bathroom. The ‘+’ essentially means that it can be substituted by any other word, like a special wildcard.

I need to search my document based on this path attribute. Given a particular path (that can’t contain the ‘+’ wildcard), e.g. home/groundfloor/bathroom, I have to find all the documents that match that particular path. I’ll make it clearer with an example.
Let’s say that in my db I have the following 3 documents with these path attributes:

  1. home/+/bathroom
  2. home/groundfloor/bathroom
  3. home/+/bathroom/brightness

Given the path home/groundfloor/bathroom the search must return documents number 2 and 1 (since the ‘+’ matches with any word). I need to get the results as quickly as possible, performance is key in this case.

I’ll describe the approach I came up with:

  1. Create an index with a TAG field, used to store the different parts of the path.
    FT.CREATE idx SCHEMA path TAG
  2. Store the path separating each token in a different tag, here is an example for how to store Document 2, which has home/groundfloor/bathroom as path.
    FT.ADD idx doc2 1.0 FIELDS path "0-home, 1-groundfloor, 2-bathroom"
    The number before each token indicates his position in the path, used for querying (see the next point to understand how).
  3. Example of query given the path home/groundfloor/bathroom.
    FT.SEARCH idx "@path:{0-home | 0-+} @path:{1-groundfloor | 1-+} @path:{2-kitchen | 2-+}".
    I omitted escape characters for readability. This query works and correctly returns the documents number 1 and 2.

Now I have these questions about my approach:

  1. Is it any good? Are there better approaches? Remember that performance is my priority!
  2. I thought about using another numeric field or tag to store the “depth” of the path, which is the number of tokens it contains (e.g. home/bathroom has a depth of 2, where home/groundfloor/kitchen a depth of 3). This would allow me to search by depth too, because if the given path is home/groundfloor/bathroom I already know that I will only be interested in paths with a depth of 3. But since adding another field in the query means another intersection I don’t know if this would do more harm than good. So I thought about storing the depth directly in the tag used for the level 0 token of the path, e.g. Document 2 would be stored as:
    FT.ADD idx doc2 1.0 FIELDS path "0-home(3), 1-groundfloor, 2-bathroom", where the (3) in 0-home(3) would indicate the depth of the path. With this approach I would avoid an intersection in the query with another field used for the depth, possibly improving the performance.

I know it is a lot, but I really hope you could help me on this!!

@farcitoast

Forgive me if I fail to understand your use case, what I get from your explanation is you have records for each room to change attributes like brightness etc. you have multiple floors, properties and there any types of rooms like hall, bedroom , bathroom etc. you want to find attribute values to specific room on specific floor and property.

assuming this is the use case and you have exact attribute values to search for then you can create index using TAG or non-indexed text fields and search as below:

Create Index using TAG:

>> FT.CREATE idx SCHEMA property TAG propertynumber TAG floornumber TAG roomtype TAG roomnumber TAG brightness TAG 
"OK"   

Add test records:

FT.ADD idx doc1 1.0 FIELDS property home propertynumber 305 floornumber 3 roomtype hall roomnumber 305-1 brightness 0.5
FT.ADD idx doc2 1.0 FIELDS property home propertynumber 305 floornumber 3 roomtype bedroom roomnumber 305-2 brightness 0.5
FT.ADD idx doc3 1.0 FIELDS property home propertynumber 405 floornumber 4 roomtype bathroom roomnumber 405-1 brightness 1
FT.ADD idx doc4 1.0 FIELDS property home propertynumber 405 floornumber 4 roomtype bedroom roomnumber 405-2 brightness 0.5
FT.ADD idx doc5 1.0 FIELDS property home propertynumber 405 floornumber 4 roomtype bathroom roomnumber 405-3 brightness 0.4

Sample search:

>> FT.SEARCH idx "@property:{home} @floornumber:{3} " RETURN 2 brightness roomumber
1) (integer) 1
2) "doc2"
3) 1) "brightness"
   2) "0.5"
>>  FT.SEARCH idx "@property:{home} @floornumber:{4} @roomtype:{bathroom}" RETURN 2 brightness roomnumber
1) (integer) 2
2) "doc5"
3) 1) "brightness"
   2) "0.4"
   3) "roomnumber"
   4) "405-3"
4) "doc3"
5) 1) "brightness"
   2) "1"
   3) "roomnumber"
   4) "405-1"

This gives you more flexibility not only reaching to specific room on specific property and then floor but also do aggregation, groupby, or sorting or filter etc.

hope this helps. to gain better performance you may create multiple small indexes based on property types or areas or floors.

Regards,
Suyog

Thanks for your answer, but that’s not my use case. I don’t have floornumber, roomtype etc., it was just an example. All that I have to store for each record is a path (path/to/something) that can be anything, not only floornumber etc. I could simply store it in a TEXT field, but I’m trying to figure out the most efficient way to store it in order to get the documents quickly with a query, so I tried this approach:

I’ll try to explain my goal again. I have to search documents based on the path. I want to be able to get all the documents that match a particular path. Let’s say I want to find all the documents that match the home/groundfloor/bathroom path and these are the documents in my database:

If I search for home/groundfloor/bathroom the query should return me documents 1 and 2, because they have a path that matches the one that I’m searching. The whole purpose of this post is to find the best strategy to store this path inside the documents in order to be able to perform queries like the one I just described as quickly as possible.

In above case, if user search for home/groundfloor/bathroom the query should return 1, 2, 3,4,6 ?

No, just 1 and 2 because the ‘+’ has to be substituted by something that isn’t empty.