Indexing fields with diacritics

Hello,
In RediSearch 1.x I stored fields with diacritics inside the hash and replaced the diacritics with their plain English-alphabet equivalents during the FT.ADD operation. Because I also replaced the accents in the search terms, I could match with or without diacritics during search or aggregation.

What would be the most efficient way to do this in RediSearch 2.0? I could of course duplicate each property on my object without diacritics and index the duplicate, but I am hoping there is a better way to achieve this.

Thanks!

Can you explain in more detail how you did it in RediSearch 1.x, maybe with an example?

From what I understand your indexed data didn’t match the data inside the Redis hash? Is that correct?

Sure, here is an example:

  1. Inside the hash I would store “Rémi”.
  2. I would then replace the “é” with “e” and index “Remi” via FT.ADD. Since I was indexing the data manually, I could make this change on the fly while indexing the document.
  3. During a search or aggregation I would also replace any accents in the search terms with their English versions before sending the query to Redis, so whether the user searched for “Rémi” or “Remi”, the search would return the indexed record. Finally, since I was also storing the hash key inside the index, I would retrieve the hash with all of the other stored fields and return it to the user.
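
The accent replacement in steps 2 and 3 could look roughly like the sketch below. This is only an illustration in Python using the standard `unicodedata` module; the helper name `strip_diacritics` is made up for this example and is not the code I actually run. The same function would be applied both to the value being indexed and to the user's search term.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining accent marks removed (hypothetical helper)."""
    # NFD splits an accented letter into its base letter plus combining marks,
    # e.g. "é" becomes "e" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The same normalization is applied at index time and at query time.
indexed_value = strip_diacritics("Rémi")
query_term = strip_diacritics("Rémi")
```

Letters that have no decomposed form, such as “ø” or “ß”, pass through unchanged with this approach and would need an explicit replacement table.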

Does this make sense?

There is currently no way to index “Rémi” and have RediSearch handle all of this logic automatically behind the scenes, is there?

Thanks again

@paulflo sorry for the late reply (I was consulting with the team). Unfortunately we do not see a good way to do it other than duplicating the data (as you suggested).
If you are “brave” you can write your own data expander (https://oss.redislabs.com/redisearch/Extensions/#the_query_expander_api) that will index the data the way you want. Let me know if you want to try it and I will be happy to guide you.

Hi @meirsh, thanks for looking into it. I am using Redis Enterprise Cloud, so even if I were brave enough, which I am not, I am guessing I wouldn’t be able to add my own expanders.

I will just stick to creating a duplicate property for the fields that could have diacritics.
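
For anyone landing here later, a minimal sketch of the duplicate-property approach, assuming Python on the client side. The helper names and the `_norm` suffix are hypothetical choices for this example, not anything RediSearch requires.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining accent marks from text (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def add_normalized_fields(doc: dict, fields: list, suffix: str = "_norm") -> dict:
    """Return a copy of doc with an accent-free duplicate of each listed field."""
    out = dict(doc)  # leave the caller's mapping untouched
    for field in fields:
        if field in doc:
            # The original value stays as-is; the duplicate is what gets indexed.
            out[field + suffix] = strip_diacritics(doc[field])
    return out


# Mapping that would be written to the hash (for example with HSET).
hash_fields = add_normalized_fields({"name": "Rémi", "city": "Paris"}, ["name"])
```

The index would then be declared on the duplicate field only, for example `FT.CREATE idx ON HASH PREFIX 1 person: SCHEMA name_norm TEXT` (the index name `idx` and the prefix `person:` are placeholders), and search terms would go through `strip_diacritics` before being sent, as in `FT.SEARCH idx "@name_norm:Remi"`. The original `name` field still holds “Rémi” for display.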

Thanks again