How can I make whitespace remain as is after tokenization (like the underscore)?

Hello,

According to the documentation at https://oss.redislabs.com/redisearch/Escaping/, the underscore is ignored by the tokenizer (it does not split words). Now I would like the tokenizer to ignore whitespace as well.

I tried removing the space character ([' ']) from ToksepMap_g in https://github.com/RedisLabsModules/RediSearch/blob/master/src/toksep.h and recompiling the source code, but it didn't work.
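To illustrate what a separator map like ToksepMap_g controls, here is a toy C tokenizer. It is a minimal sketch, not RediSearch's actual tokenizer: the table `sep_map`, its separator set, and the functions `init_separators` and `tokenize` are all hypothetical. A nonzero table entry means the byte ends the current token; `_` is never a separator, and the space can be switched on or off.

```c
#include <string.h>

/* Hypothetical 256-entry separator table, loosely modeled on the idea behind
 * ToksepMap_g: a nonzero entry means "this byte ends the current token". */
static char sep_map[256];

/* Rebuilds the table; space_is_separator chooses whether ' ' splits tokens.
 * '_' is deliberately left out, so it never splits a token. */
void init_separators(int space_is_separator) {
    const char *seps = ",.<>{}[]\"':;!@#$%^&*()-+=~\t\n";
    memset(sep_map, 0, sizeof sep_map);
    for (const char *p = seps; *p != '\0'; p++) {
        sep_map[(unsigned char)*p] = 1;
    }
    sep_map[(unsigned char)' '] = space_is_separator ? 1 : 0;
}

/* Splits text in place (separators are overwritten with '\0') and stores up
 * to max_tokens pointers in out. Returns the number of tokens stored. */
int tokenize(char *text, char **out, int max_tokens) {
    int n = 0;
    char *start = NULL;
    for (char *p = text;; p++) {
        int at_end = (*p == '\0');
        if (at_end || sep_map[(unsigned char)*p]) {
            if (start != NULL && n < max_tokens) {
                out[n++] = start;
            }
            start = NULL;
            if (at_end) {
                break;
            }
            *p = '\0';
        } else if (start == NULL) {
            start = p;
        }
    }
    return n;
}
```

With the space enabled, `harry_potter,Harry Potter` yields three tokens (`harry_potter`, `Harry`, `Potter`); with it disabled, it yields two (`harry_potter`, `Harry Potter`). In the real module the same characters may also be handled elsewhere (for example by the query parser), so editing a single table is not guaranteed to change end-to-end behavior.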

Any suggestion would be truly appreciated!

Nha.

Why not use tags? Tags are only split on commas, and you can change the separator at index creation (note that you will have to escape spaces in queries).

Hi, thank you for your quick response.

We need to use the stemming of a TextField to do prefix search. For example, we would like a search for “Harry Po*” to return “Harry Potter”, but we also want “Harry Potter” to be treated as a single word, so that a user who searches for “Potter” alone gets no results. I believe a TagField cannot fulfill this requirement.
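The matching behavior being asked for can be sketched as follows: the whole value is treated as one token, so a trailing `*` makes the pattern a prefix of the entire value, while a bare word like `Potter` only matches a value that is exactly `Potter`. This is a hypothetical illustration (`tag_matches` is not a RediSearch function); it compares case-insensitively, mirroring the lowercase `harry\ p*` query later in this thread.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the first n bytes of a and b.
 * Callers must ensure both strings are at least n bytes long. */
static int ci_equal_n(const char *a, const char *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i])) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical tag-style matcher: the whole value is one token.
 * "Harry Po*" is a prefix of the entire value; "Potter" must equal it exactly. */
int tag_matches(const char *value, const char *pattern) {
    size_t vlen = strlen(value);
    size_t plen = strlen(pattern);
    if (plen > 0 && pattern[plen - 1] == '*') {
        size_t n = plen - 1; /* prefix length, excluding the '*' */
        return vlen >= n && ci_equal_n(value, pattern, n);
    }
    return vlen == plen && ci_equal_n(value, pattern, plen);
}
```

Because the prefix is anchored at the start of the whole value, `Potter*` does not match `Harry Potter` either; that is the difference from word-level prefix search on a TEXT field.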

Best,

Nha.

On Tuesday, January 15, 2019 at 15:02:48 UTC+7, me...@redislabs.com wrote:

You will have to index the data both as TAG and as TEXT, then issue a union query on the TAG field and on the TEXT field.

Could you please explain your idea in more detail? How can a union query on the TAG and TEXT fields return 0 results when the user searches for “Potter” only?

Thank you.

Best,

On Tuesday, January 15, 2019 at 15:56:46 UTC+7, me...@redislabs.com wrote:

I think I misunderstood you: you do not want any results when the user searches for “Potter” only? If so, a TAG field is exactly what you need:

127.0.0.1:6379> FT.CREATE idx SCHEMA name TAG SORTABLE
OK
127.0.0.1:6379> FT.ADD idx doc1 1.0 FIELDS name "Harry Potter"
OK
127.0.0.1:6379> FT.SEARCH idx "@name:{Harry\ Potter}"
1) (integer) 1
2) "doc1"
3) 1) "name"
   2) "Harry Potter"
127.0.0.1:6379> FT.SEARCH idx "@name:{Harry\ Po*}"
1) (integer) 1
2) "doc1"
3) 1) "name"
   2) "Harry Potter"
127.0.0.1:6379> FT.SEARCH idx "@name:{Potter}"
1) (integer) 0

Note that, as I mentioned before, you need to escape spaces in the query.
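When building such a query in application code, a small helper can insert the backslashes. This is a minimal sketch: `escape_tag_spaces` is a hypothetical name, and it escapes only spaces, while other punctuation inside a tag value may need escaping as well (see the Escaping page linked at the top of this thread).

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of value with a backslash before every
 * space, e.g. "Harry Potter" -> "Harry\ Potter". Returns NULL if allocation
 * fails. The caller must free() the result. */
char *escape_tag_spaces(const char *value) {
    size_t len = strlen(value);
    size_t spaces = 0;
    for (size_t i = 0; i < len; i++) {
        if (value[i] == ' ') {
            spaces++;
        }
    }
    char *out = malloc(len + spaces + 1);
    if (out == NULL) {
        return NULL;
    }
    char *w = out;
    for (size_t i = 0; i < len; i++) {
        if (value[i] == ' ') {
            *w++ = '\\'; /* escape the space so it stays inside the tag */
        }
        *w++ = value[i];
    }
    *w = '\0';
    return out;
}
```

The escaped value can then be placed between `@name:{` and `}`; a trailing `*` for prefix search is left untouched.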

This is perfect. Thank you so so so much!

Is there any way to solve the same problem while also taking a “weight” into account? As far as I know, the TAG field does not support a weight.

Thank you again.

Best,

On Tuesday, January 15, 2019 at 16:39:41 UTC+7, me...@redislabs.com wrote:

Hi, I found my answer; hopefully this helps someone else:

127.0.0.1:6379> FT.CREATE idx SCHEMA name TAG SORTABLE city TAG
OK
127.0.0.1:6379> FT.ADD idx doc1 1.0 FIELDS name "Harry Potter" city "California"
OK
127.0.0.1:6379> FT.ADD idx doc2 1.0 FIELDS name "Someone else" city "Harry Potter state"
OK
127.0.0.1:6379> FT.SEARCH idx "@name:{harry\ p*} => { $weight: 1.0; $slop: 1; $inorder: true; } | @city:{harry\ p*} => { $weight: 2.0; $slop: 1; $inorder: true; }"
1) (integer) 2
2) "doc2"
3) 1) "name"
   2) "Someone else"
   3) "city"
   4) "Harry Potter state"
4) "doc1"
5) 1) "name"
   2) "Harry Potter"
   3) "city"
   4) "California"

Again, thank you so much for your wonderful guidance.

Best,

Nha.

On Tuesday, January 15, 2019 at 16:51:41 UTC+7, phqnha wrote:

Sure, happy I could help.

Sorry, I did not test it carefully. It looks like the result order is determined by the order in which the documents were inserted (in my case, last in, first out) and is not affected by the $weight in my query. Is this a bug?

Thank you.

On Tuesday, January 15, 2019 at 17:58:52 UTC+7, me...@redislabs.com wrote:

I am sorry, but we currently do not support $weight with tags; we are working on it.

Hi, thank you for your response.

I realized that the default scorer (TF*IDF) gives an infinite score whenever the target tag is found. That is why the ranking didn't work: every matching document tied at infinity, so they could not be ranked against each other. So I changed the scorer to BM25 and got very nice results.
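For reference, this is the textbook Okapi BM25 contribution of a query term q to a document D, where f(q, D) is the term frequency, |D| the document length, avgdl the average document length, N the number of documents, n(q) the number of documents containing q, and k₁ (commonly 1.2 to 2.0) and b (commonly 0.75) are tuning constants. RediSearch's BM25 scorer may differ in details such as the IDF variant; this is only meant to show why its scores stay finite.

```latex
\mathrm{score}(D, q) = \mathrm{IDF}(q)\,
  \frac{f(q, D)\,(k_1 + 1)}{f(q, D) + k_1\left(1 - b + b\,\dfrac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q) = \ln\!\left(\frac{N - n(q) + 0.5}{n(q) + 0.5} + 1\right)
```

As f(q, D) grows, the fraction approaches k₁ + 1, so each term's contribution is bounded by IDF(q)(k₁ + 1), and the IDF term is finite even when n(q) = 0.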

Best,

On Wednesday, January 23, 2019 at 16:41:19 UTC+7, me...@redislabs.com wrote:

By the way, you can write your own scorer if you want to; that way you might be able to do what you need without modifying the source:
https://oss.redislabs.com/redisearch/Extensions.html

Hi, this is not really related to my original issue, but since you mentioned scorer functions, I think posting it here is appropriate.

I would like to write a custom scorer, but after reading your documentation and examples, I still can't figure out how to use custom data when calculating the score. Your scorer function signature is: double MyScorer(RSScoringFunctionCtx *ctx, RSIndexResult *h, RSDocumentMetadata *dmd, double minScore)

I would like to use two types of data to calculate the score:

  • Dynamic data: for example, we have many users with different weights that affect the final score. How can I inject a user's weight into MyScorer?

  • Static data: some data belongs to the document and will be stored in its payload. How can I read it from the payload? Please give me a concrete example.

Thank you so much.

On Wednesday, January 23, 2019 at 16:53:29 UTC+7, me...@redislabs.com wrote:

I agree that our extension API is not well documented, and we should fix that. Regarding your questions, you should include the redisearch.h file (all the structs you need should be there):

  1. You can add the dynamic data to the FT.SEARCH request using the PAYLOAD argument, then access it through the given RSScoringFunctionCtx (ctx->payload).

  2. You can put the static data in the document payload (set with the PAYLOAD argument of FT.ADD) and then access it through the given RSDocumentMetadata (dmd->payload).
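A concrete sketch of both answers, under stated assumptions: the structs below are simplified, hypothetical stand-ins for the payload type, RSScoringFunctionCtx and RSDocumentMetadata (check redisearch.h for the real definitions and field types), `my_scorer` omits the other parameters of the real signature, and both payloads are assumed to hold a number written as text, e.g. a search sent with `PAYLOAD 2` and a document added with `PAYLOAD 1.5`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for the redisearch.h types; the real
 * structs contain more fields. */
typedef struct {
    char *data;
    size_t len;
} MockPayload;

typedef struct {
    MockPayload payload; /* query payload, as in ctx->payload */
} MockScoringCtx;

typedef struct {
    double score;         /* a-priori document score */
    MockPayload *payload; /* document payload, as in dmd->payload (may be NULL) */
} MockDocMeta;

/* Parses a payload holding a number written as text. The payload is not
 * assumed to be NUL-terminated, so it is copied into a local buffer first.
 * Returns fallback when the payload is missing, too long, or not a number. */
static double payload_number(const MockPayload *p, double fallback) {
    char buf[64];
    char *end;
    if (p == NULL || p->data == NULL || p->len == 0 || p->len >= sizeof buf) {
        return fallback;
    }
    memcpy(buf, p->data, p->len);
    buf[p->len] = '\0';
    double v = strtod(buf, &end);
    return end == buf ? fallback : v;
}

/* Hypothetical scorer: base document score x per-document boost (static data)
 * x per-user weight (dynamic data sent with the query). */
double my_scorer(const MockScoringCtx *ctx, const MockDocMeta *dmd) {
    double user_weight = payload_number(&ctx->payload, 1.0);
    double doc_boost = payload_number(dmd->payload, 1.0);
    return dmd->score * doc_boost * user_weight;
}
```

With a user weight of `2` and a document boost of `1.5`, a document whose base score is `2.0` scores `6.0`; a missing or unparsable payload falls back to a neutral factor of `1.0`. Registering the real function with the module follows the Extensions page linked earlier in the thread.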

Hope this answers your question; let me know if you need any further clarification.