Correct and fastest way to send Gears script to background using gears-cli

Found the explanation of the hashing algorithm and the role of {} in https://redis.io/topics/cluster-spec
Really cool feature - for example, I can always make sure all sentences from the same article stay on the same shard by using {article_id} inside the key.
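The hash-tag rule from the cluster spec is simple enough to sketch: Redis Cluster maps every key to one of 16384 slots using CRC16 (the XModem variant), and if the key contains a non-empty {...} section, only the text inside the first such pair is hashed. A minimal Python sketch of that rule (this is the spec's algorithm, not a RedisGears API):

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem), the variant Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster slots, honouring {...} hash tags."""
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        # only a non-empty tag between the first '{' and the next '}' counts
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

So sentences:{paragraphs:42} and paragraphs:42 land on the same slot, which is exactly what keeps related keys on one shard.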

@AlexMikhalev Yes, this is the idea, so you will not try to create keys which do not match the shard.

Sorry, the thread went way too long, but it provides enough context. For the last several hours I have been struggling to get a simpler step working:

from langdetect import detect


def remove_prefix(text, prefix):
    # strip `prefix` from `text` if present (no-op otherwise)
    return text[text.startswith(prefix) and len(prefix):]


def detect_language(record):
    # detect the language of the article
    value = record['value']
    value1000 = value[:1000]
    log("Value 1000 " + value1000)
    try:
        lang = detect(value1000)
    except Exception:
        lang = "empty"
    if lang == 'en':
        article_id = remove_prefix(record['key'], 'paragraphs:')
        paragraph_key = "langen:{%s}" % (article_id)
        log(f"Success lang {paragraph_key}", level='notice')
        log('Hashtag {%s}' % hashtag())
        execute('SET', paragraph_key, value)
        execute('SADD', 'successfull_lang{%s}' % hashtag(), paragraph_key)
    else:
        log("Failed to detect language: " + str(record['key']), level='notice')
        execute('SADD', 'articles_to_delete', record['key'])


gb = GB()
gb.foreach(detect_language)
gb.register('paragraphs:*', keyTypes=['string'], mode="sync")

execute('SET', paragraph_key, value) doesn’t return any errors, but it also doesn’t create a record.
I tried changing the key format and the mode, and logged everything around it, yet there are still no records with the langen prefix, despite the log capturing Success lang langen:{0ec759a568cb64acd211d0977da4ee9b098a7dec}
and smembers successfull_lang{4MP} returning a list of the correct keys.
Any attempt to query the keys via RedisInsight or redis-cli -c fails.

Log output example:

47:M 21 May 2020 08:36:38.924 * <module> GEARS: Value 1000 Infection by enveloped viruses, whose infectious particles are surrounded by a lipid bilayer, requires fusion of the host cell and viral membranes; this process is facilitated by one or more viral envelope glycoproteins [1] . Although details of this mechanism vary among viruses, viral glycoproteins typically consist of a surface subunit, which binds to a host cell receptor, and a transmembrane subunit responsible for drawing host and viral membranes together via the formation of a stable post-fusion conformation [2] . The "class I" viral fusion proteins, which include those of the human immunodeficiency viruses, influenza, Ebola viruses [exemplified by Ebola virus (EBOV) and Sudan virus (SUDV)] and Marburg virus (MARV), are defined by the formation of a core, trimeric α-helical bundle by the ectodomain of the transmembrane subunit during membrane fusion [1, 3] . The post-fusion conformations consist of a "trimer-of-hairpins" motif in which the ectodomain N-and C-terminal segments
47:M 21 May 2020 08:36:38.935 * <module> GEARS: Success lang langen:{0ec759a568cb64acd211d0977da4ee9b098a7dec}
47:M 21 May 2020 08:36:38.935 * <module> GEARS: Hashtag {4MP}

There is a cryptic message in “Executions”:

1) []
2) ["['Traceback (most recent call last):\\n', ' File \"<string>\", line 1, in <lambda>\\n', 'spam.error: execution plan does not exist\\r\\n\\n']\u0000"]

But I can’t relate it to the function.
Any hints? Only one function is registered - this is the first step in the pipeline.

@AlexMikhalev you do not see the key because it does not necessarily belong to the shard, so the shard refuses to create it. You are putting the article_id and not record['key'] inside the {}. Try putting record['key'] there and it should work. I agree there should have been a better error message here; I will see if I can fix it (basically it's internal to Redis, which returns NULL and not an error).


Regarding the cryptic error, where do you see it? (on which command?)

The cryptic error shows up on some events inside RedisInsight, under RedisGears, Executions.

Changing to record['key'] worked.
So if I manipulate the key, the shard will refuse to accept it? I will have to rethink my pipeline logic - my assumption was that I create keys with a prefix corresponding to each processing step: paragraphs:*, en:*, sentences:*, tokens:*. This makes sure the functions only capture what they are intended to process. Is there a better way?

@AlexMikhalev not sure I follow what the issue is - you can still create the keys with those prefixes, just make sure to add the {} with the original key name somewhere in the new key name so the shard will accept it. You can even put in the article_id; something like 'langen:%s:{%s}' % (article_id, record['key']) should work.
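To make the suggestion concrete, here is a small illustration in plain Python (using the article id seen in the log above) of the key format meirsh describes - the original key goes inside the {} tag, so the derived key hashes to the same cluster slot as its source key:

```python
# original key as delivered by the 'paragraphs:*' registration
original_key = 'paragraphs:0ec759a568cb64acd211d0977da4ee9b098a7dec'

# strip the processing-step prefix to recover the article id
article_id = original_key[len('paragraphs:'):]

# keep the readable prefix and id, but put the ORIGINAL key in the hash tag
paragraph_key = 'langen:%s:{%s}' % (article_id, original_key)
```

Because the {} section is what the cluster hashes, paragraph_key and original_key map to the same slot, so the local shard accepts the SET.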

The issue is that keys become less human-readable: instead of sentences:article_id it will be sentence:article_id:paragraph:article_id, but I can live with it.
An error message would have saved me time - otherwise it looks like confirmed writes are being lost inside the Redis Cluster.

Yes, I agree - I will raise an error on a NULL reply, with a hint about why the write might have been rejected.


Thank you for your help @meirsh. I should have read your first reply more carefully - you actually gave me the better example initially. My final version:

paragraph_key = "en:%s:{%s}" % (article_id, hashtag())
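A quick illustration of what this final format produces. Note that in RedisGears, hashtag() returns a tag that maps to the local shard's slot range; the stand-in below just reuses the {4MP} value seen in the log above:

```python
def hashtag():
    # stand-in for the RedisGears built-in; '4MP' is the tag from the log above
    return '4MP'

# article id taken from the log above
article_id = '0ec759a568cb64acd211d0977da4ee9b098a7dec'

# every key built this way on the same shard shares the shard's tag,
# so they all hash to a slot the shard owns and the write is accepted
paragraph_key = "en:%s:{%s}" % (article_id, hashtag())
```

The trade-off versus the record['key'] variant is that keys written by one shard all collapse onto a single slot, which is fine for shard-local bookkeeping sets like successfull_lang{...}.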

Excellent discussion here. My data scientist ran into a similar issue I believe. I’m going to point him to this discussion.

Feel free to PM me for the lessons learned.