Dear all,
Redis crashes when I load a PyTorch model into RedisAI using the redislabs/redismod:edge Docker image. To illustrate the issue, I use the imagenet example provided in https://github.com/RedisAI/redisai-examples.
The pre-serialised resnet50 model from that repository loads into RedisAI successfully, but Redis crashes with a segmentation fault (signal 11) during AI.MODELSET whenever I load a model that I have serialised myself following the model_saver.py script. I am using PyTorch 1.6 on Python 3.7.
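A minimal sketch of a check that might help narrow this down. It relies on the fact that the blob in the report starts with 'PK', i.e. it is a zip archive. The function name `summarize_torchscript_archive` is hypothetical, and the presence of a `version` entry inside the archive is an assumption about the file layout, not something confirmed by the report. It uses only the Python standard library, so it does not depend on the installed PyTorch version.

```python
import zipfile


def summarize_torchscript_archive(path):
    """Return (is_zip, version, entry_names) for a serialised model file.

    Hypothetical helper: it only inspects the zip container and does not
    load the model.
    """
    if not zipfile.is_zipfile(path):
        # Not a zip container at all, so not the 'PK' blob seen in the report.
        return False, None, []

    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        # Assumption: the archive carries an entry whose last path
        # component is "version" (for example "<name>/version").
        version_entries = [n for n in names if n.split("/")[-1] == "version"]
        version = None
        if version_entries:
            raw = archive.read(version_entries[0])
            version = raw.decode("utf-8").strip()

    return True, version, names
```

Running it on both the pre-serialised resnet50 file and the self-serialised file and comparing the results could show whether the two archives differ in version or layout, which would point towards a mismatch between the PyTorch used for saving and the libtorch bundled in the image. This is only a diagnostic aid, not a confirmed cause.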
Thanks a lot!
Best regards,
manl
The bug report is as follows:
=== REDIS BUG REPORT START: Cut & paste starting from here ===
1:M 27 Oct 2020 08:18:45.084 # Redis 6.0.1 crashed by signal: 11
1:M 27 Oct 2020 08:18:45.084 # Crashed running the instruction at: 0x7f53c9dca975
1:M 27 Oct 2020 08:18:45.084 # Accessing address: 0x18
1:M 27 Oct 2020 08:18:45.084 # Failed assertion: <no assertion failed> (<no file>:0)
------ STACK TRACE ------
EIP:
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f2f975)[0x7f53c9dca975]
Backtrace:
redis-server *:6379(logStackTrace+0x32)[0x562639f61872]
redis-server *:6379(sigsegvHandler+0x9e)[0x562639f61f4e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f53fabac730]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f2f975)[0x7f53c9dca975]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f2fbb8)[0x7f53c9dcabb8]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZN5torch3jit16ScriptTypeParser18parseClassConstantERKNS0_6AssignE+0x8d)[0x7f53ca06154d]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f374af)[0x7f53c9dd24af]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f39e43)[0x7f53c9dd4e43]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3a225)[0x7f53c9dd5225]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3a30a)[0x7f53c9dd530a]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZNK5torch3jit16ScriptTypeParser17parseTypeFromExprERKNS0_4ExprE+0x1c5)[0x7f53ca063ba5]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f36844)[0x7f53c9dd1844]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f39e43)[0x7f53c9dd4e43]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3a225)[0x7f53c9dd5225]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZNK5torch3jit14SourceImporter13loadNamedTypeERKN3c1013QualifiedNameE+0x2e)[0x7f53c9dc84fe]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3bf54)[0x7f53c9dd6f54]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f10e93)[0x7f53c9dabe93]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f17352)[0x7f53c9db2352]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f19d60)[0x7f53c9db4d60]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f1a311)[0x7f53c9db5311]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZN5torch3jit21readArchiveAndTensorsERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN3c108optionalISt8functionIFNS9_13StrongTypePtrERKNS9_13QualifiedNameEEEEENSA_ISB_IFNS9_13intrusive_ptrINS9_6ivalue6ObjectENS9_6detail34intrusive_target_default_null_typeISL_EEEESC_NS9_6IValueEEEEENSA_INS9_6DeviceEEERN6caffe29serialize19PyTorchStreamReaderE+0x6b2)[0x7f53c9dd6982]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3bc9d)[0x7f53c9dd6c9d]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3e3c4)[0x7f53c9dd93c4]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZN5torch3jit4loadESt10unique_ptrIN6caffe29serialize20ReadAdapterInterfaceESt14default_deleteIS4_EEN3c108optionalINS8_6DeviceEEERSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESI_St4hashISI_ESt8equal_toISI_ESaISt4pairIKSI_SI_EEE+0x179)[0x7f53c9dd9bf9]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZN5torch3jit4loadERSiN3c108optionalINS2_6DeviceEEERSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESC_St4hashISC_ESt8equal_toISC_ESaISt4pairIKSC_SC_EEE+0x75)[0x7f53c9dda3f5]
/usr/lib/redis/modules/backends/redisai_torch/redisai_torch.so(torchLoadModel+0x215)[0x7f53fa86b475]
/usr/lib/redis/modules/backends/redisai_torch/redisai_torch.so(RAI_ModelCreateTorch+0x8a)[0x7f53fa8641ea]
/usr/lib/redis/modules/redisai.so(RAI_ModelCreate+0x16d)[0x7f53fa9bc80d]
/usr/lib/redis/modules/redisai.so(RedisAI_ModelSet_RedisCommand+0x91b)[0x7f53fa9b422b]
redis-server *:6379(RedisModuleCommandDispatcher+0x54)[0x562639f91ca4]
redis-server *:6379(call+0x9d)[0x562639f1df0d]
redis-server *:6379(processCommand+0x327)[0x562639f1e687]
redis-server *:6379(processCommandAndResetClient+0x10)[0x562639f2c280]
redis-server *:6379(processInputBuffer+0x18f)[0x562639f307cf]
redis-server *:6379(+0xd4b4c)[0x562639fadb4c]
redis-server *:6379(aeProcessEvents+0x111)[0x562639f17a21]
redis-server *:6379(aeMain+0x2b)[0x562639f17eab]
redis-server *:6379(main+0x4db)[0x562639f147eb]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7f53fa9fb09b]
redis-server *:6379(_start+0x2a)[0x562639f14a7a]
------ INFO OUTPUT ------
# Server
redis_version:6.0.1
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:e02d1d807e41d65
redis_mode:standalone
os:Linux 4.19.76-linuxkit x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:8.3.0
process_id:1
run_id:e6a2f85ed2c4e92ed31dcff4906f4328e9323d73
tcp_port:6379
uptime_in_seconds:12
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:9951204
executable:/data/redis-server
config_file:
# Clients
connected_clients:1
client_recent_max_input_buffer:98074634
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
# Memory
used_memory:242778632
used_memory_human:231.53M
used_memory_rss:124690432
used_memory_rss_human:118.91M
used_memory_peak:242778632
used_memory_peak_human:231.53M
used_memory_peak_perc:193.70%
used_memory_overhead:105965986
used_memory_startup:7874368
used_memory_dataset:136812646
used_memory_dataset_perc:58.24%
allocator_allocated:109019704
allocator_active:109441024
allocator_resident:129314816
total_system_memory:8353112064
total_system_memory_human:7.78G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:421320
allocator_rss_ratio:1.18
allocator_rss_bytes:19873792
rss_overhead_ratio:0.96
rss_overhead_bytes:-4624384
mem_fragmentation_ratio:1.15
mem_fragmentation_bytes:16131624
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:98091618
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1603786712
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
# Stats
total_connections_received:1
total_commands_processed:5
instantaneous_ops_per_sec:0
total_net_input_bytes:102773871
total_net_output_bytes:0
instantaneous_input_kbps:51584.85
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
unexpected_error_replies:0
# Replication
role:master
connected_slaves:0
master_replid:4e792775378397fcff3cb2682c63199109f4eeeb
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
master_repl_meaningful_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:0.183659
used_cpu_user:0.277650
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
# Modules
module:name=search,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]
module:name=graph,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]
module:name=ReJSON,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]
module:name=rg,ver=999999,api=1,filters=0,usedby=[],using=[ai],options=[]
module:name=bf,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]
module:name=ai,ver=999999,api=1,filters=0,usedby=[rg],using=[],options=[]
module:name=timeseries,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]
# Commandstats
cmdstat_config:calls=1,usec=43,usec_per_call=43.00
cmdstat_info:calls=4,usec=53,usec_per_call=13.25
# Cluster
cluster_enabled:0
# Keyspace
------ CLIENT LIST OUTPUT ------
id=21 addr=172.17.0.1:40400 fd=16 name= age=2 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=102773786 obl=0 oll=0 omem=0 events=r cmd=ai.modelset user=default
------ CURRENT CLIENT INFO ------
id=21 addr=172.17.0.1:40400 fd=16 name= age=2 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=102773786 obl=0 oll=0 omem=0 events=r cmd=ai.modelset user=default
argv[0]: 'AI.MODELSET'
argv[1]: 'imagenet_model'
argv[2]: 'torch'
argv[3]: 'cpu'
argv[4]: 'BLOB'
argv[5]: 'PK'
------ REGISTERS ------
1:M 27 Oct 2020 08:18:45.095 #
RAX:0000000000000000 RBX:000056263c008668
RCX:0000000000000000 RDX:0000000000000000
RDI:00007ffeb5711610 RSI:000056263c008668
RBP:00007ffeb5711890 RSP:00007ffeb5711610
R8 :0000000000000000 R9 :0000000000000001
R10:0000000000000001 R11:0000000000000020
R12:00007ffeb57126d0 R13:0000000000000138
R14:00007ffeb57126f0 R15:00007ffeb57126d0
RIP:00007f53c9dca975 EFL:0000000000010246
CSGSFS:002b000000000033
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161f) -> 000056263c005e50
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161e) -> 000056263c005e60
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161d) -> 00000000000000a0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161c) -> 000056263c005e50
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161b) -> 000056263c005e60
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161a) -> 0000000000000000
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711619) -> 00007f53c79a7e35
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711618) -> 00007ffeb5712540
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711617) -> 00007ffeb57118d0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711616) -> 0000000000000000
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711615) -> 00007f53c9dcaeee
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711614) -> 00007ffeb57126d0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711613) -> 00007ffeb57126f0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711612) -> 0000000000000138
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711611) -> 00007ffeb57126d0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711610) -> 0000000000000000
------ MODULES INFO OUTPUT ------
# graph_executing commands
# ai_git
ai_git_sha:7a30eb39f3b3ce74bf4427b9c53f0fe6163e0ca2
# ai_load_time_configs
ai_threads_per_queue:1
ai_inter_op_parallelism:0
ai_intra_op_parallelism:0
# ai_cpu
ai_self_used_cpu_sys:0.183659
ai_self_used_cpu_user:0.277918
ai_children_used_cpu_sys:0.000000
ai_children_used_cpu_user:0.000000
ai_queue_CPU_bthread_#1_used_cpu_total:0.000000
------ FAST MEMORY TEST ------
1:M 27 Oct 2020 08:18:45.096 # Bio thread for job type #0 terminated
1:M 27 Oct 2020 08:18:45.096 # Bio thread for job type #1 terminated
1:M 27 Oct 2020 08:18:45.096 # Bio thread for job type #2 terminated
*** Preparing to test memory region 56263a0ac000 (2277376 bytes)
*** Preparing to test memory region 56263b2a7000 (14139392 bytes)
*** Preparing to test memory region 7f53ba79d000 (205553664 bytes)
*** Preparing to test memory region 7f53c6ba5000 (524288 bytes)
*** Preparing to test memory region 7f53c6c25000 (331776 bytes)
*** Preparing to test memory region 7f53d5a28000 (282624 bytes)
*** Preparing to test memory region 7f53d5ff7000 (8192 bytes)
*** Preparing to test memory region 7f53d6000000 (302125056 bytes)
*** Preparing to test memory region 7f53ec023000 (331776 bytes)
*** Preparing to test memory region 7f53ec1f4000 (16384 bytes)
*** Preparing to test memory region 7f53ec3fb000 (8388608 bytes)
*** Preparing to test memory region 7f53ecbfc000 (8388608 bytes)
*** Preparing to test memory region 7f53ed3fd000 (8388608 bytes)
*** Preparing to test memory region 7f53edbfe000 (8388608 bytes)
*** Preparing to test memory region 7f53ee3ff000 (8388608 bytes)
*** Preparing to test memory region 7f53eec00000 (8388608 bytes)
*** Preparing to test memory region 7f53ef400000 (4194304 bytes)
*** Preparing to test memory region 7f53ef815000 (524288 bytes)
*** Preparing to test memory region 7f53ef896000 (8388608 bytes)
*** Preparing to test memory region 7f53f02b4000 (9437184 bytes)
*** Preparing to test memory region 7f53f0bb5000 (8388608 bytes)
*** Preparing to test memory region 7f53f13b6000 (8388608 bytes)
*** Preparing to test memory region 7f53f1bb7000 (8388608 bytes)
*** Preparing to test memory region 7f53f23b8000 (8388608 bytes)
*** Preparing to test memory region 7f53f2bb9000 (8388608 bytes)
*** Preparing to test memory region 7f53f37ef000 (139264 bytes)
*** Preparing to test memory region 7f53f3a25000 (8388608 bytes)
*** Preparing to test memory region 7f53f44c4000 (12288 bytes)
*** Preparing to test memory region 7f53f44c8000 (8388608 bytes)
*** Preparing to test memory region 7f53f4cc9000 (8388608 bytes)
*** Preparing to test memory region 7f53f54ca000 (8388608 bytes)
*** Preparing to test memory region 7f53f5ccb000 (8388608 bytes)
*** Preparing to test memory region 7f53f64cc000 (8388608 bytes)
*** Preparing to test memory region 7f53f6ccd000 (8388608 bytes)
*** Preparing to test memory region 7f53f74ce000 (8388608 bytes)
*** Preparing to test memory region 7f53f7ccf000 (8388608 bytes)
*** Preparing to test memory region 7f53f8d9f000 (16384 bytes)
*** Preparing to test memory region 7f53f8da4000 (8388608 bytes)
*** Preparing to test memory region 7f53f97fc000 (12288 bytes)
*** Preparing to test memory region 7f53f9800000 (8388608 bytes)
*** Preparing to test memory region 7f53fa000000 (8388608 bytes)
*** Preparing to test memory region 7f53fa826000 (180224 bytes)
*** Preparing to test memory region 7f53fa883000 (4096 bytes)
*** Preparing to test memory region 7f53fa8c2000 (4096 bytes)
*** Preparing to test memory region 7f53fa923000 (4096 bytes)
*** Preparing to test memory region 7f53fa9d2000 (20480 bytes)
*** Preparing to test memory region 7f53fab94000 (24576 bytes)
*** Preparing to test memory region 7f53fabb7000 (16384 bytes)
*** Preparing to test memory region 7f53faea0000 (16384 bytes)
*** Preparing to test memory region 7f53fb0c8000 (8192 bytes)
*** Preparing to test memory region 7f53fb0f6000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
------ DUMPING CODE AROUND EIP ------
Symbol: (null) (base: (nil))
Module: /usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so (base 0x7f53c6e9b000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=(nil) -D -b binary -m i386:x86-64 /tmp/dump.bin
------
=== REDIS BUG REPORT END. Make sure to include from START to END. ===