
Kam's Online Notebook


CPPExceptionTerminate Crash

Yet another crash (a whole pile of them, actually) whose call stack contains no user code. First, the crash log:

Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x00000000 at 0x00000001b609a84c
Crashed Thread:  0

Thread 0 Crashed:
0   libsystem_kernel.dylib          __pthread_kill  +8
1   libsystem_pthread.dylib         pthread_kill  +212
2   libsystem_c.dylib               abort  +100
3   libc++abi.dylib                 __cxxabiv1::__aligned_malloc_with_fallback(unsigned long)  +0
4   libc++abi.dylib                 demangling_unexpected_handler()  +0
5   libobjc.A.dylib                 _objc_terminate()  +124
6   xyz                             CPPExceptionTerminate() /PATH/TO/KSCrashMonitor_CPPException.cpp:180 +12
7   libc++abi.dylib                 std::__terminate(void (*)())  +16
8   libc++abi.dylib                 std::terminate()  +44
9   libdispatch.dylib               _dispatch_client_callout  +36
10  libdispatch.dylib               _dispatch_main_queue_callback_4CF$VARIANT$armv81  +856
11  CoreFoundation                  __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__  +12
12  CoreFoundation                  __CFRunLoopRun  +2480
13  CoreFoundation                  CFRunLoopRunSpecific  +572
14  GraphicsServices                GSEventRunModal  +160
15  UIKitCore                       -[UIApplication _run]  +1052
16  UIKitCore                       UIApplicationMain  +164
17  xyz                             main xyz/main.m:33 +28
18  libdyld.dylib                   start  +4

Something's off 🤨

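Frames 5 and 6 show that an uncaught C++ exception was thrown (or an NSException, which is implemented on top of C++ exceptions), yet there is no user code in the stack. This is perfectly normal on iOS; it is a historical quirk of the run loop itself: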

And, realistically, if someone were implementing the run loop today, that’s how it’d work. The reasons why the run loop works the way it currently works are mired in the depths of history.

https://developer.apple.com/forums/thread/116305

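The run loop's exception handler catches this kind of exception and rethrows it, and once the stack has been unwound, the call stack that originally triggered the exception is gone. That is why custom crash reporters usually work around the problem: they link in their own __cxa_throw to replace the C++ ABI's implementation, record the stack of the current execution context right there, and then call (usually) the original implementation, which ends up invoking the handler registered with std::set_terminate.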
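To make the "stack is gone after unwinding" point concrete, here is a toy sketch. It does not touch the real ABI: `Frame`, `g_frames`, and `throwWithCapture` are hypothetical names, and the vector of strings merely stands in for a call stack. It only shows why the snapshot has to be taken at throw time rather than at the catch site.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a call stack: each Frame pushes its name on
// construction and pops it on destruction, i.e. also during unwinding.
static std::vector<std::string> g_frames;

struct Frame {
    explicit Frame(const char* name) { g_frames.emplace_back(name); }
    ~Frame() {
        assert(!g_frames.empty());
        g_frames.pop_back();
    }
};

// What a throw-time hook would save before unwinding starts.
static std::vector<std::string> g_capturedAtThrow;

[[noreturn]] static void throwWithCapture() {
    g_capturedAtThrow = g_frames;  // snapshot while all frames are still alive
    throw std::runtime_error("boom");
}

static void userCode() {
    Frame f("userCode");
    throwWithCapture();
}

static void dispatchCallout() {
    Frame f("dispatchCallout");
    userCode();
}

// Returns the frames still visible from the catch site, after unwinding.
std::vector<std::string> framesSeenAfterUnwind() {
    Frame f("runLoop");
    try {
        dispatchCallout();
    } catch (const std::exception&) {
        return g_frames;  // the frames below "runLoop" were already destroyed
    }
    return {};
}
```

After calling `framesSeenAfterUnwind()`, the returned vector holds only the catching frame, while `g_capturedAtThrow` still lists the frames down to `userCode`.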

Here is how KSCrash does it:

extern "C"
{
    void __cxa_throw(void* thrown_exception, std::type_info* tinfo, void (*dest)(void*)) __attribute__ ((weak));

    void __cxa_throw(void* thrown_exception, std::type_info* tinfo, void (*dest)(void*))
    {
        static cxa_throw_type orig_cxa_throw = NULL;
        if (g_cxaSwapEnabled == false)
        {
            // Save the call stack
            captureStackTrace(NULL, NULL, NULL);
        }
        unlikely_if(orig_cxa_throw == NULL)
        {
            orig_cxa_throw = (cxa_throw_type) dlsym(RTLD_NEXT, "__cxa_throw");
        }
        // Call what is (probably) the original; this depends on dlsym's search order and on which symbols actually exist
        orig_cxa_throw(thrown_exception, tinfo, dest);
        __builtin_unreachable();
    }
}

In short, the crash stack we are seeing here is "abnormal", because KSCrash itself already has logic for handling uncaught C++ exceptions.
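Frames 6 and 5 of the log (CPPExceptionTerminate calling into _objc_terminate) look like a terminate handler that chains to whichever handler was installed before it. A minimal sketch of that pattern with the standard `std::set_terminate` / `std::get_terminate` follows; `recordingTerminate`, `installRecordingTerminate`, and `uninstallRecordingTerminate` are hypothetical names, and this is not KSCrash's actual code.

```cpp
#include <cassert>
#include <cstdlib>
#include <exception>

// The handler that was installed before ours, so we can chain to it.
static std::terminate_handler g_originalTerminate = nullptr;
static volatile bool g_terminateSeen = false;

// Our terminate handler: record first, then hand over to the previous one.
static void recordingTerminate() {
    g_terminateSeen = true;  // a real reporter would write its report here
    if (g_originalTerminate != nullptr) {
        g_originalTerminate();  // normally does not return
    }
    std::abort();  // a terminate handler must never return
}

void installRecordingTerminate() {
    // std::set_terminate returns the handler it replaced.
    g_originalTerminate = std::set_terminate(recordingTerminate);
    assert(std::get_terminate() == recordingTerminate);
}

void uninstallRecordingTerminate() {
    std::set_terminate(g_originalTerminate);
}
```

Because the handler runs only once `std::terminate` is reached, anything it records reflects the already unwound stack, which is why the throw-time capture is still needed.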

Where is the problem

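First, this crash log was captured after the stack had been unwound, so the cause should be an application-level exception. So I tested both kinds of exception (C++ exception and NSException) to see what comes out when the exception handling itself goes wrong: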

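  1. An uncaught C++ exception occurs, but the stack-recording __cxa_throw is never called: no log would be collected at all in this case, so rule it out;
  2. An uncaught C++ exception occurs, but CPPExceptionTerminate is never called: that contradicts the crash log, so rule it out;
  3. An NSException occurs, but the handler registered by KSCrash is never called. Now this looks familiar: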
Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x00000000 at 0x00007fff51829462
Crashed Thread:  0

Thread 0 Crashed:
0   libsystem_kernel.dylib          0x00007fff51829462 __pthread_kill + 10
1   libsystem_c.dylib               0x00007fff517b9a3c abort + 120
2   libc++abi.dylib                 0x00007fff4f6d27f8 abort_message + 231
3   libc++abi.dylib                 0x00007fff4f6d29c7 demangling_terminate_handler() + 262
4   libobjc.A.dylib                 0x00007fff50864d7c _objc_terminate() + 96
5   Crash-Tester                    0x000000010c6b9e0a CPPExceptionTerminate() + 2682
6   libc++abi.dylib                 0x00007fff4f6dfe97 std::__terminate(void (*)()) + 8
7   libc++abi.dylib                 0x00007fff4f6dfe39 std::terminate() + 41
8   libdispatch.dylib               0x00007fff516ad795 _dispatch_client_callout + 28
9   libdispatch.dylib               0x00007fff516b9caa _dispatch_main_queue_callback_4CF + 1212
10  CoreFoundation                  0x00007fff23b0ce49 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 9
11  CoreFoundation                  0x00007fff23b07aa9 __CFRunLoopRun + 2329
12  CoreFoundation                  0x00007fff23b06e66 CFRunLoopRunSpecific + 438
13  GraphicsServices                0x00007fff38346bb0 GSEventRunModal + 65
14  UIKitCore                       0x00007fff47578dd0 UIApplicationMain + 1621
15  Crash-Tester                    0x000000010c673c60 main + 112
16  libdyld.dylib                   0x00007fff516ecd29 start + 1

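So the working hypothesis for the investigation is: "some code registered a handler via NSSetUncaughtExceptionHandler but, for some reason, never got around to calling the handler KSCrash had registered". Setting a deadline: tomorrow. This needs to be solved; it has been dragging on for days.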
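The hypothesis rests on the fact that NSSetUncaughtExceptionHandler keeps a single handler, so whoever registers last has to fetch the previous one (NSGetUncaughtExceptionHandler) and call it. The sketch below models that in plain C++ with a hypothetical single-slot registry; `setHandler`, `getHandler`, and all handler names are made up for illustration and are not Foundation or KSCrash APIs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical single-slot registry modelling the set/get semantics:
// setting a handler silently replaces whatever was there before.
using Handler = void (*)(const std::string& reason);
static Handler g_handler = nullptr;

Handler getHandler() { return g_handler; }
void setHandler(Handler h) { g_handler = h; }

static std::vector<std::string> g_log;

// Stand-in for the crash reporter's handler.
static void crashReporterHandler(const std::string& reason) {
    g_log.push_back("reporter:" + reason);
}

// A later registrant that forgets about the previous handler.
static void greedyHandler(const std::string& reason) {
    g_log.push_back("greedy:" + reason);
}

// A later registrant that saves the previous handler and chains to it.
static Handler g_previous = nullptr;
static void chainingHandler(const std::string& reason) {
    g_log.push_back("chaining:" + reason);
    if (g_previous != nullptr) {
        g_previous(reason);
    }
}

// Installs the reporter first, then a second handler, and fires once.
std::vector<std::string> simulate(bool secondHandlerChains) {
    g_log.clear();
    setHandler(crashReporterHandler);
    if (secondHandlerChains) {
        g_previous = getHandler();  // remember the reporter's handler
        setHandler(chainingHandler);
    } else {
        setHandler(greedyHandler);  // the reporter's handler is now lost
    }
    assert(getHandler() != nullptr);
    getHandler()("NSRangeException");
    return g_log;
}
```

With `simulate(false)` the reporter's entry never shows up in the log, which is the situation suspected here; with `simulate(true)` both handlers run.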

EOF

— May 26, 2021