Kam's Online Notebook


A crash whose call stack contains no user code

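Ever since iOS 13.x was released, our product has been seeing a trickle of UIKitCore-related crashes in production. There were not many of them, the automatic ticketing system did not assign them to me, and I was busy with other work, so the tickets sat untouched for a while. Only after a colleague in operations reported a similar problem did I start looking into it (it was hurting UGC, after all).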

According to the colleague, the app crashes after typing "#" while publishing content. First, the call stack of the crash:

Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x00000000 at 0x0000000000000000
Crashed Thread:  0

Application Specific Information:
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[__NSDictionaryM updateSelectionRects]: unrecognized selector sent to instance 0x2817c0560'

Thread 0 Crashed:
0   CoreFoundation                      __exceptionPreprocess  +228
1   libobjc.A.dylib                     objc_exception_throw  +60
2   CoreFoundation                      -[NSOrderedSet initWithSet:copyItems:]  +0
3   CoreFoundation                      ___forwarding___  +1320
4   CoreFoundation                      _CF_forwarding_prep_0  +96
5   UIKitCore                           -[UIKeyboardImpl unmarkText:]  +204
6   UIKitCore                           __59-[UIKeyboardImpl handleAcceptedCandidate:executionContext:]_block_invoke_2  +300
7   UIKitCore                           -[UIKeyboardTaskEntry execute:]  +188
8   UIKitCore                           -[UIKeyboardTaskQueue continueExecutionOnMainThread]  +324
9   UIKitCore                           -[UIKeyboardTaskQueue waitUntilTaskIsFinished:]  +172
10  UIKitCore                           -[UIKeyboardTaskQueue performSingleTask:]  +156
11  UIKitCore                           -[UIKeyboardImpl acceptCurrentCandidateForInput:]  +288
12  UIKitCore                           -[UIKeyboardCandidateController candidateView:didAcceptCandidate:atIndexPath:inGridType:generateFeedback:]  +180
13  UIKitCore                           -[UIKeyboardCandidateController candidateView:didAcceptCandidate:atIndexPath:inGridType:]  +100
14  TextInputUI                         -[TUICandidateView candidateGrid:didAcceptCandidate:atIndexPath:]  +208
15  TextInputUI                         -[TUICandidateGrid collectionView:didSelectItemAtIndexPath:]  +184
16  UIKitCore                           -[UICollectionView _selectItemAtIndexPath:animated:scrollPosition:notifyDelegate:deselectPrevious:]  +952
17  UIKitCore                           -[UICollectionView touchesEnded:withEvent:]  +572
18  UIKitCore                           forwardTouchMethod  +332
19  UIKitCore                           -[UIResponder touchesEnded:withEvent:]  +64
20  UIKitCore                           forwardTouchMethod  +332
21  UIKitCore                           -[UIResponder touchesEnded:withEvent:]  +64
22  UIKitCore                           forwardTouchMethod  +332
23  UIKitCore                           -[UIResponder touchesEnded:withEvent:]  +64
24  UIKitCore                           -[UIWindow _sendTouchesForEvent:]  +1024
25  UIKitCore                           -[UIWindow sendEvent:]  +3540
26  UIKitCore                           -[UIApplication sendEvent:]  +348
27  UIKitCore                           __dispatchPreprocessedEventFromEventQueue  +6688
28  UIKitCore                           __handleEventQueueInternal  +5368
29  UIKitCore                           __handleHIDEventFetcherDrain  +144
30  CoreFoundation                      __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__  +28
31  CoreFoundation                      __CFRunLoopDoSource0  +84
32  CoreFoundation                      __CFRunLoopDoSources0  +188
33  CoreFoundation                      __CFRunLoopRun  +780
34  CoreFoundation                      CFRunLoopRunSpecific  +480
35  GraphicsServices                    GSEventRunModal  +164
36  UIKitCore                           UIApplicationMain  +1936
37  MY_APP.                             _main 
38  libdyld.dylib                       start  +4

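A call stack with no user code usually suggests a system bug. But a Google search turned up few reports, and Apple would hardly leave such a bug unfixed across four or five releases. So I concluded that the problem was ours.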

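To reproduce it, I first set a symbolic breakpoint on -[UIKeyboardImpl unmarkText:], then tried different ways of entering "#" in the input field. Finding the trigger would make the rest of the debugging much easier.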

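  1. Type English text on the full keyboard, then type "#": the symbolic breakpoint is not hit and nothing crashes;
  2. Type Chinese on the full keyboard: tapping a candidate in the bar above the keyboard hits the symbolic breakpoint, but then pressing "123" in the lower-left corner to switch to symbols and typing "#" does not crash;
  3. Type Chinese on the nine-key keyboard: tapping the symbol key in the first cell inserts a ",", and typing "#" next replaces it, hits the breakpoint, and crashes.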

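Now I had steps that reproduce it every time. Interestingly, the exception differs from crash to crash: sometimes the message goes to a dictionary, sometimes to a string, and I also saw a bad access. That strongly suggests that, before the message was sent, the receiver had already been deallocated, and its memory had either been reused by another object or left behind as a dangling pointer.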

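Enabling Xcode's 🧟‍♂️ (Zombie Objects) diagnostic reveals the class of the object that should actually have received the message. (Oddly, the call stack does not show where the message was sent from; maybe 🧟‍♂️ has always worked this way and I just have not used it much.)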

*** -[UITextInteractionSelectableInputDelegate updateSelectionRects]: message sent to deallocated instance 0x600003bf8da0

But that is not enough. What we want to know is when this object was deallocated.

break on -updateSelectionRects:

Set a breakpoint on -[UITextInteractionSelectableInputDelegate updateSelectionRects].
It is hit when "#" is typed:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
  * frame #0: 0x0000000128e98aea UIKitCore`-[UITextInteractionSelectableInputDelegate updateSelectionRects]
    frame #1: 0x0000000128d90771 UIKitCore`-[UIKeyboardImpl deleteBackwardAndNotify:] + 391
    frame #2: 0x00000001533475f1 UIKit`-[UIKeyboardImplAccessibility deleteBackwardAndNotify:] + 46
    frame #3: 0x0000000128d96fc4 UIKitCore`-[UIKeyboardImpl handleDeletionForCandidate:] + 140
    frame #4: 0x0000000128d96e82 UIKitCore`-[UIKeyboardImpl acceptCandidate:forInput:] + 2310
	...

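Inspect $rdi (the receiver at method entry) and set a watchpoint on that address. The watchpoint covers the first 8 bytes of the object, that is, its isa pointer: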

(lldb) po $rdi
<UITextInteractionSelectableInputDelegate: 0x600001c27d00>

(lldb) watchpoint set expression -- 0x600001c27d00
Watchpoint created: Watchpoint 1: addr = 0x600001c27d00 size = 8 state = enabled type = w
    new value: 4997267760

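Continue, and the watchpoint fires. The backtrace shows that entering "#" presents a view controller, which makes UIKit replace the input control's delegate and release the old one. (With Zombie Objects enabled, deallocation rewrites the isa through object_setClass, and that write is what trips the watchpoint.)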

Process 35093 resuming

Watchpoint 1 hit:
old value: 4997267760
new value: 105553190428256
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint 1
  * frame #0: 0x000000011f1c1ef7 libobjc.A.dylib`object_setClass + 96
    frame #1: 0x000000011fb2ba54 CoreFoundation`-[NSObject(NSObject) __dealloc_zombie] + 170
    frame #2: 0x0000000128fdc256 UIKitCore`-[UIResponder dealloc] + 145
    frame #3: 0x0000000153362f0f UIKit`-[UIResponderAccessibility dealloc] + 55
    frame #4: 0x000000011f1da934 libobjc.A.dylib`objc_object::sidetable_release(bool, bool) + 174
    frame #5: 0x0000000128d79c53 UIKitCore`-[UIKeyboardImpl setDelegate:force:] + 1423
    frame #6: 0x00000001289eee15 UIKitCore`-[UIInputResponderController _reloadInputViewsForKeyWindowSceneResponder:] + 2289
	...
    frame #15: 0x00000001288a0b22 UIKitCore`-[UIViewController presentViewController:animated:completion:] + 155
    frame #16: 0x000000010e5165c7 APP`-[UIViewController gl_pt_presentViewController:animated:completion:] at UIViewController+PageTrack.m:37:5
    frame #17: 0x00000001533bdfbc UIKit`-[UIViewControllerAccessibility presentViewController:animated:completion:] + 221
    frame #18: 0x000000010f0bbfff APP `-[AViewController showTopicVc] at AViewController.m:1689:5
    frame #19: 0x000000010f0b8d69 APP `-[AViewController publishTextView:shouldChangeTextInRange:replacementText:] at AViewController.m:1538:13
    frame #20: 0x000000010f1ca238 APP `-[ATextInputTableViewCell textView:shouldChangeTextInRange:replacementText:] at ATextInputTableViewCell.m:127:16
    frame #21: 0x000000011aa8da84 APP `-[YYTextView setMarkedText:selectedRange:] at YYTextView.m:3368:23
    frame #22: 0x0000000128e4a724 UIKitCore`-[UIResponder(UITextInput_Internal) _setAttributedMarkedText:selectedRange:] + 162
    frame #23: 0x0000000128e9920b UIKitCore`-[UITextInteractionSelectableInputDelegate _setMarkedText:selectedRange:] + 38
    frame #24: 0x0000000128d85997 UIKitCore`-[UIKeyboardImpl unmarkText:] + 165

Continue again and the crash occurs; the address of the deallocated instance is indeed the object we were watching:

*** -[UITextInteractionSelectableInputDelegate updateSelectionRects]: message sent to deallocated instance 0x600001c27d00

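In other words, within this call chain we must not present a new view controller from inside -setMarkedText:selectedRange:. Dispatching the presentation to the main queue with GCD so that it runs on a later run loop pass, or resigning first responder before presenting, should both fix it.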
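A minimal sketch of the GCD option, assuming the "#" check lives in the delegate method seen in frame #19 of the backtrace. Only the selector names and showTopicVc come from the backtrace; the parameter type, the trigger condition, the return value, and the body of showTopicVc are placeholders made up for illustration.

```objc
#import <UIKit/UIKit.h>

@interface AViewController : UIViewController
- (void)showTopicVc;  // name from frame #18; presents the topic picker
@end

@implementation AViewController

// Selector from frame #19. The UIView parameter type and the "#" check
// are assumptions; the real method presumably receives the YYTextView.
- (BOOL)publishTextView:(UIView *)textView
shouldChangeTextInRange:(NSRange)range
        replacementText:(NSString *)text
{
    if ([text isEqualToString:@"#"]) {
        // Do not present here: we are still inside
        // -[UIKeyboardImpl unmarkText:], and presenting now lets UIKit
        // release the input delegate it is about to message again.
        __weak typeof(self) weakSelf = self;
        dispatch_async(dispatch_get_main_queue(), ^{
            // Runs once the current call stack has unwound to the run loop.
            [weakSelf showTopicVc];
        });
    }
    return YES;  // assumption: the "#" itself is still inserted
}

- (void)showTopicVc
{
    // Placeholder controller; the real app presents its own topic picker.
    UIViewController *topicVC = [UIViewController new];
    [self presentViewController:topicVC animated:YES completion:nil];
}

@end
```

The block is enqueued rather than run inline, so the keyboard code gets to finish its own sequence of messages before the delegate is swapped. The weak reference only guards against the controller going away before the block runs.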

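However, the "set a breakpoint to find the deallocated object" approach above is something I only came up with in hindsight. It relies on some luck, because the breakpoint happened to be hit once before the object was deallocated. What if it had not been?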

The original, clumsy approach

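Originally, I started from the symbols I had and worked step by step toward the place that actually sends updateSelectionRects.
Set a breakpoint on -[UIKeyboardImpl unmarkText:];
once it is hit, set breakpoints on the call, jmp, and similar instructions: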

UIKitCore`-[UIKeyboardImpl unmarkText:]:
    0x1297dd8f2 <+0>:   pushq  %rbp
    0x1297dd8f3 <+1>:   movq   %rsp, %rbp
    0x1297dd8f6 <+4>:   pushq  %r15
    0x1297dd8f8 <+6>:   pushq  %r14
    ...
    0x1297dd977 <+133>: movq   0xf89a8a(%rip), %rsi      ; "length"
    0x1297dd97e <+140>: callq  *%r13
    0x1297dd981 <+143>: movq   0xfbfbe8(%rip), %rsi      ; "_setMarkedText:selectedRange:"
    0x1297dd988 <+150>: movq   %r12, %rdi
    0x1297dd98b <+153>: movq   %r14, %rdx
    0x1297dd98e <+156>: movq   %rax, %rcx
    0x1297dd991 <+159>: xorl   %r8d, %r8d
->  0x1297dd994 <+162>: callq  *%r13
    0x1297dd997 <+165>: movq   0xf966a2(%rip), %rsi      ; "unmarkText"
    0x1297dd99e <+172>: movq   %rbx, %rdi
    0x1297dd9a1 <+175>: callq  *%r13

This round established that the crash happens inside _setMarkedText:selectedRange:.
For the next round, set a breakpoint on _setMarkedText:selectedRange: itself:

UIKitCore`-[UITextInteractionSelectableInputDelegate _setMarkedText:selectedRange:]:
->  0x124db91e5 <+0>:  pushq  %rbp
    0x124db91e6 <+1>:  movq   %rsp, %rbp
    0x124db91e9 <+4>:  pushq  %r14
    0x124db91eb <+6>:  pushq  %rbx
    0x124db91ec <+7>:  movq   %rdi, %rbx
    0x124db91ef <+10>: movq   0xef20da(%rip), %rax      ; UITextInteractionSelectableInputDelegate._textInput
    0x124db91f6 <+17>: movq   (%rdi,%rax), %rdi
    0x124db91fa <+21>: movq   0xeac36f(%rip), %rsi      ; "_setMarkedText:selectedRange:"
    0x124db9201 <+28>: movq   0xbfa610(%rip), %r14      ; (void *)0x0000000115966e40: objc_msgSend
    0x124db9208 <+35>: callq  *%r14
    0x124db920b <+38>: movq   0xe995ee(%rip), %rsi      ; "updateSelectionRects"
    0x124db9212 <+45>: movq   %rbx, %rdi
    0x124db9215 <+48>: movq   %r14, %rax
    0x124db9218 <+51>: popq   %rbx
    0x124db9219 <+52>: popq   %r14
    0x124db921b <+54>: popq   %rbp
    0x124db921c <+55>: jmpq   *%rax

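This is where it gets interesting. At +38 the selector string is loaded into $rsi, followed by a jmpq to objc_msgSend (0x0000000115966e40 -> $r14 -> $rax), so presumably the message to the deallocated instance is sent at +55 ($rdi holds the message receiver). The disassembly also shows the copy path $rdi -> $rbx -> $rdi, which means _setMarkedText:selectedRange: and updateSelectionRects share the same receiver. Because the send is a tail call (jmpq rather than callq), the frame of _setMarkedText:selectedRange: is already gone at that point, which would explain why the crash call stack shows -[UIKeyboardImpl unmarkText:] directly below the forwarding frames. So we watch this $rdi with the watchpoint from the previous section, and the rest of the procedure is the same.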
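Read back into Objective-C, the disassembly above corresponds roughly to the following. This is a hand reconstruction, not Apple's source: the parameter types are guesses, and since these are private UIKit methods it is meant to be read rather than compiled.

```objc
// Hand reconstruction of the disassembly above (not Apple's source).
// Parameter types are assumptions; _textInput is the ivar named at +10.
- (void)_setMarkedText:(id)markedText selectedRange:(NSRange)selectedRange
{
    // +7:       self is saved in %rbx.
    // +10..+35: forward to _textInput. In this crash the call ends up in
    //           -[YYTextView setMarkedText:selectedRange:] and, through the
    //           app's delegate, presents a view controller, which releases self.
    [_textInput _setMarkedText:markedText selectedRange:selectedRange];

    // +38..+55: %rbx is moved back into %rdi and objc_msgSend is reached
    //           with jmpq, so self is messaged again with no retain in between.
    [self updateSelectionRects];
}
```

Seen this way, the method never takes ownership of self across the first call, so anything in that call that drops the last reference turns the second send into a message to freed memory.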

EOF

— Aug 16, 2020