ftrace Study Notes
- 1. Preface
- 2. ARM64 stack frame layout
- 3. Compile stage
- 3.1 blk_update_request with ftrace disabled
- 3.2 blk_update_request with ftrace enabled
- 4. Link stage
- 4.1 blk_update_request with ftrace disabled
- 4.2 blk_update_request with ftrace enabled
- 5. Run stage
- 5.1 blk_update_request after ftrace_init has run
- 5.2 Tracing blk_update_request
- 6. How the hook function is replaced
- 7. Summary
- References
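1. Preface
These notes record my own hands-on work on aarch64, based on the 阅码场 video course 《Linux内核tracers的实现原理与应用》 (Implementation Principles and Applications of Linux Kernel Tracers). By watching how the hook function is created and then replaced, we can understand how tracing works. As in the course, blk_update_request is used as the example function throughout.
Kernel version: 5.10
Platform: arm64
2. ARM64 stack frame layout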
Before looking at ftrace on arm64, here is a brief review of the arm64 stack frame. arm64 has 31 general-purpose registers, r0-r30. r0-r7 are the parameter/result registers, r29 is the frame pointer, and r30 is the link register, which holds the return address into the calling function. SP is the stack pointer. The following code is used to illustrate the stack frame layout:
/*
 * ARCH: armv8
 * GCC version: aarch64-linux-gnu-gcc (Linaro GCC 5.4-2017.01) 5.4.1 20161213
 */
int fun2(int c, int d)
{
    return 0;
}

int fun1(int a, int b)
{
    int c = 1;
    int d = 2;

    fun2(c, d);
    return 0;
}

int main(int argc, char **argv)
{
    int a = 0;
    int b = 1;

    fun1(a, b);
}
Disassembling with aarch64-linux-gnu-objdump -d a.out gives:
0000000000400530 <fun2>:
/* move sp down to the lowest address of fun2's frame */
  400530:       d10043ff        sub     sp, sp, #0x10
  400534:       b9000fe0        str     w0, [sp,#12]
  400538:       b9000be1        str     w1, [sp,#8]
  40053c:       52800000        mov     w0, #0x0                        // #0
  400540:       910043ff        add     sp, sp, #0x10
  400544:       d65f03c0        ret

0000000000400548 <fun1>:
/* allocate 48 bytes of stack: first sp = sp - 48, then store x29 and x30; sp now points to the top of the stack */
  400548:       a9bd7bfd        stp     x29, x30, [sp,#-48]!
/* x29 and sp both point to the top of the stack */
  40054c:       910003fd        mov     x29, sp
/* store fun1 argument 0 */
  400550:       b9001fa0        str     w0, [x29,#28]
/* store fun1 argument 1 */
  400554:       b9001ba1        str     w1, [x29,#24]
/* store fun1 local variable c */
  400558:       52800020        mov     w0, #0x1                        // #1
  40055c:       b9002fa0        str     w0, [x29,#44]
/* store fun1 local variable d */
  400560:       52800040        mov     w0, #0x2                        // #2
  400564:       b9002ba0        str     w0, [x29,#40]
  400568:       b9402ba1        ldr     w1, [x29,#40]
  40056c:       b9402fa0        ldr     w0, [x29,#44]
/* call fun2 */
  400570:       97fffff0        bl      400530 <fun2>
  400574:       52800000        mov     w0, #0x0                        // #0
  400578:       a8c37bfd        ldp     x29, x30, [sp],#48
  40057c:       d65f03c0        ret

0000000000400580 <main>:
/* allocate 48 bytes of stack: first sp = sp - 48, then store x29 and x30; sp now points to the top of the stack */
  400580:       a9bd7bfd        stp     x29, x30, [sp,#-48]!
/* x29 and sp both point to the top of the stack */
  400584:       910003fd        mov     x29, sp
/* store main argument 0 */
  400588:       b9001fa0        str     w0, [x29,#28]
/* store main argument 1 */
  40058c:       f9000ba1        str     x1, [x29,#16]
/* store local variable a */
  400590:       b9002fbf        str     wzr, [x29,#44]
  400594:       52800020        mov     w0, #0x1                        // #1
/* store local variable b */
  400598:       b9002ba0        str     w0, [x29,#40]
  40059c:       b9402ba1        ldr     w1, [x29,#40]
  4005a0:       b9402fa0        ldr     w0, [x29,#44]
/* call fun1 */
  4005a4:       97ffffe9        bl      400548 <fun1>
  4005a8:       52800000        mov     w0, #0x0                        // #0
  4005ac:       a8c37bfd        ldp     x29, x30, [sp],#48
  4005b0:       d65f03c0        ret
  4005b4:       00000000        .inst   0x00000000 ; undefined
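The corresponding stack frame layout is:
[Figure: stack frame layout of main, fun1 and fun2]
To summarize, the disassembly of the aarch64 code shows that:
1. Each function allocates its stack space at entry, in a single step that fixes the top of the stack (the lowest address); sp does not change after that.
2. The top of each function's stack holds the caller's top-of-stack pointer (the caller's x29). For example, the top of fun1's stack holds main's top-of-stack pointer.
3. A leaf callee does not need to save x29, because x29 still holds the caller's top-of-stack pointer. In the example, while fun2 runs, x29 points to the top of fun1's stack.
In the sections below we look at how the hook function is built and replaced, with ftrace disabled and enabled, at the compile stage, the link stage and the run stage.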
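Before moving on, the x29 chain described in points 2 and 3 can be modeled in a few lines of C. This is a minimal sketch, not kernel code: `struct frame_record` and `walk_frames` are names invented here, and ending the chain with a NULL pointer is a convention of the sketch. It only shows that each saved x29 leads to the caller's record, where the saved x30 (return address) sits next to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The pair that "stp x29, x30, [sp, #-N]!" stores at the top of a frame. */
struct frame_record {
    const struct frame_record *caller_fp; /* saved x29: caller's record */
    uint64_t lr;                          /* saved x30: return address  */
};

/*
 * Follow the x29 chain starting at fp and copy up to max saved return
 * addresses into out[]. Returns how many entries were written.
 * A NULL caller_fp ends the walk (an assumption of this sketch).
 */
size_t walk_frames(const struct frame_record *fp, uint64_t *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    while (fp != NULL && n < max) {
        out[n++] = fp->lr;
        fp = fp->caller_fp;
    }
    return n;
}
```

With the example program, the record stored by fun1 would hold 0x4005a8 as its return address (the instruction after `bl 400548` in main), and its saved x29 would point at the record stored by main.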
3. Compile stage
3.1 blk_update_request with ftrace disabled
00000000000012ac <blk_update_request>:
    12ac:       d10183ff        sub     sp, sp, #0x60
    12b0:       a9017bfd        stp     x29, x30, [sp,#16]
    12b4:       910043fd        add     x29, sp, #0x10
    12b8:       a90253f3        stp     x19, x20, [sp,#32]
    12bc:       a9035bf5        stp     x21, x22, [sp,#48]
    12c0:       a90463f7        stp     x23, x24, [sp,#64]
    12c4:       f9002bf9        str     x25, [sp,#80]
    12c8:       aa0003f6        mov     x22, x0
    12cc:       53001c38        uxtb    w24, w1
    12d0:       2a0203f5        mov     w21, w2
    12d4:       2a1803e0        mov     w0, w24
    12d8:       94000000        bl      12c
    ...
With the kernel configuration option CONFIG_FTRACE disabled, the disassembly of blk_update_request contains no hook function.
3.2 blk_update_request with ftrace enabled
0000000000003f10 <blk_update_request>:
    3f10:       d10183ff        sub     sp, sp, #0x60
    3f14:       a9017bfd        stp     x29, x30, [sp,#16]
    3f18:       910043fd        add     x29, sp, #0x10
    3f1c:       a90253f3        stp     x19, x20, [sp,#32]
    3f20:       a9035bf5        stp     x21, x22, [sp,#48]
    3f24:       a90463f7        stp     x23, x24, [sp,#64]
    3f28:       f9002bf9        str     x25, [sp,#80]
    3f2c:       aa0003f6        mov     x22, x0
    3f30:       53001c38        uxtb    w24, w1
    3f34:       2a0203f5        mov     w21, w2
    3f38:       aa1e03e0        mov     x0, x30
    3f3c:       94000000        bl      0 <_mcount>
    ...
With the kernel configuration option CONFIG_FTRACE enabled, blk_update_request gains the following call (preceded by mov x0, x30, which passes the return address):
    3f3c:       94000000        bl      0 <_mcount>
So who inserts bl 0 <_mcount>, and when? The compiler does, at compile time. Building with -pg makes the compiler insert bl 0 <_mcount> into every traceable function, and the address of each such call site is recorded in a section named __mcount_loc (by the compiler option -mrecord-mcount where the compiler supports it, otherwise by the kernel's recordmcount tool). The relocation entries of blk-core.o show a large number of addresses that must be relocated to the _mcount function. One of them, 3f3c, lies inside blk_update_request; at the link stage it is relocated to the address of _mcount.
ubuntu@VM-0-9-ubuntu:~/qemu/kernel/linux/block$ aarch64-linux-gnu-objdump -r blk-core.o | grep _mcount
0000000000000014 R_AARCH64_CALL26  _mcount
000000000000005c R_AARCH64_CALL26  _mcount
00000000000000ac R_AARCH64_CALL26  _mcount
0000000000000108 R_AARCH64_CALL26  _mcount
0000000000000164 R_AARCH64_CALL26  _mcount
00000000000001bc R_AARCH64_CALL26  _mcount
0000000000000214 R_AARCH64_CALL26  _mcount
...
0000000000003f3c R_AARCH64_CALL26  _mcount
...
blk-core.o also has a relocation section, .rela__mcount_loc, which records the address of every call site inside a traceable function that is relocated to _mcount.
ubuntu@VM-0-9-ubuntu:~/qemu/kernel/linux/block$ aarch64-linux-gnu-objdump -r blk-core.o
...
RELOCATION RECORDS FOR [__mcount_loc]:
OFFSET           TYPE              VALUE
0000000000000000 R_AARCH64_ABS64   .text+0x0000000000000014
0000000000000008 R_AARCH64_ABS64   .text+0x000000000000005c
0000000000000010 R_AARCH64_ABS64   .text+0x00000000000000ac
0000000000000018 R_AARCH64_ABS64   .text+0x0000000000000108
...
00000000000001b8 R_AARCH64_ABS64   .text+0x0000000000003f3c
...
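The records above turn into a table of absolute call-site addresses once the object is linked. The following is a minimal sketch of that resolution, assuming the usual R_AARCH64_ABS64 rule (slot value = symbol address + addend). The names `abs64_reloc` and `resolve_mcount_loc` are invented for illustration. The base address 0xffff8000104e04b8 used in the usage note is not printed anywhere in this article; it is inferred by subtracting the object offset 0x3f10 from the linked address of blk_update_request shown in section 4.2, under the assumption that the .text of blk-core.o stays contiguous in vmlinux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One R_AARCH64_ABS64 record from .rela__mcount_loc (symbol is .text). */
struct abs64_reloc {
    uint64_t offset; /* byte offset of the 8-byte slot in __mcount_loc */
    uint64_t addend; /* offset of the call site inside .text           */
};

/*
 * Fill the __mcount_loc table the way R_AARCH64_ABS64 is resolved:
 * slot value = text_base + addend, where text_base is the final
 * address of this object's .text (an input to this sketch).
 */
void resolve_mcount_loc(uint64_t text_base, const struct abs64_reloc *rel,
                        size_t nrel, uint64_t *table, size_t nslots)
{
    for (size_t i = 0; i < nrel; i++) {
        size_t slot = rel[i].offset / sizeof(uint64_t);

        assert(rel[i].offset % sizeof(uint64_t) == 0);
        assert(slot < nslots);
        table[slot] = text_base + rel[i].addend;
    }
}
```

Under those assumptions, the record at offset 0x1b8 (slot 55) with addend 0x3f3c resolves to 0xffff8000104e43f4, which is the address of the `bl _mcount` instruction in the linked blk_update_request.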
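4. Link stage
4.1 blk_update_request with ftrace disabled
With the kernel configuration option CONFIG_FTRACE disabled, the link stage is the same as the compile stage: the disassembly of blk_update_request contains no hook function.
4.2 blk_update_request with ftrace enabled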
ffff8000104e43c8 <blk_update_request>:
ffff8000104e43c8:       d10183ff        sub     sp, sp, #0x60
ffff8000104e43cc:       a9017bfd        stp     x29, x30, [sp,#16]
ffff8000104e43d0:       910043fd        add     x29, sp, #0x10
ffff8000104e43d4:       a90253f3        stp     x19, x20, [sp,#32]
ffff8000104e43d8:       a9035bf5        stp     x21, x22, [sp,#48]
ffff8000104e43dc:       a90463f7        stp     x23, x24, [sp,#64]
ffff8000104e43e0:       f9002bf9        str     x25, [sp,#80]
ffff8000104e43e4:       aa0003f6        mov     x22, x0
ffff8000104e43e8:       53001c38        uxtb    w24, w1
ffff8000104e43ec:       2a0203f5        mov     w21, w2
ffff8000104e43f0:       aa1e03e0        mov     x0, x30
ffff8000104e43f4:       97ed1fde        bl      ffff80001002c36c <_mcount>
ffff8000104e43f8:       2a1803e0        mov     w0, w24
ffff8000104e43fc:       97fff432        bl      ffff8000104e14c4
...
At the link stage, with CONFIG_FTRACE enabled, the compile-stage instruction
    3f3c:       94000000        bl      0 <_mcount>
has been replaced with:
ffff8000104e43f4:       97ed1fde        bl      ffff80001002c36c <_mcount>
The disassembly of _mcount is:
ffff80001002c36c <_mcount>:
ffff80001002c36c:       d65f03c0        ret
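The change from 94000000 to 97ed1fde can be checked by hand: a BL instruction keeps a signed 26-bit word offset in its low bits, and the R_AARCH64_CALL26 relocation fills that field in at link time. Below is a small decoder sketch in C; `is_bl` and `bl_target` are names invented here, not kernel or binutils functions.

```c
#include <assert.h>
#include <stdint.h>

/* True if insn is an AArch64 BL: bits [31:26] are 0b100101. */
int is_bl(uint32_t insn)
{
    return (insn & 0xfc000000u) == 0x94000000u;
}

/*
 * Branch target of a BL located at pc. The low 26 bits are a signed
 * word offset, so target = pc + sign_extend(imm26) * 4.
 */
uint64_t bl_target(uint64_t pc, uint32_t insn)
{
    int64_t imm26 = (int64_t)(insn & 0x03ffffffu);

    assert(is_bl(insn));
    if (imm26 & 0x02000000)     /* bit 25 set: negative offset */
        imm26 -= 0x04000000;
    return pc + (uint64_t)(imm26 * 4);
}
```

Feeding it the linked instruction, `bl_target(0xffff8000104e43f4, 0x97ed1fde)` gives 0xffff80001002c36c, the address of _mcount in the listing. The same arithmetic maps 97fff432 at ffff8000104e43fc to ffff8000104e14c4.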
5. Run stage
5.1 blk_update_request after ftrace_init has run
(gdb) x/20i blk_update_request
   0xffff8000104e43c8 <blk_update_request>:     sub     sp, sp, #0x60
   0xffff8000104e43cc <blk_update_request+4>:   stp     x29, x30, [sp,#16]
   0xffff8000104e43d0 <blk_update_request+8>:   add     x29, sp, #0x10
   0xffff8000104e43d4 <blk_update_request+12>:  stp     x19, x20, [sp,#32]
   0xffff8000104e43d8 <blk_update_request+16>:  stp     x21, x22, [sp,#48]
   0xffff8000104e43dc <blk_update_request+20>:  stp     x23, x24, [sp,#64]
   0xffff8000104e43e0 <blk_update_request+24>:  str     x25, [sp,#80]
   0xffff8000104e43e4 <blk_update_request+28>:  mov     x22, x0
   0xffff8000104e43e8 <blk_update_request+32>:  uxtb    w24, w1
   0xffff8000104e43ec <blk_update_request+36>:  mov     w21, w2
   0xffff8000104e43f0 <blk_update_request+40>:  mov     x0, x30
   0xffff8000104e43f4 <blk_update_request+44>:  nop
   0xffff8000104e43f8 <blk_update_request+48>:  mov     w0, w24
   0xffff8000104e43fc <blk_update_request+52>:  bl      0xffff8000104e14c4
During start_kernel the kernel calls ftrace_init, which replaces the call to _mcount in every traceable function. As the listing above shows, the link-stage instruction bl ffff80001002c36c <_mcount> has been replaced with a nop.
5.2 Tracing blk_update_request
Run the following commands to trace blk_update_request:
ubuntu@VM-0-9-ubuntu:~$ echo blk_update_request > /sys/kernel/debug/tracing/set_ftrace_filter
ubuntu@VM-0-9-ubuntu:~$ echo function > /sys/kernel/debug/tracing/current_tracer
Now look at the disassembly of blk_update_request again:
(gdb) x/20i blk_update_request
   0xffff8000104e43c8 <blk_update_request>:     sub     sp, sp, #0x60
   0xffff8000104e43cc <blk_update_request+4>:   stp     x29, x30, [sp,#16]
   0xffff8000104e43d0 <blk_update_request+8>:   add     x29, sp, #0x10
   0xffff8000104e43d4 <blk_update_request+12>:  stp     x19, x20, [sp,#32]
   0xffff8000104e43d8 <blk_update_request+16>:  stp     x21, x22, [sp,#48]
   0xffff8000104e43dc <blk_update_request+20>:  stp     x23, x24, [sp,#64]
   0xffff8000104e43e0 <blk_update_request+24>:  str     x25, [sp,#80]
   0xffff8000104e43e4 <blk_update_request+28>:  mov     x22, x0
   0xffff8000104e43e8 <blk_update_request+32>:  uxtb    w24, w1
   0xffff8000104e43ec <blk_update_request+36>:  mov     w21, w2
   0xffff8000104e43f0 <blk_update_request+40>:  mov     x0, x30
   0xffff8000104e43f4 <blk_update_request+44>:  bl      0xffff80001002c370 <ftrace_caller>
   0xffff8000104e43f8 <blk_update_request+48>:  mov     w0, w24
   0xffff8000104e43fc <blk_update_request+52>:  bl      0xffff8000104e14c4
The nop that was previously in blk_update_request has been replaced with bl 0xffff80001002c370, a call to ftrace_caller.
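The switch between nop and bl at this call site amounts to rewriting one 32-bit word. The sketch below models that on an ordinary variable, under stated assumptions: `encode_bl` and `patch_insn` are names invented for illustration, and the check-then-write step is a simplification. The kernel's arm64 code (ftrace_make_call and ftrace_make_nop in arch/arm64/kernel/ftrace.c) works on live kernel text and also has to deal with instruction-cache maintenance, which is left out here.

```c
#include <assert.h>
#include <stdint.h>

#define AARCH64_NOP 0xd503201fu /* encoding of "nop" */

/* Encode "bl target" for an instruction located at pc (range +/-128 MiB). */
uint32_t encode_bl(uint64_t pc, uint64_t target)
{
    int64_t off = (int64_t)(target - pc);

    assert((off & 3) == 0);
    assert(off >= -(INT64_C(1) << 27) && off < (INT64_C(1) << 27));
    return 0x94000000u | ((uint32_t)((uint64_t)off >> 2) & 0x03ffffffu);
}

/*
 * Replace *site with new_insn only if it currently holds expected.
 * Returns 0 on success, -1 if the site holds something else.
 */
int patch_insn(uint32_t *site, uint32_t expected, uint32_t new_insn)
{
    if (*site != expected)
        return -1;
    *site = new_insn;
    return 0;
}
```

For the call site at 0xffff8000104e43f4 and the target 0xffff80001002c370, `encode_bl` produces 0x97ed1fdf, one word further than the 0x97ed1fde that pointed at _mcount. Checking the old value first is what lets a patcher refuse to touch a site that does not contain what it expects.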
Disassembling ftrace_caller at this point gives:
(gdb) disassemble ftrace_caller
Dump of assembler code for function ftrace_caller:
   0xffff80001002c370 <+0>:     stp     x29, x30, [sp,#-16]!
   0xffff80001002c374 <+4>:     mov     x29, sp
   // x30 is the return address into blk_update_request; subtracting 4 gives the
   // address of the patched call site (the bl ftrace_caller instruction).
   // It is passed as argument 0 to ftrace_ops_no_ops.
   0xffff80001002c378 <+8>:     sub     x0, x30, #0x4
   // Following the arm64 stack frame layout above, [x29] is the saved frame
   // pointer of the calling function, blk_update_request.
   // [[x29] + 8] is the lr saved in that frame: the return address into
   // blk_mq_end_request (the instruction after its call). It is argument 1.
   0xffff80001002c37c <+12>:    ldr     x1, [x29]
   0xffff80001002c380 <+16>:    ldr     x1, [x1,#8]
   0xffff80001002c384 <+20>:    bl      0xffff800010188ffc <ftrace_ops_no_ops>
   0xffff80001002c388 <+24>:    nop
   0xffff80001002c38c <+28>:    ldp     x29, x30, [sp],#16
   0xffff80001002c390 <+32>:    ret
End of assembler dump.
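The two arguments set up by the instructions above can be written out as plain C. This is a sketch for illustration only: `tracer_args` and `ftrace_caller_args` are invented names, and the function takes the traced function's frame record directly, whereas the assembly first reloads it from the record that ftrace_caller itself has just stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct frame_record {
    const struct frame_record *caller_fp; /* saved x29 */
    uint64_t lr;                          /* saved x30 */
};

struct tracer_args {
    uint64_t ip;        /* x0: address of the patched call site          */
    uint64_t parent_ip; /* x1: return address into the function's caller */
};

/*
 * Model of the argument setup in ftrace_caller.
 *   x30_on_entry: return address into the traced function
 *   traced_fp:    x29 on entry, the traced function's frame record
 */
struct tracer_args ftrace_caller_args(uint64_t x30_on_entry,
                                      const struct frame_record *traced_fp)
{
    struct tracer_args a;

    assert(traced_fp != NULL);
    /* sub x0, x30, #4: every AArch64 instruction is 4 bytes long */
    a.ip = x30_on_entry - 4;
    /* ldr x1, [x29] reloads traced_fp; ldr x1, [x1, #8] reads its saved lr */
    a.parent_ip = traced_fp->lr;
    return a;
}
```

For blk_update_request the return address on entry is 0xffff8000104e43f8, so `ip` comes out as 0xffff8000104e43f4, the patched call site. The value of `parent_ip` depends on where blk_mq_end_request made its call, which this article does not show.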
As the dump shows, ftrace_caller calls ftrace_ops_no_ops. The source of ftrace_ops_no_ops shows that it walks the ftrace_ops_list list and runs the callback registered on each entry. Let us see which funcs are linked on ftrace_ops_list:
(gdb) p *ftrace_ops_list
$4 = {
  func = 0xffff8000101a0b1c <function_trace_call>,    // the only func on the ftrace_ops_list list
  next = 0xffff800011c5a438,    // next is the list terminator, so the list holds only one func
  flags = 8273,
  private = 0xffff800011cf94e8,
  saved_func = 0xffff8000101a0b1c <function_trace_call>,
  local_hash = {
    notrace_hash = 0xffff800010cf7118,
    filter_hash = 0xffff00000720af80,
    regex_lock = {
      owner = {
        counter = 0
      },
......
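The list dumped above is what ftrace_ops_no_ops iterates over. The following is a minimal model of such a walk, with loud simplifications: `struct ops` keeps only the func and next fields of struct ftrace_ops, the callback type drops the pt_regs parameter, `list_end` stands in for the kernel's ftrace_list_end sentinel, and the real code also checks filters and runs under RCU protection, none of which is modeled. All names in the block are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct ops;
typedef void (*trace_func_t)(unsigned long ip, unsigned long parent_ip,
                             struct ops *op);

/* Cut-down stand-in for struct ftrace_ops: only func and next are kept. */
struct ops {
    trace_func_t func;
    struct ops *next;
};

/* Sentinel that terminates the list, playing the role of ftrace_list_end. */
struct ops list_end = { NULL, NULL };

/* Example callback: remembers the last call site it was invoked for. */
unsigned long last_ip;

void remember_ip(unsigned long ip, unsigned long parent_ip, struct ops *op)
{
    (void)parent_ip;
    (void)op;
    last_ip = ip;
}

/* Call every registered callback; returns how many were invoked. */
int call_all_ops(struct ops *list, unsigned long ip, unsigned long parent_ip)
{
    int calls = 0;

    for (struct ops *op = list; op != &list_end; op = op->next) {
        assert(op != NULL && op->func != NULL);
        op->func(ip, parent_ip, op);
        calls++;
    }
    return calls;
}
```

A list with a single entry whose next pointer is the sentinel, like the one in the gdb output, results in exactly one callback invocation per traced call.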
The ftrace_ops_list list contains only function_trace_call, so ftrace_caller ends up calling function_trace_call. Step by step, the analysis above has led us to the hook function of blk_update_request: function_trace_call. Its prototype is shown below. The parameter ip points at the patched call site inside blk_update_request (the bl ftrace_caller instruction), and parent_ip points into blk_mq_end_request:
static void function_trace_call(unsigned long ip, unsigned long parent_ip,
                                struct ftrace_ops *op, struct pt_regs *pt_regs)
The next section follows how the hook function is constructed and replaced.
6. How the hook function is replaced
Earlier we saw the nop in blk_update_request replaced with bl ftrace_caller. Where is ftrace_caller defined? arch/arm64/kernel/entry-ftrace.S contains the following definition:
/*
 * void ftrace_caller(unsigned long return_address)
 * @return_address: return address to instrumented function
 *
 * This function is a counterpart of _mcount() in 'static' ftrace, and
 * makes calls to:
 *     - tracer function to probe instrumented function's entry,
 *     - ftrace_graph_caller to set up an exit hook
 */
SYM_FUNC_START(ftrace_caller)
        mcount_enter

        mcount_get_pc0  x0              //     function's pc
        mcount_get_lr   x1              //     function's lr

SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)      // tracer(pc, lr);
        nop                             // This will be replaced with "bl xxx"
                                        // where xxx can be any kind of tracer.

#ifdef CONFIG_FUNCTION_GRAPH_TRACER
SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL) // ftrace_graph_caller();
        nop                             // If enabled, this will be replaced
                                        // "b ftrace_graph_caller"
#endif

        mcount_exit
SYM_FUNC_END(ftrace_caller)
In gdb, the disassembly of ftrace_caller looks like this (both patch slots still hold nop):
(gdb) disassemble ftrace_caller
Dump of assembler code for function ftrace_caller:
   0xffff80001002c370 <+0>:     stp     x29, x30, [sp,#-16]!
   0xffff80001002c374 <+4>:     mov     x29, sp
   0xffff80001002c378 <+8>:     sub     x0, x30, #0x4
   0xffff80001002c37c <+12>:    ldr     x1, [x29]
   0xffff80001002c380 <+16>:    ldr     x1, [x1,#8]
   0xffff80001002c384 <+20>:    nop                 /* ftrace_call */
   0xffff80001002c388 <+24>:    nop                 /* ftrace_graph_call, not discussed here */
   0xffff80001002c38c <+28>:    ldp     x29, x30, [sp],#16
   0xffff80001002c390 <+32>:    ret
End of assembler dump.
Running echo blk_update_request > set_ftrace_filter effectively sets the hook-replacement flag for blk_update_request. Running echo function > current_tracer then checks that flag and performs the replacement, which produces the following call chain:
/sys/kernel/debug/tracing # echo function > current_tracer
[   45.632002] CPU: 0 PID: 111 Comm: sh Not tainted 5.10.0-dirty #35
[   45.632457] Hardware name: linux,dummy-virt (DT)
[   45.632697] Call trace:
[   45.632981]  dump_backtrace+0x0/0x1f8
[   45.633169]  show_stack+0x2c/0x7c
[   45.634039]  ftrace_modify_all_code+0x38/0x118
[   45.634269]  arch_ftrace_update_code+0x10/0x18
[   45.634495]  ftrace_run_update_code+0x2c/0x48
[   45.634727]  ftrace_startup_enable+0x40/0x4c
[   45.634943]  ftrace_startup+0xec/0x11c
[   45.635137]  register_ftrace_function+0x68/0x84
[   45.635369]  function_trace_init+0xa0/0xc4
[   45.635574]  tracer_init+0x28/0x34
[   45.635768]  tracing_set_tracer+0x11c/0x17c
[   45.635982]  tracing_set_trace_write+0x124/0x170
[   45.636224]  vfs_write+0x16c/0x368
[   45.636409]  ksys_write+0x74/0x10c
[   45.636594]  __arm64_sys_write+0x28/0x34
[   45.636923]  el0_svc_common+0xf0/0x174
[   45.637138]  do_el0_svc+0x84/0x90
[   45.637330]  el0_svc+0x1c/0x28
[   45.637510]  el0_sync_handler+0x3c/0xac
[   45.637721]  el0_sync+0x140/0x180
Looking further into the code of ftrace_modify_all_code, we can see the following call flow:
ftrace_modify_all_code(command)
  \--ftrace_update_ftrace_func(ftrace_ops_list_func)
       |--pc = (unsigned long)