Inaccurate code coverage numbers from luacov

Background

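I have recently been adding unit tests to an OpenResty-based application. We chose the busted framework to run them. busted integrates luacov, so getting a coverage report only takes a little configuration: adding the coverage option to the .busted configuration file turns coverage output on.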

return { default = { coverage = true } }

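With this in place, running make test produces a luacov.report.out file. It shows which lines of each file were covered while the unit tests ran, plus a summary of the overall coverage, similar to the following: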

---------------------------------------------------------
Summary
---------------------------------------------------------
File                                Hits Missed Coverage
--------------------------------------------------------
foo.lua                             89   41     68.46%
bar.lua                             29   16     64.44%
...
--------------------------------------------------------
Total                               774  739    51.16%
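The Coverage column is consistent with Hits / (Hits + Missed). For the total row, 774 / (774 + 739) = 51.16%. The small C helper below reproduces the figures. Its names (coverage_percent, matches_column) are hypothetical and not part of luacov. It also makes plain that every line wrongly counted as Missed drags the percentage down.

```c
#include <assert.h>

/* Coverage as shown in luacov's summary: hits / (hits + missed) * 100. */
static double coverage_percent(unsigned hits, unsigned missed)
{
    unsigned total = hits + missed;
    assert(total > 0); /* a file with no counted lines has no ratio */
    return 100.0 * (double)hits / (double)total;
}

/* Nonzero if the computed value rounds to the two-decimal figure `shown`. */
static int matches_column(unsigned hits, unsigned missed, double shown)
{
    double diff = coverage_percent(hits, missed) - shown;
    return diff >= -0.005 && diff < 0.005;
}
```

For example, matches_column(89, 41, 68.46) holds for the foo.lua row, and matches_column(774, 739, 51.16) holds for the total.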

The problem

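On its own the report above looks fine. A closer look at the line-by-line output for individual files, however, shows that luacov's counts are not accurate. In foo.lua, for example, the line just before each closing brace is marked as not covered (**0). Yet these lines, like the line above them, should clearly not take part in the coverage calculation at all: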

  5     FOO = {
            errId = "000001",
**0         errMsg = "error foo"
  5     },
  5     BAR = {
            errId = "000002",
**0         errMsg = 'error bar'
  5     },
  5     FOO_BAR = {
            errId = "000003",
**0         errMsg = 'error foo bar'
  5     },

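A GitHub issue on luacov reports the same problem. The author of luacheck replied that, if he recalled correctly ("IIRC"), the behavior is related to LuaJIT. He also wrote cluacov, a C extension for luacov that improves both the accuracy of the statistics and their speed.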

So I ran luarocks install cluacov. As luacov's source shows, luacov uses the extension automatically whenever it is installed, so no configuration is needed:

local cluacov_ok = pcall(require, "cluacov.version")
local deepactivelines

if cluacov_ok then
   deepactivelines = require("cluacov.deepactivelines")
end
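Roughly speaking, what cluacov changes is the set of lines luacov treats as "active" when it tallies Hits and Missed. As I understand it, plain luacov has to infer executable lines from the source text. deepactivelines instead collects the lines that actually carry bytecode in the loaded chunk and, recursively, in its nested functions. The C sketch below is a simplified model of that tallying rule, not luacov's actual code; struct tally and count_lines are hypothetical names. It shows how a line that is wrongly considered active but never executes becomes a Missed line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-file tally mirroring the Hits / Missed columns. */
struct tally {
    unsigned hits;
    unsigned missed;
};

/*
 * hit_counts[i]: times line i+1 ran, as recorded by the coverage hook.
 * active[i]:     nonzero if line i+1 is treated as executable code.
 * Inactive lines (blank lines, comments, lines without bytecode) are skipped,
 * so they count neither as hits nor as misses.
 */
static struct tally count_lines(const unsigned *hit_counts,
                                const unsigned char *active, size_t nlines)
{
    struct tally t = {0, 0};
    assert(nlines == 0 || (hit_counts != NULL && active != NULL));
    for (size_t i = 0; i < nlines; i++) {
        if (!active[i])
            continue;              /* not code: ignored entirely */
        if (hit_counts[i] > 0)
            t.hits++;              /* executed at least once */
        else
            t.missed++;            /* active but never executed: the **0 case */
    }
    return t;
}
```

Take the four lines of the FOO block from the listing above, with hit counts {5, 0, 0, 5}. A guessed active set {1, 0, 1, 1} yields 2 hits and 1 miss. A bytecode-derived set {1, 0, 0, 1} yields 2 hits and 0 misses.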

Full of hope, I ran make test again, and it failed outright with an error. So I ran luacov on its own to see what was going on.

Running luajit -lluacov xxx.lua crashed immediately with "core dumped", so I opened the core dump in gdb:

(gdb) bt
#0  0x00007f787f4dd1d4 in add_activelines () from /usr/local/lib/lua/5.1/cluacov/deepactivelines.so
#1  0x00007f787f4dd342 in l_deepactivelines () from /usr/local/lib/lua/5.1/cluacov/deepactivelines.so
#2  0x0000000000407ec2 in lj_BC_FUNCC ()
#3  0x000000000040a5ff in gc_call_finalizer (g=g@entry=0x7f787f4a83f0, L=L@entry=0x7f787f4a8380, mo=<optimized out>,
    o=o@entry=0x7f787f4c7630) at lj_gc.c:511
#4  0x000000000040a765 in gc_finalize (L=L@entry=0x7f787f4a8380) at lj_gc.c:558
#5  0x000000000040be48 in lj_gc_finalize_udata (L=L@entry=0x7f787f4a8380) at lj_gc.c:565
#6  0x0000000000414591 in cpfinalize (L=0x7f787f4a8380, dummy=<optimized out>, ud=<optimized out>) at lj_state.c:272
#7  0x00000000004082b8 in lj_vm_cpcall ()
#8  0x0000000000414a24 in lua_close (L=0x7f787f4a8380) at lj_state.c:298
#9  0x0000000000404df8 in main (argc=3, argv=<optimized out>) at luajit.c:584

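The module was not built with the -g debugging option, so the backtrace only shows that the crash happens inside deepactivelines.so. That is exactly the C extension I had just installed, so clearly I was in for some trouble. I cloned the cluacov source, rebuilt it with debugging enabled, and replaced deepactivelines.so with the new build: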

(gdb) bt
#0  0x00007f38e8c002a4 in add_activelines (L=L@entry=0x7f38e8bcb380, proto=0xf66ffbf) at src/cluacov/deepactivelines.c:58
#1  0x00007f38e8c00412 in l_deepactivelines (L=0x7f38e8bcb380) at src/cluacov/deepactivelines.c:97
#2  0x0000000000407ec2 in lj_BC_FUNCC ()
#3  0x000000000040a5ff in gc_call_finalizer (g=g@entry=0x7f38e8bcb3f0, L=L@entry=0x7f38e8bcb380, mo=<optimized out>,
    o=o@entry=0x7f38e8bd5ab8) at lj_gc.c:511
#4  0x000000000040a765 in gc_finalize (L=L@entry=0x7f38e8bcb380) at lj_gc.c:558
#5  0x000000000040be48 in lj_gc_finalize_udata (L=L@entry=0x7f38e8bcb380) at lj_gc.c:565
#6  0x0000000000414591 in cpfinalize (L=0x7f38e8bcb380, dummy=<optimized out>, ud=<optimized out>) at lj_state.c:272
#7  0x00000000004082b8 in lj_vm_cpcall ()
#8  0x0000000000414a24 in lua_close (L=0x7f38e8bcb380) at lj_state.c:298
#9  0x0000000000404df8 in main (argc=3, argv=<optimized out>) at luajit.c:584
(gdb) f 0
#0  0x00007f38e8c002a4 in add_activelines (L=L@entry=0x7f38e8bcb380, proto=0xf66ffbf) at src/cluacov/deepactivelines.c:58
58	    const void *lineinfo = proto_lineinfo(proto);
(gdb) p *proto
Cannot access memory at address 0xf66ffbf

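This time we can see that the crash happens in add_activelines, at proto_lineinfo(proto). According to the code, proto is a pointer to a GCproto, but it holds an invalid address. Here is GCproto as defined in the lj2 headers bundled with cluacov: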

typedef struct GCproto {
  GCHeader;
  uint8_t numparams;    /* Number of parameters. */
  uint8_t framesize;    /* Fixed frame size. */
  MSize sizebc;         /* Number of bytecode instructions. */
  GCRef gclist;
  MRef k;               /* Split constant array (points to the middle). */
  MRef uv;              /* Upvalue list. local slot|0x8000 or parent uv idx. */
  MSize sizekgc;        /* Number of collectable constants. */
  MSize sizekn;         /* Number of lua_Number constants. */
  MSize sizept;         /* Total size including colocated arrays. */
  uint8_t sizeuv;       /* Number of upvalues. */
  uint8_t flags;        /* Miscellaneous flags (see below). */
  uint16_t trace;       /* Anchor for chain of root traces. */
  /* ------ The following fields are for debugging/tracebacks only ------ */
  GCRef chunkname;      /* Name of the chunk this function was defined in. */
  BCLine firstline;     /* First line of the function definition. */
  BCLine numline;       /* Number of lines for the function definition. */
  MRef lineinfo;        /* Compressed map from bytecode ins. to source line. */
  MRef uvinfo;          /* Upvalue names. */
  MRef varinfo;         /* Names and compressed extents of local variables. */
} GCproto;

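The LuaJIT I use was built from OpenResty 1.19.3.1, which enables GC64 mode by default. These headers, however, contain no GC64-related macros at all, which made them a likely culprit. Sure enough, GCproto in OpenResty's LuaJIT has an extra conditional block that adds a field when GC64 is enabled: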

#if LJ_GC64
   uint32_t unused_gc64;
#endif
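The C sketch below is a simplified model of the two GCproto layouts, and it shows concretely why a header mismatch crashes. It assumes an x86_64 build and that LuaJIT's GCRef/MRef are 32-bit without GC64 and 64-bit with it. Ref32, Ref64, ProtoNoGC64, ProtoGC64 and lineinfo_skew are hypothetical names; the field order follows the struct above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for LuaJIT's GCRef/MRef (assumption: 32-bit without GC64,
 * 64-bit with GC64). */
typedef struct { uint32_t v; } Ref32;
typedef struct { uint64_t v; } Ref64;

static_assert(sizeof(Ref32) == 4, "non-GC64 reference is 4 bytes");
static_assert(sizeof(Ref64) == 8, "GC64 reference is 8 bytes");

/* GCproto as seen through a header without any GC64 handling. */
typedef struct {
    Ref32 nextgc; uint8_t marked, gct;   /* GCHeader */
    uint8_t numparams, framesize;
    uint32_t sizebc;                     /* MSize */
    Ref32 gclist, k, uv;
    uint32_t sizekgc, sizekn, sizept;
    uint8_t sizeuv, flags;
    uint16_t trace;
    Ref32 chunkname;
    int32_t firstline, numline;          /* BCLine */
    Ref32 lineinfo, uvinfo, varinfo;
} ProtoNoGC64;

/* The same struct as a GC64 build lays it out. */
typedef struct {
    Ref64 nextgc; uint8_t marked, gct;
    uint8_t numparams, framesize;
    uint32_t sizebc;
    uint32_t unused_gc64;                /* only present when LJ_GC64 is set */
    Ref64 gclist, k, uv;
    uint32_t sizekgc, sizekn, sizept;
    uint8_t sizeuv, flags;
    uint16_t trace;
    Ref64 chunkname;
    int32_t firstline, numline;
    Ref64 lineinfo, uvinfo, varinfo;
} ProtoGC64;

/* Distance between where each view expects `lineinfo`; nonzero means code
 * built against the first header reads the wrong bytes from a GC64 object. */
static size_t lineinfo_skew(void)
{
    return offsetof(ProtoGC64, lineinfo) - offsetof(ProtoNoGC64, lineinfo);
}
```

On x86_64 this model places lineinfo at offset 52 in the non-GC64 view and at offset 80 in the GC64 view. Both the wider reference fields and the extra unused_gc64 field contribute to the shift. A module compiled against the non-GC64 header therefore reads almost every field from the wrong bytes. That fits the garbage proto pointer gdb reported.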

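So I replaced the corresponding header files in cluacov with their counterparts from OpenResty's LuaJIT, rebuilt, and swapped in the new deepactivelines.so.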

Problem solved

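With the newly built .so, the crash and the core dump were gone. The coverage reported for the same test code with the cluacov extension is shown below. It is 2.48 percentage points higher than with plain luacov (53.64% vs. 51.16%):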

---------------------------------------------------------
Summary
---------------------------------------------------------
File                                  Hits Missed Coverage
----------------------------------------------------------
foo.lua                               89   2      97.80%
bar.lua                               29   16     64.44%
...
----------------------------------------------------------
Total                                 774  669    53.64%

Looking at the per-line output for foo.lua, the annoying **0 markers are gone:

  5     FOO = {
            errId = "000001",
            errMsg = "error foo"
  5     },
  5     BAR = {
            errId = "000002",
            errMsg = 'error bar'
  5     },
  5     FOO_BAR = {
            errId = "000003",
            errMsg = 'error foo bar'
  5     },

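That should settle the problem. I also deployed the rebuilt .so to the CI machines, so coverage reports from now on should be more accurate. The code changes are on GitHub.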

