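Shellphish, a team from UCSB, designed and built the CRS (Cyber Reasoning System) Mechaphish to compete in DARPA's CGC (Cyber Grand Challenge). The system includes Driller, an automated vulnerability discovery module; Rex, an automatic exploit generation engine; Patcherex, an automatic patching module; and angrop, a ROP chain generation module. This article introduces Rex and, by reading its source code, focuses on how it reproduces a crash and decides whether that crash is exploitable. I am very much a beginner, so there are bound to be mistakes in what follows; corrections from more experienced readers are welcome. Orz…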

I. Overview

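Rex, the automatic exploit generation engine, is built on the hardware emulator QEMU and the binary analysis platform angr, and generates exploits automatically through concolic execution. Its inputs are the target application and a crashing input (the Crash). Rex replays the crashing path, analyzes the register state and memory layout at the moment of the crash, decides whether the crash is exploitable, and then generates an exploit.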

The vulnerability types are defined in the source as follows:
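The original listing is not reproduced here. As a rough sketch only, the category names used by the triage code later in this article can be pictured as string constants on a class. Only the values 'write_what_where' and 'ip_overwrite' are confirmed by the test session in the installation section; the other string values are assumed to follow the same lower-case pattern, and the real class may define further categories.

```python
class Vulnerability:
    """Crash categories referenced by the triage code in this article (sketch)."""
    IP_OVERWRITE = "ip_overwrite"                  # value seen in the test session
    PARTIAL_IP_OVERWRITE = "partial_ip_overwrite"  # assumed value
    BP_OVERWRITE = "bp_overwrite"                  # assumed value
    PARTIAL_BP_OVERWRITE = "partial_bp_overwrite"  # assumed value
    WRITE_WHAT_WHERE = "write_what_where"          # value seen in the test session
    WRITE_X_WHERE = "write_x_where"                # assumed value
    ARBITRARY_READ = "arbitrary_read"              # assumed value
    ARBITRARY_TRANSMIT = "arbitrary_transmit"      # assumed value


# A crash carries a list of such categories.
crash_types = [Vulnerability.WRITE_WHAT_WHERE]
controls_ip = Vulnerability.IP_OVERWRITE in crash_types
print(crash_types, controls_ip)
```

Plain strings keep the categories easy to log and compare, which matches how crash.crash_types prints in the interactive session.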

II. Installation

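There are two ways to install Rex: 1) install the whole of Mechaphish, following its installation documentation; 2) install only Rex, following its reference documentation. The difference is that Mechaphish bundles the vulnerability discovery module Driller, the automatic exploitation module Rex, the automatic patching module Patcherex and the ROP chain generation module angrop. Because the modules are independent of one another, this article installs only Rex. The local environment is Ubuntu 16.04.5 Desktop (64 bit). The dependencies Rex needs during deployment are as follows: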

Some paths need adjusting while the dependencies are being installed; following the error messages is enough. The role of each dependency is as follows:

Component         Function
angr              A powerful and user-friendly binary analysis platform!
tracer            Utilities for generating dynamic traces.
angrop            angrop is a rop gadget finder and chain builder.
compilerex        POV templates and compilation support for CGC binaries. compilerex is a hacky cgc binary compiler
shellphish-qemu   Shellphish's pip-installable package of QEMU
povsim            POV simulation for CGC.

Once installation is complete, use the following code to test that Rex works.

>>> import rex
# triage a crash
>>> crash = rex.Crash("./legit_00003", b"\x00\x0b1\xc1\x00\x0c\xeb\xe4\xf1\xf1\x14\r\rM\r\xf3\x1b\r\r\r~\x7f\x1b\xe3\x0c`_222\r\rM\r\xf3\x1b\r\x7f\x002\x7f~\x7f\xe2\xff\x7f\xff\xff\x8b\xc7\xc9\x83\x8b\x0c\xeb\x80\x002\xac\xe2\xff\xff\x00t\x8bt\x8bt_o_\x00t\x8b\xc7\xdd\x83\xc2t~n~~\xac\xe2\xff\xff_k_\x00t\x8b\xc7\xdd\x83\xc2t~n~~\xac\xe2\xff\xff\x00t\x8bt\x8b\xac\xf1\x83\xc2t~c\x00\x00\x00~~\x7f\xe2\xff\xff\x00t\x9e\xac\xe2\xf1\xf2@\x83\xc3t")
>>> crash.crash_types
['write_what_where']
>>> crash.explorable()
True
# explore the crash by setting segfaulting pointers to sane values and re-tracing
>>> crash.explore()
# now we can see that we control instruction pointer
>>> crash.crash_types
['ip_overwrite']
# generate exploits based off of this crash
# it may take several minutes
>>> arsenal = crash.exploit()
# we generated a type 1 POV for every register
>>> len(arsenal.register_setters) # we generate one circumstantial register setter, one shellcode register setter
2
# and one Type 2 which can leak arbitrary memory
>>> len(arsenal.leakers)
1
# exploits are graded based on reliability, and what kind of defenses they can
# bypass, the two best exploits are put into the 'best_type1' and 'best_type2' attributes
>>> arsenal.best_type1.register
'ebp'
# exploits can be dumped in C, Python, or as a compiled POV
>>> arsenal.best_type2.dump_c('legit3_x.c')
>>> arsenal.best_type2.dump_python('legit3_x.py')
>>> arsenal.best_type2.dump_binary('legit3_x.pov')
# also POVs can be tested against a simulation of the CGC architecture
>>> arsenal.best_type1.test_binary()
True

The test results are as follows:

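III. Source Code Analysis

The directory structure of the Rex source:

Looking at the dependencies between the classes, the code divides logically into roughly four parts:

  • Exploit_factory: calls the other modules and is responsible for generating the exploit;
  • Crash: replays the crashing path and decides whether the crash is exploitable;
  • Technique: for an exploitable crash, applies a technique suited to it to generate the exploit;
  • Shellcode_factory: a shellcode repository from which suitable shellcode is selected as needed.

The rest of this article concentrates on the crash exploitability triage.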

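IV. Crash Exploitability Triage

Rex replays the crash path by concolic execution, analyzes the register state and memory layout at the moment of the crash, and decides whether the crash is exploitable. The code for this is concentrated in Crash.py. Readers interested in the underlying principles can refer to the paper "SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis"; the following passage is quoted from it: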


Vulnerable States. Unlike AEG/Mayhem, but similar to AXGEN, we generate exploits by performing concolic execution on crashing program inputs using angr. We drive concolic execution forward, forcing it to follow the same path as a dynamic trace gathered by concretely executing the crashing input applied to the program. Concolic execution is stopped at the point where the program crashed, and we inspect the symbolic state to determine the cause of the crash and measure exploitability. By counting the number of symbolic bits in certain registers, we can triage a crash into a number of categories such as frame pointer overwrite, instruction pointer overwrite, or arbitrary write, among others.


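1. Concrete Execution

Readers interested in how concolic execution works can look it up for themselves. When angr performs concolic execution here, it has to be given crash_addr.

The binary and the PoC are therefore first loaded in QEMU to obtain crash_addr. This is implemented in the Tracer module.

Crash.py calls the Tracer module as follows: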

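# Excerpt: os, tracer, bin_location, binary, input_data, argv and trace_timeout come from the surrounding code in Crash.py.
tracer_args = {
    'ld_linux': os.path.join(bin_location, 'tests/i386/ld-linux.so.2'),
    'library_path': os.path.join(bin_location, 'tests/i386')}
# Run the binary on the PoC under QEMU; r is the result used in the following steps.
r = tracer.QEMURunner(binary=binary, input=input_data, argv=argv,
                      trace_timeout=trace_timeout, **tracer_args)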
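To make the idea of the concrete run more tangible, here is a minimal sketch that uses only the Python standard library. It is not how QEMURunner works internally: it does not use QEMU, records no trace and does not recover crash_addr. It only shows the simplest part of the job, running a target on an input and noticing that it died from a signal. The function name run_concretely and the stand-in target are hypothetical, and the signal check assumes a POSIX system.

```python
import signal
import subprocess
import sys


def run_concretely(argv, input_data, timeout=10):
    """Run a program on input_data and report whether a signal killed it."""
    proc = subprocess.run(argv, input=input_data, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, timeout=timeout)
    # On POSIX, a return code of -N means the child was terminated by signal N.
    crashed = proc.returncode < 0
    sig = signal.Signals(-proc.returncode).name if crashed else None
    return crashed, sig


# Stand-in target (hypothetical): a child Python process that sends itself
# SIGSEGV when its input starts with b"A", and exits normally otherwise.
TARGET = [sys.executable, "-c",
          "import os, signal, sys\n"
          "data = sys.stdin.buffer.read()\n"
          "if data.startswith(b'A'):\n"
          "    os.kill(os.getpid(), signal.SIGSEGV)\n"]

crash_mode, sig = run_concretely(TARGET, b"AAAA")
benign_mode, _ = run_concretely(TARGET, b"hello")
print(crash_mode, sig, benign_mode)
```

A real tracer additionally has to record which basic blocks were executed and where the fault happened, since the symbolic stage needs both.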

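2. Concolic Execution

Once crash_addr is known, angr is configured and concolic execution is run. The key configuration points are:

  • the initial state;
  • the choice of state plugins;
  • the path exploration strategy.

(1) Initial state setup

Configure the save_unconstrained parameter of the simulation_manager. Here r is the return value of tracer.QEMURunner(); r.crash_mode is True when the PoC triggers a crash and False otherwise.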

The program's initial state is set up with full_init_state():

  • Set tracing mode: mode = 'tracing'

  • add_options:

    Option name                              Description
    so.MEMORY_SYMBOLIC_BYTES_MAP             Maintain a mapping of symbolic variable to which memory address it "really" corresponds to, at the paged memory level?
    so.TRACK_ACTION_HISTORY                  track the history of actions through a path (multiple states). This action affects things on the angr level
    so.CONCRETIZE_SYMBOLIC_WRITE_SIZES       Concretize the sizes of symbolic writes to memory
    so.CONCRETIZE_SYMBOLIC_FILE_READ_SIZES   Concretize the sizes of file reads
    so.TRACK_MEMORY_ACTIONS                  Keep a SimAction for each memory read and write

  • remove_options:
    The 'tracing' mode comes with some options preset, so tuning the configuration requires remove_options as well as add_options. The options are defined in ./angr/sim_options.py:

    Option name                    Description
    so.TRACK_REGISTER_ACTIONS      Keep a SimAction for each register read and write
    so.TRACK_TMP_ACTIONS           Keep a SimAction for each temporary variable read and write
    so.TRACK_JMP_ACTIONS           Keep a SimAction for each jump or branch
    so.ACTION_DEPS                 Track dependencies in SimActions
    so.TRACK_CONSTRAINT_ACTIONS    Keep a SimAction for each constraint added
    so.LAZY_SOLVES                 Don't check satisfiability until absolutely necessary
    so.SIMPLIFY_MEMORY_WRITES      Run values stored to memory through z3's simplification
    so.ALL_FILES_EXIST             Attempting to open an unknown file will result in creating it with a symbolic length

  • Set the constraints:
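As a rough picture of the add_options and remove_options bullets above, the effective option set can be thought of as plain set arithmetic: start from the mode's preset, add what is wanted, then remove what is not. The sketch below uses strings in place of angr's option objects. The content of TRACING_PRESET is an assumption made for illustration (the real preset is defined in ./angr/sim_options.py and is not listed in this article), SOME_OTHER_PRESET_OPTION is a hypothetical placeholder, and applying removals after additions is also an assumption.

```python
# Options Rex adds, as listed in the add_options table above.
ADD_OPTIONS = {
    "MEMORY_SYMBOLIC_BYTES_MAP",
    "TRACK_ACTION_HISTORY",
    "CONCRETIZE_SYMBOLIC_WRITE_SIZES",
    "CONCRETIZE_SYMBOLIC_FILE_READ_SIZES",
    "TRACK_MEMORY_ACTIONS",
}

# Options Rex removes, as listed in the remove_options table above.
REMOVE_OPTIONS = {
    "TRACK_REGISTER_ACTIONS",
    "TRACK_TMP_ACTIONS",
    "TRACK_JMP_ACTIONS",
    "ACTION_DEPS",
    "TRACK_CONSTRAINT_ACTIONS",
    "LAZY_SOLVES",
    "SIMPLIFY_MEMORY_WRITES",
    "ALL_FILES_EXIST",
}

# ASSUMPTION: a made-up preset for the 'tracing' mode, for illustration only.
TRACING_PRESET = REMOVE_OPTIONS | {"SOME_OTHER_PRESET_OPTION"}


def effective_options(preset, add_options, remove_options):
    """Combine a mode preset with additions, then apply removals last."""
    return (set(preset) | set(add_options)) - set(remove_options)


opts = effective_options(TRACING_PRESET, ADD_OPTIONS, REMOVE_OPTIONS)
print(sorted(opts))
```

Seen this way, the removals mostly switch off bookkeeping that is costly during a long trace, while memory actions stay tracked because the triage step inspects them.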

(2) State Plugins

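SimState is one of angr's core concepts and is designed with a plugin architecture, so plugins can be chosen to suit the analysis task at hand. Rex uses 'posix' and 'preconstrainer' by default. The plugin sources are in ./angr/state_plugins/.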

  • SimSystemPosix()
    Data storage and interaction mechanisms for states with an environment conforming to posix.
    Available as state.posix.

  • SimStatePreconstrainer()
    This state plugin manages the concept of preconstraining - adding constraints which you would like to remove later.
    :param constrained_addrs : SimActions for memory operations whose addresses should be constrained during crash analysis
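The idea of preconstraining ("adding constraints which you would like to remove later") can be illustrated with a toy model that has nothing to do with angr's actual implementation. In the sketch below, ToyState is a hypothetical class: constraints are Python predicates over a tuple of input values, and "solving" is brute-force enumeration over a tiny domain. While the preconstraints are present, the only solution is the PoC itself; once they are removed, every input satisfying the path constraints becomes available.

```python
from itertools import product


class ToyState:
    """Tiny stand-in for a symbolic state over n input values (not angr's SimState)."""

    def __init__(self, n, domain):
        self.n = n
        self.domain = list(domain)
        self.constraints = []      # path constraints collected while tracing
        self.preconstraints = []   # temporary constraints pinning input to the PoC

    def preconstrain(self, poc):
        # Pin every input position to the concrete value from the PoC.
        for i, value in enumerate(poc):
            self.preconstraints.append(lambda inp, i=i, value=value: inp[i] == value)

    def remove_preconstraints(self):
        self.preconstraints.clear()

    def solutions(self):
        # Brute-force "solver": enumerate the domain and keep satisfying inputs.
        checks = self.constraints + self.preconstraints
        return [inp for inp in product(self.domain, repeat=self.n)
                if all(check(inp) for check in checks)]


state = ToyState(n=2, domain=range(4))
state.preconstrain((3, 1))                              # follow the PoC exactly
state.constraints.append(lambda inp: inp[0] > inp[1])   # a branch taken on the path

pinned = state.solutions()      # only the PoC satisfies everything
state.remove_preconstraints()
freed = state.solutions()       # any input that takes the same branch
print(pinned, len(freed))
```

The freedom regained after removal is what exploit generation needs: the same crashing path, but with input bytes the solver may choose.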

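(3) Path exploration strategy

The choice of path exploration strategy is critical for symbolic execution. Because Rex uses concolic execution, it sets up two exploration techniques, 'Tracer' and 'Oppologist'.

angr's built-in exploration techniques are in ./angr/exploration_techniques/. Crash.py uses them as follows: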
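The original listing is not reproduced here. To convey what trace-following exploration means, the toy sketch below walks a small control-flow graph and, at every branch, keeps only the successor that the concrete trace actually took. It is a conceptual illustration, not angr's Tracer: follow_trace, CFG and TRACE are hypothetical names, real states are not modelled, and Oppologist is not covered at all.

```python
def follow_trace(successors, entry, trace):
    """Follow a recorded trace through a control-flow graph.

    successors: dict mapping a block address to its possible next addresses
    trace:      block addresses recorded by the concrete run, starting at entry
    Returns the list of addresses followed; raises if exploration diverges.
    """
    if not trace or trace[0] != entry:
        raise ValueError("trace must start at the entry address")
    path = [entry]
    for expected in trace[1:]:
        candidates = successors.get(path[-1], [])
        if expected not in candidates:
            # Symbolic exploration and the concrete trace disagree.
            raise RuntimeError(
                f"diverged at {path[-1]:#x}: {expected:#x} is not a successor")
        # Discard every other branch and keep only the one on the trace.
        path.append(expected)
    return path


# A diamond-shaped graph: 0x1000 branches to 0x1010 or 0x1020, both reach 0x1030.
CFG = {0x1000: [0x1010, 0x1020], 0x1010: [0x1030], 0x1020: [0x1030], 0x1030: []}
TRACE = [0x1000, 0x1020, 0x1030]

path = follow_trace(CFG, 0x1000, TRACE)
print([hex(addr) for addr in path])
```

Because only one successor survives each step, the number of live states stays constant, which is why replaying a known crash avoids the path explosion of unguided symbolic execution.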

3. Crash Triage

_triage_crash() determines the crash type from how much of eip and ebp is symbolic and from the operation being performed when the crash occurred.

def _triage_crash(self):
    ip = self.state.regs.ip
    bp = self.state.regs.bp

    # any arbitrary receives or transmits
    # TODO: receives
    zp = self.state.get_plugin('zen_plugin') if self.os == 'cgc' else None
    if zp is not None and len(zp.controlled_transmits):
        l.debug("detected arbitrary transmit vulnerability")
        self.crash_types.append(Vulnerability.ARBITRARY_TRANSMIT)

    # we assume a symbolic eip is always exploitable
    if self.state.solver.symbolic(ip):
        # how much control of ip do we have?
        if self._symbolic_control(ip) >= self.state.arch.bits:
            l.info("detected ip overwrite vulnerability")
            self.crash_types.append(Vulnerability.IP_OVERWRITE)
        else:
            l.info("detected partial ip overwrite vulnerability")
            self.crash_types.append(Vulnerability.PARTIAL_IP_OVERWRITE)

        return

    if self.state.solver.symbolic(bp):
        # how much control of bp do we have
        if self._symbolic_control(bp) >= self.state.arch.bits:
            l.info("detected bp overwrite vulnerability")
            self.crash_types.append(Vulnerability.BP_OVERWRITE)
        else:
            l.info("detected partial bp overwrite vulnerability")
            self.crash_types.append(Vulnerability.PARTIAL_BP_OVERWRITE)

        return

    # if nothing obvious is symbolic let's look at actions
    # grab the all actions in the last basic block
    symbolic_actions = [ ]
    if self._t is not None and self._t.last_state is not None:
        recent_actions = reversed(self._t.last_state.history.recent_actions)
        state = self._t.last_state
        # TODO: this is a dead assignment! what was this supposed to be?
    else:
        recent_actions = reversed(self.state.history.actions)
        state = self.state
    for a in recent_actions:
        if a.type == 'mem':
            if self.state.solver.symbolic(a.addr):
                symbolic_actions.append(a)

    # TODO: pick the crashing action based off the crashing instruction address,
    # crash fixup attempts will break on this
    #import ipdb; ipdb.set_trace()
    for sym_action in symbolic_actions:
        if sym_action.action == "write":
            if self.state.solver.symbolic(sym_action.data):
                l.info("detected write-what-where vulnerability")
                self.crash_types.append(Vulnerability.WRITE_WHAT_WHERE)
            else:
                l.info("detected write-x-where vulnerability")
                self.crash_types.append(Vulnerability.WRITE_X_WHERE)
            self.violating_action = sym_action
            break

        if sym_action.action == "read":
            # special vulnerability type, if this is detected we can explore the crash further
            l.info("detected arbitrary-read vulnerability")
            self.crash_types.append(Vulnerability.ARBITRARY_READ)

            self.violating_action = sym_action
            break

    return
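The body of _symbolic_control() is not shown in this article. Going by the paper's description ("counting the number of symbolic bits in certain registers"), the full-versus-partial decision for a register can be sketched with a toy bit-level model. Here a register is a list of bits, each either a concrete 0 or 1 or a string naming a symbolic input bit. The functions symbolic_control and classify_register below are hypothetical illustrations, not Rex's code.

```python
def symbolic_control(bits):
    """Count how many bits of a register are symbolic (attacker-influenced)."""
    return sum(1 for bit in bits if isinstance(bit, str))


def classify_register(name, bits, arch_bits=32):
    """Mirror the full / partial / none decision made for ip and bp."""
    controlled = symbolic_control(bits)
    if controlled == 0:
        return None                        # register is fully concrete
    if controlled >= arch_bits:
        return f"{name}_overwrite"         # every bit is controlled
    return f"partial_{name}_overwrite"     # only some bits are controlled


# All 32 bits come from the input.
full_ip = [f"in_{i}" for i in range(32)]
# Only the low byte comes from the input; the upper 24 bits are concrete.
partial_ip = [f"in_{i}" for i in range(8)] + [0] * 24
# Nothing is symbolic.
concrete_bp = [0] * 32

print(classify_register("ip", full_ip))
print(classify_register("ip", partial_ip))
print(classify_register("bp", concrete_bp))
```

The distinction matters for what comes next: full control of ip lets an exploit jump anywhere, whereas partial control restricts the reachable targets.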

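V. Summary

That concludes this brief introduction to Rex, the automatic exploit generation engine: what Rex is, how to install it, how its source is organized and, with reference to the paper, how the code triages crash exploitability. I am still a novice at binary analysis and have rambled on quite a bit, so criticism and corrections are welcome. Orz…

VI. References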

  1. rex https://github.com/shellphish/rex
  2. Mechaphish https://github.com/mechaphish
  3. Shellphish http://shellphish.net/cgc/
  4. angr docs https://docs.angr.io
  5. angr https://github.com/angr
  6. "SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis" https://github.com/Ma3k4H3d/Papers/blob/master/2016_SP_angrSoK.pdf