uninformed 08 02

Published in

· 28 Dec 2019

  

PatchGuard Reloaded: A Brief Analysis of PatchGuard Version 3 
September, 2007 
Skywing  
skywing@valhallalegends.com 
http://www.nynaeve.net/ 
         
Abstract: Since the publication of previous bypass or circumvention techniques 
for Kernel Patch Protection (otherwise known as ``PatchGuard''), Microsoft has 
continued to refine their patch protection system in an attempt to foil 
known bypass mechanisms.  With the release of Windows Server 2008 Beta 3, 
and later a full-blown distribution of PatchGuard to Windows Vista / 
Windows Server 2003 via Windows Update, Microsoft has introduced the next 
generation of PatchGuard to the general public (``PatchGuard 3'').  As with 
previous updates to PatchGuard, version three represents a set of 
incremental changes that are designed to address perceived weaknesses and 
known bypass vectors in earlier versions.  Additionally, PatchGuard 3 
expands the set of kernel variables that are protected from unauthorized 
modification, eliminating several mechanisms that might be used to 
circumvent PatchGuard while co-existing (as opposed to disabling) it.  This 
article describes some of the changes that have been made in PatchGuard 3. 
This article also proposes several new techniques that can be used to 
circumvent PatchGuard's defenses.  Countermeasures for these techniques are 
also discussed. 

1) Introduction 

PatchGuard is a controversial feature of Windows x64 editions, starting with 
Windows Server 2003 x64 / Windows XP x64, and continuing on with Windows Vista 
x64 and Windows Server 2008 x64.  The design goals behind PatchGuard are to 
prevent the kind of rampant hooking and modification of various kernel code 
and data structures that has been so common on x86 versions of Windows. 
Microsoft has stated that the vast majority of kernel crashes are caused by 
third party drivers, and the author's experiences with Windows firmly support 
this supposition.  Because accessing internal kernel data structures and 
hooking kernel functions typically requires intricate synchronization with the 
rest of the system in order to be performed in a completely safe fashion, 
especially on multiprocessor machines, many third party drivers that perform 
these sorts of dangerous tasks have historically made egregious mistakes that 
have often lead to system stability or a compromise of system security.  The 
latter is especially common in cases where third party programs hook 
functions, such as system calls, and subsequently fail to perform sufficient 
parameter validation. 

Microsoft's solution to this problem is to attempt to forcibly prevent third 
party code from making unauthorized modifications to internal kernel data 
structures and code through technical means in addition to discouraging 
developers from performing such tasks.  However, due to the nature of how the 
Windows kernel (and its supporting drivers) are designed, it is not feasible 
for kernel mode drivers to run at a lower effective privilege level than the 
kernel itself.  This poses a problem with respect to Microsoft's goal of 
blocking unauthorized kernel patches due to the fact that there is no 
hardware-enforced separation between the kernel itself and third-party 
drivers.  As such, said third party drivers have free reign to manipulate 
kernel code and data as desired. 

Although emerging technologies such as TPM and hardware-assisted 
virtualization (hypervisors) may eventually provide a mechanism to deploy a 
hardware-enforced boundary between certain key parts of the kernel and the 
third party drivers that interact with it, such an approach is not generally 
applicable to most computers sold today, given the current state of the 
technology involved (with respect to both hardware and software capabilities). 
Lacking a complete, hardware-enforced solution, Microsoft has turned to other 
approaches to dissuade third party software from making unauthorized kernel 
modifications.  Specifically, the resulting kernel patch protection mechanism 
("PatchGuard") is instead based on highly obfuscated code that, while running 
at the same effective privilege level as both the kernel itself and third 
party drivers, is designed to be resilient against detection and/or 
modification by third party drivers.  This code is responsible for 
periodically checking the integrity of key kernel code and data structures and 
will bring down the system if such modifications are detected.  By virtue of 
the fact that attempting to blithely patch the kernel as was once possible on 
Windows x86 editions, attempting to perform the same operations will result in 
a system crash on x64 versions of Windows.  As such, third party drivers are 
effectively preventing from making such modifications on a large-scale basis 
with respect to code deployed on customer systems. 

However, like all systems that are founded upon the principal of security 
through obscurity, PatchGuard has inherent weaknesses.  These weaknesses can 
be exploited by third party drivers to either disable PatchGuard entirely or 
circumvent its checks altogether while peacefully co-existing with PatchGuard. 
Microsoft is fully aware of these deficiencies with respect to the fundamental 
approach taken by PatchGuard and has resorted to periodically updating 
PatchGuard in such a way as to block known public bypass techniques.  The net 
result is that Microsoft gives the impression of a ``moving target'' to any 
ISV that would defy Microsoft's wishes with respect to circumventing 
PatchGuard.  This helps to show that any code designed to stop or disable 
PatchGuard may become invalidated at some point in the future such as when 
Microsoft releases a new update for PatchGuard.  This has resulted in a small 
arms race with code to circumvent PatchGuard being written by third parties, 
and Microsoft responding by developing and deploying countermeasures in the 
form of an updated version of PatchGuard that is not susceptible to these 
bypass techniques.  This cycle has continued through several iterations 
already; in fact, PatchGuard is now being deployed to the general public in 
its third iteration. 

2) Protection Improvements 

PatchGuard 3 implements several incremental improvements designed to protect 
PatchGuard from third party code attempting to disable it as compared to 
PatchGuard 2.  The majority of the alterations to PatchGuard's self-defense 
logic appear to be direct responses to previously published, publicly-known 
bypass techniques, rather than general improvements meant to make PatchGuard 3 
more resilient to analysis and attack.  In this vein, while the alterations to 
PatchGuard 3 (over PatchGuard 2) are effective at disabling most 
previously-published bypass mechanisms that the author is aware of, it is not 
exceedingly difficult to alter many previous attack mechanisms to be effective 
against PatchGuard 3.  Many of the protection systems that were implemented in 
PatchGuard 2 are still present in PatchGuard 3 in some form or another, though 
some of them have been altered to resist previously-published attacks. 

This chapter will describe a number of specific improvements that have been 
made. 

2.1) Multiple Concurrent PatchGuard Check Contexts 

In previous PatchGuard releases, there existed a single PatchGuard check 
context that would periodically be used to verify the integrity of protected 
regions.  Some bypass techniques relied on the fact that there existed only 
one PatchGuard context by virtue of disabling any invasive kernel patching 
that would be required to ``catch PatchGuard in the act'' after locating 
PatchGuard.  PatchGuard 3 improves upon this by creating at least one 
PatchGuard context if PatchGuard is enabled, with a probability of a second 
context being initialized at system boot time This is randomized based on the 
processor time stamp counter, as all other PatchGuard randomization is done. 
Both PatchGuard check contexts, which include all of the data used by 
PatchGuard to check system integrity (including the self-decrypting check 
routine in non-paged pool memory), operate completely independently if two 
contexts happen to be used for a particular system boot. 

There are several advantages to randomly creating more than one check context. 
First of all, because the second context is not always created, an element of 
uncertainty is (theoretically) introduced into the testing and development 
process for PatchGuard bypass techniques, as it is possible that at first 
glance, an individual that is researching PatchGuard 3 might not notice that 
there is a chance to create more than one context.  This may result in lost 
time during the debugging process, as some bypass techniques are affected by 
the number of active contexts.  For example, the original bypass technique 
described by the author for PatchGuard 2 [1] effectively turned itself off 
after the first positive indication that PatchGuard was caught (although in 
this particular instance, the PatchGuard-catching hooks could have allowed to 
remain in place afterwards).   

A better example of bypass techniques that might be affected by this sort of 
scheme are those that rely on searching system pool memory for a sign of 
PatchGuard.  For example, a theoretical bypass scheme that operates by 
pro-actively locating the PatchGuard context in non-paged pool and disabling 
it somehow (perhaps by rewriting the self-decrypting code stub to expand into 
a no-operation function) might run afoul of this approach randomly during 
testing if it were not designed to re-try a pool memory scan after a positive 
hit on PatchGuard.  It also eliminates the degree of confidence that such 
memory scan approaches provide, as previously, if one had a way to locate the 
PatchGuard context in non-paged pool memory, one would either know for certain 
that PatchGuard had in fact been disabled by getting a single hit (which could 
be taken as an indication that it would now be safe to perform actions 
blockedg by PatchGuard).  With multiple check contexts having a probability to 
run, it is no longer possible for a bypass technique to have logic along the 
lines of ``if a PatchGuard context has been located and disabled, then it is 
safe to continue'', because there may exist a non-constant number of contexts 
in the wild. 

2.2) Filtering of Exception Codes Used to Trigger PatchGuard Execution 

Like PatchGuard 2, and PatchGuard 1 before it, the third iteration of 
PatchGuard is primarily executed through an unhandled exception in a DPC 
routine which, through the use of a series of structured exception handlers, 
eventually results in the self-decrypting PatchGuard stub being called in 
non-paged pool memory (based off of the DPC arguments).  This presented itself 
as a liability, as evidenced by the previous article [2] published on 
Uninformed on the subject of disabling PatchGuard (release 2).  The problem 
with using SEH to trigger execution is that there are a number of points in 
the SEH dispatching mechanism that can easily be modified by an external 
caller in order to gain execution after an exception is raised, but before a 
registered exception handler itself might be called.   

Previous techniques exploited this weakness by positioning themselves in after 
the access violation exception raised when a PatchGuard-repurposed DPC routine 
dereferenced a specially-crafted invalid pointer argument but before the SEH 
logic that invokes the PatchGuard check context in response to the access 
violation exception.  Specifically, the operating system exported routine used 
by the Microsoft C/C++ compiler for all compiler-generated SEH frames, 
_C_specific_handler, was targeted by bypass attempts described in the 
aforementioned articles.  As the SEH frame responsible for running PatchGuard 
appears to have been written in C for PatchGuard releases 1 and 2, 
_C_specific_handler would be called before the user-supplied SEH logic which 
would be responsible for executing the PatchGuard integrity check logic 
contained within the current PatchGuard context.  At this point, a bypass 
technique need only abort the execution of the PatchGuard check routine and 
cleanly extricate itself from the call stack to a known-good location in order 
to disable PatchGuard. 

However, in order for such a bypass mechanism to properly function, one would 
need to ensure that the particular exception being examined by 
_C_specific_handler is in fact PatchGuard and not a legitimate kernel mode 
exception.  Applying a PatchGuard-style bypass to the latter case would be 
disastrous and almost certainly result in the system crashing or being 
corrupted immediately after the fact.  Given this, positively identifying an 
exception from the exception dispatcher interception point is key to any 
bypass technique built upon exception dispatcher redirection.  While the 
previous two PatchGuard releases made identifying PatchGuard a trivial task. 
In both cases, a special form of invalid address, a ``non-canonical address'', 
is dereferenced to trigger the access violation that ultimately results in 
PatchGuard's check context being executed A non-canonical address is an 
address that does not fall within the subset of a 64-bit address space 
presented by modern x64 processors.   

The advantage of using a non-canonical address is clear when one examines the 
PatchGuard execution environment for a moment.  In Windows kernel mode 
programming, it is not generally possible to blindly dereference a bogus 
kernel mode pointer.  This often results in a sequence of events that bring 
down the system, depending on where the dereferenced location is.  A 
non-canonical address is a special (undocumented) exception to this rule, as 
the processor reports the exception via a general protection fault and not the 
typical page fault mechanism.  In this case, the operating system reports the 
exception as an access violation related to an access of the highest kernel 
address (0xFFFFFFFFFFFFFFFF).  This distinct signature can be used to locate 
and disable PatchGuard in a relatively safe fashion, as bogus kernel mode 
addresses should never make it to SEH dispatching (in kernel mode) unless the 
system is about to crash due to a fatal driver or kernel bug (PatchGuard being 
a special case).  Thus, it was previously possible to positively identify 
PatchGuard by looking for an access violation that referenced 
0xFFFFFFFFFFFFFFFF. 

PatchGuard 3 improves the situation somewhat by performing some pre-filtering 
of the exception data through an exception handler written in assembly (which 
thus does not invoke _C_specific_handler) before the _C_specific_handler based logic 
that actually invokes the PatchGuard check routine is executed.  Specifically, 
the pre-filtering exception handler, whose code is given below, alters the 
exception code to take on a random value which overlaps with many valid kernel 
mode exceptions.  For example, some status codes that are applicable to the 
file system space are used, such as STATUS_INSUFFICIENT_RESOURCES, 
STATUS_DISK_FULL, STATUS_CANT_WAIT.  Additionally, the exception address is 
altered as well (in some cases even set to be pointing into the middle of an 
instruction), and the dereferenced address (the second exception parameter for 
access violations) is also set to a randomized value.  After these alterations 
are made, the assembly-language exception handler passes control on to the 
_C_specific_handler based exception handler, which invokes PatchGuard.  Annotated 
disassembly for one of the assembly-language pre-filter exception handlers is 
provided below: 

; 
; EXCEPTION_DISPOSITION 
; KiCustomAccessHandler8 ( 
; /* rcx */ IN PEXCEPTION_RECORD               ExceptionRecord, 
; /* rdx */ IN ULONG64                         EstablisherFrame, 
; /* r8  */ IN OUT PCONTEXT                    ContextRecord, 
; /* r9  */ IN OUT struct _DISPATCHER_CONTEXT* DispatcherContext 
; ); 
KiCustomAccessHandler8 proc near 
  test    [rcx+_EXCEPTION_RECORD.ExceptionFlags], 66h 
loc_14009B4C7: 
  jnz     short retpoint 
  rdtsc 
; Randomize ExceptionInformation[ 1 ] 
; ( This is the "referenced address" for 
;   an access violation exception.) 
; 
; ( Note that rax is not set to any 
;   specific defined value in this 
;   context.  It depends upon the value 
;   that RtlpExecututeHandlerForException 
;   and by extension RtlDispatchException 
;   last set rax to. ) 
  mov     [rcx+(_EXCEPTION_RECORD.ExceptionInformation+8)], rax  
  xor     [rcx+(_EXCEPTION_RECORD.ExceptionInformation+8)], rdx 
  shr     eax, 5 
  and     eax, 70h 
  sub     [r8+98h], rax 
  and     edx, 7Fh 
  or      edx, 0C0000000h 
; Set ExceptionCode to a random value.  The code 
; always has 0xC0000000 set, and the lowest byte 
; is always masked with 7F.  This often results 
; in an exception code that appears like a 
; legitimate exception code.) 
  mov     [rcx+_EXCEPTION_RECORD.ExceptionCode], edx  
  lea     rax, loc_14009B4C7+1 
; Set ExceptionAddress to a bogus value.  In this case, 
; it is set to in the middle of an instruction.  This 
; may interfere with attempts to unwind successfully from 
 the exception. 
  mov     [rcx+_EXCEPTION_RECORD.ExceptionAddress], rax 
; Set Context->Rip to the same 
; bogus exception address value. 
  mov     [r8+0F8h], rax   
  and     qword ptr [r8+88h], 0 
retpoint: 
  mov     eax, 1 
  retn 

KiCustomAccessHandler8 endp 

As a direct result of scrubbing the exception and context records by the 
assembly-language exception routine, it is no longer possible to use the old 
mechanism of looking for an access violation referencing 0xFFFFFFFFFFFFFFFF in 
order to differentiate a PatchGuard exception from the many legitimate kernel 
mode exceptions.  In other words, PatchGuard attempts to hide in plain sight 
amongst the normal background noise of kernel mode exceptions, the vast 
majority of which exist inside filesystem-related code. 

2.3) Executing PatchGuard Without SEH 

One recurring theme that has continued to remain a staple for PatchGuard since 
its inception is the use of structured exception handling to obfuscate the 
calls to PatchGuard.  The intention here is to use the many differences of SEH 
between x64 and x86, and the lack of disassembler support for x64 SEH to make 
it difficult to understand what is happening when calls to PatchGuard are 
being made.  Ironically, this use of x64 SEH as an obfuscation mechanism has 
been a catalyst for much of the author's research [2] into Windows x64 SEH. 
Today, it is the author's opinion that x64 exception handling is now publicly 
documented to an extent that is comparable (or even exceeds) that available 
for x86 SEH. 

Although x64 SEH may have been useful as an obfuscation technology initially, 
it had clearly worked its way up to a major liability after PatchGuard 2 had 
been released.  This is due to the fact that SEH-related aspects of PatchGuard 
had been successfully used to defeat PatchGuard on multiple occasions.  With 
the advent of PatchGuard 3, the authors of PatchGuard siezed the opportunity 
to extricate themselves in some respect from the liability that x64 SEH had 
become. 

PatchGuard 3 introduces a special mode of operation that allows it to function 
without using SEH.  This is a significant change (and improvement) with 
respect to how PatchGuard has traditionally operated.  It eliminates a major 
class of single points of failure in that the exception dispatching path is 
particularly vulnerable to external interference in terms of third party 
drivers intercepting SEH dispatching before control is transferred to actual 
exception handlers.  The SEH-less mode of PatchGuard 3 operates by copying a 
small section of code into non-paged pool memory (as part of a PatchGuard 
context block).  This code is then referenced by a timer object's 
DeferredRoutine at the non-paged pool location in question.  The code referred 
to by the timer object is essentially a stripped down version of what happens 
when any of the re-purposed DPC routines are invoked by PatchGuard: it sets up 
a call to the first stage self-decrypting stub that ultimately calls the 
system check routine. 

By completely eliminating SEH as a launch vector for PatchGuard, many bypass 
techniques that hinged on being able to catch PatchGuard in the SEH 
dispatching code path are completely invalidated.  In an example of defense in 
depth in terms of software protection systems, the old, SEH-based system is 
still retained (with the previously mentioned modifications), such that a 
would-be attacker now has multiple isolated launch vectors that he or she must 
deal with in order to block PatchGuard from executing.  Annotated disassembly 
of the direct call routine that is copied to non-paged pool and invoked 
without SEH is presented below: 

KiTimerDispatch proc near 
  pushf 
  sub     rsp, 20h 
  mov     eax, [rsp+28h+var_8] 
  xor     r9d, r9d 
  xor     r8d, r8d 
  mov     [rsp+28h+arg_0], rax 
; [rcx+40] -> PatchGuard Decryption Key 
  mov     rax, [rcx+40h]  
  mov     rcx, 0FFFFF80000000000h 
  xor     rax, rdx 
; Form a valid address for the PatchGuard context block by 
; xoring the decryption key with the DeferredContext 
; argument. 
  or      rax, rcx 
; Set the initial code for the stage 1 self-decrypting stub. 
  mov     rcx, 8513148113148F0h  
  mov     rdx, [rax] 
  mov     dword ptr [rax], 113148F0h 
  xor     rdx, rcx 
  mov     rcx, rax 
; Call the stage 1 self-decrypting stub. 
  call    rax 
  add     rsp, 20h 
  pop     rcx 
  retn 
KiTimerDispatch endp 

2.4) Randomized Call Frames in Repurposed DPC Routine Exception Paths 

One of the bypass vectors proposed for PatchGuard 2 was to intercept execution 
at _C_specific_handler, detect PatchGuard, and resume execution at the return 
point of the PatchGuard DPC (i.e. inside the timer or DPC dispatcher).  This 
is trivially possible due to the extensive unwind metadata present on Windows 
x64 combined with the fact that a DPC that has been re-purposed by PatchGuard 
does no useful work (other than invoking PatchGuard) and has no meaningful 
effect on any out parameters or return value. 

In order to counteract this weakness, PatchGuard 3 introduces a random number 
of function calls when a re-purposed DPC is called, but before any exception 
is triggered.  The intent with this randomization of the call frame stack is 
to invalidate the approach of always unwinding one level deep in order to 
effect a return from the DPC routine in question.  Because there are a random 
number of call frames between the point at which an exception is raised and 
the start of the PatchGuard DPC routine, and the fact that the PatchGuard DPC 
routines are not exported, it is more difficult to safely return out of a 
PatchGuard DPC routine from the anywhere in the SEH dispatching code path. 

An example of the call frame randomization code is provided below (in this 
case, ecx is initialized to small, random number that denotes the number of 
calls to make).  There are a number of routines in the form of 
KiCustomRecurseRoutineN where N is [0..9], each identical. 

KiCustomRecurseRoutine4 proc near 
  sub     rsp, 28h 
  dec     ecx 
  jz      short retpoint 
  call    KiCustomRecurseRoutine5 
retpoint:  
  mov     eax, [rdx] 
  add     rsp, 28h 
  retn 
KiCustomRecurseRoutine4 endp 

Although unwinds can still be performed, an attacker would need to be able to 
locate the actual return address of the PatchGuard DPC routine which might 
involve differentiating between the bogus KiCustomRecurseRoutine calls and the 
actual call into the DPC routine itself. 

3) Additional Protection Mechanisms 

PatchGuard 3 and PatchGuard 2 both share some additional protection mechanisms 
that have not been previously described. This chapter includes a description 
of these protection mechanisms.  

3.1) Timer List Obfuscation 

PatchGuard 2 and PatchGuard 3 employ an obfuscation scheme that is used to 
obfuscate timer and DPC object pointers in the timer list.  This obfuscation 
scheme hinges around two special kernel variables, KiWaitAlways and 
KiWaitNever that represent two random obfuscation keys that are calculated at 
boot time.  These obfuscation keys are used to encode various pointers (such 
as links to DPC objects in a KTIMER object residing in the kernel timer list) 
that are intended to be protected from outside interference.  For example, the 
following algorithm is used to decode the KDPC link in a KTIMER object when a 
timer DPC is going to be executed at expiration: 

ULONGLONG Deobfuscated; 
PKDPC     RealDpc; 

Deobfuscated = Timer->Dpc ^ KiWaitNever; 
Deobfuscated = _rotl64(Deobfuscated, (UCHAR)KiWaitNever); 
Deobfuscated = Deobfuscated ^ Timer; 
Deobfuscated = _byteswap_uint64(Deobfuscated); 
Deobfuscated = Deobfuscated ^ KiWaitAlways; 

RealDpc      = (PKDPC)Deobfuscated; 

By virtue of being non-exported kernel variables, the original intention of 
such a scheme was to make it difficult for third party drivers to easily 
interfere with the timer list or certain other protected pointers.  However, 
the algorithm itself is fairly easy to understand once one locates code that 
references it (such as most any timer-related code in the kernel), which 
simply leaves detecting the values of KiWaitAlways and KiWaitNever at runtime 
as the only remaining protection for the timer list to DPC object obfuscation. 

Ironically, the kernel debugger extension !kdexts.timer implements the 
decoding algorithm (in kdexts!KiDecodePointer) so that a valid timer list can 
be presented to the user if the timer display command is invoked.  Because the 
kernel debugger has access to PDB symbols for the kernel, it can trivially 
locate KiWaitAlways and KiWaitNever. 

3.2) Anti-Debugging Code at PatchGuard Initialization Time 

As with PatchGuard 2, PatchGuard 3 includes a sizable amount of anti-debugging 
code at runtime that is intended to frustrate attempts to step through the 
PatchGuard initialization routines with a debugger.  Most of this code is 
based upon checking if a debugger is present while the PatchGuard 
initialization routines are executing (which should not typically occur as the 
PatchGuard initializtion routines are only called if a debugger is not 
attached), and if a debugger is so detected, disable interrupts and entering a 
spin loop so as to unrecoverably freeze the system. 

Although this anti-debugging code may appear intimidating at first, disabling 
them is only a matter of locating all references to KdDebuggerNotPresent 
within the PatchGuard initialization routine and patching out the checks into 
the debugger.  For example, the author used the following set of commands in 
the debugger at initialization time to disable the anti-debugging checks for 
Windows Vista x64 SP0, kernel version 6.0.6000.16514: 

bp nt!KeInitAmd64SpecificState + 12 "r @edx = 1 ; r @eax = 1 ; g" 
bp nt!KiFilterFiberContext 
eb nt!KiFilterFiberContext+0x20 eb 
eb nt!KiFilterFiberContext+0x19a eb 

eb fffff800`01c63d22 eb 
eb fffff800`01c64686 eb 
eb fffff800`01c652be eb 
eb fffff800`01c65334 eb 
eb fffff800`01c65880 eb 
eb fffff800`01c65a65 eb 
eb fffff800`01c67479 eb 
eb fffff800`01c68798 eb 
eb fffff800`01c6a940 eb 
eb fffff800`01c6b7a9 90 90 
eb fffff800`01c6b7dd eb 
eb fffff800`01c6bad9 eb 
eb fffff800`01c6d0e7 eb 
eb fffff800`01c6d2f6 eb 
eb fffff800`01c6d650 eb 
eb fffff800`01c65c3a 90 90 90 90 90 90 
eb fffff800`01c690b1 90 90 90 90 90 90 

3.3) KeBugCheckEx Protection 

One of the first bypass mechanisms proposed for PatchGuard 1 was to hook the 
code responsible for bugchecking the system[4].  From there, an 
attacker would simply resume normal system execution. 

There are several defensive mechanisms in place to prevent this.  In the the 
current version of PatchGuard, the entire contents of the thread stack are 
filled with zeros, making it difficult to resume execution of whichever thread 
was responsible for calling into PatchGuard.  Furthermore, PatchGuard appears 
to make a copy of KeBugCheckEx at system initialization time, and copy this 
version over the actual code residing within the kernel at runtime just before 
bringing down the system in a bug check.  This is clearly visible by making a 
modification to KeBugCheckEx in the debugger just as one enters the PatchGuard 
check context, and then setting a breakpoint on the internal function in the 
PatchGuard context to call KeBugCheckEx after clearing the stack and all 
registers.  If one then examines KeBugCheckEx, any modifications that have 
been made will have vanished. 

Additionally, PatchGuard appears to disable DbgPrint (patching it out with a 
"ret" opcode) before calling KeBugCheckEx.  This may have been a (failed) 
attempt to prevent easy access to execution within KeBugCheckEx without 
actually patching KeBugCheckEx itself, which would circumvent the 
aformentioned protection on modifications to the bugcheck code itself. 
(KeBugCheckEx ordinarily utilizes DbgPrintEx to display a banner to the 
debugger when a bug check occurs.  However, because PatchGuard only patches 
DbgPrint, there is no little to no effect in terms of what ends up happening 
when the bug check finally does happen.) 

This code can be seen in the PatchGuard check routine, just before a call to 
the KeBugCheckEx wrapper is made.  The pointer to DbgPrint is established 
during PatchGuard initialization at boot time. 

mov     rax, [rbx+PATCHGUARD_CONTEXT.DbgPrint] 
mov     byte ptr [rax], 0C3h ; '+' ; ret 

3.4) Two-Stage Code Deobfuscation 

One of the more interesting defensive features of PatchGuard 2 and PatchGuard 
3 is the mechanism by which it obfuscates the PatchGuard check context, or the 
code and data necessary to verify system integrity.  PatchGuard contexts are 
obfuscated such that they are completely randomized in-memory while inactive, 
and change their location and obfuscation keys (and thus contents) each time 
the context is invoked to check system integrity. 

The decryption phase of PatchGuard is split into two stages.  The first stage 
is essentially a small stub that remains completely obfuscated in-memory until 
just before it is called.  The caller overwrites the first instruction in the 
stub that is called with a "lock xor qword ptr [rcx], rdx" instruction.  The 
arguments to the stub are the address of the stub itself (in rcx), and the 
decryption key (in rdx).  Thus, the first instruction now modifies itself (and 
more importantly the subsequent instruction, as each instruction is 4 bytes 
long but modifies 8 bytes of opcode bytes), which results in being another xor 
instruction.  A small series of these xor instructions continues until the 
second stage of the decoding stub is completely decoded. 

At this point, the second stage of the decoding stub is plaintext and may now 
execute.  The second stage consists of a loop of xor operations starting at 
the end of the PatchGuard context and moving backward until the entire check 
routine is decoded.  Additionally, the decryption key is shifted each xor 
round during the second stage decoding process. 

After the second stage decoding loop is complete, control is transferred to 
the now-plaintext integrity check routine (all of the supporting data, such as 
critical function pointers into the kernel, will also have been translated 
into plaintext at this point by the second stage decoding loop). 

Source code to a basic program to decrypt a PatchGuard memoy context is 
included with the article.  The program expects to be supplied with a file 
containing "dq" logs from the kernel debugger that cover the entire memory 
context, along with the decryption key (at KDPC + 0x40) and 
KDPC->DeferredContext values. 

4.5) Code Patching Support 

Given PatchGuard's penchant for blocking attempts to patch the kernel, one 
would think that all kernel code is essentially expected to be fixed in stone 
at boot time.  However, this is not really the case.  There are a number of 
approved kernel patches that PatchGuard supports.  For example, several 
functions (such as SwapContext) can be patched in approved ways if hypervisor 
support is enabled.  In the case of SwapContext, for instance, a runtime patch 
is made to redirect execution to EnlightenedSwapContext through a jump 
instruction being written to the start of the routine.  PatchGuard appears to 
detect and permits patches to these functions through special exemptions (one 
can observe the address of functions such as SwapContext being stored in the 
PatchGuard context at initialization time, presumed to be for such a purpose). 

The code responsible for checking the integrity of the SwapContext patch is 
provided below.  Because the check ensures that a branch can only occur to 
EnlightenedSwapContext, it would be difficult to utilize this code to perform 
an arbitrary patch at SwapContext. 

cmp     rdi, [rbx+PATCHGUARD_CONTEXT.SwapContext] 
jnz     short NotSwapContextExemption 
cmp     byte ptr [rdi], 0EBh ; 'd' ; backward jmps (short) 
jnz     short NotSwapContextExemption 
cmp     byte ptr [rdi+1], 0F9h ; '·' 
jnz     short NotSwapContextExemption 
cmp     byte ptr [rdi-5], 0E9h ; 'T' ; jmp (long) 
jnz     short NotSwapContextExemption 
mov     rcx, [rbx+PATCHGUARD_CONTEXT.EnlightenedSwapContext] 
movsxd  rax, dword ptr [rdi-4] 
sub     rcx, rdi 
cmp     rax, rcx 
jz      short BadSwapContextHook 

There also exists a second set of patches that PatchGuard must allow for 
compatibility with older processors.  Very early releases of x64 processors by 
Intel did not implement the prefetch instruction, and so the kernel has 
support for detecting an illegal opcode fault on a prefetch instruction, and 
reacting by patching out the prefetch opcode on-the-fly.  However, this sort 
of on-the-fly patching is not normally permitted by PatchGuard (for obvious 
reasons), at least not without special support.  During initialization, 
PatchGuard generates some code that executes a prefetch operation, and then 
checks whether the the count of patched prefix instructions was incremented 
after executing the patch code.  Assuming that the processor is an older model 
without prefetch support, then a special exemption (the "prefetch whitelist") 
is activated the exempts a list of RVAs from the image base from PatchGuard's 
checks.  This list of RVAs is stored in a binary resource appended to 
ntoskrnl.exe (named "PREFETCHWLIST"). 

The code for detecting if the prefetch exemption should be enabled at boot 
time is as follows (the result of the check is, for Windows Server 2008 Beta 
3, stored at offset 2B1 into the PatchGuard context): 

call    KeGetPrcb 
mov     ecx, 2 
cmp     [rax+63Dh], cl  ; Prcb->CpuVendor 
mov     [rsp+0EC8h+var_D48], rax 
jnz     short SkipEnablePrefetchPatchExemption 
lea     rdx, [rsi+214h] ; PrefetchRoutineCode 
mov     dword ptr [rdx], 0C3090D0Fh ; prefetch [rcx] ; ret 
mov     ebx, cs:KiOpPrefetchPatchCount 
lea     rcx, [rsp+0EC8h+arg_18] 
call    rdx 
mov     ecx, cs:KiOpPrefetchPatchCount 
cmp     ebx, ecx 
jz      short SkipEnablePrefetchPatchExemption 

mov     [rsi+2B1h], dil ; EnablePrefetchPatchExemption 

SkipEnablePrefetchPatchExemption: 
; 
; Initialization continues ... 
; 
mov     eax, 100000h 

4) Bypass Mechanisms and Countermeasures 

Like PatchGuard 2, it would be folly to state that PatchGuard 3 is 
invulnerable to assault by third party driver code intent on performing 
operations blocked by PatchGuard.  There are many possible attacks for the new 
defenses in PatchGuard 3 (as well as several possible countermeasures that 
Microsoft could take in order to break the proposed bypass mechanisms in a 
future PatchGuard iteration).  This article will describe specific attacks 
that is capable of defeating PatchGuard 3. 

4.1) Hybrid Exception Interception and Memory Searching 

As PatchGuard 3 utilizes completely randomized (self-decrypting) blocks of 
code and data for its constituent PatchGuard contexts in the SEH execution 
case, it is not generally possible to trivially locate and disable PatchGuard 
contexts through a non-paged pool scan.  Additionally, due to PatchGuard 3's 
break on relying upon SEH to invoke PatchGuard in all cases, it is also not 
generally possible to disable PatchGuard 3 reliably via interception of the 
SEH dispatching code path. 

While these defenses do complement one another, there still exists weaknesses 
that can be exploited by a third party.  Specifically, when PatchGuard is 
running through a re-purposed DPC routine that is invoked via SEH, it is 
vulnerable in that the SEH dispatching code path can be intercepted to locate 
(and disable) PatchGuard just before it is executed.  Furthermore, in the case 
where PatchGuard runs without any SEH obfuscation, it is vulnerable to a 
memory search, as there is (necessarily) some static code placed in non-paged 
pool memory which makes the translation between the DPC function calling 
convention and the PatchGuard stage 1 decryption routine's calling convention. 

By combining a memory search approach with the previously described SEH 
interception approach, it is possible to attack both launch vectors of 
PatchGuard simultaneously, with the effect of disabling it no matter which 
vector(s) are used in a particular boot. 

However, there are still some sticking points that need to be resolved in the 
SEH interception case.  As previously mentioned, the SEH-obfuscation-based 
launch vector was significantly improved over PatchGuard 2, with obfuscation 
of the exception information and randomization of the call stack from the 
point of view of the exception dispatcher logic itself.  These obstacles must 
be overcome in order to successfully mount an attack using this approach 
against PatchGuard 3. 

The first problem relating to the obfuscation and randomization of the 
exception information turns out to not be the roadblock that one might think 
at first glance.  There are some weaknesses of the obfuscation logic that 
allow the true colors of the exception to show through if one is clever about 
examining the information available at the point of _C_specific_handler. 
Furthermore, it is also possible to hook at a lower level than 
_C_specific_handler, such as KiGeneralProtectionFault (easily located by 
examining the IDT), which would get one in before the assembly-language 
exception handler logic has a chance to fudge the exception information. 

Although the KiGeneralProtectionFault vector is easier to implement in that it 
completely bypasses one of the new defensive mechanisms with respect to the 
SEH-related PatchGuard execution code path, it is again still possible to 
attack PatchGuard using _C_specific_handler by relying upon information 
leakage when _C_specific_handler is called.  Specifically, all exceptions 
altered by PatchGuard originate within the confines of the kernel itself, all 
of the exceptions have two parameters (most of the "legitimate" versions of 
exceptions like STATUS_INSUFFICIENT_RESOURCES always have zero parameters, 
because they originate from within RtlRaiseStatus which never stores any 
exception parameters in the exception record), and somewhere in the call stack 
the kernel routine responsible for dispatching DPCs or timer DPCs is going to 
be present. 

By combining these facts, it is possible to make a highly accurate 
determination as to whether an exception is caused by PatchGuard.  The latter 
piece of information (checking whether the routine responsible for calling the 
DPC or timer DPC is in the call stack) also proves valuable when one must 
later counteract the second defense added to the SEH code path, that is, the 
randomization of the call stack. 

In order to determine whether the DPC or timer DPC dispatcher is in a given 
call stack, it is first necessary to locate it in the kernel image.  There are 
some complications here.  First of all, the timer DPC dispatcher routine has 
three call instructions that can call a timer DPC, not all of which are 
readily triggerable.  Additionally, neither the timer DPC dispatcher or the 
DPC dispatcher are exported. 

However, while it is not possible to simply ask for the addresses of those two 
routines, it is possible to find them programmatically by requesting that a 
DPC and a timer DPC be executed through the documented APIs for DPCs and timer 
DPCs.  From within the DPC or timer DPC routine, it is then possible to locate 
the return address via the use of the ReturnAddress() compiler intrinsic. 
This works because the return address will be guaranteed to reside within the 
DPC or timer DPC dispatcher.  Alternatively, an assembly language routine 
could be written that simply examines the current pointer at [rsp] at the time 
of the call. 

This still leaves a problem in the timer DPC dispatcher case, as there are 
three call instructions, and it is not easy to observe calls from all three 
call sites within the timer DPC dispatcher on-demand, since it is necessary to 
programmatically find the return points at runtime.  However, once again, the 
very same metadata that is critical to x64 SEH support dooms PatchGuard with 
respect to this approach, as it is possible to go from an arbitrary 
instruction in the middle of any function to the start of that function, by 
following chained unwind metadata until an unwind metadata block is reached 
that has no parent [4].  This top-level unwind metadata block has a reference 
to the first instruction in the function.  Now that it is possible to locate 
the start of a function from any arbitrary valid instruction location within 
that function, it becomes trivial to determine if two addresses reside in the 
same function; to do this, one must only follow the unwind metadata chain for 
both addresses, and then check to see whether both top-level unwind metadata 
blocks refer to the same function.  With this technique, combined with the 
ability to locate at least one call site within the timer DPC dispatcher, it 
again becomes possible to identify the timer DPC dispatcher, as no matter 
which call site is used, it will be guaranteed that the call site resides 
within the timer DPC dispatcher routine KiTimerExpiration.  By comparing 
top-level unwind metadata blocks, it becomes possible to authoritatively 
discern whether any arbitrary instruction resides within the timer DPC 
dispatcher or not. 

It is also possible to bypass the alterations to the exception (and 
instruction pointer) addresses that KiCustomAccessHandler (the 
assembly-language "first chance" exception handler routines for the repurposed 
DPC routines) makes by performing a stack trace from the _C_specific_handler 
itself instead of relying on the context record or exception handler 
information.  This is because the call stack is conveyed as if the faulting 
instruction in the repurposed DPC call stack was the site of a call to 
KiGeneralProtectionFault.  As a result, it is possible to substitute the 
current context for the context presented to _C_specific_handler for unwind 
purposes.  This also provides a layer of defense against Microsoft altering 
other registers in the exception handler context in future PatchGuard 
revisions, which could cause manual unwinds to return incorrect register 
values, resulting in system crashes after an unwind intended to effect a hard 
return out of the re-purposed DPC routine. 

Furthermore, by clever usage of this mechanism for determining whether an 
address resides within a particular function, it is also now possible to 
determine the real return address for any given re-purposed DPC routine. 
Specifically, by checking whether each address in the call stack as of 
_C_specific_handler is within either the DPC dispatcher or the 
timer DPC dispatcher, one can determine whether a given call frame corresponds 
to the call site that called the re-purposed DPC routine or not, irrespective 
of any random amount of bogus function calls that may be layered on top of the 
re-purposed DPC.  This in turn defeats the remaining improvement to the SEH 
PatchGuard code path, as it once again becomes possible to cleanly unwind from 
any arbitrary point in the PatchGuard exception callstack. 

Through the combination of the ability to either circumvent entirely or "see 
through" the deception that KiCustomAccessHandler creates over the exception 
information passed to _C_specific_handler, and the ability to 
recover the correct return address of a repurposed DPC routine, it now becomes 
possible to disable the SEH control flow path of PatchGuard 3.  This leaves 
the remaining problem of locating the non-SEH control flow path of PatchGuard 
in non-paged pool memory as the last piece of the puzzle with respect to this 
method of disabling PatchGuard.  However, locating the trampoline routine that 
adapts a DPC routine call to a PatchGuard stage 1 decryption stub call is 
trivial, as the adapter trampoline is static and contains a very recognizable 
signature in terms of the constants written to the beginning of the decryption 
stub.  In order to disable the trampoline routine, it is enough to simply 
patch it with a "ret" instruction (effectively the same thing as the SEH 
bypass technique, but as implemented in code instead of a virtual unwind). 

The source code to a working implementation of the hybrid exception 
interception and memory searching bypass technique for PatchGuard 3 is 
included with the article. 

Although this approach is successful in disabling the current iteration of 
PatchGuard 3, it is not without its weaknesses.  Microsoft could, for 
instance, disable this technique via altering the SEH-less PatchGuard 
DPC-to-decryption-stub adapter to not be static (i.e. randomization of the 
code placed into non-paged pool at runtime).  There are also a number of 
assumptions of the SEH-based approach that could be invalidated by Microsoft 
in a future PatchGuard release.  However, in keeping with the fact that it is 
possible to gain control flow at a lower level than the exception dispatcher 
path itself (i.e. patching KiGeneralProtectionFault), the author feels that it 
would be better to focus on removing relevant information before any exception 
handlers (assembler or C-language) are called instead of after the defining 
moment (in other words, the exception) occurs, as it is the exception that 
presents the first easily-accessible interception point to an outside 
attacker. 

4.2) Timer DPC Dispatcher and DPC Dispatching 

Although PatchGuard 3 eliminates SEH as a single point of failure with respect 
to executing the system integrity checks, the timer and DPC dispatchers 
continue to remain attractive targets.  One simple bypass mechanism is to 
locate the call sites in both routines (such as by recording the addresses of 
both dispatcher routines as described in bypass technique 1, and then 
performing disassembly to locate and patch all call sites.  At each call site, 
it is possible to detect that PatchGuard is being executed by looking for 
either a non-canonical DeferredContext parameter value or a DeferredRoutine 
that resides within the non-paged pool.  (In PatchGuard 3, implementing the 
former check alone proves sufficient, as for the ease of the implementation of 
PatchGuard 3, both the repurposed DPC routines and the non-SEH-based control 
path use compatible calling conventions, which stipulate a non-canonical 
obfuscated pointer value as the DeferredContext parameter.) 

The main disadvantage of this approach involves inherent difficulties in 
performing arbitrary code patching in x64 (specifically, the large size of any 
code patch and the large number of now relatively common 
instruction-pointer-relative instructions).  However, given that this is a 
difficulty that impacts any code patching on x64, the author feels that it 
should not be considered a significant problem for  a determined attacker.  In 
fact, Microsoft Research's very own Detours implements a code patching system 
for x64[5], illustrating that code patching on x64 in general 
is not a task that should be considered insurmountable by any means. 

Because the timer and DPC dispatchers remain relatively unprotected targets 
that have not been involved in public bypass source code that has been 
released to date, the author would recommend bolstering the defenses of the 
timer and DPC dispatcher for the next PatchGuard release, as the two routines 
continue to represent an attractive single point of failure.  Adding a third 
PatchGuard execution mechanism that does not involve traditional DPCs at all 
would be an example of one approach to eliminate the DPC dispatcher related 
logic as a single point of failure.  It may also be possible to increase the 
difficulty of locating all the call sites within the DPC dispatching related 
code through a combination of differing static call stack differences for each 
of the three call sites of the timer DPC dispatcher (i.e. adding dummy 
function calls) combined with call stack randomization on top of static call 
stack differences between each of the three timer DPC dispatcher calll sites. 
Randomized call stacks alone would not suffice as by examining the call stacks 
of many iterations of timer DPC requests, it would become easy to eliminate 
the randomized entries (which would not be common to all recorded call stacks) 
with a relatively high degree of accuracy given a large sample size.  A 
disadvantage to taking such an approach is that it would essentially result in 
adding deliberately-difficult-to-maintain "spaghetti code" into yet another 
critical area of the operating system (timer DPC dispatcher logic).  The 
author suspects that the maintainer of the timer DPC dispatcher code would 
likely not appreciate having to deal with such things. 

4.3) Canceling the PatchGuard Timer(s) 

As PatchGuard continues to rely upon timer DPCs for the execution of its check 
routines, the kernel timer DPC list itself continues to remain a relatively 
attractive target for attack.  The timer DPC list is common to all control 
paths leading to PatchGuard, as timers are always used for the delayed 
execution component that periodically calls the check routine. 

There are presently two obstacles in the way of the timer DPC list.  The first 
of which is that altering it relies upon locating non-exported kernel 
variables.  Although it may be possible to do so via fingerprinting, this does 
make the approach slightly less desirable than it might initially appear. 
However, fingerprinting can work if done carefully, and there are many short 
functions that reference the timer list in a fairly predictable fashion (e.g. 
KeCancelTimer).  One other possible way to find the DPC list would be to 
create and set a timer (thus inserting it into the timer list), and then scan 
every 8-byte-aligned value in a non-paged uninitialized data section in 
ntoskrnl, treating each valid address as a linked list and searching the first 
several entries for the timer that was just linked into the list.  While a 
rather ugly and bruteforce-based approach (and not entirely safe either as one 
would need to be relying heavily on MmIsAddressValid), scanning the ntoskrnl 
data sections is one alternative to fingerprinting in terms of finding the 
timer list. 

The secondary problem with this approach is that starting with PatchGuard 2, 
the timer list itself is obfuscated such that the link between a KTIMER object 
and its corresponding KDPC is obfuscated.  This obfuscation mechanism, as 
previously described{backref to 1}, hinges upon two additional non-exported 
kernel variables (KiWaitAlways, KiWaitNever) that act as obfuscation keys. 
Locating these variables would be likely entail code analysis or 
fingerprinting of (possibly exported) routines that need to insert a timer 
into the timer list, such as KeSetTimerEx. 

Another alternative approach that dispenses with fingerprinting and/or 
bruteforce-based approaches altogether, at the expense of requriring added 
complexity (a user mode component), would be to postpone the activation of any 
driver code that would run afoul of PatchGuard until after Win32 in user mode 
has been started.  A user mode service could then be created that would 
download the symbols for the kernel binary in use, retrieve the addresses of 
KiTimerTableListHead (the timer list), KiWaitNever and KiWaitAlways, and pass 
these addresses on to the driver via any standard user mode to kernel mode 
communication mechanism (such as DeviceIoControl).  Because the kernel 
debugger relies on the ability to retrieve these variables by name via the PDB 
symbols for the !kdexts.timer extension, Microsoft would not be able to block 
this approach by removing or renaming the obfuscation key variables without 
imparing the functionality of existing debugger binaries. 

Once one has located the KiTimerTableListHead, KiWaitAlways, and KiWaitNever, 
it is a fairly simple (if perhaps unsafe without synchronization, though one 
could always take the "sledgehammer" approach and stop all but one CPU and 
raise IRQL to HIGH_LEVEL) to traverse the timer list, deobfuscate 
the DPC link on each corresponding timer object, and from there check each 
timer to see whether it bears the characteristics of being a PatchGuard timer 
(which may include attributes like a timer interval several minutes into the 
future, a non-canonical DeferredContext value, and possibly a DPC routine 
pointer into non-paged pool).  After one has located the timer in question, it 
can be easily disabled (either removing it from the list entirely, such as via 
KeCancelTimer, or by rewriting the DPC routine to point to an empty function 
that simply returns without performing any operation. 

Because Microsoft has functionality in the debugger that depends on the 
ability to use these variables to access the timer list, they have 
unfortunately backed themselves into something of a corner with respect to 
current operating system versions, as it is generally Microsoft's policy that 
existing debugger binaries continue to function properly after hotfixes or 
service pack to a particular already-released operating system version.  The 
best ways to counteract this approach would be to make it more difficult to 
pick out the PatchGuard DPC in-memory with respect to all of the other timer 
DPC objects that are in the list at any given time for a typical system, and 
to create additional launch vectors for PatchGuard that do not depend so 
heavily on the timer list.  There exist a number of other ways to execute code 
without drawing the attention of someone that does not know what they are 
looking, many of which are less obvious than a timer. 

4.4) Page-Table Swap 

Like all memory accesses in the Windows kernel, PatchGuard's system integrity 
check routine operates in protected mode with paging enabled.  It may 
theoretically be possible to take advantage of this fact to hide kernel 
patches from PatchGuard. 

The proposed bypass technique would involve patching the first instruction in 
the timer and DPC dispatchers to branch to third party code.  When a DPCs and 
timer DPCs are about to be considered for execution, as signaled by a call to 
one of the two dispatcher routines, a shadow copy of the page tables is 
created.  This shadow copy is configured to be identical to the normal page 
table for the current process, except that the page table entries for any 
kernel code  pages that have been patched are altered to refer to physical 
pages that are representative of the original state.  The return address of 
the DPC or timer DPC dispatcher on the stack is swapped with a pointer into 
driver-supplied code, and cr3 is reconfigured to point to the shadow page 
table.  Then, execution is transferred back to the timer or DPC dispatcher 
entrypoint (which no longer shows any signs of patching due to the page table 
swap), and DPCs are dispatched.  When the dispatcher is finished with its 
work, which would include invoking PatchGuard if PatchGuard is to be executed 
in any batched timer DPCs, then control is returned to driver-supplied code, 
which then mirrors any page table modifications since the shadow copy was made 
back to the actual page table for the process, and cr3 is returned to its 
original value.  Control is then transferred to the normal return point of the 
dispatcher. 

This approach does not involve disabling PatchGuard at all.  Instead, it 
describes a potential way to "peacefully coexist" with it, so long as only 
kernel code patches are being done.  (Data pages, which could be expected to 
be modified by a DPC, are considered by the author to be much less practical 
to protect from PatchGuard in this fashion.)  Because the DPC and timer DPC 
dispatcher logic executes at IRQL DISPATCH_LEVEL, thread context 
switching is disabled for the current thread, making the cr3 swap approach 
relatively feasible. 

Because this approach does not involve attacking PatchGuard directly, it 
automatically circumvents all of the myriad defensive mechanisms built into 
PatchGuard in current releases, making it a fairly attractive potential avenue 
of attack.  However, there are some downsides.  Among other things, the 
synchronization required to pull a page tabpe swap off in a multiprocessor 
environment are likely to be complex and difficult to safely duplicate if one 
allows DPC routines to perform operations that alter PTEs.  Additionally, 
there would be a performance impact incurred by this approach as it would need 
to run continuously in a relatively high-impact path (DPC dispatching) 
throughout system lifetime.  The performance implications of invalidating TLBs 
on every DPC batch may be problematic in some circumstances (swapping cr3 
automatically clears out TLBs). 

Another disadvantage of this approach is that by virtue of the fact that all 
DPCs (and potentially all device hardware interrupts) may run with the shadow 
copy of the page table, most hardware-related events will not be subject to 
kernel code patches hidden by this mechanism.  This may or may not be a 
problem depending on what the goal of the desired kernel patching is. 

Microsoft could counteract this approach by making a copy of all PTEs that 
describe the kernel at PatchGuard initialization time, and then validate all 
kernel code PTEs from within the PatchGuard check routine.  Additionally, if 
Microsoft could make the assumption that PatchGuard always executes in the 
system process, another approach could be to require that cr3 take on a known 
value. 

4.5) DPC Exception Handler Patching 

One of the changes introduced in PatchGuard 3 over PatchGuard 2 was a slight 
change to the protocol used to invoke the first stage of the decryption 
process.  Specifically, all callers of an encrypted PatchGuard context now 
include a static 8-byte string (of instruction opcodes) that is xor'd with a 
value at the start of the PatchGuard context to form the initial decryption 
key. 

The reasons for making this change over the original behavior are unclear to 
the author, but it unfortunately represents an easy target for disabling 
PatchGuard, as the string itself (0x8513148113148F0) is fairly unique and 
unlikely to appear outside of PatchGuard in terms of kernel code. 
Furthermore, all PatchGuard callers, including all ten of the repurposed DPC 
routine exception handlers and the non-paged pool memory DPC adapter (if used) 
reference the string with no obfuscation to speak of.  This presents an 
extremely easy, fingerprint-based approach to disabling PatchGuard.  By 
scanning non-paged pool space for this string, as well as kernel code regions, 
it is trivially easy to locate an instruction in the middle of the every 
single code path responsible for invoking PatchGuard's check context. 

After the instructions referencing the 8-byte string have been located, it is 
trivial to patch them to execute an unwind out of the exception handler logic 
(or in the case of the non-paged pool memory code, simply return directly). 
Such an attack prevents PatchGuard from ever starting, and furthermore has the 
advantage of a minimum of additional supporting logic required (when compared 
to many of the other bypass techniques outlined in this article). 

It would be trivial for Microsoft to disable this technique.  The 
recommendation of the author would be to get rid of the static 8-byte string 
referenced in every PatchGuard caller.  Ironically, PatchGuard 2 necessarily 
has a similar 4-byte string (which is also still used in PatchGuard 3), 
representing the initial instruction of the first stage decryption stub. 
Unlike with PatchGuard 3, however, PatchGuard 2 takes care to obfuscate the 
process of writing the opcode string out to the PatchGuard context, so that 
one cannot simply use a single blanket fingerprint to cover all cases.  The 
change made in PatchGuard 3 completely blows this work out of the water, so to 
speak, and it has the added advantage of being twice as large as a value to 
fingerprint as well. 

4.6) System Call MSR Swap 

A variation on the technique described in {backref:4}, it should theoretically 
be possible to swap the system call MSRs (or in fact several other processor 
control registers that are protected by PatchGuard) for the duration of DPC or 
timer DPC dispatching online, with the "tainted" values being restored after 
the dispatcher returns.  The system call MSRs are responsible for designating 
the address of the system call dispatcher, and are thus an attractive target 
for third parties that would like to perform system call hooking. 

The same basic concepts would be applied to this technique as previously 
described in the cr3 swap technique.  If system calls are the only desired 
targets to hook, then the cr3 swap can be eliminated as unnecessary for single 
processor systems (as it would be safe to make and restore changes to the 
actual underlying physical pages before and after a DPC dispatcher call, using 
the return address on the stack as a way to return to the altered location 
without leaving opcodes patched in the kernel across dispatcher invocations). 
For multi-processor systems, some mechanism would need to be developed to 
allow the MSR swap to be made across DPC dispatchers while preventing code 
patches from becoming visible to a second processor.  This is necessary 
because there could be more than one PatchGuard context executing 
simultaneously with the PatchGuard 3 addition of a probability to initialize a 
second check context at system boot time. 

In order to block such a technique, Microsoft would likely be best served by 
making it difficult to locate all the regions necessary to patch in order to 
maintain the deception of an unpatched system across PatchGuard checks.  The 
principal way to do this would be to create other, alternative launch vectors 
for PatchGuard that are unrelated to DPCs and, preferably, do not involve 
exported APIs that are easy to intercept from a third party perspective. 

5) Conclusion 

Although PatchGuard 3 does bring some pointed counter-attacks to many 
previously disclosed bypass techniques, version 3, like its predecessors, is 
hardly immune to being either disabled completely or simply co-existed with. 
It is likely that future revisions to PatchGuard will continue to be 
vulnerable to a variety of bypass techniques, though it is certain within 
Microsoft's reach to counter many of the publicly disclosed bypass vectors. 
It is anticipated by the author that until PatchGuard can be implemented with 
hardware support, such as via a combination of trusted boot (TPM) and a 
permanent hypervisor, future revisions will continue to be vulnerable to 
attack from determined individuals. 

On the other hand, Microsoft's efforts with PatchGuard appear to have paid off 
so far in terms of preventing a mass-uptake of PatchGuard-violating drivers on 
Windows x64.  In other words, a case could be made that Microsoft doesn't need 
to be perfect with PatchGuard, only "good enough" to give vendors cold feet 
about trying to ship products that bypass it.  Only time will tell if this 
continues to remain the case into the future, however. 

References 

[1] Skywing. Subverting PatchGuard version 2. 
    http://www.uninformed.org/?v=6&a=1&t=sumry; accessed September 16, 2007 

[2] Skywing. Programming against the x64 exception handling support, part 7: Putting it all together, or building a stack walking routine. 
    http://www.nynaeve.net/?p=113; accessed September 16, 2007 

[3] skape. Improved Automated Analysis of Windows x64 Binaries.

  
http://uninformed.org/index.cgi?v=4&a=1&t=sumry; accessed September 16, 2007 

[4] skape, Skywing.  Bypassing PatchGuard on Windows x64. 
    http://uninformed.org/index.cgi?v=3&a=3&t=sumry; accessed September 16, 2007 

[5] Microsoft. Detours.  
    http://research.microsoft.com/sn/detours/; accessed September 16, 2007

uninformed 08 02

Share this article

Let's discover also

uninformed 06 01

uninformed 05 03

uninformed 08 02

uninformed 09 04

uninformed 01 01

uninformed 04 07

uninformed 01 03

uninformed 09 03

uninformed 01 06

uninformed 02 02

Recent Articles

White Chocolate Mousse Cake 🍫

Carrot Cake Roll

Springtime Strawberry Rhubarb Cheesecake

What a fog

Easy No-Bake Pistachio Cheesecake

Proto-writing and cranial deformation of the Paracas civilization

Quipu and Quellca: numbers and writing of the Incas

The Ishango bone, a clue to archaeo-mathematics in antediluvian Africa

The origin of Christopher Columbus: the official documents

Origin of the name America

Recent Comments