Introduction

There are few ways to monitor system service requests on Windows because of Kernel Patch Protection (KPP). Some system service requests can be inferred through callbacks for threads, processes, and so on, such as those registered with ObRegisterCallbacks. However, you cannot hook the system service dispatch table, because KPP will detect the modification. I will detail a proof of concept for monitoring system service requests through exceptions that are raised in user-mode and handled in kernel-mode.

System Service Dispatch Table (SSDT)

The System Service Dispatch Table, also known as the System Service Descriptor Table, is a structure that contains pointers to routines: one handler for each system call that can be invoked. This is the table used when user-mode switches to kernel-mode through a system service request and the corresponding internal routine in ntoskrnl.exe/win32k.sys is called. The processor mode switch is not to be confused with a context switch, nor does it result in one. NTDLL.DLL is a system module that is mapped into user-mode processes. It contains callbacks invoked by the system, such as when a thread is created in the process, but it also contains the stubs for system service dispatching. These are also known as system call wrappers.

mov eax,00000010 ; this is the system service request number
syscall ; initiates the system call
ret 

The system service request number is used to locate the corresponding service handler in the dispatch table. When a system service request occurs, the kernel routine KiSystemService is executed. It creates a trap frame that saves the previous (user-mode) processor state, including the registers. Later in the chain of routines, the address of the kernel service routine is calculated and the routine is executed. When the system task is finished, the trap frame is used to return to the previous user-mode state. The flow and table below aid in visualizing the execution logic:

System Call -> (user-mode to kernel-mode switch) -> KiSystemService chain (KiSystemService uses the system service request number to match to a service handler)

Service Number	Service Routine
0x0003000F	NtClose
0x00060034	NtDelayExecution
0x0000003A	NtWriteVirtualMemory
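As a rough illustration of the lookup step, the sketch below resolves the three service numbers from the table to their routine names. It is a user-mode model, not kernel code: the names `service_entry`, `k_services`, `service_index`, `lookup_service`, and `service_is` are made up for this example, and it assumes that the low 12 bits of the service number select the table entry. The upper bits of the first two rows are simply masked off and their meaning is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one dispatch-table row: index -> routine name. */
struct service_entry {
    uint32_t index;
    const char *routine;
};

/* Rows from the table above, reduced to their low 12 bits. */
static const struct service_entry k_services[] = {
    { 0x00Fu, "NtClose" },
    { 0x034u, "NtDelayExecution" },
    { 0x03Au, "NtWriteVirtualMemory" },
};

/* Assumption: the low 12 bits of the service number index the table. */
uint32_t service_index(uint32_t service_number)
{
    return service_number & 0xFFFu;
}

/* Returns the routine name for a service number, or NULL if unknown. */
const char *lookup_service(uint32_t service_number)
{
    uint32_t index = service_index(service_number);
    size_t count = sizeof k_services / sizeof k_services[0];

    for (size_t i = 0; i < count; i++) {
        if (k_services[i].index == index)
            return k_services[i].routine;
    }
    return NULL;
}

/* True when the service number resolves to the named routine. */
int service_is(uint32_t service_number, const char *routine)
{
    const char *name = lookup_service(service_number);

    assert(routine != NULL);
    return name != NULL && strcmp(name, routine) == 0;
}
```

A monitor built on this idea could call `service_is(number, "NtClose")` to decide whether a request is one it cares about.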

Kernel Patch Protection (KPP)

Kernel Patch Protection (KPP), informally known as PatchGuard, is a kernel code and data structure integrity protection feature of 64-bit Microsoft Windows, first introduced in the x64 editions of Windows XP and Windows Server 2003 SP1. KPP provides security mechanisms that protect against system call interception and redirection, which could otherwise be exploited by malware, and it maintains the integrity of other data in the NT kernel.

KPP protects structures such as kernel stacks, the system service dispatch table, global descriptor table, and the interrupt descriptor table from modifications, as well as many other types of structures. KPP also protects certain system routines from modification.

The security mechanism is employed through obscure routine offshoots and multiple KPP contexts exist at a time to routinely check kernel-mode code pages and other kernel data. KPP contexts are encrypted and there is also a series of timed checks. A check can occur at any time; it can be milliseconds before something is checked, seconds, minutes, and so on.

There are several schedulers to fire KPP routines such as timer objects, system thread creation, ACPI events, etc.

When KPP detects a violation, such as a code change, the result is a kernel bug check that displays CRITICAL_STRUCTURE_CORRUPTION or a similar critical kernel error.

Kernel Patch Protection is what prevents us from hooking the SSDT and filtering routines through it, which is why methods have been devised to simulate the functionality that KPP blocks.

Several bypasses have been created for KPP. One involves hooking RtlCaptureContext, which is called by KeBugCheckEx. KeBugCheckEx is a kernel routine that shuts down the system and displays an error message on the screen with additional information such as parameters. RtlCaptureContext, according to Microsoft’s documentation, captures the thread context of the caller and writes it into a buffer passed as an argument to the routine. The bypass then checks whether KPP made the call to RtlCaptureContext and restores the initial execution information of the unit. In KeBugCheckEx, the additional parameters can be used to leak information about KPP contexts as well. KPP also creates a copy of KeBugCheckEx and then invokes the copy.

Interrupt Descriptor Table (IDT)

The Interrupt Descriptor Table (IDT) is a table that is specific to the x86 architecture and is used in protected mode (32-bit) as well as in long mode (64-bit). It is referenced when interrupts, traps, and specific tasks are invoked. The IDT associates each interrupt vector with an interrupt service routine. A software interrupt is invoked with the INT (interrupt) x86 instruction.

Int 0x03 -> KiBreakpointTrap

In the case above, interrupt type 3 is used to invoke a breakpoint. When the interrupt fires, it executes KiBreakpointTrap, which is an interrupt service routine.

The address of the interrupt descriptor table is stored in the IDTR x86 CPU register and the address is loaded into this register using the LIDT (load interrupt descriptor table) x86 instruction.

The Interrupt Descriptor Table can have up to 256 entries. On Windows, the start of the IDT can be visualized like this:

Interrupt Number	Interrupt Service Routine
0x00000000	KiDivideErrorFault
0x00000001	KiDebugTrapOrFault
0x00000002	KiNmiInterrupt
0x00000003	KiBreakpointTrap
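To make the lookup from interrupt number to service routine more concrete, here is a minimal sketch of a 64-bit IDT gate and of how a handler address is reassembled from it. The field layout follows the 16-byte interrupt gate format from the Intel manuals; the names `idt_gate64`, `idt_gate_handler`, and `idt_lookup` are invented for this example, and the code operates on an ordinary array rather than on the real table referenced by IDTR.

```c
#include <assert.h>
#include <stdint.h>

#define IDT_MAX_ENTRIES 256u

/* 64-bit interrupt gate layout (16 bytes per entry). */
struct idt_gate64 {
    uint16_t offset_low;    /* handler address bits 0..15  */
    uint16_t selector;      /* code segment selector       */
    uint8_t  ist;           /* interrupt stack table index */
    uint8_t  type_attr;     /* gate type, DPL, present bit */
    uint16_t offset_mid;    /* handler address bits 16..31 */
    uint32_t offset_high;   /* handler address bits 32..63 */
    uint32_t reserved;
};

/* Rebuilds the interrupt service routine address from its three parts. */
uint64_t idt_gate_handler(const struct idt_gate64 *gate)
{
    return (uint64_t)gate->offset_low
         | ((uint64_t)gate->offset_mid << 16)
         | ((uint64_t)gate->offset_high << 32);
}

/* Returns the handler for a vector; the table holds at most 256 gates. */
uint64_t idt_lookup(const struct idt_gate64 *idt, unsigned vector)
{
    assert(idt != 0);
    assert(vector < IDT_MAX_ENTRIES);
    return idt_gate_handler(&idt[vector]);
}
```

Splitting the address across three fields is a leftover of the older 32-bit gate format, which is why a hook on an IDT entry has to rewrite all three pieces consistently.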

Kernel Patch Protection prevents the IDT from being hooked, intercepted, or modified. Hooking the IDT could also allow you to intercept system calls, which could serve as an alternative to SSDT hooking. Windows 64-bit always uses the sysenter/syscall instructions, which use MSRs (model-specific registers) to transition a system service request from user-mode to kernel-mode; this is faster than using an interrupt. Older 32-bit versions of Windows use an interrupt (int 0x2E) for system calls, and the corresponding interrupt service routine is KiSystemService.

(Interrupt -> Interrupt Descriptor Table -> Interrupt Service Routine)

System Service Requests 32-bit vs. 64-bit (64-bit OS)

Windows 32-bit and Windows 64-bit system service requests differ in implementation on a 64-bit operating system. For one, 32-bit processes utilize WoW64 (Windows 32-bit on Windows 64-bit). WoW64 is essentially a compatibility subsystem that allows for the completion of 64-bit system service requests from 32-bit processes.

The system user-mode module, NTDLL.DLL, contains a list of system call stubs that call into the WoW64 call gate code page. The WoW64 call gate code contains a far jump.

jmp 0x0033:address

The far jump uses the 0x0033 segment selector to transition the current thread from 32-bit compatibility mode to 64-bit mode. This changes the code segment (cs register). Execution then transitions to the WoW64 module mapped into the process, which executes the syscall/sysenter instruction.
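The far jump can be pictured at the byte level with the sketch below, which encodes `jmp selector:offset` in its 32-bit direct form (opcode EA, then the 32-bit offset, then the 16-bit selector) and splits a selector into its fields. The function names are hypothetical and the target offset used in any call is a placeholder, since the real address depends on where the WoW64 code is mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAR_JMP_OPCODE 0xEAu   /* jmp ptr16:32 */
#define FAR_JMP_LENGTH 7u      /* 1 opcode + 4 offset + 2 selector bytes */

/* Writes "jmp selector:offset" into out and returns the encoded length. */
size_t encode_far_jmp(uint8_t *out, size_t out_size,
                      uint16_t selector, uint32_t offset)
{
    assert(out != NULL && out_size >= FAR_JMP_LENGTH);
    out[0] = FAR_JMP_OPCODE;
    out[1] = (uint8_t)(offset);          /* offset, little-endian */
    out[2] = (uint8_t)(offset >> 8);
    out[3] = (uint8_t)(offset >> 16);
    out[4] = (uint8_t)(offset >> 24);
    out[5] = (uint8_t)(selector);        /* selector, little-endian */
    out[6] = (uint8_t)(selector >> 8);
    return FAR_JMP_LENGTH;
}

/* A selector packs a descriptor index, a table bit, and a privilege level. */
unsigned selector_index(uint16_t selector)    { return selector >> 3; }
unsigned selector_uses_ldt(uint16_t selector) { return (selector >> 2) & 1u; }
unsigned selector_rpl(uint16_t selector)      { return selector & 3u; }
```

For 0x0033 this yields descriptor index 6 in the GDT with a requested privilege level of 3, that is, a user-mode code segment.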

In a native 64-bit process, none of this is necessary because the processor is already executing in 64-bit mode. The system call stubs do not call into a WoW64 call gate code page. Instead, they contain just the necessary instructions: the system service request number is moved into the accumulator register, and then the syscall/sysenter instruction transitions the processor from user-mode to kernel-mode.

From user-mode, you could hook a WoW64 routine or the call gate code page in NTDLL.DLL to intercept system service requests before the processor transitions to kernel-mode such as by intercepting executions at wow64cpu._BTCpuIsProcessorFeaturePresent+0x4A6. Code locations and memory may vary by service pack. However, this only works for a single process. There is a method to globalize this, but this concept will be introduced in the system service dispatcher hooking simulation section.

Hooking - System Service Dispatcher

Monitoring System Calls

In this section, I will detail the simulation of system service dispatch hooking. As we know, there are two methods for performing a system service request: through an interrupt, or through the syscall/sysenter instruction, which uses MSRs. Both methods reach the same destination, KiSystemService and its chain of routines.

We don’t want to reach KiSystemService just yet; instead, we want to monitor the request first, with the processor already in kernel-mode. This can be accomplished with an operation similar to BoundHook, the post-exploitation technique described by CyberArk.

System service requests initiated by the user-mode system module, NTDLL.DLL, are essentially done by a chain of system call stubs which contain about the same number of bytes per stub.

32-bit (NtClose)

mov eax,0x0003000F
mov edx,call_gate
call edx
ret 0x0004 
nop 

64-bit (NtClose)

mov r10,rcx
mov eax,0x0000000F
syscall 
ret 
nop [rax+rax+0x00]

Each system call stub is written into memory like that. The syscall/sysenter instruction is 2-bytes wide and call edx (32-bit) is also 2-bytes wide in instruction length.
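Because the stubs share one layout, the service number can be read at a fixed offset inside each stub. The sketch below walks a buffer of 64-bit stubs and extracts the number from the `mov eax, imm32` instruction. It assumes the 16-byte stub layout shown above (other Windows builds may use a different stub size), it works on a plain byte buffer rather than on a mapped NTDLL.DLL, and `read_service_number` is a name made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STUB_SIZE       16u  /* assumed constant stub stride             */
#define MOV_EAX_OFFSET   3u  /* follows "mov r10, rcx" (4C 8B D1)        */
#define SYSCALL_OFFSET   8u  /* follows "mov eax, imm32" (B8 + 4 bytes)  */

/*
 * Reads the service number of stub i.
 * Returns 0 on success, -1 if i is out of range or the bytes at the
 * expected offsets are not "mov eax, imm32" followed by "syscall".
 */
int read_service_number(const uint8_t *stubs, size_t count, size_t i,
                        uint32_t *number)
{
    assert(stubs != NULL && number != NULL);
    if (i >= count)
        return -1;

    const uint8_t *stub = stubs + i * STUB_SIZE;
    if (stub[MOV_EAX_OFFSET] != 0xB8)                  /* mov eax, imm32 */
        return -1;
    if (stub[SYSCALL_OFFSET] != 0x0F || stub[SYSCALL_OFFSET + 1] != 0x05)
        return -1;                                     /* syscall        */

    const uint8_t *imm = stub + MOV_EAX_OFFSET + 1;    /* little-endian  */
    *number = (uint32_t)imm[0]
            | ((uint32_t)imm[1] << 8)
            | ((uint32_t)imm[2] << 16)
            | ((uint32_t)imm[3] << 24);
    return 0;
}
```

Checking the opcode bytes before trusting the offsets keeps the walk from silently misreading a stub whose layout differs from the one assumed here.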

Modifying a Shared Memory Section

As I previously stated, we want to monitor system service requests in kernel-mode. So, we have to change the system call stubs so that both the 32-bit and the 64-bit versions raise bound exceptions, which will be caught by kernel callbacks. After that, the stub continues with the original call edx or syscall/sysenter instruction.

Since each system call stub has the same number of bytes, we can iterate through the chain by advancing by a constant size and changing the following instructions on 32-bit:

mov eax,0003000F
mov edx,call_gate     ->  xor eax, eax
call edx              ->  bound eax, [esp]
                      ->  Bound Trap (set edx in kernel handler)
                      ->  call edx
ret 0004
nop

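And the following instructions on 64-bit. Note that the legacy bound opcode is not encodable in 64-bit mode, so this listing is schematic; a real 64-bit stub would need another instruction that raises the bound-range exception (#BR):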

mov r10,rcx
mov eax,0000000F
syscall               ->  xor rax, rax
ret                   ->  bound rax, [rsp]
nop [rax+rax+00]
                      ->  syscall
                      ->  ret

There are multiple ways to go about globally modifying the modules, and no, we are not enumerating through each process, because that is ridiculous and can be done in user-mode. System modules/DLLs are loaded into the system once, and the kernel then maps them into each process; they are not mapped from the file separately for every process. This is done through shared section mapping: the modules are shared sections in memory.

So, why doesn’t writing to one module persist the changes for every other process? The answer is copy-on-write protection. The shared pages are mapped read-only in the page tables, and the first write to an untouched page faults and gives the writing process its own private copy of that page, so the change never reaches the other processes. Kernel-mode writes are held to the same read-only protection by the WP (write protect) bit, which is bit 16 of the CR0 control register. With WP cleared, a kernel-mode write goes straight to the shared physical page instead of triggering the copy.
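The effect of copy-on-write on a shared page can be modeled in a few lines of ordinary C. The sketch below is only a simulation of the idea: `page_view`, `view_attach`, `view_write`, and `view_detach` are hypothetical names, and a heap copy stands in for the private physical page that the memory manager would allocate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u

/* Hypothetical per-process view of one page of a shared section. */
struct page_view {
    uint8_t *data;       /* page currently backing this view      */
    int      is_private; /* set once the view owns a private copy */
};

/* Points a view at the shared page; nothing is copied yet. */
void view_attach(struct page_view *view, uint8_t *shared_page)
{
    view->data = shared_page;
    view->is_private = 0;
}

/* Writes through a view; the first write copies the shared page. */
int view_write(struct page_view *view, size_t offset, uint8_t value)
{
    assert(view != NULL && offset < SIM_PAGE_SIZE);
    if (!view->is_private) {
        uint8_t *copy = malloc(SIM_PAGE_SIZE);
        if (copy == NULL)
            return -1;
        memcpy(copy, view->data, SIM_PAGE_SIZE);
        view->data = copy;
        view->is_private = 1;
    }
    view->data[offset] = value;
    return 0;
}

/* Releases the private copy, if the view ever made one. */
void view_detach(struct page_view *view)
{
    if (view->is_private)
        free(view->data);
    view->data = NULL;
    view->is_private = 0;
}
```

After one view writes, the other view and the shared page still hold the original byte, which is exactly the behavior that has to be sidestepped to patch every process at once.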

The following code flips the WP bit on or off, so write protection can be disabled temporarily while these code pages are written. CR0 is a per-processor register, so disabling the bit, writing, and re-enabling the bit all have to happen on the same processor (fixed thread affinity).

#include <ntddk.h>
#include <intrin.h>

#define WP_OFF FALSE
#define WP_ON  TRUE
#define CR0_WP ((ULONG_PTR)1 << 16)    // CR0 bit 16: Write Protect

// Sets or clears CR0.WP on the current processor only.
void SetWriteProtection(BOOLEAN Protection) {
    ULONG_PTR cr0 = __readcr0();

    if (Protection == WP_ON)
        cr0 |= CR0_WP;     // kernel-mode writes honor read-only pages again
    else
        cr0 &= ~CR0_WP;    // kernel-mode writes ignore read-only pages

    __writecr0(cr0);
}

The other method we can use is to remap the section to another location; a proof of concept follows.

PVOID GetNtDllMapping(VOID) {
    PVOID Address = NULL;
    HANDLE SectionHandle = NULL;
    OBJECT_ATTRIBUTES ObjectAttributes = { 0 };
    UNICODE_STRING ObjectName = RTL_CONSTANT_STRING(L"\\KnownDlls\\ntdll.dll");
    SIZE_T ViewSize = 0;    // 0 maps the whole section
    NTSTATUS Status = STATUS_UNSUCCESSFUL;

    // Request a kernel handle so user-mode code cannot use or close it.
    InitializeObjectAttributes(&ObjectAttributes, &ObjectName, OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, NULL, NULL);

    if (!NT_SUCCESS(Status = ZwOpenSection(&SectionHandle, SECTION_ALL_ACCESS, &ObjectAttributes))) {
        DbgPrint("Demeter: Unable to open Ntdll shared section\r\n");
        goto ExitPoint;
    }

    // ViewSize is an in/out parameter and must be passed by address.
    if (!NT_SUCCESS(Status = ZwMapViewOfSection(SectionHandle, ZwCurrentProcess(), &Address, 0, 0, NULL, &ViewSize, ViewUnmap, 0, PAGE_READWRITE))) {
        DbgPrint("Demeter: Unable to map view of NtDll shared section\r\n");
        Address = NULL;
        goto ExitPoint;
    }

ExitPoint:
    // The view stays mapped after the section handle is closed.
    if (SectionHandle != NULL)
        ZwClose(SectionHandle);

    return Address;
}

Registering a Callback

On the other side, in kernel-mode, we need a kernel module that registers a callback for bound exceptions, which is done using KeRegisterBoundCallback. The supplied callback runs in the context of the user-mode thread that raised the exception. Through this, we can monitor the CPU state at the time the system service request was initiated and make changes if necessary.
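The register-then-dispatch pattern can be sketched in portable C. This is a user-mode model only: every name in it (`bound_status`, `saved_registers`, `register_bound_callback`, `raise_bound_exception`, `monitor_callback`, `SIM_CALL_GATE`) is hypothetical, the callback signature is an assumption chosen for readability rather than the one KeRegisterBoundCallback expects, and how the service number is recovered at trap time is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's own types. */
enum bound_status { BOUND_CONTINUE_SEARCH, BOUND_HANDLED };

struct saved_registers {   /* simplified view of the trapped thread */
    uint32_t eax;
    uint32_t edx;
    uint32_t eip;          /* address of the faulting bound instruction */
};

typedef enum bound_status (*bound_callback)(struct saved_registers *regs);

static bound_callback g_callback;   /* at most one callback in this model */

void register_bound_callback(bound_callback callback)
{
    g_callback = callback;
}

/* Models the trap path: pass the saved registers to the callback, if any. */
enum bound_status raise_bound_exception(struct saved_registers *regs)
{
    assert(regs != NULL);
    if (g_callback == NULL)
        return BOUND_CONTINUE_SEARCH;
    return g_callback(regs);
}

#define SIM_CALL_GATE 0x77001000u   /* made-up call gate address */

static unsigned g_trap_count;       /* how many requests were observed  */
static uint32_t g_last_eip;         /* where the last request came from */

/* Example monitor: note the request, then supply edx for "call edx". */
enum bound_status monitor_callback(struct saved_registers *regs)
{
    g_trap_count++;
    g_last_eip = regs->eip;
    regs->edx = SIM_CALL_GATE;
    return BOUND_HANDLED;
}
```

Returning a "handled" status is what lets the patched stub resume, while "continue search" leaves the exception to whatever would normally process it.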

Conclusion

System Calls Monitored

Now a callback has been registered, and the list of system call stubs has been completely rewritten so that a bound exception is raised before a system service is requested. From the callback, we can check the type of system service being requested through the system call number (the system service request number), as well as which process and which thread the request came from.

KPP Circumvented

Through this, we avoided the detection of kernel patch protection because we did not touch any kernel data structures, yet we can now still monitor and modify system service requests in kernel-mode.

Other Ideas

There are probably other ways to do this with the same type of implementation, such as modifying KiUserExceptionDispatcher and changing syscall/sysenter and/or call edx to an interrupt; all three operations are 2 bytes wide. The exception dispatcher would see the exception and then feed the data to a kernel module. Afterwards, it would handle the exception and continue at the exception address.
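Since the replacement has the same length as the original instruction, the swap is a two-byte overwrite. The sketch below shows it on a plain byte buffer, assuming the 64-bit stub layout from earlier, where syscall (0F 05) sits at a known offset. The name `patch_syscall_to_int` and the vector passed to it are placeholders, and no particular interrupt vector is implied to be suitable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Replaces "syscall" (0F 05) at the given offset with "int vector"
 * (CD vector). The bytes are verified first, so a stub with a different
 * layout is left untouched. Returns 0 on success, -1 otherwise.
 */
int patch_syscall_to_int(uint8_t *code, size_t size, size_t offset,
                         uint8_t vector)
{
    assert(code != NULL);
    if (size < 2 || offset > size - 2)
        return -1;                      /* patch would run past the buffer */
    if (code[offset] != 0x0F || code[offset + 1] != 0x05)
        return -1;                      /* not a syscall instruction       */

    code[offset] = 0xCD;                /* int imm8 opcode */
    code[offset + 1] = vector;
    return 0;
}
```

Patching at a known offset, rather than scanning for the byte pair, avoids rewriting a 0F 05 sequence that merely happens to appear inside an immediate operand.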

Credits

Jason - longmode (Publication & Concept)
Internal - InternalRecursion (Concept)