Kernel Virtual Address Layout


This document explains the details of the kernel virtual address space on X64 versions of Windows 7 and Server 2008 R2. The debugger extension command !CMKD.kvas applies this theory to display the X64 virtual address space and map a given address to one of the address ranges.

Kernel Virtual Address Layout

The X64 CPU supports only 48 bits of the 64-bit virtual addresses used by software running on the CPU. The upper 16 bits of a virtual address are always set to 0x0000 for user mode addresses and to 0xFFFF for kernel mode addresses. This effectively splits the X64 address space into the user mode address range 0x00000000`00000000 - 0x0000FFFF`FFFFFFFF and the kernel mode address range 0xFFFF0000`00000000 - 0xFFFFFFFF`FFFFFFFF. The kernel range amounts to a total of 256TB of kernel virtual address space accessible to Windows. Windows statically divides this virtual address space into multiple fixed-size VA regions, each assigned a specific use. The start and end of each region are, for the most part, static, as shown in the following table.

Start End Size Description
FFFF0800`00000000 FFFFF67F`FFFFFFFF 238TB Unused System Space
FFFFF680`00000000 FFFFF6FF`FFFFFFFF 512GB PTE Space
FFFFF700`00000000 FFFFF77F`FFFFFFFF 512GB HyperSpace
FFFFF780`00000000 FFFFF780`00000FFF 4K Shared System Page
FFFFF780`00001000 FFFFF7FF`FFFFFFFF 512GB-4K System Cache Working Set
FFFFF800`00000000 FFFFF87F`FFFFFFFF 512GB Initial Loader Mappings
FFFFF880`00000000 FFFFF89F`FFFFFFFF 128GB Sys PTEs
FFFFF8a0`00000000 FFFFF8bF`FFFFFFFF 128GB Paged Pool Area
FFFFF900`00000000 FFFFF97F`FFFFFFFF 512GB Session Space
FFFFF980`00000000 FFFFFa7F`FFFFFFFF 1TB Dynamic Kernel VA Space
FFFFFa80`00000000 *nt!MmNonPagedPoolStart-1 6TB Max PFN Database
*nt!MmNonPagedPoolStart *nt!MmNonPagedPoolEnd 512GB Max Non-Paged Pool
FFFFFFFF`FFc00000 FFFFFFFF`FFFFFFFF 4MB HAL and Loader Mappings
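The table above can be turned into a simple lookup, which is in spirit what the !CMKD.kvas extension does when it maps an address to a region. The sketch below uses the static region bounds from the table; as a simplification it folds the dynamically bounded PFN database and non-paged pool into a single entry, since their boundary (nt!MmNonPagedPoolStart) is only known at run time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Static Windows 7 X64 kernel VA regions from the table above. The PFN
 * database / non-paged pool boundary is dynamic, so those two regions are
 * folded into one entry here as a simplification. */
typedef struct {
    uint64_t start;
    uint64_t end;        /* inclusive */
    const char *name;
} KVAS_REGION;

static const KVAS_REGION kvas_regions[] = {
    { 0xFFFF080000000000ULL, 0xFFFFF67FFFFFFFFFULL, "Unused System Space" },
    { 0xFFFFF68000000000ULL, 0xFFFFF6FFFFFFFFFFULL, "PTE Space" },
    { 0xFFFFF70000000000ULL, 0xFFFFF77FFFFFFFFFULL, "HyperSpace" },
    { 0xFFFFF78000000000ULL, 0xFFFFF78000000FFFULL, "Shared System Page" },
    { 0xFFFFF78000001000ULL, 0xFFFFF7FFFFFFFFFFULL, "System Cache Working Set" },
    { 0xFFFFF80000000000ULL, 0xFFFFF87FFFFFFFFFULL, "Initial Loader Mappings" },
    { 0xFFFFF88000000000ULL, 0xFFFFF89FFFFFFFFFULL, "Sys PTEs" },
    { 0xFFFFF8A000000000ULL, 0xFFFFF8BFFFFFFFFFULL, "Paged Pool Area" },
    { 0xFFFFF90000000000ULL, 0xFFFFF97FFFFFFFFFULL, "Session Space" },
    { 0xFFFFF98000000000ULL, 0xFFFFFA7FFFFFFFFFULL, "Dynamic Kernel VA Space" },
    { 0xFFFFFA8000000000ULL, 0xFFFFFFFFFFBFFFFFULL, "PFN Database / Non-Paged Pool" },
    { 0xFFFFFFFFFFC00000ULL, 0xFFFFFFFFFFFFFFFFULL, "HAL and Loader Mappings" },
};

/* Map a kernel VA to the name of the region that contains it. */
const char *kvas_lookup(uint64_t va)
{
    for (size_t i = 0; i < sizeof(kvas_regions) / sizeof(kvas_regions[0]); i++)
        if (va >= kvas_regions[i].start && va <= kvas_regions[i].end)
            return kvas_regions[i].name;
    return "Unknown";
}
```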

Windows uses certain data structures like push locks, Ex fast referenced pointers and interlocked SLists that require CPU instructions capable of atomically manipulating quantities twice the width of a virtual address. So on the X64 CPU, where virtual addresses are 64 bits, a 128-bit atomic CMPXCHG instruction is needed. Early X64 CPUs did not have such an instruction, posing a roadblock to implementing the above mentioned data structures. The X64 CPU already restricted the number of usable bits in virtual addresses to 48, and Windows placed a further restriction on virtual addresses, cutting them down to 44 bits. Thus the virtual address span that could store such structures was restricted to 2^44 bytes (16TB), of which the kernel mode portion is the upper 8TB of the X64 virtual address space, i.e. 0xFFFFF800`00000000 - 0xFFFFFFFF`FFFFFFFF. As a result, virtual address regions like "Unused System Space", "PTE Space", "HyperSpace" and "System Cache Working Set", which fall outside the limits of this 44-bit range, are unable to store these data structures. This restriction was extended to user mode as well, effectively limiting the virtual address space utilized by Windows to 8TB in user mode, i.e. 0x00000000`00000000 - 0x000007FF`FFFFFFFF, and 8TB in kernel mode, i.e. 0xFFFFF800`00000000 - 0xFFFFFFFF`FFFFFFFF. Note that there are kernel virtual address regions outside this range, i.e. FFFF0800`00000000 - FFFFF7FF`FFFFFFFF, that are used by Windows, but not for general purpose allocation and data structure storage as described above.
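The benefit of the 44-bit restriction can be illustrated with a simplified packing scheme. This is an assumption-laden sketch, not the actual SLIST_HEADER encoding: because every general purpose kernel address lies in the top 8TB, only its low 43 bits vary, so an address plus a sequence counter fits in a single 64-bit word that an ordinary 64-bit CMPXCHG can update atomically.

```c
#include <stdint.h>

/* Illustrative sketch (NOT the real SLIST_HEADER layout): with kernel
 * allocations confined to the top 8TB (0xFFFFF800`00000000 and up), only
 * the low 43 bits of an address vary. Packing those bits alongside a
 * 21-bit sequence number fits in one 64-bit word, so a plain 64-bit
 * CMPXCHG can update pointer and counter together atomically. */
#define KVA_BASE 0xFFFFF80000000000ULL  /* start of the usable 8TB range  */
#define VA_BITS  43                     /* bits that vary inside that range */

static uint64_t pack(uint64_t va, uint64_t sequence)
{
    uint64_t offset = va - KVA_BASE;    /* fits in 43 bits */
    return (sequence << VA_BITS) | offset;
}

static uint64_t unpack_va(uint64_t packed)
{
    return KVA_BASE + (packed & ((1ULL << VA_BITS) - 1));
}

static uint64_t unpack_sequence(uint64_t packed)
{
    return packed >> VA_BITS;
}
```

A pack/unpack round trip recovers both the address and the counter, which is exactly the property the lock-free structures rely on.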

The page size on the X64 CPU is 4K. Page Table Entries (PTEs) are used by the CPU to map virtual pages to physical pages, and each PTE maps a single 4K page. On the X64 CPU, PTEs are 64 bits (8 bytes) in size in order to accommodate large physical addresses, i.e. Page Frame Numbers (PFNs). So a single page table page (4K) can hold only 512 PTEs, and all the PTEs stored in such a page together map 2MB (512*4K) of virtual address space. Also, since Page Directory Entries (PDEs) point to page table pages, a single PDE entry maps 2MB of virtual address space. This 2MB address span is the allocation granularity within most of the virtual address regions listed in the table above. Most of these regions have allocation bitmaps associated with them that are used to perform memory allocation within the region in multiples of 2MB chunks. This task is performed by the memory manager internal function MiObtainSystemVa(), which takes in values defined by the enumeration type nt!_MI_SYSTEM_VA_TYPE as identifiers of the region to allocate memory from.
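The 2MB granularity falls directly out of the page table arithmetic described above, as this small sketch shows:

```c
#include <stdint.h>

/* The arithmetic behind the 2MB allocation granularity: a 4K page table
 * page holds 512 8-byte PTEs, so one PDE (one page table page) covers
 * 512 * 4K = 2MB of virtual address space. */
#define PAGE_SIZE     4096ULL                      /* 4K X64 page        */
#define PTE_SIZE      8ULL                         /* 64-bit PTE         */
#define PTES_PER_PAGE (PAGE_SIZE / PTE_SIZE)       /* 512                */
#define VA_PER_PDE    (PTES_PER_PAGE * PAGE_SIZE)  /* 2MB                */

/* Number of 2MB (one-PDE) chunks needed to cover 'bytes' of VA space,
 * i.e. the unit a bitmap-based region allocator would hand out. */
static uint64_t chunks_for(uint64_t bytes)
{
    return (bytes + VA_PER_PDE - 1) / VA_PER_PDE;
}
```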

Kernel Virtual Address Components

The following section describes each virtual address region in the kernel virtual address space.

Unused System Space

The start address of this region is stored in nt!MmSystemRangeStart. This space is unused on Windows 7 X64.

PTE Space

This region contains the X64 4-level page table pages for user mode and kernel mode virtual address space mappings. The various types of X64 page table pages are mapped within this range at the addresses specified below:
PTE Pages FFFFF680`00000000
PDE Pages FFFFF6FB`40000000
PPE Pages FFFFF6FB`7DA00000
PXE Pages FFFFF6FB`7DBED000
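These four base addresses are related by the standard X64 self-map arithmetic. Assuming the fixed (pre-KASLR) PTE base of 0xFFFFF680`00000000 used by Windows 7, the PTE that maps any VA can be computed by shifting out the 12-bit page offset, keeping the four 9-bit table indexes, and rebasing into PTE space; applying the same function to each base in turn yields the next one in the list above.

```c
#include <stdint.h>

/* Classic X64 self-map arithmetic, assuming the fixed (pre-KASLR)
 * Windows 7 PTE base of 0xFFFFF680`00000000. Applying pte_address()
 * repeatedly walks up the 4-level hierarchy: PTE -> PDE -> PPE -> PXE. */
#define PTE_BASE 0xFFFFF68000000000ULL

static uint64_t pte_address(uint64_t va)
{
    /* Shift out 9 bits (turning the page offset plus PTE index into an
     * 8-byte-aligned table offset), mask to the 512GB self-map window,
     * and rebase into PTE space. */
    return PTE_BASE + ((va >> 9) & 0x7FFFFFFFF8ULL);
}
```

As a sanity check, the PTE address of the PTE base is the PDE base, the PTE address of the PDE base is the PPE base, and so on, matching the table above.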

HyperSpace

Process Working Set List Entries are mapped here. For every process, EPROCESS.Vm.VmWorkingSetList contains the address 0xFFFFF700`01080000, which falls in this region. The region contains the MMWSL (Memory Manager Working Set List) structure and an array of MMWSLE (Memory Manager Working Set List Entry) structures, one for each page in the process's working set. Note that although the function MiMapPageInHyperSpaceWorker() is nominally supposed to map physical pages to HyperSpace VAs, it actually maps them into the System PTE region, not into this (HyperSpace) region.

Shared System Page

This 4K page is shared between the user and kernel virtual address spaces (UVAS and KVAS). It serves as a quick way to pass information between user and kernel mode. The shared data structure is nt!_KUSER_SHARED_DATA.

System Cache Working Set

Working Set and Working Set List Entries for the system cache VAs.

The kernel variable nt!MmSystemCacheWs points to the working set data structure for the system cache (i.e. nt!_MMSUPPORT). To display the working set list entries for the system cache use the command "!wsle 1 @@(((nt!_MMSUPPORT *) @@(nt!MmSystemCacheWs))->VmWorkingSetList)". These entries are used by the working set trimmer to trim physical pages from the system cache virtual addresses.

Initial Loader Mappings

NTOSKRNL, the HAL and the kernel debugger DLLs (KDCOM, KD1394, KDUSB) are loaded in this region. This region also includes the stacks for the idle threads, the DPC stacks, the KPCR and the idle thread structures.

Paged Pool Area

The last Paged Pool address is in the variable nt!MmPagedPoolEnd. The size of paged pool is in nt!MmSizeOfPagedPoolInBytes. MiObtainSystemVa() allocates from this area when called with MiVaPagedPool. Allocation from paged pool is controlled by the Bitmap nt!MiPagedPoolVaBitMap and the allocation hint is stored at nt!MiPagedPoolVaBitMapHint.

PFN Database

The PFN database has one entry for every physical page that the system can possibly have (nt!MmHighestPossiblePhysicalPage + 1 entries), including PFN entries to accommodate hot-plug memory. nt!MmPfnDatabase defines the start of this region. To find the size of the PFN database, the debugger expression '? poi(nt!MmNonPagedPoolStart) - poi(nt!MmPfnDatabase)' can be used, and to find the total number of entries in the PFN database, the expression '? (poi(nt!MmNonPagedPoolStart) - poi(nt!MmPfnDatabase)) / @@(sizeof(nt!_MMPFN))' can be used.
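The arithmetic those debugger expressions perform can be sketched as follows. The 0x30-byte (48-byte) _MMPFN size is the commonly cited Windows 7 X64 value, and the start/span values in the test are made up for illustration:

```c
#include <stdint.h>

/* The same division the debugger expression above performs. The 0x30-byte
 * _MMPFN size is the commonly cited Windows 7 X64 value; on other builds
 * it must be read from symbols via sizeof(nt!_MMPFN). */
#define MMPFN_SIZE 0x30ULL

/* Number of PFN entries between the start of the PFN database and the
 * start of non-paged pool, which immediately follows it. */
static uint64_t pfn_entry_count(uint64_t pfn_database,
                                uint64_t nonpaged_pool_start)
{
    return (nonpaged_pool_start - pfn_database) / MMPFN_SIZE;
}
```

For example, 1GB of RAM has 262144 4K pages, so its PFN entries span 262144 * 0x30 = 0xC00000 bytes (12MB).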

Non-Paged Pool

The non-paged pool region starts immediately after the PFN database. The start of non-paged pool is stored in nt!MmNonPagedPoolStart. MiObtainSystemVa() allocates from this area when called with MiVaNonPagedPool. Allocations in this region are controlled by the bitmap nt!MiNonPagedPoolVaBitMap and the allocation hint is stored at nt!MiNonPagedPoolVaBitMapHint.

HAL and Loader Mappings

Kernel global nt!MiLowHalVa contains the start address of this range i.e. 0xFFFFFFFFFFC00000. The VA range ends at the end of the X64 kernel virtual address space at 0xFFFFFFFFFFFFFFFF.

This region is only used during system start i.e. within the function MmInitSystem(). Memory in this address range cannot be used by the system after the initialization phase.

At the end of system initialization, MmInitSystem() calls the function MiAddHalIoMappings(), which scans this VA range to determine whether there are any I/O mappings that have to be added to the list of I/O maps maintained by the system and, if so, calls MiInsertIoSpaceMap(). For each I/O mapping, MiInsertIoSpaceMap() creates a tracker entry with the pool tag 'MmIo' ("IO space mapping trackers") and adds the entry to the doubly linked list whose head is at nt!MmIoHeader. Each such entry represents one physical memory block that has been mapped into the SysPTE region. The first few fields of these tracker entries contain some interesting information that describes the physical memory blocks and their VA mappings. The function MiInsertIoSpaceMap() is also called by MmMapIoSpace() to track all adapter memory mappings in the system.

struct _IO_SPACE_MAPPING_TRACKER {
    LIST_ENTRY       Link;
    PHYSICAL_ADDRESS Pfn;
    ULONGLONG        Pages;
    PVOID            Va;
    . . .
};
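Walking the nt!MmIoHeader list follows the usual LIST_ENTRY pattern. The user-mode sketch below re-declares LIST_ENTRY and CONTAINING_RECORD so it is self-contained, and builds a two-entry demo list rather than reading real kernel memory:

```c
#include <stddef.h>
#include <stdint.h>

/* User-mode sketch of walking a doubly linked list of tracker entries the
 * way the kernel walks nt!MmIoHeader. LIST_ENTRY and CONTAINING_RECORD
 * are re-declared here so the example is self-contained. */
typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY *Flink;
    struct _LIST_ENTRY *Blink;
} LIST_ENTRY;

#define CONTAINING_RECORD(address, type, field) \
    ((type *)((char *)(address) - offsetof(type, field)))

typedef struct _IO_SPACE_MAPPING_TRACKER {
    LIST_ENTRY Link;
    uint64_t   Pfn;     /* PHYSICAL_ADDRESS in the kernel definition */
    uint64_t   Pages;
    void      *Va;
} IO_SPACE_MAPPING_TRACKER;

/* Sum the page counts across every tracked I/O mapping in the list. */
static uint64_t total_io_pages(LIST_ENTRY *head)
{
    uint64_t total = 0;
    for (LIST_ENTRY *e = head->Flink; e != head; e = e->Flink)
        total += CONTAINING_RECORD(e, IO_SPACE_MAPPING_TRACKER, Link)->Pages;
    return total;
}

/* Build a small two-entry demo list (2 + 3 pages) and walk it. */
static uint64_t demo_total(void)
{
    static IO_SPACE_MAPPING_TRACKER a = { { 0, 0 }, 0x1000, 2, 0 };
    static IO_SPACE_MAPPING_TRACKER b = { { 0, 0 }, 0x2000, 3, 0 };
    static LIST_ENTRY head;
    head.Flink = &a.Link; a.Link.Flink = &b.Link; b.Link.Flink = &head;
    head.Blink = &b.Link; b.Link.Blink = &a.Link; a.Link.Blink = &head;
    return total_io_pages(&head);
}
```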

Session Space

Session Data Structures, Session Pool and Session Images are loaded in this area.

The session image space contains driver images like Win32K.sys (Window Manager), CDD.DLL (Canonical Display Driver), TSDDD.dll (Frame Buffer Display Driver), DXG.sys (DirectX Graphics Driver) etc.

For any process that belongs to a session, the field EPROCESS.Session points to the MM_SESSION_SPACE structure for that session. The session paged pool limits are pointed to by MM_SESSION_SPACE->PagedPoolStart and MM_SESSION_SPACE->PagedPoolEnd.

Sys PTEs

This region contains mapped views, MDLs, adapter memory mappings, driver images and kernel stacks. This region is described by the bitmap nt!MiSystemPteBitmap and the allocation hint is stored at nt!MiSystemPteBitMapHint. MiObtainSystemVa() allocates from this area when called with MiVaSystemPtes.

Dynamic Kernel VA Space

This area consists of system cache views, paged special pool and non-paged special pool. nt!MiSystemAvailableVa contains the number of 2MB regions available in the dynamic kernel VA space. MiObtainSystemVa() allocates from this area when called with MiVaSystemCache, MiVaSpecialPoolPaged or MiVaSpecialPoolNonPaged. This region is described by the bitmap nt!MiSystemVaBitMap and the allocation hint is stored at nt!MiSystemVaBitMapHint.

Kernel Virtual Address Space Allocation

The memory manager function MiObtainSystemVa() is used to dynamically allocate memory from various kernel VA regions in multiples of 2MB. When calling MiObtainSystemVa(), the caller specifies the number of PDE entries to allocate and the type of system VA to allocate (i.e. one of the values in nt!_MI_SYSTEM_VA_TYPE). The VA types that are valid for allocation by this function are MiVaPagedPool, MiVaNonPagedPool, MiVaSystemPtes, MiVaSystemCache, MiVaSpecialPoolPaged and MiVaSpecialPoolNonPaged.

MiObtainSystemVa() satisfies VA allocation requests from different kernel VA regions. For example, requests for MiVaPagedPool are directed to the Paged Pool region, MiVaNonPagedPool is directed to the non-paged pool region, MiVaSystemPtes are directed to the System PTE region and all other allocations are directed to the Dynamic System VA region.

MiReturnSystemVa() frees memory allocated by MiObtainSystemVa(). The function MiInitializeDynamicBitmap() initializes all the bitmaps used by the MiObtainSystemVa() and MiReturnSystemVa() to allocate and free kernel VAs.
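The allocate/free pair can be sketched as a toy bitmap allocator: one bit per 2MB (one-PDE) chunk, first-fit search, freeing by clearing bits. The region base and size below are made up for the demo, and the real allocators additionally start their search at a stored hint (e.g. nt!MiPagedPoolVaBitMapHint); this is an illustration of the technique, not the actual implementation.

```c
#include <stdint.h>

/* Toy sketch of bitmap-based VA management in the spirit of
 * MiObtainSystemVa()/MiReturnSystemVa(): one bit per 2MB (one-PDE) chunk,
 * first-fit search. Region base and chunk count are made up for the demo;
 * the real allocators also keep a search hint, which is only recorded
 * here. */
#define CHUNK_SIZE  0x200000ULL            /* 2MB per PDE                 */
#define REGION_BASE 0xFFFFF98000000000ULL  /* demo: dynamic kernel VA base */
#define CHUNKS      64                     /* tiny demo region             */

static uint64_t bitmap;   /* bit i set => chunk i allocated */
static unsigned hint;     /* where a real allocator would resume searching */

/* Allocate 'count' contiguous 2MB chunks; returns the VA, or 0 on failure. */
static uint64_t obtain_va(unsigned count)
{
    for (unsigned start = 0; start + count <= CHUNKS; start++) {
        unsigned free_run = 1;
        for (unsigned i = 0; i < count; i++)
            if (bitmap & (1ULL << (start + i))) { free_run = 0; break; }
        if (free_run) {
            for (unsigned i = 0; i < count; i++)
                bitmap |= 1ULL << (start + i);
            hint = start + count;
            return REGION_BASE + (uint64_t)start * CHUNK_SIZE;
        }
    }
    return 0;
}

/* Free 'count' chunks starting at 'va'; returns the number freed. */
static unsigned return_va(uint64_t va, unsigned count)
{
    unsigned start = (unsigned)((va - REGION_BASE) / CHUNK_SIZE);
    for (unsigned i = 0; i < count; i++)
        bitmap &= ~(1ULL << (start + i));
    return count;
}
```

First-fit means a freed range is reused by the next allocation that fits, which is what keeps the bitmap compact.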

An example of such dynamic allocation is MiExpandSystemCache(), which calls MiObtainSystemVa(MiVaSystemCache) to allocate the virtual address space used by the Cache Manager's VACB (Virtual Address Control Block) data structures.

SysPTE Management

The memory allocated by MiObtainSystemVa() from the SysPTE region is sub-allocated by MiReservePtes() based on the allocation bitmaps nt!MiKernelStackPteInfo and nt!MiSystemPteInfo.

The rationale behind grouping SysPTE allocations into these two separate categories is to prevent VA fragmentation: kernel stacks (especially those for system and service process threads) are long-term allocations, whereas other allocations like MDLs and mapped views remain allocated for relatively shorter periods of time.

The two structures nt!MiKernelStackPteInfo and nt!MiSystemPteInfo are of type nt!_MI_SYSTEM_PTE_TYPE. These structures are set up by the function MiInitializeSystemPtes(). The bitmaps in these structures cover the entire 128GB SysPTE area. The function MiReservePtes() is called with one of these structures to allocate VAs out of the corresponding area, and this memory is later freed with MiReleasePtes(). When the VA range covered by nt!MiKernelStackPteInfo or nt!MiSystemPteInfo gets depleted, the range is expanded by calling MiExpandPtes(), which in turn calls MiObtainSystemVa(MiVaSystemPtes).

Functions like MmAllocateMappingAddress() and MmCreateKernelStack() allocate SysPTE VAs from nt!MiKernelStackPteInfo.

Functions like MiValidateImagePfn(), MiCreateImageFileMap(), MiRelocateImagePfn() and MiRelocateImageAgain() allocate SysPTE VAs from nt!MiSystemPteInfo.

Mapping PFNs into HyperSpace

HyperSpace VAs are actually allocated from the Sys PTE region of the kernel virtual address space. MiMapPageInHyperSpaceWorker() maps a PFN into a kernel VA and returns the VA assigned to that mapping. Functions like MiZeroPhysicalPage(), MiWaitForInPageComplete(), MiCopyHeaderIfResident(), MiRestoreTransitionPte() etc. call MiMapPageInHyperSpaceWorker() to temporarily obtain VAs mapped to physical addresses.