Prototype PTEs


This document discusses the prototype PTE, a critical data structure that the Windows Memory Manager uses to support shared memory. It starts with an overview of virtual to physical address translation, virtual address descriptors, working set trimming and shared memory. It then describes the need for prototype PTEs, the functionality they provide and the page fault handler's use of them. It concludes with examples of examining prototype PTEs in the debugger.

All the debugger examples shown in this document are based on Debugging Tools for Windows version 6.12.002.633 connected to a live Windows 7 X86 target system. Some of the debugger examples use commands from CodeMachine Debugger Extension DLL (cmkd.dll).

Virtual to Physical Memory Mapping

Software running on Windows uses virtual addresses to read and write memory. The CPU's hardware memory management unit (MMU) translates these virtual addresses to physical addresses that are used to access the actual memory contents. To perform this virtual to physical address translation the CPU refers to per-process data structures called page tables. These process-specific page tables are set up by the Windows memory manager to reflect the current virtual to physical address mappings in use by the process.

The page table of a process is made up of multiple Page Table Entries (PTEs), one for each virtual page in the process. These PTEs map virtual addresses to physical addresses. The format of the PTE data structure (nt!_MMPTE) is defined by the CPU vendors and is different for X86, X86 running in physical address extension (PAE) mode and X64 CPUs.

The PTE contains the Page Frame Number (PFN) that serves as an index into a system wide database called the PFN database. The PFN database contains one entry for every physical page in the system. The memory manager uses this data structure (nt!_MMPFN) to store information about physical pages. When the CPU references a PTE it uses the PFN to locate the physical page. The following figure shows a virtual page mapped to a physical page via a PTE:

FIG#1

Figure #1 : Virtual Page mapped to a physical page

The following output shows the PTE for a virtual page that is currently mapped to a physical page:

kd> !pte 40000
                    VA 00040000
PDE at C0600000            PTE at C0000200
contains 0000000035619867  contains 800000002875D847
pfn 35619     ---DA--UWEV   pfn 2875d     ---D---UW-V
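The "PDE at" and "PTE at" addresses printed above can be reproduced by hand. The following is a minimal sketch, assuming the x86 PAE layout implied by the outputs in this document: 8-byte entries, page tables visible at 0xC0000000 and page directories at 0xC0600000. The two base constants are inferred from the debugger output rather than stated in the text, and the function names are made up for illustration.

```python
PAGE_SHIFT = 12          # 4 KB pages
PDE_SHIFT = 21           # each PAE page directory entry covers 2 MB
ENTRY_SIZE = 8           # PAE PTEs and PDEs are 64 bits wide
PTE_BASE = 0xC0000000    # assumption: inferred from the "PTE at" values
PDE_BASE = 0xC0600000    # assumption: inferred from the "PDE at" values


def pte_address(va):
    """Virtual address of the PTE that maps the page containing va."""
    return PTE_BASE + (va >> PAGE_SHIFT) * ENTRY_SIZE


def pde_address(va):
    """Virtual address of the PDE that maps the page table for va."""
    return PDE_BASE + (va >> PDE_SHIFT) * ENTRY_SIZE


print(hex(pde_address(0x40000)), hex(pte_address(0x40000)))
# prints: 0xc0600000 0xc0000200
```

The same arithmetic applies to every !pte output in this document, which is a quick way to check that a PTE address belongs to the virtual address being examined.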

The PTE displayed above has the valid bit ('V') set, which tells the CPU's MMU that the rest of the bits of the PTE are valid and can be used for address translation. For such PTEs the PFN contains the index into the PFN database. The PFN database entry identified by the PFN in the PTE is shown below:

kd> !pfn 2875d     
    PFN 0002875D at address 83A6CE2C
    flink       00000034  blink / share count 00000001  pteaddress C0000200
    reference count 0001   Cached     color 0   Priority 5
    restore pte 00000080  containing page        035619  Active     M       
    Modified              
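The flags and PFN that !pte prints for a valid PTE can be decoded from the raw 64-bit value. The sketch below uses the architectural bit positions for PAE paging (present, read/write, user, accessed, dirty, execute-disable) and treats bits 12 through 51 as the page frame number. The helper name and the dictionary keys are illustrative only, and the Windows-specific software bits are ignored.

```python
def decode_valid_pte(pte):
    """Split a valid 64-bit PAE PTE into its PFN and a few hardware flags."""
    flags = {
        "valid":      bool(pte & (1 << 0)),
        "write":      bool(pte & (1 << 1)),
        "user":       bool(pte & (1 << 2)),
        "accessed":   bool(pte & (1 << 5)),
        "dirty":      bool(pte & (1 << 6)),
        "no_execute": bool(pte & (1 << 63)),
    }
    pfn = (pte >> 12) & ((1 << 40) - 1)   # bits 12..51
    return pfn, flags


pfn, flags = decode_valid_pte(0x800000002875D847)
print(hex(pfn), flags["valid"], flags["dirty"], flags["accessed"])
# prints: 0x2875d True True False
```

This matches the "---D---UW-V" string shown for the PTE: dirty, user, writable and valid, with no 'A' (not accessed) and no 'E' (execute disabled).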

Virtual Address Descriptors

For every process, the memory manager needs to keep track of the virtual address ranges that are valid and the ones that are free. The memory manager uses this information for 2 purposes. During virtual memory allocation (i.e. calls to VirtualAlloc() or MapViewOfFile()) it is used to find a virtual address range that is currently free and hence can be allocated. During page fault handling it is used to determine whether the virtual address (VA) being faulted on is valid or invalid.

The data structures that the memory manager uses for this purpose are called Virtual Address Descriptors (VADs). For every contiguous virtual address range of a process that is allocated there is a VAD structure tracking the range. All the VAD structures for a process are organized in the form of a tree for easy lookup, insertion and deletion, as shown in the following figure. For the kernel virtual address space, information about which parts of the address space are valid and invalid is stored directly in the PTEs, removing the need for VADs.

FIG#2

Figure #2 : VAD Tree describing the user mode virtual address space of a process

The following commands display the VAD tree of the process with PID 0x830.

kd> !process 0830 1
Searching for Process with Cid == 830
Cid handle table at 9294f000 with 552 entries in use

PROCESS 846ae848  SessionId: 2  Cid: 0830    Peb: 7ffdd000  ParentCid: 0670
    DirBase: 3ee334e0  ObjectTable: 988a5ab0  HandleCount:   9.
    Image: memmap.exe
    VadRoot 85327b68 Vads 19 Clone 0 Private 42. Modified 0. Locked 0.
.
.
.

kd> !vad 85327b68 
VAD     level      start      end    commit
8468d360 ( 2)         10       1f         0 Mapped       READWRITE          Pagefile-backed section
84ddde50 ( 3)         20       2f         0 Mapped       READWRITE          Pagefile-backed section
83f5b4c8 ( 1)         30       33         0 Mapped       READONLY           Pagefile-backed section
84f58490 ( 3)         40       40         1 Private      READWRITE         
84fd1a10 ( 2)         50       b6         0 Mapped       READONLY           \Windows\System32\locale.nls
83f85070 ( 4)         c0       cf         0 Mapped       READONLY           \Windows\WindowsUpdate.log
84fca828 ( 3)        120      12f         5 Private      READWRITE         
85327b68 ( 0)        1d0      20f         4 Private      READWRITE         
83d82848 ( 3)        370      46f         4 Private      READWRITE         
84756ee8 ( 2)       10d0     10d4         2 Mapped  Exe  EXECUTE_WRITECOPY  \Temp\memmap.exe
8530d378 ( 4)      752c0    75309         3 Mapped  Exe  EXECUTE_WRITECOPY  \Windows\System32\KernelBase.dll
84bae970 ( 3)      753d0    7547b         8 Mapped  Exe  EXECUTE_WRITECOPY  \Windows\System32\msvcrt.dll
83f03b20 ( 4)      75650    75723         2 Mapped  Exe  EXECUTE_WRITECOPY  \Windows\System32\kernel32.dll
85025570 ( 1)      76f80    770bb         9 Mapped  Exe  EXECUTE_WRITECOPY  \Windows\System32\ntdll.dll
84cdea50 ( 3)      771c0    771c0         0 Mapped  Exe  EXECUTE_WRITECOPY  \Windows\System32\apisetschema.dll
853022c0 ( 4)      7f6f0    7f7ef         0 Mapped       READONLY           Pagefile-backed section
84b70de8 ( 2)      7ffb0    7ffd2         0 Mapped       READONLY           Pagefile-backed section
84731e40 ( 3)      7ffdd    7ffdd         1 Private      READWRITE         
84cd70c0 ( 4)      7ffdf    7ffdf         1 Private      READWRITE         

Total VADs:    19  average level:    3  maximum depth: 4
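The two uses of VADs described in this section, validating a faulting address and finding a free range, can be sketched with a few of the ranges from the output above. This is an illustration only: it swaps the kernel's tree for a sorted list with binary search, ignores allocation granularity and protection, and the function names are hypothetical.

```python
import bisect

# (StartingVpn, EndingVpn, type) for the first seven VADs in the output above.
VADS = [
    (0x10, 0x1F, "Mapped"),
    (0x20, 0x2F, "Mapped"),
    (0x30, 0x33, "Mapped"),
    (0x40, 0x40, "Private"),
    (0x50, 0xB6, "Mapped"),
    (0xC0, 0xCF, "Mapped"),
    (0x120, 0x12F, "Private"),
]
STARTS = [start for start, _, _ in VADS]


def find_vad(va):
    """Return the VAD covering va, or None if the address is not allocated."""
    vpn = va >> 12
    i = bisect.bisect_right(STARTS, vpn) - 1
    if i >= 0 and VADS[i][0] <= vpn <= VADS[i][1]:
        return VADS[i]
    return None


def first_free_range(num_pages, lowest_vpn=0x10):
    """Return the starting VPN of the first gap of at least num_pages pages."""
    cursor = lowest_vpn
    for start, end, _ in VADS:
        if start - cursor >= num_pages:
            return cursor
        cursor = max(cursor, end + 1)
    return cursor          # free space after the last VAD in this sample


print(find_vad(0x000C0000))          # (192, 207, 'Mapped')
print(find_vad(0x00041000))          # None
print(hex(first_free_range(16)))     # 0xd0
```

A fault on 0x00041000 finds no covering VAD, which is how an access to an unallocated address would be recognized as invalid in this model.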

Working Set Trimming

A process's working set is the set of pages that the process can access without incurring a page fault. Such pages are also called resident pages.

When the total number of physical pages available for allocation in the system falls below a certain threshold, the memory manager performs working set trimming. Trimming removes pages that are not being actively used by a process from the process's working set and makes them available for re-allocation to processes that actually need them.

The "working set trimmer", a component of the Windows memory manager, is responsible for removing pages from a process's working set. This ensures that there is always an ample supply of physical pages available in the system to resolve page faults.

Once a physical page is removed from the working set of a process it is kept in a modified or standby state, depending on whether the page has been written to or not. In these states the contents of the page are kept intact, but the PTE for the page is marked as "invalid" (from a hardware perspective) and in "transition". This ensures that the physical page can either be faulted back into the process from which it was removed, in case the process accesses the page, or be given away to another process that needs it. Before a modified page is given away to another process the contents of the page are saved to the pagefile or a memory mapped file.

FIG#3

Figure #3 : Physical page trimmed from the process's working set

The following command displays the PTE for a virtual page for which the corresponding physical page has been removed from the working set of the process:

kd> !pte 40000
                    VA 00040000
PDE at C0600000            PTE at C0000200
contains 0000000036FAF867  contains 0000000033132886
pfn 36faf     ---DA--UWEV   not valid
                            Transition: 33132
                            Protect: 4 - ReadWrite

When a page is removed from the working set of the process the PTE becomes invalid from a hardware perspective, as indicated by the "not valid" shown above. In addition the PTE is put into the "Transition" state to indicate that the PTE still points to the physical page whose PFN is displayed right next to "Transition:". The CPU cannot use such a PTE to perform virtual to physical address translation, and if the process attempts to access the corresponding virtual page a page fault will occur.
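The fields that !pte reports for the transition PTE above can be recovered from the raw value 0x0000000033132886. The sketch below assumes a software PTE layout with the valid bit at bit 0, a 5-bit protection field at bits 5 to 9, the prototype bit at bit 10, the transition bit at bit 11 and the PFN starting at bit 12. These positions are an assumption that happens to reproduce the debugger output in this document; they are not taken from a published structure definition.

```python
def decode_invalid_pte(pte):
    """Decode a not-valid PTE under the assumed software layout."""
    if pte & 1:
        raise ValueError("PTE is valid; decode it as a hardware PTE instead")
    return {
        "protection": (pte >> 5) & 0x1F,         # assumed bits 5..9
        "prototype":  bool(pte & (1 << 10)),     # assumed bit 10
        "transition": bool(pte & (1 << 11)),     # assumed bit 11
        "pfn":        (pte >> 12) & ((1 << 40) - 1),
    }


info = decode_invalid_pte(0x0000000033132886)
print(info["transition"], info["protection"], hex(info["pfn"]))
# prints: True 4 0x33132
```

The decoded protection value 4 and PFN 0x33132 agree with the "Protect: 4 - ReadWrite" and "Transition: 33132" lines printed by the debugger.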

The following command shows the PFN database entry for a page that has been trimmed from the process's working set and for which the PTE has been put into "Transition" state. Since the process has written to the page in the past (while it was still in the process's working set) the state of the page is "Modified" as shown below:

kd> !pfn 33132
    PFN 00033132 at address 83B96178
    flink       0002E371  blink / share count 0002F53B  pteaddress C0000200
    reference count 0000   Cached     color 0   Priority 5
    restore pte 00000080  containing page        036FAF  Modified   M       
    Modified              

Sharing Memory in Windows

The Windows operating system supports shared memory whereby virtual pages from multiple processes can be mapped to the same physical page. When a physical page is shared among multiple processes, the PTEs of multiple processes point to the same physical page as shown in the following figure:

FIG#4

Figure #4 : Multiple PTEs pointing to the same physical page

Shared memory is represented by a data structure called the Section Object, which is created when a process calls CreateFileMapping(). Once a section object is created, a view of it has to be mapped into the user mode virtual address space by calling MapViewOfFile() before the memory becomes accessible. This is when the VADs for the virtual address range are created. These views are unmapped by calling UnmapViewOfFile().
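The effect of mapping two views of the same section can be observed from user mode. The sketch below uses Python's mmap module, which on Windows is built on CreateFileMapping() and MapViewOfFile(); on other platforms the same code runs on top of the native mmap call. The file name and the helper function are made up for illustration, and both views live in one process here rather than in two.

```python
import mmap
import os
import tempfile

VIEW_SIZE = 4096


def share_through_file_mapping(payload):
    """Write payload through one view of a file and read it back via another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "backing.bin")      # hypothetical backing file
        with open(path, "wb") as f:
            f.write(b"\x00" * VIEW_SIZE)
        with open(path, "r+b") as f1, open(path, "r+b") as f2:
            view_a = mmap.mmap(f1.fileno(), VIEW_SIZE)
            view_b = mmap.mmap(f2.fileno(), VIEW_SIZE)
            try:
                view_a[:len(payload)] = payload       # store through view A
                return bytes(view_b[:len(payload)])   # load through view B
            finally:
                view_a.close()                        # unmap both views
                view_b.close()


print(share_through_file_mapping(b"shared page"))
```

The bytes written through the first view are visible through the second without any file I/O call in between, because both views are backed by the same physical pages.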

Prototype PTEs

For a shared page it may happen that the working set trimmer removes the page from the working set of one process but not from other processes into which the page is mapped. This leads to a situation where multiple PTEs of different processes that point to the same physical page end up in different states. For example, one process's PTE could be a transition PTE whereas the other process's PTE might still be valid and would point to the physical page as shown in the figure below:

FIG#5

Figure #5 : Problem of Multiple PTEs pointing to the same shared page being in different states

When the process that has lost a shared page from its working set attempts to access the page again, a page fault occurs. The page fault handler resolves the fault by adding the shared physical page back to the faulting process's working set. Performing this task poses a problem, since the page fault handler does not know whether the actual physical page is in memory, is in the pagefile or has been re-allocated to another process. This information is not available in the PTE of the faulting process, which is what the page fault handler looks at to resolve a page fault.

So for every shared page there is a need for another data structure that stores the "real" state and location of the page. This data structure is called the prototype PTE. These prototype PTEs are allocated from paged pool with the tag 'MmSt' along with the section object and they maintain the actual location and state of the shared page.

In a nutshell, as long as a page is a part of a process's working set the PTE of the process points to the physical page. Once a page is removed from the process's working set the PTE is marked as prototype and points to the prototype PTE of the page as shown in the following figure:

FIG#6

Figure #6 : Invalid Process PTE pointing to Prototype PTE

Unlike hardware PTEs, prototype PTEs are internal memory manager data structures that are never used by the CPU to perform virtual to physical address translation. They are only used by the Windows page fault handler to resolve page faults on shared pages.
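The division of labour between process PTEs and the prototype PTE can be illustrated with a toy model. The sketch below is not how the memory manager is implemented: the class names, the string states and the share count handling are simplifications chosen for illustration, and hard faults from disk are not modelled. It only shows that each process PTE is either valid or points at the prototype PTE, while the prototype PTE alone records where the page really is.

```python
class PrototypePte:
    """Toy prototype PTE: the one authoritative record for a shared page."""

    def __init__(self, pfn):
        self.pfn = pfn
        self.state = "transition"    # page is in memory but in no working set
        self.share_count = 0


class Process:
    def __init__(self):
        self.views = {}   # vpn -> PrototypePte (the role played by the VAD)
        self.ptes = {}    # vpn -> "valid" or "proto"

    def map_view(self, vpn, proto):
        self.views[vpn] = proto
        self.ptes[vpn] = "proto"

    def touch(self, vpn):
        proto = self.views[vpn]
        if self.ptes[vpn] == "proto":    # page fault: consult the prototype PTE
            proto.state = "valid"
            proto.share_count += 1
            self.ptes[vpn] = "valid"
        return proto.pfn

    def trim(self, vpn):
        proto = self.views[vpn]
        if self.ptes[vpn] == "valid":
            self.ptes[vpn] = "proto"     # process PTE now refers to the prototype
            proto.share_count -= 1
            if proto.share_count == 0:   # last working set reference is gone
                proto.state = "transition"


page = PrototypePte(pfn=0x372C9)
first, second = Process(), Process()
first.map_view(0xC0, page)
second.map_view(0x50, page)
first.touch(0xC0)
second.touch(0x50)
second.trim(0x50)
print(second.ptes[0x50], page.state, page.share_count)
# prints: proto valid 1
```

After the second process is trimmed, its PTE says only "look at the prototype PTE", and the prototype PTE still reports the page as valid because the first process continues to use it.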

Examining Prototype PTEs (Case #1)

The following section shows the PTE for a virtual address in the system cache (0x9a600000) that has been trimmed from the cache manager's working set. System cache pages are always sharable:

kd> !cmkd.kvas 9a600000 
kvas : Show region containing 9a600000
### Start    End        Length (  MB)    Count Type    
000 9a600000 9a9fffff   600000 (   6)        2 SystemCache

The PTE of the page shown above is invalid indicating that the page has been trimmed. The PTE points to the prototype PTE as shown in the following output:

kd> !pte 9a600000 
                    VA 9a600000
PDE at C0602698            PTE at C04D3000
contains 0000000033124863  contains 8886200000000400
pfn 33124     ---DA--KWEV   not valid
                            Proto: 88862000

Prototype PTEs are allocated from paged pool with the tag 'MmSt' as shown below:

kd> !cmkd.kvas 88862000
kvas : Show region containing 88862000
### Start    End        Length (  MB)    Count Type    
000 88800000 889fffff   400000 (   4)        1 PagedPool

kd> !pool 88862000
Pool page 88862000 region is Paged pool
*88861000 : large page allocation, Tag is MmSt, size is 0x8000 bytes
		Pooltag MmSt : Mm section object prototype ptes, Binary : nt!mm

The contents of the prototype PTE indicate that the PTE is in transition and that it contains the index of the PFN database entry (0x7d47) as shown below:

kd> !pte 88862000  1
                    VA 88862000
PDE at 88862000            PTE at 88862000
contains 0000000007D478C4
not valid
 Transition: 7d47
 Protect: 6 - ReadWriteExecute

The PFN database entry for the page indicates that the page is in "Standby" and the "pteaddress" points to the prototype PTE that describes the page, as shown below:

kd> !pfn 0x7d47
    PFN 00007D47 at address 836DB3C4
    flink       00015EE6  blink / share count 0001F0C8  pteaddress 88862000
    reference count 0000   Cached     color 0   Priority 5
    restore pte 84B9AB08000004C0  containing page        027B7B  Standby     P      
      Shared            

Examining Prototype PTEs (Case #2)

This example discusses a page backed by a memory mapped file that is shared between 2 processes, i.e. 2 instances of memmap.exe. In the first instance (PID 0x830) the shared memory is mapped at the virtual address 0x000c0000. In the second instance (PID 0xcf4) the shared memory is mapped at the virtual address 0x00050000.

kd> !process 0 0 memmap.exe

PROCESS 846ae848  SessionId: 2  Cid: 0830    Peb: 7ffdd000  ParentCid: 0670
    DirBase: 3ee334e0  ObjectTable: 988a5ab0  HandleCount:   9.
    Image: memmap.exe

PROCESS 8468d030  SessionId: 2  Cid: 0cf4    Peb: 7ffdf000  ParentCid: 099c
    DirBase: 3ee334c0  ObjectTable: 87e90da0  HandleCount:   9.
    Image: memmap.exe

Switch the debugger user mode virtual address and PTE context to the first instance (PID=0x830) of the process memmap.exe.

kd> .process /P 846ae848  
Implicit process is now 846ae848
.cache forcedecodeptes done

kd> !pte 0x00c0000
                    VA 000c0000
PDE at C0600000            PTE at C0000600
contains 0000000035619867  contains 80000000372C9005
pfn 35619     ---DA--UWEV   pfn 372c9     -------UR-V

Switch the debugger user mode virtual address and PTE context to the second instance (PID=0xCF4) of the process memmap.exe.

kd> .process /P 8468d030  
Implicit process is now 8468d030
.cache forcedecodeptes done

kd> !pte 0x00050000
                    VA 00050000
PDE at C0600000            PTE at C0000280
contains 0000000036FAF867  contains 80000000372C9025
pfn 36faf     ---DA--UWEV   pfn 372c9     -------UR-V

The PFN database entry for the page is as shown below. The "share count" indicates that there are 2 PTE references on the physical page 0x372C9.

kd> !pfn 372c9     
    PFN 000372C9 at address 83C08DFC
    flink       00000110  blink / share count 00000002  pteaddress 9726F008
    reference count 0001   Cached     color 0   Priority 5
    restore pte 84CE4358000004C0  containing page        0171F6  Active      P      
      Shared

The second instance of the process memmap.exe is forced to empty its working set using the Win32 API EmptyWorkingSet(). After this the PTEs of the 2 instances of memmap.exe are examined again as shown below:

The PTE for the first instance (PID=0x830) is unaffected.

kd> .process /P 846ae848  
Implicit process is now 846ae848
.cache forcedecodeptes done

kd> !pte 0x000c0000
                    VA 000c0000
PDE at C0600000            PTE at C0000600
contains 0000000035619867  contains 80000000372C9005
pfn 35619     ---DA--UWEV   pfn 372c9     -------UR-V

The PTE for the second instance (PID=0xcf4) is no longer valid.

kd> .process /P 8468d030  
Implicit process is now 8468d030
.cache forcedecodeptes done


kd> !pte 0x00050000
                    VA 00050000
PDE at C0600000            PTE at C0000280
contains 0000000036FAF867  contains FFFFFFFF00000420
pfn 36faf     ---DA--UWEV   not valid
                            Proto: VAD
                            Protect: 1 - Readonly

When the shared physical page is removed from the working set of the process the PTE for that page is marked as prototype. For user mode addresses, such PTEs can contain a special signature displayed as "Proto: VAD", indicating that the VAD describing the virtual address region needs to be examined to find the prototype PTE. The page fault handler looks up the VAD to locate the prototype PTE and then uses the contents of the prototype PTE to locate the page in order to resolve the page fault.
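The two prototype-format process PTEs shown in this document, 0x8886200000000400 for the system cache page and 0xFFFFFFFF00000420 for the memmap.exe page, suggest how the debugger tells the two cases apart. The sketch below assumes that bit 10 marks a prototype-format PTE and that the upper 32 bits hold either the prototype PTE address or the value 0xFFFFFFFF meaning "consult the VAD". This layout is inferred from those two values on an x86 PAE target and is not taken from a structure definition; the names are illustrative.

```python
PROTO_VIA_VAD = 0xFFFFFFFF   # assumed marker for "Proto: VAD"


def decode_proto_format_pte(pte):
    """Classify a not-valid, prototype-format process PTE."""
    if pte & 1 or not pte & (1 << 10):
        raise ValueError("not a prototype-format PTE")
    proto_address = pte >> 32            # assumed: upper 32 bits
    if proto_address == PROTO_VIA_VAD:
        return ("vad", None)             # prototype PTE must be found via the VAD
    return ("direct", proto_address)     # PTE names the prototype PTE itself


print(decode_proto_format_pte(0x8886200000000400))
print(decode_proto_format_pte(0xFFFFFFFF00000420))
```

For the first value the function returns the address 0x88862000, matching "Proto: 88862000", and for the second it reports that the VAD has to be consulted, matching "Proto: VAD".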

The following command displays the root of the VAD tree for the second instance of memmap.exe:

kd> !process 8468d030   1
PROCESS 8468d030  SessionId: 2  Cid: 0cf4    Peb: 7ffdf000  ParentCid: 099c
    DirBase: 3ee334c0  ObjectTable: 87e90da0  HandleCount:   9.
    Image: memmap.exe
    VadRoot 84c3fe98 Vads 19 Clone 0 Private 41. Modified 35. Locked 0.
.
.
.

The following command displays the VAD tree of the process using the root. The VAD covering the virtual address range 0x00050000 through 0x0005ffff is shown below:

kd> !vad 84c3fe98 
VAD     level      start      end    commit
 . . .
83fba5e0 ( 3)         50       5f         0 Mapped       READONLY           \Windows\WindowsUpdate.log
. . . 

A shared memory region can be backed by the pagefile or by a memory mapped file, as determined by the hFile parameter of CreateFileMapping(). The VAD indicates that the name of the memory mapped file backing the region is \Windows\WindowsUpdate.log.

The VAD contains pointers to the first and last prototype PTEs that cover the shared virtual address region.

kd> dt nt!_MMVAD 83fba5e0
   +0x000 u1               : 
   +0x004 LeftChild        : 0x83ed22c8 _MMVAD
   +0x008 RightChild       : (null) 
   +0x00c StartingVpn      : 0x50
   +0x010 EndingVpn        : 0x5f
   +0x014 u                : 
   +0x018 PushLock         : _EX_PUSH_LOCK
   +0x01c u5               : 
   +0x020 u2               : 
   +0x024 Subsection       : 0x84ce4358 _SUBSECTION
   +0x024 MappedSubsection : 0x84ce4358 _MSUBSECTION
   +0x028 FirstPrototypePte : 0x9726f008 _MMPTE
   +0x02c LastContiguousPte : 0x9726f080 _MMPTE
   +0x030 ViewLinks        : _LIST_ENTRY [ 0x83f850a0 - 0x84ce4350 ]
   +0x038 VadsProcess      : 0x8468d031 _EPROCESS

The following command shows that prototype PTEs are allocated from paged pool with the 'MmSt' tag:

kd> !pool 9726f008  
Pool page 9726f008 region is Paged pool
*9726f000 size:  fc8 previous size:    0  (Allocated) *MmSt
		Pooltag MmSt : Mm section object prototype ptes, Binary : nt!mm

The FirstPrototypePte field contains the address of the prototype PTE of the first page in the virtual address range (0x00050000). The LastContiguousPte field contains the address of the prototype PTE of the last page in the range (0x0005f000). The address of the first prototype PTE is used in the following command, since the objective is to find the mapping for the first page in the range, i.e. the page at address 0x00050000.

Decoding the contents of the prototype PTE shows that it is still valid and that the physical page (0x372c9) is still in memory, since the page is still a part of the working set of the other instance of memmap.exe:

kd> !pte 9726f008 1
                    VA 9726f008
PDE at 9726F008            PTE at 9726F008
contains 00000000372C9825  contains 00000000372C9825
pfn 372c9     ----A--UREV   pfn 372c9     ----A--UREV
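The command above uses FirstPrototypePte directly because the page of interest is the first one in the view. For any other page in the range, the prototype PTE address can be computed from the VAD fields. The sketch below assumes that the prototype PTEs for the view are contiguous and 8 bytes each, as on this PAE target; the function name and parameter names are hypothetical.

```python
PTE_SIZE = 8   # assumption: 64-bit prototype PTEs on an x86 PAE system


def prototype_pte_for(va, starting_vpn, ending_vpn, first_prototype_pte):
    """Address of the prototype PTE describing the page that contains va."""
    vpn = va >> 12
    if not starting_vpn <= vpn <= ending_vpn:
        raise ValueError("address is outside the range described by the VAD")
    return first_prototype_pte + (vpn - starting_vpn) * PTE_SIZE


# Values taken from the nt!_MMVAD output above.
print(hex(prototype_pte_for(0x00050000, 0x50, 0x5F, 0x9726F008)))  # 0x9726f008
print(hex(prototype_pte_for(0x0005F000, 0x50, 0x5F, 0x9726F008)))  # 0x9726f080
```

The second result equals the LastContiguousPte value in the VAD, which is consistent with 16 contiguous prototype PTEs covering pages 0x50 through 0x5f.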

The PFN database entry for the physical page at this point shows a "share count" value of 1, indicating that there is now only a single PTE pointing to the physical page.

kd> !pfn 372c9     
    PFN 000372C9 at address 83C08DFC
    flink       00000110  blink / share count 00000001  pteaddress 9726F008
    reference count 0001   Cached     color 0   Priority 5
    restore pte 84CE4358000004C0  containing page        0171F6  Active      P      
      Shared

The following figure depicts the scenario described above and the relationship between the various data structures:

FIG#7

Figure #7 : Invalid Process PTE pointing to the Prototype PTE through the process's VAD

Executable files (.EXEs), Dynamic Link Libraries (.DLLs), shared memory, memory mapped files etc. depend on the capability of Windows to share pages between multiple processes, which is made possible by prototype PTEs. Many different data structures in the kernel, such as process PTEs, PFN database entries, VADs and section objects, point to prototype PTEs, making them an integral and critical part of the Windows Memory Manager.