LDE incorrectly reads 0x48 as a single-byte assembly instruction (x64)
As far as I understand, 0x48 on x64 is the REX.W prefix: it marks the instruction that follows as using a 64-bit (qword) operand, so it is always part of a longer instruction.
In the following example notice that:
mov rax,qword ptr [rsp+68h]
incorrectly becomes the 32-bit version:
mov eax,dword ptr [rsp+68h]
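For reference, here is a minimal sketch of how a REX prefix byte can be recognised. In 64-bit mode the bytes 0x40 through 0x4F are prefixes with the bit layout 0100WRXB, and the W bit selects a 64-bit operand. In 32-bit code the byte 0x48 is instead the one-byte `dec eax`, which would explain a reported length of 1 if the decoder is working in 32-bit mode; that is a guess at the cause, not something confirmed here. The helper names `IsRexPrefix` and `RexW` are made up for illustration and are not part of the library.

```cpp
#include <cassert>
#include <cstdint>

// REX prefixes occupy 0x40..0x4F in 64-bit mode (bit layout 0100WRXB).
constexpr bool IsRexPrefix(std::uint8_t b)
{
    return (b & 0xF0) == 0x40;
}

// Bit 3 (W) of a REX prefix requests a 64-bit operand size.
constexpr bool RexW(std::uint8_t b)
{
    return IsRexPrefix(b) && (b & 0x08) != 0;
}
```

By this test 0x48 and 0x4C from the listing are REX prefixes with W set, 0x43 is a REX prefix without W, and 0x8B is an ordinary opcode.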
Original:
CContext::ID3D11DeviceContext1_Map_<1>:
000007FEF23F6F00 48 83 EC 38 sub rsp,38h
000007FEF23F6F04 48 8B 44 24 68 mov rax,qword ptr [rsp+68h]
000007FEF23F6F09 4C 0F BE 52 61 movsx r10,byte ptr [rdx+61h]
000007FEF23F6F0E 4C 8D 1D 3B 7F 06 00 lea r11,[c_MapFns (07FEF245EE50h)]
000007FEF23F6F15 48 89 44 24 28 mov qword ptr [rsp+28h],rax
000007FEF23F6F1A 8B 44 24 60 mov eax,dword ptr [rsp+60h]
000007FEF23F6F1E 48 83 C1 90 add rcx,0FFFFFFFFFFFFFF90h
000007FEF23F6F22 89 44 24 20 mov dword ptr [rsp+20h],eax
000007FEF23F6F26 43 FF 14 D3 call qword ptr [r11+r10*8]
000007FEF23F6F2A 48 83 C4 38 add rsp,38h
000007FEF23F6F2E C3 ret
Detoured (notice that the 48 goes missing from the mov originally at 000007FEF23F6F04):
CContext::ID3D11DeviceContext1_Map_<1>:
000007FEF23F6F00 E9 09 D0 75 E7 jmp R_ID3D11ImmediateDeviceContext_Map (07FED9B53F0Eh)
000007FEF23F6F05 8B 44 24 68 mov eax,dword ptr [rsp+68h]
000007FEF23F6F09 4C 0F BE 52 61 movsx r10,byte ptr [rdx+61h]
000007FEF23F6F0E 4C 8D 1D 3B 7F 06 00 lea r11,[c_MapFns (07FEF245EE50h)]
000007FEF23F6F15 48 89 44 24 28 mov qword ptr [rsp+28h],rax
000007FEF23F6F1A 8B 44 24 60 mov eax,dword ptr [rsp+60h]
000007FEF23F6F1E 48 83 C1 90 add rcx,0FFFFFFFFFFFFFF90h
000007FEF23F6F22 89 44 24 20 mov dword ptr [rsp+20h],eax
000007FEF23F6F26 43 FF 14 D3 call qword ptr [r11+r10*8]
000007FEF23F6F2A 48 83 C4 38 add rsp,38h
000007FEF23F6F2E C3 ret
Trampoline:
0000000008291D20 48 83 EC 38 sub rsp,38h
0000000008291D24 48 FF 25 00 00 00 00 jmp qword ptr [8291D2Bh]
where the qword stored at 8291D2Bh is 000007FEF23F6F05 (the mov eax... instruction above)
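For reference, the jump at the end of the trampoline is an indirect jump through the 8 bytes stored immediately after it: `FF 25 00 00 00 00` followed by the 64-bit target. The extra 48 in front of it in the listing above is not part of that encoding. Below is a minimal sketch of how such a stub could be assembled; the function name `MakeAbsoluteJump` is hypothetical and is not the library's API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Builds "jmp qword ptr [rip+0]" followed by the 8-byte absolute target.
std::array<std::uint8_t, 14> MakeAbsoluteJump(std::uint64_t target)
{
    // FF 25 = jmp r/m64 with a RIP-relative operand; the 32-bit displacement is 0,
    // so the pointer is read from the bytes directly after the instruction.
    std::array<std::uint8_t, 14> stub = {{0xFF, 0x25, 0x00, 0x00, 0x00, 0x00}};

    // Store the target in little-endian order, as x64 expects.
    for (std::size_t i = 0; i < 8; ++i)
        stub[6 + i] = static_cast<std::uint8_t>(target >> (8 * i));

    return stub;
}
```

The instruction itself is 6 bytes, so the pointer sits 6 bytes after the start of the jump, or 7 bytes when a stray prefix byte precedes it as in the listing above (8291D24h + 7 = 8291D2Bh).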
The detour actually copied 5 bytes: 1 for the first 0x48, 3 for the sub instruction (83 EC 38), and 1 more for the next 0x48. That second 0x48 ends up in front of the trampoline's absolute jump (48 FF 25). Visual Studio's disassembler is happy to treat 48 as a valid prefix there, and apparently the code runs fine too. The bytes left behind in the original function (8B 44 24 68) have lost their prefix, so they now decode as the 32-bit mov.
I'm not sure if this is because the LDE disassembler you use doesn't properly detect x64, or something else. A proposed fix (may not be the best) is manual detection of 0x48 in GetDetourLenAuto:
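while (totalLen < jmpType)
{
    size_t len = LDE(reinterpret_cast<LPVOID>(lpbDataPos), 0);
    // LDE reported a lone 0x48 (REX.W prefix) as a complete instruction:
    // count the prefix byte, then measure the instruction it belongs to.
    if (len == 1 && *lpbDataPos == 0x48)
    {
        ++lpbDataPos;
        ++totalLen;
        len = LDE(reinterpret_cast<LPVOID>(lpbDataPos), 0);
    }
    lpbDataPos += len;
    totalLen += len;
}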
This leads to the following correct assembly:
Detoured:
000007FEF23F6F00 E9 09 D0 58 E8 jmp R_ID3D11ImmediateDeviceContext_Map (07FEDA983F0Eh)
000007FEF23F6F05 90 nop
000007FEF23F6F06 90 nop
000007FEF23F6F07 90 nop
000007FEF23F6F08 90 nop
000007FEF23F6F09 4C 0F BE 52 61 movsx r10,byte ptr [rdx+61h]
000007FEF23F6F0E 4C 8D 1D 3B 7F 06 00 lea r11,[c_MapFns (07FEF245EE50h)]
000007FEF23F6F15 48 89 44 24 28 mov qword ptr [rsp+28h],rax
000007FEF23F6F1A 8B 44 24 60 mov eax,dword ptr [rsp+60h]
000007FEF23F6F1E 48 83 C1 90 add rcx,0FFFFFFFFFFFFFF90h
000007FEF23F6F22 89 44 24 20 mov dword ptr [rsp+20h],eax
000007FEF23F6F26 43 FF 14 D3 call qword ptr [r11+r10*8]
000007FEF23F6F2A 48 83 C4 38 add rsp,38h
000007FEF23F6F2E C3 ret
Trampoline:
00000000083AB500 48 83 EC 38 sub rsp,38h
00000000083AB504 48 8B 44 24 68 mov rax,qword ptr [rsp+68h]
00000000083AB509 FF 25 00 00 00 00 jmp qword ptr [83AB50Fh]
where the qword stored at 83AB50Fh is 000007FEF23F6F09 (the movsx r10... instruction above)
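The proposed check only looks for 0x48, but the same function also contains 4C and 43, which are REX prefixes as well (the whole range 0x40 to 0x4F is). If the decoder misreads 0x48 as a one-byte instruction, it may do the same for those. Below is a sketch of a more general variant of the loop. It is not the library's code: `DetourLenWithRex`, `LengthFn` and `ToyLde` are hypothetical names, and `ToyLde` is a toy decoder that only knows the few bytes from this report and deliberately reproduces the misbehaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the length disassembler: returns the length of the
// instruction at p, or 0 if it cannot decode it.
using LengthFn = std::size_t (*)(const std::uint8_t* p);

// Toy decoder for the bytes in this report only. It mimics the bug by
// reporting any REX byte (0x40..0x4F) as a complete 1-byte instruction.
std::size_t ToyLde(const std::uint8_t* p)
{
    if ((*p & 0xF0) == 0x40) return 1; // REX byte misread as an instruction
    if (*p == 0x83) return 3;          // 83 EC 38       sub rsp,38h (without prefix)
    if (*p == 0x8B) return 4;          // 8B 44 24 68    mov eax,[rsp+68h]
    return 0;                          // anything else is unknown to this toy
}

// Returns the number of whole-instruction bytes needed to cover minLen bytes,
// folding a misreported REX prefix into the instruction that follows it.
std::size_t DetourLenWithRex(const std::uint8_t* code, std::size_t minLen, LengthFn lde)
{
    assert(code != nullptr && lde != nullptr);

    std::size_t total = 0;
    while (total < minLen)
    {
        const std::uint8_t* p = code + total;
        std::size_t len = lde(p);
        if (len == 0) return 0; // decoder gave up

        // A 1-byte "instruction" in 0x40..0x4F is really a prefix in 64-bit code.
        if (len == 1 && (*p & 0xF0) == 0x40)
        {
            const std::size_t next = lde(p + 1);
            if (next == 0) return 0;
            len += next;
        }
        total += len;
    }
    return total;
}
```

With the first nine bytes of this function (48 83 EC 38 48 8B 44 24 68) and a 5-byte minimum, the sketch returns 9, which matches the 5-byte jmp plus four nops in the corrected listing above.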
Hope this fix makes it in or you find out what's wrong with LDE. Either way, I love the library and I'm just glad I was able to understand what was happening!