Copy Link
Add to Bookmark
Report

Assembly Programming Journal Issue 03

  

::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::. Feb/March 98
:::\_____\::::::::::. Issue 3
::::::::::::::::::::::.........................................................

A S S E M B L Y P R O G R A M M I N G J O U R N A L
http://asmjournal.freeservers.com
asmjournal@mailcity.com




T A B L E O F C O N T E N T S
----------------------------------------------------------------------
Introduction...................................................mammon_

"An Introduction to SPARC assembly"............................+Spath.

"Extending NASM"...............................................mammon_

Column: Win32 Assembly Programming
"NASM specific Win32 coding".......................Tamas Kaproncai
"More about Text".........................................Iczelion
"Keyboard Input"..........................................Iczelion

Column: The C standard library in Assembly
"C string functions: introduction, _strlen".................Xbios2
"C string functions: _strcpy"...............................Xbios2

Column: The Unix World
"X-Windows in Assembly Language: Part II"..................mammon_

Column: Virtual Machines
"An Intro to the Java Virtual Machine"............Cynical Pinnacle

Column: Assembly Language Snippets
"NumFactors"..........................................Troy Benoist

Column: Issue Solution
"6-byte Solution"..........................................mammon_
----------------------------------------------------------------------
++++++++++++++++++++++++Issue Challenge+++++++++++++++++++++
Write a routine for converting ASCII hex to binary in 6 bytes
----------------------------------------------------------------------


____________________________________________________________________________
___ .___ __) (__ _____ ______ ```
._____| \____\ ___/__._) /._) _ (_. \\
| | _\ |_ \ | \/ | |CE ,
.=|_____|___)\___|(_______|______| |===============[ Introduction ]===.
'================================| :=================================='
: . by mammon_


The first thing that you will notice about this issue --well, that it is late--
will probably be the section headers designed by iCE. I had to add a top/upper
left border to them [the horizontal and slanted lines] in order to make them
standout when scrolling though a 100K file such as this one, but other than
they are all his: comments, etc welcome.

I don't have much to say about this issue: I went overboard with the NASM stuff
this month as I have been doing a lot of 'research' work in that area recently;
my articles have been supplemented with Tamas Kaproncai's Win32 NASM pointers.

Iczelion and XBios2 have both produced --as usual-- 2 quality articles this
month, Iczelion's based on his win32 asm tutorial 'the MASM way', and XBios2
once again continuing to replace C with assembler.

+Spath. has produced an excellent article on SPARC assembly language; I was
hoping to debut the 'other CPU' scene with a MIPS article I had planned but it
looks like +Spath has beat me to it.

On a similar note, I mentioned on the Message Board wanting to start a Virtual
Machines column. Cynical Pinnacle has started the column off this month with an
article on programming the Java VM in its native 'assembly language'; in
subsequent issues I and perhaps others will be adding articles here as well.

A final note, I have not come up with a challenge for the next issue; anyone
with good ideas is welcome to post one to the Message Board or to the APJ
email address.

Enjoy the mag!

_m


::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
An introduction to SPARC assembly
by +Spath.


The goal of this article is to introduce SPARC v8 architecture and SPARC
assembly ; I hope it can also constitute a good introduction to RISC
philosophy.


What is SPARC ?
-----------------
The principles of RISC (Reduced Instructions Set Computer) are born in the
early 80's in two universities (Berkeley and Stanford) ; its philosophy is
the quest for simplicity and CPU speed. SPARC (Scalable Processor ARChitecture)
is a 32 bits RISC architecture created by Sun in 1987. It's an open
architecture, so that any manufaturer can make SPARC processors (like Philips,
VLSI, T.I., Fujitsu... already did). Its key features are :

- a load/store architecture : this means that only registers can be used
in data manipulation operations, and not memory locations. Memory is
organised in a linear address space of 2^32 bits which use "big-endian"
organisation (the MSB is stored first) ; a word is 32 bits wide (a 16 bits
data is a halfword).

- a large number of registers : from 2 to 32 sets of 24 general purpose
registers are available ; these 24 registers are local registers %l[0-7],
in registers %i[0-7] and out registers %o[0-7], all working in an
overlapping windows mechanism that will be explained later. The SPARC
architecture also provides 8 global registers %g[0-7], 32 registers for
floating-point operations (%f[0-31]) and some specific registers (%pc, %sp,
%psr, %y,...).

- a small set of simple instructions : to avoid translation from machine code
to microcode, SPARC instructions are directly implemented in hardware, and
therefore are very basic (mainly load/store, logical, arithmetic, branching).
All instructions are 4 bytes long, and most of them use 3 registers (source1,
source2, destination in that order). Assemblers also provide a set of
synthetic instructions, which are more "coder friendly", but does not really
exist for the processor (and therefore must be carefully used). These
synthetic instructions have most of the time less operands, so that the
corresponding real instructions often use %g0, a read-only register stuck
at 0 ; here are some aliases :

synthetic instruction | real opcode
nop <=> sethi 0, %g0
ret <=> jmpl %i7+8, %g0
mov reg_or_imm, reg <=> or %g0, reg_or_imm, reg
cmp reg, reg_or_imm <=> subcc reg, reg_or_imm, %g0

Enough with theory, let's see some code.


SPARC assembly basics
-----------------------
Let's start with an in-season "hello world" style program : '!' is used
for single line comments, /* .. */ is used for multiple lines comments).

!8<------------------------------------------------------------------------
/* FILE : hello.s */

.section ".rodata" ! read-only initialised datas
.MyText: ! define our string label
.asciz "Happy new year %i \n"! define a null-terminated string
.Year:
.word 1999 ! define a word constant

.section ".text" ! read-only object code (instructions)
.global main ! Make function name globally visible

main:
save %sp, -112, %sp ! allocate space for stack
sethi %hi(.MyText), %o1 ! load higher part of string offset
or %o1, %lo(.MyText), %o0 ! add lower part of offset
set (.Year), %l1 ! get year address
ld [%l1], %o1 ! load year into %o1
call printf ! print the string
nop ! do nothing (BDS)

ret ! Return to caller
restore ! Restore register windows (BDS)
Endmain: ! Tell the linker how big the
.size main,(.-main) ! procedure is ("." is current address).
!8<-------------------------------------------------------------------------

Every procedure must save some memory space for itself ; this stack space
will be used to store the out and local registers and all the datas needed by
the procedure (the minimal space is 64 bytes for %o and %l registers). The
stack grows from higher to lower addresses, so that allocating a stack space
is implemented by substracting a value from the current stack pointer ; the
previous stack pointer is called the frame pointer (%fp).

Registers %o0 - %o5 are used to pass the first six parameters to a procedure,
because the current stack pointer (%sp) is stored in %o6 and the calling
program counter (%pc, used to calculate the return address) is stored in %o7.
If a procedure has more than six parameters, the remaining parameters are
passed using the stack space (eg for a caller's stack space of 92 bytes, the
child procedure can get the seventh parameter at [%fp+92]).

As I said before, all instructions are 32 bits long, so that you must use
two steps (with sethi and or) to load a 32 bits data. Note that %hi refers
to the most significant 22 bits and %lo refers to the least significant 10
bits of a register.

Like most RISC machines, a SPARC processor uses a branch delay slot (BDS) to
optimize pipeline efficiency : this means that by default, the instruction
following a branching is executed regardless of whether or not the branch is
taken. So the coder must move (when possible) an instruction from before the
branch to after the branch. Another possibility is to use the 'nop' instruction
or to add the ',a' suffix to the branch instruction, which annul the next
operation.


Calling and branching
-----------------------
Let's take another example to better illustrate the calling process : this
is a recursive implementation of the Fibonacci numbers, which are defined as :

fib(N) = fib(N-1) + fib(N-2)
fib(0) = fib(1) = 1

!8<--------------------------------------------------------------------------
/* FILE: fib.s */

.section ".rodata" ! read-only initialised datas (constants)
.align 8 ! datas must be double-words aligned
.MyText: ! define our string label
.asciz "Fib(%i) = %i \n" ! define a null-terminated string

! ------- FIB : handles F(0) and F(1) --------
.section ".text" ! read-only object code (instructions)
.align 4 ! code must be word-aligned (4 bytes)
.global fib ! Make function name globally visible
fib:
save %sp, -112, %sp ! save stack space
mov %i0, %o0 ! 1st parameter may be needed for calling
cmp %o0, 1 ! asked for F(0) of F(1) ?
ble F1orF0 ! yes : take the branch
mov 1, %i0 ! return value = 1 (BDS)

call fibcall !
nop ! do nothing (BDS)
mov %o0, %i0 ! return value = fibcall return value
F1orF0:
ret ! Return to caller
restore ! Restore register windows (BDS)
Endfib:
.size fib,(.-fib)

! ----- FIBCALL : calls F(N-1) and F(N-2) -----
.global fibcall ! Make function name globally visible
fibcall:
save %sp,-112,%sp ! save stack space
mov %i0,%l0 ! save N in %l0
call fib ! call F(N-1)
sub %l0,1,%o0 ! compute N-1 (BDS)

mov %o0,%i0 ! save result in %i0
call fib ! call F(N-2)
sub %l0,2,%o0 ! compute N-2 (BDS)

ret ! Return to caller
restore %i0,%o0,%o0 ! return F(N-1) + F(N-2) (BDS)
Endfibcall:
.size fibcall,(.-fibcall)

!-------- MAIN --------------------------------
.global main ! Make function name globally visible
main:
save %sp,-112,%sp ! save stack space
call fibcall ! calculate fib number 7
mov 7,%o0 ! (BDS)

mov %o0,%o2 ! result is second parameter
sethi %hi(.MyText),%o0 ! load higher part of string offset
or %o0,%lo(.MyText),%o0 ! add lower part of offset
call printf
mov 7,%o1 ! number is the first parameter (BDS)

ret ! Return to caller
restore ! Restore register windows (BDS)
Endmain:
.size main,(.-main)
!8<--------------------------------------------------------------------------

All procedures share the global registers (%g[0-7]), the remaining registers
%l[0-7], %i[0-7], %o[0-7] constitute the register window. When a procedure
starts execution, it allocates 16 registers (input and local), the output
registers are overlapped with the subroutine's input registers. Here's what
happens if procedure A calls procedure B which calls procedure C :

proc A in | local | out |
| |
proc B | in | local | out |
| |
proc C | in | local | out


As you see, for each procedure, the parameters passed to and received from a
subroutine are stored in %o registers. The same way, the parameters taken from
and passed to the calling procedure are stored in %i registers. The current
window pointer (CWP) identifies the current register window : it is stored in
the least significant 5 bits of %psr, and is modified by the 'save' and
'restore' commands.

The condition code register (another part of %psr) contains four flags : Z
(zero), N (negative), C (carry), and V (overflow) ; contrary to x86 assembly,
these bits are not updated by standard arithmetic operations, but by special
instructions (with 'cc' suffix, like 'cmp' which is in fact a 'subcc').
For instance, you have these equivalences :

SPARC x86
subcc r1, r2, r1 <=> sub r1, r2 ; keep result and flags
subcc r1, r2, %g0 <=> cmp r1, r2 ; discard result, keep flags
sub r1, 0, r2 <=> mov r2, r1 ; keep result, discard flags


As last example, here's a simple anti-cracking method : a checksum on
our own code :

!8<-----------------------------------------------------------------------
/* FILE: cksum.s */

.section ".data" ! read-only initialised datas (constants)
.align 8 ! datas must be double-words aligned
.CRCError:
.asciz "Wrong CRC !! \n" !

.section ".text" ! read-only object code (instructions)
.align 4 ! code must be word-aligned (4 bytes)
cksum:
save %sp,-64,%sp ! save minimal stack space
mov %i0, %l0 ! %l0 is the base address
sub %i1, %i0, %l1 ! %l1 is the decreasing index
mov %g0, %l2 ! %l2 is the running sum
loop:
ld [%l0+%l1], %o0 ! fetch the next element
add %l2, %o0, %l2 ! add it to the running sum
subcc %l1, 4, %l1 ! one fewer element

bge,a loop ! if %o0 >= 0 get next element
! (delay slot result is annulled)
mov %l2, %o0 ! store the result in sum (BDS)

ret ! Return to caller
restore ! Restore register windows (BDS)
endcksum: ! Tell the linker how big the
.size cksum,(.-cksum) ! procedure is.

!-------------------- MAIN -------------------------
.global main
main:
save %sp,-64,%sp ! allocate space for stack
set main, %o0 ! start address for cksum
sethi %hi(EndOfCRCZone),%o1 ! high part of end address
call cksum ! calculate cksum
or %o1,%lo(EndOfCRCZone),%o1 ! low part of end address (BDS)

EndOfCRCZone:
set 0x10954, %o1 ! load precalculated checksum
cmp %o0, %o1 ! is checksum correct ?
be End ! yes : exit
nop ! do nothing (BDS)

Error: ! no : display message
sethi %hi(.CRCError ),%o0 ! load higher part of string offset
call printf ! print the string
or %o0,%lo(.CRCError),%o0 ! add lower part of string offset (BDS)

End:
ret ! Return to caller
restore ! Restore register windows (BDS)

EndMain: ! Tell the linker how big the
.size main,(.-main) ! procedure is.
!8<-----------------------------------------------------------------------


Tools and references
----------------------
Here are the tools I use when I play with SPARC assembly ; some are SunOS
specific tools, some are multi-platforms ones. The code you read here has
been tested on various Sun workstations (using SPARC and UltraSPARC
processors). With very little modifications, it also worked on ISEM, a SPARC
emulator for Linux (see below).

- assembling : I use gcc for that job, which itself uses as and ld (assembler
and linker) to create ELF executables. CC and cc also work well ; these 3
compilers can also be used to generate ASM source code from C source code
with the "-S" option, which is IMHO a great method to learn assembly on
a new platform.

- debugging : I use adb, which is very basic but also very powerful, but gdb
and dbx may also work.

- reversing : all the previous tools are useful ; I also use a disassembler
(SunOS dis), but some exist for other platforms (see Bruce Ediger's
homepage).


If you plan to give a try to SPARC assembly, here are some links :

http://www.cs.unm.edu/~maccabe/classes/341/labman/labman.html
ISEM (Instructional Sparc EMulator) homepage.

http://www.csn.net/~bediger/
Bruce Ediger Homepage

http://www.cs.earlham.edu/~mutioke/cs63/
a good introduction to SPARC with plenty of links.

http://www.sics.se/~psm/sparcstack.html
a very good overview of SPARC stack and registers.


Final Words
------------
If you use a SPARC based machine, give a try to assembly, it's quite fun.
If not, remember that the best you know your processor, the best you
can code ASM.


::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
Extending NASM
by mammon_


Programmers transitioning to NASM from a commercial assembler such as MASM or
TASM immediately notice the lack of any high-level language structures -- the
assembly syntax accepted by NASM is only slightly more sophisticated than what
you would find in a debugger. While this has its good side --smaller code size,
nothing hidden from the programmer-- it does make coding a bit more tedious.

For this reason NASM comes with a preprocessor that is both simple and powerful;
by writing NASM macros, the high-level functionality of other assemblers can be
emulated rather easily. As thw following macros will demonstrate, most of the
high-level asm features in commercial assemblers really do not do anything very
elaborate; they simply are more convenient for the programmer.

The macros that I will detail below provide some basic C and ASM constructs for
use in NASM. I have made the complete file available at
http://www.eccentrica.org/Mammon/macros.asm
The macro file can be included in a .asm file with the NASM directive
%INCLUDE "macros.asm"
Comments on the usage of each macro are included in the file.

Macro Basics
------------
The fundamenal structure of a NASM macro is
%macro {macroname} {# parameters}
%endmacro
The actual code resides on the line between the %macro and %endmacro tags; this
code will be inserted into your program wherever NASM finds {macroname}. Thus
you could create a macro to push the contents of each register such as:
%macro SAVE_REGS 0
push eax
push ebx
push ecx
push edx
%endmacro
Once you have defined this macro, you can use it in your code like:
SAVE_REGS
call ReadFile
...which the preprocessor will expand to
push eax
push ebx
push ecx
push edx
call ReadFile
before assembling. It should be noted that all preprocessing takes place in a
single stage immediately before compiling starts; to preview what the pre-
processor will send to the assembler, you can invoke nasm with the -e option.

The %macro tag requires that you declare the number of paramters that will be
passed to the macro. This can be a single number or a range, with a few quirks:
%macro LilMac 0 ; takes 0 arguments
%macro LilMac 5 ; takes 5 arguments
%macro LilMac 0-3 ; takes 0-3 arguments
%macro LilMac 1-* ; takes 1 to unlimited arguments
%macro LilMac 1-2+ ; takes 1-2 arguments
%macro LilMac 1-3 0, "OK" ; takes 1-3 arguments, 2-3 default to 0 & "OK"
The last three examples bear some explanation. The "-*" operator in the %macro
tag specifies that the macro can handle any number of parameters; in other
words, there is no maximum number, and the minimum is whatever number is to the
left of the "-*" operator. The "+" operator means that any additional arguments
will be appended to the last argument instead of causing an error, so that:
LilMac 0, OK, This argument is one too many
will result in argument 1 being 0 and argument 2 being "OK, This argument is
one too many."
Note that this is a good way to pass commas as part of an argu-
ment (normally they are only separators). Providing defualt arguments after the
number of arguments allows a macro to be called with fewer arguments than it
expects.
%macro SAVE_VARS 1-4 ecx, ebx, eax
will fill a missing 4th argument with eax, 3rd with ebx, and 2nd with ecx. Note
that you have to provide defaults starting with the last argument and working
backwards.

The parameters to the macro are available as %1 for the first argument, %2 for
the second, and so on, with %0 containing a count of all the arguments. There
is an equivalent to the DOS "SHIFT" command called %rotate which will rotate
the parameters to either the left or to the right depending on whether a
positive or negative value was supplied:
Before: %1 %2 %3 %4 Before: %1 %2 %3 %4 Before: %1 %2 %3 %4
%rotate 1 %rotate -1 %rotate 2
After: %4 %1 %2 %3 After: %2 %3 %4 %1 After: %3 %4 %1 %2
So that rotating by 1 will put the value at %1 into %4, and rotating by -1 will
put the value of %1 into %2.


High-Level Calls
----------------
Perhaps the buggest complaint about NASM is its primitive call syntax. In MASM
and TASM, the parameters to a call may be appended to the call itself:
call MessageBox, hOwner, lpszText, lpszTitle, fuStyle
where in NASM the parameters must be pushed onto the stack prior to the call:
push fuStyle
push lpszTitle
push lpszText
push hOwner
call MessageBox
Using NASM's "-*" macro feature along with the %rep directive make a high-level
call easy to replicate:
%macro call 2-*
%define _func %1
%rep &0-1
%rotate 1
push %1
%endrep
call _func
%endmacro
The %define directive simply defines the variable _func [underscores should
prefix variable names in macros so you do not mistakenly use the same name
later in the program] as %1, the name of the function to call. The %rep and
%endrep directives enclose the instructions to be repeated, and %rep takes as a
parameter the number of repetitions [in this case set to the number of macro
parameters minus 1]. Thus, the above macro cycles through the arguments to call
and pushes them last-argument first [C syntax] before making the call.

Overloading an existing instruction such as call will cause warnings at compile
time [remember, the preprocessor thinks you are doing a recursive macro invoke]
so usually you will want to name the macro "c_call" or something similar. The
following macros provide facilities for C, Pascal, fastcall, and stdcall call
syntaxes.
;==============================================================-High-Level Call
; ccall FuncName, param1, param2, param 3... ;Pascal: 1st-1st, no clean
; pcall FuncName, param1, param2, param 3... ;C: Last-1st, stack cleanup
; stdcall FuncName, param1, param2, param 3... ;StdCall: last-1st, no clean
; fastcall FuncName, param1, param2, param 3... ;FastCall: registers/stack
%macro pcall 2-*
%define _j %1
%rep %0-1
%rotate -1
push %1
%endrep
call _j
%endmacro

%macro ccall 2-*
%define _j %1
%assign __params %0-1
%rep %0-1
%rotate -1
push %1
%endrep
call _j
%assign __params __params * 4
add esp, __params
%endmacro

%macro stdcall 2-*
%define _j %1
%rep %0-1
%rotate -1
push %1
%endrep
call _j
%endmacro

%macro fastcall 2-*
%define _j %1
%assign __pnum 1
%rep %0-4
%rotate -1
%if __pnum = 1
mov eax, %1
%elif __pnum = 2
mov edx, %1
%elif __pnum = 3
mov ebx, %1
%else
push %1
%endif
%assign __pnum __pnum+1
%endrep
call _j
%endmacro
;==========================================================================-END


Switch-Case Blocks
------------------
One of the most awkward C constructs to code in assembly is the SWITCH-CASE
block. It is also rather difficult to re-create as a macro due to variable
number and length of CASE statements.

NASM's preprocessor has a context stack which allows you to create a set of
local variables and addresses which is specific to a particular invocation of a
macro. Thus it becomes possible to refer to labels which will be created in a
future macro by giving them context-dependent names:
%macro MacPart1 0
%push mac ;create a context called "mac"
jmp %$loc ;jump to context-specific label "loc"
%endmacro

%macro MacPart2 0
%ifctx mac ;if we are in context 'mac'
%$loc: ;define label 'loc'
xor eax, eax ;code at this label...
ret
%endif ;end the if block
%pop ;destroy the 'mac' context
%endmacro
As you can see, the context is created and named with a %push directive, and
destroyed with a $pop directive. NASM has a number of preprocessor conditional
IF/ELSE statements; in the above example, the %ifctx [if current context equals]
directive is used to determine if a 'mac' context has been created [Note that
the 'base' NASM conditionals include %if, %elif, %else, and %endif; these carry
over to the %ifctx directive, such that there is available %ifctx, %ifnctx,
%elifctx, %elifnctx, %else, and %endif; all %if directives must be closed with
an %endif directive]. Finally, %$ is used to prefix the name of a context-
specific variable or label. Non-context-specific local labels use the %% prefix:
%macro LOOP_XOR
%%loop:
pop eax
xor eax, ebx
test eax, eax
jnz %%loop
%endmacro

The SWITCH-CASE macro that follows uses the syntax:
SWITCH Variable
CASE Int
BREAK
CASE Int
BREAK
DEFAULT
ENDSWITCH
Which could be implemented as follows:
card db 0 ;card_variable
Jack EQU 11
Queen EQU 12
King EQU 13
...
SWITCH card
CASE Jack
add edx, Jack
BREAK
CASE Queen
add edx, Queen
BREAK
CASE King
add edx, King
BREAK
DEFAULT
add d, [card]
ENDSWITCH
Note that SWITCH moves the variable into eax and CASE moves the value into ebx.
;===========================================================-SWITCH-CASE Blocks
%macro SWITCH 1
%push switch
%assign __curr 1
mov eax, %1
jmp %$loc(__curr)
%endmacro

%macro CASE 1
%ifctx switch
%$loc(__curr):
%assign __curr __curr+1
mov ebx, %1
cmp eax, ebx
jne %$loc(__curr)
%endif
%endmacro

%macro DEFAULT 0
%ifctx switch
%$loc(__curr):
%endif
%endmacro

%macro BREAK 0
jmp %$endswitch
%endmacro

%macro ENDSWITCH 0
%ifctx switch
%$endswitch:
%pop
%endif
%endmacro
;==========================================================================-END


If-Then Blocks
--------------
While the preprocessor provides support for if-then directives, it is a slight
bit of work to cause that to generate the equivalent assembly language 'if'
code [ the preprocessor 'if' is resolved before compile time, not at run time].
Using macros, you can create if-then blocks with the following structure:
IF Value, Cond, Value
;if code here
ELSIF Value, Cond, Value
;else-if code here
ELSE
;else code here
ENDIF
An example being:
IF [Passwd], e, [GoodVal] ;e == equals or je
jmp Registered
ELSE
jmp FormatHardDrive
ENDIF
The trickiest part about this macro sequence is the 'Cond' parameter. NASM
allows condition codes [the 'cc' in 'jcc' that you findin opcode refs] to be
passed to macros; these condition codes are simply the 'jcc' with the 'j' cut
off -- 'jnz' becomes 'nz', 'jne' becomes 'ne', 'je' becomes 'e', and so on.
The reason for this is that the condition code is appended to a 'j' later in
the macro:
%macro Jumper %1 %2 %3 ;JUMPER Reg1, cc, Reg2
cmp %1, %3
j%+2 Gotcha
jmp error
%endmacro
The above code appends %2 to the 'j' with the directive j%+2. Note that if you
use j%- instead of j%+, NASM will insert the *inverse* condition code, so that
jz becomes jnz, etc. For example, calling the macro
%macro Jumper2 %1
j%-1 JmpHandler
%endmacro
with the invocation 'Jumper2 nz' would assemble the code 'jz JmpHandler'.

The condition codes can be a bit tricky to work with; it is advisable to add a
sequence such as the following to the macro file:
%define EQUAL e
%define NOTEQUAL ne
%define G-THAN g
%define L-THAN l
%define G-THAN-EQ ge
%define L-THAN-EQ le
%define ZERO z
%DEFINE NOTZERO nz
so that you could call the IF macro as follows:
IF PassWd, EQUAL, GoodVal
;if code here
...etc etc. Note also that the IF-THEN-ELSE macros put the passed values into
eax and ebx for compatison, so these registers will need to be preserved.

;===========================================================-IF-THEN-ELSE Loops
%macro IF 3
%push if
%assign __curr 1
mov eax, %1
mov ebx, %3
cmp eax, ebx
j%+2 %%if_code
jmp %$loc(__curr)
%%if_code:
%endmacro

%macro ELSIF 3
%ifctx if
jmp %$end_if
%$loc(__curr):
%assign __curr __curr+1
mov eax, %1
mov ebx, %3
cmp eax, ebx
j%+2 %%elsif_code
jmp %$loc(__curr)
%%elsif_code:
%else
%error "'ELSIF' can only be used following 'IF'"
%endif
%endmacro

%macro ELSE 0
%ifctx if
jmp %$end_if
%$loc(__curr):
%assign __curr __curr+1
%else
%error "'ELSE' can only be used following an 'IF'"
%endif
%endmacro

%macro ENDIF 0
%$loc(__curr):
%$end_if:
%pop
%endmacro
;==========================================================================-END

For/While Loops
---------------
The DO...FOR and DO...WHILE do nothing differnet from the previous macros, but
are simply a different application of the same principles. The syntax for
calling these macros is:
DO
;code to do here
FOR min, Cond, max, step

DO
;code to do here
WHILE variable, Cond, value
It is perhaps easiest to illustrate this by comparing the macros with C code.
for( x = 0; x <= 100; x++) { SomeFunc() }
Equates to:
DO
call SomeFunc
FOR 0, l, 100, 1
Likewise,
for( x = 0; x != 100; x--) { SomeFunc() }
Equates to:
DO
call SomeFunc
FOR 0, e, 100, -1
The WHILE macro is similar:
while( CurrByte != BadAddr) {SomeFunc() }
Equates to:
DO
call SomeFunc
WHILE CurrByte, ne, BadAddr
Once again, eax and ebx are used in the FOR and WHILE macros.

;====================================================-DO-FOR and DO-WHILE Loops
%macro DO 0
%push do
jmp %$init_loop
%$start_loop:
push eax
%endmacro

%macro FOR 4
%ifctx do
pop eax
add eax, %4
cmp eax, %3
j%-2 %%end_loop
jmp %$start_loop
%$init_loop:
mov eax, %1
jmp %$start_loop
%%end_loop:
%pop
%endif
%endmacro

%macro WHILE 3
%ifctx do
pop eax
mov ebx, %3
cmp eax, ebx
j%+2 %%end_loop
jmp %$start_loop
%$init_loop:
mov eax, %1
jmp %$start_loop
%%end_loop:
%pop
%endif
%endmacro
;==========================================================================-END


Data Declarations
-----------------
Declaring data is relatively simple in assembly, but sometimes it helps to make
code more clear if you create macros that assign meaningful data types to
variables, even if those macros simply resolve to a DB or a DD. The following
macros demonstrate this concept. They are invoked as follows:
CHAR Name, String ;e.g. CHAR UserName, "Joe User"
INT Name, Byte ;e.g. INT Timeout, 30
WORD Name, Word ;e.g. WORD Logins
DWORD Name, Dword ;e.g. DWORD Password
Note that when invoked with a name but not a value, these macros create empty
[DB 0] variables.
;============================================================-Data Declarations
%macro CHAR 1-2 0
%1: DB %2,0
%endmacro

%macro INT 1-2 0
%1: DB %2
%endmacro

%macro WORD 1-2 0
%1: DW %2
%endmacro

%macro DWORD 1-2 0
%1: DD %2
%endmacro
;==========================================================================-END

Procedure Declarations
----------------------
Procedure declarations are another matter of convenience. It is often useful in
your code to clearly delineate the start and end of a procedure; each of the
PROC macros below does that, as well as creating a stack fram for the procedure.
The ENTRYPROC macro creates a procedure named 'main' and declares main as a
global symbol; the standard PROC declares the provided name as global. These
macros can be used as follows:
PROC ProcName Parameter1, Parameter2, Parameter3
;procedure code here
ENDP

ENTRYPROC
;entry-procedure code here
ENDP
Note that the Parameters to PROC are set up to EQU to offsets from ebp, e.g.
ebp-4, ebp-8, etc. I have also included support for local variables, which
will EQU to positive offsets from ebp' these may be used as follows:
PROC ProcName Parameter1, Parameter2, Parameter3...
LOCALDD Dword_Variable
LOCALDW Word_Variable
LOCALDB Byte_Variable
;procedure code here
ENDP

;=======================================================-Procedure Declarations
%macro PROC 1-9
GLOBAL %1
%1:
%assign _i 4
%rep %0-1
%2 equ [ebp-_i]
%assign _i _i+4
%rotate 1
%endrep
push ebp
mov ebp, esp
%push local
%assign __ll 0
%endmacro

%macro ENDP 0
%ifctx local
%pop
%endif
pop ebp
%endmacro

%macro ENTRYPROC 0
PROC main
%endmacro

%macro LOCALVAR 1
sub esp, 4
%1 equ [ebp + __ll]
%endmacro

%macro LOCALDB 1
%assign __ll __ll+1
LOCALVAR %1
%endmacro

%macro LOCALDW 1
%assign __ll __ll+2
LOCALVAR %1
%endmacro

%macro LOCALDD 1
%assign __ll __ll+4
LOCALVAR %1
%endmacro
;==========================================================================-END

Further Extension
-----------------
Continued experimentation will of course prove fruitful. It is recommended that
you read/print out chapter 4 of the NASM manual for reference. In addition, it
is very helpful to test your macros by cpmpiling the source with "nasm -e",
which will output the preprocessed source code to stdout and will not compile
the program.


____________________________________________________________________________
______ _____. ____ ```
._____/\______.________._________. ._\___ |__\_ /. \\
| | | _ | | (_ | __/ |CE ,
.=|_____/\______| |----)____| |______|______|======[ Win 32 ASM ]===.
'===============| :==================================================='
NASM specific Win32 coding
by Tamas Kaproncai


Contents
========
0. Preface
1. Compiling
2. Include files
3. Library files
4. Importing API functions
5. Calling API functions
6. WinMain
7. Window procedure
8. Sections
9. Self modification


0. Preface
==========
I will introduce the win32 coding and I will focus on the NASM specific part.

Downloadable working examples:
ftp://ftp.szif.hu/pub/demos/tool/w32nasm.zip
http://rs1.szif.hu/~tomcat/win32

There is another tutorial on this topic, called:
"The Win32 NASM Coding Toolkit v0.02 by Gij"
that uses the LCC linker and the resource compiler which comes with LCC.


1. Compiling
============
I'm working with the following free programs in connection with NASM:
- linker: ALINK v1.5 by Anthony A.J. Williams.
- resource compiler: GoRC v0.50b by Jeremy Gordon.

The process of compiling a win32 program involves a number of steps which can
be divided into three main processes: preparing the include files, preparing
the library files, and writing the actual program.

The compiling flow chart
------------------------
.h -> ? -> .inc
\
.asm -> NASM -> .obj
\
.rc -> GORC -> .res -> ALINK -> .exe
/
.dll -> IMPLIB -> .lib (? means handwork)


2. Include files
================
The include files (*.inc) must be generated from existing header files (*.h)
that come with win32-compatible C or Pascal compilers. Files needed:
WIN32N.INC (Thanks for the inital MASM version to S.L.Hutchesson).
The compiler will be NASM version 0.97
http://www.cryogen.com/nasm
Usage: nasmw -fobj -w+orphan-labels -pwin32n.inc %1.asm


3. Library files
================
Files Needed:
WIN32.LIB
The linker will be ALINK
http://www.geocities.com/SiliconValley/Network/4311/#alink
Usage: alink -oPE %1 win32.lib %1.res %2 %3

More lib files can be created with IMPLIB.
Example: IMPLIB DDRAW.DLL


4. Importing API functions
==========================
EXTERN MessageBoxA
IMPORT MessageBoxA use32.dll


5. Calling API functions
========================
PUSH UINT MB_OK
PUSH LPCTSTR title1
PUSH LPCTSTR string1
PUSH HWND NULL
CALL [MessageBoxA]


6. WinMain
==========
You don't need to use the name, WinMain:
You must start the program with the label, ..start:

At the begening there is nothing special in the stack, so you should call
GetModuleHandleA for hInstance and GetCommandLineA for the command line.
(Command line consists the full path, the file name and the parameters).

You can exit the program with: RETN
or you should call the ExitProcess function:
PUSH UINT 0 ; the error code
CALL [ExitProcess]


7. Window procedure
===================
There are four parameters on the top of the stack:
PUSH EBP
MOV EBP,ESP
%DEFINE hwnd EBP+8 ;handle of window
%DEFINE message EBP+12 ;message
%DEFINE wParam EBP+16 ;first message parameter
%DEFINE lParam EBP+20 ;second message parameter

You can handle the messages depends on WPARAM [wParam]
and the rest you can pass to DefWindowProcA:
PUSH LPARAM [lParam]
PUSH WPARAM [wParam]
PUSH UINT [message]
PUSH HWND [hwnd]
CALL [DefWindowProcA]
POP EBP
RETN 16


8. Sections
===========
You need a code section:
SECTION CODE USE32 CLASS=CODE
and a data section:
SECTION DATA USE32 CLASS=DATA

You don't need bss section, instead of you should append
every RESB, RESW, RESD, RESQ to the end of the source code.
This zero data not will be included to the exe file.


9. Self modification
====================
You can include your code and data together in one section:
SECTION CODE USE32 CLASS=CODE

In that case you need another object file, with only one line source:
SECTION CODE USE32 CLASS=DATA

ALINK will combine the properties of these two sections.

EXTERN MessageBoxA
EXTERN ExitProcess

SECTION CODE USE32 CLASS=CODE
..start:

PUSH UINT MB_OK
PUSH LPCTSTR title1
PUSH LPCTSTR string1
PUSH HWND NULL
CALL MessageBoxA

PUSH UINT NULL
CALL ExitProcess

SECTION DATA USE32 CLASS=DATA
string1: db 'Hello world!',13,10,0
title1: db 'Hello',0



____________________________________________________________________________
______ _____. ____ ```
._____/\______.________._________. ._\___ |__\_ /. \\
| | | _ | | (_ | __/ |CE ,
.=|_____/\______| |----)____| |______|______|======[ Win 32 ASM ]===.
'===============| :==================================================='
More about Text
by Iczelion


We will experiment more with text attributes, ie. font and color.

Preliminary:
------------

Windows color system is based on RGB values, R=red, G=Green, B=Blue. If you
want to specify a color in Windows, you must state your desired color in
terms of these three major colors. Each color value has a range from 0 to
255 (a byte value). For example, if you want pure red color, you should use
255,0,0. Or if you want pure white color, you must use 255,255,255. You can
see from the examples that getting the color you need is very difficult
with this system since you have to have a good grasp of how to mix and
match colors.

For text color and background, you use SetTextColor and SetBkColor, both of
them require a handle to device context and a 32-bit RGB value. The 32-bit
RGB value's structure is defined as:

RGB_value struct
unused db 0
blue db ?
green db ?
red db ?
RGB_value ends

Note that the first byte is not used and should be zero. The order of the
remaining three bytes is reversed,ie. blue, green, red. However, we will
not use this structure since it's cumbersome to initialize and use. We will
create a macro instead. The macro will receive three parameters: red, green
and blue values. It'll produce the desired 32-bit RGB value and store it in
eax. The macro is as follows:

RGB macro red,green,blue
xor eax,eax
mov ah,blue
shl eax,8
mov ah,green
mov al,red
endm

You can put this macro in the include file for future use.

You can "create" a font by calling CreateFont or CreateFontIndirect. The
difference between the two functions is that CreateFontIndirect receives
only one parameter: a pointer to a logical font structure, LOGFONT.
CreateFontIndirect is the more flexible of the two especially if your
programs need to change fonts frequently. However, in our example, we will
"create" only one font for demonstration, we can get away with CreateFont.
After the call to CreateFont, it will return a handle to a font which you
must select into the device context. After that, every text API function
will use the font we have selected into the device context.


Content:
--------

Below is our source code:
;======================================================================TEXT.ASM
include windows.inc
includelib user32.lib
includelib kernel32.lib
includelib gdi32.lib

RGB macro red,green,blue
xor eax,eax
mov ah,blue
shl eax,8
mov ah,green
mov al,red
endm

.data
ClassName db "SimpleWinClass",0
AppName db "Our First Window",0
TestString db "Win32 assembly is great and easy!",0
FontName db "script",0

.data?
hInstance HINSTANCE ?
CommandLine LPSTR ?

.code
start:
invoke GetModuleHandle, NULL
mov hInstance,eax
invoke GetCommandLine
invoke WinMain, hInstance,NULL,CommandLine, SW_SHOWDEFAULT
invoke ExitProcess,eax

WinMain proc
hInst:HINSTANCE,hPrevInst:HINSTANCE,CmdLine:LPSTR,CmdShow:SDWORD
LOCAL wc:WNDCLASSEX
LOCAL msg:MSG
LOCAL hwnd:HWND
mov wc.cbSize,SIZEOF WNDCLASSEX
mov wc.style, CS_HREDRAW or CS_VREDRAW
mov wc.lpfnWndProc, OFFSET WndProc
mov wc.cbClsExtra,NULL
mov wc.cbWndExtra,NULL
push hInstance
pop wc.hInstance
mov wc.hbrBackground,COLOR_WINDOW+1
mov wc.lpszMenuName,NULL
mov wc.lpszClassName,OFFSET ClassName
invoke LoadIcon,NULL,IDI_APPLICATION
mov wc.hIcon,eax
mov wc.hIconSm,0
invoke LoadCursor,NULL,IDC_ARROW
mov wc.hCursor,eax
invoke RegisterClassEx, addr wc
invoke CreateWindowEx,NULL,ADDR ClassName,ADDR AppName,\
WS_OVERLAPPEDWINDOW,CW_USEDEFAULT,\
CW_USEDEFAULT,CW_USEDEFAULT,CW_USEDEFAULT,NULL,NULL,\
hInst,NULL
mov hwnd,eax
invoke ShowWindow, hwnd,SW_SHOWNORMAL
invoke UpdateWindow, hwnd
.WHILE TRUE
invoke GetMessage, ADDR msg,NULL,0,0
.BREAK .IF (!eax)
invoke TranslateMessage, ADDR msg
invoke DispatchMessage, ADDR msg
.ENDW
mov eax,msg.wParam
ret
WinMain endp

WndProc proc hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM
LOCAL hdc:HDC
LOCAL ps:PAINTSTRUCT
LOCAL hfont:HFONT

mov eax,uMsg
.IF eax==WM_DESTROY
invoke PostQuitMessage,NULL
.ELSEIF eax==WM_PAINT
invoke BeginPaint,hWnd, ADDR ps
mov hdc,eax
invoke CreateFont,24,16,0,0,400,0,0,0,OEM_CHARSET,\

OUT_DEFAULT_PRECIS,CLIP_DEFAULT_PRECIS,\
DEFAULT_QUALITY,DEFAULT_PITCH or
FF_SCRIPT,\
ADDR FontName
invoke SelectObject, hdc, eax
mov hfont,eax
RGB 200,200,50
invoke SetTextColor,hdc,eax
RGB 0,0,255
invoke SetBkColor,hdc,eax
invoke TextOut,hdc,0,0,ADDR TestString,SIZEOF TestString
invoke SelectObject,hdc, hfont
invoke EndPaint,hWnd, ADDR ps
.ELSE
invoke DefWindowProc,hWnd,uMsg,wParam,lParam
ret
.ENDIF
xor eax,eax
ret
WndProc endp

end start
;===========================================================================EOF

Let's begin our analysis : )

invoke CreateFont,24,16,0,0,400,0,0,0,OEM_CHARSET,\
OUT_DEFAULT_PRECIS,CLIP_DEFAULT_PRECIS,\
DEFAULT_QUALITY,DEFAULT_PITCH or FF_SCRIPT,\
ADDR FontName

CreateFont creates a logical font that is the closest match to the given
parameters and the font data available. This function has more parameters
than any other function in Windows. It returns a handle to logical font to
be used by SelectObject function. We will examine its parameters in detail.

HFONT CreateFont(int nHeight, int nWidth, int nEscapement, int
nOrientation, int nWeight, BYTE cItalic, BYTE cUnderline, BYTE cStrikeOut,
BYTE cCharSet, BYTE cOutputPrecision, BYTE cClipPrecision, BYTE cQuality,
BYTE cPitchAndFamily, LPSTR lpFacename);

nHeight --> The desired height of the characters . 0 means use default size.
nWidth --> The desired width of the characters. Normally this value should be
0 which allows Windows to match the width to the height. However, in our
example, the default width makes the characters hard to read, so I use the
width of 16 instead.
nEscapement --> Specifies the orientation of the next character output
relative to the previous one in tenths of a degree. Normally, set to 0. Set
to 900 to have all the characters go upward from the first character, 1800
to write backwards, or 2700 to write each character from the top down.
nOrientation --> Specifies how much the character should be rotated when
output in tenths of a degree. Set to 900 to have all the characters lying
on their backs, 1800 for upside-down writing, etc.
nWeight --> Sets the line thickness of each character. Windows defines the
following sizes:

FW_DONTCARE equ 0
FW_THIN equ 100
FW_EXTRALIGHT equ 200
FW_ULTRALIGHT equ 200
FW_LIGHT equ 300
FW_NORMAL equ 400
FW_REGULAR equ 400
FW_MEDIUM equ 500
FW_SEMIBOLD equ 600
FW_DEMIBOLD equ 600
FW_BOLD equ 700
FW_EXTRABOLD equ 800
FW_ULTRABOLD equ 800
FW_HEAVY equ 900
FW_BLACK equ 900

cItalic --> 0 for normal, any other value for italic characters.
cUnderline --> 0 for normal, any other value for underlined characters.
cStrikeOut --> 0 for normal, any other value for characters with a line
through the center.
cCharSet --> The character set of the font. Normally should be OEM_CHARSET
which allows Windows to select font which is operating system-dependent.
cOutputPrecision --> Specifies how much the selected font must be closely
matched to the characteristics we want. Normally should be
OUT_DEFAULT_PRECIS which defines default font mapping behavior.
cClipPrecision --> Specifies the clipping precision. The clipping precision
defines how to clip characters that are partially outside the clipping
region. You should be able to get by with CLIP_DEFAULT_PRECIS which defines
the default clipping behavior.
cQuality -->Specifies the output quality. The output quality defines how
carefully GDI must attempt to match the logical-font attributes to those of
an actual physical font. There are three choices: DEFAULT_QUALITY,
PROOF_QUALITY and DRAFT_QUALITY.
cPitchAndFamily --> Specifies pitch and family of the font. You must combine
the pitch value and the family value with "or" operator.
lpFacename A pointer to a null-terminated string that specifies the
typeface of the font.

The description above is by no means comprehensive. You should refer to
your Win32 API reference for more details.

invoke SelectObject, hdc, eax
mov hfont,eax

After we get the handle to the logical font, we must use it to select the
font into the device context by calling SelectObject. SelectObject puts the
new GDI objects such as pens, brushs, and fonts into the device context to
be used by GDI functions. It returns the handle to the replaced object
which we should save for future SelectObject call. After SelectObject call,
any text output function will use the font we just selected into the device
context.

RGB 200,200,50
invoke SetTextColor,hdc,eax
RGB 0,0,255
invoke SetBkColor,hdc,eax

Use RGB macro to create a 32-bit RGB value to be used by SetColorText and
SetBkColor.

invoke TextOut,hdc,0,0,ADDR TestString,SIZEOF TestString

Call TextOut function to draw the text on the client area. The text will be
in the font and color we specified previously. The syntax of TextOut is as
follows:

BOOL TextOut(

HDC hdc, // handle of device context
int nXStart, // x-coordinate of starting position
int nYStart, // y-coordinate of starting position
LPCTSTR lpString, // address of string
int cbString // number of characters in string
);

invoke SelectObject,hdc, hfont

When we are through with the font, we should restore the old font back into
the device context. You should always restore the object that you replaced
in the device context.


____________________________________________________________________________
______ _____. ____ ```
._____/\______.________._________. ._\___ |__\_ /. \\
| | | _ | | (_ | __/ |CE ,
.=|_____/\______| |----)____| |______|______|======[ Win 32 ASM ]===.
'===============| :==================================================='
Keyboard Input
by Iczelion


We will learn how a Windows program receives keyboard input.

Preliminiary:
------------

Since there's only one keyboard in each PC, all running Windows programs
must share it between them. Windows is responsible for sending the key
strokes to the window which has the input focus.

Although there may be several windows on the screen, only one of them has
the input focus. The window which has input focus is the only one which can
receive key strokes. You can differentiate the window which has input focus
from other windows by looking at the title bar which is highlighted.
Actually, there are two main types of keyboard message. You can view a
keyboard as a group of keys. For example, if you press the "a" key, Windows
sends a WM_KEYDOWN message to the window which has input focus, notifying
that a key is pressed. When you release the key, Windows sends a WM_KEYUP
message. In this case, you treat a key as a button. Another way to look at
the keyboard is that it's a character input device. When you press "a" key,
Windows sends a WM_CHAR message to the window which has input focus,
telling it that the user sends "a" character to it. In fact, Windows sends
WM_KEYDOWN, WM_CHAR, and WM_KEYUP messages to the window which has input
focus. The window procedure may decide to process all three messages or
only the messages it's interested in. Most of the time, you can ignore
WM_KEYDOWN and WM_KEYUP since TranslateMessage function call in the message
loop translate WM_KEYDOWN and WM_KEYUP messages to a WM_CHAR message. We
will focus on WM_CHAR in this tutorial.


Content:
-------
;=======================================================================KEY.ASM
include windows.inc
includelib user32.lib
includelib kernel32.lib
includelib gdi32.lib

.data
ClassName db "SimpleWinClass",0
AppName db "Our First Window",0
char WPARAM 20h ; the character the program receives from keyboard

.data?
hInstance HINSTANCE ?
CommandLine LPSTR ?

.code
start:
invoke GetModuleHandle, NULL
mov hInstance,eax
invoke GetCommandLine
invoke WinMain, hInstance,NULL,CommandLine, SW_SHOWDEFAULT
invoke ExitProcess,eax

WinMain proc
hInst:HINSTANCE,hPrevInst:HINSTANCE,CmdLine:LPSTR,CmdShow:SDWORD
LOCAL wc:WNDCLASSEX
LOCAL msg:MSG
LOCAL hwnd:HWND
mov wc.cbSize,SIZEOF WNDCLASSEX
mov wc.style, CS_HREDRAW or CS_VREDRAW
mov wc.lpfnWndProc, OFFSET WndProc
mov wc.cbClsExtra,NULL
mov wc.cbWndExtra,NULL
push hInstance
pop wc.hInstance
mov wc.hbrBackground,COLOR_WINDOW+1
mov wc.lpszMenuName,NULL
mov wc.lpszClassName,OFFSET ClassName
invoke LoadIcon,NULL,IDI_APPLICATION
mov wc.hIcon,eax
mov wc.hIconSm,0
invoke LoadCursor,NULL,IDC_ARROW
mov wc.hCursor,eax
invoke RegisterClassEx, addr wc
invoke CreateWindowEx,NULL,ADDR ClassName,ADDR AppName,\
WS_OVERLAPPEDWINDOW,CW_USEDEFAULT,\
CW_USEDEFAULT,CW_USEDEFAULT,CW_USEDEFAULT,NULL,NULL,\
hInst,NULL
mov hwnd,eax
invoke ShowWindow, hwnd,SW_SHOWNORMAL
invoke UpdateWindow, hwnd
.WHILE TRUE
invoke GetMessage, ADDR msg,NULL,0,0
.BREAK .IF (!eax)
invoke TranslateMessage, ADDR msg
invoke DispatchMessage, ADDR msg
.ENDW
mov eax,msg.wParam
ret
WinMain endp

WndProc proc hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM
LOCAL hdc:HDC
LOCAL ps:PAINTSTRUCT

mov eax,uMsg
.IF eax==WM_DESTROY
invoke PostQuitMessage,NULL
.ELSEIF eax==WM_CHAR
push wParam
pop char
invoke InvalidateRect, hWnd,NULL,TRUE
.ELSEIF eax==WM_PAINT
invoke BeginPaint,hWnd, ADDR ps
mov hdc,eax
invoke TextOut,hdc,0,0,ADDR char,1
invoke EndPaint,hWnd, ADDR ps
.ELSE
invoke DefWindowProc,hWnd,uMsg,wParam,lParam
ret
.ENDIF
xor eax,eax
ret
WndProc endp
end start
;===========================================================================EOF

Let's analyze it:

char WPARAM 20h ; the character the program
receives from keyboard

This is the variable that stores the character received from the keyboard.
Since the character is sent in WPARAM of the window procedure, we define
the variable as type WPARAM for simplicity. The initial value is 20h or the
space since when our window refreshes its client area the first time, there
is no character input. So we want to display space instead.

.ELSEIF eax==WM_CHAR
push wParam
pop char
invoke InvalidateRect, hWnd,NULL,TRUE

This is added in the window procedure to handle the WM_CHAR message. It
just puts the character into the variable named "char" and then calls
InvalidateRect. InvalidateRect makes a specified rectangle in the client
area invalid which forces Windows to send WM_PAINT message to the window
procedure. Its syntax is as follows:

BOOL InvalidateRect(
HWND hWnd, // handle of window with changed update region
CONST RECT * lpRect, // address of rectangle coordinates
BOOL bErase // erase-background flag
);

lpRect is a pointer to the rectagle in the client area that we want to
declare invalid. If this parameter is null, the entire client area will be
marked as invalid.
bErase is a flag telling Windows if it needs to erase the background. If
this flag is TRUE, then Windows will erase the backgroud of the invalid
rectangle when BeginPaint is called.

So the strategy we used here is that: we store all necessary information
about how to paint the client area and generate WM_PAINT message to paint
the client area. Of course, the codes in WM_PAINT section must know
beforehand what's expected of them. This seems a roundabout way of doing
things but it's the way of Windows.

Actually we can paint the client area during processing WM_CHAR message by
calling GetDC and ReleaseDC pair. There is no problem there. But the fun
begins when our window needs to repaint its client area. Since the codes
that paint the character are in WM_CHAR section, the window procedure will
not be able to repaint our character in the client area. So the bottom line
is: put all necessary data and codes that do painting in WM_PAINT. You can
send WM_PAINT message from anywhere in your code anytime you want to
repaint the client area.

invoke TextOut,hdc,0,0,ADDR char,1

When InvalidateRect is called, it sends a WM_PAINT message back to the
window procedure. So the codes in WM_PAINT section is called. It calls
BeginPaint as usual to get the handle to device context and then call
TextOut which draws our character in the client area at x=0, y=0. When you
run the program and press any key, you will see that character echo in the
upper left corner of the client window. And when the window is minimized
and maximized again, the character is still there since all the codes and
data essential to repaint are all gathered in WM_PAINT section.


____________________________________________________________________________
::::::::::.___ . ```
::::::::::| _/__. |__ ____ . __. ____ ____ __. \\
:::::: |____ | __/_ _\_ (.___| .___) |__\_ (._) /___) | ,
::::::::::/ / | \ | - | \ | - | - | \/| - |
.=:::::::::/______|_____|_____| (___|_____|______|____|_____|===============.
'=::::::::::==================| . ____ | (____====[ The C Standard lib ]==='
:::::::::: | |------| - |
:::::::::: | |______|______|CE
. :
C string functions: introduction, _strlen
by Xbios2


I. INTRODUCTION
---------------
Beware: this is going to be long...

String handling in assembly is - anyway - a difficult subject. There are few
string-oriented x86 opcodes, and most of them are slow. There is not a standard
library providing even basic functions. There is no string specific syntax in
assembly, like C's printf('hello world') or, even worse, BASIC's a$=b$+'hello'.
In a few words, if easy string-related programming is your goal, maybe you
should consider PERL, or another text-manipulation language.

Yet, string functions are really needed, since almost any program in assembly
uses text for I/O. (An alternative to this would be using animated paper-clips
to communicate with the user :)).

Furthermore, coding those functions in assembly allows for smaller and faster
functions. Actually many of the string functions in C _were_ written in
assembly (e.g. strlen, strcat, strcpy, etc). Those can be divided in two
categories:

-'Traditional' func

  
tions, using the x86 string instructions
-'Modern' functions, which run faster by being Pentium-optimized

Borland C++ 4.02 and KERNEL32.DLL only have traditional functions. Borland's
C++ Builder v1.0 (once given free as a demo) includes both types. MSVCRT.DLL
(version 5) contains 'modern' versions.

The three main aspects considered in these articles (and generally when compar-
ing different versions of the same function) are speed, size and common sense.

'Common sense' indicates how easy it is to understand the way a function
operates by reading the source code, how 'elegant' the code is. In a library
module distributed as a binary (in a 'static' reuse of code), common sense is
not important. It becomes important when the source code is distributed too,
because it allows 'dynamic' reuse. 'Elegant' code can be easily optimized for
specific needs or expanded to become a more general function.

'Size' is, obviously, the size of the resulting code. Besides creating smaller
files, small size has two interesting 'side-effects'. It (usually) creates more
elegant code and faster code (it decreases k, but it usually increases l (for
an explanation of k and l see 'speed'). For very small functions like strlen it
has the added advantage of allowing the code to be inlined without wasting too
much space, thus decreasing k even more.

'Speed' indicates the number of cycles needed to execute the function. For
simple string functions the number of cycles needed can be expressed as

c=k+l*n

where c is the total number of cycles, k is the number of cycles needed to
'prepare' the function, l is the number of cycles needed to process each chara-
cter and n is the number of characters in the string. It is obvious that small
values of c mean faster execution. In order to compare two versions of a
function that run at speeds of

c1=k1+l1*n and c2=k2+l2*n

the ratio of c1/c2 is calculated:

c1 k1+l1*n
r=----=---------
c2 k2+l2*n

if r=1 then both versions run at the same speed.
if r>1 then version 2 is faster. if r<1 then version 1 is faster.
Simple maths prove that:

1. When n becomes infinite, r becomes equal to l1/l2. Especially if l1=l2,
then r=1

2. If k1<k2 but l1>l2, c1<c2 (version 1 is faster) if n<(k2-k1)/(l1-l2).

Point 1 means that for long strings speed is (almost) independent of the value
of k. Especially if l1=l2 both versions will run at (almost) the same speed.
Point 2 means that for small strings k strongly affects the value of c.

For those of you that are fed up with maths, here is a simple example that
demonstrates what I've been trying to say all this time :)

If version 1 runs at c1=10+3*n and version 2 at c2=30+1*n then:

-For strings up to 9 chars version 1 is faster
-For strings of 10 chars both versions run at the same speed
-For strings of 11 or more version 2 is faster
-For strings of 50 chars, version 2 is 2x faster than version 1
-For strings of 770 chars version 2 is 2.9x faster than version 1

The problem is that none of the above versions can be classified as better than
the other. Think of the parser of a compiler. It receives as input lines from a
text file, which are strings longer than 10 characters, but also has to deal
with tokens, which are short strings (in an assembler, three-char tokens are
very common).

Keep in mind that, while l depends only on the method used to implement the
function, k also depends on the 'push arg/call/prepare stack/resore stack/
ret/get arg' times. So if n is low, overall speed can be increased by inlining
the code, thus subtracting from k the time needed to call the function.

Well, I think you've had enough. Let's see all this stuff in practice.


II. THE _STRLEN FUNCTION
------------------------
Attention: especially for _strlen, ALL versions I have either written or found
in libraries will be explained. This means you'll get source code for 8
functions...

size_t strlen(const char *s);

Calculates the length of a string. strlen returns the number of characters in
s, not counting the null-terminating character.

_strlen is the simplest of the string functions. The 'traditional' way to
implement it is through 'repne scasb'. BC 4.02 implements it as:

; ------------ version 1 ------------
; Borland C++ 4.02
; 25 bytes
; c=27+4*n

_strlen proc near
push ebp
mov ebp, esp
push edi
mov edi, [ebp+8]
mov ecx, -1
xor al, al
cld
repne scasb
not ecx
lea eax, [ecx-1]
pop edi
pop ebp
retn
_strlen endp
; -----------------------------------

A shorter, and a bit faster version of this would be:

; ------------ version 2 ------------
; Improved 'repne scasb'
; 18 bytes
; c=21+4*n

_strlen proc near
xor eax, eax
push edi
mov edi, [esp+8]
or ecx, -1
repne scasb
sub eax, 2
pop edi
sub eax, ecx
retn
_strlen endp
; -----------------------------------

The win32 API also includes a strlen function, called lstrlenA. It is based on
'repne scasb' as well, but you are _strongly_ advised to avoid it. It runs at
c=56+4*n cycles.

The most 'common sense' function (IMHO) is also the smallest:

; ------------ version 3 ------------
; Elegant and very small
; 15 bytes
; c=27+4*n
; k is so big because we have a retn immediately after a jump
; if a nop is added between those two, k drops to 13

_strlen proc near
or eax, -1
mov ecx, [esp+4]
loop1: inc eax
cmp byte ptr [ecx+eax], 0
jnz short loop1
; nop
retn
_strlen endp
; -----------------------------------

Which gets a little less elegant but faster if tweaked a little:
(The trick is that the carry flag is set by the 'cmp' instruction if the byte
read is 0, else it is cleared. The 'inc' instruction doesn't affect the carry
flag).

; ------------ version 4 ------------
; Very small and faster than repne scasb
; 15 bytes
; c=12+3*n

_strlen proc near
mov ecx, [esp+4]
xor eax, eax
loop1: cmp byte ptr [ecx+eax], 1
inc eax
jnc short loop1
dec eax
retn
_strlen endp
; -----------------------------------

But it gets even better as inlined code, as a macro:

; ------------ version 4.5 ------------
; Very small, extremely elegant macro
; 10 bytes
; c=8+3*n

strlen macro srcreg, cntreg
xor cntreg, cntreg
cmp byte ptr [srcreg+cntreg], 1
inc cntreg
jnc $-5
dec eax
endm
; -----------------------------------

This macro returns in cntreg the length of the string at srcreg.
It uses no other registers, srcreg is unchanged, it is only 10 bytes long and
it runs at a speed of only 8+3*n cycles. It also returns its value in any
register, without altering the other registers.

Suppose we need in ecx the length of the string at esi. The following code:

push esi
call _strlen
pop ecx ; restore stack
mov ecx, eax

takes 9 bytes, only one less than the macro version. Plus, of course, the at
least 15 bytes of code in _strlen.

Another 'elegant' version, which is also small and much faster is the following:

; ------------ version 5 ------------
; Elegant, small and fast
; 16 bytes
; c=12+2*n

_strlen proc near
mov ecx, [esp+4]
xor eax, eax
loop1: mov dl, [ecx+eax]
inc eax
or dl, dl
jnz short loop1
dec eax
retn
_strlen endp
; -----------------------------------

I believe that version 5 is the best version that could have elegance, speed
and small size together. It can also be converted to a macro and inlined to
drop to a speed of c=8+2*n (it will use one register more, but this register
would anyway be lost if a call to the function was made).

It also has what I believe is the smallest value of k. However, it doesn't have
the smallest value of l. To reduce the cycles needed, data can be read not byte
after byte but dword after dword. Here is a routine given by Agner Fog in his
document on Pentium optimization (which you are _strongly_ advised to read):

; ------------ version 6 ------------
; [by Agner Fog] Very fast
; 61 bytes
; c=18+1*n (not exactly, as data is read in 4 byte blocks)

_strlen proc
mov eax, [esp+4] ; get pointer
mov edx, 7
add edx, eax ; pointer+7 used in the end
push ebx
mov ebx, [eax] ; read first 4 bytes
add eax, 4 ; increment pointer
l1: lea ecx, [ebx-01010101h] ; subtract 1 from each byte
xor ebx, -1 ; invert all bytes
and ecx, ebx ; and these two
mov ebx, [eax] ; read next 4 bytes
add eax, 4 ; increment pointer
and ecx, 80808080h ; test all sign bits
jz l1 ; no zero bytes, continue loop
test ecx, 00008080h ; test first two bytes
jnz short l2
shr ecx, 16 ; not in the first 2 bytes
add eax, 2
l2: shl cl, 1 ; use carry flag to avoid a branch
pop ebx
sbb eax, edx ; compute length
ret
_strlen endp
; -----------------------------------

The only problem with this routine is that it expects the string to be aligned
on a 4 byte boundary. If the string is misaligned, the speed drops to
c=24+1.75*n. In the extreme case that the string is misalinged AND it ends on a
page boundary, the function will cause an access violation error.

The fastest version (I have found) is the one in the Borland C++ builder
library:

; ------------ version 7 ------------
; [C++ Builder, slightly modified] Fastest
; 88 bytes
; c=20+0.75*n (not exactly, see notes)

_strlen proc near
mov eax, [esp+4]
test al, 3
jnz short unalgn

loop1: mov edx, [eax]
add eax, 4
mov ecx, edx
sub edx, 1010101h
and edx, 80808080h
jz short loop1
not ecx
and edx, ecx
jz short loop1
test dl, dl
jnz short minus4
test dh, dh
jnz short minus3
test edx, 0FF0000h
jnz short minus2
jmp short minus1

unalgn: add eax, 3
xor cl, cl
cmp byte ptr [eax-3], cl
jz short minus3
cmp byte ptr [eax-2], cl
jz short minus2
cmp byte ptr [eax-1], cl
jz short minus1
and al, 0FCh
jmp short loop1

minus4: dec eax
minus3: dec eax
minus2: dec eax
minus1: mov ecx, [esp+4]
dec eax
sub eax, ecx
retn
_strlen endp
; -----------------------------------

Actually, the original version is 90 bytes long. I have only changed the
'unalgn:' block, to reduce k if the string is unaligned.

This function works well even on unaligned strings, as it first check the unali-
gned bytes, and the proceeds in the main loop with aligned data (for unaligned
strings it runs at c=31+0.75*n cycles). Since all dwords read are aligned,
unaligned strings that end on page boundaries don't cause problems.

This function is not always the fastest. If the string contains characters in
the range 128 to 255 (i.e. signed bytes) the speed drops. If all the characters
are signed (actually if at least one byte in every dword read), the speed
becomes c=1.25*n. Of course most of the time (especially for english text) this
is not the case, but if you have to process strings in another language that
has characters in the range 128 to 255, it is a bit slower.

Another fast version of strlen can be found in MSVCRT.DLL (the one I checked is
version 5.00.7303). It runs at c=20+1*n, and handles unaligned strings almost
like the Builder version. Misaligned strings give a value of k ranging from a
minimum of 25 to a maximum of 52.

What the MSVCRT function lacks completely is common sense and small size. In
fact it is 144 bytes long and it is divided in two different pieces of the dll's
code, causing most jumps to be in the long form.

The main loop MSVCRT uses is good, but the rest of the function isn't. Based on
that function, I came up with the following one:

; ------------ version 8 ------------
; My fast version
; 92 bytes
; c=17+1*n

_strlen proc
mov eax, [esp+4]
xor ecx, ecx
loop2: test al, 3
jz loop1
cmp byte ptr [eax], cl
jz short ret0
cmp byte ptr [eax+1], cl
jz short ret1
cmp byte ptr [eax+2], cl
jnz short adjust
inc eax
ret1: inc eax
ret0: sub eax, [esp+4]
ret

adjust: add eax, 3
and eax, 0FFFFFFFCh

loop1: mov edx, [eax]
mov ecx, 81010100h
sub ecx, edx
add eax, 4
xor ecx, edx
and ecx, 81010100h
jz loop1
sub eax, [esp+4]
shr ecx, 9
jc minus4
shr ecx, 8
jc minus3
shr ecx, 8
jc minus2
minus1: dec eax
ret
minus4: sub eax, 4
ret
minus3: sub eax, 3
ret
minus2: sub eax, 2
ret
_strlen endp
; -----------------------------------

This one has the advantage of having k=17 for aligned strings and k=24 to 25
for misaligned ones.

The only question left to be answered is: 'Which version should we prefer?'.

If your program does not include string handling in it's time-critical parts,
I higly recommend either versions 5 or 4.5 (the inlined macro). As said before,
the size overhead of the inlined version is very small (if any), and it has
another advantage: it keeps the source code more readable, as it only involves
the needed registers (input and output) in one single line.

If string handling IS time-critical, I recommend version 8 (of course, it's
mine... :)). Even then, the average size of the handled strings is to be consi-
dered, as well as the percentage of unaligned strings. For unaligned strings of
16 or less characters, the fastest version would be an inlined version 5,
running at c=8+2*n.

The choice is yours....


____________________________________________________________________________
::::::::::.___ . ```
::::::::::| _/__. |__ ____ . __. ____ ____ __. \\
:::::: |____ | __/_ _\_ (.___| .___) |__\_ (._) /___) | ,
::::::::::/ / | \ | - | \ | - | - | \/| - |
.=:::::::::/______|_____|_____| (___|_____|______|____|_____|===============.
'=::::::::::==================| . ____ | (____====[ The C Standard lib ]==='
:::::::::: | |------| - |
:::::::::: | |______|______|CE
. :
C string functions: _strcpy
by Xbios2


I. INTRODUCTION
---------------
C syntax: char *strcpy(char *dest, const char *src);

_strcpy copies string src to dest, stopping after the terminating null character
has been moved, and returns dest.

The 'traditional' way to do this is with the 'rep movs' instruction. BC 4.02
and kernel32 use it. The problem is that it is rather slow (BC _strlen takes
53+5.5*n cycles, lstrlenA takes 74+5.5*n cycles, and optimizing their code
leads to 46+5.5*n cycles wher n the number of chars, see part I of these
articles). This is because even though the 'rep movs' instruction is fast it
needs to know the number of bytes to copy in advance. So, _strcpy includes a
_strlen function before the actual copying, which is implemented through 'repne
scasb', a slow instruction.

In this article we will examine two 'modern' _strcpy functions, found in
MSVCRT.DLL and Borland C++ Builder library. Those functions are (supposed to be)
optimized for Pentium processors. If you're not familiar with optimization for
Pentium processors, I suggest you read the document on Pentium optimization by
Agner Fog (http://announce.com/agner/assem).


II. STRCPY IN MSVCRT
--------------------
; c=39+1.75*n / 146 bytes

strcpy proc
push edi
mov edi, [esp+8] ; dest
mov ecx, [esp+0Ch] ; src
test ecx, 3
jz short loop1

algn: mov dl, [ecx]
inc ecx
test dl, dl
jz short one
mov [edi], dl
inc edi
test ecx, 3
jnz short algn

loop1: mov edx, -81010101h
mov eax, [ecx]
add edx, eax
xor eax, -1
xor eax, edx
mov edx, [ecx]
add ecx, 4
test eax, 81010100h
jz short nozero
test dl, dl
jz short one
test dh, dh
jz short two
test edx, 0FF0000h
jz short three
test edx, 0FF000000h
jz short four

nozero: mov [edi], edx
add edi, 4
jmp short loop1
;... in the DLL, there is code here, not used by strcpy

one: mov [edi], dl
mov eax, [esp+8]
pop edi
retn

two: mov [edi], dx
mov eax, [esp+8]
pop edi
retn

three: mov [edi], dx
mov eax, [esp+8]
mov byte ptr [edi+2], 0
pop edi
retn

four: mov [edi], edx
mov eax, [esp+8]
pop edi
retn
strcpy endp

This procedure does the following:
1. Read arguments (src, dest) from stack
2. Check if src is aligned on a 4 byte boundary
If not, copy byte after byte until src gets aligned
3. Loop
Read one dword from src
Test if there is a zero byte in the dword
If no zero, copy dword to dest, loop back
4. Copy the remaining bytes
5. Return with dest in eax

Actually the code above compiles to 130 bytes. The extra 16 bytes are added
because between the loop and the 'one:' label there is the strcat function. So
4 conditional jumps take the 6-byte form, not the 2-byte one.

This function takes 39+1.75*n. This means that the loop takes 7 cycles to
execute (since each time the loop runs, it copies 4 bytes). Here is the explan-
ation of the loop (U and V refer to the pipe the commands run in).

loop1: mov edx, -81010101h ; U 1st
mov eax, [ecx] ; V
add edx, eax ; U 2nd
xor eax, -1 ; V
xor eax, edx ; U 3rd
mov edx, [ecx] ; V
add ecx, 4 ; U 4th
test eax, 81010100h ; V
jz short nozero ; U 5th
...
nozero: mov [edi], edx ; U 6th
add edi, 4 ; V
jmp short loop1 ; U 7th

The problem here is that both jumps run in the U pipe so they will not pair.
Generally it's better to have an even number of instructions in each block of
code. Just by moving one instruction this code will run in 6 cycles (i.e.
39+1.5*n cycles):

loop1: mov edx, -81010101h ; U 1st
mov eax, [ecx] ; V
add edx, eax ; U 2nd
xor eax, -1 ; V
xor eax, edx ; U 3rd
mov edx, [ecx] ; V
test eax, 81010100h ; U
jz short nozero ; V 4th
...
nozero: mov [edi], edx ; U 5th
add ecx, 4 ; V <<< moved instruction
add edi, 4 ; U
jmp short loop1 ; V 6th

Everything pairs perfectly, and so 12 instructions only take 6 cycles. Pay
attention to one thing: if 'add ecx, 4' and 'add edi, 4' are swapped, we get
back to 7 cycles per loop, even though the pairing is the same. This is because
the 'mov eax, [ecx]' instruction uses ecx to access memory, but ecx was changed
in the previous clock cycle (add ecx, 4 / jmp short loop1). This causes an AGI
stall (Address Generation Interlock), which wastes one cycle.

As you 've noticed, _strcpy makes sure that the data read from src is aligned,
because reading aligned dwords is faster. If src is aligned, the test only takes
one cycle more, so it shouldn't trouble us. Yet, aligning src is not always a
good idea. Suppose that you have an unaligned string and want to copy it in a
buffer that is aligned. So what happens is that by aligning src we misalign
dest. The problem is that misaligned writes are more expensive in cycles than
misaligned reads. So _strcpy should either align dest or leave everything
untouched. (not aligning src introduces an extremely small possibility of an
access violation error, read section V below for details).


III. STRCPY IN C++ BUILDER
--------------------------

; c=66+1.75*n / 146 bytes

_strcpy proc
push ebp
mov ebp, esp
mov ecx, [ebp+0Ch] ; src
mov edx, [ebp+8] ; dest
mov eax, ecx
and eax, 3
jmp algn[eax*4]
; ------------------------------------
algn dd offset loop1
dd offset algn3
dd offset algn2
dd offset algn1
; ------------------------------------

algn3: mov al, [ecx]
or al, al
jz short one
mov [edx], al
add ecx, 1
add edx, 1

algn2: mov al, [ecx]
or al, al
jz short one
mov [edx], al
add ecx, 1
add edx, 1

algn1: mov al, [ecx]
or al, al
jz short one
mov [edx], al
add ecx, 1
add edx, 1

loop1: mov eax, [ecx]
or al, al
jz short one
or ah, ah
jz short two
test eax, 0FF0000h
jz short three
test eax, 0FF000000h
jz short four
mov [edx], eax
add ecx, 4
add edx, 4
jmp short loop1

four: mov [edx], eax
mov eax, [ebp+arg_0]
pop ebp
retn

three: mov [edx], ax
mov byte ptr [edx+2], 0
mov eax, [ebp+arg_0]
pop ebp
retn

two: mov [edx], ax
mov eax, [ebp+arg_0]
pop ebp
retn

one: mov [edx], al
mov eax, [ebp+arg_0]
pop ebp
retn
_strcpy endp

This function runs at 66+1.75*n cycles. The aligning is done in an awful way.
If the aligning code is removed, we gain 39 cycles. By not using ebp, we save
4 more cycles. The loop takes 7 cycles, as shown below:

loop1: mov eax, [ecx] ; U 1st
or al, al ; U 2nd
jz short one ; V
or ah, ah ; U 3dr
jz short two ; V
test eax, 0FF0000h ; U 4th
jz short three ; V
test eax, 0FF000000h ; U 5th
jz short four ; V
mov [edx], eax ; U 6th
add ecx, 4 ; V
add edx, 4 ; U 7th
jmp short loop1 ; V

The first two instructions don't pair because 'or al,al' accesses a register
changed by the previous instruction. Anyway, there are 13 instructions, not 12
as in the MSVCRT function. So, one instruction has to be removed. This instruct-
ion is the unconditional jump (generally unconditional jumps can be avoided).
Notice that if we get through to 'jz short four', the 'mov [edx], eax' instruct-
ion will be executed anyway. So we rewrite the code as:

loop1: mov eax, [ecx] ; U 1st
inc ecx ; V
or al, al ; U 2nd
jz short one ; V
or ah, ah ; U 3dr
jz short two ; V
test eax, 0FF0000h ; U 4th
jz short three ; V
mov [edx], eax ; U 5th
add edx, 4 ; V
shr eax, 24 ; U 6th
jnz short loop1 ; V

Notice that we use 'shr eax, 24' instead of 'test eax, 0FF000000h', because we
no longer need the value in eax, and the 'shr' form is two bytes shorter.

A modified version of this strcpy is the best I could come up with:


IV. A FAST STRCPY
-----------------

; c=25+1.5*n / 80 bytes

_strcpy proc
mov ecx, [esp+8] ; src
mov edx, [esp+4] ; dest
push edx ; save return value
test edx, 3 ; check if dest is aligned
jz short loop1

algn: mov al, [ecx]
inc ecx
mov [edx], al
inc edx
test al, al
jz short return
test edx, 3
jnz short algn

loop1: mov eax, [ecx]
add ecx, 4
or al, al
jz short one
or ah, ah
jz short two
test eax, 0FF0000h
jz short three
mov [edx], eax
add edx, 4
shr eax, 24
jnz short loop1
pop eax
retn

three: mov byte ptr [edx+2], 0
two: mov [edx], ax
return: pop eax ; restore return value
retn

one: mov [edx], al
pop eax ; restore return value
retn
_strcpy endp

This function aligns dest instead of src, which, as discussed above, is faster.
It can run at one cycle less, by reading the return value directly from the stack
and not push/popping it, but it would take 8 bytes of code more.

Slight modifications to this routine give us three other functions:
- _stpcpy is exactly the same as _strcpy, only that it returns a pointer to the
ending null char copied in dest
- _strdup is a combination of _strlen, _malloc and _strcpy
- _strcat is a combination of _strlen and _strcpy

The MSVCRT _strcat actually counts the chars in src and then jumps into the code
of _strcpy to perform the actual copying. Strangely, _strdup is implemented
through 'repne scas' and 'rep movs'. It even has two 'repne scasb' instructions,
one to calculate the length of the string to pass to the malloc function, and
one to calculate the length of the string to copy, even though these two values
are the same. So, even if coding in MS C++, using _strlen/_malloc/_strcpy is
faster than using _strdup.


V. IS IT FOOLPROOF?
-------------------

To be honest, no. But programming is hardly ever so... First of all, any version
of strcpy (of any function, generally) will fail if it tries to read or write
data in a page it doesn't have access. This is hardly the case, but it can
happen:

-If there is no NULL character between the address of src and the last byte in
the valid page.
-If src is longer than the distance between dest and the last valid byte.

The first case is extremely unlikely, because even if src was corrupt and had no
terminating NULL, one is very likely to be encountered somewhere. The second
case is also unlikely, and it means that the programmer didn't allocate enough
space for dest. Anyway, corrupt data or not enough allocated space even if they
don't cause an access violation, they cause problems. But the problem was
created by the programmer, not the function. Yet there are also two cases where
the strings ARE ok but an access violation occurs. These cases appear only on
'optimized' versions of strcpy, not on the 'rep movsb' version.

The first case would appear in a strcpy function that doesn't align src (it
either aligns dest or leaves both unchanged), if src is not aligned and it ends
on a page boundary. Then the last read operation would try to read one to three
bytes on a page it doesn't have access. This doesn't happen on the Builder and
MSVCRT functions and happen on the one I give. Yet, it is really unlikely to
happen, and aligning dest is faster.

The second case would appear if dest points to a character in src (including
the terminating NULL). What happens is that the NULL of src (and any other
found) are overwritten, so no NULL is found and we finally get an access
violation. This doesn't happen in the 'traditional' versions, because we only
copy strlen(src) bytes. But even in those versions the last character copied
wouldn't be a NULL, so dest wouldn't be a proper string.


____________________________________________________________________________
____ . : . ```
| |___ : | ---- _____ ______ _____ | \\
| | |___|_ |______\ |---/ ._____/\____) -- (_) / |____) | ,
| ' | / | \___ __/ | | \/| | - |CE
.==|________| (______|_______/ \==|_____/\__________|____| |______|===.
'=========| |===========/----|___\==================[ The Unix World ]==='
:
X-Windows in Assembly Language: Part II
by mammon_



OK, let's face it: you've seen the tedium of XLib, one *has* to use widgets in
order to get any programming done in XWindows. 'But this is assembly langauge',
the masochist might point out. 'Aren't widgets a little Visual-Basicy?'

Not in the slightest. A widget is simply a C++ class exported for use --much
like the windows API functions, only a little more object oriented...maybe a
good comparison would be MFC or VCL. Xt, or 'X toolkit Intrinsics', is the
interface that widget sets [such as Athena, Qt or GTK] use to interface with
XLib. The Xt include files are in /usr/X11R6/include/X11, its libraries are in
/usr/X11R6/lib, and its exported functions are all prefixed with "Xt".

For the following examples I will be using the Atehna widget set, which is
supplied with XFree86. The include files for Athena are in /usr/X11R6/include/Xaw
and the libraries are in /usr/X11R6/lib.

A barebones Xt/Athena app in C would run as follows:
//====================================================================-xthell.c

#include <X11/Intrinsic.h>
#include <X11/StringDefs.h>
#include <X11/Xaw/Command.h>

void Quit(w, client_data, call_data) //CallBack function
Widget w;
XtPointer client_data, call_data;
{
exit(0);
}

main(argc,argv)
int argc;
char **argv;
{
XtAppContext app_context;
Widget ShellWidge, ButtnWidge;
ShellWidge = XtVaAppInitialize( &app_context, "toplevel", NULL, 0, &argc, \
argv, NULL, NULL);
ButtnWidge = XtVaCreateManagedWidget("hellbutton", commandWidgetClass, \
ShellWidge, NULL);
XtAddCallback(ButtnWidge, XtNcallback, Quit, 0);
XtRealizeWidget(ShellWidge);
XtAppMainLoop(app_context);
}
// compile with cc -o xthell xthello.c -lXmu -lXaw -lXt -lX11 -L/usr/X11R6/lib
//=========================================================================-EOF

Pretty ugly, eh? This boils down to the following steps:
1) Create the top-level 'Canvas' widget [the window]
ShellWidge = XtVaAppInitialize( &app_context, "toplevel", ..... )
2) Create the button widget
ButtnWidge = XtVaCreateManagedWidget("hellbutton", ..... )
3) Register a callback for the button
XtAddCallback(ButtnWidge, XtNcallback, Quit, 0);
4) Show the top-level widget
XtRealizeWidget(ShellWidge);
5) Transfer control to the Xt message loop
XtAppMainLoop(app_context);
The most interesting thing about Xt programming is in fact the callbacks.
Instead of writing a message processing loop, you register a callback function
for each widget and then pass control to Xt, which processes the messages for
you and dispatches each message to the appropriate callback function. The call-
back receives a pointer to the widget that sent the message [the same argument
as passed to XtAddCallback], a client_data pointer [the last argument passed to
XtAddCallback, used to pass data from the main routine to the callback], and a
call_data pointer, which contains information from the message [such as cursor
or scrollbar position].

The calls themselves are pretty straightforward:
XtVaAppInitialize initializes [sic] the X app and takes as its arguments a
pointer to an XtAppContext structure, the class name of the application,
application-specific command line options {args 3 and 4], argc, argv, a default
resource-settings file, and a NULL to terminate the arguments list [the
XtVaAppInitialize function actually takes a number of different parameters]; it
returns a handle to the 'canvas' or 'top-level' widget, on which all other
widgets will be painted.

XtVaCreateManagedWidget is used to create any of the Xt widgets [Athena, GTK,
etc], and takes as its parameters the instance name, the widget class, the
parent widget, and a NULL to terminate the arguments list; it returns a pointer
to the created widget.

XtAddCallback is used to register a callback function with a specific widget;
it takes as its parameters a pointer to the Widget, the callback type, the
function being registered, and a pointer to client_data which will be passed to
the callback.

XtRealizeWidget is simply used like ShowWindow in Windows; it takes a single
parameter which is the widget to 'show'; it displays that widget and its
children.

XtAppMainLoop takes the current application context [which was filled with the
XtVaAppInitialize call] and turns control over to the Xt message processing
loop. Note that the program does not have to return; in this example, the exit
call is placed in the callback function.

Here is the same application written for NASM:
;===================================================================-xthell.asm
BITS 32
GLOBAL main
GLOBAL bail
EXTERN XtVaAppInitialize
EXTERN XtVaCreateManagedWidget
EXTERN XtAddCallback
EXTERN XtRealizeWidget
EXTERN XtAppMainLoop
EXTERN commandWidgetClass
EXTERN exit

SECTION .data
AppContext DD 0
ShellWidge DD 0
ButtnWidge DD 0
ARGC times 128 DB 0
ClassName DB "toplevel",0
ButtnName DB "hellbutton",0
XtNcallback DB "callback",0 ;XtNcallback

SECTION .text
bail:
pop eax ; Xt_Pointer call_data
pop ebx ; Xt_Pointer client_data
pop ecx ; Xt_Pointer widget
push dword 0
call exit
;-------------------------- main
main:
mov eax, esp
push dword 0 ;Number of Args
push dword 0 ;Args
push dword 0 ;Fallback Resources
push dword 0 ;argv
push dword ARGC ;&argc
push dword 0 ;Number of Options
push dword 0 ;Options Array
push dword ClassName ;Class Name (String)
push dword AppContext ;Application Context (Ptr)
call XtVaAppInitialize
add esp, 36
mov [ShellWidge], eax
push dword 0
push eax ;Button parent (ShellWidge)
push dword [commandWidgetClass] ;Button widget type
push dword ButtnName ;Button class name
call XtVaCreateManagedWidget
add esp, 16
mov [ButtnWidge], eax
push dword 0 ; client_data
push dword bail ;CallBack function
push dword XtNcallback ; callback type
push eax ;CallBack widget (ButtonWidge)
call XtAddCallback
add esp, 16
push dword [ShellWidge] ;Widget Handle
call XtRealizeWidget
add esp, 4
push dword [AppContext]
call XtAppMainLoop
add esp, 4
ret
;==========================================================================-EOF
This can be compiled with the following commands:
nasm -f elf xthell.asm
gcc -o xthell xthell.o -lXaw -lXt -lX11 -L/usr/X11R6/lib

Most of the operation is the same as the C file; naturally you must push dword
0's instead of NULLs...and do not forget to push the arguments in reverse
order and to clean up the stack afterwards; this is C after all and not stdcall
is used in Windows.

You will have to study up on Athena to learn what the names of the various
widgets are ... I found it helpful to use
grep extern /usr/X11R6/include/Xaw/*
for a general overview. Note that the class names are strings in assembly; also
each of the various 'handles' [widgets, contexts, etc] is simply defined with a
DD 0 -- your generic 32-bit variable. The Callback type turned out to be a
string defined in the Xt header files; I simply recreated it above.

Another interesting gemis the need to call 'exit' rather than simply using a
'ret' as you would in console mode; the latter causes segmentation faults, most
likely due to the XtAppMainLoop call. In addition you *must* provide a pointer
to ARGC whether you check the command line or not; hence the 'ARGC: DB 128'.

In case you didn't notice, the Xt asm example is huge and clunky, with a lot of
not-so-obvious variable definitions. Having included a lengthy introduction to
NASM macros in this issue, I took the opportunity to create an xt.mac file
which will take some of the burden off of experimenting with small Xt apps. The
InitXt and RegisterCallback macros probably are not ready for prime-time just
yet, but they will do for testing purposes.

;=======================================================================-xt.mac
%macro CLASS 2
%1: DB %2,0
%endmacro
%macro WDGTPTR 1
%1: DD 0
%endmacro
%macro CONTEXT 1
%1: DD 0
%endmacro
%macro CHARSTR 2
%1: DB %2,0
%endmacro
%define WIDGET EXTERN
%define XLibAPI EXTERN
%define XtAPI EXTERN
%define PUBLIC GLOBAL
%define NULL dword 0
%define TERM_VARARGS dword 0
%macro InitXt 2
SECTION .data
CONTEXT AppContext
CLASS XtShell, "XtShell"
SECTION .text
EXTERN XtVaAppInitialize
push dword 0 ;Number of Args
push dword 0 ;Args
push dword 0 ;Fallback Resources
push dword 0 ;argv
push dword %2 ;&argc
push dword 0 ;Number of Options
push dword 0 ;Options Array
push dword XtShell ;Class Name (String)
push dword AppContext ;Application Context (Ptr)
call XtVaAppInitialize
add esp, 36
mov [%1], eax
%endmacro
%macro XtMsgLoop 0
EXTERN XtAppMainLoop
push dword [AppContext]
call XtAppMainLoop
add esp, 4
%endmacro
%macro RegisterCallback 1
SECTION .data
CBType: DB "callback",0
SECTION .code
push NULL ;
push dword %1 ;CallBack function
push dword CBType ;
push eax ;CallBack parent (ButtonWidge)
call XtAddCallback
add esp, 16
%endmacro
%macro CALLBACK 1
SECTION .data
Call_Data_%1: DD 0
Client_Data_%1: DD 0
Widget_%1: DD 0
GLOBAL %1
SECTION .text
%1:
pop eax
mov [Call_Data_%1], eax
pop ebx
mov [Client_Data_%1], ebx
pop ecx
mov [Widget_%1], ecx
%endmacro
%define ENDCALLBACK nop
%macro ENTRYPOINT 1
GLOBAL %1
%1:
%endmacro
;==========================================================================-EOF
Most of the macro file should be readily apparent if you are familiar with the
NASM macro facility. I did take the opportunity to clean up the callback
function, so that the parameters to the callback are saved in variables, but for
the most part it does the same as the equivalent code in the preceding asm
example.

Now the xthell.asm sample will look as follows:

;===================================================================-xthell.asm
BITS 32
%INCLUDE "xt.mac"
;========================================================XTRN=====
XtAPI XtVaCreateManagedWidget
XtAPI XtAddCallback
XtAPI XtRealizeWidget
WIDGET commandWidgetClass
EXTERN exit

;========================================================DATA=====
SECTION .data
;------------
WDGTPTR ptrShell
WDGTPTR ptrButton
CLASS XHELL, "XHell"
CLASS HellButton, "HellButton"
CallbackType DB "callback",0 ;XtNcallback
ARGC times 128 DB 0
;========================================================CODE=====
SECTION .text
;------------
CALLBACK bail
push dword 0
call exit
ENDCALLBACK


ENTRYPOINT main
InitXt ptrShell, ARGC

push TERM_VARARGS
push eax ;Button parent (ShellWidge)
push dword [commandWidgetClass] ;Button widget type
push dword HellButton ;Button class name
call XtVaCreateManagedWidget
add esp, 16
mov [ptrButton], eax

RegisterCallback bail

push dword [ptrShell] ;Widget Handle
call XtRealizeWidget
add esp, 4

XtMsgLoop
ret
;==========================================================================-EOF
Much prettier and hey, only twice as long as the C version! ;)

Next issue I will dwell on Xt/Athena a little longer and come up with some more
practical methods of automating the coding process.


____________________________________________________________________________
____ ___ _____ _| |_ ____ . ```
.__\ /__ ______ _) /.\ _/__ ___ ______\_ (_. | \\
| \/ | | \/ | \ | | | - | |CE ,
.==|________|______|______|_______|_______|_______| |======================.
'=================================================| :=[ Virtual Machines ]='
An Intro to the Java Virtual Machine
by Cynical Pinnacle


For awhile C/C++ reigned supreme and nothing challenged it but then along
comes Java, creating a splash, and causing outright corporate warfare to claim
right of ownership. Strangely enough the result of this war has not been dead
bodies but buckets and buckets of API's all given away for free. Just stop by
and take a look at Java's Official Website (http://java.sun.com) and what do
you find a good development kit with compiler, symbolic debugger, disassembler,
complete toolkit for creating GUI's, built in support for compression, encrypt-
ion, http, ftp, SMTP, POP3, IHMP, and more. Wow! But how can we take advantage
of all this?

First we have to step back a look at what Java really is. Because one of
the main goals of Java is platform independence (both from the chip and the OS).
The JVM, which supports Java, has to be both a chip and an OS. If any of us
(well lets say us programmers) were to design a chip and a OS in one, we would
fill it with features like built in security, automatic dependency resolution
and linking, network support, video and audio acceleration, along with more
common things such as built in data types (ints, floats, arrays, and objects ),
support for local variables, exception handling, support for debugging, and on
and on. This is really what Java is because it does all the things I mentioned
above and more. This is what Sun has tried to do - design an "Ideal" environ-
ment for writing and executing code, or write once run anywhere. But this
wealth of features comes at the price of speed of execution and further distance
further from the native code of your machine (unless you are running on a real
Java machine). And if you are like me the latter hurts as much as the former.

Still there are a alot of really appealing things about Java. And so the
challenge is to use these appealing features on our own terms. We can do this
by programming at a lower level to at least touch the native language of the
Java Virtual Machine (Java Assembly language!!)

The JVM:

I am going to take a programmers view of the JVM and say it is simple
because from our perspective it is. But for the sake of completeness I will
list the other components of the JVM:

Memory Manager: This is the unit responsible for the famous garbage collect-
ion and heap management. I say clean up your own garbage.

Error/Exception Manager: Handles unforseen conditions.

Native Method Support: This is to allow you to call WaitForSingleObject
from within Java. It is responsible for loading DLLs, resolving entry points
and executing them. Note Java only supports dynamic linking.

Threads: Java doesn't have to worry so much about memory because it is all
allocated on the stack. Each thread gets its own stack frame (chunk of stack
for its personal use). Switching threads in a stack based machine is easy. You
just make the threads stack top the machine's stack top and go.

Class Loader: This is just like the loaders in NT and 95. It brings up
class files from the disk initializes them (headers memory etc) and passes
execution to the classes entry point.

Security Manager: Want to find out whether or not you can do something.
Ask him.

Execution Engine: This is where the JVM opcodes are translated into native
opcodes. This is the part of the JVM which a low level Java programmer will
interact with most. The Execution engine has a much simpler structure than a
Pentium. At its heart is a stack where instuctions are executed. Note that
the JVM has no registers, which is more a trait of Virtual machines than Java.
Basically opcodes and operands are popped off of the stack and executed by the
VM bases upon a mapping between Java opcodes and Native opcodes. In addition,
there is built in support for local variables (more later). And as mentioned
before the stack is symbollically divided up by the JVM. Each method (read
method to mean function) gets it own stack frame which is allocated when the
method starts and deallocated when the method exits (sound familiar eg x86 push
bp -- mov bp, sp .. leave instructions). The Execution engine understands the
int, long, float, double, byte, char, short, reference (eg pointer) and the
instructions it understands are strongly typed (for example: there is an
instruction called iload which loads an integer from a local variable onto the
stack (like mov eax, myLocalVariable) but there is also a dload, lload, and
fload for doubles, longs, and floats).

Now with a little background it is time to learn or burn but first we need
some tools. First you will need to download the JDK from http://java.sun.com
(I recommend 1.1.x and the current I think is 7) you will also need a Java
assembler which is called Jasmin also free at
http://www.cat.nyu.edu/meyer/jasmin/
and you will need a good editor (I heartily recommend Visual SlickEdit 3.0 or
4.0 at http://www.slickedit.com ). You can get docs about the JVM from
http://www.javasoft.com/
And the best book I have found on the JVM (and Jasmin) is "Java Virtual Machine"
by Jon Meyer and Troy Denning. There are several useful tools with the JDK
comes a program called javap which is a Java .class file disassembler!
With the -c switch it will produce JASM code from any .class file. Note that
reversing Java programs is not nearly as hard as x86. Try it. Take some Java
.class file you have laying around ( Keep it small so you don't get confused )
and disassemble it then you will see what I mean. I can see a very difficult
future for Java Shareware programmers. There are also many other 3rd party
tools out there for mainpulating .class files.

With all of those tools installed we are ready to write the mandatory
"Hello from JASM!" program. First comes the Java Assembler code followed by
its Java equivilant.


;***************************************************************
;Export the class name so Java can find it
class public Hello

;Simplest class to derive from
super java/lang/Object

; General facts
; .method - means this is a function
;
; public - means you are it is visible externally (low
; level Object Orientedness)
;
; Java always uses explicit paths in Unix notation hence you
; end up with a lot of notation like java/lan/Object.
; java/lang/Object is the path to the superclass. In a lot
; of the functions there is a V, this means it returns void
; the syntax for specifying arguments is strange. I
; recommend reading Jonathan Meyer's Jasmin documentation
; http://www.cat.nyu.edu/meyer/jasmin/


; This method is called init and is in every class
; all that is done here is to push the contents of local
; variable 0 onto the stack and call the superclass's
; (Object) init method. Local variable 0 is always
; the reference (pointer) to the equivalent of C++'s this

.method public <init>()V

aload_0

; This just calls the superclasses init method
invokenonvirtual java/lang/Object/<init>()V

; Get out
return
end method


; Here is the main function which is publically visible
; is static and thus shared by all instances of the class,
; it takes one argument of type [ljava/lang/String, which
; is an array of strings eg: argv**, and returns void hence
; the V.

.method public static main([Ljava/lang/String;)V

; Delcare your stack memory usage
.limit stack 2
.limit locals 1

; Push 5 onto the stack and store it in local variable 1
bipush 5
istore_1

LoopTop:

; These next two lines put the parameters for the println
; function onto the stack in the right order. Java uses
; the pascal calling convention (push left to right and
; the callee cleans up). First a reference to the stream
; object doing the work is pushed onto the stack. Next a
; reference to the string to be printed is pushed.

;get the pointer to the stream object and push it
getstatic java/lang/System/out Ljava/io/PrintStream;
;get the pointer to the string and push it
ldc "Hello from JASM!"
;call println
invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V

; These next three lines are the loop condition
; iinc adds -1 to local variable 1. iload_1 pushes
; local variable 1 onto the stack and ifne compares it to
; zero (just like jnz in x86). If it is not equal to zero
; we jump to LoopTop
iinc 1 -1
iload_1
ifne LoopTop

; Go home
return
;Declare the end of the function
end method

;
; Java equivalent:
;
; public class Hello
; {
; public static void main ( String args[] )
; {
; int i;
; for( i = 0;i < 5; i++ )
; {
; System.out.println( "Hello from JASM!" );
; }
;
; }
;
; }
;***************************************************************

To run this JASM sample cut out the stuff between the *'s and save it to a
file called Hello.j. Next type jasmin Hello.j. This should generate a
Hello.class file. Now type java Hello and you should see the string "Hello
from JASM!" printed out 5 times. This is enough to get you started poking
around in the JVM and looking a little closer at the .class files you find
lying around ;) . More to come?


____________________________________________________________________________
____ . __ ```
._| __/___ : | ___ _____ __) |_ _____ \\
| |_____ |____|__ |_______ ______ ______ _) _ |_\_ __)___| __/_____ ,
| | ( | | __ | __ | ----' | | |_______ |CE
.=|_________| /_______|_______| _____| _____|________|____________________|=.
'===========| |===========|___|==|___|===================[ snippets ]==='

NumFactors
by Troy Benoist


;Summary: Routine to determine the number of factors for a 16-bit value
;Compatibility: All DOS versions/8088+ instructions
;Notes: 22 BYTES Input: AX = Value to check for number of factors
; * If CX is 2, check value in AX is prime.
mov bx,1 ;Test=1 (Test is the testing value of each theoretical
;factor of AX, from 1-AX).
xor cx,cx ;Count=0 (Count is running total with # of factors for AX).
ChkFctr:
xor dx,dx <--- ;Need to divide DX:AX by BX, but DX is not used-- clear it.
push ax | ;Dividing by BX puts quotient in AX, but quotient is not
| ;needed, and we need to keep the value to check, so save it.
div bx | ;Divide DX:AX by BX. Remainder is in DX.
pop ax | ;Restore value to check into AX.
cmp dx,0 | ;Is remainder=0? (Did Test divide evenly into check value?)
jnz NC -----| | ;If not, Count remains unchanged.
inc cx | | ;If so, factor found, so Count=Count+1.
NC: inc bx <---| | ;Test=Test+1.
cmp bx,ax | ;Is Test greater than check value?
jbe ChkFctr___| ;If not, go back and check next Test factor.


______________________________________________. __________________________ ```
| . ```
._____ ___ ____ ___ ___ ____ : | ______ \\
| __/__| |______\_ (_) | (____\_ ( ______ |________ _) __ |___. ,
| | | - | | | - | _ / | __ | ----' |CE
.==|_______|--)___|______|___|___|_______|--(_______|_______ |___________|===.
'=============================================[ issue | _____| challange ]==='
|___\
Convert ASCII hex to binary in 6 bytes
by mammon_ [and help]


The Challenge
-------------
Write a routine for converting ASCII hex to binary in 6 bytes.


The Solution
------------
Well, actually, I cheated: I found the following text on the internet a few
months ago and decided to see if I could beat it:
===================================================================-Asc2hex.txt
An efficient algorithm for converting ASCII hex to binary
Ken Sykes (72177,141)
David Ogle (75676,2612)

There is a well-known algorithm for converting a binary number between
one and fifteen to its equivalent hex form in ASCII that only requires
four assembly language instructions. Assuming that the number to convert is
in AL, the following sequence performs the conversion:

add al,90h
daa
adc al,40h
daa

This instruction sequence is, as far as we know, the shortest (16 cycles)
self-contained routine to convert a binary number to hex. Inspired by
this code and the fact that a similar routine to convert ASCII hex to
binary would be useful, we came up with two algorithms that convert an
ASCII hex digit to binary in five assembly instructions or less.

The first algorithm takes advantage of an undocumented feature of the
8088. The AAM instruction (Ascii Adjust for Multiply) divides AL by 10,
placing the remainder in AL and the quotient in AH. The opcode for
AAM is: 0D4h,0Ah. It's no coincidence that the second byte is 10 - the
8088 uses the second byte of the opcode as the divisor! The same rule
applies to the AAD (Ascii Adjust for Division) instruction. With this in
mind, a conversion routine goes as follows (assuming the ASCII digit is in
AL and in the range '0..9,A..F'):

sub al,'0'
aam 16
aad 9

The only problem is the Microsoft Assembler does not accept this form.
By placing the opcodes in data statements, however, the following code
will assemble properly:

sub al,'0'
db 0D4h,10h
db 0D5h,09h

At three instructions and six bytes of code space, We are reasonably
sure this is the shortest self-contained sequence to perform the conversion.
The only drawbacks are the use of non-standard opcodes and the execution
time (147 cycles!). The second algorithm, loosely-based on this one,
relieves these restrictions.

The second algorithm makes exclusive use of fast instructions to
perform the conversion (again, AL holds the digit to convert):

add al,40h
cbw
and ah,09h
add al,ah
and al,0Fh

While two additional instructions are required, the routine executes
in 17 cycles. We are reasonably sure this is the fastest-executing self-
contained code to perform the conversion. It has the added benefit of
handling the ASCII values 'a'..'f'.

These algorithms will hopefully trim down the execution times of your
programs, and we welcome any suggestions or improvements on our code.

Happy Hacking!
===========================================================================-EOF

Sadly, I was unable to come up with a smaller version, or a faster one. Tinara
posted a similar solution to the APJ Message Board:
SUB AL, 30h
AAM 10h ; db D4h, 10h
AAD 09h ; db D5h, 09h
...so he gets kudos for uncovering by work what I managed by stealth. As for
next issue, I haven't had time to prepare a challenge, but I'm sure one will
crop up in the next month or so.

Next Issue Challenge
--------------------
None so far... submissions welcome.



::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::.
:::\_____\:::::::::::.......................................................FIN

← previous
next →
loading
sending ...
New to Neperos ? Sign Up for free
download Neperos App from Google Play
install Neperos as PWA

Let's discover also

Recent Articles

Recent Comments

Neperos cookies
This website uses cookies to store your preferences and improve the service. Cookies authorization will allow me and / or my partners to process personal data such as browsing behaviour.

By pressing OK you agree to the Terms of Service and acknowledge the Privacy Policy

By pressing REJECT you will be able to continue to use Neperos (like read articles or write comments) but some important cookies will not be set. This may affect certain features and functions of the platform.
OK
REJECT