Fun Facts about the 8088
I found this list somewhere on the net. Credits to Chris Peters.
1. Comparing a register
The fastest and smallest way to compare a 16 bit register to zero is to OR it with itself, e.g.
OR BX,BX ; 2 bytes, 3 clocks
JGE BXisPositive
this is much better than comparing it with zero, e.g.
CMP BX,0 ; 3 bytes, 4 clocks (bush league)
For the ultimate in comparing with zero, try to use the CX register. The 8088 contains the single instruction:
JCXZ CXisZero ; jump if CX is zero
This instruction makes a short jump if CX is zero.
To destructively test for 1 or -1, use the DEC or INC instructions:
DEC DX ; 1 byte, 3 clocks
JZ DXisOne ; if zero, DX was 1
or
INC DX ; 1 byte, 3 clocks
JZ DXisMinusOne ; if zero, DX was -1
The LOOP instruction is just a fancy way of writing:
DEC CX
JNZ CXisNotZero
The difference is that LOOP is 1 byte smaller and 2 clocks faster. The LOOP instruction can be used to compare CX with multiple values as follows:
LOOP CXisNotOne ; If CX = 1 then...
... ; ...do this code, else...
CXisNotOne:
LOOP CXisNotTwo ; if CX = 2 then...
... ; ...do this code, else...
CXisNotTwo:
LOOP CXisNotThree ; if CX = 3 then...
... ; ...do this code, else etc.
It's possible to check whether a signed number is in the range 0 to n with a single unsigned comparison against n, because a negative value looks like a very large unsigned one:
CMP DX,639 ; if <0 or >639...
JA OutOfRange ; ...it's out of range
This is smaller and faster than:
OR DX,DX ; Never CMP DX,0!
JL OutOfRange ; If negative, it's out of range
CMP DX,639
JG OutOfRange ; if greater, it's out of range
This can be generalized to any signed compare within a specific range. If you wanted to make sure the AX register contained a number from -5 to 17:
SUB AX,-5 ; Subtract off lower bound
CMP AX,17-(-5) ; Compare upper - lower bound
JA OutOfRange
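The range trick above can be checked exhaustively with a small model of 16-bit arithmetic. This is only a sketch in Python, not 8088 code; the helper names are made up for illustration, and the model assumes the width of the range (upper minus lower) fits comfortably in 16 bits.

```python
MASK16 = 0xFFFF

def to_signed16(value):
    """Interpret a 16-bit pattern as a two's complement signed number."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value

def in_range_one_compare(ax, lower, upper):
    """Model of SUB AX,lower / CMP AX,upper-lower / JA OutOfRange."""
    shifted = (ax - lower) & MASK16                # SUB AX,lower wraps at 16 bits
    return shifted <= ((upper - lower) & MASK16)   # JA not taken means in range

def in_range_two_compares(ax, lower, upper):
    """The straightforward signed test with two comparisons."""
    return lower <= to_signed16(ax) <= upper
```

Comparing the two functions over all 65536 register values for the bounds -5 to 17, or 0 to 639, is a cheap way to convince yourself the single unsigned compare is equivalent.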
You cannot compare a segment register directly. To do so, copy it to a general register or memory location, then compare that.
2. Setting a register to zero
To set a register to zero, the smallest, fastest way is to XOR it with itself, e.g.
XOR BX,BX ; 2 bytes, 3 clocks
is smaller and faster than:
MOV DX,0 ; 3 bytes, 4 clocks
There is one side effect: MOVing does not affect the flags, but XORing does. The 8088 aficionado will only move a zero into a register in the rare cases where the flags must be preserved.
3. Incrementing, Decrementing
It is smaller to increment or decrement a 16 bit register than an 8 bit register, so if it doesn't matter, use a 16 bit register, e.g.:
INC DX ; 1 byte, 3 clocks
is smaller than:
INC DL ; 2 bytes, 3 clocks
The same goes for decrementing: it's smaller to use a 16 bit register.
It's smaller (but not faster) to increment a register twice than to add 2 to it, e.g.
INC DX ; 1 byte, 3 clocks (total)
INC DX ; 2 bytes, 6 clocks
is smaller and slower than:
ADD DX,2 ; 3 bytes, 4 clocks
One side effect: INC and DEC do not affect the carry flag, but ADD does.
4. If Then Else
When confronted with an If, Then, Else problem the assembly language programmer will often write it as Else, If, Then. For example, a sample problem might be to return 123H in the DX register if AL<5, otherwise return zero in DX. Using If, Then, Else produces:
CMP AL,5 ; Is AL less than 5?
JB ALisBelow5 ; yes...
XOR DX,DX ; Set DX to zero if AL>=5
JMP SHORT Continue ; proceed
ALisBelow5:
MOV DX,123H ; Set DX to 123H if AL<5
Continue:
Using Else, If, Then:
MOV DX,123H ; Assume AL<5 and set DX to 123H
CMP AL,5 ; Is AL less than 5?
JB ALisBelow5 ; yes...
XOR DX,DX ; Set DX to zero if AL>=5
ALisBelow5:
The idea is to do the work for the most likely case, then do the comparison. If you were right, you’re all done; if not, do the other case.
5. Copying strings
One of the simplest ways to copy a null terminated string is as follows:
;
; DS:SI points at source
; ES:DI points at destination
;
CopyString:
LODSB ; read character into AL, increment SI
STOSB ; store character, increment DI
OR AL,AL ; was it the null?
JNZ CopyString ; no, repeat
A fast way to copy a string where the count is known is
SHR CX,1 ; divide count by two (2 bytes = 1 word)
REP MOVSW ; move the first part fast
JNC CountWasEven ; No carry if count was even
MOVSB ; move the odd byte
CountWasEven:
Slightly faster but slightly bigger is:
SHR CX,1 ; Get the word count
REP MOVSW ; move the words, leaving CX = 0
RCL CX,1 ; if count was odd make CX = 1
REP MOVSB ; (possibly) move last byte
This is slightly faster because there is no jump to empty the prefetch queue.
The REP prefix checks whether the loop count in CX is zero before starting, so there is no need to check it beforehand.
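To see why the jump-free version handles both even and odd counts, here is a rough Python model of the four instructions. It is a sketch only: the function name is invented, and Python byte strings stand in for the DS:SI source and ES:DI destination.

```python
def copy_counted(source, count):
    """Model of SHR CX,1 / REP MOVSW / RCL CX,1 / REP MOVSB."""
    dest = bytearray()
    si = 0
    carry = count & 1          # SHR CX,1 shifts the low bit into carry
    cx = count >> 1            # CX now holds the word count
    while cx != 0:             # REP MOVSW: copy two bytes per iteration
        dest += source[si:si + 2]
        si += 2
        cx -= 1
    cx = (cx << 1) | carry     # RCL CX,1: CX is 0, so CX becomes the carry
    while cx != 0:             # REP MOVSB: runs once only if count was odd
        dest.append(source[si])
        si += 1
        cx -= 1
    return bytes(dest)
```

A count of zero falls straight through both loops, which mirrors the point that REP tests CX before the first iteration.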
6. Testing long pointer for null
Sometimes it's necessary to check if a long pointer is null before using it. A sequence of code that works well is:
LES DI,[LongPointer] ; get the long pointer
MOV CX,ES ; copy ES to CX
JCXZ PointerIsNull ; if zero, don't use it
This assumes segment zero is an invalid pointer.
7. Exchanging
To move a register to or from AX takes 2 bytes and 2 clocks
MOV AX,DX ; 2 bytes, 2 clocks
However, to exchange a register with the AX register takes 1 byte and 3 clocks:
XCHG AX,DX ; 1 byte, 3 clocks
This is a clear case where it's possible to optimize for either size or speed. The one-byte form is only available with the AX register. The full picture is more complex: although XCHG takes 3 clocks, it is often faster in practice because it occupies only one byte of the prefetch queue.
8. Testing bits
The 8088 contains an instruction that allows you to test various bits. It works by doing a non destructive AND operation.
TEST AX,0800H ; 4 bytes, 5 clocks
JZ BitIsOff
If one of the bytes is zero (a common occurrence) this can be optimized to:
TEST AH,08H ; 3 bytes, 4 clocks
JZ BitIsOff
The best way to test the hi bit is:
OR AX,AX ; 2 bytes, 3 clocks
JNS HiBitIsOff ; jump not signed (hi bit zero)
A destructive way to test the low order bit is to shift it to the right into the carry flag:
SHR AX,1 ; 2 bytes, 2 clocks
JNC LoBitIsOff
You can test multiple bits with the TEST instruction:
TEST BL,11000000b ; check 2 highest bits
JZ BothAreZero ; if zero, both are zero
Sometimes you want to jump if either bit is zero instead of both. This can usually be accomplished with the NOT instruction and by reversing the sense of the jump (note that NOT modifies BL):
NOT BL ; reverse the bits
TEST BL,11000000b ; check 2 highest bits
JNZ EitherAreZero ; if either were zero, then this
; is non zero
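The difference between the two tests is easy to get backwards, so here is a small Python model of both sequences. The function names are hypothetical, and BL is modelled as a plain integer from 0 to 255.

```python
HIGH_TWO = 0b11000000

def both_are_zero(bl):
    """TEST BL,11000000b / JZ: taken when both bits are clear."""
    return (bl & HIGH_TWO) == 0

def either_is_zero(bl):
    """NOT BL / TEST BL,11000000b / JNZ: taken when at least one bit is clear."""
    inverted = ~bl & 0xFF               # NOT BL, kept to 8 bits
    return (inverted & HIGH_TWO) != 0
```

After the NOT, a set bit in the masked result means the original bit was clear, which is why the jump condition flips from JZ to JNZ.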
9. Absolute value
A fascinating way to get the absolute value of the AX register was discovered by Marlin Eller:
CWD ; replicate hi order bit of AX into DX
XOR AX,DX ; do a 1's complement or do nothing
SUB AX,DX ; add 1 to get a 2's complement
The boring method does not affect the DX register and can be used on any register:
OR BX,BX ; never CMP BX,0!
JGE NotNeg ; if negative...
NEG BX ; ...make it positive
NotNeg:
The boring method empties the prefetch queue with the JGE instruction, making it much slower.
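The CWD/XOR/SUB sequence can be followed step by step in a Python model of the two registers. This is a sketch for illustration, with a made-up function name; values are 16-bit patterns, so -5 is written as 0FFFBH.

```python
def abs16(ax):
    """Model of CWD / XOR AX,DX / SUB AX,DX on 16-bit registers."""
    ax &= 0xFFFF
    dx = 0xFFFF if ax & 0x8000 else 0x0000   # CWD: DX = 0 or -1 from the sign bit
    ax ^= dx                                  # XOR: invert all bits if negative
    ax = (ax - dx) & 0xFFFF                   # SUB: subtracting -1 adds 1
    return ax
```

When AX is non-negative, DX is zero and both the XOR and the SUB do nothing. One edge case is worth knowing: -32768 (8000H) has no positive counterpart in 16 bits and comes back unchanged.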
10. Length of null terminated string
To get the length of a null terminated string, scan for the null at the end with a starting count of -1
;
; ES:DI points at null terminated string
;
XOR AL,AL ; look for null 2 bytes (total)
MOV CX,-1 ; CX = -1 5 bytes
REPNE SCASB ; CX = -len-2 7 bytes
NOT CX ; CX = len+1 9 bytes
DEC CX ; CX = len 10 bytes
This count does not include the null at the end. If you want it to include the null, just delete the final DEC CX. The use of the NOT instruction is quite interesting here.
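The way NOT turns the leftover count into a length is easier to see in a model. The Python sketch below imitates REPNE SCASB over a byte string; the function name and the use of a Python `bytes` object for memory are assumptions made for the example.

```python
def scan_length(memory, di):
    """Model of XOR AL,AL / MOV CX,-1 / REPNE SCASB / NOT CX / DEC CX."""
    al = 0                       # XOR AL,AL: search for the null byte
    cx = 0xFFFF                  # MOV CX,-1
    while cx != 0:               # REPNE SCASB
        byte = memory[di]
        di += 1
        cx = (cx - 1) & 0xFFFF   # CX is decremented for every byte scanned
        if byte == al:           # stop once the null has been compared
            break
    cx = ~cx & 0xFFFF            # NOT CX: -len-2 becomes len+1
    cx = (cx - 1) & 0xFFFF       # DEC CX: drop the null from the count
    return cx
```

Since NOT x equals -x-1 in two's complement, a count of -len-2 becomes len+1 in a single instruction.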
11. Returning flags
The 8088 contains instructions for setting (STC) and clearing (CLC) the carry flag. To set the zero flag, simply compare some register with itself:
CMP DX,DX ; set zero flag
To clear the zero flag, OR the stack pointer with itself:
OR SP,SP ; clear zero flag
This is making the safe assumption that the stack pointer is not zero.
12. Shifting
Variable count shifting is slow on the 8088. It's faster to shift twice than to set a count of 2:
SHR AX,1 ; 2 bytes, 2 clocks (total)
SHR AX,1 ; 4 bytes, 4 clocks
is much faster than:
MOV CL,2 ; 2 bytes, 4 clocks (total)
SHR AX,CL ; 4 bytes, 20 clocks!
Variable count shifting is slower when shifting fewer than 5 bits; beyond that, the prefetch queue makes variable shift counts faster.
13. Multiply and Divide
The multiply and divide instructions are some of the slowest instructions on the 8088. To give some perspective, a register to register MOV instruction takes 2 clocks, while a signed divide (IDIV) using registers can take 184 clocks.
If your goal is to write fast 8088 code, multiplying by constants can usually be done as a series of shifts and adds:
;
; Multiply the AX register by 10
;
SHL AX,1 ; AX = AX * 2 ( 2 clocks)
MOV BX,AX ; BX = AX * 2 ( 4 clocks)
SHL AX,1 ; AX = AX * 4 ( 6 clocks)
SHL AX,1 ; AX = AX * 8 ( 8 clocks)
ADD AX,BX ; AX = AX * 10 (11 clocks)
A multiply would be more than 10 times slower, but would take fewer bytes. Multiply is useful when neither argument is constant or you need to save bytes.
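The shift-and-add sequence computes 8x + 2x. A Python model of the five instructions, with an invented function name, makes it easy to confirm that the result matches a real multiply, including the wraparound at 16 bits.

```python
def times_ten(ax):
    """Model of the SHL/MOV/SHL/SHL/ADD multiply-by-10 sequence."""
    ax = (ax << 1) & 0xFFFF    # SHL AX,1   AX = x * 2
    bx = ax                    # MOV BX,AX  BX = x * 2
    ax = (ax << 1) & 0xFFFF    # SHL AX,1   AX = x * 4
    ax = (ax << 1) & 0xFFFF    # SHL AX,1   AX = x * 8
    ax = (ax + bx) & 0xFFFF    # ADD AX,BX  AX = x * 8 + x * 2
    return ax
```

Note that the sequence destroys BX, which a MUL by a constant held in memory would not.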
Mark Zbikowski uses this method to divide the AX register by 512:
SHR AX,1 ; divide AX by 2
XCHG AL,AH ; divide AX by 512 (AX is unsigned)
CBW ; AL < 128, so this sets AH to 0
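The divide-by-512 trick relies on the byte swap acting as a shift by 8, and on CBW clearing AH because the sign bit of AL is known to be zero. A Python sketch of the register halves (the function name is hypothetical) shows each step:

```python
def divide_by_512(ax):
    """Model of SHR AX,1 / XCHG AL,AH / CBW for an unsigned AX."""
    ax = (ax & 0xFFFF) >> 1                 # SHR AX,1: bit 15 is now zero
    al, ah = ax & 0xFF, ax >> 8
    al, ah = ah, al                         # XCHG AL,AH: old high byte lands in AL
    ah = 0xFF if al & 0x80 else 0x00        # CBW: sign-extend AL into AH
    return (ah << 8) | al
```

Because the first shift guarantees the old high byte is below 128, CBW always produces AH = 0 here, which discards the low byte that the swap moved into AH.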
14. Converting bytes to segments
To convert a byte count to a paragraph count try:
;
; DX contains a byte count
;
ADD DX,15 ; round up to next paragraph
MOV CL,4 ; 2^4 = 16 bytes per paragraph
SHR DX,CL ; divide by 16 by shifting 4 times
DX now contains a paragraph count. This assumes the value in DX is less than 0FFF1H. To cover the extended case:
ADD DX,15 ; round up to next paragraph
RCR DX,1 ; divide by 2, including carry
MOV CL,3 ; 2^3 = 8
SHR DX,CL ; divide by a total of 16
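The two variants differ only in what happens to the carry out of the ADD. The following Python model, with made-up function names, shows the simple version wrapping for large byte counts while the RCR version keeps the seventeenth bit.

```python
def paragraphs_simple(dx):
    """Model of ADD DX,15 / MOV CL,4 / SHR DX,CL (carry is lost)."""
    dx = (dx + 15) & 0xFFFF
    return dx >> 4

def paragraphs_extended(dx):
    """Model of ADD DX,15 / RCR DX,1 / MOV CL,3 / SHR DX,CL."""
    total = (dx & 0xFFFF) + 15
    carry = total >> 16                  # carry flag out of the ADD
    dx = total & 0xFFFF
    dx = (carry << 15) | (dx >> 1)       # RCR DX,1: carry rotates into bit 15
    return dx >> 3                       # SHR DX,CL with CL = 3
```

For a byte count of 0FFF1H the simple version returns 0, while the extended version returns 1000H paragraphs. The extended version works because MOV CL,3 leaves the flags alone, but the RCR must still come straight after the ADD.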
15. Call, Return, Jump
A near call followed by a near return can always be replaced with a near jump:
JMP NearProc ; 3 bytes, 15 clocks
is smaller and much faster than:
CALL NearProc ; 3 bytes, 19 clocks (total)
RET ; 4 bytes, 35 clocks
It's often possible to eliminate the JMP entirely by moving the subroutines adjacent to each other.
Conditional jumps on the 8088 are always short, i.e. the destination must be within -128 to 127 of the instruction pointer. It seems every time a single line of new code is added some conditional jump becomes out of range. One technique to get around this is to find a similar conditional jump to jump to:
JC OutOfRange ; I want to jump to disk error...
... ; ...but it's too far away, so...
OutOfRange:
JC DiskError ; I jump to this test for carry
Although this is not in the scope of this document, out of range jumps are usually the 8088 telling you that your subroutines have grown too large and should be broken up.
16. Multiple Entry Points
An old 8080 trick involving multiple entry points can be adapted to the 8088. Instead of doing this:
Entry1:
MOV AL,1 ; 2 bytes (total)
JMP SHORT EntryCommon ; 4 bytes
Entry2:
MOV AL,2 ; 6 bytes
JMP SHORT EntryCommon ; 8 bytes
Entry3:
MOV AL,3 ;10 bytes
EntryCommon:
The hearty and brave will do this:
Entry1:
MOV AL,1 ; 2 bytes (total)
DB 03DH ; 3 bytes
Entry2:
MOV AL,2 ; 5 bytes
DB 03DH ; 6 bytes
Entry3:
MOV AL,3 ; 8 bytes
EntryCommon: ; flags are modified
The 03DH byte is the opcode for CMP AX,immediate, which takes a two-byte operand. In this case the bogus CMP AX’s are used to swallow up the two-byte MOV AL,x that follows each of them.
A special case of this discovered by Pat Tharp optimizes for dual entry points:
TrueEntry: ; come here to set AX = TRUE
DB 0B8H ; (opcode for MOV AX,...)
FalseEntry: ; come here to set AX = FALSE
XOR AX,AX ; AX = 0 = FALSE
What's happening here is that 0B8H is the opcode for MOV AX,immediate, so entering at TrueEntry loads the two bytes that encode XOR AX,AX (31H, C0H, read as the nonzero word C031H) into AX instead of executing them. This is a clear win of 3 bytes over the boring method:
TrueEntry:
MOV AL,1 ; set AX nonzero
JMP SHORT EntryCommon
FalseEntry:
XOR AX,AX ; set AX zero
EntryCommon:
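To make the overlapping encodings concrete, here is a toy Python decoder that understands only the two instructions involved. It is a sketch: it assumes XOR AX,AX is assembled as the bytes 31H, C0H, as the C031H figure above implies (an assembler that emits the alternative 33H, C0H encoding would load C033H instead, which is still nonzero), and the names are invented for the example.

```python
CODE = bytes([0xB8, 0x31, 0xC0])   # DB 0B8H followed by XOR AX,AX (31 C0)
TRUE_ENTRY = 0                      # offset of the DB 0B8H byte
FALSE_ENTRY = 1                     # offset of the XOR AX,AX bytes

def run_from(entry):
    """Decode CODE starting at the given offset and return the final AX."""
    ip = entry
    ax = None
    while ip < len(CODE):
        opcode = CODE[ip]
        if opcode == 0xB8:                                 # MOV AX,imm16
            ax = CODE[ip + 1] | (CODE[ip + 2] << 8)        # little-endian word
            ip += 3
        elif opcode == 0x31 and CODE[ip + 1] == 0xC0:      # XOR AX,AX
            ax = 0
            ip += 2
        else:
            raise ValueError("opcode not modelled: %02X" % opcode)
    return ax
```

Entering at offset 0 consumes all three bytes as a single MOV, so the XOR never executes; entering at offset 1 sees only the XOR.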
17. Assertion Macros
When using these advanced techniques it's important not to expose yourself to bugs caused by changing constants in your program. For instance, in the Multiply and Divide section of this document there is code to quickly multiply by ten. If the constant should later change from ten to twelve, this code would no longer work. An assertion macro would flag this code as being in error, saving many hours of needless debugging. It cannot be stressed too strongly that advanced 8088 programming requires liberal use of assertion macros and extra documentation.
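The idea carries over to any language. As a rough illustration in Python rather than in macro assembler, the sketch below ties a hard-wired shift-and-add routine to the constant it was written for; `MULTIPLIER` and `scale_by_multiplier` are hypothetical names, not part of the original list.

```python
MULTIPLIER = 10   # hypothetical program constant that might change later

def scale_by_multiplier(ax):
    """Multiply by MULTIPLIER using shifts and adds hard-wired for 10."""
    # The assertion records the assumption the optimized code depends on.
    assert MULTIPLIER == 10, "shift-and-add sequence only multiplies by 10"
    doubled = (ax << 1) & 0xFFFF               # x * 2
    return ((doubled << 2) + doubled) & 0xFFFF  # x * 8 + x * 2
```

If someone later edits the constant to 12, the assertion fails at the optimized routine instead of letting it silently return the wrong product.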