Assembly Tutorial

"This is for all you folks out there, who want to learn the magic art of Assembly programming." - MAD

Index of Section 1

Ready To Start
Memory Segmentation
Code Example
The Stack
The Naming Convention


The first thing you need to know is that Assembly is a great, fast language, but only if you put time and effort into learning it. You must give all or nothing (I suggest you give all). And remember, the beginning is always boring and hard... so don't give up!

Well, I'll start with the basics, like instruction format and some simple instructions to manipulate registers. I don't know how much you know about coding, so I'll explain even the simplest stuff. Please note that I explain 8086 assembly coding. That means NO 32-bit registers and instructions and NO protected, real and virtual 86 modes for now.

Ready to Start!

First of all, we'll talk about the registers and then about the instructions to manipulate (change) them. The 8086 has 14 16-bit registers, all with different uses (see below). You might not understand some of the registers' purposes yet, but be patient, I'll explain everything later.

Segment Registers

CS   Code Segment          16-bit number that points to the active code-segment
DS   Data Segment          16-bit number that points to the active data-segment
SS   Stack Segment         16-bit number that points to the active stack-segment
ES   Extra Segment         16-bit number that points to the active extra-segment

Pointer Registers

IP   Instruction Pointer   16-bit offset of the next instruction (within CS)
SP   Stack Pointer         16-bit offset of the top of the stack (within SS)
BP   Base Pointer          used to address data on the stack

General-Purpose Registers

AX   Accumulator Register  mostly used for calculations and for input/output
BX   Base Register         the only general-purpose register that can be used as a pointer to memory
CX   Count Register        used as the counter by the loop instruction
DX   Data Register         input/output and used by multiply and divide

Index Registers

SI   Source Index          used by string operations as source
DI   Destination Index     used by string operations as destination

(The general-purpose registers can be "split". You have the AH and the AL register, for example. AH contains the high byte of AX and AL contains the low byte. You also have BH, BL, CH, CL, DH and DL. So if, e.g., DX contains the value 1234h, DH would be 12h and DL would be 34h.)
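To see the "split" in action, here is a minimal sketch in the same TASM/MASM style used later in this section. The values 56h and 0ABh are just picked for illustration, and the lines around the three MOVs (.model, .stack, the exit code) are explained with the first program further on.

```asm
.model small
.stack
.code
main   proc
   mov   dx,1234h      ; DX = 1234h, so DH = 12h and DL = 34h
   mov   dl,56h        ; only the low byte changes: DX = 1256h
   mov   dh,0abh       ; only the high byte changes: DX = 0AB56h

   mov   ax,4c00h      ; exit to DOS
   int   21h
main   endp
end main
```

Note the leading zero in 0abh: a hexadecimal number that starts with a letter needs it, otherwise the assembler takes it for a name.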

Finally, there is a 16-bit FLAGS register. All "flags" (see below) are stored here. Only 9 of its 16 bits are used. These bits are called flags, because they can either be SET (1) or NOT SET (0). All these flags have a name and purpose.

Flags Register

Abbr.  Name              Bit  Description
OF     Overflow Flag     11   set when a signed calculation overflows
DF     Direction Flag    10   sets the direction of string operations (0 = upwards, 1 = downwards)
IF     Interrupt Flag     9   if set, interrupts are enabled, else disabled
TF     Trap Flag          8   if set, the CPU works in single-step mode
SF     Sign Flag          7   if set, the result of the calculation is negative
ZF     Zero Flag          6   if set, the result of the calculation is zero
AF     Auxiliary Carry    4   carry out of the low 4 bits, a sort of second carry flag
PF     Parity Flag        2   set if the low byte of the result has an even number of 1 bits
CF     Carry Flag         0   set when a calculation carries out of (or borrows into) the left-most bit
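A small sketch of how a calculation changes some of these flags. It uses the ADD instruction, which this section does not cover (it simply adds the second operand to the first), and the numbers are chosen only to trigger the flags. You could step through it in debug and watch the flag abbreviations change.

```asm
.model small
.stack
.code
main   proc
   mov   al,0ffh       ; AL = 255 (MOV does not change any flags)
   add   al,1          ; AL = 00h: ZF=1 (result is zero), CF=1 (carry out of bit 7)

   mov   al,7fh        ; AL = +127, the largest signed byte
   add   al,1          ; AL = 80h: SF=1 (negative), OF=1 (signed overflow), ZF=0, CF=0

   mov   ax,4c00h      ; exit to DOS
   int   21h
main   endp
end main
```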

Test it!
If you want to see all these registers and flags, you can go to DOS and then start "debug" (just type debug). When you're in debug, just type "r" and you'll see all the registers and some abbreviations for the flags. Type "q" to quit again. We won't use debug to program in this tutorial, we'll use a real assembler. I use TASM 3.2, but MASM or any other assembler works just fine too.


Memory Segmentation

Now I have to explain something about the way the 8086 uses memory. Since the data bus of the 8086 is 16 bits wide, it can move and store 16 bits (1 word = 2 bytes) at a time. If the processor stores a "word" (16 bits), it stores the bytes in reverse order in memory: the low byte first, then the high byte. It looks like this:
1234h (word) ---> memory: 34h (byte) 12h (byte)

So if the memory looks like this: 78h 56h and you get a word from memory you'll get the value 5678h. (note, I use the "h" after a number to indicate it's hexadecimal) However, if you just get a byte from memory it goes this way: memory 78h 56h -----> first byte you get 78h. Okay, pretty clear huh?
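Here is a minimal sketch of the byte order at work. The name "value" is made up for this example, and the directives and the DS setup are explained with the first program below.

```asm
.model small
.stack
.data
value   dw 1234h                    ; stored in memory as 34h 12h

.code
main   proc
   mov   ax,seg value
   mov   ds,ax

   mov   ax,value                   ; get a word: AX = 1234h
   mov   bl,byte ptr [value]        ; get the first byte: BL = 34h
   mov   bh,byte ptr [value+1]      ; get the second byte: BH = 12h

   mov   ax,4c00h                   ; exit to DOS
   int   21h
main   endp
end main
```

"byte ptr" tells the assembler to fetch a single byte, even though "value" was defined as a word.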

Now let's talk about segments. The 8086 divides its memory into segments. A segment is 64 KB big (the most a 16-bit offset can reach) and has a number. These numbers are stored in the segment registers (see above). The three main segments are the code, data and stack segment. Segments overlap each other almost completely. If you start debug again and type "d", you can see some addresses at the left of the screen. The format is like this: 4576:0100. That's a memory address. The first number is the segment number and the second number is the offset within the segment. So FFFF:FFF0 means: segment FFFFh and FFF0h bytes from the beginning of the segment.

As I said before, segments overlap. The address 0000:0010 is EXACTLY the same address as 0001:0000. That means that segments begin at paragraph boundaries (a paragraph = 16 bytes, so a segment starts at an address divisible by 16). Now you can start calculating REAL addresses in memory. An example: 0000:0010 means segment 0000h, offset 10h. We multiply the segment number by 16 and add the offset.
Note that the offset 10h is the value 16 in decimal: 0 * 16 + 16 = 16. This is the linear address.

Next, the other address, 0001:0000: 1 * 16 + 0 = 16. Same linear address! Like I told you. These are some basic things you need to know when you want to program in Assembly. Learn the registers and flags by heart and try to understand the segmentation of memory.
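The same calculation can be sketched in Assembly itself. This is only an illustration: it uses MUL and ADC, two instructions not covered in this section, and the choice of registers is arbitrary. A linear address can be up to 20 bits, so the result needs two registers, DX for the high word and AX for the low word.

```asm
.model small
.stack
.code
main   proc
   mov   ax,0001h      ; segment number (the 0001 in 0001:0000)
   mov   bx,0000h      ; offset within that segment
   mov   cx,16
   mul   cx            ; DX:AX = AX * 16
   add   ax,bx         ; add the offset to the low word
   adc   dx,0          ; pass a possible carry on to the high word
                       ; now DX = 0000h and AX = 0010h (16 decimal)

   mov   ax,4c00h      ; exit to DOS
   int   21h
main   endp
end main
```

Loading AX with 0000h and BX with 0010h instead gives the same result, just like in the text.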

By the way, this segmentation of memory is done by the processor itself, not by DOS. On a 286 or higher, you have something called real mode and protected mode. This segment explanation is based on real mode. In protected mode it's way different, but don't bother, that's real complicated stuff you don't need to know yet. Just assume that what I explained about segments is ALWAYS true. But remember in the back of your head that there's more... Trust me... I know what I'm talking about.


Our first program

Our first program will be a real simple one. I'll first give you the code and then I'll explain it. Here's the code, copy it and put it in a file called FIRST.ASM.


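.model small
.stack
.data
message   db "Hello world, I'm learning Assembly !!!", "$"   ; "$" marks the end of the text

.code

main   proc
   mov   ax,seg message      ; AX = segment number of message
   mov   ds,ax               ; DS = data segment

   mov   ah,09               ; DOS function 09h: print string
   lea   dx,message          ; DS:DX = address of message
   int   21h                 ; call DOS

   mov   ax,4c00h            ; DOS function 4Ch: exit, exit-code 00h
   int   21h
main   endp
end main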

You can assemble this by typing: "tasm first [enter] tlink first [enter]" or something like: "masm first [enter] link first [enter]". You must have an assembler and the link/tlink program. I'll explain the code now.

.model small : Lines that start with a "." are used to provide the assembler with information. The word(s) behind it say what kind of info. In this case it tells the assembler which memory model to use: "small" means the program has one code segment and one data segment and doesn't need a lot of memory. I'll get back on this later.
.stack : Another line with info. This one tells the assembler that the "stack" segment starts here. The stack is used to store temporary data. Our program doesn't use it directly (the INT instruction does, behind the scenes), but it must be there, because we make an .EXE file and these files MUST have a stack.
.data : indicates that the data segment starts here and that the stack segment ends there.
.code : indicates that the code segment starts there and the data segment ends there.

main proc : Code is normally put in procedures, just like in C or any other language. This indicates that a procedure called main starts here. main endp states that the procedure is finished. A procedure MUST have a start and an end.
end main : tells the assembler that the program is finished. It also tells the assembler where to start the program: at the procedure called main in this case.

message db "xxxx" : DB means Define Byte and so it does. In the data-segment it defines a couple of bytes. These bytes contain the text between the quotes. The "$" at the end marks the end of the text for the DOS routine we call later. "message" is a name to identify this byte-string. It's called an "identifier".
mov ax, seg message : AX is a register. You use registers all the time, so that's why you had to know about them before I could explain this. MOV is an instruction that moves data. It takes two "operands" (don't worry, I'll explain these names later). Here the operands are AX and seg message. Seg message can be seen as a number. It's the number of the segment "message" is in (the data-segment). We have to know this number, so we can load the DS register with it. Else we can't get to the byte-string in memory. We need to know WHERE the byte-string is located in memory. The number is loaded in the AX register. MOV always moves data to the operand left of the comma and from the operand right of the comma.

mov ds,ax : The MOV instruction again. Here it moves the number in the AX register (the number of the data segment) into the DS register. We have to load the DS register this way (with two instructions). Just typing "mov ds,seg message" isn't possible, because the 8086 can't load a constant directly into a segment register.

mov ah, 09 : MOV again. This time it loads the AH register with the constant value nine.

lea dx, message : LEA means Load Effective Address. This instruction stores the offset within the data-segment of the byte-string message into the DX register. This offset is the second thing we need to know when we want to know where "message" is in memory. So now we have DS:DX. See the segment explanation above.

int 21h : This instruction causes an Interrupt. The processor calls a routine somewhere in memory. 21h tells the processor what kind of routine, in this case a DOS routine. INTs are very important and I'll explain more of them later, since they're also very, very complex. However, for now assume that it just calls a procedure from DOS. The procedure looks at the AH register to find out what it has to do. In this example the value 9 in the AH register indicates that the procedure should write a byte-string to the screen.

mov ax, 4c00h : Load the AX register with the constant value 4c00h.
int 21h : The same INT again. But this time the AH register contains the value 4ch (AX=4c00h) and to the DOS procedure that means "exit program". The value of AL is used as an "exit-code". 00h means "No error".

That's it!!! You now fully understand this program (I hope)


Go to DOS and type "debug first.exe". The debug screen will appear. When you are in the debugger, type "d". You see some addresses and our program.
Now type "u" and you'll see a list that looks like this:

0F77:0000  B8790F          MOV AX,0F79
0F77:0003  8ED8            MOV DS,AX
0F77:0005  B409            MOV AH,09

First, 0F77:0000 is the segment number and offset. B8790F is the machine code of the mov ax,0f79 instruction. B8 means "mov ax," and 790F is the number (in reversed order). Note that the instruction was mov ax,seg message and the assembler made it mov ax,0f79 (the number might be different on your computer). So that means our data is stored in the segment with the number 0F79.

The other instruction, lea dx,message, turned into mov dx,0. So that means that the offset of the byte-string is 0 --> 0F79:0000. Let's look at that address. Type "d 0f79:0000" and YES, our data is there! Look at the right of the screen and you can see the message.
Now let's calculate another address for the same data. Subtract 2 from the segment number 0F79 and you get 0F77 (the code segment). Each step in the segment number is 16 bytes, so a difference of 2 segments (0002:0000 --> 2*16+0=32) means the data is 32 bytes further than the start of segment 0F77. That means an offset of 32 (20h) within segment 0F77.

So the data is also at this location: 0F77:0020. Check by typing "d 0f77:0020". Please note that it's the SAME data. We can see it at multiple addresses only because the segments overlap! But in the program we said the data had to be in a data-segment. Remember the .data directive? Well, it IS in a data-segment. The data is just stored directly behind the code, but that doesn't matter. I mean, we can address the data with a segment number and an offset of zero.

Also note that the data doesn't start immediately after the int 21h instruction that ends the program; first there are some undefined bytes (probably zero). That's because segments start at paragraph boundaries. The data-segment couldn't start at 0F77:0010, because there is still code there. If there hadn't been any code there, the data-segment would have been 0F78. So the data-segment has to be 0F79 (closest match), and some bytes after the code and before the data just take up space. But that doesn't matter. Please remember that the assembler doesn't care in which order the segments appear in the .ASM file. In this example we declared the data-segment first, but it ends up after the code in memory.
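The debug listing shows that the assembler turned lea dx,message into a plain mov with the offset as a number. You can write that form yourself with the OFFSET operator, which this tutorial has not used so far. A sketch of FIRST.ASM with only that one line changed (the shorter message text is just to keep the listing small):

```asm
.model small
.stack
.data
message   db "Hello again !!!", "$"

.code
main   proc
   mov   ax,seg message
   mov   ds,ax

   mov   ah,09
   mov   dx,offset message   ; loads the same value as lea dx,message
   int   21h

   mov   ax,4c00h
   int   21h
main   endp
end main
```

"seg message" and "offset message" together give the two halves of the address, DS:DX.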


The Stack

The stack is a place where data is temporarily stored. The SS and SP registers point to that place like this: SS:SP So the SS register is the segment and the SP register contains the offset. There are a few instructions that make use of the stack. POP and PUSH are the most basic ones. PUSH can "push" a value on the stack and POP can retrieve that value from the stack. It works like this:


 MOV   AX,1234H
 PUSH  AX
 MOV   AH,09
 INT   21H
 POP   AX

The final value of AX will be 1234h. First we load 1234h into AX, then we push that value to the stack. We now store 9 in AH, so AX will be 0934h, and execute an INT. Then we pop the AX register: we retrieve the pushed value from the stack. So AX contains 1234h again. Another example:


 MOV   AX, 1234H
 MOV   BX, 5678H
 PUSH  AX
 POP   BX

The final values will be: AX=1234h BX=1234h. We pushed AX to the stack and popped that value into BX.

As in the first program, you have to define a stack segment. It is easily done with the directive .stack, which will create a stack of 1024 bytes. Yes, there's more about the stack than just this. The stack uses a LIFO system (Last In First Out). Another example:

 MOV   AX,1234H
 MOV   BX,5678H
 PUSH  AX
 PUSH  BX
 POP   AX
 POP   BX

The values: AX=5678h BX=1234h. First the value 1234h was pushed, and after that the value 5678h was pushed to the stack. According to LIFO, 5678h comes off first, so AX will pop that value and BX will pop the next.

How does the stack look in memory? Well, it "grows" downwards in memory. When you push a word (2 bytes), for example, SP is first decreased by 2 and then the word is stored at SS:SP. So in the beginning SP points to the top of the stack and (if you don't pay attention) the stack can grow so far downwards in memory that it overwrites your code or data. A major system crash is the result.
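A minimal sketch that makes the downward growth visible. It uses BP to look at the stack (see the register table) and the SUB instruction, which this section does not cover (it subtracts the second operand from the first). The register choices are arbitrary.

```asm
.model small
.stack
.code
main   proc
   mov   ax,1234h
   mov   bx,sp         ; BX = SP before the push
   push  ax            ; SP = SP - 2, then 1234h is stored at SS:SP
   mov   bp,sp         ; BP = SP after the push
   mov   cx,[bp]       ; [bp] uses SS by default, so CX = 1234h
   sub   bx,bp         ; BX = 2: the stack grew 2 bytes downwards
   pop   ax            ; AX = 1234h and SP is back at its old value

   mov   ax,4c00h      ; exit to DOS
   int   21h
main   endp
end main
```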


Names

There are some names you need to know. Well, you don't HAVE to know them, but it's handy if you do. I'll use these names from now on, so better learn them.

Identifiers : An identifier is a name you apply to items in your program. The two types of identifiers are "name", which refers to the address of a data item, and "label", which refers to the address of an instruction. The same rules apply to names and labels.

Statements : A program is made of a set of statements. There are two types of statements: "instructions" such as MOV and LEA, and "directives", which tell the assembler to perform a specific action, like ".model small".

Here's the general format of a statement:
identifier - operation - operand(s) - comment

The identifier is the name as explained above.
The operation is an instruction like MOV.
The operands provide information for the Operation to act on. Like MOV (operation) AX,BX (operands).
The comment is a line of text you can add as a comment, everything the assembler sees after a ";" is ignored.

So a complete instruction looks like this:

MOVINSTRUCTION:      MOV   AX,BX           ;this is a MOV instruction

The label and the comment are optional. In fact I already explained directives, but, okay, I'll do it again. Directives provide the assembler with information on how to assemble a .ASM file. .MODEL SMALL and .CODE are, for example, directives.
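To tie the names together, here is a small sketch of a complete program with each kind of statement pointed out in the comments. The name "total" and the label "copy" are made up for this example.

```asm
.model small                   ; directive
.stack                         ; directive
.data
total   dw 5                   ; name "total" + directive DW + operand 5

.code
main   proc                    ; "main" is an identifier too
   mov   ax,seg total          ; instruction + two operands + comment
   mov   ds,ax
copy:  mov   bx,total          ; label "copy" + instruction + operands (BX = 5)

   mov   ax,4c00h
   int   21h
main   endp
end main
```

The label "copy" isn't used by anything here; it only shows where an identifier goes in a statement.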


And so we have come to the end of Section 1 of this tutorial. If you fully understand this stuff (registers, flags, segments, stack, names, etc.) you may, from now on, call yourself a "Level 0 Assembly Coder". Congratulations!

In Part 2 I'll explain some more instructions and I'll explain how to address data yourself.
(MOV BYTE PTR ES:[DI],AL)
I'll also explain the Interrupts and interrupt table.

Ferdi Smit

