The first thing you need to know is that Assembly is a great, fast language, but only if you put time and effort into learning it. You must give all or nothing. (I suggest you give all.) And remember, the beginning is always boring and hard... so don't give up!
Well, I'll start with the basics, like the instruction format and some simple instructions to manipulate registers. I don't know how much you know about coding, so I'll explain even the simplest stuff. Please note that I explain 8086 assembly coding. That means NO 32-bit registers or instructions, and NO talk about the real, protected and virtual-86 modes of later processors for now.
First of all, we'll talk about the registers and then about the instructions to manipulate (change) them. The 8086 has 14 16-bit registers, each with its own purpose (see below). You might not understand the purpose of some of the registers yet, but be patient, I'll explain everything later.
(The general purpose registers can be "split". For example, you have the AH and the AL register: AH contains the high byte of AX and AL contains the low byte. You also have BH, BL, CH, CL, DH and DL. So if, e.g., DX contains the value 1234h, DH would be 12h and DL would be 34h.)
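To make the splitting concrete, here is a small Python model. It's only an illustration (Python has nothing to do with the assembler), and the names split_register and join_register are made up for this sketch. It shows that xH is simply the top 8 bits and xL the bottom 8 bits of the same 16-bit register, so changing AH changes AX as well.

```python
def split_register(value: int) -> tuple[int, int]:
    """Return (high byte, low byte) of a 16-bit register, e.g. DX -> (DH, DL)."""
    value &= 0xFFFF                   # registers are only 16 bits wide
    return value >> 8, value & 0xFF


def join_register(high: int, low: int) -> int:
    """Build the 16-bit register from its two halves, e.g. (AH, AL) -> AX."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


dh, dl = split_register(0x1234)
print(f"DX=1234h -> DH={dh:02X}h DL={dl:02X}h")   # DX=1234h -> DH=12h DL=34h

# AH and AL are not separate storage: putting 09h in AH changes AX too.
_, al = split_register(0x1234)
print(f"AX after AH=09h: {join_register(0x09, al):04X}h")   # AX after AH=09h: 0934h
```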
And there is a 16-bit FLAGS register, where all "flags" are stored. Nine of its bits are used as status bits. These bits are called flags because they can either be SET (1) or NOT SET (0). Every flag has a name and a purpose.
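As a sketch of how the individual flags live inside the FLAGS register, the Python snippet below tests single bits. The bit positions are the standard 8086 layout (CF=0, PF=2, AF=4, ZF=6, SF=7, TF=8, IF=9, DF=10, OF=11); they aren't listed in this part of the tutorial, so treat the table as added background. decode_flags is a hypothetical helper name.

```python
# Bit positions of the nine 8086 flags inside the 16-bit FLAGS register.
FLAG_BITS = {
    "CF": 0,   # carry
    "PF": 2,   # parity
    "AF": 4,   # auxiliary carry
    "ZF": 6,   # zero
    "SF": 7,   # sign
    "TF": 8,   # trap (single step)
    "IF": 9,   # interrupt enable
    "DF": 10,  # direction
    "OF": 11,  # overflow
}


def decode_flags(flags: int) -> dict[str, int]:
    """Return each flag as 1 (SET) or 0 (NOT SET) for a given FLAGS value."""
    return {name: (flags >> bit) & 1 for name, bit in FLAG_BITS.items()}


# Example value with only ZF (bit 6) and IF (bit 9) set: 0040h | 0200h = 0240h.
print(decode_flags(0x0240))
```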
Test it!
If you want to see all these registers and flags, you can go to DOS and then
start "debug" (just type debug). When you're in debug, just type "r" and
you'll see all the registers and some abbreviations for the flags. Type "q"
to quit again. We won't use debug to program in this tutorial, we'll use
a real assembler. I use TASM 3.2, but MASM or any other assembler works just
fine too.
The 8086 stores a word (two bytes) in memory with the low byte first. So if the memory looks like this: 78h 56h and you get a word from memory, you'll get the value 5678h. (Note: I use the "h" after a number to indicate it's hexadecimal.) However, if you just get a byte from memory, you get the first one: memory 78h 56h -----> the byte you get is 78h. Okay, pretty clear, huh?
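The same byte order, modelled in Python (just a sketch, not something you'd write in assembly): int.from_bytes with "little" treats the first byte as the low byte, exactly like the 8086 does when it reads a word.

```python
# Two bytes as they sit in memory, lowest address first: 78h 56h.
memory = bytes([0x78, 0x56])

# Reading a WORD: the first byte is the low byte, the second byte the high byte.
word = int.from_bytes(memory[0:2], "little")

# Reading a single BYTE just gives the first one.
first_byte = memory[0]

print(f"word = {word:04X}h")        # word = 5678h
print(f"byte = {first_byte:02X}h")  # byte = 78h

# Storing a word works the other way round: 5678h is written as 78h 56h.
print((0x5678).to_bytes(2, "little") == memory)   # True
```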
Now let's talk about segments. The 8086 divides its memory into segments. Each segment is 64 KB big (an offset is a 16-bit number, so it can't go beyond FFFFh) and has a number. These numbers are stored in the segment registers. The three main segments are the code, data and stack segment. Segments overlap each other almost completely. If you start debug again and type "d", you can see some addresses at the left of the screen. The format is like this: 4576:0100. That's a memory address. The first number is the segment number and the second number is the offset within the segment. So FFFF:FFF0 means: segment FFFFh, FFF0h bytes from the beginning of the segment.
As I said before, segments overlap. The address 0000:0010 is EXACTLY the same
address as 0001:0000. That means that segments begin at paragraph boundaries.
(A paragraph is 16 bytes, so a segment always starts at an address divisible by 16.)
Now you can start calculating REAL addresses in memory. An example:
0000:0010 means: segment 0000h, offset 10h.
Now we multiply the segment number by 16 and add the offset.
Note that the offset 10h means the value 16 in decimal:
0000h * 16 + 10h = 0 + 16 = 16
Next, the other address, 0001:0000:
0001h * 16 + 0000h = 16 + 0 = 16
Both give 16, so both addresses point at the same byte in memory.
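The calculation above fits in a tiny Python function. This is a sketch: physical_address and aliases are hypothetical names, and the & 0xFFFFF mask reflects the fact that the 8086 has only 20 address lines, so addresses wrap around at 1 MB. aliases lists the segment:offset pairs that reach a given byte, which shows how the segments overlap.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, cut to the 8086's 20 address bits."""
    return (segment * 16 + offset) & 0xFFFFF


def aliases(physical: int) -> list[tuple[int, int]]:
    """Segment:offset pairs (offset at most FFFFh) for this byte, ignoring the 1 MB wrap."""
    pairs = []
    for segment in range(physical // 16 + 1):
        offset = physical - segment * 16
        if offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


print(physical_address(0x0000, 0x0010))   # 16
print(physical_address(0x0001, 0x0000))   # 16
for segment, offset in aliases(16):
    print(f"{segment:04X}:{offset:04X}")  # 0000:0010, then 0001:0000
```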
By the way, this segmentation of memory is done by the processor itself, not by DOS. On a 286 or higher, you have something called real mode and protected mode. This segment explanation is based on real mode; in protected mode it's way different, but don't bother, that's really complicated stuff you don't need to know yet. For now, just assume that what I explained about segments is ALWAYS true. But remember in the back of your head that there's more... Trust me, I know what I'm talking about.
    .model small
    .stack
    .data
    message db "Hello world, I'm learning Assembly !!!", "$"
    .code
    main proc
        mov ax,seg message
        mov ds,ax
        mov ah,09
        lea dx,message
        int 21h
        mov ax,4c00h
        int 21h
    main endp
    end main
If you save this as FIRST.ASM, you can assemble and link it by typing "tasm first [enter]" and "tlink first [enter]", or, with MASM, something like "masm first [enter]" and "link first [enter]". You must have an assembler and the link/tlink program. I'll explain the code now.
.model small : Lines that start with a "." are used to provide the assembler
with information. The word(s) after it say what kind of info. In this case it
just tells the assembler the program is small and doesn't need a lot of
memory. I'll get back to this later.
.stack : Another line with info. This one tells the assembler that the "stack"
segment starts here. The stack is used to store temporary data. Our program
doesn't use it directly, but it must be there, because we make an .EXE file and
these files MUST have a stack. (The INT instruction, for one, uses it behind the scenes.)
.data : indicates that the data segment starts here and that the stack segment
ends here.
.code : indicates that the code segment starts here and that the data segment
ends here.
main proc : Code is normally organized in procedures, much like functions in C or any other language. This line indicates that a procedure called main starts here. main endp states that the procedure is finished. Every procedure MUST have a start (PROC) and an end (ENDP). end main : tells the assembler that the program is finished. It also tells the assembler where to start the program: at the procedure called main, in this case.
message db "xxxx" : DB means Define Byte, and that's what it does. In the data segment
it defines a couple of bytes. These bytes contain the characters between
the quotes. The "$" at the end marks where the string stops (DOS uses it to find
the end of the string). "message" is a name to identify this byte string. It's
called an "identifier".
mov ax, seg message : AX is a register. You use registers all the time, so
that's why you had to know about them before I could explain this.
MOV is an instruction that moves data. It can have a few "operands" (don't
worry, I'll explain these names later). Here the operands are AX and seg
message. Seg message can be seen as a number: it's the number of the
segment "message" is in (the data segment). We have to know this number so
we can load the DS register with it. Otherwise we can't get to the string in
memory. We need to know WHERE the string is located in memory.
The number is loaded into the AX register. MOV always moves data from the operand
right of the comma to the operand left of the comma.
mov ds,ax : The MOV instruction again. Here it moves the number in the AX register (the number of the data segment) into the DS register. We have to load the DS register this way (with two instructions), because the 8086 can't move a constant straight into a segment register. Just typing "mov ds,seg message" isn't possible.
mov ah, 09 : MOV again. This time it loads the AH register with the constant value nine.
lea dx, message : LEA means Load Effective Address. This instruction stores the offset of the string message within the data segment in the DX register. This offset is the second thing we need to know when we want to know where "message" is in memory. So now we have DS:DX (segment:offset). See the segment explanation above.
int 21h : This instruction causes an interrupt. The processor calls a routine somewhere in memory; 21h is the interrupt number, and the routine behind number 21h is a DOS routine. INTs are very important, and since they're also very, very complex, I'll explain more about them later. For now, just assume that it calls a procedure from DOS. The procedure looks at the AH register to find out what it has to do. In this example, the value 9 in the AH register tells the procedure to write the string at DS:DX to the screen, up to (but not including) the "$".
mov ax, 4c00h : Load the AX register with the constant value 4C00h.
int 21h : The same INT again. But this time the AH register contains the value
4Ch (AX=4C00h), and to the DOS procedure that means "exit program". The value
of AL is used as an "exit code": 00h means "no error".
That's it!!! You now fully understand this program (I hope)
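To see why the "$" at the end of message matters, here is a Python model of the INT 21h call with AH = 09h described above. It's not real DOS code, just a sketch of the behaviour: output the bytes starting at DS:DX until a "$" (24h) is found. dos_print_string and data_segment are made-up names, and the data segment is assumed to contain only our message, starting at offset 0.

```python
def dos_print_string(memory: bytes, offset: int) -> str:
    """Text that function 09h would print, starting at the given offset."""
    end = memory.index(b"$", offset)    # stop at the first "$"; it is not printed
    return memory[offset:end].decode("ascii")


# Hypothetical data segment holding only the message from the example program.
data_segment = b"Hello world, I'm learning Assembly !!!$"
print(dos_print_string(data_segment, 0))
```

If you forget the "$", real DOS keeps printing whatever bytes happen to follow in memory; in this model, bytes.index raises a ValueError instead.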
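When you load the program into debug (type "debug first.exe") and then type "u" (unassemble), the first lines look something like this:
    0F77:0000 B8790F        MOV     AX,0F79
    0F77:0003 8ED8          MOV     DS,AX
    0F77:0005 B409          MOV     AH,09
First, 0F77:0000 is the segment number and offset. B8790F is the machine code of the mov ax,0f79 instruction. B8 means "mov ax," and 790F is the number (in reversed byte order, so the value is 0F79h). Note that the instruction was:
    mov ax,seg message
so "seg message" has been replaced by the real segment number, 0F79h. (This number depends on where DOS loaded the program, so on your computer it will probably be different.)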
The other instructions are encoded the same way. For example, in B409, B4 means "mov ah," and 09 is the number.
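Decoding these bytes by hand is a nice way to check the reversed byte order. The Python sketch below only knows two opcodes from the listing, B8h (MOV AX with a 16-bit number) and B4h (MOV AH with an 8-bit number); MOV_IMMEDIATE and decode_mov_immediate are hypothetical names, and a real disassembler like debug's "u" handles far more.

```python
# Opcode -> (destination register, size of the number that follows, in bytes).
MOV_IMMEDIATE = {
    0xB8: ("AX", 2),   # MOV AX, 16-bit number
    0xB4: ("AH", 1),   # MOV AH, 8-bit number
}


def decode_mov_immediate(code: bytes) -> str:
    """Decode one MOV-register-with-number instruction (only the opcodes above)."""
    register, size = MOV_IMMEDIATE[code[0]]               # KeyError for other opcodes
    number = int.from_bytes(code[1:1 + size], "little")  # low byte is stored first
    return f"MOV {register},{number:0{size * 2}X}"


print(decode_mov_immediate(bytes.fromhex("B8790F")))   # MOV AX,0F79
print(decode_mov_immediate(bytes.fromhex("B409")))     # MOV AH,09
```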
Now let's calculate another address for the data, which starts at 0F79:0000. Subtract 2 from the segment
number. That gives you 0F77 (the code segment). A difference of two segments is 0002:0000 -->
2*16+0=32. Two segments further means 32 bytes further, and that means an
offset of 32 (20h).
So at this location the data is: 0F77:0020. Check by typing "d 0f77:0020". Please note that it's the SAME data. We can see it at multiple addresses only because the segments overlap! But in the program we said the data had to be in a data segment. Remember the .data directive? Well, it IS in a data segment; the data is just stored directly behind the code, but that doesn't matter. We can still address the data with its own segment number and an offset of zero.
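The same arithmetic in Python, as a sketch with made-up helper names (physical_address, offset_from). It computes the physical address of 0F79:0000 and then asks which offset reaches that byte from segment 0F77. The segment numbers are the ones from the listing; on your computer DOS will probably load the program somewhere else.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset."""
    return segment * 16 + offset


def offset_from(segment: int, physical: int) -> int:
    """Offset that reaches the given physical address from the given segment."""
    offset = physical - segment * 16
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("that address is outside this segment")
    return offset


data = physical_address(0x0F79, 0x0000)    # start of message: F790h
offset = offset_from(0x0F77, data)          # same byte, seen from the code segment
print(f"0F79:0000 = 0F77:{offset:04X}")     # 0F79:0000 = 0F77:0020
```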
Also note that after the int 21h instruction that ends the program, the data doesn't start immediately: first there are some unused bytes (probably zero). That's because segments start at paragraph boundaries. The data segment couldn't start at 0F78:0000 (which is 0F77:0010), because there is code there; if there wasn't any code there, the data segment would have been 0F78. So the data segment has to be 0F79 (the first paragraph boundary after the code), and some bytes after the code and before the data just take up space. But that doesn't matter. Please remember that the assembler doesn't care about the order of the segments in the .ASM file. In this example we declared the data segment first, but the assembler puts it last in memory.
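Here is the paragraph rounding as a Python sketch. The code size of 12h (18) bytes is an assumption based on the usual encodings of the seven instructions (3+2+2+4+2+3+2 bytes); check the end of the code in your own debug listing. Any code size from 11h to 20h bytes gives the same data segment. next_paragraph is a hypothetical name.

```python
def next_paragraph(offset: int) -> int:
    """Round an offset up to the next multiple of 16 (a paragraph boundary)."""
    return (offset + 15) // 16 * 16


code_segment = 0x0F77   # from the debug listing
code_size = 0x12        # assumption: 18 bytes of code (check your own listing)

data_offset = next_paragraph(code_size)              # 20h
data_segment = code_segment + data_offset // 16      # one segment number per 16 bytes
padding = data_offset - code_size                    # unused bytes between code and data
print(f"data segment = {data_segment:04X}, {padding} unused bytes")
# data segment = 0F79, 14 unused bytes
```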
Suppose we load 1234h into AX and push that value onto the stack. We then store 9 in AH, so AX becomes 0934h, and execute an INT. Then we pop the AX register: we retrieve the pushed value from the stack. So the final value of AX will be 1234h again. Another example:
    MOV AX,1234H
    MOV BX,5678H
    PUSH AX
    PUSH BX
    POP AX
    POP BX
The final values will be: AX=5678h, BX=1234h. First the value 1234h was pushed, and after that the value 5678h was pushed onto the stack. According to LIFO (Last In, First Out), 5678h comes off first, so AX will pop that value and BX will pop the next one.
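Both examples can be replayed with a small Python model of the stack. This is a sketch under simplifying assumptions: Stack8086 is a made-up class, it stores whole words instead of separate bytes, and the starting SP of 0400h is arbitrary. What it does follow is the 8086 rule that PUSH first lowers SP by 2 and then stores the word at SS:SP, while POP reads the word at SS:SP and then raises SP by 2.

```python
class Stack8086:
    """Tiny model of the 8086 stack (whole words, no real memory bytes)."""

    def __init__(self, sp: int = 0x0400) -> None:
        self.sp = sp                           # arbitrary starting stack pointer
        self.memory: dict[int, int] = {}       # offset in the stack segment -> word

    def push(self, value: int) -> None:
        self.sp -= 2                           # PUSH: first lower SP by 2...
        self.memory[self.sp] = value & 0xFFFF  # ...then store the word at SS:SP

    def pop(self) -> int:
        value = self.memory[self.sp]           # POP: read the word at SS:SP...
        self.sp += 2                           # ...then raise SP by 2
        return value


stack = Stack8086()

# First example: AX is changed between PUSH and POP; POP brings the old value back.
ax = 0x1234
stack.push(ax)
ax = (0x09 << 8) | (ax & 0xFF)                 # MOV AH,09 -> AX = 0934h
ax = stack.pop()                               # POP AX -> AX = 1234h again
first_example_ax = ax
print(f"first example:  AX={ax:04X}h")

# Second example: LIFO order swaps AX and BX.
ax, bx = 0x1234, 0x5678
stack.push(ax)
stack.push(bx)
ax = stack.pop()                               # last pushed (5678h) comes off first
bx = stack.pop()
print(f"second example: AX={ax:04X}h BX={bx:04X}h")
```

The model leaves out the INT itself. A real INT also pushes values and the routine pops them again before returning, so the stack is back where it was and POP AX still finds 1234h.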
Identifiers
An identifier is a name you apply to items in your program. The two types of identifiers are "name", which refers to the address of a data item, and "label", which refers to the address of an instruction. The same rules apply to names and labels.
Statements
A program is made of a set of statements. There are two types of statements: "instructions", such as MOV and LEA, and "directives", which tell the assembler to perform a specific action, like ".model small".
Here's the general format of a statement:
identifier - operation - operand(s) - comment
The identifier is the name, as explained above.
The operation is an instruction like MOV.
The operands provide information for the operation to act on, like MOV (operation) AX,BX (operands).
The comment is a line of text you can add as a comment; everything the assembler sees after a ";" is ignored.
So a complete instruction looks like this:
MOVINSTRUCTION: MOV AX,BX ;this is a MOV instruction
The label and the comment are optional. In fact, I already explained directives, but okay, I'll do it again: directives provide the assembler with information on how to assemble a .ASM file. .MODEL SMALL and .CODE, for example, are directives.
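The statement format can be illustrated with a simplified Python parser. It's a sketch, not how TASM or MASM really parse source code: parse_statement is a hypothetical name, it assumes a label ends with ":" (so data names like message, which have no colon, are not recognised), and it assumes that neither ";" nor "," appears inside a quoted string.

```python
def parse_statement(line: str) -> dict:
    """Split a source line into label, operation, operands and comment."""
    code, semicolon, comment = line.partition(";")   # everything after ";" is comment
    words = code.split(None, 1)                       # first word, then the rest

    label = None
    if words and words[0].endswith(":"):              # "MOVINSTRUCTION:" is a label
        label = words[0][:-1]
        words = words[1].split(None, 1) if len(words) > 1 else []

    operation = words[0] if words else None
    operands = [op.strip() for op in words[1].split(",")] if len(words) > 1 else []
    return {
        "label": label,
        "operation": operation,
        "operands": operands,
        "comment": comment.strip() if semicolon else None,
    }


print(parse_statement("MOVINSTRUCTION: MOV AX,BX ;this is a MOV instruction"))
print(parse_statement("mov ax,4c00h"))
```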
And so we have come to the end of Section 1 of this tutorial. If you fully understand this stuff (registers, flags, segments, stack, names, etc.) you may, from now on, call yourself a "Level 0 Assembly Coder". Congratulations!
In Part 2 I'll explain some more instructions and I'll explain how to address data yourself.
I'll also explain interrupts and the interrupt table.