Embedded Systems - Assembly Language

Embedded Systems – Assembly Language

Assembly languages were developed to provide mnemonics or symbols for the machine level code instructions. Assembly language programs consist of mnemonics, thus they should be translated into machine code. A program that is responsible for this conversion is known as assembler. Assembly language is often termed as a low-level language because it directly works with the internal structure of the CPU. To program in assembly language, a programmer must know all the registers of the CPU.

Different programming languages such as C, C++, Java and various other languages are called high-level languages because they do not deal with the internal details of a CPU. In contrast, an assembler is used to translate an assembly language program into machine code (sometimes also called object code or opcode). Similarly, a compiler translates a high-level language into machine code. For example, to write a program in C language, one must use a C compiler to translate the program into machine language.

Structure of Assembly Language

An assembly language program is a series of statements, which are either assembly language instructions such as ADD and MOV, or statements called directives.

An instruction tells the CPU what to do, while a directive (also called pseudo-instructions) gives instruction to the assembler. For example, ADD and MOV instructions are commands which the CPU runs, while ORG and END are assembler directives. The assembler places the opcode to the memory location 0 when the ORG directive is used, while END indicates to the end of the source code. A program language instruction consists of the following four fields −

[ label: ]   mnemonics  [ operands ]   [;comment ] 

A square bracket ( [ ] ) indicates that the field is optional.

  • The label field allows the program to refer to a line of code by name. The label fields cannot exceed a certain number of characters.
  • The mnemonics and operands fields together perform the real work of the program and accomplish the tasks. Statements like ADD A , C & MOV C, #68 where ADD and MOV are the mnemonics, which produce opcodes ; “A, C” and “C, #68” are operands. These two fields could contain directives. Directives do not generate machine code and are used only by the assembler, whereas instructions are translated into machine code for the CPU to execute.
1.0000         ORG  0H            ;start (origin) at location 0 
2 0000 7D25    MOV  R5,#25H       ;load 25H into R5 
3.0002 7F34    MOV  R7,#34H       ;load 34H into  R7 
4.0004 7400    MOV  A,#0          ;load 0 into A 
5.0006 2D      ADD  A,R5          ;add contents of R5 to A 
6.0007 2F      ADD  A,R7          ;add contents of R7 to A
7.0008 2412    ADD  A,#12H        ;add to A value 12 H 
8.000A 80FE    HERE: SJMP HERE    ;stay in this loop 
9.000C END                        ;end of asm source file
  • The comment field begins with a semicolon which is a comment indicator.
  • Notice the Label “HERE” in the program. Any label which refers to an instruction should be followed by a colon.

Assembling and Running an 8051 Program

Here we will discuss about the basic form of an assembly language. The steps to create, assemble, and run an assembly language program are as follows −

  • First, we use an editor to type in a program similar to the above program. Editors like MS-DOS EDIT program that comes with all Microsoft operating systems can be used to create or edit a program. The Editor must be able to produce an ASCII file. The “asm” extension for the source file is used by an assembler in the next step.
  • The “asm” source file contains the program code created in Step 1. It is fed to an 8051 assembler. The assembler then converts the assembly language instructions into machine code instructions and produces an .obj file (object file) and a .lst file (list file). It is also called as a source file, that’s why some assemblers require that this file have the “src” extensions. The “lst” file is optional. It is very useful to the program because it lists all the opcodes and addresses as well as errors that the assemblers detected.
  • Assemblers require a third step called linking. The link program takes one or more object files and produces an absolute object file with the extension “abs”.
  • Next, the “abs” file is fed to a program called “OH” (object to hex converter), which creates a file with the extension “hex” that is ready to burn in to the ROM.

Steps to create program

Data Type

The 8051 microcontroller contains a single data type of 8-bits, and each register is also of 8-bits size. The programmer has to break down data larger than 8-bits (00 to FFH, or to 255 in decimal) so that it can be processed by the CPU.

DB (Define Byte)

The DB directive is the most widely used data directive in the assembler. It is used to define the 8-bit data. It can also be used to define decimal, binary, hex, or ASCII formats data. For decimal, the “D” after the decimal number is optional, but it is required for “B” (binary) and “Hl” (hexadecimal).

To indicate ASCII, simply place the characters in quotation marks (‘like this’). The assembler generates ASCII code for the numbers/characters automatically. The DB directive is the only directive that can be used to define ASCII strings larger than two characters; therefore, it should be used for all the ASCII data definitions. Some examples of DB are given below −

        ORG  500H 
DATA1:  DB   28                     ;DECIMAL (1C in hex) 
DATA2:  DB   00110101B              ;BINARY  (35 in hex) 
DATA3:  DB   39H                    ;HEX 
        ORG  510H 
DATA4:  DB   "2591"                 ;ASCII  NUMBERS 
        ORG  520H                         
DATA6:  DA   "MY NAME IS Michael"   ;ASCII CHARACTERS 

Either single or double quotes can be used around ASCII strings. DB is also used to allocate memory in byte-sized chunks.

Assembler Directives

Some of the directives of 8051 are as follows −

  • ORG (origin) − The origin directive is used to indicate the beginning of the address. It takes the numbers in hexa or decimal format. If H is provided after the number, the number is treated as hexa, otherwise decimal. The assembler converts the decimal number to hexa.
  • EQU (equate) − It is used to define a constant without occupying a memory location. EQU associates a constant value with a data label so that the label appears in the program, its constant value will be substituted for the label. While executing the instruction “MOV R3, #COUNT”, the register R3 will be loaded with the value 25 (notice the # sign). The advantage of using EQU is that the programmer can change it once and the assembler will change all of its occurrences; the programmer does not have to search the entire program.
  • END directive − It indicates the end of the source (asm) file. The END directive is the last line of the program; anything after the END directive is ignored by the assembler.

Labels in Assembly Language

All the labels in assembly language must follow the rules given below −

  • Each label name must be unique. The names used for labels in assembly language programming consist of alphabetic letters in both uppercase and lowercase, number 0 through 9, and special characters such as question mark (?), period (.), at the rate @, underscore (_), and dollar($).
  • The first character should be in alphabetical character; it cannot be a number.
  • Reserved words cannot be used as a label in the program. For example, ADD and MOV words are the reserved words, since they are instruction mnemonics.