This is the home page for PopAsm, the Popular Assembler project. Please feel free to send any comments. If this project is to be popular, your opinion is very important. We are all counting on you!
PopAsm stands for Popular Assembler. It is an assembler (i.e. an assembly language compiler) designed to meet the needs of as many assembly language programmers out there as possible... This is NO easy mission, I know...
Popular?
Each existing assembler attempts to solve the same problem (assembly language programming) from its own particular point of view. They handle issues such as compatibility with existing code and the addition of new keywords, syntax and semantics in a variety of ways. Sometimes developers face problems their assemblers do not solve, at least not the way they should. Some of these problems are very simple, such as NASM's lack of type checking; they are discussed next.
The following table compares PopAsm features with their counterparts in other assemblers.
Feature | TASM | NASM | PopAsm |
Free and open source | NO | YES | YES |
Portability | NO | YES | YES |
Heavy Arithmetics | NO | NO | YES |
Supports Pentium IV and AMD Athlon instructions | ??? | YES | YES |
MASM/TASM compatibility | YES | NO | YES |
NASM compatibility | NO | YES | YES |
The following paragraphs express my own opinion about other assemblers compared with PopAsm. I am sure many people will disagree with me on many points. Whenever this happens, please post your opinion at the PopAsm Open Discussion forum. Your opinion is important so that a better assembler can result. Also, it should be clear that this comparison is not intended to harm anyone involved in such projects.
Each section tackles specific topics about how assemblers should behave. Initially a short description of the issue is given, followed by TASM's point of view, then by NASM's point of view and finally PopAsm's approach. Each of them is written in a different color, for ease of reading.
All three assemblers use square brackets to reference memory. However, omitting the brackets introduces some ambiguity...
If one defines a variable, TASM refers to its contents by its name. It uses the OFFSET keyword to get the offset of a variable. Example:
FOO | DD | 1234h | ; Defines a double word named FOO, which holds 1234h. |
MOV | EAX,FOO | ; Moves 1234h to EAX | |
MOV | EAX,OFFSET FOO | ; EAX = FOO's offset |
Despite the incompatibility this would cause, NASM simply abolished the OFFSET keyword. If one wants to reference the contents of a variable, he must use square brackets. For instance, NASM would treat the second line of the above example the way TASM treats the third, and the third line itself would cause an assembly error unless you told the NASM preprocessor to ignore the keyword. Here is the code written in NASM syntax:
FOO | DD | 1234h | ; Defines a double word named FOO, which holds 1234h. |
MOV | EAX,[FOO] | ; Moves 1234h to EAX | |
MOV | EAX,FOO | ; EAX = FOO's offset |
There seems to be no good reason for such a change in the syntax: too little gain for major compatibility headaches... The NASM documentation cites a problem with TASM syntax, which is quoted below:
NASM was designed with simplicity of syntax in mind. One of the design goals of NASM is that it should be possible, as far as is practical, for the user to look at a single line of NASM code and tell what opcode is generated by it. You can't do this in MASM: if you declare, for example,

    foo     equ     1
    bar     dw      2

then the two lines of code

    mov     ax,foo
    mov     ax,bar

generate completely different opcodes, despite having identical-looking syntaxes.
While true, the above quotation does not mention that NASM's own syntax has a similar weakness. Let's write something similar and see what happens (in NASM syntax):
FOO | EQU | 1 | ; FOO = 1 |
BAR | DD | 2 | ; BAR holds value 2 at offset XXXX |
MOV | EAX,FOO | ||
MOV | EAX,BAR |
As can be seen, the last two lines look alike but perform different operations. One loads a user-defined constant into EAX; the other loads the offset of a variable into EAX. At machine level, both use the same encoding, but the semantics differ for the end user.
PopAsm gets the best from each point of view. If the user wishes to make clear he is accessing memory, he may use square brackets. If he wishes to make clear he is referring to an offset, he may use the OFFSET keyword, although he is not obliged to. The only syntax left is just stating the symbol name, which can be used to refer to EQU constants at the developer's discretion.
Note, however, that because TASM and NASM give different meanings for
FOO | DD | 1 | ; FOO holds value 1 at offset XXXX |
MOV | EAX,FOO |
PopAsm must choose one of them. The default is to behave like TASM in this case, but because the final user always has the final word with PopAsm, he may change this default anytime he wishes. One might wonder why PopAsm acts like TASM by default. This is due to coding justice: TASM came first and established its syntax, which became a de facto standard. When NASM appeared, it simply did things differently. There's no reason PopAsm should privilege NASM's syntax.
To finish this discussion about square brackets and the OFFSET keyword, watch out for the NASM %idefine offset pitfall: while this preprocessor directive may be helpful in some situations, code such as:
FOO | DD | 1 | ; FOO holds value 1 at offset XXXX |
MOV | EAX,FOO |
will be treated differently by TASM and NASM, even though it assembles successfully in both. This may lead anyone migrating from TASM to NASM into a bug that is very difficult to find.
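To make the difference concrete, here is a minimal Python sketch (an illustration only, not part of PopAsm) of the bytes each convention would emit for MOV EAX,FOO in 32-bit code. It assumes FOO is a dword variable at the hypothetical offset 00401000h; the function name and the offset are made up for the example. The opcodes themselves are the standard IA-32 ones: A1h loads EAX from a memory offset, B8h loads EAX with an immediate value.

```python
import struct

def encode_mov_eax_symbol(offset: int, convention: str) -> bytes:
    """Encode `MOV EAX,FOO` for a variable FOO located at `offset`.

    "tasm": a bare variable name means its contents -> MOV EAX,[offset]
    "nasm": a bare variable name means its address  -> MOV EAX,offset
    """
    if convention == "tasm":
        opcode = 0xA1  # MOV EAX, moffs32 (reads memory)
    elif convention == "nasm":
        opcode = 0xB8  # MOV EAX, imm32 (loads a constant)
    else:
        raise ValueError(f"unknown convention: {convention}")
    # The 32-bit offset follows the opcode in little-endian order.
    return bytes([opcode]) + struct.pack("<I", offset)

FOO_OFFSET = 0x00401000  # hypothetical address of FOO

print(encode_mov_eax_symbol(FOO_OFFSET, "tasm").hex(" "))  # a1 00 10 40 00
print(encode_mov_eax_symbol(FOO_OFFSET, "nasm").hex(" "))  # b8 00 10 40 00
```

The two results differ by a single opcode byte, which is why the source assembles cleanly either way while the program behaves differently.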
When the user declares his variables, he must use the appropriate operand size to access them. For example, when a variable is declared using DB, all code referring to it must address it through BYTE operations, otherwise the memory occupied by adjacent variables will be overwritten.
TASM keeps track of variable types and checks operand sizes. If a variable is a byte and the user attempts to load a value larger than 255, the assembly process will fail, unless he uses an appropriate cast, such as WORD PTR.
NASM does not store variable types. If you declare a dword, every time you refer to it YOU must remember the type and write the cast yourself. You had better not mistake the type of your variables, otherwise you will probably take a long time to find the bug. The code written this way is unnecessarily long and redundant.
I respect all free software initiatives, including NASM. However, some criticism is in order here. NASM's design is poor on this point. PopAsm remembers variable types so as not to leave the hard work to the programmer. However, a variable defined with one type may still be referred to as if it were of another type. For instance, suppose someone assembles the following code using PopAsm:
FOO | DD | 1 | ; FOO is dword. Holds value 1 at offset XXXX |
BAR | DB | 2 | ; BAR is byte. Holds value 2 at offset YYYY |
MOV | [FOO],AL | ; Error: FOO is dword and AL is byte | |
MOV | [BAR],EAX | ; Error: BAR is byte and EAX is dword | |
MOV | [BAR],5 | ; BAR is byte and so should be 5 | |
MOV | DWORD [BAR],EAX | ; Hmmm... BAR is byte, but if the user wants this... |
As you can see, PopAsm protects the user from mistaking types, while still granting him freedom to do as he wishes (as in the last line, where he wrote a dword where only a byte was supposed to be).
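The checks in the example above can be modeled in a few lines. The following Python sketch is only an illustration of the idea, not PopAsm's actual implementation; the function `check_store`, the tables and the error messages are all hypothetical. It remembers the declared type of each variable, rejects a store whose source size differs, and lets an explicit size keyword override the declared type.

```python
SIZES = {"BYTE": 1, "WORD": 2, "DWORD": 4}
REGISTER_SIZES = {"AL": 1, "AX": 2, "EAX": 4}

def check_store(symbols, var, source, override=None):
    """Return the size in bytes of `MOV [var],source`, or raise TypeError.

    symbols  maps variable names to declared types ("BYTE", "WORD", "DWORD")
    source   is a register name or an integer immediate
    override is an explicit size keyword written by the user, if any
    """
    dest_size = SIZES[override] if override else SIZES[symbols[var]]
    if isinstance(source, int):
        # An immediate takes the destination's size but must fit in it.
        bits = 8 * dest_size
        if not -(1 << (bits - 1)) <= source < (1 << bits):
            raise TypeError(f"{source} does not fit in {dest_size} byte(s)")
        return dest_size
    src_size = REGISTER_SIZES[source]
    if src_size != dest_size:
        raise TypeError(
            f"destination is {dest_size} byte(s), {source} is {src_size}")
    return dest_size

symbols = {"FOO": "DWORD", "BAR": "BYTE"}

for var, source, override in [
    ("FOO", "AL", None),      # rejected: dword destination, byte source
    ("BAR", "EAX", None),     # rejected: byte destination, dword source
    ("BAR", 5, None),         # accepted: 5 is stored as a byte
    ("BAR", "EAX", "DWORD"),  # accepted: the user's explicit size wins
]:
    try:
        print(var, source, "->", check_store(symbols, var, source, override))
    except TypeError as err:
        print(var, source, "-> error:", err)
```

Letting the override replace the declared size, rather than merely silencing the error, mirrors the last line of the example: the user asked for a dword store, so a dword store is what gets checked.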
Assemblers also differ on code optimization. Some will not perform any. Others go as far as substituting instructions, while others take a more moderate approach.
TASM allows its users to choose whether optimizations are to be performed or not. In the case of a conditional jump, TASM may even decompose it into a pair of jumps, one conditional and the other near and unconditional. Operands that may be sign-extended by the processor are optimized as well.
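The jump decomposition can be sketched as follows. This is a minimal Python illustration for 32-bit code, not TASM's actual implementation, and the function name is hypothetical. It relies on two facts about the instruction encoding: short conditional jumps are opcodes 70h to 7Fh followed by an 8-bit displacement, and flipping the lowest opcode bit inverts the condition (74h, JE, becomes 75h, JNE).

```python
import struct

JMP_NEAR = 0xE9  # JMP rel32, 5 bytes long

def encode_jcc(opcode: int, here: int, target: int) -> bytes:
    """Encode a conditional jump located at address `here` to `target`.

    `opcode` is the short-form opcode (0x70..0x7F, e.g. 0x74 for JE).
    """
    if not 0x70 <= opcode <= 0x7F:
        raise ValueError("not a short conditional jump opcode")
    # Displacements are relative to the end of the instruction.
    disp = target - (here + 2)
    if -128 <= disp <= 127:
        return bytes([opcode]) + struct.pack("<b", disp)
    # Out of range: invert the condition and skip over a near JMP.
    inverted = opcode ^ 1
    near_disp = target - (here + 2 + 5)  # end of the JMP rel32
    return bytes([inverted, 5, JMP_NEAR]) + struct.pack("<i", near_disp)

print(encode_jcc(0x74, 0x100, 0x110).hex(" "))   # 74 0e
print(encode_jcc(0x74, 0x100, 0x1000).hex(" "))  # 75 05 e9 f9 0e 00 00
```

On the 386 and later a single near conditional jump also exists, so an assembler targeting those processors could emit that instead; the sketch shows only the pair-of-jumps form described above.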
NASM took the rough approach. It will never optimize your code, unless you explicitly say so. If someone wants a short jump, for example, he must write that himself. Later releases of NASM support optimization levels via command-line options.
PopAsm behaves like TASM, not because TASM came before NASM but because this is indeed the best approach: let the user decide whether he wants the optimizations or not, instead of making him type lots of redundant code. In an instruction set like that of the i386, where many instructions can be optimized, it is impractical to leave the hard job to the user. Optimizations are performed unless the programmer turns them off. The following example assumes optimizations have not been turned off:
ADD | EAX,1 | ; Encoded as ADD EAX,BYTE 1 | |
ADD | EAX,DWORD 1 | ; User's option respected | |
JMP | RIGHT_HERE | ; Encoded as a short jump |
As you see, the first instruction is optimized, as it should be (optimizations are on). The second instruction is not, because the user does not want that. The third instruction is also optimized.
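The size selection behind this example can be sketched in Python. This is an illustration under stated assumptions, not PopAsm's code: the function names are hypothetical, the code is assumed to be 32-bit, and the sketch assumes that an explicit DWORD maps to the accumulator form 05h followed by a 32-bit immediate. The opcodes used (83h /0 for a sign-extended byte immediate, EBh for a short jump, E9h for a near jump) are the standard ones.

```python
import struct

def encode_add_eax(value: int, force_dword: bool = False) -> bytes:
    """Encode ADD EAX,value, picking the shortest form unless told otherwise."""
    if not force_dword and -128 <= value <= 127:
        # 83 /0 ib: ADD r/m32, sign-extended imm8; ModRM C0h selects EAX
        return bytes([0x83, 0xC0]) + struct.pack("<b", value)
    # 05 id: ADD EAX, imm32
    return bytes([0x05]) + struct.pack("<I", value & 0xFFFFFFFF)

def encode_jmp(here: int, target: int) -> bytes:
    """Encode JMP target at address `here`, short form when it reaches."""
    disp = target - (here + 2)  # relative to the end of the short form
    if -128 <= disp <= 127:
        return bytes([0xEB]) + struct.pack("<b", disp)
    return bytes([0xE9]) + struct.pack("<i", target - (here + 5))

print(encode_add_eax(1).hex(" "))                    # 83 c0 01
print(encode_add_eax(1, force_dword=True).hex(" "))  # 05 01 00 00 00
print(encode_jmp(0x100, 0x100).hex(" "))             # eb fe
```

With optimizations on, the first ADD takes three bytes instead of five, and a jump to a nearby label takes two bytes instead of five.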
Why are optimizations on by default? Simply because most people want their code to be optimized for them, instead of having to do it manually. For those who need a precise encoding in some unusual situation, the longer form of the instruction can be requested explicitly (see the second line of the example above). In short, the idea is to make the job easier for most users.
Some assemblers have directives that control the way code is generated. Classical examples are ASSUME, ORG, SEGMENT, etc. Some assemblers simply do not recognize them, while others force users to adopt them.
TASM has a lot of red tape, but it lets the user get by without much of it. For example, if you want to define your code segment with useful defaults, you can simply use .CODE instead of declaring it in the long form.
NASM does not support most of the red tape, nor does it provide any form of compatibility. This may even lead the user into bugs and make the code less readable.
Again, PopAsm supports most existing code and allows you to choose between using red tape or not.