PopAsm, the Popular Assembler Project

This is the home page for PopAsm, the Popular Assembler project. Please feel free to send any comments. If this project is to be popular, your opinion is very important. We are all counting on you!

What is PopAsm?

PopAsm stands for Popular Assembler. It is an assembler (i.e. an assembly language compiler) designed to meet the needs of as many assembly language programmers out there as possible... This is NO easy mission, I know...

Why Popular?

Existing assemblers attempt to solve the same problem (assembly language programming), each from a particular point of view. They handle issues such as compatibility with existing code and the addition of new keywords, syntax and semantics in a variety of ways. Sometimes developers face problems their assemblers do not solve, at least not the way they should. Some of these problems are very simple, such as NASM's lack of type checking; they are discussed next.

PopAsm vs. Other Assemblers

The following table compares PopAsm features with their counterparts in other assemblers.

Feature                                          TASM  NASM  PopAsm
Free and open source                             NO    YES   YES
Portability                                      NO    YES   YES
Heavy Arithmetics                                NO    NO    YES
Supports Pentium IV and AMD Athlon instructions  ???   YES   YES
MASM/TASM compatibility                          YES   NO    YES
NASM compatibility                               NO    YES   YES

Portability
Allows the assembler to be run under several systems. TASM is DOS/Windows-only, while NASM ports to a variety of platforms. Although PopAsm has been developed in pure ANSI C++, it has been tested under Windows and Linux only.
Heavy Arithmetics
The ability to perform mathematical operations on arbitrarily large numbers at assembly time. TASM has limited arithmetic capabilities and NASM can only perform 32-bit integer arithmetic, while PopAsm has full support for heavy arithmetic on both integer and non-integer numbers. It should also be stressed that PopAsm internally stores all numbers in fraction format (i.e. as a numerator/denominator pair). Only when a number must finally be stored in memory may some precision be lost, which is inherent to the binary floating-point representation.
MASM/TASM compatibility
Makes it possible to compile MASM/TASM legacy source code. NASM does not support most MASM directives; PopAsm implements only the most used ones, according to the users' feature requests.
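
The fraction format described under Heavy Arithmetics can be illustrated with a short Python sketch. This is not PopAsm's implementation (PopAsm is written in C++); it only uses Python's standard fractions module to show why keeping a numerator/denominator pair postpones any precision loss until the value is finally stored in binary floating point. The helper name store_as_float32 is an assumption made for this illustration.

```python
from fractions import Fraction
import struct


def store_as_float32(value: Fraction) -> bytes:
    """Round an exact fraction to IEEE 754 single precision (little-endian).

    Hypothetical helper: it stands in for the moment an assembler
    finally emits the number into the output file.
    """
    return struct.pack("<f", float(value))


# Assembly-time arithmetic on fractions stays exact: (1/3) * 3 is exactly 1.
third = Fraction(1, 3)
exact = third * 3

# Arbitrarily large integers are handled exactly as well.
huge = Fraction(2 ** 200, 3) * 3

# The same kind of sum in binary floating point picks up a rounding error.
approx = 0.1 + 0.2

# Precision is lost only here, when the value is emitted as 4 bytes.
stored = store_as_float32(third)
```

After this runs, exact equals 1 and huge equals 2**200 with no rounding, while the four bytes in stored decode to a value that is close to, but not exactly, one third.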

Design Issues

The following paragraphs express my own opinion about other assemblers compared with PopAsm. I am sure many people will disagree with me on many points. Whenever this happens, please post your opinion at the PopAsm Open Discussion forum. Your opinion is important so that a better assembler can result. Also, it should be clear that this comparison is not intended to harm anyone involved in those projects.

Each section tackles specific topics about how assemblers should behave. Initially a short description of the issue is given, followed by TASM's point of view, then by NASM's point of view and finally PopAsm's approach. Each of them is written in a different color, for ease of reading.

Semantics of square brackets

All three assemblers use square brackets for referencing memory. However, their absence introduces some ambiguity...

If one defines a variable, TASM refers to its contents by its name. It uses the OFFSET keyword to get the offset of a variable. Example:

FOO DD 1234h        ; Defines a double word named FOO, which holds 1234h.
MOV EAX,FOO         ; Moves 1234h to EAX
MOV EAX,OFFSET FOO  ; EAX = FOO's offset

Despite the incompatibility this would cause, NASM simply abolished the OFFSET keyword. If one wants to reference the contents of a variable, he must use square brackets. For instance, under NASM the second line of the example above would behave like the third one, and the third line itself would cause an assembly error unless you told the NASM preprocessor to ignore the OFFSET keyword. Here is the same code written in NASM syntax:

FOO DD 1234h   ; Defines a double word named FOO, which holds 1234h.
MOV EAX,[FOO]  ; Moves 1234h to EAX
MOV EAX,FOO    ; EAX = FOO's offset

There seems to be no good reason for such a change in the syntax: too little gain for major compatibility headaches... The NASM documentation cites a problem with the MASM/TASM syntax, which is quoted below:

NASM was designed with simplicity of syntax in mind. One of the design goals of NASM is that it should be possible, as far as is practical, for the user to look at a single line of NASM code and tell what opcode is generated by it. You can't do this in MASM: if you declare, for example,
    foo       equ 1
    bar       dw 2

then the two lines of code

    mov ax,foo
    mov ax,bar

generate completely different opcodes, despite having identical-looking syntaxes.

While true, the above citation does not mention its weakness. Let's write something similar and see what happens (in NASM syntax):

FOO EQU 1    ; FOO = 1
BAR DD 2     ; BAR holds value 2 at offset XXXX
MOV EAX,FOO
MOV EAX,BAR

As can be seen, the last two lines look alike but perform different operations. One loads a user-defined constant into EAX; the other loads the offset of a variable into EAX. At machine level, both have the same encoding, but the semantics differ for the final user.

PopAsm gets the best from each point of view. If the user wishes to make clear he is accessing memory, he may use square brackets. If he wishes to make clear he is referring to an offset, he may use the OFFSET keyword, although he is not obliged to. The only syntax left is just stating the symbol name, which can be used to refer to EQU constants at the developer's discretion.

Note, however, that because TASM and NASM give different meanings for

FOO DD 1     ; FOO holds value 1 at offset XXXX
MOV EAX,FOO

PopAsm must choose one of them. The default is to behave like TASM in this case, but because the final user always has the final word with PopAsm, he may change this default any time he wishes. One might wonder why PopAsm acts like TASM by default. This is a matter of coding justice: TASM came first and established its syntax, which became a de facto standard. When NASM appeared, it simply did things differently. There is no reason PopAsm should privilege NASM's syntax.
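
The three operand forms discussed above (square brackets, OFFSET, and a bare symbol name) can be summarized in a small Python model. It is only a sketch of the rules as described here, not PopAsm's parser: the names Kind, interpret and bare_means_memory are hypothetical, and the flag stands in for whatever option PopAsm uses to switch the default.

```python
from enum import Enum


class Kind(Enum):
    CONSTANT = "constant"  # symbol defined with EQU
    VARIABLE = "variable"  # symbol defined with DB/DW/DD


def interpret(operand: str, symbols: dict, bare_means_memory: bool = True) -> str:
    """Classify a source operand as 'memory', 'offset' or 'immediate'.

    bare_means_memory=True models the TASM-like default;
    bare_means_memory=False models NASM-like behavior.
    """
    text = operand.strip()
    upper = text.upper()
    if text.startswith("[") and text.endswith("]"):
        return "memory"  # brackets always mean a memory access
    if upper.startswith("OFFSET "):
        return "offset"  # OFFSET always means the address
    if symbols.get(upper) is Kind.VARIABLE:
        # Bare variable name: the only ambiguous case.
        return "memory" if bare_means_memory else "offset"
    return "immediate"  # EQU constants (and, in this sketch, anything else)


symbols = {"FOO": Kind.VARIABLE, "ONE": Kind.CONSTANT}
tasm_like = interpret("FOO", symbols)                           # 'memory'
nasm_like = interpret("FOO", symbols, bare_means_memory=False)  # 'offset'
```

Only the bare variable name changes meaning between the two modes; the bracketed and OFFSET forms are unambiguous in both.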

To finish this discussion about square brackets and the OFFSET keyword, watch out for the NASM %idefine offset pitfall: while this preprocessor directive may be helpful in some situations, code such as:

FOO DD 1     ; FOO holds value 1 at offset XXXX
MOV EAX,FOO

will be treated differently in TASM and NASM, even though it assembles successfully in both. This may leave anyone migrating from TASM to NASM with a bug that is very difficult to find.

Variable types

When the user declares his variables, he must use the appropriate instruction size to access them. For example, when a variable is declared using DB, all code referring to it must address it through BYTE operations, otherwise the memory occupied by adjacent variables will be overwritten.

TASM keeps track of variable types and checks operand sizes. If a variable is a byte and the user attempts to load a value larger than 255, the assembly process will fail, unless he uses an appropriate cast, such as WORD PTR.

NASM does not store variable types. If you declare a dword, every time you refer to it YOU must remember the type and write the cast yourself. You had better not mistake the type of your variables, otherwise you will probably take a long time to find the bug. Code written this way is unnecessarily long and redundant.

I respect all free software initiatives, including NASM. However, some criticism is in order here: NASM's design is poor on this point. PopAsm remembers variable types so as not to leave the hard work to the programmer. However, even if one defines a variable with one type, he may still refer to it as if it were of another type. For instance, suppose someone assembles the following code using PopAsm:

FOO DD 1             ; FOO is dword. Holds value 1 at offset XXXX
BAR DB 2             ; BAR is byte. Holds value 2 at offset YYYY
MOV [FOO],AL         ; Error: FOO is dword and AL is byte
MOV [BAR],EAX        ; Error: BAR is byte and EAX is dword
MOV [BAR],5          ; BAR is byte and so should be 5
MOV DWORD [BAR],EAX  ; Hmmm... BAR is byte, but if the user wants this...

As you can see, PopAsm protects the user from mistaking types, while still granting him the freedom to do as he wishes (as in the last line, where he wrote a dword where only a byte was supposed to be).
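
The checks in the example above can be expressed as a short Python sketch. It only models the behavior described in this section and is not taken from PopAsm's source; the function check_store, its parameters and the two lookup tables are hypothetical names chosen for the illustration.

```python
SIZES = {"BYTE": 1, "WORD": 2, "DWORD": 4}      # size keywords, in bytes
REGISTER_SIZES = {"AL": 1, "AX": 2, "EAX": 4}   # a few registers, in bytes


def check_store(variables, name, source, cast=None):
    """Return the number of bytes a MOV [name],source would write.

    variables maps a variable name to its declared size in bytes.
    source is a register name or an integer immediate.
    cast is an optional size keyword that overrides the declared type.
    Raises TypeError when the operand sizes do not match.
    """
    dest_size = SIZES[cast] if cast else variables[name]
    if isinstance(source, int):
        bits = 8 * dest_size
        # An immediate takes the destination size, provided it fits.
        if not -(1 << (bits - 1)) <= source < (1 << bits):
            raise TypeError(f"{source} does not fit in {dest_size} byte(s)")
        return dest_size
    src_size = REGISTER_SIZES[source]
    if src_size != dest_size:
        raise TypeError(
            f"{name} is {dest_size} byte(s) but {source} is {src_size} byte(s)"
        )
    return dest_size


variables = {"FOO": 4, "BAR": 1}                            # FOO DD 1 / BAR DB 2
ok_immediate = check_store(variables, "BAR", 5)             # MOV [BAR],5
ok_cast = check_store(variables, "BAR", "EAX", cast="DWORD")  # MOV DWORD [BAR],EAX
```

Calling check_store(variables, "FOO", "AL") or check_store(variables, "BAR", "EAX") raises TypeError, mirroring the two error lines of the listing, while the explicit cast is accepted because the user asked for it.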

Encoding optimization

Assemblers also differ on code optimization. Some will not perform any. Others will go as far as substituting instructions, while others take a more moderate approach.

TASM allows its users to choose whether optimizations are to be performed or not. In the case of a conditional jump, TASM may even decompose it into a pair of jumps, one conditional and the other near and unconditional. Operands that may be sign-extended by the processor are optimized as well.

NASM took the rough approach. It will never optimize your code unless you explicitly say so. If someone wants a short jump, for example, he must write that himself. Later releases of NASM support optimization levels via command-line options.

PopAsm behaves like TASM, not because TASM came before NASM, but because this is indeed the best approach: let the user decide whether he wants the optimizations or not, instead of making him type lots of redundant code. In an instruction set like that of the i386, where many instructions can be optimized, it is impractical to leave the hard job to the user. Optimizations are performed unless the programmer turns them off. The following example assumes they have not been turned off:

ADD EAX,1        ; Encoded as ADD EAX,BYTE 1
ADD EAX,DWORD 1  ; User's option respected
JMP RIGHT_HERE   ; Encoded as a short jump

As you see, the first instruction is optimized, as it should be (optimizations are on). The second instruction is not, because the user does not want that. The third instruction is also optimized.
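
To make the size difference concrete, the following Python sketch picks between the short and long encodings for the two kinds of instruction in the example, assuming 32-bit code. The opcodes are the standard IA-32 ones (83 /0 ib and 81 /0 id for ADD, EB rel8 and E9 rel32 for JMP), but the functions encode_add_eax and encode_jmp are hypothetical and are not PopAsm's encoder. Using the 81 form for an explicit DWORD is an assumption; an assembler could equally emit the accumulator-specific opcode 05.

```python
def encode_add_eax(imm: int, force_dword: bool = False) -> bytes:
    """Encode ADD EAX,imm for 32-bit code, shortest form unless forced."""
    if not -(1 << 31) <= imm < (1 << 32):
        raise ValueError("immediate does not fit in 32 bits")
    # Treat the immediate as a signed 32-bit value.
    signed = imm - (1 << 32) if imm >= (1 << 31) else imm
    if not force_dword and -128 <= signed <= 127:
        # 83 /0 ib: ADD r/m32,imm8 (sign-extended); ModRM C0 selects EAX.
        return bytes([0x83, 0xC0, signed & 0xFF])
    # 81 /0 id: ADD r/m32,imm32.
    return bytes([0x81, 0xC0]) + (signed & 0xFFFFFFFF).to_bytes(4, "little")


def encode_jmp(address: int, target: int, force_near: bool = False) -> bytes:
    """Encode JMP target for an instruction located at address."""
    rel8 = target - (address + 2)   # a short jump is 2 bytes long
    if not force_near and -128 <= rel8 <= 127:
        return bytes([0xEB, rel8 & 0xFF])
    rel32 = target - (address + 5)  # a near jump is 5 bytes long
    return bytes([0xE9]) + (rel32 & 0xFFFFFFFF).to_bytes(4, "little")


optimized = encode_add_eax(1)                    # 3 bytes: 83 C0 01
explicit = encode_add_eax(1, force_dword=True)   # 6 bytes: 81 C0 01 00 00 00
short_jump = encode_jmp(0x100, 0x100)            # 2 bytes: EB FE
```

The optimized ADD is half the size of the explicit one, and a jump to a nearby target shrinks from five bytes to two, which is the kind of saving the user would otherwise have to request by hand on every line.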

Why are optimizations on by default? Simply because most people want their code to be optimized without having to do it by hand. For those who need a precise encoding in some unusual situation, the inefficient form of the instruction can be used (see the second line of the example above). In short, the idea is to make the job of the majority of users easier.

Red tape and other issues

Some assemblers have directives that control the way code is generated. Classical examples are ASSUME, ORG, SEGMENT, etc. Some assemblers simply do not recognize them, while others force users to adopt them.

TASM has a lot of red tape, but it allows the user to work without much of it. For example, if you want to define your code segment in a useful default way, you can simply use .CODE instead of declaring it in the long form.

NASM does not support most of the red tape, nor does it provide any form of compatibility. This may even lead the user to bugs and make the code less readable.

Again, PopAsm supports most existing code and allows you to choose between using red tape or not.