This is a basic FAQ that I hope will answer some basic questions about the PICEMU source. As I receive questions about the source, I will add them to this file to create a more complete source code FAQ.

Questions, comments, compliments, complaints, and whatnot should be sent to picemulator@yahoo.com. I will be keeping the latest version of the source code and this FAQ at www.picemulator.com and would appreciate a copy of any changes you make. You will be given full credit for your changes.

The first question people will ask is probably:

"Why the H@LL did he use such a Piece-of-SH&T compiler?"

There are actually a couple of reasons for this:

1) I've been using this compiler since 1985.

2) I've written many, many megabytes of code using it, including an 80x86 interpreter. Thus, I had lots of code to draw upon.

3) I can write a "quick and dirty" program in it so fast it doesn't pay to think about using another compiler. And since PICEMU wasn't going to be anything like the program it has become, I didn't think about using anything else.

If you don't have it yet, you can download the compiler from www.picemulator.com.

The next question will probably be:

"Who the H@LL still uses DOS?"

I do, for one. I'm an "old fogey" who started writing for DOS before Microsoft ever got its hands on it (it was 86-DOS from Seattle Computer Products, for their S-100 boardset). Since then, I've done things like write anti-virus software (up to and including scanning / disinfecting Word and Excel files under DOS -- a very interesting headache), and I'm now doing embedded software. I find that Windows can be a useful environment for casual computer use, but for the type of software I write it just gets in the way. It's also much easier to access hardware (serial, keyboard, video) directly under DOS than via Windows calls.

"What kind of C code is that?"

It's K&R C code. This was the "standard" before ANSI-C.
There are two noticeable differences:

1) In K&R C, passed parameters are of the form:

    int foo(str,len)
    char *str;
    int len;
    {
    }

versus the ANSI-C form of:

    int foo(char *str, int len)
    {
    }

2) K&R C isn't as picky about function prototypes. In fact, K&R function prototypes don't need a parameter list, just the function type. So, K&R C function prototypes would look like:

    int foo();
    void bar();
    long lseek();

versus the ANSI-C prototypes of:

    int foo(char *, int);
    void bar(void);
    long lseek(int, long, int);

"Why do you use all those globals instead of locals and passed parameters?"

There are a couple of reasons for this:

1) Globals are faster: The more that has to be passed on the stack, the more setup per function call. This produces larger and slower code.

2) It's easier to make additions to the code. If I find that I need to call yet another function about 3 levels down, I don't have to worry about making sure that all necessary variables are being passed to all levels. I just access the variable.

3) Less data indirection (variables that need to be changed inside a function) means fewer pointers, less chance of changing the wrong data (e.g. pointers out of bounds), and more stable code.

On the down side, I find I keep doing ugly things like re-using a global that I know isn't currently in use rather than creating a new variable. I hope you won't start dreaming of hunting me down and making me pay for my bad habits...

"What is this BYTE, WORD, and DWORD stuff?"

Being an old assembly-language programmer, I find it natural to think in terms of basic storage types -- BYTEs are 8-bit unsigned variables, WORDs are 16-bit unsigned variables, and DWORDs are 32-bit unsigned variables. So, I use typedef to declare my favorite storage types.
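As a rough illustration (not the actual PICEMU declarations, which may well differ), typedefs along the following lines give the three storage types. The <limits.h> checks and the make_word() helper are assumptions added for this sketch, written for an ANSI-C compiler rather than the K&R compiler PICEMU uses.

```c
#include <assert.h>
#include <limits.h>

typedef unsigned char BYTE;        /* 8-bit unsigned */

#if USHRT_MAX == 0xFFFF
typedef unsigned short WORD;       /* 16-bit unsigned */
#else
#error "choose a different 16-bit type for WORD on this compiler"
#endif

#if UINT_MAX == 0xFFFFFFFF
typedef unsigned int DWORD;        /* int is 32 bits on this compiler */
#else
typedef unsigned long DWORD;       /* 16-bit int: long is the 32-bit type */
#endif

/* Hypothetical helper: build a WORD from a high BYTE and a low BYTE. */
WORD make_word(BYTE hi, BYTE lo)
{
    return (WORD)(((WORD)hi << 8) | lo);
}
```

Only the typedef lines have to change when moving to a compiler with different integer sizes; code written in terms of BYTE, WORD, and DWORD stays the same.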
It's a lot easier to type "BYTE" instead of "unsigned char", and it actually makes the code more transportable: since an "int" may be 16 or 32 (or even 64) bits, depending on the system processor, I only need to change the typedef to keep WORD a 16-bit quantity. Interestingly, I rarely find a need for a signed number, though I do often use the shorthand "-1" for "0xffff".

"How do you keep track of all the variables and/or code?"

Unfortunately, that comes from writing it. It has been said that every large, complex system that works has grown from a small, simple system that works, and I find that code is no different. Since PICEMU has thoroughly outgrown its original design, it's become fairly complex, and needs a fair amount of housecleaning. Dig in!

"What's with all these ALIGNxxx variables?"

This is a nod to the architecture of the CPU that PICEMU is running on. On a modern (Pentium or better) CPU, data is fetched/stored in 32-bit chunks. So, if a 16-bit variable is aligned so that the lower 8 bits are in one 32-bit chunk and the high 8 bits are in the next 32-bit chunk, the CPU will have to do two 32-bit reads to fetch it from memory, and two 32-bit writes to save it. This slows down the program a LOT. So, variables that are frequently accessed (the ones used in the code interpreter loop) are best aligned so that they do not cross 32-bit boundaries.

Rather than look at the addresses of all variables and make sure that BYTEs (8-bit variables), WORDs (16-bit variables), and DWORDs (32-bit variables) all "add up" so that nothing crosses a 32-bit boundary, I've added the ALIGNxxx variables, so that a variable and its associated ALIGNxxx variables always take up 32 bits. Yes, this wastes data space, but I've got enough to spare, and it makes it possible to automatically adjust data alignment via the PALIGN.EXE program (see below).

"What are these .TIM files?"

This is another nod to the architecture of the CPU that PICEMU is running on.
As mentioned above, data elements are faster to access when they do not span 32-bit boundaries. The same is true for code. If an instruction spans a 32-bit boundary, it takes 2 memory cycles to fetch, instead of just one.

However, it's not as easy to figure code alignment as it is data alignment. I can't just throw in NOPs after most instructions to make sure that the following instruction doesn't cross a 32-bit boundary. What can be done, however, is to add NOPs in places where they are not in the code flow (and so are not being continually executed, taking up time), but do affect the alignment of the code following them. For 32-bit alignment, adding 0 to 3 NOPs (0 to 3 bytes == 0 to 24 bits; more NOPs will just give us the same alignment MOD 32) will move the code below them. While not all instructions can be aligned, moving the code around will cause more or fewer instructions to be aligned. The more aligned instructions, the faster the code.

The PALIGN.EXE program can be told to repeatedly compile the code with different alignments, test the speed, and find the best set of alignments. The results are saved in the .TIM files. Of particular interest is the end of the .TIM file, which summarizes the alignment info as the compiler commands (and command-line #define's) needed to optimize the code. You can edit it down to those last few lines and use it as a .BAT file to create an aligned, optimized program.

Unfortunately, the next change will mess it all up, and so I don't bother with code alignment until I'm doing a release. See the discussion of PALIGN below for more information.

"What is a 'small memory model'?"

It's a relic of the early days of personal computers, Intel's decision to make the 8086 somewhat compatible with earlier generations of Intel processors, and lots of other stuff. If you're running a DOS program, you're still running code that is backwards compatible to the original IBM-PC (more or less...).
As a result, you're using 16-bit registers, so you have a 16-bit address space to play with. And a 16-bit address space is 64K (65536) bytes. But the PC can address more memory than that -- it has (for DOS mode) up to 640K. To address more than 64K, Intel used "segment registers", which tell the CPU which 64K "segment" the 16-bit address register is referring to. There is a single code segment register, and a couple of data segment registers. How they are used determines the "memory model" in use.

Tiny memory model
    Both code and data segments point to the same 64K segment. Also known as the "COM" memory model.

Small memory model
    Code and data each have their own 64K segment, so a program can have up to 64K of code and 64K of data.

Medium memory model
    Data has a single 64K segment, and there are multiple 64K code segments.

Compact memory model
    Code has a single 64K segment, and there are multiple 64K data segments.

Large memory model
    Both code and data have multiple 64K segments.

Huge memory model
    Both code and data have multiple 64K segments, and all pointers / memory references are treated as "far" (have both a segment and an offset).

The compiler used for PICEMU only supports the small memory model (what do you expect from 1985 technology?), so there is a limit of 64K of code and 64K of data. While this is tiny (sorry to use the word again) from the perspective of Windows programs, you can really do a lot in that much space.

"Why do I need a math coprocessor?"

I use the 80x87 floating point libraries, rather than the emulated libraries, since every modern chip (basically starting with the Pentium) has a built-in math coprocessor. This saves on code size, and floating point is actually used in the execution time calculations.

"Why the separate programs for each processor?"

The earliest PICEMU code had both 12 and 14 bit cores in the same program. Unfortunately, things kept growing larger and more complex, to the point where I thought that it would be best to separate them.
Some traces of this early vintage still exist, however, in function names like

    void controls_12bit()

and

    void controls_14bit()

and in the names of the 12C509A emulator (P12) and the 16F877 emulator (P14). The 16F84A and 16F628 emulators are later additions, started by "cloning" P14 and modifying the clones into the individual processors. I'm trying to make the difference between P14, P84, and P628 just the #define's from PICEMU.H, but they aren't quite there yet. Feel free to continue the process.

"What is this 'xlate_regs[]', 'readregs[]', and 'writeregs[]' business?"

The PIC processors have quite a bit of hardware associated with their registers ('files' in Microchip speak). Reading and/or writing the registers will fetch data from hardware or write data to hardware, and can change the state of the processor (for example, writing to PCL will change the instruction pointer and flush the prefetch queue, although reading PCL doesn't affect any hardware). As a result, reading from and writing to registers must be active code, rather than just accessing a data array. Hence, the 'readregs' and 'writeregs' arrays of pointers to functions.

In addition, some registers are common to all register banks (file banks). The 'xlate_regs' array makes sure that all such registers point to a common register, so that there is no problem with keeping separate copies of registers (one in each bank) in sync.

"Why are there 2 instruction execution engines?"

I wanted PICEMU to be fast (the slow speed of MPLAB's simulator was a major incentive to write PICEMU). I also wanted it to be very flexible. These aren't always compatible. Each little bit of processing in the instruction execution loop, occurring millions of times a second, adds up a LOT. The "Go" command can use breakpoints, but doesn't care about the exact number of cycles that have executed, and uses the "godoit()" engine.
The "Execute" command (the "execute()" engine) not only uses breakpoints, but also lets you tell it how many instruction cycles to execute before stopping. By not having the extra "how many instructions have executed" code in the "Go" command, it's about 10% faster. (Throttle is also part of the "Execute" command, and also slows it down a little.)

But there is a downside to this -- if I find I have to make a change to the execution engine, I have to make sure the same change is made to BOTH execution engines.

If you're willing to pay the time penalty, there is no need to keep 2 instruction execution engines in PICEMU. The "execute()" engine, with breakcount[] set to 0xffffffffffffffff (two DWORDs (32-bit numbers) team up to form a 64-bit number), will serve as the engine for "Go", too. (You won't live long enough to expire the 64-bit breakcount counter.)

"update_port_display() is doing more than just updating the screen!"

Yup, it is. When I started emulating I/O stuff (software UARTs, I2C EEPROM), it was a convenient central location, since it was called for each change in I/O state. But inertia has set in, and the name was never changed to reflect its new functionality. Feel free to dig in.

"Can you tell me what all the source files do?"

Sure. There are a couple of files common to all PICEMU programs, and files particular to each different PICEMU.

PICTYPE.H
    This has the same name, but different contents, for each PICEMU. The MAKEPxx.BAT file copies the PIC type file (12C509A.H, 16F877.H, 16F84A.H, or 16F628.H) to PICTYPE.H, for use by PICEMU.H.

PICEMU.H
    This is the common include file, and sets up all the necessary #define's and declarations. The differences between processors are set up based on the contents of PICTYPE.H.

PSHELL.C and PSHELL2.C
    These are the "user interface" part of PICEMU. This is the code that handles user commands and PICEMU output. Sharing it keeps the different flavors of PICEMU consistent. This was originally a single file that grew too big and had to be split.
P12 is the 12C509A emulator. It was written as the basic 12-bit core program, hence its name. In addition to the common files above, it consists of:

12C509A.H
    This just defines the processor type.

P12.C
    This contains the individual instruction emulation functions, the instruction execution engines, the disassembly functions, the interrupt handlers (INT 9 keyboard, INT 8 time), and the I/O functions (port screen update, watch registers, OSCOPE, etc). This also contains the main() function and most variable declarations.

P12REGS.C
    This contains the register read/write functions and xlate_regs[] table. It also contains fnkey routines, called by the INT 9 handler (keyboard interrupt), to toggle I/O pin values.

P12ASM.C
    The assembler for the 12-bit core.

P12REGDE.H
    Register and bit names (P12 Register Defines).

MAKEP12.BAT
    A "make" file to compile and link P12. It also uses PALIGN (see below).

P14 is the 16F877 emulator. It was written as the basic 14-bit core program, hence its name. In addition to the common files above, it consists of:

16F877.H
    This just defines the processor type.

P14.C
    This contains the individual instruction emulation functions and the instruction execution engines. This also contains the main() function and most variable declarations.

P14_2.C
    This contains the disassembly functions, the interrupt handlers (INT 9 keyboard, INT 8 time, Serial port interrupt), and the I/O functions (port screen update, watch registers, OSCOPE, etc). This was split from P14.C when that file grew too big.

P14REGS.C
    This contains the register read/write functions and xlate_regs[] table. It also contains fnkey routines, called by the INT 9 handler (keyboard interrupt), to toggle I/O pin values.

P14ASM.C
    The assembler for the 14-bit core.

P14REGDE.H
    Register and bit names (P14 Register Defines).

MAKEP14.BAT
    A "make" file to compile and link P14. It also uses PALIGN (see below).

Note that changes to the 14-bit core need to be added to the P84 and P628 programs, too.
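For readers opening a PxxREGS.C file for the first time, the register read/write functions and the xlate_regs[] table it holds (see the xlate_regs[] question earlier) follow a pattern that can be sketched roughly as below. Only the names xlate_regs, readregs, and writeregs come from PICEMU; the array sizes, function signatures, helper names, and the simplified PCL behavior are all assumptions made for this illustration, not the actual source.

```c
#include <assert.h>

typedef unsigned char  BYTE;
typedef unsigned short WORD;

#define NUM_REGS 256            /* assumed: two banks of 128 registers */
#define BANK1    0x80           /* offset of the second bank */
#define PCL      0x02           /* PCL and STATUS appear in every bank */
#define STATUS   0x03

BYTE regfile[NUM_REGS];         /* hypothetical backing storage */
WORD pc;                        /* emulated program counter */
int  prefetch_flushed;          /* set when a write redirects execution */

WORD xlate_regs[NUM_REGS];                 /* maps mirrors to one register */
BYTE (*readregs[NUM_REGS])(WORD);          /* per-register read handlers */
void (*writeregs[NUM_REGS])(WORD, BYTE);   /* per-register write handlers */

/* Plain registers just use the backing array. */
BYTE read_plain(WORD r)          { return regfile[r]; }
void write_plain(WORD r, BYTE v) { regfile[r] = v; }

/* PCL is "active": reads come from pc, writes change pc.
   Simplified: real hardware also loads the high bits from PCLATH. */
BYTE read_pcl(WORD r) { (void)r; return (BYTE)(pc & 0xFF); }
void write_pcl(WORD r, BYTE v)
{
    (void)r;
    pc = (WORD)((pc & 0xFF00) | v);
    prefetch_flushed = 1;
}

void init_regs(void)
{
    WORD r;
    for (r = 0; r < NUM_REGS; r++) {
        xlate_regs[r] = r;              /* default: register maps to itself */
        readregs[r]   = read_plain;
        writeregs[r]  = write_plain;
    }
    xlate_regs[BANK1 + PCL]    = PCL;   /* bank 1 mirrors share one copy */
    xlate_regs[BANK1 + STATUS] = STATUS;
    readregs[PCL]  = read_pcl;
    writeregs[PCL] = write_pcl;
}

/* Every register access translates first, then dispatches. */
BYTE read_reg(WORD r)          { r = xlate_regs[r]; return (*readregs[r])(r); }
void write_reg(WORD r, BYTE v) { r = xlate_regs[r]; (*writeregs[r])(r, v); }
```

With this layout, a write to STATUS through its bank 1 address (0x83) and a later read through its bank 0 address (0x03) reach the same storage, so no copies have to be kept in sync.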
P84 is the 16F84A emulator. It was cloned from the basic 14-bit core (P14), and is maintained in parallel with P14 and P628 (changes to one need to be added to the others, even if they are "removed" by an #ifdef). In addition to the common files above, it consists of:

16F84A.H
    This just defines the processor type.

P84.C
    This contains the individual instruction emulation functions and the instruction execution engines. This also contains the main() function and most variable declarations.

P84_2.C
    This contains the disassembly functions, the interrupt handlers (INT 9 keyboard, INT 8 time), and the I/O functions (port screen update, watch registers, OSCOPE, etc). This was split from P84.C when that file grew too big.

P84REGS.C
    This contains the register read/write functions and xlate_regs[] table. It also contains fnkey routines, called by the INT 9 handler (keyboard interrupt), to toggle I/O pin values.

P14ASM.C
    The assembler for the 14-bit core. (Yes, this file is used "as-is" between all 14-bit core emulators).

P84REGDE.H
    Register and bit names (P84 Register Defines).

MAKEP84.BAT
    A "make" file to compile and link P84. It also uses PALIGN (see below).

P628 is the 16F628 emulator. It was cloned from the basic 14-bit core (P14), and is maintained in parallel with P14 and P84 (changes to one need to be added to the others, even if they are "removed" by an #ifdef). In addition to the common files above, it consists of:

16F628.H
    This just defines the processor type.

P628.C
    This contains the individual instruction emulation functions and the instruction execution engines. This also contains the main() function and most variable declarations.

P628_2.C
    This contains the disassembly functions, the interrupt handlers (INT 9 keyboard, INT 8 time, Serial port interrupt), and the I/O functions (port screen update, watch registers, OSCOPE, etc). This was split from P628.C when that file grew too big.
P628REGS.C
    This contains the register read/write functions and xlate_regs[] table. It also contains fnkey routines, called by the INT 9 handler (keyboard interrupt), to toggle I/O pin values.

P14ASM.C
    The assembler for the 14-bit core. (Yes, this file is used "as-is" between all 14-bit core emulators).

P628REGD.H
    Register and bit names (P628 Register Defines).

MAKEP628.BAT
    A "make" file to compile and link P628. It also uses PALIGN (see below).

"What is this 'magic' PALIGN program?"

It's not magic. PICEMU was designed so that all the "important" variables (the ones used millions of times a second in the interpreter loop) are DWORD aligned for speed. PALIGN takes the name of the PICEMU program (P12, P14, P84, or P628) and uses that to find the corresponding link map (named Pxx.OPT, created in the MAKEPxx.BAT file). It looks for specific variables, and based on their alignment re-compiles specific files with command-line #define's to add variables to fix alignment. Then, it re-links the files (if you want to look at the final, optimized link map, it is Pxx.MAP). This is built into the MAKEPxx.BAT files.

Since the data is nicely aligned, the program runs faster. It can be done by hand, but PALIGN just automates this tedious process, which is what computers are for.

It can also do code alignment for maximum speed, if you add the -T option to the command line. However, some additional assumptions have to be made.

First is that you can actually tell the difference in execution speed. If you're running under Windows, even in a full-screen DOS box, the underlying Windows processes can take enough time so that you can't tell the difference between different code alignments. So, I boot from a DOS floppy and do my compiles and testing from a RAM disk.

Next is that there is a standard program to execute. PALIGN expects to find P12.COD, P14.COD, P84.COD, and P628.COD to test the appropriate program.
I use a simple nested FOR loop, and both source and .COD files are included in the source .ZIP files.

The default is to align the code for 16-byte boundaries. Why use a 16-byte (128-bit) boundary on a 32-bit chip? Because, for the Pentium-III I'm doing most of my development on, there is an additional penalty for an instruction crossing the boundary in the internal (on-chip) CPU code cache. I've found that multiples of 16 bytes (128 bits) return the same speed, so I'm using a default of 16. You can use 4, 8, 16, 32, or 64 byte alignment (i.e. -T4 or -T8 or -T16 or -T32 or -T64). Note that I'm not sure if 16-byte alignment is best for AMD Athlon chips or not.

When PALIGN Pxx -T is done, the result will be an optimized executable, Pxx.EXE, and a timing report, Pxx.TIM. You can look at the timing analysis, and the last couple of lines can be used as a .BAT file to re-create the optimized Pxx.EXE.

I've found that, for normal development work, code alignment isn't worth the effort. I do use it for release versions, however.
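The boundary arithmetic behind both kinds of alignment discussed in this FAQ (ALIGNxxx padding for data, NOP padding for code) can be sketched in a few lines. This is not PALIGN's code; the function names and parameters are hypothetical and only illustrate the calculation.

```c
#include <assert.h>

/* Nonzero if an object of `size` bytes starting at `addr` straddles an
   `align`-byte boundary (align = 4 for the 32-bit case, 16 for -T16).
   Assumes size >= 1 and align >= 1. */
int crosses_boundary(unsigned long addr, unsigned size, unsigned align)
{
    assert(size >= 1 && align >= 1);
    return (addr / align) != ((addr + size - 1) / align);
}

/* Number of filler bytes (ALIGNxxx bytes or NOPs) needed to push `addr`
   up to the next multiple of `align`. The result is always in the range
   0 .. align-1, which is why more than 3 NOPs never helps for 4-byte
   alignment. */
unsigned pad_bytes(unsigned long addr, unsigned align)
{
    assert(align >= 1);
    return (unsigned)((align - (addr % align)) % align);
}
```

For example, a WORD at address 3 occupies bytes 3 and 4 and so crosses a 4-byte boundary, while the same WORD at address 2 does not; code at offset 0x1234 needs 12 filler bytes to reach the next 16-byte boundary.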