This is a basic FAQ that I hope will answer some basic questions about
the PICEMU source.  As I receive questions about the source, I will add
them to this file to create a more complete source code FAQ.

Questions, comments, compliments, complaints, and whatnot should be sent to
   picemulator@yahoo.com

I will be keeping the latest version of the source code and this FAQ at
   www.picemulator.com
and would appreciate a copy of any changes you make.  You will be given
full credit for your changes.


The first question people will ask is probably:
   "Why the H@LL did he use such a Piece-of-SH&T compiler?"

   There are actually a couple of reasons for this:
      1) I've been using this compiler since 1985.
      2) I've written many, many megabytes of code using it, including an
         80x86 interpreter.  Thus, I had lots of code to draw upon.
      3) I can write a "quick and dirty" program in it so fast it doesn't
         pay to think about using another compiler.  And since PICEMU
         wasn't going to be anything like the program it has become, I
         didn't think about using anything else.

   If you don't have it yet, you can download the compiler from
   www.picemulator.com

The next question will probably be:
   "Who the H@LL still uses DOS?"

   I do, for one.  I'm an "old fogie", who started writing for DOS before
   Microsoft ever got its hands on it (it was 86-DOS from Seattle Computer
   Products, for their S-100 boardset).

   Since then, I've done things like write anti-virus software (up to and
   including scanning / disinfecting Word and Excel files under DOS -- a
   very interesting headache), and I'm now doing embedded software.  I find
   that Windows can be a useful environment for casual computer use, but for
   the type of software I write it just gets in the way.

   It's also much easier to access hardware (serial, keyboard, video)
   directly under DOS than via Windows calls.

"What kind of C code is that?"
   It's K&R C code.  This was the "standard" before ANSI-C.  There are two
   noticeable differences:
      1) In K&R C, passed parameters are of the form:
               int foo(str,len)
                  char *str;
                  int len;
                  {
                  }
         versus the ANSI-C form of:
               int foo(char *str, int len)
                  {
                  }
      2) K&R C isn't as picky about function prototypes.  In fact, K&R
         function prototypes don't need a parameter list, just the function
         type.

         So, K&R C function prototypes would look like:
            int foo();
            void bar();
            long lseek();
         versus the ANSI-C prototypes of:
            int foo(char *, int);
            void bar(void);
            long lseek(int, long, int);

"Why do you use all those globals instead of locals and passed parameters?"
   There are a couple of reasons for this:
      1) Globals are faster:  The more that has to be passed on the stack,
         the more setup per function call.  This produces larger and slower
         code.
      2) It's easier to make additions to the code.  If I find that I need
         to call yet another function about 3 levels down, I don't have to
         worry about making sure that all necessary variables are being passed
         to all levels.  I just access the variable.
      3) Less data indirection (variables that need to be changed inside a
         function) means fewer pointers, less chance of changing the wrong
         data (i.e. pointers out of bounds, etc), and more stable code.

   On the down side, I find I keep doing ugly things like re-using a global
   that I know isn't currently in use rather than create a new variable.
   I hope you won't start dreaming of hunting me down and making me pay for
   my bad habits...

"What is this BYTE, WORD, and DWORD stuff?"
   Being an old assembly-language programmer, I find it natural to think
   in terms of basic storage types -- BYTEs are 8-bit unsigned
   variables, WORDs are 16-bit unsigned variables, and DWORDs are 32-bit
   unsigned variables.  So, I use typedef to declare my favorite storage
   types.  It's a lot easier to type "BYTE" instead of "unsigned char",
   and actually makes the code more transportable (since, for example,
   an "int" may be 16 or 32 (or even 64) bits, depending on the system
   processor, I only need change the typedef to keep WORD a 16-bit
   quantity).

   Interestingly, I rarely find a need for a signed number.  Though I
   do often use the shorthand "-1" for "0xffff".
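
   A minimal sketch of such typedefs follows.  It is an illustration, not
   the declarations from PICEMU.H: it leans on the ANSI <limits.h> header
   (which a 1985 K&R compiler may not have) to choose the built-in type,
   and the function name all_ones_word() is made up for this example.

```c
#include <limits.h>

/* Pick whichever built-in type has the wanted width on this compiler. */
typedef unsigned char BYTE;     /* 8 bits wherever CHAR_BIT is 8 */

#if USHRT_MAX == 0xFFFF
typedef unsigned short WORD;    /* 16 bits */
#else
#error "no 16-bit unsigned type found for WORD"
#endif

#if UINT_MAX == 0xFFFFFFFF
typedef unsigned int DWORD;     /* compilers with a 32-bit int */
#else
typedef unsigned long DWORD;    /* 16-bit int (DOS compilers): long is 32 bits */
#endif

/* The "-1" shorthand: converting -1 to an unsigned type sets every bit,
   so for a 16-bit WORD the result is 0xffff. */
WORD all_ones_word(void)
{
    WORD w = (WORD)-1;
    return w;
}
```

   Only the typedef lines change from one system to the next; code written
   in terms of BYTE, WORD, and DWORD stays the same.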

"How do you keep track of all the variables and/or code?"
   Unfortunately, that comes from writing it.  It has been said that every
   large, complex system that works has grown from a small, simple system
   that works, and I find that code is no different.

   Since PICEMU has thoroughly outgrown its original design, it's become
   fairly complex, and needs a fair amount of housecleaning.  Dig in!

"What's with all these ALIGNxxx variables?"
   This is a nod to the architecture of the CPU that PICEMU is running on.
   On a modern (Pentium or better) CPU, data is fetched/stored in 32-bit
   chunks.  So, if a 16-bit variable is aligned so that the low 8 bits
   are in one 32-bit chunk, and the high 8 bits are in the next 32-bit chunk,
   the CPU will have to do 2 32-bit reads to fetch it from memory, and 2
   32-bit writes to save it.  This slows down the program a LOT.

   So, variables that are frequently accessed (the ones used in the code
   interpreter loop) are best aligned so that they do not cross 32-bit
   boundaries.

   Rather than look at the addresses of all variables and make sure that
   BYTEs (8-bit variables), WORDs (16-bit variables), and DWORDs (32-bit
   variables) all "add up" so that nothing crosses a 32-bit boundary, I've
   added the ALIGNxxx variables, so that a variable and its associated
   ALIGNxxx variables always take up 32 bits.  Yes, this wastes data space,
   but I've got enough to spare, and it makes it possible to automatically
   adjust data alignment via the PALIGN.EXE program (see below).
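
   The arithmetic behind this can be sketched as below.  The function
   names are hypothetical (they do not come from the PICEMU source), and
   the sketch only models the rule "don't cross a 32-bit boundary"; it
   says nothing about how the compiler or linker actually lays out data.

```c
#include <stdint.h>

/* Return 1 if an object of `size` bytes (size >= 1) starting at byte
   address `addr` straddles a 32-bit (4-byte) boundary, or 0 if it fits
   inside a single 32-bit chunk. */
int crosses_dword(uint32_t addr, uint32_t size)
{
    uint32_t first_chunk = addr / 4;
    uint32_t last_chunk  = (addr + size - 1) / 4;
    return first_chunk != last_chunk;
}

/* Number of filler bytes that must follow a variable of `size` bytes so
   that the variable plus its filler occupies a whole number of 32-bit
   chunks. */
uint32_t filler_bytes(uint32_t size)
{
    return (4 - size % 4) % 4;
}
```

   For example, a WORD at address 3 crosses a boundary while one at
   address 2 does not, and a BYTE needs 3 filler bytes to fill out its
   32-bit slot.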

"What are these .TIM files?"
   This is another nod to the architecture of the CPU that PICEMU is running
   on.

   As mentioned above, data elements are faster to access when they do not
   span 32-bit boundaries.  The same is true for code.  If an instruction
   spans a 32-bit boundary, it takes 2 memory cycles to fetch, instead of
   just one.

   However, it's not as easy to figure code alignment as it is data alignment.
   I can't just throw in NOPs after most instructions to make sure that
   the following instruction doesn't cross a 32-bit boundary.

   What can be done, however, is to add NOPs in places where they are not
   in the code flow (and so are not being continually executed, taking up
   time), but do affect the alignment of the code following it.  For 32-bit
   alignment, adding 0 to 3 NOPs (0 to 3 bytes == 0 to 24 bits, more NOPs
   will just give us the same alignment MOD 32) will move the code below it.
   While not all instructions can be aligned, moving the code around will
   cause more or fewer of them to be aligned.  The more aligned instructions,
   the faster the code.  The PALIGN.EXE program can be told to repeatedly
   compile the code with different alignments, test the speed, and find
   the best set of alignments.  The results are saved in the .TIM files.
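
   To see why a few NOPs can matter, here is a small model of the effect.
   Note the assumption: PALIGN picks alignments by measuring real
   execution speed, whereas this sketch merely counts how many
   instructions straddle a boundary for each possible NOP count.  The
   function names and the instruction lengths used to try it out are
   invented for the example.

```c
#include <stddef.h>

/* Count how many instructions straddle a `boundary`-byte line when the
   first instruction starts at byte offset `start`.  len[] holds the
   length in bytes (1 or more) of each instruction, in order. */
int count_crossings(const unsigned char *len, size_t n,
                    unsigned start, unsigned boundary)
{
    unsigned addr = start;
    int crossings = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (addr / boundary != (addr + len[i] - 1) / boundary)
            crossings++;
        addr += len[i];
    }
    return crossings;
}

/* Try every NOP count from 0 to boundary-1 and return the one that
   leaves the fewest straddling instructions (lowest count wins ties). */
unsigned best_nop_count(const unsigned char *len, size_t n, unsigned boundary)
{
    unsigned pad, best = 0;
    int best_crossings = count_crossings(len, n, 0, boundary);

    for (pad = 1; pad < boundary; pad++) {
        int c = count_crossings(len, n, pad, boundary);
        if (c < best_crossings) {
            best_crossings = c;
            best = pad;
        }
    }
    return best;
}
```

   With instruction lengths of 2, 3, 1, and 2 bytes and a 4-byte boundary,
   0 NOPs leaves one instruction straddling a boundary, 1 NOP leaves two,
   and 2 NOPs leaves none, so 2 is the best choice in this model.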

   Of particular interest is the end of the .TIM file, which summarizes the
   alignment info as compiler commands (and command-line #define's) needed
   to optimize the code.  You can edit it down to those last few lines and
   use it as a .BAT file to create an aligned, optimized program.
   Unfortunately, the next change will mess it all up, and so I don't
   bother with code alignment until I'm doing a release.

   See the discussion of PALIGN below for more information.

"What is a 'small memory model'?"
   It's a relic of the early days of personal computers, Intel's decision
   to make the 8086 somewhat compatible with earlier generations of Intel
   processors, and lots of other stuff.

   If you're running a DOS program, you're still running code that is
   backwards compatible to the original IBM-PC (more or less...).  As a
   result, you're using 16-bit registers, so you have a 16-bit address
   space to play with.

   And a 16-bit address space is 64K (65536) bytes.

   But the PC can address more memory than that -- it has (for DOS mode) up
   to 640K.  To address more than 64K, Intel used "segment registers", which
   tell the CPU which 64K "segment" the 16-bit address register was referring
   to.
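
   In 8086 real mode the segment value is multiplied by 16 and added to
   the 16-bit offset to form a 20-bit physical address.  A short sketch
   (the function name is made up for illustration, and it uses the modern
   <stdint.h> types rather than anything from the PICEMU source):

```c
#include <stdint.h>

/* 8086 real mode: physical address = segment * 16 + offset. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

   So segment 0xB800 with offset 0 is physical address 0xB8000, and many
   different segment:offset pairs can name the same byte (1000:0010 and
   1001:0000 both give 0x10010).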

   There is a single code segment register, and a couple of data segment
   registers.  How they are used determines the "memory model" in use.

      Tiny memory model     Both code and data segments point to the same
                            64K segment.  Also known as the "COM" memory
                            model.

      Small memory model    Code and data each have their own 64K segment, so
                            a program can have up to 64K of code and 64K of
                            data.

      Medium memory model   Data has a single 64K segment, and there are
                            multiple 64K code segments.

      Compact memory model  Code has a single 64K segment, and there are
                            multiple 64K data segments.

      Large memory model    Both code and data have multiple 64K segments.

      Huge memory model     Both code and data have multiple 64K segments,
                            as in the large model, and a single data item
                            (such as an array) may be larger than 64K.

   The compiler used for PICEMU only supports the small memory model (what
   do you expect from 1985 technology?), so there is a limit of 64K of
   code and 64K of data.

   While this is tiny (sorry to use the word again) from the perspective of
   Windows programs, you can really do a lot in that much space.

"Why do I need a math coprocessor?"
   I use the 80x87 floating point libraries, rather than the emulated
   libraries, since every modern chip (basically starting with the Pentium)
   has a built-in math coprocessor.  This saves on code size, and floating
   point is actually used in the execution time calculations.

"Why the separate programs for each processor?"
   The earliest PICEMU code had both 12 and 14 bit cores in the same program.
   Unfortunately, things kept growing larger and more complex, to the point
   where I thought that it would be best to separate them.  Some traces of
   this early vintage still exist, however, in function names like
      void controls_12bit()
   and
      void controls_14bit()
   and the names of the 12C509A emulator (P12) and the 16F877 emulator
   (P14).  The 16F84A and 16F628 emulators are later additions, started by
   "cloning" P14, and modifying them into the individual processors.

   I'm trying to make the difference between P14, P84, and P628 just the
   #define's from PICEMU.H, but they aren't quite there yet.  Feel free to
   continue the process.

"What is this 'xlate_regs[]', 'readregs[]', and 'writeregs[]' business?"
   The PIC processors have quite a bit of hardware associated with their
   registers ('files' in Microchip speak).  Reading and/or writing the
   registers will fetch data from hardware or write data to hardware, and
   can change the state of the processor (like writing to PCL will change
   the instruction pointer and flush the prefetch queue, although reading
   PCL doesn't affect any hardware).

   As a result, reading from and writing to registers must be active code,
   rather than just accessing a data array.  Hence, the 'readregs' and
   'writeregs' arrays of pointers to functions.

   In addition, some registers are also common to all register banks
   (file banks).  The 'xlate_regs' array makes sure that all such registers
   point to a common register, so that there is no problem with keeping
   separate copies of registers (one in each bank) in sync.
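
   The idea can be sketched as follows.  Only the three array names come
   from the PICEMU source; the function signatures, the helper names, the
   table size, and the simplified PCL handling (it replaces the low byte
   of the program counter and ignores PCLATH) are assumptions made for
   this illustration.

```c
#include <stdint.h>

#define NUM_REGS   256          /* illustrative table size */
#define REG_PCL    0x02
#define REG_STATUS 0x03

static uint8_t regs[NUM_REGS];        /* backing storage for the registers */
static uint8_t xlate_regs[NUM_REGS];  /* register address -> shared address */
uint16_t emu_pc;                      /* emulated program counter */

/* Plain registers: reads and writes just touch the backing array. */
static uint8_t read_plain(uint8_t r)             { return regs[r]; }
static void    write_plain(uint8_t r, uint8_t v) { regs[r] = v; }

/* Writing PCL has a side effect: it changes the program counter. */
static void write_pcl(uint8_t r, uint8_t v)
{
    regs[r] = v;
    emu_pc = (uint16_t)((emu_pc & 0xFF00u) | v);
}

static uint8_t (*readregs[NUM_REGS])(uint8_t);
static void    (*writeregs[NUM_REGS])(uint8_t, uint8_t);

void init_regs(void)
{
    int i;
    for (i = 0; i < NUM_REGS; i++) {
        xlate_regs[i] = (uint8_t)i;     /* default: each address is itself */
        readregs[i]   = read_plain;
        writeregs[i]  = write_plain;
    }
    /* PCL and STATUS appear in bank 1 (0x82, 0x83) as well as in bank 0. */
    xlate_regs[0x80 + REG_PCL]    = REG_PCL;
    xlate_regs[0x80 + REG_STATUS] = REG_STATUS;
    writeregs[REG_PCL] = write_pcl;
}

uint8_t reg_read(uint8_t addr)
{
    uint8_t r = xlate_regs[addr];
    return readregs[r](r);
}

void reg_write(uint8_t addr, uint8_t v)
{
    uint8_t r = xlate_regs[addr];
    writeregs[r](r, v);
}
```

   Because the translation happens before the handler lookup, a write to
   STATUS through bank 1 is visible when read through bank 0, and a write
   to PCL through either bank runs the same side-effect code.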

"Why are there 2 instruction execution engines?"
   I wanted PICEMU to be fast (the slow speed of MPLAB's simulator was a
   major incentive to write PICEMU).  I also wanted it to be very flexible.
   These aren't always compatible.

   Each little bit of processing in the instruction execution loop, occurring
   millions of times a second, adds up a LOT.

   The "Go" command can use breakpoints, but doesn't care about the exact
   number of cycles that have executed, and uses the "godoit()" engine.
   The "Execute" command (the "execute()" engine) not only uses breakpoints,
   but also lets you tell it how many instruction cycles to execute before
   stopping.
   By not having the extra "how many instructions have executed" code in the
   "Go" command, it's about 10% faster.  (Throttle is also part of the
   "Execute" command, and also slows it down a little).

   But, there is a downside to this -- if I find I have to make a change to
   the execution engine, I have to make sure the same change is made to BOTH
   execution engines.

   If you're willing to pay the time penalty, there is no need to keep 2
   instruction execution engines in PICEMU.  The "execute()" engine, with
   breakcount[] set to 0xffffffffffffffff (two DWORDs (32-bit numbers) team
   up to form a 64-bit number), will serve as the engine for "Go", too.
   (You won't live long enough to expire the 64-bit breakcount counter).
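
   Counting down a 64-bit value held in two DWORDs can be sketched like
   this.  The function name, the choice of breakcount[0] as the low half,
   and the use of <stdint.h> are assumptions for the example; the real
   execute() engine may arrange things differently.

```c
#include <stdint.h>

typedef uint32_t DWORD;

/* Subtract one from the 64-bit count stored as two 32-bit halves
   (breakcount[0] = low half, breakcount[1] = high half).
   Returns 1 when the count has just reached zero, otherwise 0.
   Calling it on a count that is already zero wraps around to the
   maximum value. */
int breakcount_tick(DWORD breakcount[2])
{
    if (breakcount[0] == 0)
        breakcount[1]--;        /* borrow from the high half */
    breakcount[0]--;
    return breakcount[0] == 0 && breakcount[1] == 0;
}
```

   Starting from both halves set to 0xffffffff, the function keeps
   returning 0 for 2^64 - 2 calls before it finally reports that the
   count has expired.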

"update_port_display() is doing more than just updating the screen!"
   Yup, it is.  When I started emulating I/O stuff (software UARTs, I2C
   EEPROM), it was a convenient central location, since it was called for
   each change in I/O state.

   But inertia has set in, and the name was never changed to reflect its
   new functionality.  Feel free to dig in.

"Can you tell me what all the souce files do?"
   Sure.  There are a couple of files common to all PICEMU programs, and
   files particular to each different PICEMU.

   PICTYPE.H has the same name, but different contents, for each PICEMU.
   The MAKEPxx.BAT file copies the PIC type file (12C509A.H, 16F877.H,
   16F84A.H, or 16F628.H) to PICTYPE.H, for use by PICEMU.H

   PICEMU.H is the common include file, and sets up all the necessary
   #define's and declarations.  The differences between processors are
   set up based on the contents of PICTYPE.H
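
   The general pattern looks something like the sketch below.  Every
   macro name here (PIC12C509A, PIC16F84A, CORE_BITS, OPCODE_MASK, and so
   on) and the function opcode_mask() are hypothetical; the real names in
   PICTYPE.H and PICEMU.H may be quite different.

```c
/* --- stand-in for PICTYPE.H (a copy of, say, 16F84A.H) --- */
#define PIC16F84A               /* hypothetical processor-type macro */

/* --- stand-in for a small part of PICEMU.H --- */
#if defined(PIC12C509A)
#define CORE_BITS 12            /* 12-bit instruction words */
#elif defined(PIC16F877) || defined(PIC16F84A) || defined(PIC16F628)
#define CORE_BITS 14            /* 14-bit instruction words */
#else
#error "PICTYPE.H did not define a known processor"
#endif

/* Mask for one instruction word: 0x0FFF or 0x3FFF. */
#define OPCODE_MASK ((1u << CORE_BITS) - 1u)

unsigned opcode_mask(void)
{
    return OPCODE_MASK;
}
```

   The rest of the code then tests the derived #define's rather than the
   processor name, so swapping PICTYPE.H is enough to retarget the build.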

   PSHELL.C and PSHELL2.C are the "user interface" part of PICEMU.  This
   is the code that handles user commands and PICEMU output.  This keeps
   different flavors of PICEMU consistent.  This was originally a single
   file that grew too big and had to be split.

   P12 is the 12C509A emulator.  It was written as the basic 12-bit core
   program, hence its name.  In addition to the common files above, it
   consists of:
      12C509A.H    This just defines the processor type.
      P12.C        This contains the individual instruction emulation
                   functions, the instruction execution engines, the
                   disassembly functions, the interrupt handlers (INT 9
                   keyboard, INT 8 time), and the I/O functions (port screen
                   update, watch registers, OSCOPE, etc).
                   This also contains the main() function and most variable
                   declarations.
      P12REGS.C    This contains the register read/write functions and
                   xlate_regs[] table.  It also contains fnkey routines,
                   called by the INT 9 handler (keyboard interrupt), to
                   toggle I/O pin values.
      P12ASM.C     The assembler for the 12-bit core.
      P12REGDE.H   Register and bit names (P12 Register Defines)
      MAKEP12.BAT  A "make" file to compile and link P12.  It also uses
                   PALIGN (see below).

   P14 is the 16F877 emulator.  It was written as the basic 14-bit core
   program, hence its name.  In addition to the common files above, it
   consists of:
      16F877.H     This just defines the processor type.
      P14.C        This contains the individual instruction emulation
                   functions and the instruction execution engines.
                   This also contains the main() function and most variable
                   declarations.
      P14_2.C      This contains the disassembly functions, the interrupt
                   handlers (INT 9 keyboard, INT 8 time, Serial port
                   interrupt), and the I/O functions (port screen
                   update, watch registers, OSCOPE, etc).  This was split
                   from P14.C when that file grew too big.
      P14REGS.C    This contains the register read/write functions and
                   xlate_regs[] table.  It also contains fnkey routines,
                   called by the INT 9 handler (keyboard interrupt), to
                   toggle I/O pin values.
      P14ASM.C     The assembler for the 14-bit core.
      P14REGDE.H   Register and bit names (P14 Register Defines)
      MAKEP14.BAT  A "make" file to compile and link P14.  It also uses
                   PALIGN (see below).
   Note that changes to the 14-bit core need to be added to the P84 and
   P628 programs, too.

   P84 is the 16F84A emulator.  It was cloned from the basic 14-bit core
   (P14), and is maintained in parallel with P14 and P628 (changes to one
   need to be added to the others, even if they are "removed" by an #ifdef).
   In addition to the common files above, it consists of:
      16F84A.H     This just defines the processor type.
      P84.C        This contains the individual instruction emulation
                   functions and the instruction execution engines.
                   This also contains the main() function and most variable
                   declarations.
      P84_2.C      This contains the disassembly functions, the interrupt
                   handlers (INT 9 keyboard, INT 8 time), and the I/O
                   functions (port screen update, watch registers, OSCOPE,
                   etc).  This was split from P84.C when that file grew too
                   big.
      P84REGS.C    This contains the register read/write functions and
                   xlate_regs[] table.  It also contains fnkey routines,
                   called by the INT 9 handler (keyboard interrupt), to
                   toggle I/O pin values.
      P14ASM.C     The assembler for the 14-bit core.  (Yes, this file is
                   used "as-is" between all 14-bit core emulators).
      P84REGDE.H   Register and bit names (P84 Register Defines)
      MAKEP84.BAT  A "make" file to compile and link P84.  It also uses
                   PALIGN (see below).

   P628 is the 16F628 emulator.  It was cloned from the basic 14-bit core
   (P14), and is maintained in parallel with P14 and P84 (changes to one
   need to be added to the others, even if they are "removed" by an #ifdef).
   In addition to the common files above, it consists of:
      16F628.H     This just defines the processor type.
      P628.C       This contains the individual instruction emulation
                   functions and the instruction execution engines.
                   This also contains the main() function and most variable
                   declarations.
      P628_2.C     This contains the disassembly functions, the interrupt
                   handlers (INT 9 keyboard, INT 8 time, Serial port
                   interrupt), and the I/O functions (port screen update,
                   watch registers, OSCOPE, etc).  This was split from
                   P628.C when that file grew too big.
      P628REGS.C   This contains the register read/write functions and
                   xlate_regs[] table.  It also contains fnkey routines,
                   called by the INT 9 handler (keyboard interrupt), to
                   toggle I/O pin values.
      P14ASM.C     The assembler for the 14-bit core.  (Yes, this file is
                   used "as-is" between all 14-bit core emulators).
      P628REGD.H   Register and bit names (P628 Register Defines)
      MAKEP628.BAT A "make" file to compile and link P628.  It also uses
                   PALIGN (see below).

"What is this 'magic' PALIGN program?"
   It's not magic.  PICEMU was designed so that all the "important" variables
   (the ones used millions of times a second in the interpreter loop) are
   DWORD aligned for speed.  PALIGN takes the name of the PICEMU program
   (P12, P14, P84, or P628) and uses that to find the corresponding
   link map (named Pxx.OPT, created in the MAKEPxx.BAT file).  It looks
   for specific variables, and based on their alignment re-compiles
   specific files with command-line #define's to add variables to fix
   alignment.  Then, it re-links the files (if you want to look at the
   final, optimized link map, it is Pxx.MAP).  This is built into the
   MAKEPxx.BAT files.

   Since the data is nicely aligned, the program runs faster.  It can be
   done by hand, but PALIGN just automates this tedious process, which is
   what computers are for.

   It can also do code alignment for maximum speed, if you add the -T
   option to the command line.  However, some additional assumptions
   have to be made.

   First is that you can actually tell the difference in execution speed.
   If you're running under Windows, even in a full-screen DOS box, the
   underlying Windows processes can take enough time so that you can't
   tell the difference between different code alignments.  So, I boot from
   a DOS floppy and do my compiles and testing from a RAM disk.

   Next is that there is a standard program to execute.  PALIGN expects to
   find P12.COD, P14.COD, P84.COD, and P628.COD to test the appropriate
   program.  I use a simple nested FOR loop, and both source and .COD files
   are included in the source .ZIP files.

   The default is to align the code for 16-byte boundaries.  Why use a
   16-byte (128-bit) boundary on a 32-bit chip?  Because, for the Pentium-III
   I'm doing most of my development on, there is an additional penalty for
   an instruction crossing the boundary in the internal (on-chip) CPU code
   cache.  I've found that multiples of 16-bytes (128 bits) return the
   same speed, so I'm using a default of 16.  You can use 4, 8, 16, 32, or 64
   byte alignment (i.e. -T4 or -T8 or -T16 or -T32 or -T64).  Note that I'm
   not sure if 16-byte alignment is best for AMD Athlon chips or not.

   When PALIGN Pxx -T is done, the result will be an optimized executable,
   Pxx.EXE, and a timing report Pxx.TIM.  You can look at the timing
   analysis, and the last couple of lines can be used as a .BAT file to
   re-create the optimized Pxx.EXE.

   I've found that, for normal development work, code alignment isn't worth
   the effort.  I do use it for release versions, however.