Where to Start ?
The first place to turn these days when embarking upon a quest for knowledge
is "The Internet". If the information you want can't be found there, it doesn't
exist, or you're unlikely to find it by any other means.
One of the first significant references found was the "Basic Stamp Divided By
Four", which is also known by the appellation "BS/4".
This is a PBASIC interpreter, written by Antti Lukats of Sistudio, which
is a cut down version of the Basic Stamp, running on the very limited PIC
16C84 and 16F84 chips. These have 64 bytes of EEPROM, allowing programs which
are up to a quarter of the maximum size of the original Basic Stamp programs
to run. Hence its name.
The interpreter is available for download, for burning into a PIC, from
Reflection Technology.
Unfortunately, development has lapsed, and there is no source code available.
Another interpreter, along similar lines to the BS/4, is the ST1-64 from the
BSS Club, the PIC Interpreters Club.
Again, there is an interpreter available for burning into a PIC, but no
source code, and the site doesn't seem to have been updated since March, 2000.
Prolonged searching for a ready written, freely available, royalty free, source
code provided, PBASIC interpreter got me nowhere.
The only way forward was to create an interpreter from scratch.
Not an easy option, but one which was appealing. Even if I never created an
actual interpreter, attempting to get there is part of the fun of software
engineering.
Getting Underway
Before even thinking about how to create your own PBASIC interpreter, the first
thing you need to know is what a compiled PBASIC program looks like. We don't
particularly care about how we compile a program, as there are a couple of
free tools to do this available ( these are discussed later ). All the
interpreter needs is the compiled image to execute.
Unfortunately, but unsurprisingly, Parallax are in the business of making money
and aren't publicly disclosing how their compiler or interpreter work or what
the compiled code looks like.
Luckily, there are some enterprising individuals out there who are never
thwarted by brick walls. One such person is Chuck McManis, who reverse
engineered almost the entire Basic Stamp compiled code in his now well-known
"Decoding the Basic Stamp" article, which he published on the web.
With Chuck's useful information, it's fairly easy to design a disassembler for
the compiled image, and creating a disassembler is a good place to start.
Firstly, it's a good way
to get to grips with the compiled code format; to understand how each line is
compiled, and find all those areas which Chuck hasn't touched upon - such as the
GOSUB Return Address Table, and the allocation for user defined EEPROM data.
Although a disassembler only reports what's in the compiled image, and doesn't
care about the semantics of execution, it is a step forward towards the creation
of an interpreter.
Of course, writing a disassembler doesn't come without its own trials and
tribulations.
The compiled "DEBUG" statement doesn't have any of the information about the
original source code stored with it, only an indicator of where it sits in
the compiled image, and is thus impossible to disassemble correctly.
The "FOR" statement has much of the source code information supplied moved to
the compiled code of the "NEXT" statement which means the two have to be
married up again. The job here is made easier because each compiled "NEXT" token
jumps to the line after the "FOR" statement, so the "FOR" to which a "NEXT"
applies can be easily found.
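The marrying up of "FOR" and "NEXT" can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used in UNSTAMP; the function name, the tuple layout and the use of plain integers for addresses are all assumptions made for the example.

```python
def pair_next_with_for(statements):
    """Map the index of each NEXT statement to the index of its FOR.

    `statements` is a list of (address, keyword, jump_target) tuples in
    image order; jump_target is None for statements which do not jump.
    This layout is hypothetical and chosen only for the illustration.
    """
    index_of = {address: i for i, (address, _, _) in enumerate(statements)}
    pairs = {}
    for i, (_, keyword, target) in enumerate(statements):
        if keyword != "NEXT":
            continue
        # NEXT jumps to the line after its FOR, so step back one statement.
        for_index = index_of[target] - 1
        if for_index < 0 or statements[for_index][1] != "FOR":
            raise ValueError(f"NEXT at index {i}: target is not preceded by a FOR")
        pairs[i] = for_index
    return pairs


program = [
    (0, "FOR", None),
    (10, "HIGH", None),
    (20, "FOR", None),
    (30, "PAUSE", None),
    (40, "NEXT", 30),   # inner loop: jumps to the line after the second FOR
    (50, "NEXT", 10),   # outer loop: jumps to the line after the first FOR
]
pairs = pair_next_with_for(program)
```

Nested loops fall out naturally, because each "NEXT" carries its own jump target and no matching by nesting depth is needed.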
Additionally, testing the disassembler shows up a number of interesting features
of the PBASIC language, compiler technology and interpreter implementation.
The mysteriously missing 0000 arithmetic operator code in McManis's
article revealed itself to be a unary negation assignment operator ( "= -" ),
and the reason for having the 0001, "=", arithmetic operator
became clear. If an assignment statement has the same variable on the
left of the assignment as on the right ( as in "LET B0 = B0 * 2" ), then the
"=" opcode is left out, as an optimisation, only appearing when an assignment
is made into a different variable on the left to that on the right.
The 11-bit address value starting in the second byte of the compiled image
turned out to be a pointer to the byte and bit immediately after the end of
the program code. This is also the start of the GOSUB Return Address Table, if
it exists.
Despite all these issues cropping up, the disassembler turned out to be quite
simple to implement, although it is rather bare on the user interfacing side.
The UNSTAMP Disassembler
The UNSTAMP Disassembler decodes a Basic Stamp 1 ( BS1 ) compiled image from
the CODE.OBJ file created during compilation. It handles compiled PBASIC
programs targeted at the BS1 only.
The UNSTAMP Disassembler is available as an MS-DOS executable and as Basic
Source Code.
Downloading the UNSTAMP Disassembler
The UNSTAMP Disassembler Version 2.03 is available for download as part of the
DIYSTAMP.ZIP Distribution Archive. The source code
consists of the UNSTAMP.BAS file and a number of .BAZ "include files" ( please
see README.TXT in the Distribution Archive for details ), and the UNSTAMP.EXE
file is the disassembler executable. The entire Distribution Archive can be
downloaded by clicking the link below ...
Download DIYSTAMP.ZIP - Version 9.12 ( 357 KB )
Version 2.03 of the UNSTAMP Disassembler is the latest version.
Although I am running a Virus Checker on my development PC, please check
the DIYSTAMP.ZIP and UNSTAMP.EXE files after downloading and unzipping to
ensure that they are virus free.
Using the UNSTAMP Disassembler
The UNSTAMP Disassembler must be run in the directory where the CODE.OBJ file
is located.
The disassembler can either be placed in that directory or placed
in a directory which is included in the "SET PATH=" environment variable, or
it can be run by prefixing its name with the fully qualified path of where it
is installed. In short, the UNSTAMP Disassembler is just like any other MS-DOS
executable you will encounter.
The disassembler is run by using the "UNSTAMP filename" command. This
will read the CODE.OBJ file and create the filename.LST and
filename.DIS files. These files are described below.
The filename must not include an extension, and must not include any
disk, directory or other relative path information. It must not be greater
than eight characters long; long filenames are not supported.
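The filename rules can be expressed as a small check. The Python sketch below encodes only the three rules stated above; the function name is hypothetical, and UNSTAMP itself may apply its rules differently.

```python
def is_valid_unstamp_name(name: str) -> bool:
    """Return True if `name` meets the stated UNSTAMP filename rules."""
    # Must be present, and no more than eight characters long.
    if not name or len(name) > 8:
        return False
    # No extension ( "." ), and no disk ( ":" ) or directory ( "\" or "/" ) parts.
    forbidden = set(".:\\/")
    return not any(ch in forbidden for ch in name)
```

For example, "BLINK" passes, while "BLINK.LST", "C:\BLINK" and "LONGFILENAME" are all rejected.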
Command line help, and version details, can be obtained by using the
"UNSTAMP /?" command.
Output Files
The UNSTAMP Disassembler creates two output files ...
A .LST file which shows how and where the compiled image tokens are held in
the compiled image, and the source code for the compiled lines.
A .DIS file which contains just source code determined from the compiled image.
The best way to see the functionality of the disassembler is to try it; create
a .BS1 or .BAS source code file ( including the "BSAVE" line, and without using
any "SYMBOL" definitions ), compile using the Parallax or BSS Club compilers,
disassemble using UNSTAMP, and look at the two resulting files.
For an explanation of the tokens held within the compiled image, and shown in
the disassembler output, please see Chuck McManis's "Decoding the Basic Stamp"
article.
Tokens are shown against their address in the compiled image, which is given in
the form xx:y, where xx indicates the byte in which the token
starts ( 00 being the first address of the image, and FF being the last ), and
y is the bit offset within that byte where the token starts ( with 0
being the leftmost, most significant bit and 7 being the rightmost, least
significant bit of the byte ).
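As an illustration of the xx:y notation, here is a short Python sketch. It assumes a token position is held as a single bit count from the start of a 256 byte image ( 2048 bits, which is why an address needs 11 bits: 8 for the byte and 3 for the bit ). The function names are my own and do not come from the disassembler source.

```python
def format_token_address(bit_position: int) -> str:
    """Format a bit position within a 256 byte image as 'xx:y'."""
    if not 0 <= bit_position < 256 * 8:
        raise ValueError("bit position is outside a 256 byte image")
    byte_index, bit_offset = divmod(bit_position, 8)
    return f"{byte_index:02X}:{bit_offset}"


def bit_is_set(image: bytes, bit_position: int) -> bool:
    """Read one bit, where offset 0 is the most significant bit of a byte."""
    byte_index, bit_offset = divmod(bit_position, 8)
    return bool(image[byte_index] & (0x80 >> bit_offset))
```

With this numbering, bit position 13 is shown as 01:5, the very first bit of the image is 00:0 and the very last is FF:7.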
Note that program statements run down the compiled image ( from address 00
towards FF ), while user defined EEPROM data runs upwards ( from FF towards
00 ). The disassembler displays the user defined EEPROM data in the order
it would have appeared in the original source code.
The disassembler can only create a source code representation based upon the
information which is held within the CODE.OBJ file, and therefore it is
impossible to know what label names were used and what "SYMBOL" definitions
were made in the source code.
Labels are auto-generated in numerically ascending order throughout the
disassembly files and are prefixed by "L" ( for "label" ) if they are the
destination of an "IF ... THEN", "GOTO" or "BRANCH" statement, and prefixed by
"S" ( for "subroutine" ) if they are the destination of a "GOSUB" statement.
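The label naming scheme can be sketched as follows. This Python example is a guess at one reasonable implementation rather than a description of UNSTAMP's own code: it assumes a single counter shared by "L" and "S" labels, numbering which starts at 1 and follows address order, and that an address reached by both "GOSUB" and "GOTO" is treated as a subroutine.

```python
def assign_labels(jump_targets, gosub_targets):
    """Map each destination address to an auto-generated label name.

    jump_targets:  addresses reached by IF ... THEN, GOTO or BRANCH.
    gosub_targets: addresses reached by GOSUB.
    """
    gosub_set = set(gosub_targets)
    all_targets = sorted(set(jump_targets) | gosub_set)
    labels = {}
    for number, address in enumerate(all_targets, start=1):
        # Assumption: a GOSUB destination always gets the "S" prefix.
        prefix = "S" if address in gosub_set else "L"
        labels[address] = f"{prefix}{number}"
    return labels
```

Given jump destinations at 40 and 10 and a "GOSUB" destination at 25, this yields L1 for address 10, S2 for address 25 and L3 for address 40.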
Because "DEBUG" statements cannot be disassembled, these are always disassembled
as "DEBUG B0".
Although both the Parallax and BSS Club compilers appear to compile source
code using "brute force and ignorance", keeping almost everything entered as
source code within the compiled image, without optimisation, there may be some
optimisations applied which result in some source code being disassembled
differently to the original source code. The disassembled code should, however,
be as functionally correct as the original.
The UNSTAMP Source Code
The UNSTAMP Disassembler is written in Basic and is compatible with the
FirstBasic 1.00 shareware compiler and the PowerBasic 2.10f compiler
from PowerBasic Inc.
The source should be fairly easily convertible into other variants of the Basic
language, including QBasic and Visual Basic, and even alternative programming
languages, such as C, C++ and Java.
Reporting Bugs in the UNSTAMP Disassembler
The UNSTAMP disassembler is based upon the sterling work by Chuck McManis,
supplemented by my own experiments to fill in the details missing from his
reverse engineering efforts.
With Parallax's compiled image definitions otherwise cloaked in secrecy,
there is no real way to confirm that the reverse engineering effort is
complete and accurate in all respects, other than to generate code through
their compiler, disassemble it and check the two sources match.
This is a tortuous task, as there are a massive number of combinations of
almost every statement, and so although there has been a considerable effort
to test the interpretation of the compiled image, testing cannot be
exhaustive.
This means that there may be errors in the disassembler, both due to
misinterpretations of the compiled image format and in coding the disassembler
itself.
Before reporting a bug, please check that you have the latest Version 2.03
UNSTAMP Disassembler.
If you do find any significant bugs, please send an email detailing the
problem ( with an indication as to how to replicate the bug, and specifying
the disassembler version number ) to hippy@psynet.net.
Licensing
The UNSTAMP Disassembler is provided as Freeware for personal, educational and
non-commercial use. However, it must not be used to reverse engineer, or
to attempt to reverse engineer, any compiled images which you have not created
yourself, or compiled images which have been delivered with, or pre-downloaded
into a Basic Stamp, or any similar device.
The UNSTAMP Disassembler Executable and Source Code may be modified and
redistributed providing that the Licensing and Copyright statements contained
therein are unchanged, and no charge other than distribution cost is made.
For commercial use of the UNSTAMP Disassembler, or any program derived from it,
please contact the original author, and Copyright holder, at hippy@psynet.net.
Warranty
The UNSTAMP Disassembler is provided "as is", without any warranty of any kind,
and without any guarantee as to fitness for purpose.
Downloading, installing and using the UNSTAMP Disassembler is undertaken at your
own risk.
Compilers
To disassemble a compiled image, you need to create one. This means you will
need a PBASIC compiler, and you need one which stores the object code as a file
on disk, so the disassembler can access it.
There are two primary choices for the compiler: the official STAMP.EXE compiler
from Parallax, and the ST1.EXE compiler from BSS Club, which is part of their
ST1-64 package.
Both compilers are MS-DOS based ( and have been used successfully at an MS-DOS
Prompt under Windows 98 Second Edition ), and are free to download.
The Parallax STAMP.EXE Compiler
To create a CODE.OBJ file, a "BSAVE" line must be added to the PBASIC Source
Code which you are compiling.
The source code is compiled using the "STAMP filename.ext"
command ( the file extension can be left off if it is .BAS ), which invokes
the full screen editor, compiler and loader. The Alt-R key is used to
compile the source, which is then normally downloaded into a connected Basic
Stamp automatically. If there is no Basic Stamp connected, the compiler
will produce an error message, but will have created a CODE.OBJ file.
The STAMP.EXE compiler is exited by using the ESC key.
The BSS Club ST1.EXE Compiler
Unlike the Parallax compiler, the ST1.EXE compiler is entirely command line
based, and will create a CODE.OBJ file by using the
"ST1 filename.ext" command. The filename extension is not
optional, and must be included, but having a "BSAVE" line in the PBASIC source
code is optional.
Because the BSS Club compiler is entirely command line based, it is much easier
to use than the Parallax compiler, and will normally be the preferred compiler
to use. It must, however, be borne in mind that the BSS Club compiler has been
developed on the back of a reverse engineering effort, and may not always create
the same object code as the Parallax compiler, nor support the PBASIC language
completely.
In particular, the ST1 compiler will not compile assignment statements which
use unary negation ( such as "LET B0 = -1" ), whereas the official Parallax
compiler will, and it throws a runtime error for "EEPROM ( 256 )", and when any
EEPROM value larger than 8 bits is used. It also reports an EEPROM full error
when a program fills up the entire EEPROM exactly.
On the other hand, it will accept self-assignments ( such as "LET B0 = B0" )
which the Parallax compiler won't, unless followed by an arithmetic operator.
The BS4.EXE Compiler
The Basic Stamp Divided By Four interpreter includes BS4.EXE, a full-screen
editor and compiler. This is a very nice MS-DOS "Window" application.
Unfortunately, the compiler is not as robust as either the Parallax or BSS Club
compilers, allowing more than sixteen GOSUB's to be compiled without error, and
generating completely wrong compiled image tokens for "LET B0 = -B1", and it
probably has other bugs as well.
It is therefore recommended that the BS4.EXE compiler is not used, unless there
is a particular reason for doing so, such as comparing the output of different
compilers. Anything it does produce by way of a compiled image should be
treated as suspicious, or wrong.
Lessons Learned
Compiling test programs and disassembling the compiled image revealed an awful
lot of information about the compiled image and how parts of the interpreter
work, beyond the information provided by McManis.
Both the Parallax and BSS Club compilers do very little optimisation; if you
write "LET B0 = B1 - 0 - 0 - 0", then all those unnecessary zero subtractions
are included in the compiled image. If you put an "END" at the end of your
program, the compiler will still add one of its own.
The only optimisation which has been seen is that where the variable
on the left hand side of an equation is the same as the one on the right; such
as in "LET B0 = B0 + 1". In these cases, no assignment operator appears in the
compiled image, only the equation itself.
Many instructions include the addresses of the subsequent instructions, or a
part of themselves. Presumably this is because the original processor in the
Basic Stamp had a limited stack and constrained memory, so it's easier to store
these addresses away somewhere and come back to them later.
In the case of the power-down and sleep commands, I guess it's easier to store
the 'start here when you wake-up' pointer than the current interpreter address,
but it's not at all clear at the moment why this would be the case.
The handling of GOSUB's is quite interesting. In a normal processor, every
GOSUB will push its return address on to the stack, every RETURN will pop
the return address, and continue executing from there.
The Basic Stamp's limited stack size precludes this, so a clever technique is
used. Every GOSUB is numbered, and every compiled GOSUB line includes this
number and the target destination for the GOSUB. When the GOSUB is executed,
the number of the GOSUB rather than the return address is pushed to a virtual
stack ( implemented as an array in memory ), and on return the number of the
GOSUB is popped from the virtual stack. The GOSUB number is used as an index
into an array of GOSUB Return Addresses stored in the compiled image, after
the rest of the compiled program. The address so indexed is taken, and
execution continues at the address, which is one after that of the GOSUB
instruction.
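The scheme can be modelled in a few lines of Python. This is a sketch of the mechanism as described above, not of the real interpreter's internals; the class and method names are hypothetical, and the return address table is simply passed in as a list indexed by GOSUB number.

```python
class GosubModel:
    """Model GOSUB / RETURN using GOSUB numbers instead of return addresses."""

    def __init__(self, return_address_table):
        # One entry per GOSUB in the program, as stored after the program code.
        self.return_address_table = list(return_address_table)
        self.virtual_stack = []

    def gosub(self, gosub_number, target_address):
        """Push the GOSUB's number and return the address to jump to."""
        self.virtual_stack.append(gosub_number)
        return target_address

    def ret(self):
        """Pop a GOSUB number and look up where execution continues."""
        gosub_number = self.virtual_stack.pop()
        return self.return_address_table[gosub_number]


model = GosubModel([0x12, 0x30])      # return addresses for GOSUB 0 and GOSUB 1
pc = model.gosub(1, 0x80)             # GOSUB number 1 calls the routine at 0x80
pc = model.gosub(0, 0x90)             # nested call from GOSUB number 0
first_return = model.ret()            # resumes after GOSUB number 0
second_return = model.ret()           # resumes after GOSUB number 1
```

The saving is that each stack entry only needs enough bits to hold a GOSUB number, rather than a full address into the compiled image.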
There is a limit of just sixteen GOSUB's which can appear in any PBASIC
program. Although this limit appears to have been somewhat arbitrarily set,
clues to the limitation are found in the Basic Stamp's operational description.
The variable w6 is reserved for use within subroutines and must not be
used by the programmer within any subroutines, and the contents of w6
will be 'corrupted' after a GOSUB. Given that there are sixteen GOSUB Return
Addresses allowed, identified by a 4-bit number, and subroutines can be
nested four deep; it looks likely that w6 is being used as the
GOSUB Return Address Stack.
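The arithmetic behind that guess is that four 4-bit GOSUB numbers need exactly 16 bits, the size of a word variable. The Python sketch below shows one way such a word could behave as a stack. The nibble ordering, and the idea that pushing shifts older entries along, are purely my assumptions for illustration; nothing here is confirmed by Parallax.

```python
WORD_MASK = 0xFFFF   # w6 is a 16-bit word variable


def push_gosub(w6: int, gosub_number: int) -> int:
    """Shift a 4-bit GOSUB number into the low nibble of the word."""
    if not 0 <= gosub_number <= 15:
        raise ValueError("GOSUB numbers must fit in 4 bits")
    return ((w6 << 4) | gosub_number) & WORD_MASK


def pop_gosub(w6: int):
    """Return ( new word, GOSUB number taken from the low nibble )."""
    return w6 >> 4, w6 & 0xF


w6 = 0
for number in (1, 2, 3, 4):
    w6 = push_gosub(w6, number)       # four nested calls fill the word
full = w6
overflowed = push_gosub(w6, 5)        # a fifth call pushes the oldest entry out
```

In this model a fifth nested call silently loses the oldest return entry, which would be consistent with the four deep limit, and any program write to the word would corrupt the pending returns.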
It is quite remarkable that none of the compilers make any attempts to check for
the dangerous use of w6, and by association b12 and b13, or
the overly deep nesting of GOSUB's, but they don't.
The FOR / NEXT constructs are handled differently in PBASIC than in other
Basic dialects. Most of the semantics of the FOR statement
are transferred to the NEXT statement during compilation, making the FOR a
simple, initial assignment, with NEXT determining whether to loop again or
not.
This means that a FOR / NEXT loop will execute at least once in PBASIC whereas
it would not execute at all in other dialects. The statement, "FOR B0 = 1 TO 0",
will always execute once. Having "FOR B0 = 255 TO 0 STEP -1" causes a more
severe problem. The Basic Stamp deals only with positive maths, so when the
index variable ( b0 ) reaches zero and is decremented, the result is not -1
but 255, which is in range, and thus the loop will continue for ever.
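Both cases can be reproduced with a small Python model of a byte variable loop. The model assumes that the body runs first, that "NEXT" then adds the step with 8-bit wrap-around, and that the loop repeats while the result has not passed the end value. The exact test used by the real interpreter is an assumption on my part, and the function name and the pass limit are hypothetical.

```python
def count_loop_passes(start, end, step, limit=1000):
    """Return ( passes, terminated ) for a modelled FOR / NEXT byte loop.

    `limit` stops the model itself from running for ever.
    """
    value = start & 0xFF
    passes = 0
    while passes < limit:
        passes += 1                        # the body always runs before any test
        value = (value + step) & 0xFF      # NEXT: positive only, 8-bit maths
        if step >= 0:
            in_range = value <= end
        else:
            in_range = value >= end        # never false when end is 0
        if not in_range:
            return passes, True
    return passes, False


once = count_loop_passes(1, 0, 1)          # FOR B0 = 1 TO 0
endless = count_loop_passes(255, 0, -1)    # FOR B0 = 255 TO 0 STEP -1
normal = count_loop_passes(1, 3, 1)        # FOR B0 = 1 TO 3
```

In this model the first loop makes a single pass and stops, the second is still running when the pass limit is reached, and an ordinary "FOR B0 = 1 TO 3" makes three passes.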
It is unclear why there is a separate "FOR" token, when the initial assignment
could have been compiled as an equivalent "LET" construct, nor is it clear why,
given that the "FOR" token exists, simple assignments to variables ( ie
"LET B0=B1" ) were not optimised as a "FOR" initialisation, which uses less
compiled code space.
Surprisingly, as I discovered later, the hardest part of compiling an image
and disassembling it turned out to be handling the data held in the first byte
of the compiled image. This is the number of the byte which contains the first
unused bit of EEPROM code ( after the tokens and GOSUB Return Address Table ),
and it is stored inverted in the compiled image.
The 11-bit address, which starts in the second byte of the compiled image,
points to the first unused bit after the tokens, and points at the GOSUB Return
Address Table if one exists.
While this can be generated relatively easily ( once it's realised exactly what
the value signifies ), it is not possible to use this value to quickly
determine which is the last EEPROM byte used for token and GOSUB Return Address
storage, nor the size of the compiled image.
The DIYSTAMP Compiler
Writing the UNSTAMP disassembler gave a good understanding of the compiled
image, the format of compiled statements and the semantics of the compiled
tokens.
To gain an even deeper understanding, it was decided that a compiler would be
written before starting work with an actual interpreter. Having to generate an
actual compiled image from source code would show up any areas of
misunderstanding or oversight. Writing a compiler would also create an
alternative to using that provided by Parallax and others, and allow the chance
to extend the PBASIC programming language for the BS1 to incorporate common
features used by professional programmers which are unimplemented in those
compilers.
The Compiler
The Parallax BS1 compiler ( STAMP.EXE ) is extremely lax in accepting
semantically incorrect programs, preferring to place the onus for program
correctness upon the programmer rather than rejecting incorrect programs. While
this makes a compiler extremely simple to write, it allows incorrect code to
be written, and unexpected operation to occur when it is executed.
The most common fault is with the use of the PINx variables which should
not be used to specify pin numbers in the BUTTON, HIGH, INPUT, LOW,
OUTPUT, PULSIN, PULSOUT, PWM, REVERSE, SERIN, SEROUT and SOUND commands.
The PINx variables always contain the value 0 or 1, reflecting the
status of the voltage driving the input pin. When they are used to specify a
pin number, the value is taken, and thus only pin 0 or 1 is referred to; not the
pin which the programmer thought had been specified.
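The effect is easy to demonstrate with a small Python model. The function below is hypothetical and only mimics how a pin argument is resolved: a literal number is used directly, while a PINx variable is read and its value ( 0 or 1 ) is used as the pin number.

```python
def pin_acted_on(argument, input_levels):
    """Return the pin number a command such as LOW would actually act upon.

    argument:     an int such as 3, or a variable name such as "pin3".
    input_levels: the current 0 / 1 state of pins 0 to 7.
    """
    if isinstance(argument, int):
        return argument
    # A PINx variable holds the state of pin x, not the number x.
    pin_index = int(argument[3:])
    return input_levels[pin_index]


levels = [0, 0, 0, 1, 0, 0, 0, 0]      # only pin 3 is currently high
intended = pin_acted_on(3, levels)      # LOW 3 acts upon pin 3
mistaken = pin_acted_on("pin3", levels) # LOW pin3 acts upon pin 1
```

Note that "LOW pin0" acts upon pin 0 while that input reads 0, and upon pin 1 once it reads 1, so the mistake can go unnoticed until the external hardware changes state.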
The other common mistake is in using the RANDOM command with something other
than a word variable. Although such a mistake won't be rejected, the resulting
execution of the program is undefined and unpredictable. Using RANDOM with
the PORT variable is an obviously stupid thing to do ( randomly
changing pins to input or output ), but the compiler won't complain.
Why Parallax took the decision not to report semantic errors is unclear, and a
mystery to anyone who has ever written an assembler or compiler, especially
when the Basic Stamp is targeted at novices and those who may be unfamiliar
with programming.
The decision also creates a nightmare for professional
programmers who are notorious for not reading manuals, preferring to write code
while relying on the compiler to tell them when they have stepped outside the
bounds of reasonableness and correctness.
The failure to add semantic checking is a major one, and unforgivable from a
technical point of view, not least because it can make debugging erroneous
code almost impossible; "LOW pin0" may, or may not, operate correctly depending
upon the state of external hardware, and in the worst case may operate during
weeks of testing, only to fail, for an apparently inexplicable reason, long
after that particular line of code was written.
Whereas the ST1 and BS4 compilers slavishly follow the Parallax approach of
leaving programmers to dig their own graves and guess what they did wrong,
the DIYSTAMP Compiler provides a lot of semantic checking, in particular where
pin numbering by variables may occur, misuse of w6, b12 and
b13 within subroutines, and detection of many other mistakes. The
compiler also extends the syntax to permit constructs which ought to have been
included in the original Parallax compiler.
The main additions are -
- More flexible symbol definitions ...
Arithmetic expressions, with precedence override, are handled ( ie SYMBOL X = 1 + 2 )
Full character strings are handled ( ie SYMBOL X = "Hello" )
Character string concatenation is handled ( ie SYMBOL X = "Hello" + "World" )
Compiler defined symbols for time and date
- Symbol defined variables can have !, % and $ postfix characters to aid
identifying the purpose of variables ( ie SYMBOL FRED$ = "STRING" )
- Numeric constant prefixes have been extended to include ...
0x, 0h and &h to specify hexadecimal numbers ( ie $AB, 0xAB, &hAB )
0b and &b to specify binary numbers ( ie 0b0101, &b0101 )
0o and &o to specify octal numbers ( ie 0o67, &o67 )
- Include files can be specified by using INCLUDE "file.ext"
- Two character strings can be used as 16-bit constants ( ie LET W0 = "AB" )
- IF ... GOTO, IF ... THEN GOTO and IF ... THEN GOSUB are supported
- REPEAT / UNTIL and WHILE / WEND are supported
- ON ... GOTO is supported
- Boolean variables allowed in IF, REPEAT and WHILE expressions
- A PLAY command, to produce music easily, is added
- The DEBUG command doesn't require variables to be specified
- Support for increment and decrement instructions ( ie B0++ and W1-- )
- Support for increment and decrement by value ( ie B0 ++ 2, W1 -- 3 )
- Alternative arithmetic operator naming supported ( XOR, MOD )
- Alternative comparator operator naming supported ( == and != )
- Shift left ( << ) and shift right ( >> ) operators supported
- Access to EEPROM can be done using an eeprom[] array
- Compiled image sizes of 64, 128 and 256 bytes
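To illustrate the numeric constant prefixes in the list above, here is a Python sketch of a constant parser. It is not taken from the DIYSTAMP source. It treats 0b and &b as binary, as the 0b0101 example suggests, includes the $ prefix shown in the hexadecimal example, and assumes that prefixes and digits are not case sensitive.

```python
PREFIX_BASES = {
    "$": 16, "0x": 16, "0h": 16, "&h": 16,   # hexadecimal
    "0b": 2, "&b": 2,                        # binary
    "0o": 8, "&o": 8,                        # octal
}


def parse_constant(text: str) -> int:
    """Convert a numeric constant, with an optional base prefix, to an int."""
    lowered = text.strip().lower()
    for prefix, base in PREFIX_BASES.items():
        # A prefix must be followed by at least one digit.
        if lowered.startswith(prefix) and len(lowered) > len(prefix):
            return int(lowered[len(prefix):], base)
    return int(lowered, 10)
```

So "$AB", "0xAB" and "&hAB" all give 171, "0b0101" and "&b0101" give 5, and "0o67" and "&o67" give 55, while a plain "100" is read as decimal.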
Code optimisations are also performed when appropriate, including the removal
of redundant arithmetic operations and unnecessary END statements.
The NEWSTUFF.BS1 file in the DIYSTAMP.ZIP
Distribution Archive illustrates the new syntactical constructs supported.
The additional syntax and semantic checks may mean that a program which compiled
without errors using the Parallax compiler may not when using the DIYSTAMP
Compiler.
To compile code which conforms to the original Parallax syntax, applying
relaxed semantic checking, the /STRICT switch may be specified on the command
line when the DIYSTAMP Compiler is run.
The DIYSTAMP Compiler generates a .PRN listing file which shows the
result of compilation, and comprehensive error messages are generated when a
syntactical or semantic error is detected. Errors are also given in a .ERR
error listing file.
Code which compiles using the Parallax compiler should compile using the
DIYSTAMP Compiler when no dubious semantics are used, and will always compile
if the /STRICT switch is specified.
Code using the enhancements provided by the DIYSTAMP Compiler may not always
compile through the Parallax and BSS Club compilers. If the code compiles with
the /STRICT switch, then it should also compile using the Parallax and BSS Club
compilers.
The compiled image created by the DIYSTAMP Compiler ( in the CODE.OBJ file )
will always be 100% compatible with the BS1 interpreter, provided that a
compiled image of 256 bytes is generated ( neither /64 nor /128 used ).
Downloading the DIYSTAMP Compiler
The DIYSTAMP Compiler Version 2.02 is available for download as part of the
DIYSTAMP.ZIP Distribution Archive. The source code
consists of the DIYSTAMP.BAS file and a number of .BAZ "include files" ( please
see README.TXT in the Distribution Archive for details ), and the DIYSTAMP.EXE
file is the compiler executable. The entire Distribution Archive can be
downloaded by clicking the link below ...
Download DIYSTAMP.ZIP - Version 9.12 ( 357 KB )
Version 2.02 of the DIYSTAMP Compiler is the latest version.
Although I am running a Virus Checker on my development PC, please check
the DIYSTAMP.ZIP and DIYSTAMP.EXE files after downloading and unzipping to
ensure that they are virus free.
Using the DIYSTAMP Compiler
The DIYSTAMP Compiler can either be placed in the directory from which it is to
be run or placed in a directory which is included in the "SET PATH="
environment variable, or it can be run by prefixing its name with the fully
qualified path of where it is installed. In short, the DIYSTAMP Compiler is
just like any other MS-DOS executable you will encounter.
The compiler is run by using the "DIYSTAMP filename" command. This
will read the specified source code file and create the filename.PRN,
filename.OBJ, CODE.OBJ files and a filename.ERR file if any
errors are detected. These files are described below.
The filename can specify a file in the directory where DIYSTAMP is run
from, or may be a fully qualified filename or relative path to the source code
file. Wildcard filenames are supported. The filename must not be greater
than eight characters long; long filenames are not supported.
Command line help, and version details, can be obtained by using the
"DIYSTAMP /?" command.
There is a set of test programs for 'regression testing' of the compiler
included with the Distribution Archive; these are held in the TESTS
sub-directory. These can be compiled by using the "DIYSTAMP .\TESTS\*"
command.
Output Files
No matter where the source file is located, the output files created by the
DIYSTAMP Compiler will be placed in the directory from which the compiler was
executed.
The DIYSTAMP Compiler creates three or four output files ...
A .PRN file which shows how and where the compiled image tokens are placed in
the compiled image, against the source code for the compiled lines, along with
other information pertinent to the compilation.
A .OBJ file and a CODE.OBJ file, both of which contain the compiled image.
A .ERR file will be generated if any errors are detected. Checking whether or
not this file exists will allow the success or failure of the compilation to
be determined if the compiler is run within a .BAT batch file.
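The same check can be made from a script. The Python sketch below shows the idea only; the function name is hypothetical, and it assumes that no stale .ERR file has been left behind by an earlier, failed compilation of the same source file.

```python
from pathlib import Path


def compilation_succeeded(name: str, output_dir: str = ".") -> bool:
    """Return True if no <name>.ERR file exists in the output directory."""
    return not (Path(output_dir) / f"{name}.ERR").exists()
```

After running "DIYSTAMP NEWSTUFF", for example, calling compilation_succeeded( "NEWSTUFF" ) from the same directory would report whether a NEWSTUFF.ERR file was produced.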
The best way to see the functionality of the compiler is to try it. There are
a number of examples of source code included in the
DIYSTAMP.ZIP Distribution Archive which can be used.
The NEWSTUFF.BS1 file illustrates the enhanced syntax available with the
DIYSTAMP Compiler, and can be compiled using the "DIYSTAMP NEWSTUFF" command.
For an explanation of the tokens held within the compiled image, and shown in
the .PRN file, please see Chuck McManis's "Decoding the Basic Stamp" article.
Tokens are shown against their address in the compiled image, which is given in
the form xx:y, where xx indicates the byte in which the token
starts ( 00 being the first address of the image, and FF being the last ), and
y is the bit offset within that byte where the token starts ( with 0
being the leftmost, most significant bit and 7 being the rightmost, least
significant bit of the byte ).
Note that program statements run down the compiled image ( from address 00
towards FF ), while user defined EEPROM data runs upwards ( from FF towards
00 ). The compiler displays the user defined EEPROM data in the order it
appeared in the source code, interleaved between other tokens.
The DIYSTAMP Compiler Source Code
The DIYSTAMP Compiler is written in Basic and is compatible with the
FirstBasic 1.00 shareware compiler and the PowerBasic 2.10f compiler
from PowerBasic Inc.
The source should be fairly easily convertible into other variants of the Basic
language, including QBasic and Visual Basic, and even alternative programming
languages, such as C, C++ and Java.
Reporting Bugs in the DIYSTAMP Compiler
Operation of the DIYSTAMP compiler has been checked by compiling numerous source
code files through the DIYSTAMP compiler, the BSS Club and Parallax compilers,
and checking that the compiled images match. It is possible that there may be
some source code configurations which have not been checked and the behaviour
of the DIYSTAMP Compiler will not match that of the Parallax compiler; this is
most likely to be in cases where the Parallax compiler will allow programs with
incorrect, or questionable, semantics to compile whereas the DIYSTAMP compiler
won't. To check that this is not the case, please attempt to
recompile using the /STRICT switch, which will turn off most semantic checks,
and see if the problem still remains.
Before reporting a bug, please check that you have the latest Version 2.02
DIYSTAMP Compiler.
If you do find any significant bugs, please send an email detailing the
problem ( with an indication as to how to replicate the bug, and specifying the
compiler version number ) to hippy@psynet.net.
Licensing
The DIYSTAMP Compiler is provided as Freeware for personal, educational and
non-commercial use.
The DIYSTAMP Compiler Executable and Source Code may be modified and
redistributed providing that the Licensing and Copyright statements contained
therein are unchanged, and no charge other than distribution cost is made.
For commercial use of the DIYSTAMP Compiler, or any program derived from it,
please contact the original author, and Copyright holder, at hippy@psynet.net.
Warranty
The DIYSTAMP Compiler is provided "as is", without any warranty of any kind,
and without any guarantee as to fitness for purpose.
Downloading, installing and using the DIYSTAMP Compiler is undertaken at your
own risk.
Enhancements to the Basic Stamp
With the Basic Stamp architecture now well understood, it is necessary to
step back and decide what we want our own interpreter to do. We could choose to
stick with exactly what Parallax and PBASIC offer us, but we have a golden
opportunity to build upon their original design.
When Parallax designed the original Basic Stamp, they were constrained by the
availability of processors which they could use within that product. This
greatly limited what the PBASIC interpreter could do, and led to the
limitations on data memory, program size and subroutine calls which would not
have existed had a better processor been available for use at the time.
Now that processors have improved, and are available at lower prices with more
memory, in-built EEPROM, Flash code storage, and a variety of enhancements, it
is possible to overcome the previous limitations and produce an enhanced Basic
Stamp style interpreter. This is the primary goal of the Build Your Own Stamp
project.
Providing additional serial baud rates is a simple exercise in
extending the interpreter to accept a wider range of values which specify the
baud rate to use. Other enhancements ( except those which are provided for by
the compiler, and generate compiled images which are 100% compatible with the
Basic Stamp ), require more significant changes to the interpreter, and
changes to the compiler to support the code generation for the enhanced
version.
A primary goal is that the enhanced interpreter must execute compiled image
code which has been generated for the Basic Stamp, and so enhancements must
be made to fit in with the existing compiled image structure. The enhanced
interpreter will run original PBASIC compiled images and enhanced compiled
images, but it is accepted that the enhanced images will not run on a Basic
Stamp. The DIYSTAMP Compiler will generate compiled code that will run on
the Basic Stamp, the enhanced interpreter, and often both.
The most obvious enhancement which can be made is to lose the GOSUB Return
Address Table. With more data memory, an interpreter can push the actual return
addresses to a stack, rather than an indicator of which GOSUB call it was, and
the table becomes completely redundant. This frees up all compiled image
space which would have been taken up by the table.
The consequence of using a proper subroutine stack is that w6, b12 and b13
are no longer corrupted within subroutines, giving those variables back to the
programmer, who no longer needs to fear losing data stored within them during
execution.
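As a rough illustration of the idea, a return stack which holds real return
addresses might look like the sketch below. It is written in Python purely for
readability; the class name, the method names and the depth of 16 are all
assumptions made for this example, not details of any actual interpreter.

```python
class ReturnStack:
    """Hypothetical GOSUB stack holding real return addresses."""

    def __init__(self, depth=16):
        # The depth is an assumed limit; a real interpreter would size
        # this to suit the data memory available on its processor.
        self.depth = depth
        self.addresses = []

    def gosub(self, return_address, target):
        """Remember where to come back to, then jump to the target."""
        if len(self.addresses) >= self.depth:
            raise OverflowError("GOSUB nesting too deep")
        self.addresses.append(return_address)
        return target

    def do_return(self):
        """Resume at the most recently stored return address."""
        if not self.addresses:
            raise IndexError("RETURN without GOSUB")
        return self.addresses.pop()
```

Because the address itself is stored, no table lookup is needed on RETURN and
no PBASIC variable has to be borrowed to remember which GOSUB was called.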
The PBASIC interpreter allows the addressing of 64 variables, of which 56 are
mapped onto words, bytes, bits and I/O ports and pins for use; the eight unused
addresses can be used to provide additional variables for the programmer's use.
The most obvious enhancement is to provide w7 and its corresponding byte
parts b14 and b15, which is easily accomplished.
With an increase in data memory available, it is desirable to allow the
programmer to use that memory if the interpreter isn't going to.
The data memory can be treated as an array of bytes which the programmer can
use ( ram[...] ), and it makes sense to provide a means of accessing the memory
by way of indexing. To support this, three additional variables are added -
ram[w0], ram[w1] and ram[w2].
One thing which is noticeably lacking in the original Basic Stamp is the ability
to use interrupts. Although changes on input lines can be acted upon by using
the BUTTON command, it would be much nicer to have the code idling, with an
automatic jump to a subroutine when the input lines change.
This can be supported by two additional variables; an Interrupt Handler
variable ( set only by the INTERRUPT statement ), which stores the address of
the interrupt handler to use, and an Interrupt Mask variable ( mask ) to
determine which input lines should be monitored.
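A minimal sketch of how an interpreter could use these two variables between
instructions is shown below, in Python for readability. The function names,
the use of None to mean "no handler set", and the decision to push the
interrupted address onto a return stack are assumptions made for the example.

```python
def changed_lines(previous_pins, current_pins, mask):
    """Bits which differ between two pin readings and are being monitored."""
    return (previous_pins ^ current_pins) & mask


def next_pc(pc, previous_pins, current_pins, mask, handler, stack):
    """Return the address to execute next.

    If a handler is set and a monitored line has changed, the current
    address is saved on the stack and execution moves to the handler;
    otherwise execution simply carries on at pc.
    """
    if handler is not None and changed_lines(previous_pins, current_pins, mask):
        stack.append(pc)
        return handler
    return pc
```

A change on a line which is not selected by the mask is ignored, so the
program only wakes up for the inputs it has asked to monitor.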
As well as providing indexed access into the ram[...] array, it is desirable to
access that array directly ( ie, ram[45] ). This is achieved by modification
of the compiled image tokens which represent constant numbers.
The compiled image deals with 1, 4, 8 and 16-bit numbers differently, compacting
the image when small numbers are used. This means that a number below 16 will
always be stored as a 1 or 4-bit number and a number below 256 will be stored as
an 8-bit number.
If we find an 8-bit number which has its top four bits cleared, or a 16-bit
number with its top eight bits cleared, we know that this is not what the
compiler would have generated for that constant. We can use this feature to
gain access to 264 previously unused tokens.
We use 256 of these tokens to provide read-only access to ram[0] through
ram[255], and use the other eight to provide read-only access to another array,
rtc[0] through rtc[7]. The rtc[...] array is used to provide date and time
information which was not available on the original Basic Stamp.
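The test an interpreter would apply to each constant it reads can be sketched
as follows, in Python for readability. Which of the spare encodings is given to
which array is not stated above, so the mapping used here ( 16-bit forms for
ram[...], the first eight 8-bit forms for rtc[...] ) is an assumption made for
the example, as is the function name.

```python
def classify_constant(width, value):
    """Classify a constant token by its stored width in bits and its value.

    Returns ("const", value) for an ordinary number, or ("ram", index),
    ("rtc", index) or ("spare", value) for an encoding the compiler would
    never have produced for a plain constant.
    """
    if width == 16 and value < 256:
        # Top eight bits clear: would normally have been stored in 8 bits.
        return ("ram", value)
    if width == 8 and value < 16:
        # Top four bits clear: would normally have been stored in 1 or 4 bits.
        if value < 8:
            return ("rtc", value)
        return ("spare", value)
    return ("const", value)
```

For instance, classify_constant(16, 45) identifies a read of ram[45], while
classify_constant(8, 200) is just the number 200.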
The use of the ram[...] array not only provides additional byte variables but
also allows for the creation of large data tables without utilising EEPROM data
as would normally be necessary; a program is first run to initialise the
ram[...] array, and subsequent programs can then use the pre-loaded data, and
modify it as required.
All addresses used in IF ... THEN, GOTO, GOSUB and INTERRUPT commands are stored
as a token which is 11 bits long; an 8-bit indicator of the byte within which
the destination of the jump starts, and a 3-bit offset into the byte to where
the first bit of the token jumped to is.
Addresses must therefore point to valid tokens which are the start of a command.
This means that there are numerous possible address tokens which are unusable
in most programs. However, it is difficult to determine which they
are, except for those which indicate a jump to a token which claims to start
in the first 19 bits of the compiled code. These bits are not part of the
executable code, but indicators of the size of the compiled program, and where
the program ends. We also know that the first five bits of the program are those
that form the first token of the program, of which four would never be jumped
to. Likewise, the last four bits of the compiled image would never be jumped to
either; a total of 27 known, and guaranteed, special cases of address
destinations.
We can use this feature to generate special destination addresses for
IF ... THEN, GOTO, GOSUB and INTERRUPT commands.
The maximum compiled image size of 256 bytes is somewhat limiting, but can be
extended by allowing an interpreter to load multiple compiled images and allow
the programs to jump from one image to another. Jumping to one of the special
addresses will cause a jump to a specific compiled image. To aid implementation,
the number of pages can be limited to 16; if the address is less than 16, it is
a jump to one of the compiled images, in page 0 to 15.
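The decision an interpreter makes when it meets an address token can be
sketched like this, in Python for readability. The sketch takes the 11 bits as
eight bits of byte number followed by three bits of bit offset, with the byte
number in the upper bits; that ordering, and the function name, are assumptions
made for the example.

```python
PAGE_COUNT = 16  # addresses 0 to 15 are treated as page jumps


def decode_jump(address):
    """Interpret an 11-bit address token.

    Returns ("page", n) for a jump to compiled image n, or
    ("local", byte, bit) for a jump within the current image.
    """
    address &= 0x7FF  # keep only the 11 bits of the token
    if address < PAGE_COUNT:
        return ("page", address)
    return ("local", address >> 3, address & 7)
```

An address of 5 therefore selects page 5, whereas an address of 291 is an
ordinary jump to bit 3 of byte 36 under the ordering assumed here.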
Whether a particular interpreter will support multiple compiled images, and if
it does, how many, will depend upon the amount of token storage space that the
processor running the interpreter has. Although single-chip interpreters are
unlikely to support more than one compiled image at a time, interpreters which
utilise a PC platform as the 'microcontroller' may well. Using a PC to emulate
a Basic Stamp is overkill with just 256 bytes of image to play with, but with
16 pages, a total of 4096 bytes of image, this may well be practical.
It may also be ideal in cases where the compiled image and interpreter are
burnt into a microcontroller's program memory, rather than uploaded into a
microcontroller which has only the interpreter built in, and stores the
compiled image in on-chip EEPROM.
Separate compiled images can be used as continuations of programs which have
exceeded the size of a single page, and can be used as entire subroutines or
interrupt handlers. Libraries of routines can be held in separate compiled
images by calling them with a variable to specify which routine needs to be
executed by way of a BRANCH statement when that page is entered. Returning from
one page to another is done by using the RETURN statement; a call to another
page is as easy as a call to a local routine.
All these optimisations fit in with the existing compiled image format; were a
compiled image using these enhancements run on a Basic Stamp, it would probably
execute without crashing ( except in the case of the loss of the GOSUB Return
Address Table and the 'badly formed' inter-page address tokens ), but the
results would be unpredictable. It is this feature which allows both PBASIC
and enhanced compiled images to be run on the enhanced interpreter without the
interpreter needing to be aware of which target the code was generated for.
There are further enhancements which can be incorporated, but these require
that the interpreter is able to determine if the image is for a Basic Stamp
or an enhanced interpreter, as the compiled image is different in each case.
The original Basic Stamp compiled images include address tokens as part of
instructions to overcome limitations of the original processor. These are no
longer required as the limitations have been removed with later processor
capabilities, and if these redundant addresses can be removed, it will free up
code space.
Both the DEBUG and SERIN commands generate an unnecessary address token, both
of which point to an address immediately after the address token itself.
Because of this characteristic, we can determine at compile time what that
address will be, and likewise, an interpreter is able to predict what the
address should be.
Using this knowledge, we know that the most significant bit of the address token
will always match the most significant bit of the address which immediately
follows it; if the two bits are not the same then it is not a valid address
token.
This fact is used to remove the redundant addresses in an enhanced compiled
image. When a redundant address is found ( in a non-enhanced compiled image ),
the first bit will match the most significant bit of the address that would be
expected after the address token, and therefore there is a complete address
token. If the bit does not match, then we know the address token is not there.
The compiler has only to generate a correct address token, or a single inverted
bit when in enhanced mode, and the interpreter has only to check a single bit to
determine which of the two cases it is. The enhanced interpreter will therefore
execute original Basic Stamp compiled images and enhanced images with very
little modification.
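Both halves of this scheme can be sketched in a few lines, shown here in Python
for readability. The compiled image is modelled as a simple list of bits with
addresses counting upwards and the most significant bit of a token stored
first; those choices, and the function names, are assumptions made for the
example and may not match the real image layout.

```python
ADDRESS_BITS = 11


def expected_address(pos):
    """Address just past a full address token which starts at bit pos."""
    return (pos + ADDRESS_BITS) & 0x7FF


def emit_address(pos, enhanced):
    """Compiler side: bits to place at pos for DEBUG or SERIN."""
    target = expected_address(pos)
    msb = (target >> (ADDRESS_BITS - 1)) & 1
    if enhanced:
        return [msb ^ 1]  # a single inverted bit marks the token as absent
    return [(target >> shift) & 1 for shift in range(ADDRESS_BITS - 1, -1, -1)]


def skip_address(bits, pos):
    """Interpreter side: return the position of the token after the address."""
    msb = (expected_address(pos) >> (ADDRESS_BITS - 1)) & 1
    if bits[pos] == msb:
        return pos + ADDRESS_BITS  # full, redundant address token present
    return pos + 1                 # enhanced image: only the marker bit
```

In this model a DEBUG or SERIN in an enhanced image costs one bit where the
original format costs eleven.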
The token space saved with the optimisation of DEBUG and SERIN depends upon how
many times these are used within the program. In the case of DEBUG, the saving
reduces the impact of adding DEBUG statements during code development quite
considerably.
The READ and WRITE statements also include redundant address tokens, however
these cannot be optimised away as they are with DEBUG and SERIN, as it is not
known, at the time when the address token is encountered, where the address will
point to, as this is dependent upon other tokens which follow.
To gain write access to the ram[...] and rtc[...] arrays using a constant index,
we can utilise the IF token sequence. Because it is impossible to compare with
the Interrupt Handler variable, and the variable identifying index isn't used by
the original Basic Stamp, we can use a comparison with the Interrupt Handler to
indicate that this is not really an IF statement, but a specifier for the
storage into the ram[...] or rtc[...] arrays. Up to that point, the tokens will
have conformed with those expected for an IF statement, and will have had no
adverse effect on the operation of the interpreter or data variables; subsequent
tokens can be dealt with as if they were part of a LET assignment sequence.
As can be seen, considerable enhancements can be made to the original Basic
Stamp by fairly easily extending an interpreter for the Basic Stamp. The
enhancements are incorporated within the original compiled image, and do not
cause excessive compiled image bloat. Any increase in code size is likely to
be offset by code optimisations elsewhere.
The Enhanced DIYSTAMP Compiler
Version 2.02 and above of the DIYSTAMP Compiler supports all the enhancements
which are described above, and is capable of generating code in a Basic Stamp
compatible format, and in the newly defined enhanced format.
All that needs to be done to generate an enhanced compiled image is to specify
the /ENHANCED switch on the DIYSTAMP command line or include it within the
source code itself. The ENHANCED.BS1 file in the
DIYSTAMP.ZIP Distribution Archive illustrates the new
enhanced capabilities supported.
There is a set of test programs for 'regression testing' of the compiler
included with the Distribution Archive; these are held in the TESTS
sub-directory. These can be compiled by using the "DIYSTAMP .\TESTS\* /ENHANCED"
command.
Version 2.03 and above of the UNSTAMP Disassembler supports the disassembly of
enhanced compiled images.
The main additions to the enhanced interpreter are -
- Additional w7, b14 and b15 variables
- Access to ram[...] array data
- Access to rtc[...] array data
- Interrupt handling
- Support for multiple page compiled images
As there may be interpreters which will not support all enhancements, as they
may be implemented on a microcontroller which is constrained in terms of memory or
other features, every enhanced compiled image is given a 'code signature' which
flags all the enhancements used, and required to allow that compiled image to
be executed properly.
When the compiled image is uploaded to the interpreter chip, it can check if
any features are required which it does not implement, so there is no danger
that an interpreter will fail to execute as expected because it isn't able
to support what the compiled image needs.
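The check made at upload time amounts to comparing two sets of flags, as in the
sketch below, written in Python for readability. The bit assigned to each
feature, the feature names and the function name are invented for the example;
the real layout of the code signature is not described here.

```python
# Hypothetical flag values; the real code signature layout may differ.
FEATURE_W7 = 0x01
FEATURE_RAM = 0x02
FEATURE_RTC = 0x04
FEATURE_INTERRUPT = 0x08
FEATURE_MULTIPAGE = 0x10

FEATURE_NAMES = {
    FEATURE_W7: "w7, b14 and b15 variables",
    FEATURE_RAM: "ram[...] array",
    FEATURE_RTC: "rtc[...] array",
    FEATURE_INTERRUPT: "interrupt handling",
    FEATURE_MULTIPAGE: "multiple page images",
}


def missing_features(signature, supported):
    """Names of features the image requires but the interpreter lacks."""
    missing = signature & ~supported
    return [name for bit, name in FEATURE_NAMES.items() if missing & bit]
```

An empty list means the image can be accepted; anything else can be reported
back to the user before the program is ever run.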
The CODE2ASM Compiled Image Converter
Getting a Compiled Image from the DIYSTAMP or any other compiler is all well
and good, but it can't be used unless it is uploaded into a microcontroller
which contains a PBASIC interpreter or into a Basic Stamp Emulator or Simulator.
From my own point of view, which seems to be backed up by those who have shown
an interest in this project, there is a requirement for having a PBASIC
interpreter which is burnt into a target microcontroller along with a predefined
PBASIC program. This allows a smaller interpreter to be built, as it is not
necessary to provide upload capabilities, and it also allows an interpreter to
be developed without having to sort out the uploading first.
Another good reason for using an interpreter with a pre-loaded and predefined
PBASIC program is that the program code cannot be easily extracted by a user,
nor can it be inadvertently overwritten by any other PBASIC program.
To allow Compiled Images to be incorporated within a microcontroller, or any
other program, the CODE2ASM utility is provided within the
DIYSTAMP.ZIP Distribution Archive which will allow
the conversion of a Compiled Image into a file in a format suitable for
use with almost any other programming language. The generated file can of
course be processed further if so required.
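To give a feel for the kind of conversion involved, the sketch below turns the
bytes of a compiled image into lines of assembler style data definitions. It is
written in Python for readability and is not the CODE2ASM source; the output
layout, the "DB" directive and the function name are assumptions made for the
example, and the formats CODE2ASM really produces are those described in its
own documentation.

```python
def image_to_include(image, directive="DB", per_line=8):
    """Format compiled image bytes as lines of data definitions.

    image     - the compiled image, as a bytes object
    directive - the data directive of the target assembler ( assumed )
    per_line  - how many byte values to place on each output line
    """
    lines = []
    for start in range(0, len(image), per_line):
        chunk = image[start:start + per_line]
        values = ", ".join("0x%02X" % value for value in chunk)
        lines.append("    %s %s" % (directive, values))
    return "\n".join(lines)
```

Changing the directive and the number format is all that should be needed to
suit a different assembler or a high-level language array initialiser.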
The DIYSTAMP compiler ( from Version 2.02 upwards ) also supports the
generation of an include file by using the /ASM command line switch. All
switches supported by CODE2ASM are also supported by DIYSTAMP.
Documentation on the CODE2ASM utility and examples of use are included in the
CODE2ASM.TXT file which is included in the Distribution Archive. The source
code is also included.
If you are using this technique to create your own interpreter, it is
recommended that the generated file be included twice ( which may require two
different conversions to be made to get the correct formats ); once in read only
code space ( for program execution ) and again in EEPROM ( for data storage ),
so a running program can update any data which is stored in memory. This will
require the interpreter to fetch program tokens from one location and read
and write data in another. Partitioning a program in this way will allow up
to 256 bytes of PBASIC tokens and 256 bytes of EEPROM data simultaneously,
although such a program would very likely crash if run using an interpreter
where both share the same 256 bytes of memory. If a program does not need to
update the EEPROM data when executing, it is possible to store the program
wholly in the microcontroller's read only memory.
The Interpreter
My original plan was to develop a PC based interpreter, or more correctly,
a simulator, and then port the interpreter over to a PIC.
I have created a basic PC based simulator for PBASIC which supports all
compiled tokens, but doesn't allow any I/O control; PC's do not have many
convenient I/O lines beyond those provided by the parallel line printer port,
sound card joystick inputs and a few serial port control lines.
The exercise did reveal some issues related to developing a properly embedded
interpreter; ensuring compact program code, and the necessity of being able to
get the compiled tokens in a PC file uploaded into the interpreting device.
The obvious move, having created the basics of an interpreter, was to start
porting the interpreter to a PIC. Unfortunately, but for ease of design, the
interpreter has been written in Basic and is not easy to compile directly to
PIC code; the 'porting' is going to have to be program creation from scratch.
This is not a particularly large problem as that will need to be done no matter
what the target processor chosen is, but it does present me with a number of
problems ...
I have not used PIC processors for a few years now, and am pretty rusty in that
field.
I truly loathe the PIC Assembly Language; and I mean really loathe it. So much
so that I would not use it given an option.
When I was using PIC's regularly I
wrote my own cross-assembler which supported assembly language mnemonics akin to
the 6800 and 8051 series processors. The cross-assembler was designed for the
old 12-bit PIC's ( 16C5X family ) and does not now appear to be simple to adapt
for use with the current 14-bit PIC's ( 16F627 etc ).
I have looked at various C, and other high-level language cross-compilers but I
am not convinced that these will be suitable in creating an interpreter which is
fast and compact. I may be wrong, but low-level coding looks like the best way
to proceed.
I am also at a disadvantage when it comes to development tools as I don't have
any PIC programming hardware and don't want to get bogged-down in home-building
and getting that to work. Although ready-built programmers can be purchased
relatively cheaply, I don't want to invest money in a project which is going
to go nowhere.
The project going nowhere is what worries me at the present. I may have set my
goals too high, and have now wandered out into waters which are too deep;
understanding PIC's, understanding PIC Assembly Language, understanding PIC
programming, needing programming tools and having to write and debug the
interpreter on top of that.
This may be just a short-term lack of focus, and loss of confidence, but I need
to step back and decide how to move on, if at all.
There is considerable interest in a Basic Stamp clone, and it appears that many
people are also considering the development of a PIC based interpreter. The best
way forward may be to let them do that job, and I can then concentrate on the
PC side tools needed; a field in which I am happier to operate.
Other projects, both design and development and mundane household tasks, have
delayed the DIYSTAMP project, but I would like to see it fulfilled.
Whilst I am uncomfortable diving-in with further PIC based development, I have
considered using the Nintendo GameBoy and the Psion Organiser II as the hardware
platform for further development. These are both incredibly cheap to get hold
of, and offer considerably more than some PIC's do at the same second-hand
price. There is a phenomenal amount of architectural and system documentation
available on each, and Software Development Kits are freely available.
Both are full microcomputer systems with large memory spaces and include LCD
displays; 160 x 144 pixels for the GameBoy and 2 lines x 16 characters
for the Psion. With additional interfaces these would be as well suited to
controlling projects as a Basic Stamp Clone would.
Both have advantages over PIC's but also have their problems, requiring hardware
interfaces to be built and programmers to be purchased. Although the gains to be
made look useful, their use takes us away from the single-chip Basic Stamp
clone design envisaged when I embarked upon the project.
The support and encouragement to continue with a PIC based interpreter has not
gone unnoticed, and it is the obvious, and more technically correct, way to
proceed.
I do have access to an EPE ICEbreaker development system, utilising the PIC
16F877, which may be a suitable platform to move on with, and it is in this
direction I think the next steps will be.
There will thus be a short time-out, while I take stock of the current project
state, look at the ways I can get the project moved on, and decide how best
to proceed. Please bear with me, and I'll let you know how I've progressed and
what my plans are in the near future, by the end of November 2002 at the latest.
Basic Stamp and PBASIC are registered trademarks of Parallax Inc.
PICAXE is a trademark of Revolution Education Ltd.
PICmicro is a registered trademark of Microchip Inc.
MS-DOS is a registered trademark of Microsoft Corporation.
Build Your Own Basic Stamp, DIYSTAMP, DIYSTAMP Compiler, SHOWCODE, SHOWCODE
Compiled Image Viewer, UNSTAMP, UNSTAMP Disassembler, SIMSTAMP, SIMSTAMP PC
Simulator, RUNSTAMP and RUNSTAMP Interpreter are trademarks of the Happy Hippy.