There are two notable things about the Brainf**k language; its rather
embarrassing name ( which you can't easily list on a CV or drop into casual
conversation, and we'll call it BF from now on ), and its apparent
pointlessness.The BF language does however have some rather interesting properties; it is
Turing Complete, it has only eight instructions making it incredibly
easy to learn ( even if not easy to use ) and equally as easy to write compilers
and interpreters for. A BF program can easily be converted to another
programming language with a minimum of effort, and often using nothing more than
a text editor with a 'text substitution' function. A BF program which complies
with the BF specification or a subset of it will run on any system which has a
BF specification compliant compiler or interpreter.
Being Turing Complete is what gives BF its scientific curiosity value.
With respect to the mathematical genius of Alan Turing and his research into
computers and his concept of a "Turing Machine"; any Turing Complete
computer ( providing it has infinite resources and no limitations )
can emulate any other computational device every built or that ever will be.
In practical terms, for us more down-to-earth people, the appeal of BF is in
its demonstration of simplicity and the applicability to solving problems which
can draw upon the concepts used in BF.
The BF Programming Language
Despite its first appearances, BF is not a joke or hoax.
Of course, any joker or hoaxer would say that, but the truth lies in an analysis
of what BF is, and how it can be applied and used in the real world. Don't judge
a book by its cover, and don't reject BF simply because it seems such a silly or
ridiculous idea on the surface.
The BF programming language is a creation of
Urban Müller ( aka Mueller ),
who wanted to create a simple programming language which would require a very
small interpreter for his software development. Since then, BF interpreters
have been written in a number of more conventional programming languages and
have been incorporated in a wide variety of devices and software applications.
The original BF language has been extended upon and implemented in a number of
ways by a variety of people, tailoring it for specific tasks and needs.
The original and definitive BF language has just eight instructions. Programs
are plain text and each instruction is represented by a single ASCII
character ...
| > |
|
Move the Data Cell Pointer to the right; increment the Data Cell Pointer. |
| < |
|
Move the Data Cell Pointer to the left; decrement the Data Cell Pointer. |
| + |
|
Increment the value in the Data Cell pointed to by the Data Cell Pointer. |
| - |
|
Decrement the value in the Data Cell pointed to by the Data Cell Pointer. |
| , |
|
Read an ASCII character from the console and store its byte value in the Data
Cell pointed to by the Data Cell Pointer. |
| . |
|
Take the value in the Data Cell pointed to by the Data Cell Pointer and send it
to the console as an ASCII character. |
| [ |
|
If the value in the Data Cell pointed to by the Data Cell Pointer is zero, then
skip instructions until after the next matching ] instruction.
Note that [ ] pairs can be nested. |
| ] |
|
Continue execution from the earlier matching [ instruction. Because the
continuation of execution at the [ when the Data Cell is zero will cause a jump
straight past the ] we went from to start with, the ] instruction only needs to
return execution to the earlier matched [ when the Data Cell pointed to by the
Data Cell Pointer is non-zero. This is why you will find a variety of different
explanations on the actual ] instruction. Note that [ ] pairs can be nested.
|
A Short BF Critique
Actually, this section really ought to be called, "In Praise of
BF" ...It is interesting to compare BF with other esoteric languages which generally
fall into three sets; the joke languages creating ficticious and impractical
programming concepts like INTERCAL, languages which are designed to be quite
pointless, insane, unusable or entirely obfuscated by design ( ZT, Whitespace,
f**kf**k ), and the set of real programming languages which are designed to be
minimalistic but functional in what they set out to do.
For the later set, any obfuscation and lack of clarity arises through the
nature of the language's design and is not a design goal in itself. If you are
exploring the mirage of weird and wonderful programming languages outside of
the mainstream, this is the key element which distinguishes the useful from the
'useless', although even the later have their own merits in many cases;
INTERCAL's seemingly ridiculous COME FROM statement is now being accepted as a
valid and useful concept in some areas of software design.
BF definitely falls into the category of real and usable programmable languages
even if its usability seems somewhat open to question. It implements a minimum
of instructions, but by achieveing its Turing-Completeness it
demonstrates itself to be capable of being used to implement any program that
could be written.
BF is a language which is easy to implement in either a compiled form or as an
interpreter, and its concept is particularly suited to having to process code
which is not part of a microcontroller's main memory, such as where the program
has to be stored off-chip, such as in I2C EEPROM or on a Memory card. The
implementation of a BF interpreter is a good starting point for anyone who needs
or is thinking about writing an interpreter.
The difficulty of using BF is in having a limited language where we are used to
complex instruction sets, multiple registers, index pointers, stack manipulation
and data structures which all have to be devolved down into a long sequence of
fundamental instructions in BF.
At first glance, BF appears to be obfuscated by its choice of ASCII characters
for its instruction set, but it is not really obfuscation at work. All the
characters have an iconography which reflects what they do; < and > move
through Data Cells, leftwards or rightwards, + and - alter a Data Cell's value
up or down, [ and ] make for a loop structure no worse than in C and while
using a comma for input seems strange, thinking of it as a pause for
something ( in this case input ) sort of makes sense, whether that was the
motivation for its use originally or not.
Given that BF is, by concept, a legitimate, albeit limited, language, it is
hard to see what other representations could or should be used for each
instruction; LEFT, RIGHT, INCREMENT, DECREMENT and so on would not make the
language or its programs more clear, and would just require a lot more typing.
Arguments that BF is not a real language and can't be used to write programs in
can be dismissed by reference to its Turing Completeness; it may not be
easy but it can be done, failure really rests with the BF implementation being
used and the programmer's own weaknesses, not BF's. Likewise, claims that
BF is deliberately obfuscated to make it hard to write programs in can be
summarily dismissed.
The only real legitimate complaints about BF are that it is hard to actually
design and produce workable code, that it has only limited processing abilities
and doing some things are just plain long-winded. All of this is true.
In general terms, BF's processing capabilities are limited to generating output
from itself, or processing data read from an input console ( or data stream )
and producing output from that; it's a true input-process-output language. When
embedding BF inside another system it is not that hard though to pass data
between the main program and the BF program; either the main program can build
up an input stream and capture the output stream ( akin to CGI programming on
web servers ) or pre-defined Data Cells can be used to pass arguments to and
from the BF program.
There is merit in enhancing the BF language to cut down on the long-winded
coding which is otherwise necessary, and although it breaks the clean design of
a pure BF implementation it does add to its usability in practice ...
Extending the BF Programming Language
In its purest incarnation, BF is not so much limited in its ability to do
things ( that is a factor of implementation specifics ) but by our abilities to
express what we want to do in terms of BF syntax.
Pure BF is tailored to very specific types of processing and as such does not
map well to practical uses to which we may wish to put it; the PICAXE
implementation of BF adds the I and O instructions to make life easier when
being used with the PICAXE Programming Editor Terminal and the @ instruction
allows use and control of the PICAXE output pins. Further extensions
which are PICAXE specific would be to allow data to be read from digital and
analogue input pins, and to provide interfaces to on-chip resources of the
device; PWM, servo control, and so on. The two { and } instructions are added
because debugging is a fundamental requirement for all programming efforts, and
as with the other additions, the PICAXE interpreter software shows how easy it
was to add them to the basic BF interpreter.
Anyone who has used a more traditional programming language will see problems
that appear as major inconveniences when using BF; no ability to specify a
constant ( repeated + and - instructions must be used ) and no means of
direct access to any particular Data Cell ( repeated > and < instructions must
be used ) which create long-winded and consequently slow executing programs for
doing even the simplest of tasks.
By extending BF syntax, adding a '=n;' instruction to put the value
n in the data Cell pointed to by the Data Cell Pointer and adding a
'*n;' instruction to set the Data Cell Pointer to Data Cell n, we
can get many capabilities of a modern-day computer processor.
The following enhanced BF program will get the contents of Data Cell 1, add
the value 2 ( stored in data Cell 2 ) to it and print "*" if the value is not
zero ...
*2; Point to Cell 2
=2; Put the value 2 in Cell 2
*2; Point to Cell 2
[ While Cell 2 is non-zero execute these ...
*1; Point to Cell 1
+ Increment Cell 1
*2; Point to Cell 2
- Decrement Cell 2
] Go to earlier [
*1; Point to Cell 1
[ While Cell 1 is not zero execute these ...
=42; Put "*" in Cell 1
. Print the "*"
=0; Zero Cell 1
] Continue with next code, as Cell 1 is now zero
We could go further; by adding a '#n;' instruction, we can specify a
number by which + or - would increment by ( allowing us to avoid having to use
the Cell 2 variable ) and adding '(' and ')' as a matched set which acts as a
simple 'if' statement, rather than a 'while' using [ and ], we move our
enhanced BF language much closer to a traditional programming language ...
*1 Point to Cell 1
#2; Use Constant 2
- Decrement Cell 1 by 2
( If non-zero then
=42; Put "*" in Cell 1
. Print the "*"
)
We are well on the way to creating a perfectly usable, and fairly familiar
in style, programming language ( albeit using a rather esoteric syntax ), and
we can continue to extend BF to include GOTO's, subroutine calls and returns,
a data stack, and any number of programming concepts and functions we wish to
add.
What enhancements are possible and what is appropriate really depends upon what
the programmer wishes to achieve and how far they are willing to deviate from a
pure BF language; "all the way if necessary", is the pragmatic approach most
will adopt. Even if you wished to retain something close to pure BF you could
still add small tweaks to gain massive improvements; using the instructions 1
to 9 would allow a repeat constant to be set which could be applied to the < >
+ and - instructions to compact code considerably. Using just this trick alone
can reduce the size of an arbitrarily chosen BF "Hello World!" program down
from 134 bytes of code by almost 50% with little execution speed penalty ...
++++++++[->+++++++++ 8+[->9+<]>.<7+[->4+<
<]>.<+++++++[->++++< ]>+.7+..3+.>8+[->4+<
]>+.+++++++..+++.>++ ]>.<<8+.8-.3+.6-.8-.
++++++[->++++<]>.<<+ >>+.3-[3-<+>]<.
+++++++.--------.+++
.------.--------.>>+
.---[---<+>]<.
Similar savings can be attained by using 'subroutines'. The most convenient
way to add subroutines to BF would be to define "A" through "Z" and "a" through
"z" as the names of subroutines. Whenever &a is encountered, the
code will branch off to execute the named subroutine. Subroutines themselves
can be defined by using its letter name, followed by a sequence of instructions
which are terminated by a ; ( with ;'s belonging to BF extensions skipped ) as
demonstrated below, which prints "02468" to the console ...
S++;=48;.&S.&S.&S.&S.
By enhancing BF to do things which the programmer feels is useful, time saving
or necessary ( and often it's all three ), what we create is a programming
language that does what is wanted, but all built upon the simple base of BF.
This is what I have done myself in a number of commercial
software applications ranging from an SMS messaging system to adding 'macro
commands' to a word processor system, and as everything is built upon a very
simple and small interpreter to start with, the resources required to add new
functions to our own BF variant are often equally small, and usually being
incremental improvements, they are usually both easy to implement and debug.
What I ended up with in my developments was far away from pure BF, but BF
provided a solid foundation upon which to build and a framework within which to
work. Far from being a pointless programming language of nothing but scientific
interest or humorous derision, BF turned out to be quite fundamental to my own
software's commercial success. If that doesn't give the BF language some
credibility, I don't know what does.
The PICAXE BF Implementation
The PICAXE BF implementation is based upon the original BF specification ( and
is also a superset of it ), but it is constrained by the limitations of the
PICAXE device itself and affected by the extensions added, which prevents it
from being truly 100% compliant with the original BF language specification.
Language Limitations and Implementation Specifics
The original BF specification calls for all characters which form a BF source
code program to be ignored if the character does not represent one of the eight
BF instructions; the PICAXE implementation ignores all characters which it does
not recognise as valid instructions, but obviously does have to interpret those
characters which represent or form part of the extensions the PICAXE
implementation supports. Original BF source code which contains characters which
are not those representing the eight original BF instructions may have to be
modified to work correctly with the PICAXE implementation.
There are only 96 Data Cells and 256 Program Instruction cells, whereas the
original specification calls for 30,000 cells in total. This means that BF
programs which require more than 96 data cells or have more than 256 characters
of BF source will not run without modification, and not at all if more than 256
of the eight BF instruction set characters are needed to implement the program.
Data Cells and the program code space are separate. This is primarily because
the code is held in EEPROM and frequent writes to it ( as will happen when the
only data operations are increment and decrement Data Cell values ) will quickly
wear the chip out and, additionally, writing to Data EEPROM is unbelievably
slow, and the PICAXE BF interpreter is slow enough without being crippled
further. The only disadvantages of this separation are that it is impossible to
write self-modifying code ( and generally that's no great loss ), but more
importantly, the Data Cell Pointer cannot be moved into the program code to use
any pre-defined values, but the mechanism to do this would be rather awkward
anyway, and infrequently, if ever, used.
BF programs can be up to 256 characters long. All BF programs are stored in
the PICAXE Data EEPROM as plain text.
There are 96 Data Cells and when the BF program commences execution the Data
Cell Pointer is positioned to the left-most Data Cell. While Data Cells can be
thought of as being numbered ( left to right ) as a continuous set of cells
running 0 to 95, they are actually two sets of cells within the PICAXE numbered
80 to 127, and 192 to 239 ( $50 to $7F and $C0 to $EF ); the interpreter deals
with handling the Data Cells to present them as a sequential list, and the
actual location or numbering of Data Cells is hidden from the BF programmer and
is irrelevant outside the interpreter itself.
Data Cells store positive only byte values which can range between zero and 255.
All Data Cells are initialised to zero when the BF program commences execution.
There is no overflow or underflow error trapping when incrementing or
decrementing past the Data Cell's limit; 255 will increment to zero, and zero
will decrement to 255.
All console output ( using the . and O instructions ) is done via the Download
Programming Serial Out line at 4800 baud.
All console input ( using the , and I instructions ) is done via a user-defined
Serial Input Pin at 4800 baud.
There is no 'end of input' handling when using console input ( using the , and I
instructions ); all 'end of input' handling must be done under BF program
control.
Subroutines have been implemented as described earlier, but are limited to eight
which must be named by a single letter, "A" to "H", uppercase only. Subroutines
may be redefined as program execution progressive; a subroutine call will always
go the latest defined subroutine of that name. The subroutine stack is eight
levels deep.
Execution of BF programs on the PICAXE is slow; really slow. The execution
speed can be doubled by adding a "SETFREQ M8" command at the start of the
PICAXE program, and this will consequently increase all console communications
to 9600 baud.
Language Extensions
A number of extensions have been added to the original BF programming language
in the PICAXE implementation, and these fall into three categories; those which
make BF more useful when running on a PICAXE ( the ability to control the PICAXE
output pins ), useful instructions missing from the original BF language
design ( in particular debugging facilities ) and those which make BF more
useful and easier to program ( constants, direct Data Cell Pointer setting and
a program appended data Input Stream ).
The enhancements supported by the PICAXE BF implementation are as follows ...
| > |
|
Error 1 is generated if the Data Cell Pointer is moved past the last Data
Cell; the Data Cell Pointer must maintain a value between 0 and 95 inclusive. |
| < |
|
Error 2 is generated if the Data Cell Pointer is moved past the first Data
Cell; the Data Cell Pointer must maintain a value between 0 and 95 inclusive. |
| [ |
|
Error 3 is generated if no matching ] can be found. |
| ] |
|
Error 4 is generated if no matching [ can be found. |
| ! |
|
The ! instruction marks the end of the BF program and the start of the optional
Data Input Stream. No ! instruction is required for a BF program which is
exactly 256 bytes long, but must be included otherwise, or random instructions
may be executed after the end of the program. |
| I |
|
Similar to the , instruction except a decimal number is read from the console
and the resultant byte value of that number is stored in the Data Cell pointed
to by the Data Cell Pointer. |
| O |
|
Similar to the . instruction except the value in the Data Cell pointed to by
the Data Cell Pointer is sent to the console as a sequence of ASCII characters
representing the numeric value held in the Data Cell. |
| @ |
|
Similar to the . instruction except the value in the Data Cell pointed to by
the Data Cell Pointer is put on the PICAXE output pins. |
| ? |
|
Similar to the , instruction except that the ASCII character is taken from the
Data Input Stream which follows the ! instruction which marks the end of the BF
program. Error 5 is generated if no Data Input Stream data is available. |
| : |
|
Resets the Data Input Stream so the next ? instruction will read the first ASCII
character value which follows the ! instruction which marks the end of the BF
program. |
| = |
|
Reads the following ASCII digit characters ( 0 to 9 ) up to a terminating ; as
a decimal byte value and store in the Data Cell pointed to by the Data Cell
Pointer. Example; '=27;' will store the value 27 in the Data Cell.
Error 6 is generated if an error in the number specification is detected. |
| * |
|
Reads the following ASCII digit characters ( 0 to 9 ) up to a terminating ; as
a decimal byte value and set the Data Cell Pointer to that value. Example;
'*15;' will set the Data Cell Pointer to Data Cell 15. When using the *
instruction, the Data Cells are considered to be numbered 0 to 95.
Error 6 is generated if an error in the number specification is detected.
Error 7 is generated if the Data Cell Pointer is set past the last Data
Cell; the Data Cell Pointer must maintain a value between 0 and 95 inclusive. |
| ^ |
|
Reads the value in the Data Cell pointed to by the Data Cell Pointer and sets
the Data Cell Pointer to that vlaue. This allows Data Cells to store indexes
to other Data Cells and allows Array Indexing to be achieved.
Error 7 is generated if the Data Cell Pointer is set past the last Data
Cell; the Data Cell Pointer must maintain a value between 0 and 95 inclusive. |
| 0 to 9 |
|
Sets the number of times the next > < + or - instruction is to be executed.
The constant must be a value between 1 and 255. A constant of zero is
effectively ignored; the instruction it applies to will still be executed once. |
| A to H |
|
Defines a subroutine. The subroutine definition starts with one of the
uppercase letters A to H and continues to the next ; excluding any ;'s which
are part of an '=n;' or '*n;' extension. |
| & |
|
Call a subroutine. An uppercase letter, A to H, must follow the & which
indicates which subroutine to call.
Error 8 is generated if the letter following the ∓ is invalid.
Error 9 is generated if the nested subroutine depth becomes to deep, indicating
a stack overflow.
Error 10 is generated when there is a stack underflow when attempting to return
from a subroutine. |
| { |
|
Turns debugging on. When debugging is on, a trace of the executing program is
sent to the console showing the current Instruction Pointer, the instruction
being executed, plus the Data Cell Pointer and the Data Cell value as they are
before the instruction is executed, which will be repeated if either change
as a result of executing the instruction.
The { and } instructions do not have to be used in matched or balanced pairs. |
| } |
|
Turns debugging off.
The { and } instructions do not have to be used in matched or balanced pairs.
|
The PICAXE BF Interpreter
Because of the length of the software source code, the program is included as a
separate file ...
PICAXEBF.BAS
The PICAXE interpreter is quite large, but that is mainly due to the variety of
enhancements which have been made, the lack of optimisation in the instruction
dispatch coding, and the addition of comprehensive error checking. A pure BF
interpreter without any enhancements or error chacking came in around 240 bytes
of PICAXE program code, which is remarkably close to the same size which
Müller initially achieved in his own interpreter.
Error Codes
| 1 |
|
A > instruction has moved the Data Cell Pointer beyond the last Data Cell;
the Data Cell Pointer has a value greater than 95. |
| 2 |
|
A < instruction has moved the Data Cell Pointer to before the first Data
Cell; the Data Cell Pointer has a value less than zero ( actually, due to the
implementation, greater than 95 ). |
| 3 |
|
No matching ] can be found for a [ instruction. |
| 4 |
|
No matching [ can be found for a ] instruction. |
| 5 |
|
No Data Input Stream data is available for the ? instruction. |
| 6 |
|
An error in number formating in an '=n;' or '*n' instruction was
detected. |
| 7 |
|
A ^ or '*n;' instruction set the Data Cell Pointer past the last Data
Cell; the Data Cell Pointer must maintain a value between 0 and 95 inclusive. |
| 8 |
|
An illegal subroutine call has been specified; & must be followed by an
uppercase letter only, A to H. |
| 9 |
|
Stack overflow. An attempt has been made to call too many subroutines from
within other subroutines. |
| 10 |
|
Stack underflow. A return from a subroutine is being attempted when there
is no apparant call having been made to that subroutine.
|
Conclusions
The BF programming language looks like it's all part of an elaborate joke, but
it isn't; esoteric maybe, but it is a very powerful programming language if you
can get to grips with it.Writing BF code is a challenge, but just because it is, doesn't mean that it's
something that should be instantly dismissed. Writing code in BF is an
exercise which makes one think ( often very hard ) but the rewards in terms of
personal satisfaction in cracking a problem can be huge and fulfilling. Most
programming requires the design of algorithms and BF will test your ability to
think through a problem and solution - A good challenge is the deceptively
labelled 'simple program challenge' which checks every character sent from the
console and prints the word "Bingo!" when you press the "X" key. And, no, I
don't have a solution to that in pure BF documented, although I've got some
good ideas on how to do it.
Outside of academic reflections, curiosity, and creating a challenge for
oneself, the benefits of BF rest with the ability to use it as the foundation
for enhanced BF implementations which can be used within other application's
software with good results.
Pure BF may not have much instant appeal to most programmers, but it is a
brilliant basis upon which to build. Understanding BF is a good introduction to
far more complex interpreter systems which you may wish to implement in the
future.
BF and Related Resources
Due to the rather distinctive naming of the BF language ( when spelt out
correctly ), it's quite easy to find information related to it using your
favourite search engine, and here are just a few of the places which are good
starting points for understanding and using BF ...
Mr Rock's BF Pages
Müller's original BF Specification
Müller's original BF interpreter code
The BF Archive
BF is of course not the only esoteric language in the programming world, and
details of many more can be found here ...
The Esoteric Language Web-Ring
Links to a few of the more promising esoteric, unusual and minimalistic
languages from the Web-Ring are presented below ( although at least one author
describes their work as 'obfuscated' when that is a debateable issue ) ...
The Great MOUSE Programming Language Revival
SockZ
CESIL
Enema
If you are looking for a more comprehensive and more practical interpreter
for the PICAXE, then the Extended Programming Interpreter provides for the
interpretation of an Assembly Language style language which can have programs
stored in on-chip Data EEPROM or external I2C EEPROM ...
The PICAXE Extended Programming Interpreter
The Brainfuck programming language is a creation of Urban Müller.
PICAXE is a trademark of Revolution Education Ltd.