The Brainf**ked PICAXE



Brainf**k is a minimalistic programming language, easy to learn but hard to use. It does however exhibit some interesting properties which make it worthy of further investigation.



This project describes a PICAXE program which runs as an interpreter for executing programs written in the Brainf**k programming language.

There are two notable things about the Brainf**k language; its rather embarrassing name ( which you can't easily list on a CV or drop into casual conversation, and we'll call it BF from now on ), and its apparent pointlessness.

The BF language does however have some rather interesting properties; it is Turing Complete, it has only eight instructions making it incredibly easy to learn ( even if not easy to use ) and equally as easy to write compilers and interpreters for. A BF program can easily be converted to another programming language with a minimum of effort, and often using nothing more than a text editor with a 'text substitution' function. A BF program which complies with the BF specification or a subset of it will run on any system which has a BF specification compliant compiler or interpreter.

Being Turing Complete is what gives BF its scientific curiosity value. With respect to the mathematical genius of Alan Turing and his research into computers and his concept of a "Turing Machine"; any Turing Complete computer ( providing it has infinite resources and no limitations ) can emulate any other computational device every built or that ever will be.

In practical terms, for us more down-to-earth people, the appeal of BF is in its demonstration of simplicity and the applicability to solving problems which can draw upon the concepts used in BF.


The BF Programming Language

Despite its first appearances, BF is not a joke or hoax.

Of course, any joker or hoaxer would say that, but the truth lies in an analysis of what BF is, and how it can be applied and used in the real world. Don't judge a book by its cover, and don't reject BF simply because it seems such a silly or ridiculous idea on the surface.

The BF programming language is a creation of Urban Müller ( aka Mueller ), who wanted to create a simple programming language which would require a very small interpreter for his software development. Since then, BF interpreters have been written in a number of more conventional programming languages and have been incorporated in a wide variety of devices and software applications. The original BF language has been extended upon and implemented in a number of ways by a variety of people, tailoring it for specific tasks and needs.

The original and definitive BF language has just eight instructions. Programs are plain text and each instruction is represented by a single ASCII character ...

>

  Move the Data Cell Pointer to the right; increment the Data Cell Pointer.

<

  Move the Data Cell Pointer to the left; decrement the Data Cell Pointer.

+

  Increment the value in the Data Cell pointed to by the Data Cell Pointer.

-

  Decrement the value in the Data Cell pointed to by the Data Cell Pointer.

,

  Read an ASCII character from the console and store its byte value in the Data Cell pointed to by the Data Cell Pointer.

.

  Take the value in the Data Cell pointed to by the Data Cell Pointer and send it to the console as an ASCII character.

[

  If the value in the Data Cell pointed to by the Data Cell Pointer is zero, then skip instructions until after the next matching ] instruction. Note that [ ] pairs can be nested.

]   Continue execution from the earlier matching [ instruction. Because the continuation of execution at the [ when the Data Cell is zero will cause a jump straight past the ] we went from to start with, the ] instruction only needs to return execution to the earlier matched [ when the Data Cell pointed to by the Data Cell Pointer is non-zero. This is why you will find a variety of different explanations on the actual ] instruction. Note that [ ] pairs can be nested.


A Short BF Critique

Actually, this section really ought to be called, "In Praise of BF" ...

It is interesting to compare BF with other esoteric languages which generally fall into three sets; the joke languages creating ficticious and impractical programming concepts like INTERCAL, languages which are designed to be quite pointless, insane, unusable or entirely obfuscated by design ( ZT, Whitespace, f**kf**k ), and the set of real programming languages which are designed to be minimalistic but functional in what they set out to do.

For the later set, any obfuscation and lack of clarity arises through the nature of the language's design and is not a design goal in itself. If you are exploring the mirage of weird and wonderful programming languages outside of the mainstream, this is the key element which distinguishes the useful from the 'useless', although even the later have their own merits in many cases; INTERCAL's seemingly ridiculous COME FROM statement is now being accepted as a valid and useful concept in some areas of software design.

BF definitely falls into the category of real and usable programmable languages even if its usability seems somewhat open to question. It implements a minimum of instructions, but by achieveing its Turing-Completeness it demonstrates itself to be capable of being used to implement any program that could be written.

BF is a language which is easy to implement in either a compiled form or as an interpreter, and its concept is particularly suited to having to process code which is not part of a microcontroller's main memory, such as where the program has to be stored off-chip, such as in I2C EEPROM or on a Memory card. The implementation of a BF interpreter is a good starting point for anyone who needs or is thinking about writing an interpreter.

The difficulty of using BF is in having a limited language where we are used to complex instruction sets, multiple registers, index pointers, stack manipulation and data structures which all have to be devolved down into a long sequence of fundamental instructions in BF.

At first glance, BF appears to be obfuscated by its choice of ASCII characters for its instruction set, but it is not really obfuscation at work. All the characters have an iconography which reflects what they do; < and > move through Data Cells, leftwards or rightwards, + and - alter a Data Cell's value up or down, [ and ] make for a loop structure no worse than in C and while using a comma for input seems strange, thinking of it as a pause for something ( in this case input ) sort of makes sense, whether that was the motivation for its use originally or not.

Given that BF is, by concept, a legitimate, albeit limited, language, it is hard to see what other representations could or should be used for each instruction; LEFT, RIGHT, INCREMENT, DECREMENT and so on would not make the language or its programs more clear, and would just require a lot more typing.

Arguments that BF is not a real language and can't be used to write programs in can be dismissed by reference to its Turing Completeness; it may not be easy but it can be done, failure really rests with the BF implementation being used and the programmer's own weaknesses, not BF's. Likewise, claims that BF is deliberately obfuscated to make it hard to write programs in can be summarily dismissed.

The only real legitimate complaints about BF are that it is hard to actually design and produce workable code, that it has only limited processing abilities and doing some things are just plain long-winded. All of this is true.

In general terms, BF's processing capabilities are limited to generating output from itself, or processing data read from an input console ( or data stream ) and producing output from that; it's a true input-process-output language. When embedding BF inside another system it is not that hard though to pass data between the main program and the BF program; either the main program can build up an input stream and capture the output stream ( akin to CGI programming on web servers ) or pre-defined Data Cells can be used to pass arguments to and from the BF program.

There is merit in enhancing the BF language to cut down on the long-winded coding which is otherwise necessary, and although it breaks the clean design of a pure BF implementation it does add to its usability in practice ...


Extending the BF Programming Language

In its purest incarnation, BF is not so much limited in its ability to do things ( that is a factor of implementation specifics ) but by our abilities to express what we want to do in terms of BF syntax.

Pure BF is tailored to very specific types of processing and as such does not map well to practical uses to which we may wish to put it; the PICAXE implementation of BF adds the I and O instructions to make life easier when being used with the PICAXE Programming Editor Terminal and the @ instruction allows use and control of the PICAXE output pins. Further extensions which are PICAXE specific would be to allow data to be read from digital and analogue input pins, and to provide interfaces to on-chip resources of the device; PWM, servo control, and so on. The two { and } instructions are added because debugging is a fundamental requirement for all programming efforts, and as with the other additions, the PICAXE interpreter software shows how easy it was to add them to the basic BF interpreter.

Anyone who has used a more traditional programming language will see problems that appear as major inconveniences when using BF; no ability to specify a constant ( repeated + and - instructions must be used ) and no means of direct access to any particular Data Cell ( repeated > and < instructions must be used ) which create long-winded and consequently slow executing programs for doing even the simplest of tasks.

By extending BF syntax, adding a '=n;' instruction to put the value n in the data Cell pointed to by the Data Cell Pointer and adding a '*n;' instruction to set the Data Cell Pointer to Data Cell n, we can get many capabilities of a modern-day computer processor.

The following enhanced BF program will get the contents of Data Cell 1, add the value 2 ( stored in data Cell 2 ) to it and print "*" if the value is not zero ...

    *2;         Point to Cell 2
    =2;         Put the value 2 in Cell 2

    *2;         Point to Cell 2
    [           While Cell 2 is non-zero execute these ...
      *1;         Point to Cell 1
      +           Increment Cell 1
      *2;         Point to Cell 2
      -           Decrement Cell 2
    ]           Go to earlier [

    *1;         Point to Cell 1
    [           While Cell 1 is not zero execute these ...
      =42;        Put "*" in Cell 1
      .           Print the "*"
      =0;         Zero Cell 1
    ]           Continue with next code, as Cell 1 is now zero

We could go further; by adding a '#n;' instruction, we can specify a number by which + or - would increment by ( allowing us to avoid having to use the Cell 2 variable ) and adding '(' and ')' as a matched set which acts as a simple 'if' statement, rather than a 'while' using [ and ], we move our enhanced BF language much closer to a traditional programming language ...

    *1          Point to Cell 1
    #2;         Use Constant 2
    -           Decrement Cell 1 by 2
    (           If non-zero then
      =42;      Put "*" in Cell 1
      .         Print the "*"
    )

We are well on the way to creating a perfectly usable, and fairly familiar in style, programming language ( albeit using a rather esoteric syntax ), and we can continue to extend BF to include GOTO's, subroutine calls and returns, a data stack, and any number of programming concepts and functions we wish to add.

What enhancements are possible and what is appropriate really depends upon what the programmer wishes to achieve and how far they are willing to deviate from a pure BF language; "all the way if necessary", is the pragmatic approach most will adopt. Even if you wished to retain something close to pure BF you could still add small tweaks to gain massive improvements; using the instructions 1 to 9 would allow a repeat constant to be set which could be applied to the < > + and - instructions to compact code considerably. Using just this trick alone can reduce the size of an arbitrarily chosen BF "Hello World!" program down from 134 bytes of code by almost 50% with little execution speed penalty ...

    ++++++++[->+++++++++        8+[->9+<]>.<7+[->4+<
    <]>.<+++++++[->++++<        ]>+.7+..3+.>8+[->4+<
    ]>+.+++++++..+++.>++        ]>.<<8+.8-.3+.6-.8-.
    ++++++[->++++<]>.<<+        >>+.3-[3-<+>]<.
    +++++++.--------.+++
    .------.--------.>>+
    .---[---<+>]<.

Similar savings can be attained by using 'subroutines'. The most convenient way to add subroutines to BF would be to define "A" through "Z" and "a" through "z" as the names of subroutines. Whenever &a is encountered, the code will branch off to execute the named subroutine. Subroutines themselves can be defined by using its letter name, followed by a sequence of instructions which are terminated by a ; ( with ;'s belonging to BF extensions skipped ) as demonstrated below, which prints "02468" to the console ...

    S++;=48;.&S.&S.&S.&S.

By enhancing BF to do things which the programmer feels is useful, time saving or necessary ( and often it's all three ), what we create is a programming language that does what is wanted, but all built upon the simple base of BF. This is what I have done myself in a number of commercial software applications ranging from an SMS messaging system to adding 'macro commands' to a word processor system, and as everything is built upon a very simple and small interpreter to start with, the resources required to add new functions to our own BF variant are often equally small, and usually being incremental improvements, they are usually both easy to implement and debug.

What I ended up with in my developments was far away from pure BF, but BF provided a solid foundation upon which to build and a framework within which to work. Far from being a pointless programming language of nothing but scientific interest or humorous derision, BF turned out to be quite fundamental to my own software's commercial success. If that doesn't give the BF language some credibility, I don't know what does.


The PICAXE BF Implementation

The PICAXE BF implementation is based upon the original BF specification ( and is also a superset of it ), but it is constrained by the limitations of the PICAXE device itself and affected by the extensions added, which prevents it from being truly 100% compliant with the original BF language specification.

Language Limitations and Implementation Specifics

The original BF specification calls for all characters which form a BF source code program to be ignored if the character does not represent one of the eight BF instructions; the PICAXE implementation ignores all characters which it does not recognise as valid instructions, but obviously does have to interpret those characters which represent or form part of the extensions the PICAXE implementation supports. Original BF source code which contains characters which are not those representing the eight original BF instructions may have to be modified to work correctly with the PICAXE implementation.

There are only 96 Data Cells and 256 Program Instruction cells, whereas the original specification calls for 30,000 cells in total. This means that BF programs which require more than 96 data cells or have more than 256 characters of BF source will not run without modification, and not at all if more than 256 of the eight BF instruction set characters are needed to implement the program.

Data Cells and the program code space are separate. This is primarily because the code is held in EEPROM and frequent writes to it ( as will happen when the only data operations are increment and decrement Data Cell values ) will quickly wear the chip out and, additionally, writing to Data EEPROM is unbelievably slow, and the PICAXE BF interpreter is slow enough without being crippled further. The only disadvantages of this separation are that it is impossible to write self-modifying code ( and generally that's no great loss ), but more importantly, the Data Cell Pointer cannot be moved into the program code to use any pre-defined values, but the mechanism to do this would be rather awkward anyway, and infrequently, if ever, used.

BF programs can be up to 256 characters long. All BF programs are stored in the PICAXE Data EEPROM as plain text.

There are 96 Data Cells and when the BF program commences execution the Data Cell Pointer is positioned to the left-most Data Cell. While Data Cells can be thought of as being numbered ( left to right ) as a continuous set of cells running 0 to 95, they are actually two sets of cells within the PICAXE numbered 80 to 127, and 192 to 239 ( $50 to $7F and $C0 to $EF ); the interpreter deals with handling the Data Cells to present them as a sequential list, and the actual location or numbering of Data Cells is hidden from the BF programmer and is irrelevant outside the interpreter itself.

Data Cells store positive only byte values which can range between zero and 255. All Data Cells are initialised to zero when the BF program commences execution. There is no overflow or underflow error trapping when incrementing or decrementing past the Data Cell's limit; 255 will increment to zero, and zero will decrement to 255.

All console output ( using the . and O instructions ) is done via the Download Programming Serial Out line at 4800 baud.

All console input ( using the , and I instructions ) is done via a user-defined Serial Input Pin at 4800 baud.

There is no 'end of input' handling when using console input ( using the , and I instructions ); all 'end of input' handling must be done under BF program control.

Subroutines have been implemented as described earlier, but are limited to eight which must be named by a single letter, "A" to "H", uppercase only. Subroutines may be redefined as program execution progressive; a subroutine call will always go the latest defined subroutine of that name. The subroutine stack is eight levels deep.

Execution of BF programs on the PICAXE is slow; really slow. The execution speed can be doubled by adding a "SETFREQ M8" command at the start of the PICAXE program, and this will consequently increase all console communications to 9600 baud.

Language Extensions

A number of extensions have been added to the original BF programming language in the PICAXE implementation, and these fall into three categories; those which make BF more useful when running on a PICAXE ( the ability to control the PICAXE output pins ), useful instructions missing from the original BF language design ( in particular debugging facilities ) and those which make BF more useful and easier to program ( constants, direct Data Cell Pointer setting and a program appended data Input Stream ).

The enhancements supported by the PICAXE BF implementation are as follows ...

>

  Error 1 is generated if the Data Cell Pointer is moved past the last Data Cell; the Data Cell Pointer must maintain a value between 0 and 95 inclusive.

<

  Error 2 is generated if the Data Cell Pointer is moved past the first Data Cell; the Data Cell Pointer must maintain a value between 0 and 95 inclusive.

[

  Error 3 is generated if no matching ] can be found.

]   Error 4 is generated if no matching [ can be found.

!

  The ! instruction marks the end of the BF program and the start of the optional Data Input Stream. No ! instruction is required for a BF program which is exactly 256 bytes long, but must be included otherwise, or random instructions may be executed after the end of the program.

I

  Similar to the , instruction except a decimal number is read from the console and the resultant byte value of that number is stored in the Data Cell pointed to by the Data Cell Pointer.

O

  Similar to the . instruction except the value in the Data Cell pointed to by the Data Cell Pointer is sent to the console as a sequence of ASCII characters representing the numeric value held in the Data Cell.

@

  Similar to the . instruction except the value in the Data Cell pointed to by the Data Cell Pointer is put on the PICAXE output pins.

?

  Similar to the , instruction except that the ASCII character is taken from the Data Input Stream which follows the ! instruction which marks the end of the BF program. Error 5 is generated if no Data Input Stream data is available.

:

  Resets the Data Input Stream so the next ? instruction will read the first ASCII character value which follows the ! instruction which marks the end of the BF program.

=

  Reads the following ASCII digit characters ( 0 to 9 ) up to a terminating ; as a decimal byte value and store in the Data Cell pointed to by the Data Cell Pointer. Example; '=27;' will store the value 27 in the Data Cell. Error 6 is generated if an error in the number specification is detected.

*

  Reads the following ASCII digit characters ( 0 to 9 ) up to a terminating ; as a decimal byte value and set the Data Cell Pointer to that value. Example; '*15;' will set the Data Cell Pointer to Data Cell 15. When using the * instruction, the Data Cells are considered to be numbered 0 to 95. Error 6 is generated if an error in the number specification is detected. Error 7 is generated if the Data Cell Pointer is set past the last Data Cell; the Data Cell Pointer must maintain a value between 0 and 95 inclusive.

^

  Reads the value in the Data Cell pointed to by the Data Cell Pointer and sets the Data Cell Pointer to that vlaue. This allows Data Cells to store indexes to other Data Cells and allows Array Indexing to be achieved. Error 7 is generated if the Data Cell Pointer is set past the last Data Cell; the Data Cell Pointer must maintain a value between 0 and 95 inclusive.

0 to 9

  Sets the number of times the next > < + or - instruction is to be executed. The constant must be a value between 1 and 255. A constant of zero is effectively ignored; the instruction it applies to will still be executed once.

A to H

  Defines a subroutine. The subroutine definition starts with one of the uppercase letters A to H and continues to the next ; excluding any ;'s which are part of an '=n;' or '*n;' extension.

&

  Call a subroutine. An uppercase letter, A to H, must follow the & which indicates which subroutine to call. Error 8 is generated if the letter following the ∓ is invalid. Error 9 is generated if the nested subroutine depth becomes to deep, indicating a stack overflow. Error 10 is generated when there is a stack underflow when attempting to return from a subroutine.

{

  Turns debugging on. When debugging is on, a trace of the executing program is sent to the console showing the current Instruction Pointer, the instruction being executed, plus the Data Cell Pointer and the Data Cell value as they are before the instruction is executed, which will be repeated if either change as a result of executing the instruction. The { and } instructions do not have to be used in matched or balanced pairs.

}   Turns debugging off. The { and } instructions do not have to be used in matched or balanced pairs.


The PICAXE BF Interpreter

Because of the length of the software source code, the program is included as a separate file ...

  PICAXEBF.BAS

The PICAXE interpreter is quite large, but that is mainly due to the variety of enhancements which have been made, the lack of optimisation in the instruction dispatch coding, and the addition of comprehensive error checking. A pure BF interpreter without any enhancements or error chacking came in around 240 bytes of PICAXE program code, which is remarkably close to the same size which Müller initially achieved in his own interpreter.

Error Codes

1

  A > instruction has moved the Data Cell Pointer beyond the last Data Cell; the Data Cell Pointer has a value greater than 95.

2

  A < instruction has moved the Data Cell Pointer to before the first Data Cell; the Data Cell Pointer has a value less than zero ( actually, due to the implementation, greater than 95 ).

3

  No matching ] can be found for a [ instruction.

4

  No matching [ can be found for a ] instruction.

5

  No Data Input Stream data is available for the ? instruction.

6

  An error in number formating in an '=n;' or '*n' instruction was detected.

7   A ^ or '*n;' instruction set the Data Cell Pointer past the last Data Cell; the Data Cell Pointer must maintain a value between 0 and 95 inclusive.

8   An illegal subroutine call has been specified; & must be followed by an uppercase letter only, A to H.

9   Stack overflow. An attempt has been made to call too many subroutines from within other subroutines.

10   Stack underflow. A return from a subroutine is being attempted when there is no apparant call having been made to that subroutine.


Conclusions

The BF programming language looks like it's all part of an elaborate joke, but it isn't; esoteric maybe, but it is a very powerful programming language if you can get to grips with it.

Writing BF code is a challenge, but just because it is, doesn't mean that it's something that should be instantly dismissed. Writing code in BF is an exercise which makes one think ( often very hard ) but the rewards in terms of personal satisfaction in cracking a problem can be huge and fulfilling. Most programming requires the design of algorithms and BF will test your ability to think through a problem and solution - A good challenge is the deceptively labelled 'simple program challenge' which checks every character sent from the console and prints the word "Bingo!" when you press the "X" key. And, no, I don't have a solution to that in pure BF documented, although I've got some good ideas on how to do it.

Outside of academic reflections, curiosity, and creating a challenge for oneself, the benefits of BF rest with the ability to use it as the foundation for enhanced BF implementations which can be used within other application's software with good results.

Pure BF may not have much instant appeal to most programmers, but it is a brilliant basis upon which to build. Understanding BF is a good introduction to far more complex interpreter systems which you may wish to implement in the future.


BF and Related Resources

Due to the rather distinctive naming of the BF language ( when spelt out correctly ), it's quite easy to find information related to it using your favourite search engine, and here are just a few of the places which are good starting points for understanding and using BF ...

  Mr Rock's BF Pages
  Müller's original BF Specification
  Müller's original BF interpreter code
  The BF Archive

BF is of course not the only esoteric language in the programming world, and details of many more can be found here ...

  The Esoteric Language Web-Ring

Links to a few of the more promising esoteric, unusual and minimalistic languages from the Web-Ring are presented below ( although at least one author describes their work as 'obfuscated' when that is a debateable issue ) ...

  The Great MOUSE Programming Language Revival
  SockZ
  CESIL
  Enema

If you are looking for a more comprehensive and more practical interpreter for the PICAXE, then the Extended Programming Interpreter provides for the interpretation of an Assembly Language style language which can have programs stored in on-chip Data EEPROM or external I2C EEPROM ...

  The PICAXE Extended Programming Interpreter


The Brainfuck programming language is a creation of Urban Müller. PICAXE is a trademark of Revolution Education Ltd.





Associated Articles

  The PICAXE Processors
  PICAXE News
  PICAXE Questions & Answers
  PICAXE Comparisons
  PICAXE Pinouts
  PICAXE Serial Interfacing
  PICAXE Infra-Red Interfacing
  PICAXE Wireless Interfacing
  PICAXE LCD Interfacing
  PICAXE LCD Interfacing
  A Real-Time Clock for the PICAXE-18X
  PICAXE Optimisations
  The PICAXE Birthday Box Project
  PICAXE Telephone Exchange Simulator
  The PICAXE Extended Programming Interpreter
  Build Your Own Basic Stamp
  Tech Toys



Sites to Visit

  PICAXE Home Page
  Revolution Education Ltd

  Tech-Supplies Ltd



Site Navigation

  Home Page
  What's New
  Search
  Add Bookmark
  Have Your Say
  Guestbook




First published on Monday the 9th of August, 2004 at 13:32:26
Last upload was on Saturday the 2nd of October, 2004 at 05:16:19