John's ECMA-55 Minimal BASIC Compiler

Tue Aug 31 10:23:34 2021 UTC

This software is a compiler for Minimal BASIC as specified by the ECMA-55 Minimal BASIC (81KB, updated) standard from the ECMA International organization (formerly known as the European Computer Manufacturers Association). A version of the ECMA-55 Minimal BASIC (83KB, updated) better suited to printing (it uses backspace underline sequences to match the original scanned input more closely) is also available. Both text files are in UTF-8 encoding. The original source for both is the ECMA-55 Minimal BASIC PDF, but it is not easily searchable and is relatively large (15 MB) because it is an image-based scan rather than text.


Alternative Implementation: bas55

If you want an interpreter instead of a compiler, a new open source interpreter called bas55 was written in 2014 by Jorge Giner Cordero. The bas55 system even has a nice online manual including a tutorial on programming in ECMA-55 Minimal BASIC.


The target is AMD64/EM64T/x86-64 machines running a modern Linux distribution. This compiler creates assembly language output files. These are then assembled into object files and linked to create an executable. The assembly dialect used is that of GNU gas from GNU binutils, since that will be available on any modern general purpose x86-64 Linux distribution. No libc or libm is used by the generated code, which allows creating very small executables. To keep the generated code small and simple, the runtime code for SIN(), COS(), TAN(), ATAN(), EXP(), POW, LOG(), RND, and RANDOMIZE is only emitted if the BASIC program requires those features.
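
One way such on-demand emission could be organized is sketched below in C99. This is only an illustration and not the compiler's actual code: the names feature, note_feature, and emit_runtime are hypothetical, only three of the routines are listed, and the emitted lines are placeholder comments rather than real assembly routines.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical feature flags, one bit per optional runtime routine. */
enum feature {
    FEAT_SIN = 1u << 0,
    FEAT_EXP = 1u << 1,
    FEAT_RND = 1u << 2
};

struct routine {
    unsigned flag;      /* bit that marks this routine as needed */
    const char *name;   /* BASIC keyword that requires the routine */
};

static const struct routine routines[] = {
    { FEAT_SIN, "SIN" },
    { FEAT_EXP, "EXP" },
    { FEAT_RND, "RND" }
};

#define ROUTINE_COUNT (sizeof routines / sizeof routines[0])

/* Record that a keyword was seen; unknown keywords leave the set unchanged. */
static unsigned note_feature(unsigned used, const char *keyword)
{
    for (size_t i = 0; i < ROUTINE_COUNT; i++)
        if (strcmp(keyword, routines[i].name) == 0)
            used |= routines[i].flag;
    return used;
}

/* Write a placeholder for each routine that was used; return how many. */
static int emit_runtime(unsigned used, FILE *out)
{
    int emitted = 0;
    for (size_t i = 0; i < ROUTINE_COUNT; i++) {
        if (used & routines[i].flag) {
            fprintf(out, "# runtime routine for %s would go here\n",
                    routines[i].name);
            emitted++;
        }
    }
    return emitted;
}
```

In this sketch a parser would call note_feature() for each keyword it meets and call emit_runtime() once at the end, so a program that never uses SIN never pays for the SIN routine.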

Paper Published!

A journal article has been published about this compiler, as of version 1.7. You can download the article from this URL:
http://www.mdpi.com/2073-431X/3/3/69

Current Status

Version 2.40 Released

This release continues the effort to implement all of the ECMA-116 Full BASIC mathematical functions by adding DATE and TIME. It also adds our first string functions, DATE$ and TIME$, also from ECMA-116 Full BASIC, and fixes the bug where PI and MAXNUM didn't work as functions in DATA statements when -X was specified. AVX code generation was removed because the programs it generated were actually slower: this compiler has no vectorization logic. Finally, a new appendix has been added to the included book, and the generated PDF for the book is compressed and linearized, while still remaining PDF/A-1b.

See the NEWS file for many more details.

Programming Guide

I have finally finished writing an introduction to programming using ECMA-55 Minimal BASIC as the language. It is included in 2.17 and later versions of the compiler. If you are using bas55 or some other old-style BASIC, you can just directly download a PDF file of the book. This includes updates to chapters 5 and 9. The book was last modified on Mon Aug 30 09:07:38 2021 UTC and its size is approximately 1.6M.

Computer Science Programs

I have coded several `classic' computer science algorithms using ECMA-55 Minimal BASIC as demonstration programs. These are included in the compiler download file, or you can grab each one separately from this list.

SEQSEARCH.BAS Sequential Search
BSEARCH.BAS Binary Search
BSORT.BAS Bubble Sort
COMBSORT.BAS Comb Sort
FACTORIAL.BAS Recursive Factorial
FIBONACCI.BAS Generate Fibonacci numbers iteratively
FIBOR.BAS Generate Fibonacci numbers recursively
HSORT.BAS Heap Sort
ISORT.BAS Insertion Sort
MSORT.BAS Merge Sort
QSORT.BAS Quick Sort
SSORT.BAS Selection Sort
SLLIST.BAS Singly Linked List Demo
SLLISORT.BAS Singly Linked List Insertion Sort
SLLDEMO.BAS Menu-driven Singly Linked List Demo
MATMUL.BAS 3 x 3 Matrix Multiply Demo
MATINV.BAS 3 x 3 Matrix Inverse Demo
MATDET.BAS 3 x 3 Matrix Determinant Demo

I have also found this program and ported it to ECMA-55 Minimal BASIC.

ERATOSTHENESE.BAS Sieve of Eratosthenes to find prime numbers

You can download my old presentation slides from my talk entitled Resurrect MinimalBASIC in PDF format. Updated slides reflecting recent changes are available in the ECMA55-slideshow.pdf file.

You can try Source Code Version 2.40 today. If you just want to use it, but you don't want to compile it, you can use the Binary Action Pack Version 2.40 instead. If you are OK with git then you can get a read-only copy of the repository with this command:

git clone git://git.code.sf.net/p/buraphakit/MB_git MinimalBasic

Test Results

Emmanuel Roche from France kindly provided paper copies of the missing NBS test sources which were used for NBS test numbers 56, 57, 65, 66, 67, 68, 69, 109, 117, 118, 119, 120, 121, 122, 123, and 124. All tests are now typed in and are part of the self-test 'check' and 'check32' targets. And yes, the new (to me) tests did find problems.

Jorge Giner Cordero from Spain has reported bugs in NBS tests 12, 14, 25, 39, 43, 74, 108, 115, 128, 185, 191, and 206 which I have fixed using his helpful bug reports. Test P030.BAS had a typo which was fixed for the 1.97 release. He is the same Mr. Cordero who wrote the bas55 interpreter.

Doug Kearns reported the bug about @ being accepted without exceptions. He also provided spelling fixes for the included Learn BASIC book and comments & error messages in the codegen.c file. Then he reported that having a horizontal tab in the source code failed (as it should) but with an incorrect error message. He also submitted a typo fix for ECMA-55.TXT.

The complete status of the compiler's ability to pass the NBS tests can be seen in README.NBS. The complete status of the compiler's ability to pass the new HAM tests can be seen in README.HAM.

Licensing Details

Different parts of the software have different licenses, but all are open source and free to use.

The source code for the compiler itself is available under the GPL version 2 license only. See COPYING for details.

The license for the groff format manual pages and the included book "An Introduction to Programming with ECMA-55 Minimal BASIC" is the GNU Free Documentation License version 1.3 only. See GNU_FDL for details.

The author of the book, the groff format manual pages, and the actual compiler software is John Gatewood Ham.

The included runtime library assembly routines for SIN(), COS(), TAN(), ATAN(), LOG(), EXP(), POW (used by ^ operator), ACOS(), ASIN(), LOG2(), LOG10(), COSH(), SINH(), TANH(), SEC(), CSC(), COT(), and ANGLE() are from SLEEF-3.5.1 (tweaked), from Naoki Shibata (Boost Software License 1.0).

The included runtime library assembly routines for RND and RANDOMIZE are public domain from ISAAC-64 (tweaked), from Bob Jenkins.

The included runtime library assembly routines for floating point output are derived from David M. Gay's dtoa and g_fmt routines. That code is copyrighted by Lucent Technologies, but is open source and freely redistributable.

The included runtime library assembly routines for accessing the Linux vDSO come from the Linux kernel and are written by Andrew Lutomirski. His code uses the Creative Commons Zero license for the reference vDSO parser and the GNU GPL v2.0 only for the stack walking and pointer setup code run at program startup that uses that reference parser.

The included runtime library assembly routines for accessing the timezone database to generate correct local date and time values come from David Olson's tzcode2020f (now maintained by Paul Eggert). The code used (generated from localtime.c and some definitions from headers) is in the public domain. It is used to implement the ECMA-116 numeric functions DATE and TIME.

The file dumpregs.s is not part of the compiler, but is optionally used for debugging. John Gatewood Ham wrote dumpregs.s and places that particular file into the public domain.

Why this dialect of BASIC?

The ECMA-55 standard was chosen over the "ANSI X3.60-1978 minimal BASIC" standard since it is free. ANSI, despite cancelling the standard, still keeps the 35 year old standard locked down and available only if you pay for it, which is a quite mean-spirited attempt to prevent any compliant free and open source implementations from being written. The same attitude exists with ISO for the "ISO 6373:1984 Data processing -- Programming languages -- Minimal BASIC" standard. This standard has many other names, such as "AS 2797-1985 Programming language - Minimal BASIC", and the only free one is ECMA-55, since all the other standards bodies are trying to kill BASIC forever.

Many products exist today that call themselves BASIC, but if it doesn't have line numbers, it's not really BASIC. If it has multi-line statements, it's not really BASIC. The modern un-BASICs have nice features, but they are not really BASIC at all, but instead seem to be more Pascal-like in nature (without the semicolons). The nicest thing I can say is that they are derivatives of BASIC. Sadly, when Kemeny and Kurtz decided to make this change, they didn't change the name of the language. One reference about the intention of the original designers to change BASIC to a more Pascal-like language I found was "BASIC Becomes a Structured Language" from Computer Language magazine's premier issue in 1984, page 21, written by Kemeny, Kurtz, and Brig Elliot. Whether the new language was better or not, this article proves that the new language was not BASIC in the original 1960's sense. It is unfortunate that the new language was considered by many to be a small evolution and not a totally new language since the syntax was intentionally entirely incompatible. The new style is called True BASIC in the article, but few people make that distinction, and the article still hints it's an evolution, and not a total rewrite of something completely different that has the same goal (a teaching language) but little else in common with the original 1960's Dartmouth BASIC. The style proposed in "On the Way to Standard BASIC" from BYTE Volume 7 Number 6 page 182, by Kurtz in June of 1982 is something more palatable and still uses line numbers.

I could not find any Minimal BASIC implementation at all for modern machines. Few true compilers (compilers to bytecode do not count) for any BASIC with line numbers exist, and most of them do not compile to assembly. None of them (that I could find) were both FOSS and ready for 64bit. Thus this project was undertaken to fill the need for a FOSS Minimal BASIC compiler for modern 64bit Linux machines.

http://www.ecma-international.org/publications/files/ECMA-ST-WITHDRAWN/

Note that independently but almost at the same time, Jorge Giner Cordero created another implementation of ECMA-55 Minimal BASIC, bas55. It works well, but does not compile to assembly (or machine) code. It does, however, support multiple platforms.

How can others benefit from this project?

Well, obviously if you need a Minimal BASIC implementation on Linux this will be useful. However, I suspect mostly it will show how to generate some simple but effective code for 64bit Linux on AMD64/EM64T with a small enough system that a normal programmer can learn it and understand it. Projects like gcc and llvm are huge and learning how to modify those compilers is so hard most people just give up. This project is much simpler, with no need to learn LISP, any intermediate languages, etc. You just need to know C99 and AMD64 assembly language in the GNU gas dialect. That's still a lot, but it is a tiny fraction of what is required for production compilers.

With production compilers, especially ones with multiple front and back ends, it is easy to get overwhelmed by the amount of information required to understand it. With this Minimal BASIC compiler, that problem does not exist. Many compilers need many dependencies (cmake, or the GNU autotools, or flex/bison, etc.) but this compiler just needs make, gcc (or clang), and binutils to build. Most compilers today emit code that requires a dynamic linker. Even what gcc calls static is actually dynamic in most cases, since GNU libc wants to be able to dynamically load code to resolve things for NSS. Here the generated code doesn't link with glibc so this problem is avoided.

This project can serve as an example of how to produce programs that don't link to glibc but instead call the kernel directly. Most assembly examples for the GNU gcc that I could find are embedded asm in C or C++ programs; very little stand-alone example code exists (that I could find) for GNU as dialect beyond a few simple hello world programs. There were no good examples of dealing with floating point exceptions that I could find at the assembly level for GNU as either. This compiler emits code that does full floating point exception handling.
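
To give a feel for what calling the kernel directly means on x86-64 Linux, here is a small sketch. It is C with GCC-style inline assembly rather than stand-alone GNU as source, and it is not taken from the compiler's generated code; raw_write is a hypothetical name chosen for illustration. It relies on the documented x86-64 Linux convention: the system call number goes in rax, the arguments in rdi, rsi, and rdx, and the syscall instruction clobbers rcx and r11.

```c
#include <stddef.h>

/* write(2) is system call number 1 on x86-64 Linux. */
#define SYS_WRITE_X86_64 1L

/*
 * Invoke write(2) with the raw syscall instruction instead of the libc
 * wrapper.  Returns the byte count on success or a negative errno value.
 */
static long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                      /* result comes back in rax */
        : "0"(SYS_WRITE_X86_64),         /* rax: system call number  */
          "D"((long)fd),                 /* rdi: file descriptor     */
          "S"(buf),                      /* rsi: buffer address      */
          "d"(len)                       /* rdx: byte count          */
        : "rcx", "r11", "memory"         /* clobbered by the kernel  */
    );
    return ret;
}
```

Calling raw_write(1, "HELLO\n", 6) writes to standard output without touching any libc I/O function. A fully libc-free program would also need its own _start entry point and an exit system call, which this sketch leaves out.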

Now is this generated code great? No, there is a lot of room for improvement. I've aimed for correctness and simplicity. Exception handling for math operations is exceptionally ugly, but even the ugly code took me a long time to get working. Constructive feedback with examples of doing it in a way that is cleaner, but still gives the same results and doesn't require any external libraries, would be welcome.
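
As a point of comparison, one simple way to detect floating point exceptions on x86-64 is to clear the sticky flag bits in the MXCSR register, perform the operation, and then read the flags back. The sketch below does this in C with the standard SSE intrinsics _mm_getcsr() and _mm_setcsr(). It is an assumption-laden illustration, not the method this compiler's generated code uses: checked_divide and the MXCSR_* macro names are hypothetical, and it assumes the default x86-64 setup where double arithmetic uses SSE2 and exceptions are masked.

```c
#include <xmmintrin.h>

/* Sticky exception flag bits in MXCSR (bits 0 through 5). */
#define MXCSR_DIV_ZERO  0x04u
#define MXCSR_OVERFLOW  0x08u
#define MXCSR_FLAG_MASK 0x3Fu

/*
 * Divide a by b and report through *flags which exception flags the
 * division raised.  The volatile variables force a run-time divide so
 * the C compiler cannot fold the result at compile time.
 */
static double checked_divide(double a, double b, unsigned *flags)
{
    volatile double x = a, y = b, q;

    _mm_setcsr(_mm_getcsr() & ~MXCSR_FLAG_MASK);   /* clear old flags */
    q = x / y;
    *flags = _mm_getcsr() & MXCSR_FLAG_MASK;       /* read new flags  */
    return q;
}
```

A caller can then test the returned flags against MXCSR_DIV_ZERO or MXCSR_OVERFLOW and report the error or substitute a value as the language standard requires.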

Why does a new compiler not use SSE4.x or AVX?

My work machine while developing this compiler from August 2013 through August 2014 was a Core 2 Duo E4700 (Conroe), and my request for something newer was placed on hold. Since the processor I had available did not support those instructions, the original design didn't use them.

I am very pleased to report that my employer, Faculty of Informatics at Burapha University, provided me with a new Haswell refresh machine in September 2014. On September 13, 2014, I completed adding support for SSE4.1's PINSR[QD] and PEXTR[QD] and now they are used when you supply the -4 option. The compiler still generates code for a Conroe by default, but when the -4 option is supplied the nicer SSE4.1 instruction sequences are used. Surprisingly, when testing on the i7-4790 the run time is essentially unchanged. However, reading the code is much nicer with the new instructions. Version 1.7 and greater include this work.
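
The following sketch suggests why the SSE4.1 instructions read more nicely. It is C with GCC-style inline assembly in AT&T syntax, not output from this compiler, and the function names are hypothetical. With SSE2 alone, reaching the upper 64-bit lane of an XMM register takes a shuffle followed by a move, while PEXTRQ and PINSRQ address the lane directly. It assumes a CPU that supports SSE4.1.

```c
#include <stdint.h>
#include <emmintrin.h>

/* SSE2 only: copy the upper lane down, then move it to an integer. */
static int64_t high_lane_sse2(__m128i v)
{
    return _mm_cvtsi128_si64(_mm_unpackhi_epi64(v, v));
}

/* SSE4.1: PEXTRQ reads lane 1 directly into a general register. */
static int64_t high_lane_sse41(__m128i v)
{
    int64_t out;
    __asm__ ("pextrq $1, %1, %0" : "=r"(out) : "x"(v));
    return out;
}

/* SSE4.1: PINSRQ overwrites lane 1 and leaves lane 0 untouched. */
static __m128i set_high_lane_sse41(__m128i v, int64_t value)
{
    __asm__ ("pinsrq $1, %1, %0" : "+x"(v) : "r"(value));
    return v;
}
```

Both high_lane functions return the same value for the same input; the SSE4.1 version simply does it in one instruction.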

I implemented scalar support for AVX, but ultimately removed it since it made things slower, not faster, because this compiler lacks vectorization.