>>4
B was written in BPCL. BCPL was written in CPL. CPL was written in my anus. My anus was written ``in Lisp''. ``in Lisp'' was written in Common Lisp. Common Lisp was written using McCarthy's eval (seven primitive operators). McCarthy's eval was hand-compiled to Assembly.
Name:
Anonymous2011-04-18 11:35
Page 1
SBCL: a Sanely-Bootstrappable Common Lisp
Christophe Rhodes
Department of Computing
Goldsmiths, University of London
New Cross, London SE14 6NW
c.rhodes@gold.ac.uk
Abstract. This paper describes the development of an implementation
of Common Lisp with the peculiarity that it is bootstrappable neither
solely from itself, nor from some other language, but rather from a variety
of other Common Lisp implementations. We explain the motivation for
this bootstrap strategy, discuss some of the technical details involved in
achieving it, and attempt to assess the technical and social effects that
it has had on the development of the implementation and on Common
Lisp users in general.
1 Introduction
The Lisp family of languages has a long history, being invented (or perhaps
‘discovered’) by John McCarthy in the late 1950s. Of the languages still in use
today, only Fortran is older – though of course both modern Fortran and modern
Lisp are worlds removed from the versions used in the 1950s. A wide range of
dialects of Lisp were developed and disseminated in the following two decades,
until in the 1980s moves towards consolidation between the various Lisp fami-
lies occurred. The two most popular dialects of Lisp at the time of writing are
Scheme and Common Lisp, both of which have international standards docu-
ments associated with them [1,2] (though in the case of Scheme there are also
more lightweight community-led processes which have largely superseded the
international standard [3,4]).
This paper is primarily concerned with implementation strategies for Com-
mon Lisp, and how those strategies affect the ease with which development of
the implementations can occur. We do not address in this paper the implications
of the details for other environments, presenting instead a case study of the so-
cial and technical effects observed in this single domain. In particular, we do
not address the issues involved in cross-compiling Common Lisp for a different
machine architecture, as these have been discussed elsewhere (see for example
[5] and references therein).
Current Common Lisp implementations can usually support both image-
oriented and source-oriented development. Image-oriented environments (for ex-
ample, Squeak Smalltalk [6]) have as their interchange format an image file or
memory dump containing all the objects present in the system, which can be
later restarted on the same or distinct hardware. By contrast, a source-oriented
Page 2
environment uses individual, human-readable files for recording information for
reconstructing the project under development; these files are processed by the
environment to convert their contents into material which can be executed.
It would be unusual to find a Common Lisp application programmer these
days working in an image-oriented manner; it is far more usual to work with
source code stored in files and loaded using compile-file, than to define func-
tions exclusively using the evaluator or read-eval-print loop and to store state
by saving memory images, though the functionality of saving images is retained
in contemporary Common Lisp implementations (despite not being part of stan-
dardized functionality) and is most often used as a deployment strategy.
Of course, Common Lisp application programmers are used to making in-
cremental modifications to their software; Lisp environments are renowned for
having the facilities to develop functions one at a time, coupled with the ability
to use the image’s introspective capabilities for finding information about callers
and callees of functions and where variables are bound, for providing views of
data structures (through an inspector or through more specialized browsers for
classes, generic functions and the like), as well as for rapid recompilation and
incorporation of modifications.
However, as image formats are not standardized, and indeed historically do
change between releases of Common Lisp implementations, the application pro-
grammer is used to verifying from time to time that their current sources compile
cleanly from scratch – that is, that no dependency on something which is only
present in the image has been introduced in the sources.1
In the sphere of Lisp implementations themselves, however, this picture is
reversed: it is somewhat unusual to find a Lisp implementation, written primarily
in Lisp, which does not have a flavour of this image-oriented development within
it. The typical build process in this case involves using a host lisp of the same
implementation (but an earlier version), then mutating it incrementally to the
point where it matches the new sources sufficiently to be able to compile those
new sources, and then dumping an image. The mutation is in general different
for each particular change at the source code level – many changes require no
mutation at all, while changes to compiler-internal data structures may require
very involved mutations: we give an example in section 4.1.
This paper discusses Steel Bank Common Lisp (SBCL), a Common Lisp
implementation which is largely written in Lisp, while limiting and containing
the image-based incremental modification of its own self as part of its build
process to a strictly manageable level: the outcome of the build does not depend
on the state of the host lisp compiler. The rest of this paper is organized as
follows: in section 2, we describe the history and current state of Steel Bank
1 A simple but real-world example of this comes from the abstraction of a syntactic
pattern into a macro which has uses before its definition, because the most expe-
dient place to put that definition was not in the first source file to be compiled
from scratch. SBCL has a number of source files with prefix early- (for example,
early-package.lisp and early-setf.lisp) for the purpose of holding definitions
which must be seen early in the build.
Page 3
Common Lisp; then in section 3, we go into the detail of how SBCL is built,
comparing our approach with other Common Lisps. We discuss the benefits and
drawbacks of this build process in section 4, and draw conclusions in section 5.
2 Steel Bank Common Lisp
Although Steel Bank Common Lisp (SBCL) is a relatively new Common Lisp im-
plementation, it shares much code and a long development history with its clos-
est relative, Carnegie-Mellon University Common Lisp [7] (CMUCL). CMUCL
was a project funded by DARPA under CMU’s “Research on Parallel Comput-
ing” contract, and began life as SPICE Lisp. Under that contract, CMUCL was
developed continuously at Carnegie-Mellon University from the early 1980s un-
til funding was stopped in 1994; at that point, CMUCL support at CMU was
discontinued, but the project continues to this day, with a group of users and
developers collaborating over the Internet.
SBCL was announced as a CMUCL variant with a ‘clean’ bootstrap process,
in December 1999 by Bill Newman on the CMUCL developers’ mailing list [8].
Since then, it has been developed further, initially by Newman alone, then with
an increasing number of contributions from individuals, starting from the move
to public CVS hosting on SourceForge in September 2000. The number of con-
tributors has since risen significantly; at the time of writing, there are 23 people
with commit privileges to the master CVS repository, while over the course of
2007 code contributions from over 40 people were incorporated.
The system as of early 2008 contains approximately
– 90,000 lines of lisp code implementing the ‘standard library’, excluding the
Common Lisp Object System (CLOS);
– 60,000 lines of lisp code implementing the compiler (and related subsystems,
such as the debugger internals);
– between 10,000 and 20,000 lines of lisp code per architecture backend imple-
menting the code generators and low-level assembly routines;
– 20,000 lines of lisp code implementing CLOS;
– 20,000 lines of lisp code implementing contributed modules or ‘extras’;
– 35,000 lines of C and assembly code, for services such as signal handling and
garbage collection;
– 30,000 lines of shell and lisp code for regression tests.
It is perhaps worth discussing briefly why there is a substantial component
written in C and assembler: some 10% of the total. Partly this is because of
the large number of architecture/operating system pairs supported; each such
pair contributes some 200 lines of code implementing platform-specific opera-
tors (such as finding the faulting address from within a memory fault handler
function); additionally, each supported operating system (of which there five)
contributes 2000 lines, and each architecture (of seven) 2500 lines. The Garbage
Collector is about 8000 lines of code, and is written in C for essentially pragmatic
reasons: when the system is unstable enough for the GC to require debugging,
Page 4
using an external debugger (such as the GNU debugger, gdb) removes some
uncertainty in the debugging process: and such external debuggers are better
tailored to debugging C than Lisp.
We discuss the technical details of SBCL’s build process in more detail in the
next section; to give a high-level overview, SBCL’s build achieves independence
from the host lisp used to build it by embedding an SBCL compiler within the
host, before using that embedded compiler to generate a fresh, standalone SBCL
image. These two compilers, embedded and standalone, are generated from the
same source code files; this works because we have effectively done the same as
is commonly described as idiomatic Common Lisp programming style: to write
a domain-specific language for solving one’s problem, then solving the problem
in that language – but in our case, the domain-specific language happens to be
Common Lisp itself.
Most of us use C and assembler for low-level tasks, even if we can use high-level or multi-paradigm languages just as well to accomplish them (for example, you can make an assembler in Lisp using macros (or you could make one in Forth), and just use the FFI to run the code directly, like it was natively generated). In general, C is useful to have, but it's not a requirement. Most garbage collectors are still written in C as it's a low-level memory management problem and tends to require access to various non-portable OS APIs. Rest assured that this could be done entirely without C (but not without the platform's assembly and knowledge of the ABI), however C's purpose is such things and most people just use the right tool for the job!
As for SEPPLES, the only time I have to touch it is when I'm reading or editing other people's code.
Name:
Anonymous2011-04-18 12:15
define "functional"
Name:
Anonymous2011-04-18 12:16
>>16
Your mom comes to me for sex because she needs the right tool for the right job.
Name:
Anonymous2011-04-18 12:57
>>1
Most languages are up on assembly and/or bootstrapped. The only one that isn't that I can think of is Ruby. End the murdering. Save the anuses.
Name:
Anonymous2011-04-18 13:06
Trust the programmer.
Don't prevent the programmer from doing what needs to be done.
Keep the language small and simple.
Provide only one way to do an operation.
Make it fast, even if it is not guaranteed to be portable.
>>20 Keep the language small and simple.
C99 is 552 pages. Provide only one way to do an operation.
strcpy, strncpy, memcpy, memmove, ...
gets, fgets, strn?*, array vs pointers, ... Make it fast, even if it is not guaranteed to be portable.
Pointers make C ``hard'' to optimize.
I should add, 552 pages of pure nothing. R7RS' first draft includes:
- Exceptions.
- Call/cc.
- Complex numbers.
- Rational numbers.
- Delay/force.
- Byte vectors (blobs)
- FUCKING MODULES
Other than non-broken strings/vectors, in 67 pages.
>>26
C wants to be ``low-level'', and does so by using low-level concepts like pointers and those (mostly) pointless unions, and limiting how much code can be abstracted.
Now, since C code is not abstracted, and pointers ``mostly'' map one-to-one with assembly, there's nothing to optimize away, there's no abstraction! The compiler can only:
1) Optimize trivialities
2) Guess what you meant and compile to the most performant code to do that task.
And you can't help the compiler, since you can't access the machine's assembly and hand-optimize it yourself! It's slow by design.
>>28
"C wants to be ``low-level'', and does so by using low-level concepts like pointers and those (mostly) pointless unions, and limiting how much code can be abstracted."
Unions are used when you want to re-interpet a value instead of just changing it. Examples of this would be some of the older malloc implementions.
"Now, since C code is not abstracted, and pointers ``mostly'' map one-to-one with assembly, there's nothing to optimize away, there's no abstraction! The compiler can only:"
Bullshit. Compile a piece a code that the optimizations turned off. Then look at the assembly generated by this code. Now do the same thing with full optimiztions.
"1) Optimize trivialities"
No. Have you ever written production level C code?
"2) Guess what you meant and compile to the most performant code to do that task.
And you can't help the compiler, since you can't access the machine's assembly and hand-optimize it yourself! It's slow by design. "
>>32 Unions are used when you want to re-interpet a value instead of just changing it. Examples of this would be some of the older malloc implementions. mostly Bullshit. Compile a piece a code that the optimizations turned off. Then look at the assembly generated by this code. Now do the same thing with full optimiztions.
See 2. No. Have you ever written production level C code?
Yeah, you missed the point. You're a fucking moron.
How do you divmod in C? On x86 it is just one instruction.
>>34
If you both div and mod the same 2 numbers, x86 compilers may be smart enough to do it for you. Of course, in languages which support multiple returns and have smart-enough compilers, it will use a idiv or div for it (some Common Lisp compilers do this just fine) and take the result from edx:eax or rdx:rax.
>>35 If you both div and mod the same 2 numbers, x86 compilers may be smart enough to do it for you.
That's right. That's what I said, you have to trust the compiler to get performant code.
Name:
Autistic Duck2011-04-18 15:17
Trust the compiler, it will optimize your fears away
Pay no attention to assembly the perfect code is made
Annihilate the cache lines, restart the malloc walk
Who cares about the memory? compiler is your friend
>>11 shitty memory model of ancient C-based systems
Odd that no one caught this dumb statement.
The "memory model" of C is that you can store a value in memory at some address. That's it. If that memory model doesn't work for you, then programming might not be your thing.