Dynamically typed languages represent almost everything as tagged pointers, which sounds very inefficient: doesn't that waste a lot of space? And doesn't 64-bit addressing make this waste much, much worse?
I used to take this seriously. But one day a few years ago I was playing with Squeak, and wondering why its images (savable worlds) were so large - 10 MB, when the original Smalltalk ran in 64 KB. I stumbled across allObjectsDo:
, which iterates across all objects in the heap, so I wrote a heap profiler to find out where the space was going. Once I finished it, I discovered Squeak came with a much better one, so I used that instead. Here are all the classes that take more than 1% of the space:
Class | Purpose | # Instances | Space | % |
---|---|---|---|---|
CompiledMethod | bytecode | 33951 | 2.0 MB | 20% |
Bitmap | graphics | 1208 | 1.8 MB | 17% |
String | text | 28743 | 1.7 MB | 16% |
Array | various | 30263 | 1.3 MB | 12% |
ByteArray | various | 379 | 800 KB | 8% |
Symbol | selectors mostly | 27786 | 550 KB | 5% |
MethodDictionary | method dispatch | 3092 | 320 KB | 3% |
WeakArray | symbol table mostly | 15 | 230 KB | 2.2% |
Point | GUI | 17193 | 200 KB | 1.9% |
MorphExtension | GUI | 3344 | 170 KB | 1.6% |
Color | GUI | 8323 | 163 KB | 1.6% |
Association | hashtables | 10761 | 126 KB | 1.2% |
Float | boxed float | 9732 | 114 KB | 1.1% |
All others | 1.1 MB | 10% | ||
Classes without pointers | >6.6 MB | >65% | ||
Total | 10.2 MB | 100% |
This shows why tagged pointers are not a big space problem: there just aren't that many of them. Five of the top six classes, accounting for two thirds of the space, are binary data. (The exception is Array
.) Dynamically typed languages, it turns out, don't encode most data as tagged pointers.
Well, of course. What do you do with a large data structure? You find a more compact representation. Often that means replacing low-entropy pointers with high-entropy bytes. That's what happened to all of those five non-pointer classes. Dynamic typing makes no difference: as long as you have some sort of byte arrays (in Smalltalk, Class»variableByteSubclass:
), you can do this in dynamically typed languages the same way you do it in statically typed ones. If tagged pointers are wasteful, the waste is in boxing small objects like floats, not in the presence of pointers.
Of course this is only one image, and an odd one at that - instead of application-specific data, it's dominated by library code. The reason Squeak is so much bigger than the old Smalltalks is that it has a much bigger standard library - and no autoloading, so it's all in the image all the time. Code accounts for at least 30% of the space: CompiledMethod
, Symbol
, MethodDictionary
, and WeakArray
(most of which is the symbol table). This 3 MB would be insignificant in a bigger image, but otherwise I expect larger applications to have space profiles much like Squeak's. Most classes are made of pointers, but most of the space goes on the few that aren't.
No comments:
Post a Comment
It's OK to comment on old posts.