Quantitative results are rewarding

Recently I spent some time profiling and optimizing a certain program, and tripled its performance. My reaction was predictable. I am awesome, I thought. I made this three times as good as it was before. I'm three times as good a programmer as whoever wrote that slow code. (Even though “whoever wrote that slow code” was me.)

The results of optimization are easy to quantify: not just faster, but three times as fast. I suspect this accounts for some of the lure of optimization. Comparing performance numbers provides an obvious indication of improvement, and exaggerates its value, so even a modest optimization feels like a triumph. Adding a new feature usually has much less emotional reward, especially if it's not technically impressive or immediately useful. Fixing a bug may bring feelings of disgust at the bug, and guilt (if I created it) or contempt (if someone else did), but it seldom feels like much of an accomplishment. Optimization, though, feels good, because it's measurable and therefore obviously valuable (even when it's not).

I also enjoy cleaning up code, for the same reason: if I make something 50 lines shorter, that tells me how much I've accomplished, so I have something to feel good about. Predictably, I feel better about making code shorter than about merely making it more readable, because readability doesn't come with numerical confirmation.

If this effect is real, it would be valuable to have a similar quantification of bugs and features, to make improvements emotionally rewarding. An automated test framework can provide this by conspicuously reporting the number of tests passed, and perhaps the change from previous versions. (It's probably best to report the number passed, not the number failed, so writing more tests makes the number better rather than worse.) If you've used a test framework with this feature, have you noticed any effect on your motivation to fix bugs?

Irregular indentation reflects semantic structure

There are well-known rules for indenting Lisp code, but it's sometimes convenient to break the rules when the code has semantic structure not reflected in the S-expressions. This sometimes happens when there are alternating sequences, as in some versions of let and cond and setf. It's also common to leave forms unindented when they're supposed to be top-level but something has been wrapped around them anyway, such as eval-when, or the library or unit forms required in some Scheme module systems. In either case, the indentation reflects the structure a human reader understands the program by, rather than the structure of its S-expressions.
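
Clojure's cond, for instance, is one of those alternating forms (a minimal illustration of my own, not from the original post): its arguments are a single flat sequence of test, result, test, result, but the usual layout groups them into pairs, one per line, because that's the structure a reader actually parses:

(defn sign [n]
  (cond
    (neg? n)  -1    ; each test/result pair shares a line,
    (zero? n)  0    ; even though cond sees one flat argument list
    :else      1))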

JRM recently posted a particularly extreme example of irregular indentation:

(define (polynomial-erf z)
  ;; Numerical Recipes 6.2
  ;; fractional error < 1.2e-7
  (let* ((t (/ 1.0 (+ 1.0 (* .5 (abs z)))))
         (ans (- 1.0 (* t
                        (exp (+ (- (* z z))
                                (+ -1.26551223
                                   (* t (+ 1.00002368
                                   (* t (+ .37409196
                                   (* t (+ .09678418
                                   (* t (+ -.18628806 
                                   (* t (+ .27886807 
                                   (* t (+ -1.13520398
                                   (* t (+ 1.48851587
                                   (* t (+ -.82215223 
                                   (* t .17087277))))))))))))))))))))))))
    (if (>= z 0) ans (- ans))))

Note the big polynomial written in sequential multiply-and-add style. It has two semantic structures: it's a tree of arithmetic expressions, but it's also a sequence of polynomial terms. The latter interpretation is more useful, and the indentation reflects this: it's intended to be read as a sequence, so it's indented as a sequence. If it were indented normally, according to its tree structure, its semantic sequentiality would be less obvious, not to mention it would be impractically wide:

(+ -1.26551223
   (* t (+ 1.00002368
           (* t (+ .37409196
                   (* t (+ .09678418
                           (* t (+ -.18628806 
                                   (* t (+ .27886807 
                                           (* t (+ -1.13520398
                                                   (* t (+ 1.48851587
                                                           (* t (+ -.82215223 
                                                                   (* t .17087277))))))))))))))))))

Either way, that's an imposing pile of right-parens. (You could avoid this with a simple polynomial function or macro, but it's not worth the trouble for one polynomial.)
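
Such a helper might look like this (a sketch in Clojure rather than the post's Scheme; the name poly and the coefficient order are my assumptions): it evaluates the polynomial by Horner's rule, taking coefficients from the constant term up, so the coefficients could be passed as one flat list.

(defn poly
  "Evaluate a0 + a1*x + a2*x^2 + ... by Horner's rule."
  [x & coeffs]
  (reduce (fn [acc c] (+ c (* x acc)))
          (reverse coeffs)))

;; (poly 2 1 0 3) => 13, i.e. 1 + (0 * 2) + (3 * 4)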

One of the reasons expression-based languages are so convenient is that their program structure reflects the semantic structure humans understand them by. Irregular indentation is a reminder that it doesn't always do so perfectly.

Clojure is almost as big as Common Lisp

How big is Clojure's standard library? Never mind the Java library, or even clojure.contrib; how much is built in to clojure.jar?

One way to answer this is to count definitions, in the form of public Vars:

(defn ns-size "How many public Vars does this namespace have?" [name]
  (require name)
  (count (ns-publics (find-ns name))))

(def builtins '(clojure.core clojure.core.protocols clojure.data
                clojure.inspector clojure.java.browse clojure.java.browse-ui
                clojure.java.io clojure.java.javadoc clojure.java.shell
                clojure.main clojure.pprint ;clojure.parallel ;deprecated
                clojure.reflect clojure.repl clojure.set clojure.stacktrace
                clojure.string clojure.template clojure.test clojure.test.junit
                clojure.test.tap clojure.walk clojure.xml clojure.zip))

In Clojure 1.3.0a6:

user=> (ns-size 'clojure.core)
563
user=> (reduce + (map ns-size builtins))
826

So Clojure is already approaching Common Lisp's 1064 definitions¹. This measure probably overstates Clojure's size relative to CL, because it compares all of Clojure to only the standard parts of Common Lisp, while every CL implementation includes other libraries in addition to the standard. CL also tends to pack more features into each definition, through keyword arguments and complex embedded languages like format and loop, so counting definitions understates its size. On the other hand, many CL features are of dubious value, so Clojure's library may already have surpassed CL's in usefulness.

If it hasn't, it will soon, because Clojure's library is still growing. The Var count has increased by 50% in the last two years:

Version    Date       clojure.core   clojure.*
1.0.0      May 2009   458            543
1.1.0      Dec 2009   502            576
1.2.0      Aug 2010   546            782
1.3.0a6    Mar 2011   563            826

At this rate, Clojure 1.6 will have a bigger standard library than Common Lisp. (The timing depends on how quickly parts of clojure.contrib get assimilated into clojure.core.) I suppose when that happens, Common Lisp will still be perceived as huge and bloated, and Clojure as relatively small and clean. Any perceived complexity in Clojure will be blamed on Java, even though Java interoperation accounts for only a tiny part of Clojure's library. Scheme will still be considered small and elegant, no matter how big its libraries (or its core semantics) get. (R5RS Scheme has 196 definitions, and R6RS about 170, or 680 with its standard libraries. Racket is much bigger.) Traditional beliefs about language sizes are not very sensitive to data.

Update: riffraff on HN counts Python's definitions: len(__builtins__.__dict__.values()) ⇒ 143 definitions in the default namespace, plus len(set(chain(*[dir(o) for o in __builtins__.__dict__.values()]))) ⇒ 250 method names, so about 400 built-in definitions. There are also over 200 other modules in the standard distribution, so its full standard library is much bigger: the first 24 modules have sum([len(dir(__import__(n))) - 4 for n in "string re struct difflib StringIO cStringIO textwrap codecs unicodedata stringprep fpformat datetime calendar collections heapq bisect array sets sched mutex weakref UserDict UserList UserString".split()]) ⇒ 476 definitions, so the total is something like 5000. “Batteries included” indeed.

However, standard library size is less important than it used to be. When every language has an automatic library fetcher like Leiningen or Quicklisp or PLaneT, built-in libraries aren't much more readily available than other well-known libraries. The main obstacle to library use is no longer finding suitable libraries, or downloading them, but learning to use them.
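
With Leiningen, for example, pulling in an outside library is a one-line declaration in project.clj, and the jar is fetched automatically the next time a build task runs (a hypothetical fragment; the library name and versions are placeholders):

(defproject example "0.1.0"
  ;; both entries are placeholders; any library in a repository
  ;; Leiningen knows about is declared the same way
  :dependencies [[org.clojure/clojure "1.3.0"]
                 [some.group/some-library "1.0.0"]])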

¹ Common Lisp has 978 symbols, but symbols ≠ definitions: some symbols have definitions in more than one namespace, and some have no definition. Common Lisp has 752 definitions in the function namespace (636 functions, 91 macros, and 25 special forms), 116 variables, 85 classes, 66 setf functions, and 45(?) type specifiers, for a total of 1064 definitions. There are about 30 symbols without definitions: declare and its declarations and optimization qualities, a few locally bound symbols like call-next-method, and eight lambda-list keywords. There's also plenty of room for disagreement about what counts as a definition.