The cost of macros

I had a tool problem while reading obfuscated Lisp: I wanted automatic refactoring. In particular I wanted to be able to α-rename the obnoxious variables to something that didn't look so much like brackets. But I had to do it by hand, because there are no good refactoring tools for Lisp. This is partly because Lisp culture values tools for expression more than tools for maintenance, but partly because automating most refactorings is hard in the presence of macros.

Most Lisps have macros in their purest form: they're arbitrary functions that transform new forms to old ones. That means there's no general way to walk the arguments, because neither the function nor the expansion will tell you what parts of the call are forms, let alone what environment they belong in. There is no way to be sure the macro doesn't implement a different language, which makes analysis nearly impossible.
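
To make that concrete, here's a small sketch in Racket that treats a defmacro-style macro as what it is: a plain function from lists to lists. The `backwards` macro and its expander `expand-backwards` are made up for illustration; they don't come from any library.

```racket
#lang racket

;; Hypothetical defmacro-style expander: an ordinary function on lists.
;; (backwards (arg ... op)) expands to (op arg ...).
(define (expand-backwards form)
  (match form
    [(list _ reversed) (reverse reversed)]))

(define call '(backwards ("hi" display)))
(define expansion (expand-backwards call))
;; expansion is the list (display "hi")
```

The argument `("hi" display)` is not a form in the host language, and nothing in the definition of `expand-backwards` says so. A tool that walked it as an ordinary call would take `"hi"` for the operator.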

You can almost do it by observation: if part of a macro call appears as a form in the expansion, you can treat it as a form in the call - and you even know its environment. (Note that this requires eq, because you want to detect that it's the same value, not another identical one.) Unfortunately this fails when the same form appears more than once in the original - and this is normal for symbols, so this technique doesn't get you very far. Even if the representation of code were different, so variable references appeared as (ref x) rather than being abbreviated to x, it would still break when a form appears more than once in the same tree. And in the presence of macros, partial sharing is actually rather common, because multiple calls to the same macro often share part of their expansions. So reliably walking macro calls requires having more information about a macro than just how to expand it.
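
Here's a rough sketch of the observation trick and where it breaks, again in Racket with plain lists. The expander `expand-my-unless` and the helper `count-eq` are hypothetical names invented for this example.

```racket
#lang racket

;; Hypothetical defmacro-style expander:
;; (my-unless test body) expands to (if test #f body).
(define (expand-my-unless form)
  (match form
    [(list _ test body) (list 'if test #f body)]))

;; Count the places in tree that are the very same object as needle.
(define (count-eq needle tree)
  (cond [(eq? needle tree) 1]
        [(pair? tree) (+ (count-eq needle (car tree))
                         (count-eq needle (cdr tree)))]
        [else 0]))

;; Build the call with list so every subform is a fresh object.
(define test (list 'zero? 'n))
(define body (list 'display 'n))
(define call (list 'my-unless test body))
(define expansion (expand-my-unless call))

(define test-hits (count-eq test expansion)) ; 1: this subform can be traced
(define n-hits (count-eq 'n call))           ; 2: both n's are the same symbol
```

The compound subform shows up exactly once in the expansion, so it can be identified as a form. The symbol `n` is one object wherever it appears, so `eq?` can't say which occurrence in the call ended up where.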

This is an advantage of more restrictive macro systems: they're easier to analyze. In a strict template-filling system like syntax-rules, you can always determine the role of a macro argument. DrScheme takes advantage of this for its fancy (but not very useful IME) syntax-highlighting. It doesn't work for procedural macros (Update: yes it does; see comments) but it could, if there were a way for macro definitions to supply the analysis along with the expander. Of course it would still be necessary to support unanalyzable mystery macros, because some macros are too hard to analyze, and because many authors won't bother.
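
As an illustration of what a template-filling macro gives away, here's a minimal syntax-rules sketch. `my-let1` is a made-up macro, not something from the standard library.

```racket
#lang racket

;; Hypothetical macro: (my-let1 var val body ...) binds var to val
;; around body.
(define-syntax my-let1
  (syntax-rules ()
    [(_ var val body ...)
     (let ([var val]) body ...)]))

(define answer (my-let1 x 2 (* x 21)))
;; answer is 42
```

Everything a tool needs is in the template: `var` lands in a binding position, `val` is an expression evaluated outside that binding, and `body` is a sequence of expressions inside it. No expansion has to be run to find that out.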

I don't think procedural macros are a bad feature - on the contrary, I think they are the best and purest form of one of the four most important abstraction methods in any language (the other three are variables, functions, and user-defined datatypes). But they do have a cost. And I think the cost is mostly in what other tools they interfere with, not in any difficulty humans have with them.

3 comments:

  1. DrScheme's Check Syntax tool does work in the presence of procedural macros. Indeed, all syntax-rules macros in PLT Scheme are macro-expanded into syntax-case procedural macros, so it could hardly work otherwise.

  2. Oh, it works because syntax-objects specify the source for all forms, so it can easily tell what ended up where. It looks like it only fails for bindings inserted with datum->syntax-object, since they have no identifiable source, and of course for macro arguments that don't appear in the expansion. I thought I remembered seeing it failing completely on a procedural macro, but maybe that was in an old version of DrScheme or maybe I'm imagining it.

    It even supports alpha-renaming! I never noticed that before.

    So procedural macros alone don't cause trouble; it's only when they use a representation of code that doesn't carry enough information to reconstruct the original's environments from the macroexpansion. I don't like the opacity of syntax-objects, but they have a definite advantage over plain lists here. I guess the same result could be obtained in a more transparent structure as long as there was a way to attach annotations to forms, to tell where they came from.

  3. Even when you use `datum->syntax', you can provide source location. Consider this macro:

    #lang scheme

    (define-syntax (if-it stx)
      (syntax-case stx ()
        [(kw tst thn els)
         (with-syntax ([it (datum->syntax #'kw 'it #'tst #'tst)])
           (syntax/loc stx
             (let ([it tst])
               (if it thn els))))]))

    (if-it 3
           it
           'two)

    If you hit Check Syntax, you'll see a binding arrow from the '3' to the occurrence of 'it'. Note the extra two arguments to `datum->syntax', which provide location context.

    While having syntax objects as a different datatype has some drawbacks, like making debugging harder, it's necessary for enabling all of the other features of the macro system.

