Other uses for Unicode

Unicode identifiers are impractical because users can't type them. But languages can safely use Unicode characters for many other purposes. For example:

  • Prompts: Doesn't every language need a distinctive prompt? Why not adopt a Unicode character, so the prompt won't look like anything users type?
  • Alternate names: If you allow one symbol to have multiple names, the pretty-printer can use Unicode while humans type in ASCII. Doing a lot of this would interfere with learning, but basics like printing Haskell \ x -> x as λ x → x shouldn't hurt much.
  • Alternate reader syntax: Similarly, print can use Unicode characters in place of the least satisfactory ASCII ones. Wouldn't it be nice if vectors printed with proper angle brackets instead of #(...)?
  • Unreadable objects: If an object's printed representation can't be read back in, it need not be typed by humans, so it can use arbitrary characters. This might improve the printed representations of functions, classes, promises, etc., which often print unhelpfully at the REPL.
  • Non-ASCII art: Languages (and libraries) often print everything as text, even things like tables that shouldn't be, because that's the only output interface they can rely on. Unicode can help make this less ugly. (Box-drawing characters!)
  • Documentation and error messages: You can use whatever characters you want here.
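
To make two of these concrete, here is a minimal sketch in Python of the alternate-names and box-drawing ideas. Everything in it is made up for illustration: `UNICODE_NAMES`, `prettify`, and `box_table` are hypothetical names, and a real pretty-printer would work from the language's own token stream rather than a list of strings. It also assumes every character occupies one terminal column, which is not true of all Unicode text.

```python
# Hypothetical table of ASCII spellings and their Unicode display forms.
# Humans keep typing the keys; only the printer emits the values.
UNICODE_NAMES = {"\\": "λ", "->": "→"}

def prettify(tokens):
    """Replace each ASCII token with its Unicode alternate, if it has one."""
    return " ".join(UNICODE_NAMES.get(tok, tok) for tok in tokens)

def box_table(rows):
    """Render rows of strings as a table framed with box-drawing characters.

    The first row is treated as the header. Assumes all rows have the same
    number of cells and that each character is one column wide.
    """
    ncols = len(rows[0])
    widths = [max(len(row[i]) for row in rows) for i in range(ncols)]

    def rule(left, mid, right):
        # Horizontal rule with the given corner and junction characters.
        return left + mid.join("─" * (w + 2) for w in widths) + right

    def line(row):
        # One row of cells, left-aligned and padded to the column width.
        cells = (" " + cell.ljust(w) + " " for cell, w in zip(row, widths))
        return "│" + "│".join(cells) + "│"

    out = [rule("┌", "┬", "┐"), line(rows[0]), rule("├", "┼", "┤")]
    out += [line(row) for row in rows[1:]]
    out.append(rule("└", "┴", "┘"))
    return "\n".join(out)

print(prettify(["\\", "x", "->", "x"]))
print(box_table([["name", "arity"], ["car", "1"], ["cons", "2"]]))
```

Because the mapping only runs on output, the reader never has to accept the Unicode forms, so nothing here asks users to type them.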

REPLs, by the way, should exploit terminal control before worrying about Unicode. Small improvements like coloring and a syntax-aware line editor make a big difference to usability. Not everyone has an IDE or wants to use it for everything.
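
As a rough sketch of how little coloring takes, the Python below wraps text in standard ANSI SGR escape sequences. The names `colorize` and `color_enabled` and the `user_wants_color` switch are assumptions of this example, not any particular REPL's API; the point is only that color should be skipped when output isn't a terminal or the user has opted out.

```python
import sys

# Standard ANSI SGR escape sequences; this choice of colors is illustrative.
ANSI = {"red": "\x1b[31m", "green": "\x1b[32m", "blue": "\x1b[34m"}
RESET = "\x1b[0m"

def colorize(text, color, enabled=True):
    """Wrap text in an ANSI color sequence, or return it unchanged if disabled."""
    if not enabled:
        return text
    return ANSI[color] + text + RESET

def color_enabled(stream=sys.stdout, user_wants_color=True):
    """Color only when the user has not opted out and the stream is a terminal."""
    return user_wants_color and stream.isatty()

use_color = color_enabled()
print(colorize("error:", "red", use_color), "unbound variable x")
```

Piped or redirected output comes out as plain text, since `isatty()` is false there.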

3 comments:

  1. Provided you can turn them off. I find coloring immensely irritating and a major contributor to unreadability.

  2. Moby Latin keyboard for U.S. Windows; Whacking Latin keyboard family for UK Windows. Both allow almost 1000 characters to be typed in a fairly intuitive way, preempting only AltGr.

  3. John: your docs for Whacking etc. do not render correctly because you don't specify UTF-8.

    http://diveintohtml5.info/semantics.html#encoding

