Strings and lines

I was recently surprised by the behavior of the dot (.) in Perl regexes: it matches any character except newline. This seems like the sort of thing I should have already known. It's right there in the documentation. But if I ever knew it, I've forgotten, which suggests I've never tried to match a regex against a multiline string.
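
The behavior is easy to reproduce. Python's re module follows the same convention as Perl here, so the short sketch below uses Python purely for illustration (the post itself is about Perl). It shows the default, where the dot refuses to cross a line break, and the override: re.DOTALL plays the role of Perl's /s modifier. The sample string and variable names are made up for the example.

```python
import re

text = "first line\nsecond line"

# By default "." matches any character except "\n",
# so ".*" stops at the end of the first line.
default_match = re.match(r".*", text).group()

# A pattern that needs "." to cross the line break fails by default...
crossing_default = re.search(r"line.second", text)

# ...but succeeds with re.DOTALL, Python's counterpart to Perl's /s modifier.
crossing_dotall = re.search(r"line.second", text, re.DOTALL)
dotall_match = re.match(r".*", text, re.DOTALL).group()

print(repr(default_match))            # 'first line'
print(crossing_default)               # None
print(repr(crossing_dotall.group()))  # 'line\nsecond'
print(repr(dotall_match))             # 'first line\nsecond line'
```

In Perl itself, the crossing match would be written with the /s modifier, as in $text =~ /line.second/s.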

Well, of course. How often do you do anything to strings containing newlines, except for printing them out intact?

For that matter, how often do you have such strings at all? Most strings, after all, are either 1) lines of files, which by definition don't contain newlines, or 2) names of some sort, which usually can't contain control characters of any kind. (Yes, filenames can theoretically contain control characters. But have you ever seen one, except for the time you created one to see if it worked?) Even strings that could in principle contain newlines, such as error messages, usually don't, because the possibility doesn't occur to programmers. Control characters aren't real characters, are they?

And, of course, many languages don't allow newlines in string literals. In those that do, they're rarely used.

It seems to me that I (and most programmers, probably) overgeneralize from this: I think of newlines, like nulls and other control characters, as not generally being valid in strings. Consciously, I know strings can contain arbitrary characters, but unconsciously, I think a string must be one line.

1 comment:

  1. I know someone who is constantly dealing with mismatches between what the Backspace key sends from various terminal emulators and what his *ix system expects. So he has two files in his ~/bin directory named ^H and ^?, which contain "stty erase ^H" and "stty erase ^?" respectively.

    Then when he logs in, he types Backspace Enter, which either just gives him a new prompt or runs the appropriate stty script and — gives him a new prompt. Problem solved.
