Escaping Quotes is a Big Mistake
I am procrastinating on a script that contains a line like the following:
c:\windows\system32\bash.exe -c "rsync -e "ssh -o ..." ..."
and I am filled with dread. Windows bash is going to call rsync which
is going to call SSH, and passing appropriate parameters is going to
be a disaster. This happens way too often, because it is really handy
to have one script call another with appropriate parameters. You end
up trying to mix single (') and double (") quotes, and using a million
backslashes that obfuscate what you really want to do. I hate it, and
getting the escapes and quoting right wastes so much time.
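To make the pain concrete, here is a minimal Python sketch of how the escaping piles up with each layer of nesting. The message and the sh -c wrappers are made up for illustration (they are not the rsync/ssh line above); shlex.quote is the standard-library helper for POSIX shell quoting.

```python
import shlex

# Hypothetical payload: a message containing one single quote.
message = "it's done"

# Layer 1: a shell command that echoes the message.
layer1 = "echo " + shlex.quote(message)
# Layer 2: a shell that runs layer 1 as a single -c argument.
layer2 = "sh -c " + shlex.quote(layer1)
# Layer 3: another shell that runs layer 2 the same way.
layer3 = "sh -c " + shlex.quote(layer2)

# Print each layer with its length to see the quoting grow.
for layer in (layer1, layer2, layer3):
    print(len(layer), layer)
```

Every layer is still correct (shlex.split on one layer gives back the layer inside it), but the line gets longer and harder to read each time.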
Things did not need to be this way, but they are this way because we speak English and made a terrible user interface mistake: we surround quoted text with the same beginning and end character. For example, take the following quotation:
"Barindar looked at Carlos. "How goes it?" he asked."
In Microsoft Word (and maybe Markdown?) the straight quote (") might
get turned into curly "smart quotes". But in computer languages the
same character both opens and closes a quotation. So the computer
needs to know that the " in front of "How goes" is starting a nested
quotation, not completing the outer quotation that starts with
"Barindar". Thus we have to escape the inner quotes, and every
additional level of nesting then means escaping the escapes as well.
We can fix this easily. Instead of using the same quotation mark for
opening and closing strings, we should use paired markers. Smart
quotes would be one solution to this, but those are tough to type on
ASCII keyboards. A better solution might be to borrow from the French
and use « and », but these are also hard to type on imperialist
US-ASCII keyboards. My original proposal was to ASCII-fy the French
symbols using angle brackets: << and >>, but later I realized that
the Bourne shell and bash already use << for here-documents and >>
for appending output (PowerShell uses >> for appending too).
There are some uglier solutions that might work. One might be <| and
|>, or even something in the style of C comment delimiters: /" and
"/. Let's try the former.
Now the quotation becomes unambiguous:
<|Barindar looked at Carlos. <|How goes it?|> he asked.|>
Parsing this string is much easier because it is just bracket matching, which the parsers of most programming languages already do for parentheses and braces.
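As a rough sketch of what that bracket matching could look like, here is a small Python parser that turns a <| ... |> string into a nested list. The function name parse_quoted and the list-of-lists output are my own choices for illustration, and this ignores questions a real language would have to answer (such as how to write a literal |> inside a string).

```python
def parse_quoted(text, open_tok="<|", close_tok="|>"):
    """Parse nested quotations into a tree of strings and nested lists."""
    root = []
    stack = [root]   # stack[-1] is the quotation currently being filled
    buf = []         # plain characters seen since the last delimiter

    def flush():
        # Move any buffered plain text into the current quotation.
        if buf:
            stack[-1].append("".join(buf))
            buf.clear()

    i = 0
    while i < len(text):
        if text.startswith(open_tok, i):
            flush()
            child = []
            stack[-1].append(child)
            stack.append(child)      # descend into the new quotation
            i += len(open_tok)
        elif text.startswith(close_tok, i):
            flush()
            if len(stack) == 1:
                raise ValueError(f"unmatched {close_tok} at offset {i}")
            stack.pop()              # return to the enclosing quotation
            i += len(close_tok)
        else:
            buf.append(text[i])
            i += 1
    flush()
    if len(stack) != 1:
        raise ValueError(f"missing {close_tok}")
    return root


tree = parse_quoted("<|Barindar looked at Carlos. <|How goes it?|> he asked.|>")
print(tree)
```

The inner quotation comes back as a list nested inside the outer one, with no escaping anywhere in the input.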
This could be introduced as an addition to common scripting languages like Bash, CMD, PowerShell, and Python, and then a lot of quoting horrors would go away. You can continue to support the awful single and double quotes if you want for backwards compatibility.
I am not sure whether <| and |> are tokens in any of these
languages, but if not then this is reasonably easy to support.
Language purists will note that matching brackets cannot be done using regular expressions (in the CS language theory sense). Boo hoo. Even Bash supports nested parentheses for arithmetic expressions. Furthermore, the things that we call "regular expressions" in our programming languages support backreferences, so the CS language theory use of regexps is already impure.
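To illustrate the backreference point, here is a tiny Python sketch. The pattern and the helper name mirrors are just examples I picked: the set of strings with n copies of "a", then "b", then the same n copies of "a" is a standard example of a language that no regular expression in the CS-theory sense can describe, yet Python's re module handles it with a backreference.

```python
import re

# \1 must repeat exactly what group 1 matched, so the number of a's
# on both sides of the b has to be the same.
pattern = re.compile(r"(a+)b\1")


def mirrors(text):
    """Return True if text is n a's, one b, then the same n a's."""
    return pattern.fullmatch(text) is not None


print(mirrors("aabaa"))   # same count on both sides
print(mirrors("aabaaa"))  # counts differ
```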