Paul's Internet Landfill/ 2020/ Escaping Quotes is a Big Mistake

Escaping Quotes is a Big Mistake

I am procrastinating on a script that contains a line like the following:

c:\windows\system32\bash.exe -c "rsync -e "ssh -o ..." ..."

and I am filled with dread. Windows bash is going to call rsync, which is going to call SSH, and passing the appropriate parameters through every layer is going to be a disaster. This happens way too often, because it is really handy to have one script call another with appropriate parameters. You end up trying to mix ' and " quotes, and using a million backslashes that obfuscate what you really want to do. I hate it, and I waste so much time trying to get the escapes and quoting right.
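To see how quickly this gets out of hand, here is a small Python sketch that wraps a command in several layers of `sh -c` using the standard library's `shlex.quote`. The `echo` command and the `wrap` helper are stand-ins made up for illustration, not the rsync line above, and the quoting rules are those of a POSIX shell, so CMD and PowerShell would each need their own (different) escaping.

```python
import shlex

def wrap(command, levels):
    """Wrap `command` in `levels` nested `sh -c '...'` invocations.

    Hypothetical helper: each layer must quote everything the
    previous layer produced, including its quote characters.
    """
    for _ in range(levels):
        command = "sh -c " + shlex.quote(command)
    return command

# Illustrative command containing its own double quotes.
cmd = 'echo "How goes it?"'

for depth in range(1, 4):
    wrapped = wrap(cmd, depth)
    print(depth, len(wrapped), wrapped)
```

Every single quote produced by one layer has to be escaped again by the next, so the printed command gets longer and harder to read with each level even though the thing being run never changes.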

Things did not need to be this way, but they are this way because we speak English and made a terrible user interface mistake: we surround quoted text with the same beginning and end character. For example, take the following quotation:

"Barindar looked at Carlos. "How goes it?" he asked."

In Microsoft Word (and maybe Markdown?) the straight quotes " might get turned into “smart quotes”. But in computer languages the quote character is the same at the beginning and at the end of the quotation. So now the computer needs to know that the " in "How goes actually starts a nested quotation, rather than completing the outer quotation that starts with "Barindar. Thus we have to escape the inner quote, but then that messes other things up.

We can fix this easily. Instead of using the same quotation mark for opening and closing strings, we should use paired markers. Smart quotes would be one solution, but those are tough to type on ASCII keyboards. A better solution might be to borrow from the French and use « and », but these are just as hard to type on imperialist US-ASCII keyboards. My original proposal was to ASCII-fy the French symbols using angle brackets: << and >>. Later I realized that Bourne shell and bash already use << for here-documents and >> for appending to files, and PowerShell uses >> for appending too.

There are some uglier solutions that might work. One might be <| and |>, or even delimiters in the style of C comments: /" and "/. Let's try the former. Now the quotation becomes unambiguous:

<|Barindar looked at Carlos. <|How goes it?|> he asked.|>

Parsing this string is much easier because it is just bracket matching, which the parsers of most programming languages already do for parentheses and braces.
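As a rough illustration of the bracket matching, here is a minimal Python sketch that turns a <| ... |> string into a nested list using a stack. The function name `parse_quoted` and the choice of nested lists as the output are assumptions made for this example; no existing language implements this syntax.

```python
OPEN, CLOSE = "<|", "|>"

def parse_quoted(text):
    """Parse <| ... |> quotations into nested lists.

    Plain text becomes str items and each quotation becomes a
    sub-list. Raises ValueError if the markers do not pair up.
    """
    stack = [[]]   # stack[0] collects top-level items
    start = 0      # where the pending run of plain text begins
    i = 0
    while i < len(text):
        if text.startswith(OPEN, i):
            if i > start:
                stack[-1].append(text[start:i])
            stack.append([])           # begin a nested quotation
            i += len(OPEN)
            start = i
        elif text.startswith(CLOSE, i):
            if len(stack) == 1:
                raise ValueError("unmatched |> at offset %d" % i)
            if i > start:
                stack[-1].append(text[start:i])
            finished = stack.pop()     # end the innermost quotation
            stack[-1].append(finished)
            i += len(CLOSE)
            start = i
        else:
            i += 1
    if len(stack) != 1:
        raise ValueError("unclosed <|")
    if start < len(text):
        stack[-1].append(text[start:])
    return stack[0]

print(parse_quoted("<|Barindar looked at Carlos. <|How goes it?|> he asked.|>"))
```

The inner quotation comes out as its own sub-list without a single escape character, and a missing or extra marker is reported as an error instead of silently changing where the string ends.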

This could be introduced as an addition to common scripting languages like Bash, CMD, PowerShell and Python, and then a lot of quoting horrors would go away. You can continue to support the awful single and double quotes for backwards compatibility if you want.

I am not sure whether <| and |> are tokens in any of these languages, but if not then this is reasonably easy to support.

Language purists will note that matching brackets cannot be done using regular expressions (in the CS language theory sense). Boo hoo. Even Bash supports nested parentheses for arithmetic expressions. Furthermore, the things that we call "regular expressions" in our programming languages support backreferences, so the CS language theory use of regexps is already impure.
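For the skeptical, here is a short Python sketch of that impurity. The pattern uses a backreference to accept exactly the strings made of some a's, a b, and then the same number of a's again, which is a textbook example of a language that no true (theoretical) regular expression can describe. The name `same_count` is made up for this example.

```python
import re

# (a+) captures a run of a's; \1 demands that exact run again after the b.
SAME_COUNT = re.compile(r"(a+)b\1")

def same_count(s):
    """Return True if s is n a's, one b, then n a's again (n >= 1)."""
    return SAME_COUNT.fullmatch(s) is not None

print(same_count("aaabaaa"))
print(same_count("aaabaa"))
```

The first call matches and the second does not, because the trailing run of a's has to be identical to the captured one.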