// vim: textwidth=99
/*
Meta note: This file is loaded as a .rs file by rustdoc only.
*/
/*!

A more detailed version of the [warning at the top level](super#warning) about the `quote`/`join`
family of APIs.

In general, passing the output of these APIs to a shell should recover the original string(s).
This page lists cases where it fails to do so.

In noninteractive contexts, there are only minor issues. 'Noninteractive' includes shell scripts
and `sh -c` arguments, or even scripts `source`d from interactive shells. The issues are:

- [Nul bytes](#nul-bytes)

- [Overlong commands](#overlong-commands)

If you are writing directly to the stdin of an interactive (`-i`) shell (i.e., if you are
pretending to be a terminal), or if you are writing to a cooked-mode pty (even if the other end is
noninteractive), then there is a **severe** security issue:

- [Control characters](#control-characters-interactive-contexts-only)

Finally, there are some [solved issues](#solved-issues).

# List of issues

## Nul bytes

For noninteractive shells, the most problematic input is nul bytes (bytes with value 0). The
non-deprecated functions all default to returning [`QuoteError::Nul`] when encountering them, but
the deprecated [`quote`] and [`join`] functions leave them as-is.

In Unix, nul bytes can't appear in command arguments, environment variables, or filenames. It's
not a question of proper quoting; they just can't be used at all. This is a consequence of Unix's
system calls all being designed around nul-terminated C strings.

Shells inherit that limitation. Most of them do not accept nul bytes in strings even internally.
Even when they do, it's pretty much useless or even dangerous, since you can't pass them to
external commands.

In some cases, you might fail to pass the nul byte to the shell in the first place. For example,
the following code uses [`join`] to tunnel a command over an SSH connection:

```rust
std::process::Command::new("ssh")
    .arg("myhost")
    .arg("--")
    .arg(join(my_cmd_args))
```

If any argument in `my_cmd_args` contains a nul byte, then `join(my_cmd_args)` will contain a nul
byte. But `join(my_cmd_args)` is itself being passed as an argument to a command (the ssh
command), and command arguments can't contain nul bytes! So this will simply result in the
`Command` failing to launch.

Still, there are other ways to smuggle nul bytes into a shell. How the shell reacts depends on the
shell and the method of smuggling. For example, here is Bash 5.2.21 exhibiting three different
behaviors:

- With ANSI-C quoting, the string is truncated at the first nul byte:
  ```bash
  $ echo $'foo\0bar' | hexdump -C
  00000000  66 6f 6f 0a                                       |foo.|
  ```

- With command substitution, nul bytes are removed with a warning:
  ```bash
  $ echo $(printf 'foo\0bar') | hexdump -C
  bash: warning: command substitution: ignored null byte in input
  00000000  66 6f 6f 62 61 72 0a                              |foobar.|
  ```

- When a nul byte appears directly in a shell script, it's removed with no warning:
  ```bash
  $ printf 'echo "foo\0bar"' | bash | hexdump -C
  00000000  66 6f 6f 62 61 72 0a                              |foobar.|
  ```

Zsh, in contrast, actually allows nul bytes internally, in shell variables and even arguments to
builtin commands. But if a variable is exported to the environment, or if an argument is used for
an external command, then the child process will see it silently truncated at the first nul. This
might actually be more dangerous, depending on the use case.

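One way to sidestep all of these shell-specific behaviors is to reject nul bytes before any
quoting or spawning happens. The following is only a minimal sketch; the helper name
`first_nul_arg` is hypothetical and is not part of this crate's API.

```rust
/// Returns the index of the first argument that contains a nul byte, if any.
///
/// Hypothetical helper for illustration only; not part of this crate.
fn first_nul_arg(args: &[&str]) -> Option<usize> {
    args.iter().position(|arg| arg.contains('\0'))
}
```

A caller could run a check like this on its argument list and report the offending argument,
rather than finding out only when the `Command` fails to launch.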
## Overlong commands

If you pass a long string into a shell, several things might happen:

- It might succeed, yet the shell might have trouble actually doing anything with it. For example:

  ```bash
  $ x=$(printf '%010000000d' 0); /bin/echo $x
  bash: /bin/echo: Argument list too long
  ```

- If you're using certain shells (e.g. Busybox Ash) *and* using a pty for communication, then the
  shell will impose a line length limit, ignoring all input past the limit.

- If you're using a pty in cooked mode, then by default, if you write so many bytes as input that
  it fills the kernel's internal buffer, the kernel will simply drop those bytes, instead of
  blocking until the shell empties the buffer. In other words, random bits of input can be lost,
  which is obviously insecure.

Future versions of this crate may add an option to [`Quoter`] to check the length for you.

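In the meantime, a caller can enforce its own limit before sending a quoted command. The sketch
below is an illustration under stated assumptions: `check_command_len` and `TooLong` are
hypothetical names, and the right limit is not specified here because it depends on the kernel,
the shell, and the transport in use.

```rust
/// Error returned when a quoted command exceeds the caller's chosen limit.
/// Hypothetical type for illustration only; not part of this crate.
#[derive(Debug, PartialEq)]
struct TooLong {
    len: usize,
    limit: usize,
}

/// Rejects commands longer than `limit` bytes instead of letting them be
/// truncated or dropped somewhere downstream.
fn check_command_len(quoted: &str, limit: usize) -> Result<(), TooLong> {
    let len = quoted.len();
    if len > limit {
        Err(TooLong { len, limit })
    } else {
        Ok(())
    }
}
```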
## Control characters (*interactive contexts only*)

Control characters are the bytes from `\x00` to `\x1f`, plus `\x7f`. `\x00` (the nul byte) is
discussed [above](#nul-bytes), but what about the rest? Well, many of them correspond to terminal
keyboard shortcuts. For example, when you press Ctrl-A at a shell prompt, your terminal sends the
byte `\x01`. The shell sees that byte and (if not configured differently) takes the standard
action for Ctrl-A, which is to move the cursor to the beginning of the line.

This means that it's quite dangerous to pipe bytes to an interactive shell. For example, here is a
program that tries to tell Bash to echo an arbitrary string, 'safely':
```rust
use std::process::{Command, Stdio};
use std::io::Write;

// `\x01` is the byte a terminal sends for Ctrl-A.
let evil_string = "\x01do_something_evil; ";
let quoted = shlex::try_quote(evil_string).unwrap();
println!("quoted string is {:?}", quoted);

let mut bash = Command::new("bash")
    .arg("-i") // force interactive mode
    .stdin(Stdio::piped())
    .spawn()
    .unwrap();
// Write the 'safely' quoted command to the interactive shell's stdin.
let stdin = bash.stdin.as_mut().unwrap();
write!(stdin, "echo {}\n", quoted).unwrap();
```

Here's the output of the program (with irrelevant bits removed):

```text
quoted string is "'\u{1}do_something_evil; '"
/tmp comex$ do_something_evil; 'echo '
bash: do_something_evil: command not found
bash: echo : command not found
```

Even though we quoted it, Bash still ran an arbitrary command! The `\x01` byte moved the cursor to
the start of the line, so everything after it was inserted in front of `echo '`.

This is not because the quoting was insufficient, per se. In single quotes, all input is supposed
to be treated as raw data until the closing single quote. And in fact, this would work fine
without the `"-i"` argument.

But line input is a separate stage from shell syntax parsing. After all, if you type a single
quote on the keyboard, you wouldn't expect it to disable all your keyboard shortcuts. So a control
character always has its designated effect, no matter if it's quoted or backslash-escaped.

Also, some control characters are interpreted by the kernel tty layer instead, like Ctrl-C to send
SIGINT. These can be an issue even with noninteractive shells, but only if using a pty for
communication, as opposed to a pipe.

The only safe option is to avoid sending control characters in the first place.

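A check along these lines can be written with the standard library alone. This is a minimal
sketch, assuming the caller wants to reject every control character; the function name
`contains_control_char` is hypothetical and is not part of this crate.

```rust
/// Returns true if `input` contains any control character, i.e. any byte in
/// the range `\x00..=\x1f` or the byte `\x7f`.
///
/// Hypothetical helper for illustration only; not part of this crate.
fn contains_control_char(input: &[u8]) -> bool {
    input.iter().any(|b| b.is_ascii_control())
}
```

Note that this strict version also flags tab, `\r`, and `\n`, so a caller that needs multi-line
input would have to decide separately how to treat those.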
### Why not just use hex escapes?

In any normal programming language, this would be no big deal.

Any normal language has a way to escape arbitrary characters in strings by writing out their
numeric values. For example, Rust lets you write them in hexadecimal, like `"\x4f"` (or
`"\u{1d546}"` for Unicode). In this way, arbitrary strings can be represented using only 'nice'
simple characters. Any remotely suspicious character can be replaced with a numeric escape
sequence, where the escape sequence itself consists only of alphanumeric characters and some
punctuation. The result may not be the most readable[^choices], but it's quite safe from being
misinterpreted or corrupted in transit.

Shell is not normal. It has no numeric escape sequences.

There are a few different ways to quote characters (unquoted, unquoted-with-backslash, single
quotes, double quotes), but all of them involve writing the character itself. If the input
contains a control character, the output must contain that same character.

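For contrast, here is what numeric escaping looks like on the Rust side. This sketch uses the
standard library's `str::escape_default`; the wrapper name `to_ascii_escaped` is hypothetical.

```rust
/// Renders `s` using only printable ASCII, replacing control characters and
/// non-ASCII characters with escape sequences such as `\u{1}`.
fn to_ascii_escaped(s: &str) -> String {
    s.escape_default().collect()
}
```

The result never contains a raw control character, which is exactly the property that portable
shell syntax cannot offer.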
### Mitigation: terminal filters

In practice, automating interactive shells like in the above example is pretty uncommon these days.
In most cases, the only way for a programmatically generated string to make its way to the input of
an interactive shell is if a human copies and pastes it into their terminal.

And many terminals detect when you paste a string containing control characters. iTerm2 strips
them out; gnome-terminal replaces them with alternate characters[^gr]; Kitty outright prompts for
confirmation. This mitigates the risk.

But it's not perfect. Some other terminals don't implement this check or implement it incorrectly.
Also, these checks tend to not filter the tab character, which could trigger tab completion. In
most cases that's a non-issue, because most shells support paste bracketing, which disables tab and
some other control characters[^bracketing] within pasted text. But in some cases paste bracketing
gets disabled.

### Future possibility: ANSI-C quoting

I said that shell syntax has no numeric escapes, but that only applies to *portable* shell syntax.
Bash and Zsh support an obscure alternate quoting style with the syntax `$'foo'`. It's called
["ANSI-C quoting"][ansic], and inside it you can use all the escape sequences supported by C,
including hex escapes:

```bash
$ echo $'\x41\n\x42'
A
B
```

But other shells don't support it. That includes Dash, a popular choice for `/bin/sh`, and
Busybox's Ash, frequently seen on stripped-down embedded systems. This crate's quoting
functionality [tries to be compatible](crate#compatibility) with those shells, plus all other
POSIX-compatible shells. That makes ANSI-C quoting a no-go.

Still, future versions of this crate may provide an option to enable ANSI-C quoting, at the cost of
reduced portability.

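To give a rough idea of what such an option could emit, here is a minimal sketch. It is not this
crate's implementation: `ansi_c_quote` is a hypothetical name, and the choice to hex-escape
everything outside a small set of literal characters is an assumption made to keep the example
conservative.

```rust
/// Quotes `input` using ANSI-C quoting (`$'...'`).
///
/// Returns `None` if `input` contains a nul byte, which no quoting can fix.
/// Hypothetical helper for illustration only; not part of this crate.
fn ansi_c_quote(input: &[u8]) -> Option<String> {
    let mut out = String::from("$'");
    for &b in input {
        match b {
            0 => return None,
            // A deliberately small set of characters is emitted literally.
            b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b' ' | b'_' | b'-' | b'.' | b'/' => {
                out.push(b as char)
            }
            // Everything else (quotes, backslashes, control characters, `!`,
            // `^`, non-ASCII bytes) becomes a two-digit hex escape.
            _ => out.push_str(&format!("\\x{:02x}", b)),
        }
    }
    out.push('\'');
    Some(out)
}
```

With this scheme the output contains no control characters at all, so the string from the earlier
example would be written as `$'\x01do_something_evil\x3b '`.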
### Future possibility: printf

Another option would be to invoke the `printf` command, which is required by POSIX to support octal
escapes. For example, you could 'escape' the Rust string `"\x01"` into the shell syntax
`"$(printf '\001')"`. The shell will execute the command `printf` with the first argument being
literally a backslash followed by three digits; `printf` will output the actual byte with value 1;
and the shell will substitute that back into the original command.

The problem is that 'escaping' a string into a command substitution just feels too surprising. If
nothing else, it only works with an actual shell; [other languages' shell parsing
routines](crate#compatibility) wouldn't understand it. Neither would this crate's own parser,
though that could be fixed.

Future versions of this crate may provide an option to use `printf` for quoting.

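The transformation itself is simple to sketch. The helper below is hypothetical (it is not part
of this crate) and handles a single byte; it declines two cases where the approach cannot work,
namely the nul byte and a newline, because command substitution strips trailing newlines.

```rust
/// Renders a single byte as a double-quoted `printf` command substitution
/// using a three-digit octal escape.
///
/// Hypothetical helper for illustration only; not part of this crate.
fn printf_escape(byte: u8) -> Option<String> {
    match byte {
        // A nul byte cannot be passed on at all, and a lone newline would be
        // stripped from the end of the command substitution's output.
        0 | b'\n' => None,
        _ => Some(format!("\"$(printf '\\{:03o}')\"", byte)),
    }
}
```

For the byte `\x01` this produces `"$(printf '\001')"`, matching the example above.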
### Special note: newlines

Did you know that `\r` and `\n` are control characters? They aren't as dangerous as other control
characters (if quoted properly). But there's still an issue with them in interactive contexts.

Namely, in some cases, interactive shells and/or the tty layer will 'helpfully' translate between
different line ending conventions. The possibilities include replacing `\r` with `\n`, replacing
`\n` with `\r\n`, and others. This can't result in command injection, but it's still a lossy
transformation which can result in a failure to round-trip (i.e. the shell sees a different string
from what was originally passed to `quote`).

Numeric escapes would solve this as well.

# Solved issues

## Solved: Past vulnerability (GHSA-r7qv-8r2h-pg27 / RUSTSEC-2024-XXX)

Versions of this crate before 1.3.0 did not quote `{`, `}`, and `\xa0`.

See:
- <https://github.com/advisories/GHSA-r7qv-8r2h-pg27>
- (TODO: Add Rustsec link)

## Solved: `!` and `^`

There are two non-control characters which have a special meaning in interactive contexts only:
`!` and `^`. Luckily, these can be escaped adequately.

The `!` character triggers [history expansion][he]; the `^` character can trigger a variant of
history expansion known as [Quick Substitution][qs]. Both of these characters get expanded even
inside of double-quoted strings!

If we're in a double-quoted string, then we can't just escape these characters with a backslash.
Only a specific set of characters can be backslash-escaped inside double quotes; the set of
supported characters depends on the shell, but it often doesn't include `!` and `^`.[^escbs]
Trying to backslash-escape an unsupported character produces a literal backslash:
```bash
$ echo "\!"
\!
```

However, these characters don't get expanded in single-quoted strings, so this crate just
single-quotes them.

But there's a Bash bug where `^` actually does get partially expanded in single-quoted strings:
```bash
$ echo '
> ^a^b
> '

!!:s^a^b
```

To work around that, this crate forces `^` to appear right after an opening single quote. For
example, the string `"^` is quoted into `'"''^'` instead of `'"^'`. This restriction is overkill,
since `^` is only meaningful right after a newline, but it's a sufficient restriction (after all, a
`^` character can't be preceded by a newline if it's forced to be preceded by a single quote), and
for now it simplifies things.

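The rule can be sketched in a few lines. This is a simplified illustration and not this crate's
actual implementation: `quote_caret_safe` is a hypothetical name, it single-quotes every input
unconditionally, and its handling of embedded single quotes (`'\''`) is an assumption chosen for
brevity.

```rust
/// Single-quotes `s`, closing and reopening the quotes before a `^` whenever
/// needed so that every `^` directly follows an opening single quote.
///
/// Simplified illustration only; not this crate's actual implementation.
fn quote_caret_safe(s: &str) -> String {
    let mut out = String::from("'");
    for c in s.chars() {
        match c {
            // Close the quotes, emit an escaped single quote, and reopen.
            '\'' => out.push_str("'\\''"),
            '^' => {
                // `out` ends with `'` only when a quote has just been opened.
                if !out.ends_with('\'') {
                    out.push_str("''");
                }
                out.push('^');
            }
            _ => out.push(c),
        }
    }
    out.push('\'');
    out
}
```

For the string `"^` this yields `'"''^'`, the same output as in the example above.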
## Solved: `\xa0`

The byte `\xa0` may be treated as a shell word separator, specifically on Bash on macOS when using
the default UTF-8 locale, and only when the input is invalid UTF-8. This crate handles the issue by
always using quotes for arguments containing this byte.

In fact, this crate always uses quotes for arguments containing any non-ASCII bytes. This may be
changed in the future, since it's a bit unfriendly to non-English users. But for now it
minimizes risk, especially considering the large number of different legacy single-byte locales
someone might hypothetically be running their shell in.

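The non-ASCII part of that policy is easy to state as a predicate. This is a minimal sketch with
the hypothetical name `has_non_ascii`; a real quoting decision also has to account for special
ASCII characters, which are left out here.

```rust
/// Returns true if `arg` contains any byte outside the ASCII range, in which
/// case the argument is always quoted under the policy described above.
///
/// Hypothetical helper for illustration only; not part of this crate.
fn has_non_ascii(arg: &[u8]) -> bool {
    !arg.is_ascii()
}
```

Both the lone byte `\xa0` and the proper UTF-8 encoding of U+00A0 trip this check, so neither form
is ever left unquoted.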
### Demonstration

```bash
$ echo -e 'ls a\xa0b' | bash
ls: a: No such file or directory
ls: b: No such file or directory
```
The normal behavior would be to output a single line, e.g.:
```bash
$ echo -e 'ls a\xa0b' | bash
ls: cannot access 'a'$'\240''b': No such file or directory
```
(The specific quoting in the error doesn't matter.)

### Cause

Just for fun, here's why this behavior occurs:

Bash decides which bytes serve as word separators based on the libc function [`isblank`][isblank].
On macOS on UTF-8 locales, this passes for `\xa0`, corresponding to U+00A0 NO-BREAK SPACE.

This is doubly unique compared to the other systems I tested (Linux/glibc, Linux/musl, and
Windows/MSVC). First, the other systems don't allow bytes in the range [0x80, 0xFF] to pass
<code>is<i>foo</i></code> functions in UTF-8 locales, even if the corresponding Unicode codepoint
does pass, as determined by the wide-character equivalent function, <code>isw<i>foo</i></code>.
Second, the other systems don't treat U+00A0 as blank (even using `iswblank`).

Meanwhile, Bash checks for multi-byte sequences and forbids them from being treated as special
characters, so the proper UTF-8 encoding of U+00A0, `b"\xc2\xa0"`, is not treated as a word
separator. Treatment as a word separator only happens for `b"\xa0"` alone, which is illegal UTF-8.

[ansic]: https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html
[he]: https://www.gnu.org/software/bash/manual/html_node/History-Interaction.html
[qs]: https://www.gnu.org/software/bash/manual/html_node/Event-Designators.html
[isblank]: https://man7.org/linux/man-pages/man3/isblank.3p.html
[nul]: #nul-bytes

[^choices]: This can lead to tough choices over which
    characters to escape and which to leave as-is, especially when Unicode gets involved and you
    have to balance the risk of confusion with the benefit of properly supporting non-English
    languages.
    <br>
    <br>
    We don't have the luxury of those choices.

[^gr]: For example, backspace (in Unicode lingo, U+0008 BACKSPACE) turns into U+2408 SYMBOL FOR
    BACKSPACE.

[^bracketing]: It typically disables almost all handling of control characters by the shell proper,
    but one necessary exception is the end-of-paste sequence itself (which starts with the control
    character `\x1b`). In addition, paste bracketing does not suppress handling of control
    characters by the kernel tty layer, such as `\x03` sending SIGINT (which typically clears the
    currently typed command, making it dangerous in a similar way to `\x01`).

[^escbs]: For example, Dash doesn't remove the backslash from `"\!"` because it simply doesn't know
    anything about `!` as a special character: it doesn't support history expansion. On the other
    end of the spectrum, Zsh supports history expansion and does remove the backslash, though only
    in interactive mode. Bash's behavior is weirder. It supports history expansion, and if you
    write `"\!"`, the backslash does prevent history expansion from occurring, but it doesn't get
    removed!

*/

// `use` declarations to make auto links work:
use ::{quote, join, Shlex, Quoter, QuoteError};

// TODO: add more about copy-paste and human readability.