prettyplease::unparse
=====================

[<img alt="github" src="https://img.shields.io/badge/github-dtolnay/prettyplease-8da0cb?style=for-the-badge&labelColor=555555&logo=github" height="20">](https://github.com/dtolnay/prettyplease)
[<img alt="crates.io" src="https://img.shields.io/crates/v/prettyplease.svg?style=for-the-badge&color=fc8d62&logo=rust" height="20">](https://crates.io/crates/prettyplease)
[<img alt="docs.rs" src="https://img.shields.io/badge/docs.rs-prettyplease-66c2a5?style=for-the-badge&labelColor=555555&logo=docs.rs" height="20">](https://docs.rs/prettyplease)
[<img alt="build status" src="https://img.shields.io/github/actions/workflow/status/dtolnay/prettyplease/ci.yml?branch=master&style=for-the-badge" height="20">](https://github.com/dtolnay/prettyplease/actions?query=branch%3Amaster)

A minimal `syn` syntax tree pretty-printer.

<br>

## Overview

This is a pretty-printer to turn a `syn` syntax tree into a `String` of
well-formatted source code. In contrast to rustfmt, this library is intended to
be suitable for arbitrary generated code.

Rustfmt prioritizes high-quality output that is impeccable enough that you'd be
comfortable spending your career staring at its output, but that means some
heavyweight algorithms, and it has a tendency to bail out on code that is hard
to format (for example [rustfmt#3697], and there are dozens more issues like
it). That's not necessarily a big deal for human-written code because when code
gets highly nested, the human will naturally be inclined to refactor into more
easily formattable code. But for generated code, having the formatter just give
up leaves it totally unreadable.

[rustfmt#3697]: https://github.com/rust-lang/rustfmt/issues/3697

This library is designed using the simplest possible algorithm and data
structures that can deliver about 95% of the quality of rustfmt-formatted
output. In my experience testing real-world code, approximately 97-98% of output
lines come out identical between rustfmt's formatting and this crate's. The rest
have slightly different linebreak decisions, but still clearly follow the
dominant modern Rust style.

The tradeoffs made by this crate are a good fit for generated code that you will
*not* spend your career staring at, for example the output of `bindgen` or the
output of `cargo-expand`. In those cases it's more important that the whole
thing be formattable without the formatter giving up, than that it be flawless.

<br>

## Feature matrix

Here are a few superficial comparisons of this crate against the AST
pretty-printer built into rustc, and rustfmt. The sections below go into more
detail comparing the output of each of these libraries.

| | prettyplease | rustc | rustfmt |
|:---|:---:|:---:|:---:|
| non-pathological behavior on big or generated code | 💚 | ❌ | ❌ |
| idiomatic modern formatting ("locally indistinguishable from rustfmt") | 💚 | ❌ | 💚 |
| throughput | 60 MB/s | 39 MB/s | 2.8 MB/s |
| number of dependencies | 3 | 72 | 66 |
| compile time including dependencies | 2.4 sec | 23.1 sec | 29.8 sec |
| buildable using a stable Rust compiler | 💚 | ❌ | ❌ |
| published to crates.io | 💚 | ❌ | ❌ |
| extensively configurable output | ❌ | ❌ | 💚 |
| intended to accommodate hand-maintained source code | ❌ | ❌ | 💚 |

<br>

## Comparison to rustfmt

- [input.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/input.rs)
- [output.prettyplease.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/output.prettyplease.rs)
- [output.rustfmt.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/output.rustfmt.rs)

If you weren't told which output file is which, it would be practically
impossible to tell, **except** for line 435 in the rustfmt output, which is more
than 1000 characters long because rustfmt just gave up formatting that part of
the file:

```rust
        match segments[5] {
            0 => write!(f, "::{}", ipv4),
            0xffff => write!(f, "::ffff:{}", ipv4),
            _ => unreachable!(),
        }
    } else { # [derive (Copy , Clone , Default)] struct Span { start : usize , len : usize , } let zeroes = { let mut longest = Span :: default () ; let mut current = Span :: default () ; for (i , & segment) in segments . iter () . enumerate () { if segment == 0 { if current . len == 0 { current . start = i ; } current . len += 1 ; if current . len > longest . len { longest = current ; } } else { current = Span :: default () ; } } longest } ; # [doc = " Write a colon-separated part of the address"] # [inline] fn fmt_subslice (f : & mut fmt :: Formatter < '_ > , chunk : & [u16]) -> fmt :: Result { if let Some ((first , tail)) = chunk . split_first () { write ! (f , "{:x}" , first) ? ; for segment in tail { f . write_char (':') ? ; write ! (f , "{:x}" , segment) ? ; } } Ok (()) } if zeroes . len > 1 { fmt_subslice (f , & segments [.. zeroes . start]) ? ; f . write_str ("::") ? ; fmt_subslice (f , & segments [zeroes . start + zeroes . len ..]) } else { fmt_subslice (f , & segments) } }
} else {
    const IPV6_BUF_LEN: usize = (4 * 8) + 7;
    let mut buf = [0u8; IPV6_BUF_LEN];
    let mut buf_slice = &mut buf[..];
```

This is a pretty typical manifestation of rustfmt bailing out in generated code:
a chunk of the input ends up on one line. The other manifestation is that you're
working on some code, running rustfmt on save like a conscientious developer,
but after a while notice it isn't doing anything. You introduce an intentional
formatting issue, like a stray indent or semicolon, and run rustfmt to check
your suspicion. Nope, it doesn't get cleaned up: rustfmt is just not formatting
the part of the file you are working on.
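
One cheap way to notice the first kind of bail-out in generated output is to
scan for lines far wider than the formatter's target width. The sketch below is
only a heuristic and is not part of rustfmt or prettyplease: the helper name
`overlong_lines` is hypothetical, and the limit of 100 columns is an assumption
based on rustfmt's default `max_width`.

```rust
/// Return the 1-based numbers of the lines that are wider than `limit`
/// characters. (Hypothetical helper, for illustration only.)
fn overlong_lines(source: &str, limit: usize) -> Vec<usize> {
    source
        .lines()
        .enumerate()
        // Count characters rather than bytes so non-ASCII text is not penalized.
        .filter(|(_, line)| line.chars().count() > limit)
        .map(|(index, _)| index + 1)
        .collect()
}

fn main() {
    // Stand-in for formatter output: the second line was left unformatted.
    let formatted = format!("fn main() {{\n{}\n}}\n", "x ; ".repeat(300));
    for line in overlong_lines(&formatted, 100) {
        println!("line {line} is suspiciously long");
    }
}
```

Long string literals also trip this check, so a hit is a prompt to look at the
line, not proof that the formatter gave up.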

The prettyplease library is designed to have no pathological cases that force a
bail out; the entire input you give it will get formatted in some "good enough"
form.

Separately, rustfmt can be problematic to integrate into projects. It's written
using rustc's internal syntax tree, so it can't be built by a stable compiler.
Its releases are not regularly published to crates.io, so in Cargo builds you'd
need to depend on it as a git dependency, which also precludes publishing your
own crate to crates.io. You can shell out to a `rustfmt` binary, but that'll be
whatever rustfmt version is installed on each developer's system (if any), which
can lead to spurious diffs in checked-in generated code formatted by different
versions. In contrast, prettyplease is designed to be easy to pull in as a
library, and compiles fast.

<br>

## Comparison to rustc_ast_pretty

- [input.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/input.rs)
- [output.prettyplease.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/output.prettyplease.rs)
- [output.rustc.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/output.rustc.rs)

This is the pretty-printer that gets used when rustc prints source code, such as
`rustc -Zunpretty=expanded`. It's also used by the standard library's
`stringify!` when stringifying an interpolated macro_rules AST fragment, like an
`$:expr`, and transitively by `dbg!` and many macros in the ecosystem.

Rustc's formatting is mostly okay, but does not hew closely to the dominant
contemporary style of Rust formatting. Some things wouldn't ever be written on
one line, like this `match` expression, and certainly not with a comma in front
of the closing brace:

```rust
fn eq(&self, other: &IpAddr) -> bool {
    match other { IpAddr::V4(v4) => self == v4, IpAddr::V6(_) => false, }
}
```

Some places use non-multiple-of-4 indentation, which is definitely not the norm:

```rust
pub const fn to_ipv6_mapped(&self) -> Ipv6Addr {
    let [a, b, c, d] = self.octets();
    Ipv6Addr{inner:
                 c::in6_addr{s6_addr:
                                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF,
                                  0xFF, a, b, c, d],},}
}
```

And although there isn't an egregious example of it in the link because the
input code is pretty tame, in general rustc_ast_pretty has pathological behavior
on generated code. It has a tendency to use excessive horizontal indentation and
rapidly run out of width:

```rust
::std::io::_print(::core::fmt::Arguments::new_v1(&[""],
                                                 &match (&msg,) {
                                                      _args =>
                                                      [::core::fmt::ArgumentV1::new(_args.0,
                                                                                    ::core::fmt::Display::fmt)],
                                                  }));
```

The snippets above are clearly different from modern rustfmt style. In contrast,
prettyplease is designed to have output that is practically indistinguishable
from rustfmt-formatted code.

<br>

## Example

```rust
// [dependencies]
// prettyplease = "0.2"
// syn = { version = "2", default-features = false, features = ["full", "parsing"] }

const INPUT: &str = stringify! {
    use crate::{
        lazy::{Lazy, SyncLazy, SyncOnceCell}, panic,
        sync::{ atomic::{AtomicUsize, Ordering::SeqCst},
            mpsc::channel, Mutex, },
        thread,
    };
    impl<T, U> Into<U> for T where U: From<T> {
        fn into(self) -> U { U::from(self) }
    }
};

fn main() {
    let syntax_tree = syn::parse_file(INPUT).unwrap();
    let formatted = prettyplease::unparse(&syntax_tree);
    print!("{}", formatted);
}
```

<br>

## Algorithm notes

The approach and terminology used in the implementation are derived from [*Derek
C. Oppen, "Pretty Printing" (1979)*][paper], on which rustc_ast_pretty is also
based, and from rustc_ast_pretty's implementation written by Graydon Hoare in
2011 (and modernized over the years by dozens of volunteer maintainers).

[paper]: http://i.stanford.edu/pub/cstr/reports/cs/tr/79/770/CS-TR-79-770.pdf

The paper describes two language-agnostic interacting procedures `Scan()` and
`Print()`. Language-specific code decomposes an input data structure into a
stream of `string` and `break` tokens, and `begin` and `end` tokens for
grouping. Each `begin`–`end` range may be identified as either "consistent
breaking" or "inconsistent breaking". If a group is consistently breaking, then
if the whole contents do not fit on the line, *every* `break` token in the group
will receive a linebreak. This is appropriate, for example, for Rust struct
literals, or arguments of a function call. If a group is inconsistently
breaking, then the `string` tokens in the group are greedily placed on the line
until out of space, and linebroken only at those `break` tokens for which the
next string would not fit. For example, this is appropriate for the contents of
a braced `use` statement in Rust.
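
The difference between the two breaking modes can be seen on a single flat
group. The following is a minimal sketch, not this crate's implementation: the
`Breaks` enum and the `layout` function are hypothetical names, the group is
assumed to have exactly one `break` between adjacent strings, and widths are
measured in bytes, which is only accurate for ASCII.

```rust
/// How the `break` tokens of one group behave when the group does not fit.
#[derive(Clone, Copy)]
enum Breaks {
    /// All-or-nothing: every break becomes a newline.
    Consistent,
    /// Greedy: a break becomes a newline only if the next string would not fit.
    Inconsistent,
}

/// Lay out the `string` tokens of one group within `width` columns.
fn layout(items: &[&str], width: usize, mode: Breaks) -> String {
    let flat = items.join(" ");
    if flat.len() <= width {
        return flat; // the whole group fits, so no break is taken
    }
    match mode {
        Breaks::Consistent => items.join("\n"),
        Breaks::Inconsistent => {
            let mut out = String::new();
            let mut column = 0;
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    if column + 1 + item.len() > width {
                        out.push('\n');
                        column = 0;
                    } else {
                        out.push(' ');
                        column += 1;
                    }
                }
                out.push_str(item);
                column += item.len();
            }
            out
        }
    }
}

fn main() {
    let items = ["Lazy,", "SyncLazy,", "SyncOnceCell,", "Mutex"];
    println!("{}\n", layout(&items, 20, Breaks::Consistent));
    println!("{}", layout(&items, 20, Breaks::Inconsistent));
}
```

With a width of 20, the consistent layout puts each item on its own line, while
the inconsistent layout packs `Lazy, SyncLazy,` onto the first line and
`SyncOnceCell, Mutex` onto the second.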

Scan's job is to efficiently accumulate sizing information about groups and
breaks. For every `begin` token we compute the distance to the matched `end`
token, and for every `break` we compute the distance to the next `break`. The
algorithm uses a ringbuffer to hold tokens whose size is not yet ascertained.
The maximum size of the ringbuffer is bounded by the target line length and does
not grow indefinitely, regardless of deep nesting in the input stream. That's
because once a group is sufficiently big, the precise size can no longer make a
difference to linebreak decisions and we can effectively treat it as "infinity".
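
The sizing bookkeeping can be sketched with a running total and a stack of
unresolved tokens, in the spirit of the paper. This is a simplified offline
version under stated assumptions, not this crate's code: it keeps the whole
token slice in memory instead of a bounded ringbuffer, it never caps a size at
"infinity", every unbroken `break` is assumed to print as one space, and the
`Token` type and `scan_sizes` function are hypothetical names.

```rust
enum Token {
    Begin,
    End,
    Break,
    Str(&'static str),
}

/// Compute one size per token:
/// - `Str`: its own length,
/// - `Begin`: the flat width of the group up to its matching `End`,
/// - `Break`: its blank plus everything up to the next `Break` of the same
///   group (or the end of that group),
/// - `End`: zero.
fn scan_sizes(tokens: &[Token]) -> Vec<isize> {
    let mut size = vec![0isize; tokens.len()];
    let mut stack: Vec<usize> = Vec::new(); // indices of unresolved Begin/Break
    let mut total: isize = 0; // flat width of everything scanned so far

    for (i, token) in tokens.iter().enumerate() {
        match token {
            Token::Str(s) => {
                size[i] = s.len() as isize;
                total += size[i];
            }
            Token::Begin => {
                // Store the negated running total; adding the total again at
                // the matching End leaves exactly the width in between.
                size[i] = -total;
                stack.push(i);
            }
            Token::Break => {
                // A new break resolves the previous break of the same group.
                if let Some(&top) = stack.last() {
                    if matches!(tokens[top], Token::Break) {
                        size[top] += total;
                        stack.pop();
                    }
                }
                size[i] = -total;
                stack.push(i);
                total += 1; // an unbroken break prints as one space
            }
            Token::End => {
                // Resolve the group's last pending break, then its Begin.
                if let Some(&top) = stack.last() {
                    if matches!(tokens[top], Token::Break) {
                        size[top] += total;
                        stack.pop();
                    }
                }
                if let Some(begin) = stack.pop() {
                    size[begin] += total;
                }
            }
        }
    }
    size
}

fn main() {
    // The group `a, bb, c` with a break after each comma.
    let tokens = [
        Token::Begin,
        Token::Str("a,"),
        Token::Break,
        Token::Str("bb,"),
        Token::Break,
        Token::Str("c"),
        Token::End,
    ];
    println!("{:?}", scan_sizes(&tokens));
}
```

For this input the `begin` token gets size 8 (the width of `a, bb, c`), the
first `break` gets 4 (` bb,`) and the second gets 2 (` c`).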

Print's job is to use the sizing information to efficiently assign a "broken" or
"not broken" status to every `begin` token. At that point the output is easily
constructed by concatenating `string` tokens and breaking at `break` tokens
contained within a broken group.
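
A minimal sketch of that printing step follows, restricted to consistently
breaking groups. It is an illustration under assumptions rather than this
crate's implementation: the `Token` layout, the `dedent` field on breaks, and
the `flat_width` and `print_tokens` functions are hypothetical, and group widths
are recomputed by a forward walk instead of being supplied by Scan.

```rust
enum Token {
    /// Opens a group; the payload is the extra indentation of its breaks.
    Begin(usize),
    /// Closes the innermost open group.
    End,
    /// `blank` spaces when the group is flat; a newline when it is broken,
    /// indented by the group's indentation minus `dedent`.
    Break { blank: usize, dedent: usize },
    Str(&'static str),
}

/// Flat width of the group opened at `tokens[start]`, plus any strings that
/// directly follow its `End`, since those must share the final line.
fn flat_width(tokens: &[Token], start: usize) -> usize {
    let mut depth = 0usize;
    let mut closed = false;
    let mut width = 0;
    for token in &tokens[start..] {
        match token {
            Token::Str(s) => width += s.len(),
            _ if closed => break,
            Token::Begin(_) => depth += 1,
            Token::End => {
                depth -= 1;
                closed = depth == 0;
            }
            Token::Break { blank, .. } => width += *blank,
        }
    }
    width
}

fn print_tokens(tokens: &[Token], max_width: usize) -> String {
    let mut out = String::new();
    let mut column = 0;
    // One (broken, indentation) entry per currently open group.
    let mut groups: Vec<(bool, usize)> = Vec::new();
    for (i, token) in tokens.iter().enumerate() {
        match token {
            Token::Begin(offset) => {
                let parent_indent = groups.last().map_or(0, |g| g.1);
                let broken = column + flat_width(tokens, i) > max_width;
                groups.push((broken, parent_indent + *offset));
            }
            Token::End => {
                groups.pop();
            }
            Token::Break { blank, dedent } => match groups.last() {
                Some(&(true, indent)) => {
                    let indent = indent.saturating_sub(*dedent);
                    out.push('\n');
                    out.push_str(&" ".repeat(indent));
                    column = indent;
                }
                _ => {
                    out.push_str(&" ".repeat(*blank));
                    column += *blank;
                }
            },
            Token::Str(s) => {
                out.push_str(s);
                column += s.len();
            }
        }
    }
    out
}

fn main() {
    let tokens = [
        Token::Str("foo("),
        Token::Begin(4),
        Token::Break { blank: 0, dedent: 0 },
        Token::Str("aaaa,"),
        Token::Break { blank: 1, dedent: 0 },
        Token::Str("bbbb"),
        Token::Break { blank: 0, dedent: 4 },
        Token::End,
        Token::Str(")"),
    ];
    println!("{}\n", print_tokens(&tokens, 80));
    println!("{}", print_tokens(&tokens, 12));
}
```

At width 80 the call stays on one line as `foo(aaaa, bbbb)`. At width 12 the
group is marked broken, so all three of its breaks become newlines and the
closing parenthesis lands back at the outer indentation.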

Leveraging these primitives (i.e. cleverly placing the all-or-nothing consistent
breaks and greedy inconsistent breaks) to yield rustfmt-compatible formatting
for all of Rust's syntax tree nodes is a fun challenge.

Here is a visualization of some Rust tokens fed into the pretty printing
algorithm. Consistently breaking `begin`/`end` pairs are represented by `«` and
`»`, inconsistently breaking pairs by `‹` and `›`, `break` by `·`, and the rest
of the non-whitespace are `string` tokens.

```text
use crate::«{·
‹    lazy::«{·‹Lazy,· SyncLazy,· SyncOnceCell›·}»,·
    panic,·
    sync::«{·
‹        atomic::«{·‹AtomicUsize,· Ordering::SeqCst›·}»,·
        mpsc::channel,· Mutex›,·
    }»,·
    thread›,·
}»;·
«‹«impl<«·T‹›,· U‹›·»>» Into<«·U·»>· for T›·
where·
    U:‹ From<«·T·»>›,·
{·
«    fn into(·«·self·») -> U {·
‹        U::from(«·self·»)›·
»    }·
»}·
```

The algorithm described in the paper is not quite sufficient for producing
well-formatted Rust code that is locally indistinguishable from rustfmt's style.
The reason is that in the paper, the complete non-whitespace contents are
assumed to be independent of linebreak decisions, with Scan and Print being only
in control of the whitespace (spaces and line breaks). In Rust as idiomatically
formatted by rustfmt, that is not the case. Trailing commas are one example; the
punctuation is only known *after* the broken vs non-broken status of the
surrounding group is known:

```rust
let _ = Struct { x: 0, y: true };

let _ = Struct {
    x: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
    y: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyy, //<- trailing comma if the expression wrapped
};
```

The formatting of `match` expressions is another case; we want small arms on the
same line as the pattern, and big arms wrapped in a brace. Whether the brace
punctuation, comma, and semicolon are present all depends on whether the arm
fits on the line:

```rust
match total_nanos.checked_add(entry.nanos as u64) {
    Some(n) => tmp = n, //<- small arm, inline with comma
    None => {
        total_secs = total_secs
            .checked_add(total_nanos / NANOS_PER_SEC as u64)
            .expect("overflow in iter::sum over durations");
    } //<- big arm, needs brace added, and also semicolon^
}
```

The printing algorithm implementation in this crate accommodates all of these
situations with conditional punctuation tokens whose selection can be deferred
and populated after it's known that the group is or is not broken.
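
The idea of a conditional punctuation token can be sketched as a token that
carries two spellings, one for each status of its group. This is a hedged
illustration, not this crate's internal representation: the `Cond` variant and
the `print_group` function are hypothetical, the group is a single consistently
breaking one with a fixed 4-space indent, and the flat form is simply rendered
first to decide whether it fits.

```rust
enum Token {
    Str(&'static str),
    /// A space when the group is flat, a newline plus indent when broken.
    Break,
    /// Punctuation chosen only once the group's status is known.
    Cond { broken: &'static str, flat: &'static str },
}

/// Print `open`, the group body, and `close`, breaking the group only if the
/// one-line form is wider than `max_width`.
fn print_group(open: &str, body: &[Token], close: &str, max_width: usize) -> String {
    let render = |is_broken: bool| -> String {
        let mut out = String::from(open);
        for token in body {
            match token {
                Token::Str(s) => out.push_str(s),
                Token::Break => out.push_str(if is_broken { "\n    " } else { " " }),
                Token::Cond { broken: b, flat: f } => {
                    out.push_str(if is_broken { *b } else { *f })
                }
            }
        }
        // The break before the closing delimiter returns to the outer indent.
        out.push_str(if is_broken { "\n" } else { " " });
        out.push_str(close);
        out
    };
    let flat = render(false);
    if flat.len() <= max_width {
        flat
    } else {
        render(true)
    }
}

fn main() {
    let body = [
        Token::Break,
        Token::Str("x: 0,"),
        Token::Break,
        Token::Str("y: true"),
        // Trailing comma only when the literal is spread over several lines.
        Token::Cond { broken: ",", flat: "" },
    ];
    println!("{}\n", print_group("Struct {", &body, "}", 80));
    println!("{}", print_group("Struct {", &body, "}", 20));
}
```

The same kind of token could carry the opening brace, closing brace, and
semicolon of a `match` arm, each with an empty spelling for the case where the
arm fits on the line.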

<br>

#### License

<sup>
Licensed under either of <a href="LICENSE-APACHE">Apache License, Version
2.0</a> or <a href="LICENSE-MIT">MIT license</a> at your option.
</sup>

<br>

<sub>
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in this crate by you, as defined in the Apache-2.0 license, shall
be dual licensed as above, without any additional terms or conditions.
</sub>