chore: checkpoint before Python removal

2026-03-26 22:33:59 +00:00
parent 683cec9307
commit e568ddf82a
29972 changed files with 11269302 additions and 2 deletions

vendor/chumsky/.cargo-checksum.json vendored Normal file

@@ -0,0 +1 @@
{"files":{".cargo_vcs_info.json":"d487b0ad463620f11e09986e57892c5b73ac06352d6c723e88bd4678c00419da",".github/FUNDING.yml":"ab18b2d04a647da2d2e56121974f3de4db30e8b05b1cb4006e204a9928acb064",".github/workflows/rust.yml":"d0bccf054d39b5800d2455a1c80aa149bc194f1b1dfc20aace1b16a3bf208e71","CHANGELOG.md":"895ee076a2b38a6e89ef7f759b62b8166efbd11780a20195011ede48a27e8613","Cargo.lock":"1f2cef542f53d35a19a5f288718d9a221a9c04e3645bb7b5ccc643287027e261","Cargo.toml":"a56a98fa593d8e4b254f2b4de729aa66d7f2d67bde4bf69a263b6deacac64524","Cargo.toml.orig":"c72c7a3daa4211ca95d8f36024886aa1bc6fe5f4258a6f94d828dab65f572ba9","LICENSE":"88f7ddf73afcffee97e0a19211ddeecd7d178ecb5c09bbfe472ce4cfcceb6269","README.md":"6fcfa1cd0095887f6e839f90a6463671ad642bb62d257650c8237b44103db429","examples/brainfuck.rs":"c10b4c29190211fb3c3e5739a5a0c2380cd0430bf27a660c8e5cf8d24782370b","examples/foo.rs":"d902a764e0c22ffa6f8ee9048813912c272f05d9c4cf830b403a2faef42595a1","examples/json.rs":"b702803d2060058f91451af8f41d361954d1afb7572197b3fe5e0eb04237290a","examples/nano_rust.rs":"3b62dcebe0ac4817301ecd6c445de42cda39b25665bb4c921a6be0fc6be12afd","examples/pythonic.rs":"1c170e18429d139ddb3757f4753ba9f5469f2d75cd2965d289da47067f7f5627","examples/sample.bf":"824a423496b3847b635d8da95d183867115318e671f731149c3603c732dbed77","examples/sample.foo":"f0c3345067c4716498e0c37d142254c6eb8edbb600b3fedd16a90c1031a6fbdf","examples/sample.json":"9586cdc4f06fdaf3cdb66864d6018184fd6898ccfd30045ec3dbcb2b9ea25d7c","examples/sample.nrs":"e62d5df3a6b54b1fd5efdc531731e984eda7b81b3067c8f561b372832309520e","examples/sample.py":"d6a788dd91e4f8d9083826583d503d8b3eea13f8aa5d9d713e10123cb12e48c4","src/chain.rs":"3bd0c177e896ce0d23ec9f774eba10eaa3f650c6b6586f2d1be02bc801242628","src/combinator.rs":"147ea3cd64b8f50bf9aa44aa01a9093cd1243c6905810d11c959d9031d5129aa","src/debug.rs":"6c51c6f997a092f6b2d5b21e03022a832b223b099d295b558b94163dff1a70e9","src/error.rs":"5876ad31b4c90f9bba33458dfe0cb67ac9b6cf2bf829d99846d39403172ad25e","src/lib.rs":"b9e0e9d42ab9a5848399f97b520952f6bc9144185091f447b597c0a9615aef7a","src/primitive.rs":"49364a64ad83d89057ab32d9c7d9b28812b0e917c24a34a51e9ec10395d78c54","src/recovery.rs":"3a5bc629c02204b51daf3639324e6d01bbbc5a57e146f572c476863cd194745b","src/recursive.rs":"1f8fbce84cbed25ed1a344320bb1ab854214dd54e60ca006cb7b61150c80ee6d","src/span.rs":"8c46807902b6851f3c86d56a57cf015cb1e17ac8fd5d20ff7c32c12543c5da24","src/stream.rs":"31866b663c1dc4930f6168fa5cf2f31c176fbc937040aa3b9cd20390943cffe0","src/text.rs":"f001bcde5818a973cfe47be9322d32f3c6254b12ce3b93c1789e64fb2f5fbf95","tutorial.md":"8662e4013c4afb9f90c72212a10a3d64207205cb939cf8be680d605df747a618"},"package":"8eebd66744a15ded14960ab4ccdbfb51ad3b81f51f3f04a80adac98c985396c9"}

vendor/chumsky/.cargo_vcs_info.json vendored Normal file

@@ -0,0 +1,6 @@
{
"git": {
"sha1": "5101cc86a8568a6d33743145e5e8bd0b194332b8"
},
"path_in_vcs": ""
}

vendor/chumsky/.github/FUNDING.yml vendored Normal file

@@ -0,0 +1 @@
github: [zesterer]

vendor/chumsky/.github/workflows/rust.yml vendored Normal file

@@ -0,0 +1,36 @@
name: Rust
on:
push:
branches: [ master ]
pull_request:
branches: [ master ]
env:
CARGO_TERM_COLOR: always
jobs:
check:
name: Check Chumsky
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install stable toolchain
uses: dtolnay/rust-toolchain@master
with:
toolchain: stable
components: rustfmt, clippy
- name: Run cargo check
run: cargo check --verbose --no-default-features
test:
name: Test Chumsky
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install latest nightly
uses: dtolnay/rust-toolchain@master
with:
toolchain: nightly
components: rustfmt, clippy
- name: Run cargo test
run: cargo test --verbose --all-features

vendor/chumsky/CHANGELOG.md vendored Normal file

@@ -0,0 +1,166 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
# Unreleased
### Added
### Removed
### Changed
### Fixed
# [0.9.2] - 2023-03-02
### Fixed
- Properly fixed `skip_then_retry_until` regression
# [0.9.1] - 2023-03-02
### Fixed
- Regression in `skip_then_retry_until` recovery strategy
# [0.9.0] - 2023-02-07
### Added
- A `spill-stack` feature that uses `stacker` to avoid stack overflow errors for deeply recursive parsers
- The ability to access the token span when using `select!` like `select! { |span| Token::Num(x) => (x, span) }`
- Added a `skip_parser` recovery strategy that allows you to implement your own recovery strategies in terms of other
parsers. For example, `.recover_with(skip_parser(take_until(just(';'))))` skips tokens until after the next semicolon
- A `not` combinator that consumes a single token if it is *not* the start of a given pattern. For example,
`just("\\n").or(just('"')).not()` matches any `char` that is neither the final quote of a string nor the
start of a newline escape sequence
- A `semantic_indentation` parser for parsing indentation-sensitive languages. Note that this is likely to be
deprecated/removed in the future in favour of a more powerful solution
- `#[must_use]` attribute for parsers to ensure that they're not accidentally created without being used
- `Option<Vec<T>>` and `Vec<Option<T>>` now implement `Chain<T>` and `Option<String>` implements `Chain<char>`
- `choice` now supports both arrays and vectors of parsers in addition to tuples
- The `Simple` error type now implements `Eq`
### Changed
- `text::whitespace` returns a `Repeated` instead of an `impl Parser`, allowing you to call methods like `at_least` and
`exactly` on it.
- Improved `no_std` support
- Improved examples and documentation
- Use zero-width spans for EoI by default
- Don't allow defining a recursive parser more than once
- Various minor bug fixes
- Improved `Display` implementations for various built-in error types and `SimpleReason`
- Use an `OrderedContainer` trait to avoid unexpected behaviour for unordered containers in combination with `just`
### Fixed
- Made several parsers (`todo`, `unwrapped`, etc.) more useful by reporting the parser's location on panic
- Boxing a parser that is already boxed just gives you the original parser to avoid double indirection
- Improved compilation speeds
# [0.8.0] - 2022-02-07
### Added
- `then_with` combinator to allow limited support for parsing nested patterns
- `impl From<&[T; N]> for Stream`
- `SkipUntil/SkipThenRetryUntil::skip_start/consume_end` for more precise control over skip-based recovery
### Changed
- Allowed `Validate` to map the output type
- Switched to zero-size End Of Input spans for default implementations of `Stream`
- Made `delimited_by` take combinators instead of specific tokens
- Minor optimisations
- Documentation improvements
### Fixed
- Compilation error with `--no-default-features`
- Made default behaviour of `skip_until` more sensible
# [0.7.0] - 2021-12-16
### Added
- A new [tutorial](tutorial.md) to help new users
- `select` macro, a wrapper over `filter_map` that makes extracting data from specific tokens easy
- `choice` parser, a better alternative to long `or` chains (which sometimes have poor compilation performance)
- `todo` parser, that panics when used (but not when created) (akin to Rust's `todo!` macro, but for parsers)
- `keyword` parser, that parses *exact* identifiers
- `from_str` combinator to allow converting a pattern to a value inline, using `std::str::FromStr`
- `unwrapped` combinator, to automatically unwrap an output value inline
- `rewind` combinator, that allows reverting the input stream on success. It's most useful when requiring that a
pattern is followed by some terminating pattern without the first parser greedily consuming it
- `map_err_with_span` combinator, to allow fetching the span of the input that was parsed by a parser before an error
was encountered
- `or_else` combinator, to allow processing and potentially recovering from a parser error
- `SeparatedBy::at_most` to require that a separated pattern appear at most a specific number of times
- `SeparatedBy::exactly` to require that a separated pattern be repeated exactly a specific number of times
- `Repeated::exactly` to require that a pattern be repeated exactly a specific number of times
- More trait implementations for various things, making the crate more useful
### Changed
- Made `just`, `one_of`, and `none_of` significantly more useful. They can now accept strings, arrays, slices, vectors,
sets, or just single tokens as before
- Added the return type of each parser to its documentation
- More explicit documentation of parser behaviour
- More doc examples
- Deprecated `seq` (`just` has been generalised and can now be used to parse specific input sequences)
- Sealed the `Character` trait so that future changes are not breaking
- Sealed the `Chain` trait and made it more powerful
- Moved trait constraints on `Parser` to where clauses for improved readability
### Fixed
- Fixed a subtle bug that allowed `separated_by` to parse an extra trailing separator when it shouldn't
- Filled a 'hole' in the `Error` trait's API that conflated a lack of expected tokens with expectation of end of input
- Made recursive parsers use weak reference-counting to avoid memory leaks
# [0.6.0] - 2021-11-22
### Added
- `skip_until` error recovery strategy
- `SeparatedBy::at_least` and `SeparatedBy::at_most` for parsing a specific number of separated items
- `Parser::validate` for integrated AST validation
- `Recursive::declare` and `Recursive::define` for more precise control over recursive declarations
### Changed
- Improved `separated_by` error messages
- Improved documentation
- Hid a new (probably) unused implementation detail
# [0.5.0] - 2021-10-30
### Added
- `take_until` primitive
### Changed
- Added span to fallback output function in `nested_delimiters`
# [0.4.0] - 2021-10-28
### Added
- Support for LL(k) parsing
- Custom error recovery strategies
- Debug mode
- Nested input flattening
### Changed
- Radically improved error quality

vendor/chumsky/Cargo.lock generated vendored Normal file

@@ -0,0 +1,247 @@
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 3
[[package]]
name = "ahash"
version = "0.8.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cd7d5a2cecb58716e47d67d5703a249964b14c7be1ec3cad3affc295b2d1c35d"
dependencies = [
"cfg-if",
"once_cell",
"version_check",
"zerocopy",
]
[[package]]
name = "allocator-api2"
version = "0.2.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0942ffc6dcaadf03badf6e6a2d0228460359d5e34b57ccdc720b7382dfbd5ec5"
[[package]]
name = "ariadne"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "72fe02fc62033df9ba41cba57ee19acf5e742511a140c7dbc3a873e19a19a1bd"
dependencies = [
"unicode-width",
"yansi",
]
[[package]]
name = "bstr"
version = "1.7.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c79ad7fb2dd38f3dabd76b09c6a5a20c038fc0213ef1e9afd30eb777f120f019"
dependencies = [
"memchr",
"regex-automata",
"serde",
]
[[package]]
name = "cc"
version = "1.0.83"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f1174fb0b6ec23863f8b971027804a42614e347eafb0a95bf0b12cdae21fc4d0"
dependencies = [
"libc",
]
[[package]]
name = "cfg-if"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd"
[[package]]
name = "chumsky"
version = "0.9.3"
dependencies = [
"ariadne",
"hashbrown",
"pom",
"stacker",
]
[[package]]
name = "hashbrown"
version = "0.14.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f93e7192158dbcda357bdec5fb5788eebf8bbac027f3f33e719d29135ae84156"
dependencies = [
"ahash",
"allocator-api2",
]
[[package]]
name = "libc"
version = "0.2.149"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a08173bc88b7955d1b3145aa561539096c421ac8debde8cbc3612ec635fee29b"
[[package]]
name = "memchr"
version = "2.6.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f665ee40bc4a3c5590afb1e9677db74a508659dfd71e126420da8274909a0167"
[[package]]
name = "once_cell"
version = "1.18.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dd8b5dd2ae5ed71462c540258bedcb51965123ad7e7ccf4b9a8cafaa4a63576d"
[[package]]
name = "pom"
version = "3.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5c2d73a5fe10d458e77534589512104e5aa8ac480aa9ac30b74563274235cce4"
dependencies = [
"bstr",
]
[[package]]
name = "proc-macro2"
version = "1.0.69"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "134c189feb4956b20f6f547d2cf727d4c0fe06722b20a0eec87ed445a97f92da"
dependencies = [
"unicode-ident",
]
[[package]]
name = "psm"
version = "0.1.21"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5787f7cda34e3033a72192c018bc5883100330f362ef279a8cbccfce8bb4e874"
dependencies = [
"cc",
]
[[package]]
name = "quote"
version = "1.0.33"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5267fca4496028628a95160fc423a33e8b2e6af8a5302579e322e4b520293cae"
dependencies = [
"proc-macro2",
]
[[package]]
name = "regex-automata"
version = "0.4.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5f804c7828047e88b2d32e2d7fe5a105da8ee3264f01902f796c8e067dc2483f"
[[package]]
name = "serde"
version = "1.0.189"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8e422a44e74ad4001bdc8eede9a4570ab52f71190e9c076d14369f38b9200537"
dependencies = [
"serde_derive",
]
[[package]]
name = "serde_derive"
version = "1.0.189"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1e48d1f918009ce3145511378cf68d613e3b3d9137d67272562080d68a2b32d5"
dependencies = [
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "stacker"
version = "0.1.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c886bd4480155fd3ef527d45e9ac8dd7118a898a46530b7b94c3e21866259fce"
dependencies = [
"cc",
"cfg-if",
"libc",
"psm",
"winapi",
]
[[package]]
name = "syn"
version = "2.0.38"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e96b79aaa137db8f61e26363a0c9b47d8b4ec75da28b7d1d614c2303e232408b"
dependencies = [
"proc-macro2",
"quote",
"unicode-ident",
]
[[package]]
name = "unicode-ident"
version = "1.0.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3354b9ac3fae1ff6755cb6db53683adb661634f67557942dea4facebec0fee4b"
[[package]]
name = "unicode-width"
version = "0.1.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e51733f11c9c4f72aa0c160008246859e340b00807569a0da0e7a1079b27ba85"
[[package]]
name = "version_check"
version = "0.9.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "49874b5167b65d7193b8aba1567f5c7d93d001cafc34600cee003eda787e483f"
[[package]]
name = "winapi"
version = "0.3.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5c839a674fcd7a98952e593242ea400abe93992746761e38641405d28b00f419"
dependencies = [
"winapi-i686-pc-windows-gnu",
"winapi-x86_64-pc-windows-gnu",
]
[[package]]
name = "winapi-i686-pc-windows-gnu"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ac3b87c63620426dd9b991e5ce0329eff545bccbbb34f3be09ff6fb6ab51b7b6"
[[package]]
name = "winapi-x86_64-pc-windows-gnu"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f"
[[package]]
name = "yansi"
version = "0.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "09041cd90cf85f7f8b2df60c646f853b7f535ce68f85244eb6731cf89fa498ec"
[[package]]
name = "zerocopy"
version = "0.7.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4c19fae0c8a9efc6a8281f2e623db8af1db9e57852e04cde3e754dd2dc29340f"
dependencies = [
"zerocopy-derive",
]
[[package]]
name = "zerocopy-derive"
version = "0.7.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fc56589e9ddd1f1c28d4b4b5c773ce232910a6bb67a70133d61c9e347585efe9"
dependencies = [
"proc-macro2",
"quote",
"syn",
]

vendor/chumsky/Cargo.toml vendored Normal file

@@ -0,0 +1,62 @@
# THIS FILE IS AUTOMATICALLY GENERATED BY CARGO
#
# When uploading crates to the registry Cargo will automatically
# "normalize" Cargo.toml files for maximal compatibility
# with all versions of Cargo and also rewrite `path` dependencies
# to registry (e.g., crates.io) dependencies.
#
# If you are reading this file be aware that the original Cargo.toml
# will likely look very different (and much more reasonable).
# See Cargo.toml.orig for the original contents.
[package]
edition = "2018"
name = "chumsky"
version = "0.9.3"
authors = ["Joshua Barretto <joshua.s.barretto@gmail.com>"]
exclude = [
"/misc/*",
"/benches/*",
]
description = "A parser library for humans with powerful error recovery"
readme = "README.md"
keywords = [
"parser",
"combinator",
"token",
"language",
"syntax",
]
categories = [
"parsing",
"text-processing",
]
license = "MIT"
repository = "https://github.com/zesterer/chumsky"
[dependencies.hashbrown]
version = "0.14.2"
[dependencies.stacker]
version = "0.1"
optional = true
[dev-dependencies.ariadne]
version = "0.3.0"
[dev-dependencies.pom]
version = "3.0"
[features]
ahash = []
default = [
"ahash",
"std",
"spill-stack",
]
nightly = []
spill-stack = [
"stacker",
"std",
]
std = []

vendor/chumsky/LICENSE vendored Normal file

@@ -0,0 +1,21 @@
The MIT License (MIT)
Copyright (c) 2021 Joshua Barretto
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

vendor/chumsky/README.md vendored Normal file

@@ -0,0 +1,190 @@
# Chumsky
[![crates.io](https://img.shields.io/crates/v/chumsky.svg)](https://crates.io/crates/chumsky)
[![crates.io](https://docs.rs/chumsky/badge.svg)](https://docs.rs/chumsky)
[![License](https://img.shields.io/crates/l/chumsky.svg)](https://github.com/zesterer/chumsky)
[![actions-badge](https://github.com/zesterer/chumsky/workflows/Rust/badge.svg?branch=master)](https://github.com/zesterer/chumsky/actions)
A parser library for humans with powerful error recovery.
<a href = "https://www.github.com/zesterer/tao">
<img src="https://raw.githubusercontent.com/zesterer/chumsky/master/misc/example.png" alt="Example usage with my own language, Tao"/>
</a>
*Note: Error diagnostic rendering is performed by [Ariadne](https://github.com/zesterer/ariadne)*
## Contents
- [Features](#features)
- [Example Brainfuck Parser](#example-brainfuck-parser)
- [Tutorial](#tutorial)
- [*What* is a parser combinator?](#what-is-a-parser-combinator)
- [*Why* use parser combinators?](#why-use-parser-combinators)
- [Classification](#classification)
- [Error Recovery](#error-recovery)
- [Performance](#performance)
- [Planned Features](#planned-features)
- [Philosophy](#philosophy)
- [Notes](#notes)
- [License](#license)
## Features
- Lots of combinators!
- Generic across input, output, error, and span types
- Powerful error recovery strategies
- Inline mapping to your AST
- Text-specific parsers for both `u8`s and `char`s
- Recursive parsers
- Backtracking is fully supported, allowing the parsing of all known context-free grammars
- Parsing of nested inputs, allowing you to move delimiter parsing to the lexical stage (as Rust does!)
- Built-in parser debugging
## Example [Brainfuck](https://en.wikipedia.org/wiki/Brainfuck) Parser
See [`examples/brainfuck.rs`](https://github.com/zesterer/chumsky/blob/master/examples/brainfuck.rs) for the full
interpreter (`cargo run --example brainfuck -- examples/sample.bf`).
```rust
use chumsky::prelude::*;
#[derive(Clone)]
enum Instr {
Left, Right,
Incr, Decr,
Read, Write,
Loop(Vec<Self>),
}
fn parser() -> impl Parser<char, Vec<Instr>, Error = Simple<char>> {
recursive(|bf| choice((
just('<').to(Instr::Left),
just('>').to(Instr::Right),
just('+').to(Instr::Incr),
just('-').to(Instr::Decr),
just(',').to(Instr::Read),
just('.').to(Instr::Write),
bf.delimited_by(just('['), just(']')).map(Instr::Loop),
))
.repeated())
}
```
Other examples include:
- A [JSON parser](https://github.com/zesterer/chumsky/blob/master/examples/json.rs) (`cargo run --example json -- examples/sample.json`)
- An [interpreter for a simple Rust-y language](https://github.com/zesterer/chumsky/blob/master/examples/nano_rust.rs)
(`cargo run --example nano_rust -- examples/sample.nrs`)
## Tutorial
Chumsky has [a tutorial](https://github.com/zesterer/chumsky/blob/master/tutorial.md) that teaches you how to write a
parser and interpreter for a simple dynamic language with unary and binary operators, operator precedence, functions,
let declarations, and calls.
## *What* is a parser combinator?
Parser combinators are a technique for implementing parsers by defining them in terms of other parsers. The resulting
parsers use a [recursive descent](https://en.wikipedia.org/wiki/Recursive_descent_parser) strategy to transform a stream
of tokens into an output. Using parser combinators to define parsers is roughly analogous to using Rust's
[`Iterator`](https://doc.rust-lang.org/std/iter/trait.Iterator.html) trait to define iterative algorithms: the
type-driven API of `Iterator` makes it more difficult to make mistakes and easier to encode complicated iteration logic
than if one were to write the same code by hand. The same is true of parser combinators.
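The `Iterator` analogy above can be made concrete with a minimal hand-rolled sketch. This is an illustration only, not chumsky's API — the names `just` and `then` mirror chumsky's combinators in spirit, but the real trait-based machinery is far richer:

```rust
// A parser is just a function from input to an optional (output, rest) pair.
// Combinators build bigger such functions out of smaller ones.

// Match exactly one expected character.
fn just(c: char) -> impl Fn(&str) -> Option<(char, &str)> {
    move |input| {
        let mut chars = input.chars();
        (chars.next() == Some(c)).then(|| (c, chars.as_str()))
    }
}

// Run two parsers in sequence, like chaining `Iterator` adapters.
fn then<'a, A, B>(
    p: impl Fn(&'a str) -> Option<(A, &'a str)>,
    q: impl Fn(&'a str) -> Option<(B, &'a str)>,
) -> impl Fn(&'a str) -> Option<((A, B), &'a str)> {
    move |input| {
        let (a, rest) = p(input)?;
        let (b, rest) = q(rest)?;
        Some(((a, b), rest))
    }
}

fn main() {
    let ab = then(just('a'), just('b'));
    assert_eq!(ab("abc"), Some((('a', 'b'), "c")));
    assert!(ab("ba").is_none());
}
```

Type-driven composition like this is what makes mistakes hard to write: a sequencing bug is usually a type error, not a runtime surprise.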
## *Why* use parser combinators?
Writing parsers with good error recovery is conceptually difficult and time-consuming. It requires understanding the
intricacies of the recursive descent algorithm, and then implementing recovery strategies on top of it. If you're
developing a programming language, you'll almost certainly change your mind about syntax in the process, leading to some
slow and painful parser refactoring. Parser combinators solve both problems by providing an ergonomic API that allows
for rapidly iterating upon a syntax.
Parser combinators are also a great fit for domain-specific languages for which an existing parser does not exist.
Writing a reliable, fault-tolerant parser for such situations can go from being a multi-day task to a half-hour task
with the help of a decent parser combinator library.
## Classification
Chumsky's parsers are [recursive descent](https://en.wikipedia.org/wiki/Recursive_descent_parser) parsers and are
capable of parsing [parsing expression grammars (PEGs)](https://en.wikipedia.org/wiki/Parsing_expression_grammar), which
includes all known context-free languages. It is theoretically possible to extend Chumsky further to accept limited
context-sensitive grammars too, although this is rarely required.
## Error Recovery
Chumsky has support for error recovery, meaning that it can encounter a syntax error, report the error, and then
attempt to recover itself into a state in which it can continue parsing so that multiple errors can be produced at once
and a partial [AST](https://en.wikipedia.org/wiki/Abstract_syntax_tree) can still be generated from the input for future
compilation stages to consume.
However, there is no silver bullet strategy for error recovery. By definition, if the input to a parser is invalid then
the parser can only make educated guesses as to the meaning of the input. Different recovery strategies will work better
for different languages, and for different patterns within those languages.
Chumsky provides a variety of recovery strategies (each implementing the `Strategy` trait), but it's important to
understand that all of
- which you apply
- where you apply them
- what order you apply them
will greatly affect the quality of the errors that Chumsky is able to produce, along with the extent to which it is able
to recover a useful AST. Where possible, you should attempt more 'specific' recovery strategies first rather than those
that mindlessly skip large swathes of the input.
It is recommended that you experiment with applying different strategies in different situations and at different levels
of the parser to find a configuration that you are happy with. If none of the provided error recovery strategies cover
the specific pattern you wish to catch, you can even create your own by digging into Chumsky's internals and
implementing your own strategies! If you come up with a useful strategy, feel free to open a PR against the
[main repository](https://github.com/zesterer/chumsky/)!
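To make the idea concrete without touching chumsky's internals, here is a toy stand-in for a skip-to-sync-token strategy (`parse_stmts` is an invented helper, not part of the crate): on a malformed 'statement' it records the error and resumes at the next `;`, so a single pass yields both a partial result and every error.

```rust
// Toy recovery: each `;`-terminated statement should be an integer.
// On failure, record an error and resume at the next statement instead
// of aborting — one pass produces a partial result plus all errors.
fn parse_stmts(src: &str) -> (Vec<i64>, Vec<String>) {
    let mut values = Vec::new();
    let mut errors = Vec::new();
    for stmt in src.split(';').map(str::trim).filter(|s| !s.is_empty()) {
        match stmt.parse::<i64>() {
            Ok(n) => values.push(n),
            Err(_) => errors.push(format!("expected integer, found `{stmt}`")),
        }
    }
    (values, errors)
}

fn main() {
    let (values, errors) = parse_stmts("1; two; 3;");
    assert_eq!(values, vec![1, 3]);
    assert_eq!(errors.len(), 1);
}
```

The `;` here plays the role of the synchronisation token: a cheap, unambiguous place to resume, which is exactly why 'specific' recovery points beat blindly skipping input.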
## Performance
Chumsky focuses on high-quality errors and ergonomics over performance. That said, it's important that Chumsky can keep
up with the rest of your compiler! Unfortunately, it's *extremely* difficult to come up with sensible benchmarks given
that exactly how Chumsky performs depends entirely on what you are parsing, how you structure your parser, which
patterns the parser attempts to match first, how complex your error type is, what is involved in constructing your AST,
etc. All that said, here are some numbers from the
[JSON benchmark](https://github.com/zesterer/chumsky/blob/master/benches/json.rs) included in the repository running on
my Ryzen 7 3700x.
```ignore
test chumsky ... bench: 4,782,390 ns/iter (+/- 997,208)
test pom ... bench: 12,793,490 ns/iter (+/- 1,954,583)
```
I've included results from [`pom`](https://github.com/J-F-Liu/pom), another parser combinator crate with a similar
design, as a point of reference. The sample file being parsed is broadly representative of typical JSON data and has
3,018 lines. This translates to a little over 630,000 lines of JSON per second.
Clearly, this is a little slower than a well-optimised hand-written parser: but that's okay! Chumsky's goal is to be
*fast enough*. If you've written enough code in your language that parsing performance even starts to be a problem,
you've already committed enough time and resources to your language that hand-writing a parser is the best choice going!
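As a quick sanity check (using only the benchmark figures quoted above), the throughput claim follows directly: 3,018 lines parsed in roughly 4.78 ms per iteration is about 631,000 lines per second.

```rust
fn main() {
    let lines = 3_018.0_f64;
    let secs_per_iter = 4_782_390.0 / 1e9; // ns/iter -> s/iter
    let lines_per_sec = lines / secs_per_iter;
    // ~631,000 lines of JSON per second, i.e. "a little over 630,000".
    assert!(lines_per_sec > 630_000.0 && lines_per_sec < 640_000.0);
}
```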
## Planned Features
- An optimised 'happy path' parser mode that skips error recovery & error generation
- An even faster 'validation' parser mode, guaranteed to not allocate, that doesn't generate outputs but just verifies
the validity of an input
## Philosophy
Chumsky should:
- Be easy to use, even if you don't understand exactly what the parser is doing under the hood
- Be type-driven, pushing users away from anti-patterns at compile-time
- Be a mature, 'batteries-included' solution for context-free parsing by default. If you need to implement either
`Parser` or `Strategy` by hand, that's a problem that needs fixing
- Be 'fast enough', but no faster (i.e: when there is a tradeoff between error quality and performance, Chumsky will
always take the former option)
- Be modular and extensible, allowing users to implement their own parsers, recovery strategies, error types, spans, and
be generic over both input tokens and the output AST
## Notes
My apologies to Noam for choosing such an absurd name.
## License
Chumsky is licensed under the MIT license (see `LICENSE` in the main repository).

vendor/chumsky/examples/brainfuck.rs vendored Normal file

@@ -0,0 +1,73 @@
//! This is a Brainfuck parser and interpreter
//! Run it with the following command:
//! cargo run --example brainfuck -- examples/sample.bf
use chumsky::prelude::*;
use std::{
env, fs,
io::{self, Read},
};
#[derive(Clone)]
enum Instr {
Invalid,
Left,
Right,
Incr,
Decr,
Read,
Write,
Loop(Vec<Self>),
}
fn parser() -> impl Parser<char, Vec<Instr>, Error = Simple<char>> {
use Instr::*;
recursive(|bf| {
choice((
just('<').to(Left),
just('>').to(Right),
just('+').to(Incr),
just('-').to(Decr),
just(',').to(Read),
just('.').to(Write),
))
.or(bf.delimited_by(just('['), just(']')).map(Loop))
.recover_with(nested_delimiters('[', ']', [], |_| Invalid))
.recover_with(skip_then_retry_until([']']))
.repeated()
})
.then_ignore(end())
}
const TAPE_LEN: usize = 10_000;
fn execute(ast: &[Instr], ptr: &mut usize, tape: &mut [u8; TAPE_LEN]) {
use Instr::*;
for symbol in ast {
match symbol {
Invalid => unreachable!(),
Left => *ptr = (*ptr + TAPE_LEN - 1).rem_euclid(TAPE_LEN),
Right => *ptr = (*ptr + 1).rem_euclid(TAPE_LEN),
Incr => tape[*ptr] = tape[*ptr].wrapping_add(1),
Decr => tape[*ptr] = tape[*ptr].wrapping_sub(1),
Read => tape[*ptr] = io::stdin().bytes().next().unwrap().unwrap(),
Write => print!("{}", tape[*ptr] as char),
Loop(ast) => {
while tape[*ptr] != 0 {
execute(ast, ptr, tape)
}
}
}
}
}
fn main() {
let src = fs::read_to_string(env::args().nth(1).expect("Expected file argument"))
.expect("Failed to read file");
// let src = "[!]+";
match parser().parse(src.trim()) {
Ok(ast) => execute(&ast, &mut 0, &mut [0; TAPE_LEN]),
Err(errs) => errs.into_iter().for_each(|e| println!("{:?}", e)),
}
}

vendor/chumsky/examples/foo.rs vendored Normal file

@@ -0,0 +1,196 @@
/// This is the parser and interpreter for the 'Foo' language. See `tutorial.md` in the repository's root to learn
/// about it.
use chumsky::prelude::*;
#[derive(Debug)]
enum Expr {
Num(f64),
Var(String),
Neg(Box<Expr>),
Add(Box<Expr>, Box<Expr>),
Sub(Box<Expr>, Box<Expr>),
Mul(Box<Expr>, Box<Expr>),
Div(Box<Expr>, Box<Expr>),
Call(String, Vec<Expr>),
Let {
name: String,
rhs: Box<Expr>,
then: Box<Expr>,
},
Fn {
name: String,
args: Vec<String>,
body: Box<Expr>,
then: Box<Expr>,
},
}
fn parser() -> impl Parser<char, Expr, Error = Simple<char>> {
let ident = text::ident().padded();
let expr = recursive(|expr| {
let int = text::int(10)
.map(|s: String| Expr::Num(s.parse().unwrap()))
.padded();
let call = ident
.then(
expr.clone()
.separated_by(just(','))
.allow_trailing()
.delimited_by(just('('), just(')')),
)
.map(|(f, args)| Expr::Call(f, args));
let atom = int
.or(expr.delimited_by(just('('), just(')')))
.or(call)
.or(ident.map(Expr::Var));
let op = |c| just(c).padded();
let unary = op('-')
.repeated()
.then(atom)
.foldr(|_op, rhs| Expr::Neg(Box::new(rhs)));
let product = unary
.clone()
.then(
op('*')
.to(Expr::Mul as fn(_, _) -> _)
.or(op('/').to(Expr::Div as fn(_, _) -> _))
.then(unary)
.repeated(),
)
.foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));
let sum = product
.clone()
.then(
op('+')
.to(Expr::Add as fn(_, _) -> _)
.or(op('-').to(Expr::Sub as fn(_, _) -> _))
.then(product)
.repeated(),
)
.foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));
sum
});
let decl = recursive(|decl| {
let r#let = text::keyword("let")
.ignore_then(ident)
.then_ignore(just('='))
.then(expr.clone())
.then_ignore(just(';'))
.then(decl.clone())
.map(|((name, rhs), then)| Expr::Let {
name,
rhs: Box::new(rhs),
then: Box::new(then),
});
let r#fn = text::keyword("fn")
.ignore_then(ident)
.then(ident.repeated())
.then_ignore(just('='))
.then(expr.clone())
.then_ignore(just(';'))
.then(decl)
.map(|(((name, args), body), then)| Expr::Fn {
name,
args,
body: Box::new(body),
then: Box::new(then),
});
r#let.or(r#fn).or(expr).padded()
});
decl.then_ignore(end())
}
fn eval<'a>(
expr: &'a Expr,
vars: &mut Vec<(&'a String, f64)>,
funcs: &mut Vec<(&'a String, &'a [String], &'a Expr)>,
) -> Result<f64, String> {
match expr {
Expr::Num(x) => Ok(*x),
Expr::Neg(a) => Ok(-eval(a, vars, funcs)?),
Expr::Add(a, b) => Ok(eval(a, vars, funcs)? + eval(b, vars, funcs)?),
Expr::Sub(a, b) => Ok(eval(a, vars, funcs)? - eval(b, vars, funcs)?),
Expr::Mul(a, b) => Ok(eval(a, vars, funcs)? * eval(b, vars, funcs)?),
Expr::Div(a, b) => Ok(eval(a, vars, funcs)? / eval(b, vars, funcs)?),
Expr::Var(name) => {
if let Some((_, val)) = vars.iter().rev().find(|(var, _)| *var == name) {
Ok(*val)
} else {
Err(format!("Cannot find variable `{}` in scope", name))
}
}
Expr::Let { name, rhs, then } => {
let rhs = eval(rhs, vars, funcs)?;
vars.push((name, rhs));
let output = eval(then, vars, funcs);
vars.pop();
output
}
Expr::Call(name, args) => {
if let Some((_, arg_names, body)) =
funcs.iter().rev().find(|(var, _, _)| *var == name).copied()
{
if arg_names.len() == args.len() {
let mut args = args
.iter()
.map(|arg| eval(arg, vars, funcs))
.zip(arg_names.iter())
.map(|(val, name)| Ok((name, val?)))
.collect::<Result<_, String>>()?;
vars.append(&mut args);
let output = eval(body, vars, funcs);
// `append` drains `args`, so pop the pushed bindings by `arg_names.len()`
vars.truncate(vars.len() - arg_names.len());
output
} else {
Err(format!(
"Wrong number of arguments for function `{}`: expected {}, found {}",
name,
arg_names.len(),
args.len(),
))
}
} else {
Err(format!("Cannot find function `{}` in scope", name))
}
}
Expr::Fn {
name,
args,
body,
then,
} => {
funcs.push((name, args, body));
let output = eval(then, vars, funcs);
funcs.pop();
output
}
}
}
fn main() {
let src = std::fs::read_to_string(std::env::args().nth(1).unwrap()).unwrap();
match parser().parse(src) {
Ok(ast) => match eval(&ast, &mut Vec::new(), &mut Vec::new()) {
Ok(output) => println!("{}", output),
Err(eval_err) => println!("Evaluation error: {}", eval_err),
},
Err(parse_errs) => parse_errs
.into_iter()
.for_each(|e| println!("Parse error: {}", e)),
}
}
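The `product` and `sum` layers in the parser above build left-associative operators by folding a parsed left operand with repeated `(operator, right-operand)` pairs. A standalone sketch of that fold in plain Rust (no chumsky; the `Expr` type here is a pared-down stand-in, not the example's own):

```rust
// Left-associative precedence by folding, mirroring the `foldl` calls in the
// `product`/`sum` layers of the Foo parser above.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(f64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Fold a left operand with a run of (constructor, right operand) pairs.
fn foldl(lhs: Expr, tail: Vec<(fn(Box<Expr>, Box<Expr>) -> Expr, Expr)>) -> Expr {
    tail.into_iter()
        .fold(lhs, |lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)))
}

fn main() {
    // 1 + 2, then * 3, grouped left-associatively: ((1 + 2) * 3)
    let e = foldl(
        Expr::Num(1.0),
        vec![
            (Expr::Add as fn(_, _) -> _, Expr::Num(2.0)),
            (Expr::Mul as fn(_, _) -> _, Expr::Num(3.0)),
        ],
    );
    println!("{:?}", e);
}
```

The same shape is what `then(...).repeated()` followed by `foldl` produces in the parser: the repeated tail never recurses, so associativity falls out of the fold direction.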

vendor/chumsky/examples/json.rs vendored Normal file

@@ -0,0 +1,175 @@
//! This is a parser for JSON.
//! Run it with the following command:
//! cargo run --example json -- examples/sample.json
use ariadne::{Color, Fmt, Label, Report, ReportKind, Source};
use chumsky::prelude::*;
use std::{collections::HashMap, env, fs};
#[derive(Clone, Debug)]
enum Json {
Invalid,
Null,
Bool(bool),
Str(String),
Num(f64),
Array(Vec<Json>),
Object(HashMap<String, Json>),
}
fn parser() -> impl Parser<char, Json, Error = Simple<char>> {
recursive(|value| {
let frac = just('.').chain(text::digits(10));
let exp = just('e')
.or(just('E'))
.chain(just('+').or(just('-')).or_not())
.chain::<char, _, _>(text::digits(10));
let number = just('-')
.or_not()
.chain::<char, _, _>(text::int(10))
.chain::<char, _, _>(frac.or_not().flatten())
.chain::<char, _, _>(exp.or_not().flatten())
.collect::<String>()
.from_str()
.unwrapped()
.labelled("number");
let escape = just('\\').ignore_then(
just('\\')
.or(just('/'))
.or(just('"'))
.or(just('b').to('\x08'))
.or(just('f').to('\x0C'))
.or(just('n').to('\n'))
.or(just('r').to('\r'))
.or(just('t').to('\t'))
.or(just('u').ignore_then(
filter(|c: &char| c.is_digit(16))
.repeated()
.exactly(4)
.collect::<String>()
.validate(|digits, span, emit| {
char::from_u32(u32::from_str_radix(&digits, 16).unwrap())
.unwrap_or_else(|| {
emit(Simple::custom(span, "invalid unicode character"));
'\u{FFFD}' // unicode replacement character
})
}),
)),
);
let string = just('"')
.ignore_then(filter(|c| *c != '\\' && *c != '"').or(escape).repeated())
.then_ignore(just('"'))
.collect::<String>()
.labelled("string");
let array = value
.clone()
.chain(just(',').ignore_then(value.clone()).repeated())
.or_not()
.flatten()
.delimited_by(just('['), just(']'))
.map(Json::Array)
.labelled("array");
let member = string.clone().then_ignore(just(':').padded()).then(value);
let object = member
.clone()
.chain(just(',').padded().ignore_then(member).repeated())
.or_not()
.flatten()
.padded()
.delimited_by(just('{'), just('}'))
.collect::<HashMap<String, Json>>()
.map(Json::Object)
.labelled("object");
just("null")
.to(Json::Null)
.labelled("null")
.or(just("true").to(Json::Bool(true)).labelled("true"))
.or(just("false").to(Json::Bool(false)).labelled("false"))
.or(number.map(Json::Num))
.or(string.map(Json::Str))
.or(array)
.or(object)
.recover_with(nested_delimiters('{', '}', [('[', ']')], |_| Json::Invalid))
.recover_with(nested_delimiters('[', ']', [('{', '}')], |_| Json::Invalid))
.recover_with(skip_then_retry_until(['}', ']']))
.padded()
})
.then_ignore(end().recover_with(skip_then_retry_until([])))
}
fn main() {
let src = fs::read_to_string(env::args().nth(1).expect("Expected file argument"))
.expect("Failed to read file");
let (json, errs) = parser().parse_recovery(src.trim());
println!("{:#?}", json);
errs.into_iter().for_each(|e| {
let msg = if let chumsky::error::SimpleReason::Custom(msg) = e.reason() {
msg.clone()
} else {
format!(
"{}{}, expected {}",
if e.found().is_some() {
"Unexpected token"
} else {
"Unexpected end of input"
},
if let Some(label) = e.label() {
format!(" while parsing {}", label)
} else {
String::new()
},
if e.expected().len() == 0 {
"something else".to_string()
} else {
e.expected()
.map(|expected| match expected {
Some(expected) => expected.to_string(),
None => "end of input".to_string(),
})
.collect::<Vec<_>>()
.join(", ")
},
)
};
let report = Report::build(ReportKind::Error, (), e.span().start)
.with_code(3)
.with_message(msg)
.with_label(
Label::new(e.span())
.with_message(match e.reason() {
chumsky::error::SimpleReason::Custom(msg) => msg.clone(),
_ => format!(
"Unexpected {}",
e.found()
.map(|c| format!("token {}", c.fg(Color::Red)))
.unwrap_or_else(|| "end of input".to_string())
),
})
.with_color(Color::Red),
);
let report = match e.reason() {
chumsky::error::SimpleReason::Unclosed { span, delimiter } => report.with_label(
Label::new(span.clone())
.with_message(format!(
"Unclosed delimiter {}",
delimiter.fg(Color::Yellow)
))
.with_color(Color::Yellow),
),
chumsky::error::SimpleReason::Unexpected => report,
chumsky::error::SimpleReason::Custom(_) => report,
};
report.finish().print(Source::from(&src)).unwrap();
});
}
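The `\uXXXX` escape arm above converts four hex digits to a `char`, substituting U+FFFD (the replacement character) when the digits don't name a valid scalar value. A minimal standalone sketch of that conversion (plain Rust; `decode_unicode_escape` is a hypothetical helper, not part of the vendored example):

```rust
// Mirrors the `\uXXXX` handling above: parse four hex digits, fall back to
// U+FFFD for anything that isn't a valid Unicode scalar value (e.g. surrogates).
fn decode_unicode_escape(digits: &str) -> char {
    u32::from_str_radix(digits, 16)
        .ok()
        .and_then(char::from_u32)
        .unwrap_or('\u{FFFD}')
}

fn main() {
    println!("{}", decode_unicode_escape("0041")); // prints A
    println!("{:?}", decode_unicode_escape("DFFF")); // lone surrogate -> '\u{fffd}'
}
```

Note that `char::from_u32` already rejects surrogate code points, which is why the parser's `validate` arm can emit a custom error instead of panicking.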

vendor/chumsky/examples/nano_rust.rs vendored Normal file

@@ -0,0 +1,639 @@
//! This is an entire parser and interpreter for a dynamically-typed Rust-like expression-oriented
//! programming language. See `sample.nrs` for sample source code.
//! Run it with the following command:
//! cargo run --example nano_rust -- examples/sample.nrs
use ariadne::{Color, Fmt, Label, Report, ReportKind, Source};
use chumsky::{prelude::*, stream::Stream};
use std::{collections::HashMap, env, fmt, fs};
pub type Span = std::ops::Range<usize>;
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Token {
Null,
Bool(bool),
Num(String),
Str(String),
Op(String),
Ctrl(char),
Ident(String),
Fn,
Let,
Print,
If,
Else,
}
impl fmt::Display for Token {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
Token::Null => write!(f, "null"),
Token::Bool(x) => write!(f, "{}", x),
Token::Num(n) => write!(f, "{}", n),
Token::Str(s) => write!(f, "{}", s),
Token::Op(s) => write!(f, "{}", s),
Token::Ctrl(c) => write!(f, "{}", c),
Token::Ident(s) => write!(f, "{}", s),
Token::Fn => write!(f, "fn"),
Token::Let => write!(f, "let"),
Token::Print => write!(f, "print"),
Token::If => write!(f, "if"),
Token::Else => write!(f, "else"),
}
}
}
fn lexer() -> impl Parser<char, Vec<(Token, Span)>, Error = Simple<char>> {
// A parser for numbers
let num = text::int(10)
.chain::<char, _, _>(just('.').chain(text::digits(10)).or_not().flatten())
.collect::<String>()
.map(Token::Num);
// A parser for strings
let str_ = just('"')
.ignore_then(filter(|c| *c != '"').repeated())
.then_ignore(just('"'))
.collect::<String>()
.map(Token::Str);
// A parser for operators
let op = one_of("+-*/!=")
.repeated()
.at_least(1)
.collect::<String>()
.map(Token::Op);
// A parser for control characters (delimiters, semicolons, etc.)
let ctrl = one_of("()[]{};,").map(|c| Token::Ctrl(c));
// A parser for identifiers and keywords
let ident = text::ident().map(|ident: String| match ident.as_str() {
"fn" => Token::Fn,
"let" => Token::Let,
"print" => Token::Print,
"if" => Token::If,
"else" => Token::Else,
"true" => Token::Bool(true),
"false" => Token::Bool(false),
"null" => Token::Null,
_ => Token::Ident(ident),
});
// A single token can be one of the above
let token = num
.or(str_)
.or(op)
.or(ctrl)
.or(ident)
.recover_with(skip_then_retry_until([]));
let comment = just("//").then(take_until(just('\n'))).padded();
token
.map_with_span(|tok, span| (tok, span))
.padded_by(comment.repeated())
.padded()
.repeated()
}
#[derive(Clone, Debug, PartialEq)]
enum Value {
Null,
Bool(bool),
Num(f64),
Str(String),
List(Vec<Value>),
Func(String),
}
impl Value {
fn num(self, span: Span) -> Result<f64, Error> {
if let Value::Num(x) = self {
Ok(x)
} else {
Err(Error {
span,
msg: format!("'{}' is not a number", self),
})
}
}
}
impl std::fmt::Display for Value {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
match self {
Self::Null => write!(f, "null"),
Self::Bool(x) => write!(f, "{}", x),
Self::Num(x) => write!(f, "{}", x),
Self::Str(x) => write!(f, "{}", x),
Self::List(xs) => write!(
f,
"[{}]",
xs.iter()
.map(|x| x.to_string())
.collect::<Vec<_>>()
.join(", ")
),
Self::Func(name) => write!(f, "<function: {}>", name),
}
}
}
#[derive(Clone, Debug)]
enum BinaryOp {
Add,
Sub,
Mul,
Div,
Eq,
NotEq,
}
pub type Spanned<T> = (T, Span);
// An expression node in the AST. Children are spanned so we can generate useful runtime errors.
#[derive(Debug)]
enum Expr {
Error,
Value(Value),
List(Vec<Spanned<Self>>),
Local(String),
Let(String, Box<Spanned<Self>>, Box<Spanned<Self>>),
Then(Box<Spanned<Self>>, Box<Spanned<Self>>),
Binary(Box<Spanned<Self>>, BinaryOp, Box<Spanned<Self>>),
Call(Box<Spanned<Self>>, Vec<Spanned<Self>>),
If(Box<Spanned<Self>>, Box<Spanned<Self>>, Box<Spanned<Self>>),
Print(Box<Spanned<Self>>),
}
// A function node in the AST.
#[derive(Debug)]
struct Func {
args: Vec<String>,
body: Spanned<Expr>,
}
fn expr_parser() -> impl Parser<Token, Spanned<Expr>, Error = Simple<Token>> + Clone {
recursive(|expr| {
let raw_expr = recursive(|raw_expr| {
let val = select! {
Token::Null => Expr::Value(Value::Null),
Token::Bool(x) => Expr::Value(Value::Bool(x)),
Token::Num(n) => Expr::Value(Value::Num(n.parse().unwrap())),
Token::Str(s) => Expr::Value(Value::Str(s)),
}
.labelled("value");
let ident = select! { Token::Ident(ident) => ident.clone() }.labelled("identifier");
// A list of expressions
let items = expr
.clone()
.separated_by(just(Token::Ctrl(',')))
.allow_trailing();
// A let expression
let let_ = just(Token::Let)
.ignore_then(ident)
.then_ignore(just(Token::Op("=".to_string())))
.then(raw_expr)
.then_ignore(just(Token::Ctrl(';')))
.then(expr.clone())
.map(|((name, val), body)| Expr::Let(name, Box::new(val), Box::new(body)));
let list = items
.clone()
.delimited_by(just(Token::Ctrl('[')), just(Token::Ctrl(']')))
.map(Expr::List);
// 'Atoms' are expressions that contain no ambiguity
let atom = val
.or(ident.map(Expr::Local))
.or(let_)
.or(list)
// In Nano Rust, `print` is a keyword, as in Python 2, for simplicity
.or(just(Token::Print)
.ignore_then(
expr.clone()
.delimited_by(just(Token::Ctrl('(')), just(Token::Ctrl(')'))),
)
.map(|expr| Expr::Print(Box::new(expr))))
.map_with_span(|expr, span| (expr, span))
// Atoms can also just be normal expressions, but surrounded with parentheses
.or(expr
.clone()
.delimited_by(just(Token::Ctrl('(')), just(Token::Ctrl(')'))))
// Attempt to recover anything that looks like a parenthesised expression but contains errors
.recover_with(nested_delimiters(
Token::Ctrl('('),
Token::Ctrl(')'),
[
(Token::Ctrl('['), Token::Ctrl(']')),
(Token::Ctrl('{'), Token::Ctrl('}')),
],
|span| (Expr::Error, span),
))
// Attempt to recover anything that looks like a list but contains errors
.recover_with(nested_delimiters(
Token::Ctrl('['),
Token::Ctrl(']'),
[
(Token::Ctrl('('), Token::Ctrl(')')),
(Token::Ctrl('{'), Token::Ctrl('}')),
],
|span| (Expr::Error, span),
));
// Function calls have very high precedence so we prioritise them
let call = atom
.then(
items
.delimited_by(just(Token::Ctrl('(')), just(Token::Ctrl(')')))
.map_with_span(|args, span: Span| (args, span))
.repeated(),
)
.foldl(|f, args| {
let span = f.1.start..args.1.end;
(Expr::Call(Box::new(f), args.0), span)
});
// Product ops (multiply and divide) have equal precedence
let op = just(Token::Op("*".to_string()))
.to(BinaryOp::Mul)
.or(just(Token::Op("/".to_string())).to(BinaryOp::Div));
let product = call
.clone()
.then(op.then(call).repeated())
.foldl(|a, (op, b)| {
let span = a.1.start..b.1.end;
(Expr::Binary(Box::new(a), op, Box::new(b)), span)
});
// Sum ops (add and subtract) have equal precedence
let op = just(Token::Op("+".to_string()))
.to(BinaryOp::Add)
.or(just(Token::Op("-".to_string())).to(BinaryOp::Sub));
let sum = product
.clone()
.then(op.then(product).repeated())
.foldl(|a, (op, b)| {
let span = a.1.start..b.1.end;
(Expr::Binary(Box::new(a), op, Box::new(b)), span)
});
// Comparison ops (equal, not-equal) have equal precedence
let op = just(Token::Op("==".to_string()))
.to(BinaryOp::Eq)
.or(just(Token::Op("!=".to_string())).to(BinaryOp::NotEq));
let compare = sum
.clone()
.then(op.then(sum).repeated())
.foldl(|a, (op, b)| {
let span = a.1.start..b.1.end;
(Expr::Binary(Box::new(a), op, Box::new(b)), span)
});
compare
});
// Blocks are expressions but delimited with braces
let block = expr
.clone()
.delimited_by(just(Token::Ctrl('{')), just(Token::Ctrl('}')))
// Attempt to recover anything that looks like a block but contains errors
.recover_with(nested_delimiters(
Token::Ctrl('{'),
Token::Ctrl('}'),
[
(Token::Ctrl('('), Token::Ctrl(')')),
(Token::Ctrl('['), Token::Ctrl(']')),
],
|span| (Expr::Error, span),
));
let if_ = recursive(|if_| {
just(Token::If)
.ignore_then(expr.clone())
.then(block.clone())
.then(
just(Token::Else)
.ignore_then(block.clone().or(if_))
.or_not(),
)
.map_with_span(|((cond, a), b), span: Span| {
(
Expr::If(
Box::new(cond),
Box::new(a),
Box::new(match b {
Some(b) => b,
// If an `if` expression has no trailing `else` block, we magic up one that just produces null
None => (Expr::Value(Value::Null), span.clone()),
}),
),
span,
)
})
});
// Both blocks and `if` are 'block expressions' and can appear in the place of statements
let block_expr = block.or(if_).labelled("block");
let block_chain = block_expr
.clone()
.then(block_expr.clone().repeated())
.foldl(|a, b| {
let span = a.1.start..b.1.end;
(Expr::Then(Box::new(a), Box::new(b)), span)
});
block_chain
// Expressions, chained by semicolons, are statements
.or(raw_expr.clone())
.then(just(Token::Ctrl(';')).ignore_then(expr.or_not()).repeated())
.foldl(|a, b| {
// This allows creating a span that covers the entire Then expression.
// b_end is the end of b if it exists, otherwise it is the end of a.
let a_start = a.1.start;
let b_end = b.as_ref().map(|b| b.1.end).unwrap_or(a.1.end);
(
Expr::Then(
Box::new(a),
Box::new(match b {
Some(b) => b,
// Since there is no b expression, its span is empty.
None => (Expr::Value(Value::Null), b_end..b_end),
}),
),
a_start..b_end,
)
})
})
}
fn funcs_parser() -> impl Parser<Token, HashMap<String, Func>, Error = Simple<Token>> + Clone {
let ident = filter_map(|span, tok| match tok {
Token::Ident(ident) => Ok(ident.clone()),
_ => Err(Simple::expected_input_found(span, Vec::new(), Some(tok))),
});
// Argument lists are just identifiers separated by commas, surrounded by parentheses
let args = ident
.clone()
.separated_by(just(Token::Ctrl(',')))
.allow_trailing()
.delimited_by(just(Token::Ctrl('(')), just(Token::Ctrl(')')))
.labelled("function args");
let func = just(Token::Fn)
.ignore_then(
ident
.map_with_span(|name, span| (name, span))
.labelled("function name"),
)
.then(args)
.then(
expr_parser()
.delimited_by(just(Token::Ctrl('{')), just(Token::Ctrl('}')))
// Attempt to recover anything that looks like a function body but contains errors
.recover_with(nested_delimiters(
Token::Ctrl('{'),
Token::Ctrl('}'),
[
(Token::Ctrl('('), Token::Ctrl(')')),
(Token::Ctrl('['), Token::Ctrl(']')),
],
|span| (Expr::Error, span),
)),
)
.map(|((name, args), body)| (name, Func { args, body }))
.labelled("function");
func.repeated()
.try_map(|fs, _| {
let mut funcs = HashMap::new();
for ((name, name_span), f) in fs {
if funcs.insert(name.clone(), f).is_some() {
return Err(Simple::custom(
name_span.clone(),
format!("Function '{}' already exists", name),
));
}
}
Ok(funcs)
})
.then_ignore(end())
}
struct Error {
span: Span,
msg: String,
}
fn eval_expr(
expr: &Spanned<Expr>,
funcs: &HashMap<String, Func>,
stack: &mut Vec<(String, Value)>,
) -> Result<Value, Error> {
Ok(match &expr.0 {
Expr::Error => unreachable!(), // Error expressions only get created by parser errors, so cannot exist in a valid AST
Expr::Value(val) => val.clone(),
Expr::List(items) => Value::List(
items
.iter()
.map(|item| eval_expr(item, funcs, stack))
.collect::<Result<_, _>>()?,
),
Expr::Local(name) => stack
.iter()
.rev()
.find(|(l, _)| l == name)
.map(|(_, v)| v.clone())
.or_else(|| Some(Value::Func(name.clone())).filter(|_| funcs.contains_key(name)))
.ok_or_else(|| Error {
span: expr.1.clone(),
msg: format!("No such variable '{}' in scope", name),
})?,
Expr::Let(local, val, body) => {
let val = eval_expr(val, funcs, stack)?;
stack.push((local.clone(), val));
let res = eval_expr(body, funcs, stack)?;
stack.pop();
res
}
Expr::Then(a, b) => {
eval_expr(a, funcs, stack)?;
eval_expr(b, funcs, stack)?
}
Expr::Binary(a, BinaryOp::Add, b) => Value::Num(
eval_expr(a, funcs, stack)?.num(a.1.clone())?
+ eval_expr(b, funcs, stack)?.num(b.1.clone())?,
),
Expr::Binary(a, BinaryOp::Sub, b) => Value::Num(
eval_expr(a, funcs, stack)?.num(a.1.clone())?
- eval_expr(b, funcs, stack)?.num(b.1.clone())?,
),
Expr::Binary(a, BinaryOp::Mul, b) => Value::Num(
eval_expr(a, funcs, stack)?.num(a.1.clone())?
* eval_expr(b, funcs, stack)?.num(b.1.clone())?,
),
Expr::Binary(a, BinaryOp::Div, b) => Value::Num(
eval_expr(a, funcs, stack)?.num(a.1.clone())?
/ eval_expr(b, funcs, stack)?.num(b.1.clone())?,
),
Expr::Binary(a, BinaryOp::Eq, b) => {
Value::Bool(eval_expr(a, funcs, stack)? == eval_expr(b, funcs, stack)?)
}
Expr::Binary(a, BinaryOp::NotEq, b) => {
Value::Bool(eval_expr(a, funcs, stack)? != eval_expr(b, funcs, stack)?)
}
Expr::Call(func, args) => {
let f = eval_expr(func, funcs, stack)?;
match f {
Value::Func(name) => {
let f = &funcs[&name];
let mut stack = if f.args.len() != args.len() {
return Err(Error {
span: expr.1.clone(),
msg: format!("'{}' called with wrong number of arguments (expected {}, found {})", name, f.args.len(), args.len()),
});
} else {
f.args
.iter()
.zip(args.iter())
.map(|(name, arg)| Ok((name.clone(), eval_expr(arg, funcs, stack)?)))
.collect::<Result<_, _>>()?
};
eval_expr(&f.body, funcs, &mut stack)?
}
f => {
return Err(Error {
span: func.1.clone(),
msg: format!("'{:?}' is not callable", f),
})
}
}
}
Expr::If(cond, a, b) => {
let c = eval_expr(cond, funcs, stack)?;
match c {
Value::Bool(true) => eval_expr(a, funcs, stack)?,
Value::Bool(false) => eval_expr(b, funcs, stack)?,
c => {
return Err(Error {
span: cond.1.clone(),
msg: format!("Conditions must be booleans, found '{:?}'", c),
})
}
}
}
Expr::Print(a) => {
let val = eval_expr(a, funcs, stack)?;
println!("{}", val);
val
}
})
}
fn main() {
let src = fs::read_to_string(env::args().nth(1).expect("Expected file argument"))
.expect("Failed to read file");
let (tokens, mut errs) = lexer().parse_recovery(src.as_str());
let parse_errs = if let Some(tokens) = tokens {
//dbg!(tokens);
let len = src.chars().count();
let (ast, parse_errs) =
funcs_parser().parse_recovery(Stream::from_iter(len..len + 1, tokens.into_iter()));
//dbg!(ast);
if let Some(funcs) = ast.filter(|_| errs.len() + parse_errs.len() == 0) {
if let Some(main) = funcs.get("main") {
assert_eq!(main.args.len(), 0);
match eval_expr(&main.body, &funcs, &mut Vec::new()) {
Ok(val) => println!("Return value: {}", val),
Err(e) => errs.push(Simple::custom(e.span, e.msg)),
}
} else {
panic!("No main function!");
}
}
parse_errs
} else {
Vec::new()
};
errs.into_iter()
.map(|e| e.map(|c| c.to_string()))
.chain(parse_errs.into_iter().map(|e| e.map(|tok| tok.to_string())))
.for_each(|e| {
let report = Report::build(ReportKind::Error, (), e.span().start);
let report = match e.reason() {
chumsky::error::SimpleReason::Unclosed { span, delimiter } => report
.with_message(format!(
"Unclosed delimiter {}",
delimiter.fg(Color::Yellow)
))
.with_label(
Label::new(span.clone())
.with_message(format!(
"Unclosed delimiter {}",
delimiter.fg(Color::Yellow)
))
.with_color(Color::Yellow),
)
.with_label(
Label::new(e.span())
.with_message(format!(
"Must be closed before this {}",
e.found()
.unwrap_or(&"end of file".to_string())
.fg(Color::Red)
))
.with_color(Color::Red),
),
chumsky::error::SimpleReason::Unexpected => report
.with_message(format!(
"{}, expected {}",
if e.found().is_some() {
"Unexpected token in input"
} else {
"Unexpected end of input"
},
if e.expected().len() == 0 {
"something else".to_string()
} else {
e.expected()
.map(|expected| match expected {
Some(expected) => expected.to_string(),
None => "end of input".to_string(),
})
.collect::<Vec<_>>()
.join(", ")
}
))
.with_label(
Label::new(e.span())
.with_message(format!(
"Unexpected token {}",
e.found()
.unwrap_or(&"end of file".to_string())
.fg(Color::Red)
))
.with_color(Color::Red),
),
chumsky::error::SimpleReason::Custom(msg) => report.with_message(msg).with_label(
Label::new(e.span())
.with_message(format!("{}", msg.fg(Color::Red)))
.with_color(Color::Red),
),
};
report.finish().print(Source::from(&src)).unwrap();
});
}

vendor/chumsky/examples/pythonic.rs vendored Normal file

@@ -0,0 +1,100 @@
use chumsky::{prelude::*, BoxStream, Flat};
use std::ops::Range;
// Represents the different kinds of delimiters we care about
#[derive(Copy, Clone, Debug)]
enum Delim {
Paren,
Block,
}
// An 'atomic' token (i.e: it has no child tokens)
#[derive(Clone, Debug)]
enum Token {
Int(u64),
Ident(String),
Op(String),
Open(Delim),
Close(Delim),
}
// The output of the lexer: a recursive tree of nested tokens
#[derive(Debug)]
enum TokenTree {
Token(Token),
Tree(Delim, Vec<Spanned<TokenTree>>),
}
type Span = Range<usize>;
type Spanned<T> = (T, Span);
// A parser that turns pythonic code with semantic whitespace into a token tree
fn lexer() -> impl Parser<char, Vec<Spanned<TokenTree>>, Error = Simple<char>> {
let tt = recursive(|tt| {
// Define some atomic tokens
let int = text::int(10).from_str().unwrapped().map(Token::Int);
let ident = text::ident().map(Token::Ident);
let op = one_of("=.:%,")
.repeated()
.at_least(1)
.collect()
.map(Token::Op);
let single_token = int.or(op).or(ident).map(TokenTree::Token);
// Tokens surrounded by parentheses get turned into parenthesised token trees
let token_tree = tt
.padded()
.repeated()
.delimited_by(just('('), just(')'))
.map(|tts| TokenTree::Tree(Delim::Paren, tts));
single_token
.or(token_tree)
.map_with_span(|tt, span| (tt, span))
});
// Whitespace indentation creates code block token trees
text::semantic_indentation(tt, |tts, span| (TokenTree::Tree(Delim::Block, tts), span))
.then_ignore(end())
}
/// Flatten a series of token trees into a single token stream, ready for feeding into the main parser
fn tts_to_stream(
eoi: Span,
token_trees: Vec<Spanned<TokenTree>>,
) -> BoxStream<'static, Token, Span> {
use std::iter::once;
BoxStream::from_nested(eoi, token_trees.into_iter(), |(tt, span)| match tt {
// Single tokens remain unchanged
TokenTree::Token(token) => Flat::Single((token, span)),
// Nested token trees get flattened into their inner contents, surrounded by `Open` and `Close` tokens
TokenTree::Tree(delim, tree) => Flat::Many(
once((TokenTree::Token(Token::Open(delim)), span.clone()))
.chain(tree.into_iter())
.chain(once((TokenTree::Token(Token::Close(delim)), span))),
),
})
}
fn main() {
let code = include_str!("sample.py");
// First, lex the code into some nested token trees
let tts = lexer().parse(code).unwrap();
println!("--- Token Trees ---\n{:#?}", tts);
// Next, flatten
let eoi = 0..code.chars().count();
let mut token_stream = tts_to_stream(eoi, tts);
// At this point, we have a token stream that can be fed into the main parser! Because this is just an example,
// we're instead going to just collect the token stream into a vector and print it.
let flattened_trees = token_stream.fetch_tokens().collect::<Vec<_>>();
println!("--- Flattened Token Trees ---\n{:?}", flattened_trees);
}
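`tts_to_stream` above flattens nested token trees by wrapping each subtree's contents in `Open`/`Close` tokens. A self-contained sketch of the same flattening, without chumsky's `BoxStream`/`Flat` machinery (the `Tok` and `Tree` types here are simplified stand-ins for the example's `Token` and `TokenTree`):

```rust
// Flattening nested token trees into a flat stream, mirroring `tts_to_stream`:
// each nested tree becomes Open, its flattened children, then Close.
#[derive(Debug, PartialEq)]
enum Tok {
    Int(u64),
    Open,
    Close,
}

#[derive(Debug)]
enum Tree {
    Token(Tok),
    Nested(Vec<Tree>),
}

fn flatten(trees: Vec<Tree>, out: &mut Vec<Tok>) {
    for tree in trees {
        match tree {
            // Single tokens remain unchanged
            Tree::Token(tok) => out.push(tok),
            // Subtrees are bracketed by Open/Close and recursed into
            Tree::Nested(children) => {
                out.push(Tok::Open);
                flatten(children, out);
                out.push(Tok::Close);
            }
        }
    }
}

fn main() {
    let trees = vec![
        Tree::Token(Tok::Int(1)),
        Tree::Nested(vec![Tree::Token(Tok::Int(2))]),
    ];
    let mut out = Vec::new();
    flatten(trees, &mut out);
    println!("{:?}", out); // prints [Int(1), Open, Int(2), Close]
}
```

The real `tts_to_stream` does this lazily per tree (returning `Flat::Single` or `Flat::Many`) so the stream never materialises the whole token list up front.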

vendor/chumsky/examples/sample.bf vendored Normal file

@@ -0,0 +1 @@
--[>--->->->++>-<<<<<-------]>--.>---------.>--..+++.>----.>+++++++++.<<.+++.------.<-.>>+.

vendor/chumsky/examples/sample.foo vendored Normal file

@@ -0,0 +1,4 @@
let five = 5;
let eight = 3 + five;
fn add x y = x + y;
add(five, eight)

vendor/chumsky/examples/sample.json vendored Normal file

@@ -0,0 +1,24 @@
{
"leaving": {
"tail": [{
-2063823378.8597813,
!true,
false,
!null,!
-153646.6402,
"board"
],
"fed": -283765067.9149623,
"cowboy": --355139449,
"although": 794127593.3922591,
"front": "college",
"origin": 981339097
},
"though": ~true,
"invalid": "\uDFFF",
"activity": "value",
"office": -342325541.1937506,
"noise": fallse,
"acres": "home",
"foo": [}]
}

vendor/chumsky/examples/sample.nrs vendored Normal file

@@ -0,0 +1,37 @@
// Run this example with `cargo run --example nano_rust -- examples/sample.nrs`
// Feel free to play around with this sample to see what errors you can generate!
// Spans are propagated to the interpreted AST so you can even invoke runtime
// errors and still have an error message that points to source code emitted!
fn mul(x, y) {
x * y
}
// Calculate the factorial of a number
fn factorial(x) {
// Conditionals are supported!
if x == 0 {
1
} else {
mul(x, factorial(x - 1))
}
}
// The main function
fn main() {
let three = 3;
let meaning_of_life = three * 14 + 1;
print("Hello, world!");
print("The meaning of life is...");
if meaning_of_life == 42 {
print(meaning_of_life);
} else {
print("...something we cannot know");
print("However, I can tell you that the factorial of 10 is...");
// Function calling
print(factorial(10));
}
}

vendor/chumsky/examples/sample.py vendored Normal file

@@ -0,0 +1,16 @@
import turtle
board = turtle.Turtle(
foo,
bar,
baz,
)
for i in range(6):
board.forward(50)
if i % 2 == 0:
board.right(144)
else:
board.left(72)
turtle.done()

vendor/chumsky/src/chain.rs vendored Normal file

@@ -0,0 +1,129 @@
//! Traits that allow chaining parser outputs together.
//!
//! *“And what’s happened to the Earth?” “Ah. It’s been demolished.” “Has it,” said Arthur levelly. “Yes. It just
//! boiled away into space.” “Look,” said Arthur, “I’m a bit upset about that.”*
//!
//! You usually don't need to interact with this trait, or even import it. It's only public so that you can see which
//! types implement it. See [`Parser::chain`](super::Parser) for examples of its usage.
use alloc::{string::String, vec::Vec};
mod private {
use super::*;
pub trait Sealed<T> {}
impl<T> Sealed<T> for T {}
impl<T, A: Sealed<T>> Sealed<T> for (A, T) {}
impl<T> Sealed<T> for Option<T> {}
impl<T> Sealed<T> for Vec<T> {}
impl<T> Sealed<T> for Option<Vec<T>> {}
impl<T> Sealed<T> for Vec<Option<T>> {}
impl<T, A: Sealed<T>> Sealed<T> for Vec<(A, T)> {}
impl Sealed<char> for String {}
impl Sealed<char> for Option<String> {}
}
/// A utility trait that facilitates chaining parser outputs together into [`Vec`]s.
///
/// See [`Parser::chain`](super::Parser).
#[allow(clippy::len_without_is_empty)]
pub trait Chain<T>: private::Sealed<T> {
/// The number of items that this chain link consists of.
fn len(&self) -> usize;
/// Append the elements in this link to the chain.
fn append_to(self, v: &mut Vec<T>);
}
impl<T> Chain<T> for T {
fn len(&self) -> usize {
1
}
fn append_to(self, v: &mut Vec<T>) {
v.push(self);
}
}
impl<T, A: Chain<T>> Chain<T> for (A, T) {
fn len(&self) -> usize {
// One item for `self.1`, plus however many the head of the chain holds
self.0.len() + 1
}
fn append_to(self, v: &mut Vec<T>) {
self.0.append_to(v);
v.push(self.1);
}
}
impl<T> Chain<T> for Option<T> {
fn len(&self) -> usize {
self.is_some() as usize
}
fn append_to(self, v: &mut Vec<T>) {
if let Some(x) = self {
v.push(x);
}
}
}
impl<T> Chain<T> for Vec<T> {
fn len(&self) -> usize {
self.as_slice().len()
}
fn append_to(mut self, v: &mut Vec<T>) {
v.append(&mut self);
}
}
impl Chain<char> for String {
// TODO: Quite inefficient
fn len(&self) -> usize {
self.chars().count()
}
fn append_to(self, v: &mut Vec<char>) {
v.extend(self.chars());
}
}
impl<T> Chain<T> for Option<Vec<T>> {
fn len(&self) -> usize {
self.as_ref().map_or(0, Chain::<T>::len)
}
fn append_to(self, v: &mut Vec<T>) {
if let Some(x) = self {
x.append_to(v);
}
}
}
impl Chain<char> for Option<String> {
fn len(&self) -> usize {
self.as_ref().map_or(0, Chain::<char>::len)
}
fn append_to(self, v: &mut Vec<char>) {
if let Some(x) = self {
x.append_to(v);
}
}
}
impl<T> Chain<T> for Vec<Option<T>> {
fn len(&self) -> usize {
self.iter().map(Chain::<T>::len).sum()
}
fn append_to(self, v: &mut Vec<T>) {
self
.into_iter()
.for_each(|x| x.append_to(v));
}
}
impl<T, A: Chain<T>> Chain<T> for Vec<(A, T)> {
fn len(&self) -> usize {
self.iter().map(Chain::<T>::len).sum()
}
fn append_to(self, v: &mut Vec<T>) {
self
.into_iter()
.for_each(|x| x.append_to(v));
}
}
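The `len`/`append_to` contract above lets heterogeneous links (single items, `Option`s, `Vec`s) all feed one output `Vec`. A minimal re-implementation of the idea, without the `Sealed` machinery (the names mirror the trait above, but this is a sketch, not the vendored code):

```rust
// A stripped-down version of the Chain trait: anything implementing it can
// append its items to a growing Vec. The impls below are coherent because
// `T = Option<T>` / `T = Vec<T>` are impossible (occurs check).
trait Chain<T> {
    fn append_to(self, v: &mut Vec<T>);
}

impl<T> Chain<T> for T {
    fn append_to(self, v: &mut Vec<T>) {
        v.push(self);
    }
}

impl<T> Chain<T> for Option<T> {
    fn append_to(self, v: &mut Vec<T>) {
        if let Some(x) = self {
            v.push(x);
        }
    }
}

impl<T> Chain<T> for Vec<T> {
    fn append_to(mut self, v: &mut Vec<T>) {
        v.append(&mut self);
    }
}

fn main() {
    let mut v: Vec<i32> = Vec::new();
    // Fully qualified calls, since e.g. Option<i32> implements both
    // Chain<i32> and Chain<Option<i32>>.
    <i32 as Chain<i32>>::append_to(1, &mut v);
    <Option<i32> as Chain<i32>>::append_to(Some(2), &mut v);
    <Option<i32> as Chain<i32>>::append_to(None, &mut v);
    <Vec<i32> as Chain<i32>>::append_to(vec![3, 4], &mut v);
    println!("{:?}", v); // prints [1, 2, 3, 4]
}
```

This is why `Parser::chain` can glue `char`, `Option<char>`, `String`, and `Vec<char>` outputs together into a single `Vec<char>`.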

vendor/chumsky/src/combinator.rs vendored Normal file

File diff suppressed because it is too large

vendor/chumsky/src/debug.rs vendored Normal file

@@ -0,0 +1,154 @@
//! Utilities for debugging parsers.
//!
//! *“He was staring at the instruments with the air of one who is trying to convert Fahrenheit to centigrade in his
//! head while his house is burning down.”*
use super::*;
use alloc::borrow::Cow;
use core::panic::Location;
/// Information about a specific parser.
#[allow(dead_code)]
pub struct ParserInfo {
name: Cow<'static, str>,
display: Rc<dyn fmt::Display>,
location: Location<'static>,
}
impl ParserInfo {
pub(crate) fn new(
name: impl Into<Cow<'static, str>>,
display: Rc<dyn fmt::Display>,
location: Location<'static>,
) -> Self {
Self {
name: name.into(),
display,
location,
}
}
}
/// An event that occurred during parsing.
pub enum ParseEvent {
/// Debugging information was emitted.
Info(String),
}
/// A trait implemented by parser debuggers.
#[deprecated(
note = "This trait is excluded from the semver guarantees of chumsky. If you decide to use it, broken builds are your fault."
)]
pub trait Debugger {
/// Create a new debugging scope.
fn scope<R, Info: FnOnce() -> ParserInfo, F: FnOnce(&mut Self) -> R>(
&mut self,
info: Info,
f: F,
) -> R;
/// Emit a parse event, if the debugger supports them.
fn emit_with<F: FnOnce() -> ParseEvent>(&mut self, f: F);
/// Invoke the given parser with a mode specific to this debugger.
fn invoke<I: Clone, O, P: Parser<I, O> + ?Sized>(
&mut self,
parser: &P,
stream: &mut StreamOf<I, P::Error>,
) -> PResult<I, O, P::Error>;
}
/// A verbose debugger that emits debugging messages to the console.
pub struct Verbose {
// TODO: Don't use `Result`, that's silly
events: Vec<Result<ParseEvent, (ParserInfo, Self)>>,
}
impl Verbose {
pub(crate) fn new() -> Self {
Self { events: Vec::new() }
}
#[allow(unused_variables)]
fn print_inner(&self, depth: usize) {
// a no-op on no_std!
#[cfg(feature = "std")]
for event in &self.events {
for _ in 0..depth * 4 {
print!(" ");
}
match event {
Ok(ParseEvent::Info(s)) => println!("{}", s),
Err((info, scope)) => {
println!(
"Entered {} at line {} in {}",
info.display,
info.location.line(),
info.location.file()
);
scope.print_inner(depth + 1);
}
}
}
}
pub(crate) fn print(&self) {
self.print_inner(0)
}
}
impl Debugger for Verbose {
fn scope<R, Info: FnOnce() -> ParserInfo, F: FnOnce(&mut Self) -> R>(
&mut self,
info: Info,
f: F,
) -> R {
let mut verbose = Verbose { events: Vec::new() };
let res = f(&mut verbose);
self.events.push(Err((info(), verbose)));
res
}
fn emit_with<F: FnOnce() -> ParseEvent>(&mut self, f: F) {
self.events.push(Ok(f()));
}
fn invoke<I: Clone, O, P: Parser<I, O> + ?Sized>(
&mut self,
parser: &P,
stream: &mut StreamOf<I, P::Error>,
) -> PResult<I, O, P::Error> {
parser.parse_inner_verbose(self, stream)
}
}
/// A silent debugger that neither emits debugging messages nor collects any debugging data.
pub struct Silent {
phantom: PhantomData<()>,
}
impl Silent {
pub(crate) fn new() -> Self {
Self {
phantom: PhantomData,
}
}
}
impl Debugger for Silent {
fn scope<R, Info: FnOnce() -> ParserInfo, F: FnOnce(&mut Self) -> R>(
&mut self,
_: Info,
f: F,
) -> R {
f(self)
}
fn emit_with<F: FnOnce() -> ParseEvent>(&mut self, _: F) {}
fn invoke<I: Clone, O, P: Parser<I, O> + ?Sized>(
&mut self,
parser: &P,
stream: &mut StreamOf<I, P::Error>,
) -> PResult<I, O, P::Error> {
parser.parse_inner_silent(self, stream)
}
}
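The `Verbose` debugger above builds a tree of events (plain info messages interleaved with nested scopes) and prints it with four spaces of indentation per depth level. The following standalone sketch reproduces that structure with illustrative names (`Event`, `render`) that are not part of the chumsky API:

```rust
// Standalone sketch of the event tree a `Verbose`-style debugger builds.
// `Event` and `render` are illustrative stand-ins, not chumsky types.
enum Event {
    Info(String),
    Scope(String, Vec<Event>),
}

fn render(events: &[Event], depth: usize, out: &mut String) {
    for event in events {
        // Four spaces per nesting level, exactly like `print_inner`.
        out.push_str(&" ".repeat(depth * 4));
        match event {
            Event::Info(s) => {
                out.push_str(s);
                out.push('\n');
            }
            Event::Scope(name, children) => {
                // The scope header prints at the current depth; its
                // children recurse one level deeper.
                out.push_str(name);
                out.push('\n');
                render(children, depth + 1, out);
            }
        }
    }
}

fn main() {
    let events = vec![
        Event::Scope(
            "Entered digit parser".to_string(),
            vec![Event::Info("matched '1'".to_string())],
        ),
        Event::Info("done".to_string()),
    ];
    let mut out = String::new();
    render(&events, 0, &mut out);
    // Prints the scope header, its child indented by four spaces, then "done".
    print!("{}", out);
}
```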

vendor/chumsky/src/error.rs vendored Normal file

@@ -0,0 +1,511 @@
//! Error types, traits and utilities.
//!
//! *“I like the cover," he said. "Don't Panic. It's the first helpful or intelligible thing anybody's said to me all
//! day.”*
//!
//! You can implement the [`Error`] trait to create your own parser errors, or you can use one provided by the crate
//! like [`Simple`] or [`Cheap`].
use super::*;
use alloc::{format, string::ToString};
use core::hash::Hash;
#[cfg(not(feature = "std"))]
use hashbrown::HashSet;
#[cfg(feature = "std")]
use std::collections::HashSet;
// (ahash + std) => ahash
// (ahash) => ahash
// (std) => std
// () => ahash
#[cfg(any(feature = "ahash", not(feature = "std")))]
type RandomState = hashbrown::hash_map::DefaultHashBuilder;
#[cfg(all(not(feature = "ahash"), feature = "std"))]
type RandomState = std::collections::hash_map::RandomState;
/// A trait that describes parser error types.
///
/// If you have a custom error type in your compiler, or your needs are not sufficiently met by [`Simple`], you should
/// implement this trait. If your error type has 'extra' features that allow for more specific error messages, you can
/// use the [`Parser::map_err`] or [`Parser::try_map`] functions to take advantage of these inline within your parser.
///
/// # Examples
///
/// ```
/// # use chumsky::{prelude::*, error::Cheap};
/// type Span = std::ops::Range<usize>;
///
/// // A custom error type
/// #[derive(Debug, PartialEq)]
/// enum MyError {
/// ExpectedFound(Span, Vec<Option<char>>, Option<char>),
/// NotADigit(Span, char),
/// }
///
/// impl chumsky::Error<char> for MyError {
/// type Span = Span;
/// type Label = ();
///
/// fn expected_input_found<Iter: IntoIterator<Item = Option<char>>>(
/// span: Span,
/// expected: Iter,
/// found: Option<char>,
/// ) -> Self {
/// Self::ExpectedFound(span, expected.into_iter().collect(), found)
/// }
///
/// fn with_label(mut self, label: Self::Label) -> Self { self }
///
/// fn merge(mut self, mut other: Self) -> Self {
/// if let (Self::ExpectedFound(_, expected, _), Self::ExpectedFound(_, expected_other, _)) = (
/// &mut self,
/// &mut other,
/// ) {
/// expected.append(expected_other);
/// }
/// self
/// }
/// }
///
/// let numeral = filter_map(|span, c: char| match c.to_digit(10) {
/// Some(x) => Ok(x),
/// None => Err(MyError::NotADigit(span, c)),
/// });
///
/// assert_eq!(numeral.parse("3"), Ok(3));
/// assert_eq!(numeral.parse("7"), Ok(7));
/// assert_eq!(numeral.parse("f"), Err(vec![MyError::NotADigit(0..1, 'f')]));
/// ```
pub trait Error<I>: Sized {
/// The type of spans to be used in the error.
type Span: Span; // TODO: Default to = Range<usize>;
/// The label used to describe a syntactic structure currently being parsed.
///
/// This can be used to generate errors that tell the user what syntactic structure was currently being parsed when
/// the error occurred.
type Label; // TODO: Default to = &'static str;
/// Create a new error describing a conflict between expected inputs and that which was actually found.
///
/// `found` having the value `None` indicates that the end of input was reached, but was not expected.
///
/// An expected input having the value `None` indicates that the end of input was expected.
fn expected_input_found<Iter: IntoIterator<Item = Option<I>>>(
span: Self::Span,
expected: Iter,
found: Option<I>,
) -> Self;
/// Create a new error describing a delimiter that was not correctly closed.
///
/// Provided to this function is the span of the unclosed delimiter, the delimiter itself, the span of the input
/// that was found in its place, the closing delimiter that was expected but not found, and the input that was
/// found in its place.
///
/// The default implementation of this function uses [`Error::expected_input_found`], but you'll probably want to
/// implement it yourself to take full advantage of the extra diagnostic information.
fn unclosed_delimiter(
unclosed_span: Self::Span,
unclosed: I,
span: Self::Span,
expected: I,
found: Option<I>,
) -> Self {
#![allow(unused_variables)]
Self::expected_input_found(span, Some(Some(expected)), found)
}
/// Indicate that the error occurred while parsing a particular syntactic structure.
///
/// How the error handles this information is up to it. It can append it to a list of structures to get a sort of
/// 'parse backtrace', or it can just keep only the most recent label. If the latter, this method should have no
/// effect when the error already has a label.
fn with_label(self, label: Self::Label) -> Self;
/// Merge two errors that point to the same input together, combining their information.
fn merge(self, other: Self) -> Self;
}
// /// A simple default input pattern that allows describing inputs and input patterns in error messages.
// #[derive(Clone, Debug, PartialEq, Eq, Hash)]
// pub enum SimplePattern<I> {
// /// A pattern with the given name was expected.
// Labelled(&'static str),
// /// A specific input was expected.
// Token(I),
// }
// impl<I> From<&'static str> for SimplePattern<I> {
// fn from(s: &'static str) -> Self { Self::Labelled(s) }
// }
// impl<I: fmt::Display> fmt::Display for SimplePattern<I> {
// fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
// match self {
// Self::Labelled(s) => write!(f, "{}", s),
// Self::Token(x) => write!(f, "'{}'", x),
// }
// }
// }
/// A type representing possible reasons for an error.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum SimpleReason<I, S> {
/// An unexpected input was found.
Unexpected,
/// An unclosed delimiter was found.
Unclosed {
/// The span of the unclosed delimiter.
span: S,
/// The unclosed delimiter.
delimiter: I,
},
/// An error with a custom message occurred.
Custom(String),
}
impl<I: fmt::Display, S: fmt::Display> fmt::Display for SimpleReason<I, S> {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
const DEFAULT_DISPLAY_UNEXPECTED: &str = "unexpected input";
match self {
Self::Unexpected => write!(f, "{}", DEFAULT_DISPLAY_UNEXPECTED),
Self::Unclosed { span, delimiter } => {
write!(f, "unclosed delimiter ({}) in {}", delimiter, span)
}
Self::Custom(string) => write!(f, "error {}", string),
}
}
}
/// A type representing zero, one, or many labels applied to an error
#[derive(Clone, Copy, Debug, PartialEq)]
enum SimpleLabel {
Some(&'static str),
None,
Multi,
}
impl SimpleLabel {
fn merge(self, other: Self) -> Self {
match (self, other) {
(SimpleLabel::Some(a), SimpleLabel::Some(b)) if a == b => SimpleLabel::Some(a),
(SimpleLabel::Some(_), SimpleLabel::Some(_)) => SimpleLabel::Multi,
(SimpleLabel::Multi, _) => SimpleLabel::Multi,
(_, SimpleLabel::Multi) => SimpleLabel::Multi,
(SimpleLabel::None, x) => x,
(x, SimpleLabel::None) => x,
}
}
}
impl From<SimpleLabel> for Option<&'static str> {
fn from(label: SimpleLabel) -> Self {
match label {
SimpleLabel::Some(s) => Some(s),
_ => None,
}
}
}
/// A simple default error type that tracks error spans, expected inputs, and the actual input found at an error site.
///
/// Please note that it uses a [`HashSet`] to remember expected symbols. If you find this to be too slow, you can
/// implement [`Error`] for your own error type or use [`Cheap`] instead.
#[derive(Clone, Debug)]
pub struct Simple<I: Hash + Eq, S = Range<usize>> {
span: S,
reason: SimpleReason<I, S>,
expected: HashSet<Option<I>, RandomState>,
found: Option<I>,
label: SimpleLabel,
}
impl<I: Hash + Eq, S: Clone> Simple<I, S> {
/// Create an error with a custom error message.
pub fn custom<M: ToString>(span: S, msg: M) -> Self {
Self {
span,
reason: SimpleReason::Custom(msg.to_string()),
expected: HashSet::default(),
found: None,
label: SimpleLabel::None,
}
}
/// Returns the span that the error occurred at.
pub fn span(&self) -> S {
self.span.clone()
}
/// Returns an iterator over possible expected patterns.
pub fn expected(&self) -> impl ExactSizeIterator<Item = &Option<I>> + '_ {
self.expected.iter()
}
/// Returns the input, if any, that was found instead of an expected pattern.
pub fn found(&self) -> Option<&I> {
self.found.as_ref()
}
/// Returns the reason for the error.
pub fn reason(&self) -> &SimpleReason<I, S> {
&self.reason
}
/// Returns the error's label, if any.
pub fn label(&self) -> Option<&'static str> {
self.label.into()
}
/// Map the error's inputs using the given function.
///
/// This can be used to unify the errors between parsing stages that operate upon two forms of input (for example,
/// the initial lexing stage and the parsing stage in most compilers).
pub fn map<U: Hash + Eq, F: FnMut(I) -> U>(self, mut f: F) -> Simple<U, S> {
Simple {
span: self.span,
reason: match self.reason {
SimpleReason::Unclosed { span, delimiter } => SimpleReason::Unclosed {
span,
delimiter: f(delimiter),
},
SimpleReason::Unexpected => SimpleReason::Unexpected,
SimpleReason::Custom(msg) => SimpleReason::Custom(msg),
},
expected: self.expected.into_iter().map(|e| e.map(&mut f)).collect(),
found: self.found.map(f),
label: self.label,
}
}
}
impl<I: Hash + Eq, S: Span + Clone + fmt::Debug> Error<I> for Simple<I, S> {
type Span = S;
type Label = &'static str;
fn expected_input_found<Iter: IntoIterator<Item = Option<I>>>(
span: Self::Span,
expected: Iter,
found: Option<I>,
) -> Self {
Self {
span,
reason: SimpleReason::Unexpected,
expected: expected.into_iter().collect(),
found,
label: SimpleLabel::None,
}
}
fn unclosed_delimiter(
unclosed_span: Self::Span,
delimiter: I,
span: Self::Span,
expected: I,
found: Option<I>,
) -> Self {
Self {
span,
reason: SimpleReason::Unclosed {
span: unclosed_span,
delimiter,
},
expected: core::iter::once(Some(expected)).collect(),
found,
label: SimpleLabel::None,
}
}
fn with_label(mut self, label: Self::Label) -> Self {
match self.label {
SimpleLabel::Some(_) => {}
_ => {
self.label = SimpleLabel::Some(label);
}
}
self
}
fn merge(mut self, other: Self) -> Self {
// TODO: Assert that `self.span == other.span` here?
self.reason = match (&self.reason, &other.reason) {
(SimpleReason::Unclosed { .. }, _) => self.reason,
(_, SimpleReason::Unclosed { .. }) => other.reason,
_ => self.reason,
};
self.label = self.label.merge(other.label);
for expected in other.expected {
self.expected.insert(expected);
}
self
}
}
impl<I: Hash + Eq, S: PartialEq> PartialEq for Simple<I, S> {
fn eq(&self, other: &Self) -> bool {
self.span == other.span
&& self.found == other.found
&& self.reason == other.reason
&& self.label == other.label
}
}
impl<I: Hash + Eq, S: Eq> Eq for Simple<I, S> {}
impl<I: fmt::Display + Hash + Eq, S: Span> fmt::Display for Simple<I, S> {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
// TODO: Take `self.reason` into account
if let Some(found) = &self.found {
write!(f, "found {:?}", found.to_string())?;
} else {
write!(f, "found end of input")?;
};
match self.expected.len() {
0 => {} //write!(f, " but end of input was expected")?,
1 => write!(
f,
" but expected {}",
match self.expected.iter().next().unwrap() {
Some(x) => format!("{:?}", x.to_string()),
None => "end of input".to_string(),
},
)?,
_ => {
write!(
f,
" but expected one of {}",
self.expected
.iter()
.map(|expected| match expected {
Some(x) => format!("{:?}", x.to_string()),
None => "end of input".to_string(),
})
.collect::<Vec<_>>()
.join(", ")
)?;
}
}
Ok(())
}
}
#[cfg(feature = "std")]
impl<I: fmt::Debug + fmt::Display + Hash + Eq, S: Span + fmt::Display + fmt::Debug>
std::error::Error for Simple<I, S>
{
}
/// A minimal error type that tracks only the error span and label. This type is most useful when you want fast parsing
/// but do not particularly care about the quality of error messages.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Cheap<I, S = Range<usize>> {
span: S,
label: Option<&'static str>,
phantom: PhantomData<I>,
}
impl<I, S: Clone> Cheap<I, S> {
/// Returns the span that the error occurred at.
pub fn span(&self) -> S {
self.span.clone()
}
/// Returns the error's label, if any.
pub fn label(&self) -> Option<&'static str> {
self.label
}
}
impl<I, S: Span + Clone + fmt::Debug> Error<I> for Cheap<I, S> {
type Span = S;
type Label = &'static str;
fn expected_input_found<Iter: IntoIterator<Item = Option<I>>>(
span: Self::Span,
_: Iter,
_: Option<I>,
) -> Self {
Self {
span,
label: None,
phantom: PhantomData,
}
}
fn with_label(mut self, label: Self::Label) -> Self {
self.label.get_or_insert(label);
self
}
fn merge(self, _: Self) -> Self {
self
}
}
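`Cheap::with_label` above keeps only the first label applied, via `Option::get_or_insert`; later labels are silently ignored. A minimal sketch of that keep-first behaviour in isolation:

```rust
fn main() {
    // Mirrors the `label` field of `Cheap`: starts unset.
    let mut label: Option<&'static str> = None;

    // First `with_label` call wins...
    label.get_or_insert("expression");
    // ...subsequent calls are no-ops because a label is already present.
    label.get_or_insert("statement");

    assert_eq!(label, Some("expression"));
}
```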
/// An internal type used to facilitate error prioritisation. You shouldn't need to interact with this type during
/// normal use of the crate.
pub struct Located<I, E> {
pub(crate) at: usize,
pub(crate) error: E,
pub(crate) phantom: PhantomData<I>,
}
impl<I, E: Error<I>> Located<I, E> {
/// Create a new [`Located`] with the give input position and error.
pub fn at(at: usize, error: E) -> Self {
Self {
at,
error,
phantom: PhantomData,
}
}
/// Get the maximum of two located errors. If they hold the same position in the input, merge them.
pub fn max(self, other: impl Into<Option<Self>>) -> Self {
let other = match other.into() {
Some(other) => other,
None => return self,
};
match self.at.cmp(&other.at) {
Ordering::Greater => self,
Ordering::Less => other,
Ordering::Equal => Self {
error: self.error.merge(other.error),
..self
},
}
}
/// Map the error with the given function.
pub fn map<U, F: FnOnce(E) -> U>(self, f: F) -> Located<I, U> {
Located {
at: self.at,
error: f(self.error),
phantom: PhantomData,
}
}
}
// Merge two alternative errors
pub(crate) fn merge_alts<I, E: Error<I>, T: IntoIterator<Item = Located<I, E>>>(
mut error: Option<Located<I, E>>,
errors: T,
) -> Option<Located<I, E>> {
for other in errors {
match (error, other) {
(Some(a), b) => {
error = Some(b.max(a));
}
(None, b) => {
error = Some(b);
}
}
}
error
}
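`Located::max` and `merge_alts` together implement a standard error-prioritisation heuristic: the error that got furthest into the input wins, and two errors at the same position are merged. A self-contained sketch of the same policy, using an illustrative `Located` stand-in (a `Vec<char>` of expected tokens in place of chumsky's `Error` trait):

```rust
use std::cmp::Ordering;

// Illustrative stand-in for chumsky's `Located`: an error tagged with the
// input offset it occurred at.
#[derive(Debug, Clone, PartialEq)]
struct Located {
    at: usize,
    expected: Vec<char>,
}

impl Located {
    // Keep whichever error got further along; merge expectations on a tie.
    fn max(self, other: Located) -> Located {
        match self.at.cmp(&other.at) {
            Ordering::Greater => self,
            Ordering::Less => other,
            Ordering::Equal => {
                let mut expected = self.expected;
                expected.extend(other.expected);
                Located { at: self.at, expected }
            }
        }
    }
}

fn main() {
    let a = Located { at: 3, expected: vec!['+'] };
    let b = Located { at: 5, expected: vec![')'] };
    // The error at offset 5 consumed more input, so it is preferred.
    assert_eq!(a.max(b.clone()).at, 5);

    let c = Located { at: 5, expected: vec!['-'] };
    // Equal offsets: expectations are combined.
    assert_eq!(b.max(c).expected, vec![')', '-']);
}
```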

1495
vendor/chumsky/src/lib.rs vendored Normal file

File diff suppressed because it is too large Load Diff

1142
vendor/chumsky/src/primitive.rs vendored Normal file

File diff suppressed because it is too large Load Diff

vendor/chumsky/src/recovery.rs vendored Normal file

@@ -0,0 +1,458 @@
//! Types and traits that facilitate error recovery.
//!
//! *“Do you find coming to terms with the mindless tedium of it all presents an interesting challenge?”*
use super::*;
/// A trait implemented by error recovery strategies.
pub trait Strategy<I: Clone, O, E: Error<I>> {
/// Recover from a parsing failure.
fn recover<D: Debugger, P: Parser<I, O, Error = E>>(
&self,
recovered_errors: Vec<Located<I, P::Error>>,
fatal_error: Located<I, P::Error>,
parser: P,
debugger: &mut D,
stream: &mut StreamOf<I, P::Error>,
) -> PResult<I, O, P::Error>;
}
/// See [`skip_then_retry_until`].
#[must_use]
#[derive(Copy, Clone)]
pub struct SkipThenRetryUntil<I, const N: usize>(
pub(crate) [I; N],
pub(crate) bool,
pub(crate) bool,
);
impl<I, const N: usize> SkipThenRetryUntil<I, N> {
/// Alters this recovery strategy so that the first token will always be skipped.
///
/// This is useful when the input being searched for also appears at the beginning of the pattern that failed to
/// parse.
pub fn skip_start(self) -> Self {
Self(self.0, self.1, true)
}
/// Alters this recovery strategy so that the synchronisation token will be consumed during recovery.
///
/// This is useful when the input being searched for is a delimiter of a prior pattern rather than the start of a
/// new pattern and hence is no longer important once recovery has occurred.
pub fn consume_end(self) -> Self {
Self(self.0, true, self.2)
}
}
impl<I: Clone + PartialEq, O, E: Error<I>, const N: usize> Strategy<I, O, E>
for SkipThenRetryUntil<I, N>
{
fn recover<D: Debugger, P: Parser<I, O, Error = E>>(
&self,
a_errors: Vec<Located<I, P::Error>>,
a_err: Located<I, P::Error>,
parser: P,
debugger: &mut D,
stream: &mut StreamOf<I, P::Error>,
) -> PResult<I, O, P::Error> {
if self.2 {
let _ = stream.next();
}
loop {
#[allow(deprecated)]
let (mut errors, res) = stream.try_parse(|stream| {
#[allow(deprecated)]
debugger.invoke(&parser, stream)
});
if let Ok(out) = res {
errors.push(a_err);
break (errors, Ok(out));
}
#[allow(clippy::blocks_in_if_conditions)]
if !stream.attempt(
|stream| match stream.next().2.map(|tok| self.0.contains(&tok)) {
Some(true) => (self.1, false),
Some(false) => (true, true),
None => (false, false),
},
) {
break (a_errors, Err(a_err));
}
}
}
}
/// A recovery mode that simply skips to the next input on parser failure and tries again, until reaching one of
/// several inputs.
///
/// Also see [`SkipThenRetryUntil::consume_end`].
///
/// This strategy is very 'stupid' and can result in very poor error generation in some languages. Place this strategy
/// after others as a last resort, and be careful about over-using it.
pub fn skip_then_retry_until<I, const N: usize>(until: [I; N]) -> SkipThenRetryUntil<I, N> {
SkipThenRetryUntil(until, false, true)
}
/// See [`skip_until`].
#[must_use]
#[derive(Copy, Clone)]
pub struct SkipUntil<I, F, const N: usize>(
pub(crate) [I; N],
pub(crate) F,
pub(crate) bool,
pub(crate) bool,
);
impl<I, F, const N: usize> SkipUntil<I, F, N> {
/// Alters this recovery strategy so that the first token will always be skipped.
///
/// This is useful when the input being searched for also appears at the beginning of the pattern that failed to
/// parse.
pub fn skip_start(self) -> Self {
Self(self.0, self.1, self.2, true)
}
/// Alters this recovery strategy so that the synchronisation token will be consumed during recovery.
///
/// This is useful when the input being searched for is a delimiter of a prior pattern rather than the start of a
/// new pattern and hence is no longer important once recovery has occurred.
pub fn consume_end(self) -> Self {
Self(self.0, self.1, true, self.3)
}
}
impl<I: Clone + PartialEq, O, F: Fn(E::Span) -> O, E: Error<I>, const N: usize> Strategy<I, O, E>
for SkipUntil<I, F, N>
{
fn recover<D: Debugger, P: Parser<I, O, Error = E>>(
&self,
mut a_errors: Vec<Located<I, P::Error>>,
a_err: Located<I, P::Error>,
_parser: P,
_debugger: &mut D,
stream: &mut StreamOf<I, P::Error>,
) -> PResult<I, O, P::Error> {
let pre_state = stream.save();
if self.3 {
let _ = stream.next();
}
a_errors.push(a_err);
loop {
match stream.attempt(|stream| {
let (at, span, tok) = stream.next();
match tok.map(|tok| self.0.contains(&tok)) {
Some(true) => (self.2, Ok(true)),
Some(false) => (true, Ok(false)),
None => (true, Err((at, span))),
}
}) {
Ok(true) => break (a_errors, Ok(((self.1)(stream.span_since(pre_state)), None))),
Ok(false) => {}
Err(_) if stream.save() > pre_state => {
break (a_errors, Ok(((self.1)(stream.span_since(pre_state)), None)))
}
Err((at, span)) => {
break (
a_errors,
Err(Located::at(
at,
E::expected_input_found(span, self.0.iter().cloned().map(Some), None),
)),
)
}
}
}
}
}
/// A recovery mode that skips input until one of several inputs is found.
///
/// Also see [`SkipUntil::consume_end`].
///
/// This strategy is very 'stupid' and can result in very poor error generation in some languages. Place this strategy
/// after others as a last resort, and be careful about over-using it.
pub fn skip_until<I, F, const N: usize>(until: [I; N], fallback: F) -> SkipUntil<I, F, N> {
SkipUntil(until, fallback, false, false)
}
/// See [`nested_delimiters`].
#[must_use]
#[derive(Copy, Clone)]
pub struct NestedDelimiters<I, F, const N: usize>(
pub(crate) I,
pub(crate) I,
pub(crate) [(I, I); N],
pub(crate) F,
);
impl<I: Clone + PartialEq, O, F: Fn(E::Span) -> O, E: Error<I>, const N: usize> Strategy<I, O, E>
for NestedDelimiters<I, F, N>
{
// Clippy emits a `blocks_in_if_conditions` warning in an odd spot here, and
// annotating that spot directly doesn't silence it, so allow it on the whole function.
#[allow(clippy::blocks_in_if_conditions)]
fn recover<D: Debugger, P: Parser<I, O, Error = E>>(
&self,
mut a_errors: Vec<Located<I, P::Error>>,
a_err: Located<I, P::Error>,
_parser: P,
_debugger: &mut D,
stream: &mut StreamOf<I, P::Error>,
) -> PResult<I, O, P::Error> {
let mut balance = 0;
let mut balance_others = [0; N];
let mut starts = Vec::new();
let mut error = None;
let pre_state = stream.save();
let recovered = loop {
if match stream.next() {
(_, span, Some(t)) if t == self.0 => {
balance += 1;
starts.push(span);
true
}
(_, _, Some(t)) if t == self.1 => {
balance -= 1;
starts.pop();
true
}
(at, span, Some(t)) => {
for (balance_other, others) in balance_others.iter_mut().zip(self.2.iter()) {
if t == others.0 {
*balance_other += 1;
} else if t == others.1 {
*balance_other -= 1;
if *balance_other < 0 && balance == 1 {
// stream.revert(pre_state);
error.get_or_insert_with(|| {
Located::at(
at,
P::Error::unclosed_delimiter(
starts.pop().unwrap(),
self.0.clone(),
span.clone(),
self.1.clone(),
Some(t.clone()),
),
)
});
}
}
}
false
}
(at, span, None) => {
if balance == 1 {
error.get_or_insert_with(|| match starts.pop() {
Some(start) => Located::at(
at,
P::Error::unclosed_delimiter(
start,
self.0.clone(),
span,
self.1.clone(),
None,
),
),
None => Located::at(
at,
P::Error::expected_input_found(
span,
Some(Some(self.1.clone())),
None,
),
),
});
}
break false;
}
} {
match balance.cmp(&0) {
Ordering::Equal => break true,
// The end of a delimited section is not a valid recovery pattern
Ordering::Less => break false,
Ordering::Greater => (),
}
} else if balance == 0 {
// A non-delimiter input before anything else is not a valid recovery pattern
break false;
}
};
if let Some(e) = error {
a_errors.push(e);
}
if recovered {
if a_errors.last().map_or(true, |e| a_err.at < e.at) {
a_errors.push(a_err);
}
(a_errors, Ok(((self.3)(stream.span_since(pre_state)), None)))
} else {
(a_errors, Err(a_err))
}
}
}
/// A recovery strategy that searches for a start and end delimiter, respecting nesting.
///
/// It is possible to specify additional delimiter pairs that are valid in the pattern's context for better errors. For
/// example, you might want to also specify `[('[', ']'), ('{', '}')]` when recovering a parenthesised expression as
/// this can aid in detecting delimiter mismatches.
///
/// A function that generates a fallback output on recovery is also required.
pub fn nested_delimiters<I: PartialEq, F, const N: usize>(
start: I,
end: I,
others: [(I, I); N],
fallback: F,
) -> NestedDelimiters<I, F, N> {
assert!(
start != end,
"Start and end delimiters cannot be the same when using `NestedDelimiters`"
);
NestedDelimiters(start, end, others, fallback)
}
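At its core, `NestedDelimiters` recovery is delimiter balancing: increment on the opening token, decrement on the closing token, and stop once the balance returns to zero. A minimal standalone sketch of just that balancing loop (the function name and slice-based input are illustrative, not chumsky's stream-based implementation, which also tracks secondary delimiter pairs for better diagnostics):

```rust
// Return the number of tokens consumed to reach the matching close
// delimiter, or None if the input ends while still unbalanced.
fn match_delimiters(tokens: &[char], open: char, close: char) -> Option<usize> {
    let mut balance = 0i32;
    for (i, &t) in tokens.iter().enumerate() {
        if t == open {
            balance += 1;
        } else if t == close {
            balance -= 1;
            if balance == 0 {
                return Some(i + 1);
            }
        }
    }
    None // unclosed: the real strategy reports `unclosed_delimiter` here
}

fn main() {
    let toks: Vec<char> = "(a(b))c".chars().collect();
    // Six tokens are consumed to reach the ')' matching the first '('.
    assert_eq!(match_delimiters(&toks, '(', ')'), Some(6));

    let unclosed: Vec<char> = "((a)".chars().collect();
    // One '(' is never closed, so no recovery point is found.
    assert_eq!(match_delimiters(&unclosed, '(', ')'), None);
}
```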
/// See [`skip_parser`].
#[derive(Copy, Clone)]
pub struct SkipParser<R>(pub(crate) R);
impl<I: Clone + PartialEq, O, R: Parser<I, O, Error = E>, E: Error<I>> Strategy<I, O, E>
for SkipParser<R>
{
fn recover<D: Debugger, P: Parser<I, O, Error = E>>(
&self,
mut a_errors: Vec<Located<I, P::Error>>,
a_err: Located<I, P::Error>,
_parser: P,
debugger: &mut D,
stream: &mut StreamOf<I, P::Error>,
) -> PResult<I, O, P::Error> {
a_errors.push(a_err);
let (mut errors, res) = self.0.parse_inner(debugger, stream);
a_errors.append(&mut errors);
(a_errors, res)
}
}
/// A recovery mode that applies the provided recovery parser to determine the content to skip.
///
/// ```
/// # use chumsky::prelude::*;
/// #[derive(Clone, Debug, PartialEq, Eq, Hash)]
/// enum Token {
/// GoodKeyword,
/// BadKeyword,
/// Newline,
/// }
///
/// #[derive(Clone, Debug, PartialEq, Eq, Hash)]
/// enum AST {
/// GoodLine,
/// Error,
/// }
///
/// // The happy path...
/// let goodline = just::<Token, _, Simple<_>>(Token::GoodKeyword)
/// .ignore_then(none_of(Token::Newline).repeated().to(AST::GoodLine))
/// .then_ignore(just(Token::Newline));
///
/// // If it fails, swallow everything up to a newline, but only if the line
/// // didn't contain BadKeyword which marks an alternative parse route that
/// // we want to accept instead.
/// let goodline_with_recovery = goodline.recover_with(skip_parser(
/// none_of([Token::Newline, Token::BadKeyword])
/// .repeated()
/// .then_ignore(just(Token::Newline))
/// .to(AST::Error),
/// ));
/// ```
pub fn skip_parser<R>(recovery_parser: R) -> SkipParser<R> {
SkipParser(recovery_parser)
}
/// A parser that includes a fallback recovery strategy should parsing result in an error.
#[must_use]
#[derive(Copy, Clone)]
pub struct Recovery<A, S>(pub(crate) A, pub(crate) S);
impl<I: Clone, O, A: Parser<I, O, Error = E>, S: Strategy<I, O, E>, E: Error<I>> Parser<I, O>
for Recovery<A, S>
{
type Error = E;
fn parse_inner<D: Debugger>(
&self,
debugger: &mut D,
stream: &mut StreamOf<I, E>,
) -> PResult<I, O, E> {
match stream.try_parse(|stream| {
#[allow(deprecated)]
debugger.invoke(&self.0, stream)
}) {
(a_errors, Ok(a_out)) => (a_errors, Ok(a_out)),
(a_errors, Err(a_err)) => self.1.recover(a_errors, a_err, &self.0, debugger, stream),
}
}
fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf<I, E>) -> PResult<I, O, E> {
#[allow(deprecated)]
self.parse_inner(d, s)
}
fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf<I, E>) -> PResult<I, O, E> {
#[allow(deprecated)]
self.parse_inner(d, s)
}
}
#[cfg(test)]
mod tests {
use crate::error::Cheap;
use crate::prelude::*;
#[test]
fn recover_with_skip_then_retry_until() {
let parser = just::<_, _, Cheap<_>>('a')
.recover_with(skip_then_retry_until([',']))
.separated_by(just(','));
{
let (result, errors) = parser.parse_recovery("a,a,2a,a");
assert_eq!(result, Some(vec!['a', 'a', 'a', 'a']));
assert_eq!(errors.len(), 1)
}
{
let (result, errors) = parser.parse_recovery("a,a,2 a,a");
assert_eq!(result, Some(vec!['a', 'a', 'a', 'a']));
assert_eq!(errors.len(), 1)
}
{
let (result, errors) = parser.parse_recovery("a,a,2 a,a");
assert_eq!(result, Some(vec!['a', 'a', 'a', 'a']));
assert_eq!(errors.len(), 1)
}
}
#[test]
fn until_nothing() {
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token {
Foo,
Bar,
}
fn lexer() -> impl Parser<char, Token, Error = Simple<char>> {
let foo = just("foo").to(Token::Foo);
let bar = just("bar").to(Token::Bar);
choice((foo, bar)).recover_with(skip_then_retry_until([]))
}
let (result, errors) = lexer().parse_recovery("baz");
assert_eq!(result, None);
assert_eq!(errors.len(), 1);
}
}
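The `skip_until` strategy above boils down to a simple loop: discard tokens until one of the synchronisation tokens appears (or input ends), then synthesize a fallback value for the skipped span. A standalone sketch of that loop over a token slice, with the `consume_end` flag mirroring `SkipUntil::consume_end` (the function itself is illustrative, not the chumsky implementation):

```rust
// Skip tokens until a synchronisation token is found, returning the index
// just past the skipped region (consuming the sync token if asked to).
fn skip_until(tokens: &[char], sync: &[char], consume_end: bool) -> usize {
    let mut i = 0;
    while i < tokens.len() {
        if sync.contains(&tokens[i]) {
            return if consume_end { i + 1 } else { i };
        }
        i += 1;
    }
    i // end of input: everything was skipped
}

fn main() {
    let tokens: Vec<char> = "xyz;rest".chars().collect();
    // Recovery stops at the ';' synchronisation token...
    assert_eq!(skip_until(&tokens, &[';'], false), 3);
    // ...or just past it when the delimiter is no longer needed.
    assert_eq!(skip_until(&tokens, &[';'], true), 4);
}
```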

vendor/chumsky/src/recursive.rs vendored Normal file

@@ -0,0 +1,226 @@
//! Recursive parsers (parser that include themselves within their patterns).
//!
//! *“It's unpleasantly like being drunk."
//! "What's so unpleasant about being drunk?"
//! "You ask a glass of water.”*
//!
//! The [`recursive()`] function covers most cases, but sometimes it's necessary to manually control the declaration and
//! definition of parsers more carefully, particularly for mutually-recursive parsers. In such cases, the functions on
//! [`Recursive`] allow for this.
use super::*;
use alloc::rc::{Rc, Weak};
// TODO: Remove when `OnceCell` is stable
struct OnceCell<T>(core::cell::RefCell<Option<T>>);
impl<T> OnceCell<T> {
pub fn new() -> Self {
Self(core::cell::RefCell::new(None))
}
pub fn set(&self, x: T) -> Result<(), ()> {
let mut inner = self.0.try_borrow_mut().map_err(|_| ())?;
if inner.is_none() {
*inner = Some(x);
Ok(())
} else {
Err(())
}
}
pub fn get(&self) -> Option<core::cell::Ref<T>> {
core::cell::Ref::filter_map(self.0.borrow(), |x| x.as_ref()).ok()
}
}
enum RecursiveInner<T> {
Owned(Rc<T>),
Unowned(Weak<T>),
}
type OnceParser<'a, I, O, E> = OnceCell<Box<dyn Parser<I, O, Error = E> + 'a>>;
/// A parser that can be defined in terms of itself by separating its [declaration](Recursive::declare) from its
/// [definition](Recursive::define).
///
/// Prefer to use [`recursive()`], which exists as a convenient wrapper around both operations, if possible.
#[must_use]
pub struct Recursive<'a, I, O, E: Error<I>>(RecursiveInner<OnceParser<'a, I, O, E>>);
impl<'a, I: Clone, O, E: Error<I>> Recursive<'a, I, O, E> {
fn cell(&self) -> Rc<OnceParser<'a, I, O, E>> {
match &self.0 {
RecursiveInner::Owned(x) => x.clone(),
RecursiveInner::Unowned(x) => x
.upgrade()
.expect("Recursive parser used before being defined"),
}
}
/// Declare the existence of a recursive parser, allowing it to be used to construct parser combinators before
/// being fully defined.
///
/// Declaring a parser before defining it is required for a parser to reference itself.
///
/// This should be followed by **exactly one** call to the [`Recursive::define`] method prior to using the parser
/// for parsing (i.e: via the [`Parser::parse`] method or similar).
///
/// Prefer to use [`recursive()`], which is a convenient wrapper around this method and [`Recursive::define`], if
/// possible.
///
/// # Examples
///
/// ```
/// # use chumsky::prelude::*;
/// #[derive(Debug, PartialEq)]
/// enum Chain {
/// End,
/// Link(char, Box<Chain>),
/// }
///
/// // Declare the existence of the parser before defining it so that it can reference itself
/// let mut chain = Recursive::<_, _, Simple<char>>::declare();
///
/// // Define the parser in terms of itself.
/// // In this case, the parser parses a right-recursive list of '+' into a singly linked list
/// chain.define(just('+')
/// .then(chain.clone())
/// .map(|(c, chain)| Chain::Link(c, Box::new(chain)))
/// .or_not()
/// .map(|chain| chain.unwrap_or(Chain::End)));
///
/// assert_eq!(chain.parse(""), Ok(Chain::End));
/// assert_eq!(
/// chain.parse("++"),
/// Ok(Chain::Link('+', Box::new(Chain::Link('+', Box::new(Chain::End))))),
/// );
/// ```
pub fn declare() -> Self {
Recursive(RecursiveInner::Owned(Rc::new(OnceCell::new())))
}
/// Defines the parser after declaring it, allowing it to be used for parsing.
pub fn define<P: Parser<I, O, Error = E> + 'a>(&mut self, parser: P) {
self.cell()
.set(Box::new(parser))
.unwrap_or_else(|_| panic!("Parser defined more than once"));
}
}
impl<'a, I: Clone, O, E: Error<I>> Clone for Recursive<'a, I, O, E> {
fn clone(&self) -> Self {
Self(match &self.0 {
RecursiveInner::Owned(x) => RecursiveInner::Owned(x.clone()),
RecursiveInner::Unowned(x) => RecursiveInner::Unowned(x.clone()),
})
}
}
impl<'a, I: Clone, O, E: Error<I>> Parser<I, O> for Recursive<'a, I, O, E> {
type Error = E;
fn parse_inner<D: Debugger>(
&self,
debugger: &mut D,
stream: &mut StreamOf<I, Self::Error>,
) -> PResult<I, O, Self::Error> {
#[cfg(feature = "stacker")]
#[inline(always)]
fn recurse<R, F: FnOnce() -> R>(f: F) -> R {
stacker::maybe_grow(1024 * 1024, 1024 * 1024, f)
}
#[cfg(not(feature = "stacker"))]
#[inline(always)]
fn recurse<R, F: FnOnce() -> R>(f: F) -> R {
f()
}
recurse(|| {
#[allow(deprecated)]
debugger.invoke(
self.cell()
.get()
.expect("Recursive parser used before being defined")
.as_ref(),
stream,
)
})
}
fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf<I, E>) -> PResult<I, O, E> {
#[allow(deprecated)]
self.parse_inner(d, s)
}
fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf<I, E>) -> PResult<I, O, E> {
#[allow(deprecated)]
self.parse_inner(d, s)
}
}
/// Construct a recursive parser (i.e: a parser that may contain itself as part of its pattern).
///
/// The given function must create the parser. The parser must not be used to parse input before this function returns.
///
/// This is a wrapper around [`Recursive::declare`] and [`Recursive::define`].
///
/// The output type of this parser is `O`, the same as the inner parser.
///
/// # Examples
///
/// ```
/// # use chumsky::prelude::*;
/// #[derive(Debug, PartialEq)]
/// enum Tree {
/// Leaf(String),
/// Branch(Vec<Tree>),
/// }
///
/// // Parser that recursively parses nested lists
/// let tree = recursive::<_, _, _, _, Simple<char>>(|tree| tree
/// .separated_by(just(','))
/// .delimited_by(just('['), just(']'))
/// .map(Tree::Branch)
/// .or(text::ident().map(Tree::Leaf))
/// .padded());
///
/// assert_eq!(tree.parse("hello"), Ok(Tree::Leaf("hello".to_string())));
/// assert_eq!(tree.parse("[a, b, c]"), Ok(Tree::Branch(vec![
/// Tree::Leaf("a".to_string()),
/// Tree::Leaf("b".to_string()),
/// Tree::Leaf("c".to_string()),
/// ])));
/// // The parser can deal with arbitrarily complex nested lists
/// assert_eq!(tree.parse("[[a, b], c, [d, [e, f]]]"), Ok(Tree::Branch(vec![
/// Tree::Branch(vec![
/// Tree::Leaf("a".to_string()),
/// Tree::Leaf("b".to_string()),
/// ]),
/// Tree::Leaf("c".to_string()),
/// Tree::Branch(vec![
/// Tree::Leaf("d".to_string()),
/// Tree::Branch(vec![
/// Tree::Leaf("e".to_string()),
/// Tree::Leaf("f".to_string()),
/// ]),
/// ]),
/// ])));
/// ```
pub fn recursive<
'a,
I: Clone,
O,
P: Parser<I, O, Error = E> + 'a,
F: FnOnce(Recursive<'a, I, O, E>) -> P,
E: Error<I>,
>(
f: F,
) -> Recursive<'a, I, O, E> {
let mut parser = Recursive::declare();
parser.define(f(Recursive(match &parser.0 {
RecursiveInner::Owned(x) => RecursiveInner::Unowned(Rc::downgrade(x)),
RecursiveInner::Unowned(_) => unreachable!(),
})));
parser
}

vendor/chumsky/src/span.rs
//! Types and traits related to spans.
//!
//! *“We demand rigidly defined areas of doubt and uncertainty!”*
//!
//! You can use the [`Span`] trait to connect up chumsky to your compiler's knowledge of the input source.
use core::ops::Range;
/// A trait that describes a span over a particular range of inputs.
///
/// Spans typically consist of some context, such as the file they originated from, and a start/end offset. Spans are
/// permitted to overlap one-another. The end offset must always be greater than or equal to the start offset.
///
/// Span is automatically implemented for [`Range<T>`] and [`(C, Range<T>)`].
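///
/// # Examples
///
/// ```
/// use chumsky::span::Span;
///
/// // `Range<usize>` implements `Span` with `()` as its context (illustrative).
/// let span: std::ops::Range<usize> = Span::new((), 2..5);
/// assert_eq!(span.start(), 2);
/// assert_eq!(span.end(), 5);
/// ```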
pub trait Span: Clone {
/// Extra context used in a span.
///
/// This is usually some way to uniquely identify the source file that a span originated in, such as the file's
/// path, URL, etc.
///
/// NOTE: Span contexts have no inherent meaning to Chumsky and can be anything. For example, [`Range<usize>`]'s
/// implementation of [`Span`] simply uses [`()`] as its context.
type Context: Clone;
/// A type representing a span's start or end offset from the start of the input.
///
/// Typically, [`usize`] is used.
///
/// NOTE: Offsets have no inherent meaning to Chumsky and are not used to decide how to prioritise errors. This
/// means that it's perfectly fine for tokens to have non-contiguous spans that bear no relation to their actual
/// location in the input stream. This is useful for languages with an AST-level macro system that need to
/// correctly point to symbols in the macro input when producing errors.
type Offset: Clone;
/// Create a new span given a context and an offset range.
fn new(context: Self::Context, range: Range<Self::Offset>) -> Self;
/// Return the span's context.
fn context(&self) -> Self::Context;
/// Return the start offset of the span.
fn start(&self) -> Self::Offset;
/// Return the end offset of the span.
fn end(&self) -> Self::Offset;
}
impl<T: Clone + Ord> Span for Range<T> {
type Context = ();
type Offset = T;
fn new((): Self::Context, range: Self) -> Self {
range
}
fn context(&self) -> Self::Context {}
fn start(&self) -> Self::Offset {
self.start.clone()
}
fn end(&self) -> Self::Offset {
self.end.clone()
}
}
impl<C: Clone, T: Clone> Span for (C, Range<T>) {
type Context = C;
type Offset = T;
fn new(context: Self::Context, range: Range<T>) -> Self {
(context, range)
}
fn context(&self) -> Self::Context {
self.0.clone()
}
fn start(&self) -> Self::Offset {
self.1.start.clone()
}
fn end(&self) -> Self::Offset {
self.1.end.clone()
}
}

vendor/chumsky/src/stream.rs
//! Token streams and tools for converting to and from them.
//!
//! *“What's up?” “I don't know,” said Marvin, “I've never been there.”*
//!
//! [`Stream`] is the primary type used to feed input data into a chumsky parser. You can create them in a number of
//! ways: from strings, iterators, arrays, etc.
use super::*;
use alloc::vec;
trait StreamExtend<T>: Iterator<Item = T> {
/// Extend the vector with input. The actual amount can be more or less than `n`, but must be at least 1 (0 implies
/// that the stream has been exhausted).
fn extend(&mut self, v: &mut Vec<T>, n: usize);
}
#[allow(deprecated)]
impl<I: Iterator> StreamExtend<I::Item> for I {
fn extend(&mut self, v: &mut Vec<I::Item>, n: usize) {
v.reserve(n);
v.extend(self.take(n));
}
}
/// A utility type used to flatten input trees. See [`Stream::from_nested`].
pub enum Flat<I, Iter> {
/// The input tree flattens into a single input.
Single(I),
/// The input tree flattens into many sub-trees.
Many(Iter),
}
/// A type that represents a stream of input tokens. Unlike [`Iterator`], this type supports backtracking and a few
/// other features required by the crate.
#[allow(deprecated)]
pub struct Stream<
'a,
I,
S: Span,
Iter: Iterator<Item = (I, S)> + ?Sized = dyn Iterator<Item = (I, S)> + 'a,
> {
pub(crate) phantom: PhantomData<&'a ()>,
pub(crate) eoi: S,
pub(crate) offset: usize,
pub(crate) buffer: Vec<(I, S)>,
pub(crate) iter: Iter,
}
/// A [`Stream`] that pulls tokens from a boxed [`Iterator`].
pub type BoxStream<'a, I, S> = Stream<'a, I, S, Box<dyn Iterator<Item = (I, S)> + 'a>>;
impl<'a, I, S: Span, Iter: Iterator<Item = (I, S)>> Stream<'a, I, S, Iter> {
/// Create a new stream from an iterator of `(Token, Span)` pairs. A span representing the end of input must also
/// be provided.
///
/// There is no requirement that spans must map exactly to the position of inputs in the stream, but they should
/// be non-overlapping and should appear in a monotonically-increasing order.
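///
/// # Examples
///
/// ```
/// # use chumsky::prelude::*;
/// # use chumsky::Stream;
/// // An illustrative stream of `(token, span)` pairs; `3..3` is the end-of-input span.
/// let stream = Stream::from_iter(3..3, vec![('a', 0..1), ('b', 1..2), ('c', 2..3)].into_iter());
///
/// assert_eq!(filter::<_, _, Simple<char>>(|_| true).repeated().parse(stream), Ok(vec!['a', 'b', 'c']));
/// ```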
pub fn from_iter(eoi: S, iter: Iter) -> Self {
Self {
phantom: PhantomData,
eoi,
offset: 0,
buffer: Vec::new(),
iter,
}
}
/// Eagerly evaluate the token stream, returning an iterator over the tokens in it (but without modifying the
/// stream's state so that it can still be used for parsing).
///
/// This is most useful when you wish to check the input of a parser during debugging.
pub fn fetch_tokens(&mut self) -> impl Iterator<Item = (I, S)> + '_
where
(I, S): Clone,
{
self.buffer.extend(&mut self.iter);
self.buffer.iter().cloned()
}
}
impl<'a, I: Clone, S: Span + 'a> BoxStream<'a, I, S> {
/// Create a new `Stream` from an iterator of nested tokens and a function that flattens them.
///
/// It's not uncommon for compilers to perform delimiter parsing during the lexing stage (Rust does this!). When
/// this is done, the output of the lexing stage is usually a series of nested token trees. This function allows
/// you to easily flatten such token trees into a linear token stream so that they can be parsed (Chumsky currently
/// only supports parsing linear streams of inputs).
///
/// For reference, [here](https://docs.rs/syn/0.11.1/syn/enum.TokenTree.html) is `syn`'s `TokenTree` type that it
/// uses when parsing Rust syntax.
///
/// # Examples
///
/// ```
/// # use chumsky::{Stream, BoxStream, Flat};
/// type Span = std::ops::Range<usize>;
///
/// fn span_at(at: usize) -> Span { at..at + 1 }
///
/// #[derive(Clone)]
/// enum Token {
/// Local(String),
/// Int(i64),
/// Bool(bool),
/// Add,
/// Sub,
/// OpenParen,
/// CloseParen,
/// OpenBrace,
/// CloseBrace,
/// // etc.
/// }
///
/// enum Delimiter {
/// Paren, // ( ... )
/// Brace, // { ... }
/// }
///
/// // The structure of this token tree is very similar to that which Rust uses.
/// // See: https://docs.rs/syn/0.11.1/syn/enum.TokenTree.html
/// enum TokenTree {
/// Token(Token),
/// Tree(Delimiter, Vec<(TokenTree, Span)>),
/// }
///
/// // A function that turns a series of nested token trees into a linear stream that can be used for parsing.
/// fn flatten_tts(eoi: Span, token_trees: Vec<(TokenTree, Span)>) -> BoxStream<'static, Token, Span> {
/// use std::iter::once;
/// // Currently, this is quite an explicit process: it will likely become easier in future versions of Chumsky.
/// Stream::from_nested(
/// eoi,
/// token_trees.into_iter(),
/// |(tt, span)| match tt {
/// // For token trees that contain just a single token, no flattening needs to occur!
/// TokenTree::Token(token) => Flat::Single((token, span)),
/// // Flatten a parenthesised token tree into an iterator of the inner token trees, surrounded by parenthesis tokens
/// TokenTree::Tree(Delimiter::Paren, tree) => Flat::Many(once((TokenTree::Token(Token::OpenParen), span_at(span.start)))
/// .chain(tree.into_iter())
/// .chain(once((TokenTree::Token(Token::CloseParen), span_at(span.end - 1))))),
/// // Flatten a braced token tree into an iterator of the inner token trees, surrounded by brace tokens
/// TokenTree::Tree(Delimiter::Brace, tree) => Flat::Many(once((TokenTree::Token(Token::OpenBrace), span_at(span.start)))
/// .chain(tree.into_iter())
/// .chain(once((TokenTree::Token(Token::CloseBrace), span_at(span.end - 1))))),
/// }
/// )
/// }
/// ```
pub fn from_nested<
P: 'a,
Iter: Iterator<Item = (P, S)>,
Many: Iterator<Item = (P, S)>,
F: FnMut((P, S)) -> Flat<(I, S), Many> + 'a,
>(
eoi: S,
iter: Iter,
mut flatten: F,
) -> Self {
let mut v: Vec<alloc::collections::VecDeque<(P, S)>> = vec![iter.collect()];
Self::from_iter(
eoi,
Box::new(core::iter::from_fn(move || loop {
if let Some(many) = v.last_mut() {
match many.pop_front().map(&mut flatten) {
Some(Flat::Single(input)) => break Some(input),
Some(Flat::Many(many)) => v.push(many.collect()),
None => {
v.pop();
}
}
} else {
break None;
}
})),
)
}
}
impl<'a, I: Clone, S: Span> Stream<'a, I, S> {
pub(crate) fn offset(&self) -> usize {
self.offset
}
pub(crate) fn save(&self) -> usize {
self.offset
}
pub(crate) fn revert(&mut self, offset: usize) {
self.offset = offset;
}
fn pull_until(&mut self, offset: usize) -> Option<&(I, S)> {
let additional = offset.saturating_sub(self.buffer.len()) + 1024;
#[allow(deprecated)]
(&mut &mut self.iter as &mut dyn StreamExtend<_>).extend(&mut self.buffer, additional);
self.buffer.get(offset)
}
pub(crate) fn skip_if(&mut self, f: impl FnOnce(&I) -> bool) -> bool {
match self.pull_until(self.offset).cloned() {
Some((out, _)) if f(&out) => {
self.offset += 1;
true
}
Some(_) => false,
None => false,
}
}
pub(crate) fn next(&mut self) -> (usize, S, Option<I>) {
match self.pull_until(self.offset).cloned() {
Some((out, span)) => {
self.offset += 1;
(self.offset - 1, span, Some(out))
}
None => (self.offset, self.eoi.clone(), None),
}
}
pub(crate) fn span_since(&mut self, start_offset: usize) -> S {
debug_assert!(
start_offset <= self.offset,
"{} > {}",
start_offset,
self.offset
);
let start = self
.pull_until(start_offset)
.as_ref()
.map(|(_, s)| s.start())
.unwrap_or_else(|| self.eoi.start());
let end = self
.pull_until(self.offset.saturating_sub(1).max(start_offset))
.as_ref()
.map(|(_, s)| s.end())
.unwrap_or_else(|| self.eoi.end());
S::new(self.eoi.context(), start..end)
}
pub(crate) fn attempt<R, F: FnOnce(&mut Self) -> (bool, R)>(&mut self, f: F) -> R {
let old_offset = self.offset;
let (commit, out) = f(self);
if !commit {
self.offset = old_offset;
}
out
}
pub(crate) fn try_parse<O, E, F: FnOnce(&mut Self) -> PResult<I, O, E>>(
&mut self,
f: F,
) -> PResult<I, O, E> {
self.attempt(move |stream| {
let out = f(stream);
(out.1.is_ok(), out)
})
}
}
impl<'a> From<&'a str>
for Stream<'a, char, Range<usize>, Box<dyn Iterator<Item = (char, Range<usize>)> + 'a>>
{
/// Please note that Chumsky currently uses character indices and not byte offsets in this impl. This is likely to
/// change in the future. If you wish to use byte offsets, you can do so with [`Stream::from_iter`].
fn from(s: &'a str) -> Self {
let len = s.chars().count();
Self::from_iter(
len..len,
Box::new(s.chars().enumerate().map(|(i, c)| (c, i..i + 1))),
)
}
}
impl<'a> From<String>
for Stream<'a, char, Range<usize>, Box<dyn Iterator<Item = (char, Range<usize>)>>>
{
/// Please note that Chumsky currently uses character indices and not byte offsets in this impl. This is likely to
/// change in the future. If you wish to use byte offsets, you can do so with [`Stream::from_iter`].
fn from(s: String) -> Self {
let chars = s.chars().collect::<Vec<_>>();
Self::from_iter(
chars.len()..chars.len(),
Box::new(chars.into_iter().enumerate().map(|(i, c)| (c, i..i + 1))),
)
}
}
impl<'a, T: Clone> From<&'a [T]>
for Stream<'a, T, Range<usize>, Box<dyn Iterator<Item = (T, Range<usize>)> + 'a>>
{
fn from(s: &'a [T]) -> Self {
let len = s.len();
Self::from_iter(
len..len,
Box::new(s.iter().cloned().enumerate().map(|(i, x)| (x, i..i + 1))),
)
}
}
impl<'a, T: Clone + 'a> From<Vec<T>>
for Stream<'a, T, Range<usize>, Box<dyn Iterator<Item = (T, Range<usize>)> + 'a>>
{
fn from(s: Vec<T>) -> Self {
let len = s.len();
Self::from_iter(
len..len,
Box::new(s.into_iter().enumerate().map(|(i, x)| (x, i..i + 1))),
)
}
}
impl<'a, T: Clone + 'a, const N: usize> From<[T; N]>
for Stream<'a, T, Range<usize>, Box<dyn Iterator<Item = (T, Range<usize>)> + 'a>>
{
fn from(s: [T; N]) -> Self {
Self::from_iter(
N..N,
Box::new(
core::array::IntoIter::new(s)
.enumerate()
.map(|(i, x)| (x, i..i + 1)),
),
)
}
}
impl<'a, T: Clone, const N: usize> From<&'a [T; N]>
for Stream<'a, T, Range<usize>, Box<dyn Iterator<Item = (T, Range<usize>)> + 'a>>
{
fn from(s: &'a [T; N]) -> Self {
Self::from_iter(
N..N,
Box::new(s.iter().cloned().enumerate().map(|(i, x)| (x, i..i + 1))),
)
}
}
// impl<'a, T: Clone, S: Clone + Span<Context = ()>> From<&'a [(T, S)]> for Stream<'a, T, S, Box<dyn Iterator<Item = (T, S)> + 'a>>
// where S::Offset: Default
// {
// fn from(s: &'a [(T, S)]) -> Self {
// Self::from_iter(Default::default(), Box::new(s.iter().cloned()))
// }
// }

vendor/chumsky/src/text.rs
//! Text-specific parsers and utilities.
//!
//! *“Ford!” he said, “there's an infinite number of monkeys outside who want to talk to us about this script for
//! Hamlet they've worked out.”*
//!
//! The parsers in this module are generic over both Unicode ([`char`]) and ASCII ([`u8`]) characters. Most parsers take
//! a type parameter, `C`, that can be either [`u8`] or [`char`] in order to handle either case.
//!
//! The [`TextParser`] trait is an extension on top of the main [`Parser`] trait that adds combinators unique to the
//! parsing of text.
use super::*;
use core::iter::FromIterator;
/// The type of a parser that accepts (and ignores) any number of whitespace characters.
pub type Padding<I, E> = Custom<fn(&mut StreamOf<I, E>) -> PResult<I, (), E>, E>;
/// The type of a parser that accepts (and ignores) any number of whitespace characters before or after another
/// pattern.
// pub type Padded<P, I, O> = ThenIgnore<
// IgnoreThen<Padding<I, <P as Parser<I, O>>::Error>, P, (), O>,
// Padding<I, <P as Parser<I, O>>::Error>,
// O,
// (),
// >;
/// A parser that accepts (and ignores) any number of whitespace characters before or after another pattern.
#[must_use]
#[derive(Copy, Clone)]
pub struct Padded<A>(A);
impl<C: Character, O, A: Parser<C, O, Error = E>, E: Error<C>> Parser<C, O> for Padded<A> {
type Error = E;
#[inline]
fn parse_inner<D: Debugger>(
&self,
debugger: &mut D,
stream: &mut StreamOf<C, E>,
) -> PResult<C, O, E> {
while stream.skip_if(|c| c.is_whitespace()) {}
match self.0.parse_inner(debugger, stream) {
(a_errors, Ok((a_out, a_alt))) => {
while stream.skip_if(|c| c.is_whitespace()) {}
(a_errors, Ok((a_out, a_alt)))
}
(a_errors, Err(err)) => (a_errors, Err(err)),
}
}
#[inline]
fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf<C, E>) -> PResult<C, O, E> {
#[allow(deprecated)]
self.parse_inner(d, s)
}
#[inline]
fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf<C, E>) -> PResult<C, O, E> {
#[allow(deprecated)]
self.parse_inner(d, s)
}
}
mod private {
pub trait Sealed {}
impl Sealed for u8 {}
impl Sealed for char {}
}
/// A trait implemented by textual character types (currently, [`u8`] and [`char`]).
///
/// Avoid implementing this trait yourself if you can: it's *very* likely to be expanded in future versions!
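///
/// # Examples
///
/// ```
/// use chumsky::text::Character;
///
/// // Both implementors behave consistently (illustrative):
/// assert_eq!(<char as Character>::from_ascii(b'x'), 'x');
/// assert_eq!(b'x'.to_char(), 'x');
/// assert!('7'.is_digit(10));
/// ```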
pub trait Character: private::Sealed + Copy + PartialEq {
/// The default unsized [`str`]-like type of a linear sequence of this character.
///
/// For [`char`], this is [`str`]. For [`u8`], this is [`[u8]`].
type Str: ?Sized + PartialEq;
/// The default type that this character collects into.
///
/// For [`char`], this is [`String`]. For [`u8`], this is [`Vec<u8>`].
type Collection: Chain<Self> + FromIterator<Self> + AsRef<Self::Str> + 'static;
/// Convert the given ASCII character to this character type.
fn from_ascii(c: u8) -> Self;
/// Returns true if the character is canonically considered to be inline whitespace (i.e: not part of a newline).
fn is_inline_whitespace(&self) -> bool;
/// Returns true if the character is canonically considered to be whitespace.
fn is_whitespace(&self) -> bool;
/// Return the '0' digit of the character.
fn digit_zero() -> Self;
/// Returns true if the character is canonically considered to be a numeric digit.
fn is_digit(&self, radix: u32) -> bool;
/// Returns this character as a [`char`].
fn to_char(&self) -> char;
}
impl Character for u8 {
type Str = [u8];
type Collection = Vec<u8>;
fn from_ascii(c: u8) -> Self {
c
}
fn is_inline_whitespace(&self) -> bool {
*self == b' ' || *self == b'\t'
}
fn is_whitespace(&self) -> bool {
self.is_ascii_whitespace()
}
fn digit_zero() -> Self {
b'0'
}
fn is_digit(&self, radix: u32) -> bool {
(*self as char).is_digit(radix)
}
fn to_char(&self) -> char {
*self as char
}
}
impl Character for char {
type Str = str;
type Collection = String;
fn from_ascii(c: u8) -> Self {
c as char
}
fn is_inline_whitespace(&self) -> bool {
*self == ' ' || *self == '\t'
}
fn is_whitespace(&self) -> bool {
char::is_whitespace(*self)
}
fn digit_zero() -> Self {
'0'
}
fn is_digit(&self, radix: u32) -> bool {
char::is_digit(*self, radix)
}
fn to_char(&self) -> char {
*self
}
}
/// A trait containing text-specific functionality that extends the [`Parser`] trait.
pub trait TextParser<I: Character, O>: Parser<I, O> {
/// Parse a pattern, ignoring any amount of whitespace both before and after the pattern.
///
/// The output type of this parser is `O`, the same as the original parser.
///
/// # Examples
///
/// ```
/// # use chumsky::prelude::*;
/// let ident = text::ident::<_, Simple<char>>().padded();
///
/// // A pattern with no whitespace surrounding it is accepted
/// assert_eq!(ident.parse("hello"), Ok("hello".to_string()));
/// // A pattern with arbitrary whitespace surrounding it is also accepted
/// assert_eq!(ident.parse(" \t \n \t world \t "), Ok("world".to_string()));
/// ```
fn padded(self) -> Padded<Self>
where
Self: Sized,
{
Padded(self)
// whitespace().ignore_then(self).then_ignore(whitespace())
}
}
impl<I: Character, O, P: Parser<I, O>> TextParser<I, O> for P {}
/// A parser that accepts (and ignores) any number of whitespace characters.
///
/// This parser returns a [`Repeated`](crate::combinator::Repeated), so methods such as `at_least()` can be called on it.
///
/// The output type of this parser is `Vec<()>`.
///
/// # Examples
///
/// ```
/// # use chumsky::prelude::*;
/// let whitespace = text::whitespace::<_, Simple<char>>();
///
/// // Any amount of whitespace is parsed...
/// assert_eq!(whitespace.parse("\t \n \r "), Ok(vec![(), (), (), (), (), (), ()]));
/// // ...including none at all!
/// assert_eq!(whitespace.parse(""), Ok(vec![]));
/// ```
pub fn whitespace<'a, C: Character + 'a, E: Error<C> + 'a>(
) -> Repeated<impl Parser<C, (), Error = E> + Copy + Clone + 'a> {
filter(|c: &C| c.is_whitespace()).ignored().repeated()
}
/// A parser that accepts (and ignores) any newline characters or character sequences.
///
/// The output type of this parser is `()`.
///
/// This parser is quite extensive, recognising:
///
/// - Line feed (`\n`)
/// - Carriage return (`\r`)
/// - Carriage return + line feed (`\r\n`)
/// - Vertical tab (`\x0B`)
/// - Form feed (`\x0C`)
/// - Next line (`\u{0085}`)
/// - Line separator (`\u{2028}`)
/// - Paragraph separator (`\u{2029}`)
///
/// # Examples
///
/// ```
/// # use chumsky::prelude::*;
/// let newline = text::newline::<char, Simple<char>>()
/// .then_ignore(end());
///
/// assert_eq!(newline.parse("\n"), Ok(()));
/// assert_eq!(newline.parse("\r"), Ok(()));
/// assert_eq!(newline.parse("\r\n"), Ok(()));
/// assert_eq!(newline.parse("\x0B"), Ok(()));
/// assert_eq!(newline.parse("\x0C"), Ok(()));
/// assert_eq!(newline.parse("\u{0085}"), Ok(()));
/// assert_eq!(newline.parse("\u{2028}"), Ok(()));
/// assert_eq!(newline.parse("\u{2029}"), Ok(()));
/// ```
#[must_use]
pub fn newline<'a, C: Character + 'a, E: Error<C> + 'a>(
) -> impl Parser<C, (), Error = E> + Copy + Clone + 'a {
just(C::from_ascii(b'\r'))
.or_not()
.ignore_then(just(C::from_ascii(b'\n')))
.or(filter(|c: &C| {
[
'\r', // Carriage return
'\x0B', // Vertical tab
'\x0C', // Form feed
'\u{0085}', // Next line
'\u{2028}', // Line separator
'\u{2029}', // Paragraph separator
]
.contains(&c.to_char())
}))
.ignored()
}
/// A parser that accepts one or more ASCII digits.
///
/// The output type of this parser is [`Character::Collection`] (i.e: [`String`] when `C` is [`char`], and [`Vec<u8>`]
/// when `C` is [`u8`]).
///
/// The `radix` parameter functions identically to [`char::is_digit`]. If in doubt, choose `10`.
///
/// # Examples
///
/// ```
/// # use chumsky::prelude::*;
/// let digits = text::digits::<_, Simple<char>>(10);
///
/// assert_eq!(digits.parse("0"), Ok("0".to_string()));
/// assert_eq!(digits.parse("1"), Ok("1".to_string()));
/// assert_eq!(digits.parse("01234"), Ok("01234".to_string()));
/// assert_eq!(digits.parse("98345"), Ok("98345".to_string()));
/// // A string of zeroes is still valid. Use `int` if this is not desirable.
/// assert_eq!(digits.parse("0000"), Ok("0000".to_string()));
/// assert!(digits.parse("").is_err());
/// ```
#[must_use]
pub fn digits<C: Character, E: Error<C>>(
radix: u32,
) -> impl Parser<C, C::Collection, Error = E> + Copy + Clone {
filter(move |c: &C| c.is_digit(radix))
.repeated()
.at_least(1)
.collect()
}
/// A parser that accepts a non-negative integer.
///
/// An integer is defined as a non-empty sequence of ASCII digits, where the first digit is non-zero or the sequence
/// has length one.
///
/// The output type of this parser is [`Character::Collection`] (i.e: [`String`] when `C` is [`char`], and [`Vec<u8>`]
/// when `C` is [`u8`]).
///
/// The `radix` parameter functions identically to [`char::is_digit`]. If in doubt, choose `10`.
///
/// # Examples
///
/// ```
/// # use chumsky::prelude::*;
/// let dec = text::int::<_, Simple<char>>(10)
/// .then_ignore(end());
///
/// assert_eq!(dec.parse("0"), Ok("0".to_string()));
/// assert_eq!(dec.parse("1"), Ok("1".to_string()));
/// assert_eq!(dec.parse("1452"), Ok("1452".to_string()));
/// // No leading zeroes are permitted!
/// assert!(dec.parse("04").is_err());
///
/// let hex = text::int::<_, Simple<char>>(16)
/// .then_ignore(end());
///
/// assert_eq!(hex.parse("2A"), Ok("2A".to_string()));
/// assert_eq!(hex.parse("d"), Ok("d".to_string()));
/// assert_eq!(hex.parse("b4"), Ok("b4".to_string()));
/// assert!(hex.parse("0B").is_err());
/// ```
#[must_use]
pub fn int<C: Character, E: Error<C>>(
radix: u32,
) -> impl Parser<C, C::Collection, Error = E> + Copy + Clone {
filter(move |c: &C| c.is_digit(radix) && c != &C::digit_zero())
.map(Some)
.chain::<C, Vec<_>, _>(filter(move |c: &C| c.is_digit(radix)).repeated())
.collect()
.or(just(C::digit_zero()).map(|c| core::iter::once(c).collect()))
}
/// A parser that accepts a C-style identifier.
///
/// The output type of this parser is [`Character::Collection`] (i.e: [`String`] when `C` is [`char`], and [`Vec<u8>`]
/// when `C` is [`u8`]).
///
/// An identifier is defined as an ASCII alphabetic character or an underscore followed by any number of alphanumeric
/// characters or underscores. The regex pattern for it is `[a-zA-Z_][a-zA-Z0-9_]*`.
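///
/// # Examples
///
/// ```
/// # use chumsky::prelude::*;
/// let ident = text::ident::<_, Simple<char>>();
///
/// assert_eq!(ident.parse("foo_bar"), Ok("foo_bar".to_string()));
/// assert_eq!(ident.parse("_abc123"), Ok("_abc123".to_string()));
/// // Identifiers may not start with a digit
/// assert!(ident.parse("1foo").is_err());
/// ```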
#[must_use]
pub fn ident<C: Character, E: Error<C>>() -> impl Parser<C, C::Collection, Error = E> + Copy + Clone
{
filter(|c: &C| c.to_char().is_ascii_alphabetic() || c.to_char() == '_')
.map(Some)
.chain::<C, Vec<_>, _>(
filter(|c: &C| c.to_char().is_ascii_alphanumeric() || c.to_char() == '_').repeated(),
)
.collect()
}
/// Like [`ident`], but only accepts an exact identifier while ignoring trailing identifier characters.
///
/// The output type of this parser is `()`.
///
/// # Examples
///
/// ```
/// # use chumsky::prelude::*;
/// let def = text::keyword::<_, _, Simple<char>>("def");
///
/// // Exactly 'def' was found
/// assert_eq!(def.parse("def"), Ok(()));
/// // Exactly 'def' was found, with non-identifier trailing characters
/// assert_eq!(def.parse("def(foo, bar)"), Ok(()));
/// // 'def' was found, but only as part of a larger identifier, so this fails to parse
/// assert!(def.parse("define").is_err());
/// ```
#[must_use]
pub fn keyword<'a, C: Character + 'a, S: AsRef<C::Str> + 'a + Clone, E: Error<C> + 'a>(
keyword: S,
) -> impl Parser<C, (), Error = E> + Clone + 'a {
// TODO: use .filter(...), improve error messages
ident().try_map(move |s: C::Collection, span| {
if s.as_ref() == keyword.as_ref() {
Ok(())
} else {
Err(E::expected_input_found(span, None, None))
}
})
}
/// A parser that consumes text and generates tokens using semantic whitespace rules and the given token parser.
///
/// Also required is a function that collects a [`Vec`] of tokens into a whitespace-indicated token tree.
#[must_use]
pub fn semantic_indentation<'a, C, Tok, T, F, E: Error<C> + 'a>(
token: T,
make_group: F,
) -> impl Parser<C, Vec<Tok>, Error = E> + Clone + 'a
where
C: Character + 'a,
Tok: 'a,
T: Parser<C, Tok, Error = E> + Clone + 'a,
F: Fn(Vec<Tok>, E::Span) -> Tok + Clone + 'a,
{
let line_ws = filter(|c: &C| c.is_inline_whitespace());
let line = token.padded_by(line_ws.ignored().repeated()).repeated();
let lines = line_ws
.repeated()
.then(line.map_with_span(|line, span| (line, span)))
.separated_by(newline())
.padded();
lines.map(move |lines| {
fn collapse<C, Tok, F, S>(
mut tree: Vec<(Vec<C>, Vec<Tok>, Option<S>)>,
make_group: &F,
) -> Option<Tok>
where
F: Fn(Vec<Tok>, S) -> Tok,
{
while let Some((_, tts, line_span)) = tree.pop() {
let tt = make_group(tts, line_span?);
if let Some(last) = tree.last_mut() {
last.1.push(tt);
} else {
return Some(tt);
}
}
None
}
let mut nesting = vec![(Vec::new(), Vec::new(), None)];
for (indent, (mut line, line_span)) in lines {
let mut indent = indent.as_slice();
let mut i = 0;
while let Some(tail) = nesting
.get(i)
.and_then(|(n, _, _)| indent.strip_prefix(n.as_slice()))
{
indent = tail;
i += 1;
}
if let Some(tail) = collapse(nesting.split_off(i), &make_group) {
nesting.last_mut().unwrap().1.push(tail);
}
if !indent.is_empty() {
nesting.push((indent.to_vec(), line, Some(line_span)));
} else {
nesting.last_mut().unwrap().1.append(&mut line);
}
}
nesting.remove(0).1
})
}

vendor/chumsky/tutorial.md
# Chumsky: A Tutorial
*Please note that this tutorial is kept up to date with the `master` branch and not the most stable release: small
details may differ!*
In this tutorial, we'll develop a parser (and interpreter!) for a programming language called 'Foo'.
Foo is a small language, but it's enough for us to have some fun. It isn't
[Turing-complete](https://en.wikipedia.org/wiki/Turing_completeness), but it is complex enough to
allow us to get to grips with parsing using Chumsky, containing many of the elements you'd find in a 'real' programming
language. Here's some sample code written in Foo:
```
let seven = 7;
fn add x y = x + y;
add(2, 3) * -seven
```
By the end of this tutorial, you'll have an interpreter that will let you run this code, and more.
This tutorial should take somewhere between 30 and 100 minutes to complete, depending on factors such as knowledge of Rust and compiler theory.
*You can find the source code for the full interpreter in [`examples/foo.rs`](https://github.com/zesterer/chumsky/blob/master/examples/foo.rs) in the main repository.*
## Assumptions
This tutorial is here to show you how to use Chumsky: it's not a general-purpose introduction to language development as a whole. For that reason, we make a few assumptions about things you should know before jumping in:
- You should be happy reading and writing Rust. Particularly obscure syntax will be explained, but you should already be reasonably confident with concepts like functions, types, pattern matching, and error handling (`Result`, `?`, etc.).
- You should be familiar with data structures like trees and vectors.
- You should have some awareness of basic compiler theory concepts like [Abstract Syntax Trees (ASTs)](https://en.wikipedia.org/wiki/Abstract_syntax_tree), the difference between parsing and evaluation, [Backus Naur Form (BNF)](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form), etc.
## Documentation
As we go, we'll be encountering many functions and concepts from Chumsky. I strongly recommend you keep [Chumsky's documentation](https://docs.rs/chumsky/) open in another browser tab and use it to cross-reference your understanding or gain more insight into specific things that you'd like more clarification on. In particular, most of the functions we'll be using come from the [`Parser`](https://docs.rs/chumsky/latest/chumsky/trait.Parser.html) trait. Chumsky's docs include extensive doc examples for almost every function, so be sure to make use of them!
Chumsky also has [several longer examples](https://github.com/zesterer/chumsky/tree/master/examples) in the main repository: looking at these may help improve your understanding if you get stuck.
## A note on imperative vs declarative parsers
If you've tried hand-writing a parser before, you're probably expecting lots of flow control: splitting text by whitespace, matching/switching/branching on things, making a decision about whether to recurse into a function or expect another token, etc. This is an [*imperative*](https://en.wikipedia.org/wiki/Imperative_programming) approach to parser development and can be very time-consuming to write, maintain, and test.
In contrast, Chumsky parsers are [*declarative*](https://en.wikipedia.org/wiki/Declarative_programming): they still perform intricate flow control internally, but it's all hidden away so you don't need to think about it. Instead of describing *how* to parse a particular grammar, Chumsky parsers simply *describe* the grammar: it is then Chumsky's job to figure out how to parse it efficiently.
If you've ever seen [Backus Naur Form (BNF)](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) used to describe a language's syntax, you'll have a good sense of what this means: if you squint, you'll find that a lot of parsers written in Chumsky look pretty close to the BNF definition.
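For instance, an informal BNF-style sketch of the kind of expression grammar we'll build in this tutorial might read (illustrative only; the actual grammar is developed step by step below):

```
expr    ::= "let" ident "=" expr ";" expr | sum
sum     ::= product (("+" | "-") product)*
product ::= unary (("*" | "/") unary)*
unary   ::= "-" unary | atom
atom    ::= number | ident | "(" expr ")"
```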
Another consequence of creating parsers in a declarative style is that *defining* a parser and *using* a parser are two different things: once created, parsers won't do anything on their own unless you give them an input to parse.
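To make that define/use distinction concrete, here's a toy, hand-rolled "parser as a value". Note that this is *not* chumsky's API, just a plain closure used for illustration: defining it performs no parsing at all; work only happens once input is supplied.

```rust
// A toy "parser as a value" (NOT chumsky's API): a closure that, given an
// input, either succeeds with the number of bytes it consumed or fails.
fn just(expected: char) -> impl Fn(&str) -> Result<usize, String> {
    move |input: &str| match input.chars().next() {
        Some(c) if c == expected => Ok(c.len_utf8()),
        found => Err(format!("expected {:?}, found {:?}", expected, found)),
    }
}

fn main() {
    // This line only *describes* what to parse; no parsing happens yet...
    let parser = just('a');
    // ...until we actually hand it some input.
    assert_eq!(parser("abc"), Ok(1));
    assert!(parser("xyz").is_err());
    println!("ok");
}
```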
## Similarities between `Parser` and `Iterator`
The most important API in Chumsky is the [`Parser`](https://docs.rs/chumsky/latest/chumsky/trait.Parser.html) trait, implemented by all parsers. Because parsers don't do anything by themselves, writing Chumsky parsers often feels very similar to writing iterators in Rust using the [`Iterator`](https://doc.rust-lang.org/std/iter/trait.Iterator.html) trait. If you've enjoyed writing iterators in Rust before, you'll hopefully find the same satisfaction writing parsers with Chumsky. They even [share](https://docs.rs/chumsky/latest/chumsky/trait.Parser.html#method.map) [several](https://docs.rs/chumsky/latest/chumsky/trait.Parser.html#method.flatten) [functions](https://docs.rs/chumsky/latest/chumsky/trait.Parser.html#method.collect) with each other!
## Setting up
Create a new project with `cargo new --bin foo`, add the latest version of Chumsky as a dependency, and place
the following in your `main.rs`:
```rust
use chumsky::prelude::*;

fn main() {
    let src = std::fs::read_to_string(std::env::args().nth(1).unwrap()).unwrap();

    println!("{}", src);
}
```
This code has one purpose: it treats the first command-line argument as a path, reads the corresponding file,
then prints the contents to the terminal. We don't really care for handling IO errors in this tutorial, so `.unwrap()`
will suffice.
Create a file named `test.foo` and run `cargo run -- test.foo` (the `--` tells cargo to pass the remaining
arguments to the program instead of cargo itself). You should see that the contents of `test.foo`, if any, get
printed to the console.
Next, we'll create a data type that represents a program written in Foo. All programs in Foo are expressions,
so we'll call it `Expr`.
```rust
#[derive(Debug)]
enum Expr {
    Num(f64),
    Var(String),
    Neg(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
    Call(String, Vec<Expr>),
    Let {
        name: String,
        rhs: Box<Expr>,
        then: Box<Expr>,
    },
    Fn {
        name: String,
        args: Vec<String>,
        body: Box<Expr>,
        then: Box<Expr>,
    },
}
```
This is Foo's [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree) (AST). It represents
all possible Foo programs and is defined recursively in terms of itself (`Box` is used to avoid the type being
infinitely large). Each expression may itself contain sub-expressions.
As an example, the expression `let x = 5; x * 3` is encoded as follows using the `Expr` type (with the `Box::new` wrappers and string allocations omitted for brevity):
```rust
Expr::Let {
    name: "x",
    rhs: Expr::Num(5.0),
    then: Expr::Mul(
        Expr::Var("x"),
        Expr::Num(3.0),
    ),
}
```
The purpose of our parser will be to perform this conversion, from source code to AST.
We're also going to create a function that creates Foo's parser. Our parser takes in a `char` stream and
produces an `Expr`, so we'll use those types for the `I` (input) and `O` (output) type parameters.
```rust
fn parser() -> impl Parser<char, Expr, Error = Simple<char>> {
    // To be filled in later...
}
```
The `Error` associated type allows us to customise the error type that Chumsky uses. For now, we'll stick to
`Simple<I>`, a built-in error type that does everything we need.
In `main`, we'll alter the `println!` as follows:
```rust
println!("{:?}", parser().parse(src));
```
## Parsing digits
Chumsky is a 'parser combinator' library. It allows the creation of parsers by combining together many smaller
parsers. The very smallest parsers are called 'primitives' and live in the
[`primitive`](https://docs.rs/chumsky/latest/chumsky/primitive/index.html) module.
We're going to want to start by parsing the simplest element of Foo's syntax: numbers.
```rust
// In `parser`...
filter(|c: &char| c.is_ascii_digit())
```
The `filter` primitive allows us to read a single input and accept it if it passes a condition. In our case,
that condition simply checks that the character is a digit.
If we compile this code now, we'll encounter an error. Why?
Although we promised that our parser would produce an `Expr`, the `filter` primitive only outputs the input
it found. Right now, all we have is a parser from `char` to `char` instead of a parser from `char` to `Expr`!
To solve this, we need to crack open the 'combinator' part of parser combinators. We'll use Chumsky's `map`
method to convert the output of the parser to an `Expr`. This method is very similar to its namesake on
`Iterator`.
```rust
filter(|c: &char| c.is_ascii_digit())
    .map(|c| Expr::Num(c.to_digit(10).unwrap() as f64))
```
Here, we're converting the `char` digit to an `f64` (unwrapping is fine: `map` only gets applied to outputs
that successfully parsed!) and then wrapping it in `Expr::Num(_)` to convert it to a Foo expression.
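The digit conversion itself can be tried in isolation with nothing but the standard library; a minimal sketch:

```rust
fn main() {
    let c = '7';
    // `to_digit(10)` returns `Some(7)` for a base-10 digit, so unwrapping
    // is safe once `is_ascii_digit` has already vouched for the character.
    assert!(c.is_ascii_digit());
    let n = c.to_digit(10).unwrap() as f64;
    assert_eq!(n, 7.0);
}
```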
Try running the code. You'll see that you can type a digit into `test.foo` and have our interpreter generate
an AST like so:
```
Ok(Num(5.0))
```
## Parsing numbers
If you're more than a little adventurous, you'll quickly notice that typing in a multi-digit number doesn't
quite behave as expected. Inputting `42` will only produce a `Num(4.0)` AST.
This is because `filter` only accepts a *single* input. But now another question arises: why did our interpreter
*not* complain at the trailing digits that didn't get parsed?
The answer is that Chumsky's parsers are *lazy*: they will consume all of the input that they can and then stop.
If there's any trailing input, it'll be ignored.
This is obviously not always desirable. If the user places random nonsense at the end of the file, we want to be
able to generate an error about it! Worse still, that 'nonsense' could be input the user intended to be part of
the program, but that contained a syntax error and so was not properly parsed. How can we force the parser to consume
all of the input?
To do this, we can make use of two new parsers: the `then_ignore` combinator and the `end` primitive.
```rust
filter(|c: &char| c.is_ascii_digit())
    .map(|c| Expr::Num(c.to_digit(10).unwrap() as f64))
    .then_ignore(end())
```
The `then_ignore` combinator parses a second pattern after the first, but ignores its output in favour of that of the
first.
The `end` primitive succeeds if it encounters only the end of input.
Combining these together, we now get an error for longer inputs. Unfortunately, this just reveals another problem
(particularly if you're working on a Unix-like platform): any whitespace before or after our digit will upset our
parser and trigger an error.
We can handle whitespace by adding a call to `padded_by` (which ignores a given pattern both before and after the main
pattern) after our digit parser, passing it a repeated filter for whitespace characters.
```rust
filter(|c: &char| c.is_ascii_digit())
    .map(|c| Expr::Num(c.to_digit(10).unwrap() as f64))
    .padded_by(filter(|c: &char| c.is_whitespace()).repeated())
    .then_ignore(end())
```
This example should have taught you a few important things about Chumsky's parsers:
1. Parsers are lazy: trailing input is ignored
2. Whitespace is not automatically ignored. Chumsky is a general-purpose parsing library, and some languages care very
much about the structure of whitespace, so Chumsky does too
## Cleaning up and taking shortcuts
At this point, things are starting to look a little messy. We've ended up writing 4 lines of code to properly parse a
single digit. Let's clean things up a bit. We'll also make use of a bunch of text-based parser primitives that
come with Chumsky to get rid of some of this cruft.
```rust
let int = text::int(10)
    .map(|s: String| Expr::Num(s.parse().unwrap()))
    .padded();

int.then_ignore(end())
```
That's better. We've also swapped out our custom digit parser for `text::int`, a built-in parser that parses any
non-negative integer.
## Evaluating simple expressions
We'll now take a diversion away from the parser to create a function that can evaluate our AST. This is the 'heart' of
our interpreter and is the thing that actually performs the computation of programs.
```rust
fn eval(expr: &Expr) -> Result<f64, String> {
    match expr {
        Expr::Num(x) => Ok(*x),
        Expr::Neg(a) => Ok(-eval(a)?),
        Expr::Add(a, b) => Ok(eval(a)? + eval(b)?),
        Expr::Sub(a, b) => Ok(eval(a)? - eval(b)?),
        Expr::Mul(a, b) => Ok(eval(a)? * eval(b)?),
        Expr::Div(a, b) => Ok(eval(a)? / eval(b)?),
        _ => todo!(), // We'll handle other cases later
    }
}
```
This function might look scary at first glance, but there's not too much going on here: it just recursively calls
itself, evaluating each node of the AST, combining the results via operators, until it has a final result. Any runtime
errors simply get thrown back down the stack using `?`.
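None of Foo's operators can fail yet, but to see how `?` carries an error back up through the recursion, here's a stand-alone sketch with a hypothetical checked division (not part of Foo's actual semantics, where `/` follows the usual float behaviour):

```rust
#[derive(Debug)]
enum Expr {
    Num(f64),
    Div(Box<Expr>, Box<Expr>),
}

fn eval(expr: &Expr) -> Result<f64, String> {
    match expr {
        Expr::Num(x) => Ok(*x),
        Expr::Div(a, b) => {
            // If evaluating `b` fails, `?` returns the error immediately
            let b = eval(b)?;
            if b == 0.0 {
                Err("division by zero".to_string())
            } else {
                Ok(eval(a)? / b)
            }
        }
    }
}

fn main() {
    let bad = Expr::Div(Box::new(Expr::Num(1.0)), Box::new(Expr::Num(0.0)));
    // The error generated deep in the tree propagates all the way out
    assert_eq!(eval(&bad), Err("division by zero".to_string()));
}
```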
We'll also change our `main` function a little so that we can pass our AST to `eval`.
```rust
fn main() {
    let src = std::fs::read_to_string(std::env::args().nth(1).unwrap()).unwrap();

    match parser().parse(src) {
        Ok(ast) => match eval(&ast) {
            Ok(output) => println!("{}", output),
            Err(eval_err) => println!("Evaluation error: {}", eval_err),
        },
        Err(parse_errs) => parse_errs
            .into_iter()
            .for_each(|e| println!("Parse error: {}", e)),
    }
}
```
This looks like a big change, but it's mostly just an extension of the previous code to pass the AST on to `eval` if
parsing is successful. If unsuccessful, we just print the errors generated by the parser. Right now, none of our
operators can produce errors when evaluated, but this will change in the future so we make sure to handle them in
preparation.
## Parsing unary operators
Jumping back to our parser, let's handle unary operators. Currently, our only unary operator is `-`, the negation
operator. We're looking to parse any number of `-`, followed by a number. More formally:
```
expr = op* + int
```
We'll also give our `int` parser a new name, 'atom', for reasons that will become clear later.
```rust
let int = text::int(10)
    .map(|s: String| Expr::Num(s.parse().unwrap()))
    .padded();

let atom = int;

let op = |c| just(c).padded();

let unary = op('-')
    .repeated()
    .then(atom)
    .foldr(|_op, rhs| Expr::Neg(Box::new(rhs)));

unary.then_ignore(end())
```
Here, we meet a few new combinators:
- `repeated` will parse a given pattern any number of times (including zero!), collecting the outputs into a `Vec`
- `then` will parse one pattern and then another immediately afterwards, collecting both outputs into a tuple pair
- `foldr` will take an output of the form `(Vec<T>, U)` and will fold it into a single `U` by repeatedly applying
the given function to each element of the `Vec<T>`
This last combinator is worth a little more consideration. We're trying to parse *any number* of negation operators,
followed by a single atom (for now, just a number). For example, the input `---42` would generate the following input to `foldr`:
```rust
(['-', '-', '-'], Num(42.0))
```
The `foldr` function repeatedly applies the function to 'fold' the elements into a single element, like so:
```rust
(['-', '-', '-'], Num(42.0))
  ---  ---  ---  ---------
   |    |    |       |
   |    |    \       /
   |    |   Neg(Num(42.0))
   |    |        |
   |    \        /
   |  Neg(Neg(Num(42.0)))
   |          |
   \          /
  Neg(Neg(Neg(Num(42.0))))
```
This may be a little hard to conceptualise for those used to imperative programming, but for functional programmers
it should come naturally: `foldr` is just equivalent to `reduce`!
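The fold itself isn't Chumsky-specific: given the `(Vec, Expr)` pair above, the standard library's `rfold` produces exactly the same nesting. A minimal stand-alone sketch:

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Num(f64),
    Neg(Box<Expr>),
}

fn main() {
    // The pair that parsing `---42` would hand to `foldr`
    let (ops, atom) = (vec!['-', '-', '-'], Expr::Num(42.0));

    // Fold right-to-left: the atom gets wrapped once per operator
    let expr = ops
        .into_iter()
        .rfold(atom, |rhs, _op| Expr::Neg(Box::new(rhs)));

    assert_eq!(
        expr,
        Expr::Neg(Box::new(Expr::Neg(Box::new(Expr::Neg(Box::new(
            Expr::Num(42.0)
        ))))))
    );
}
```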
Give the interpreter a try. You'll be able to enter inputs as before, but also values like `-17`. You can even apply
the negation operator multiple times: `--9` will yield a value of `9` in the command line.
This is exciting: we've finally started to see our interpreter perform useful (sort of) computations!
## Parsing binary operators
Let's keep the momentum going and move over to binary operators. Traditionally, these pose quite a problem for
parsers. To parse an expression like `3 + 4 * 2`, it's necessary to understand that multiplication
[binds more eagerly than addition](https://en.wikipedia.org/wiki/Order_of_operations) and hence is applied first.
Therefore, the result of this expression is `11` and not `14`.
Parsers employ a range of strategies to handle these cases, but for Chumsky things are simple: the most eagerly binding
(highest 'precedence') operators should be those that get considered first when parsing.
It's worth noting that the summation operators (`+` and `-`) are typically considered to have the *same* precedence as
one another. The same also applies to the product operators (`*` and `/`). For this reason, we treat each group as a
single pattern.
At each stage, we're looking for a simple pattern: a unary expression, followed by any number of combinations of an
operator and another unary expression. More formally:
```
expr = unary + (op + unary)*
```
Let's expand our parser.
```rust
let int = text::int(10)
    .map(|s: String| Expr::Num(s.parse().unwrap()))
    .padded();

let atom = int;

let op = |c| just(c).padded();

let unary = op('-')
    .repeated()
    .then(atom)
    .foldr(|_op, rhs| Expr::Neg(Box::new(rhs)));

let product = unary.clone()
    .then(op('*').to(Expr::Mul as fn(_, _) -> _)
        .or(op('/').to(Expr::Div as fn(_, _) -> _))
        .then(unary)
        .repeated())
    .foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));

let sum = product.clone()
    .then(op('+').to(Expr::Add as fn(_, _) -> _)
        .or(op('-').to(Expr::Sub as fn(_, _) -> _))
        .then(product)
        .repeated())
    .foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));

sum.then_ignore(end())
```
The `Expr::Mul as fn(_, _) -> _` syntax might look a little unfamiliar, but don't worry! In Rust,
[tuple enum variants are implicitly functions](https://stackoverflow.com/questions/54802045/what-is-this-strange-syntax-where-an-enum-variant-is-used-as-a-function).
All we're doing here is making sure that Rust treats each of them as if they had the same type using the `as` cast, and
then letting type inference do the rest. Those functions then get passed through the internals of the parser and end up
in `op` within the `foldl` call.
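You can see the cast in miniature without any parsing: each variant constructor has its own unnameable function item type, and `as` unifies them into a common function pointer type so they can share a variable or a `Vec`:

```rust
#[derive(Debug, PartialEq)]
enum Op {
    Add(i32, i32),
    Sub(i32, i32),
}

fn main() {
    // Without the casts, `Op::Add` and `Op::Sub` would have *different*
    // types and couldn't live in the same `Vec`.
    let ops: Vec<fn(i32, i32) -> Op> = vec![
        Op::Add as fn(i32, i32) -> Op,
        Op::Sub as fn(i32, i32) -> Op,
    ];

    // Each element is now an ordinary function that builds an AST node
    assert_eq!(ops[0](1, 2), Op::Add(1, 2));
    assert_eq!(ops[1](1, 2), Op::Sub(1, 2));
}
```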
Another three combinators are introduced here:
- `or` attempts to parse a pattern and, if unsuccessful, instead attempts another pattern
- `to` is similar to `map`, but instead of mapping the output, entirely overrides the output with a new value. In our
case, we use it to convert each binary operator to a function that produces the relevant AST node for that operator.
- `foldl` is very similar to `foldr` from the last section but, instead of operating on a `(Vec<_>, _)`, it operates
  upon a `(_, Vec<_>)`, going forwards through the `Vec<_>` to combine values together with the function
In a similar manner to `foldr` in the previous section on unary expressions, `foldl` is used to fold chains of binary
operators into a single expression tree. For example, the input `2 + 3 - 7 + 5` would generate the following input to
`foldl`:
```rust
(Num(2.0), [(Add, Num(3.0)), (Sub, Num(7.0)), (Add, Num(5.0))])
```
This then gets folded together by `foldl` like so:
```rust
(Num(2.0), [(Add, Num(3.0)), (Sub, Num(7.0)), (Add, Num(5.0))])
 --------    --------------   --------------   --------------
     |              |                |                |
     \              /                |                |
   Add(Num(2.0), Num(3.0))           |                |
              |                      |                |
              \                      /                |
     Sub(Add(Num(2.0), Num(3.0)), Num(7.0))           |
                      |                               |
                      \                               /
      Add(Sub(Add(Num(2.0), Num(3.0)), Num(7.0)), Num(5.0))
```
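As with `foldr`, the folding step can be mimicked with the standard library's `fold`. Here's a sketch that skips the AST and folds the operator chain straight down to a number (the hypothetical `add`/`sub` functions stand in for the `Expr::Add`/`Expr::Sub` constructors):

```rust
fn main() {
    fn add(a: f64, b: f64) -> f64 { a + b }
    fn sub(a: f64, b: f64) -> f64 { a - b }

    // The pair that parsing `2 + 3 - 7 + 5` conceptually hands to `foldl`
    let first = 2.0;
    let rest: Vec<(fn(f64, f64) -> f64, f64)> = vec![
        (add as fn(f64, f64) -> f64, 3.0),
        (sub as fn(f64, f64) -> f64, 7.0),
        (add as fn(f64, f64) -> f64, 5.0),
    ];

    // Fold left-to-right, combining the running total with each (op, rhs) pair
    let result = rest.into_iter().fold(first, |lhs, (op, rhs)| op(lhs, rhs));
    assert_eq!(result, 3.0); // ((2 + 3) - 7) + 5
}
```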
Give the interpreter a try. You should find that it can correctly handle both unary and binary operations combined in
arbitrary configurations, correctly handling precedence. You can use it as a pocket calculator!
## Parsing parentheses
A new challenger approaches: *nested expressions*. Sometimes, we want to override the default operator precedence rules
entirely. We can do this by nesting expressions within parentheses, like `(3 + 4) * 2`. How do we handle this?
The creation of the `atom` pattern a few sections before was no accident: parentheses have a greater precedence than
any operator, so we should treat a parenthesised expression as if it were equivalent to a single value. We call things
that behave like single values 'atoms' by convention.
We're going to hoist our entire parser up into a closure, allowing us to define it in terms of itself.
```rust
recursive(|expr| {
    let int = text::int(10)
        .map(|s: String| Expr::Num(s.parse().unwrap()))
        .padded();

    let atom = int
        .or(expr.delimited_by(just('('), just(')')))
        .padded();

    let op = |c| just(c).padded();

    let unary = op('-')
        .repeated()
        .then(atom)
        .foldr(|_op, rhs| Expr::Neg(Box::new(rhs)));

    let product = unary.clone()
        .then(op('*').to(Expr::Mul as fn(_, _) -> _)
            .or(op('/').to(Expr::Div as fn(_, _) -> _))
            .then(unary)
            .repeated())
        .foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));

    let sum = product.clone()
        .then(op('+').to(Expr::Add as fn(_, _) -> _)
            .or(op('-').to(Expr::Sub as fn(_, _) -> _))
            .then(product)
            .repeated())
        .foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));

    sum
})
.then_ignore(end())
```
There are a few things worth paying attention to here.
1. `recursive` allows us to define a parser recursively in terms of itself by giving us a copy of it within the
closure's scope
2. We use the recursive definition of `expr` within the definition of `atom`. We use the new `delimited_by` combinator
to allow it to sit nested within a pair of parentheses
3. The `then_ignore(end())` call has *not* been hoisted inside the `recursive` call. This is because we only want to
parse an end of input on the outermost expression, not at every level of nesting
Try running the interpreter. You'll find that it can handle a surprising number of cases elegantly. Make sure that the
following cases work correctly:
| Expression | Expected result |
|---------------|-----------------|
| `3 * 4 + 2` | `14` |
| `3 * (4 + 2)` | `18` |
| `-4 + 2` | `-2` |
| `-(4 + 2)` | `-6` |
## Parsing lets
Our next step is to handle `let`. Unlike in Rust and other imperative languages, `let` in Foo is an expression, not a
statement (Foo has no statements). It takes the following form:
```
let <ident> = <expr>; <expr>
```
We only want `let`s to appear at the outermost level of the expression, so we leave it out of the original recursive
expression definition. However, we also want to be able to chain `let`s together, so we put them in their own recursive
definition. We call it `decl` ('declaration') because we're eventually going to be adding `fn` syntax too.
```rust
let ident = text::ident()
    .padded();

let expr = recursive(|expr| {
    let int = text::int(10)
        .map(|s: String| Expr::Num(s.parse().unwrap()))
        .padded();

    let atom = int
        .or(expr.delimited_by(just('('), just(')')))
        .or(ident.map(Expr::Var));

    let op = |c| just(c).padded();

    let unary = op('-')
        .repeated()
        .then(atom)
        .foldr(|_op, rhs| Expr::Neg(Box::new(rhs)));

    let product = unary.clone()
        .then(op('*').to(Expr::Mul as fn(_, _) -> _)
            .or(op('/').to(Expr::Div as fn(_, _) -> _))
            .then(unary)
            .repeated())
        .foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));

    let sum = product.clone()
        .then(op('+').to(Expr::Add as fn(_, _) -> _)
            .or(op('-').to(Expr::Sub as fn(_, _) -> _))
            .then(product)
            .repeated())
        .foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));

    sum
});

let decl = recursive(|decl| {
    let r#let = text::keyword("let")
        .ignore_then(ident)
        .then_ignore(just('='))
        .then(expr.clone())
        .then_ignore(just(';'))
        .then(decl)
        .map(|((name, rhs), then)| Expr::Let {
            name,
            rhs: Box::new(rhs),
            then: Box::new(then),
        });

    r#let
        // Must be later in the chain than `r#let` to avoid ambiguity
        .or(expr)
        .padded()
});

decl
    .then_ignore(end())
```
`keyword` is simply a parser that looks for an exact identifier (i.e: it doesn't match identifiers that only start with
a keyword).
Other than that, there's nothing in the definition of `r#let` that you haven't seen before: familiar combinators, but
combined in different ways. It selectively ignores parts of the syntax that we don't care about after validating that
it exists, then uses those elements that it does care about to create an `Expr::Let` AST node.
Another thing to note is that the definition of `ident` will parse `"let"`. To avoid the parser accidentally deciding
that `"let"` is a variable, we place `r#let` earlier in the or chain than `expr` so that it prioritises the correct
interpretation. As mentioned in previous sections, Chumsky handles ambiguity simply by choosing the first successful
parse it encounters, so making sure that we declare things in the right order can sometimes be important.
You should now be able to run the interpreter and have it accept an input such as
```
let five = 5;
five * 3
```
Unfortunately, the `eval` function will panic because we've not yet handled `Expr::Var` or `Expr::Let`. Let's do that
now.
```rust
fn eval<'a>(expr: &'a Expr, vars: &mut Vec<(&'a String, f64)>) -> Result<f64, String> {
    match expr {
        Expr::Num(x) => Ok(*x),
        Expr::Neg(a) => Ok(-eval(a, vars)?),
        Expr::Add(a, b) => Ok(eval(a, vars)? + eval(b, vars)?),
        Expr::Sub(a, b) => Ok(eval(a, vars)? - eval(b, vars)?),
        Expr::Mul(a, b) => Ok(eval(a, vars)? * eval(b, vars)?),
        Expr::Div(a, b) => Ok(eval(a, vars)? / eval(b, vars)?),
        Expr::Var(name) => if let Some((_, val)) = vars.iter().rev().find(|(var, _)| *var == name) {
            Ok(*val)
        } else {
            Err(format!("Cannot find variable `{}` in scope", name))
        },
        Expr::Let { name, rhs, then } => {
            let rhs = eval(rhs, vars)?;
            vars.push((name, rhs));
            let output = eval(then, vars);
            vars.pop();
            output
        },
        _ => todo!(),
    }
}
```
Woo! That got a bit more complicated. Don't fear, there are only 3 important changes:
1. Because we need to keep track of variables that were previously defined, we use a `Vec` to remember them. Because
`eval` is a recursive function, we also need to pass it to all recursive calls.
2. When we encounter an `Expr::Let`, we first evaluate the right-hand side (`rhs`). Once evaluated, we push it to the
`vars` stack and evaluate the trailing `then` expression (i.e: all of the remaining code that appears after the
semicolon). Popping it afterwards is not *technically* necessary because Foo does not permit nested declarations,
but we do it anyway because it's good practice and it's what we'd want to do if we ever decided to add nesting.
3. When we encounter an `Expr::Var` (i.e: an inline variable) we search the stack *backwards* (because Foo permits
[variable shadowing](https://en.wikipedia.org/wiki/Variable_shadowing) and we only want to find the most recently
declared variable with the same name) to find the variable's value. If we can't find a variable of that name, we
generate a runtime error which gets propagated back up the stack.
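The backwards search is ordinary `Vec` and iterator code; a stand-alone sketch of the shadowing lookup (using `&str` keys rather than the `&String` keys `eval` uses):

```rust
fn main() {
    // The scope stack as built by `let x = 5; let x = 3 + x; x`
    let vars: Vec<(&str, f64)> = vec![("x", 5.0), ("x", 8.0)];

    // Searching in reverse finds the most recent (shadowing) binding first
    let found = vars.iter().rev().find(|(name, _)| *name == "x");
    assert_eq!(found.map(|(_, val)| *val), Some(8.0));

    // A missing name yields `None`, which `eval` turns into a runtime error
    assert!(vars.iter().rev().find(|(name, _)| *name == "y").is_none());
}
```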
Obviously, the signature of `eval` has changed so we'll update the call in `main` to become:
```rust
eval(&ast, &mut Vec::new())
```
Make sure to test the interpreter. Try experimenting with `let` declarations to make sure things aren't broken. In
particular, it's worth testing variable shadowing by ensuring that the following program produces `8`:
```
let x = 5;
let x = 3 + x;
x
```
## Parsing functions
We're almost at a complete implementation of Foo. There's just one thing left: *functions*.
Surprisingly, parsing functions is the easy part. All we need to modify is the definition of `decl` to add `r#fn`. It
looks very much like the existing definition of `r#let`:
```rust
let decl = recursive(|decl| {
    let r#let = text::keyword("let")
        .ignore_then(ident)
        .then_ignore(just('='))
        .then(expr.clone())
        .then_ignore(just(';'))
        .then(decl.clone())
        .map(|((name, rhs), then)| Expr::Let {
            name,
            rhs: Box::new(rhs),
            then: Box::new(then),
        });

    let r#fn = text::keyword("fn")
        .ignore_then(ident)
        .then(ident.repeated())
        .then_ignore(just('='))
        .then(expr.clone())
        .then_ignore(just(';'))
        .then(decl)
        .map(|(((name, args), body), then)| Expr::Fn {
            name,
            args,
            body: Box::new(body),
            then: Box::new(then),
        });

    r#let
        .or(r#fn)
        .or(expr)
        .padded()
});
```
There's nothing new here: you've seen all of this before.
Obviously, we also need to add support for *calling* functions by modifying `atom`:
```rust
let call = ident
    .then(expr.clone()
        .separated_by(just(','))
        .allow_trailing() // Foo is Rust-like, so allow trailing commas to appear in arg lists
        .delimited_by(just('('), just(')')))
    .map(|(f, args)| Expr::Call(f, args));

let atom = int
    .or(expr.delimited_by(just('('), just(')')))
    .or(call)
    .or(ident.map(Expr::Var));
```
The only new combinator here is `separated_by` which behaves like `repeated`, but requires a separator pattern between
each element. It has a method called `allow_trailing` which allows for parsing a trailing separator at the end of the
elements.
Next, we modify our `eval` function to support a function stack.
```rust
fn eval<'a>(
    expr: &'a Expr,
    vars: &mut Vec<(&'a String, f64)>,
    funcs: &mut Vec<(&'a String, &'a [String], &'a Expr)>,
) -> Result<f64, String> {
    match expr {
        Expr::Num(x) => Ok(*x),
        Expr::Neg(a) => Ok(-eval(a, vars, funcs)?),
        Expr::Add(a, b) => Ok(eval(a, vars, funcs)? + eval(b, vars, funcs)?),
        Expr::Sub(a, b) => Ok(eval(a, vars, funcs)? - eval(b, vars, funcs)?),
        Expr::Mul(a, b) => Ok(eval(a, vars, funcs)? * eval(b, vars, funcs)?),
        Expr::Div(a, b) => Ok(eval(a, vars, funcs)? / eval(b, vars, funcs)?),
        Expr::Var(name) => if let Some((_, val)) = vars.iter().rev().find(|(var, _)| *var == name) {
            Ok(*val)
        } else {
            Err(format!("Cannot find variable `{}` in scope", name))
        },
        Expr::Let { name, rhs, then } => {
            let rhs = eval(rhs, vars, funcs)?;
            vars.push((name, rhs));
            let output = eval(then, vars, funcs);
            vars.pop();
            output
        },
        Expr::Call(name, args) => if let Some((_, arg_names, body)) = funcs
            .iter()
            .rev()
            .find(|(var, _, _)| *var == name)
            .copied()
        {
            if arg_names.len() == args.len() {
                let mut args = args
                    .iter()
                    .map(|arg| eval(arg, vars, funcs))
                    .zip(arg_names.iter())
                    .map(|(val, name)| Ok((name, val?)))
                    .collect::<Result<_, String>>()?;
                vars.append(&mut args);
                let output = eval(body, vars, funcs);
                // `append` drained `args`, so count the bindings to pop via `arg_names`
                vars.truncate(vars.len() - arg_names.len());
                output
            } else {
                Err(format!(
                    "Wrong number of arguments for function `{}`: expected {}, found {}",
                    name,
                    arg_names.len(),
                    args.len(),
                ))
            }
        } else {
            Err(format!("Cannot find function `{}` in scope", name))
        },
        Expr::Fn { name, args, body, then } => {
            funcs.push((name, args, body));
            let output = eval(then, vars, funcs);
            funcs.pop();
            output
        },
    }
}
```
Another big change! On closer inspection, however, this looks a lot like the change we made previously when we added
support for `let` declarations. Whenever we encounter an `Expr::Fn`, we just push the function to the `funcs` stack and
continue. Whenever we encounter an `Expr::Call`, we search the function stack backwards, as we did for variables, and
then execute the body of the function (making sure to evaluate and push the arguments!).
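The stack discipline around a call (push the evaluated arguments, evaluate the body, then trim the stack back to its previous length) can be sketched in isolation:

```rust
fn main() {
    // The caller's scope
    let mut vars: Vec<(&str, f64)> = vec![("five", 5.0), ("eight", 8.0)];
    let before = vars.len();

    // Bind the arguments of `add(five, eight)` for the duration of the body
    vars.push(("x", 5.0));
    vars.push(("y", 8.0));

    let body_result: f64 = {
        let x = vars.iter().rev().find(|(n, _)| *n == "x").unwrap().1;
        let y = vars.iter().rev().find(|(n, _)| *n == "y").unwrap().1;
        x + y
    };

    // Restore the caller's scope by trimming exactly the bindings we pushed
    vars.truncate(before);

    assert_eq!(body_result, 13.0);
    assert_eq!(vars.len(), 2);
}
```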
As before, we'll need to change the `eval` call in `main` to:
```rust
eval(&ast, &mut Vec::new(), &mut Vec::new())
```
Give the interpreter a test - see what you can do with it! Here's an example program to get you started:
```
let five = 5;
let eight = 3 + five;
fn add x y = x + y;
add(five, eight)
```
## Conclusion
Here ends our exploration of Chumsky's API. We've only scratched the surface of what Chumsky can do; from here on,
you'll want to rely on the examples in the repository and in the API docs for further help. Nonetheless, I hope this was
an interesting foray into the use of parser combinators for the development of parsers.
If nothing else, you've now got a neat little calculator language to play with.
Interestingly, there is a subtle bug in Foo's `eval` function that produces unexpected scoping behaviour with function
calls. I'll leave finding it as an exercise for the reader.
## Extension tasks
- Find the interesting function scoping bug and consider how it could be fixed
- Split token lexing into a separate compilation stage to avoid the need for `.padded()` in the parser
- Add more operators
- Add an `if <expr> then <expr> else <expr>` ternary operator
- Add values of different types by turning `f64` into an `enum`
- Add lambdas to the language
- Format the error message in a more useful way, perhaps by providing a reference to the original code