chumsky-0.9.3/.cargo_vcs_info.json0000644000000001360000000000100125300ustar
{
  "git": {
    "sha1": "5101cc86a8568a6d33743145e5e8bd0b194332b8"
  },
  "path_in_vcs": ""
}
chumsky-0.9.3/.github/FUNDING.yml000064400000000000000000000000231046102023000144700ustar 00000000000000
github: [zesterer]
chumsky-0.9.3/.github/workflows/rust.yml000064400000000000000000000015031046102023000164340ustar 00000000000000
name: Rust

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

env:
  CARGO_TERM_COLOR: always

jobs:
  check:
    name: Check Chumsky
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install latest stable
        uses: dtolnay/rust-toolchain@master
        with:
          toolchain: stable
          components: rustfmt, clippy
      - name: Run cargo check
        run: cargo check --verbose --no-default-features
  test:
    name: Test Chumsky
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install latest nightly
        uses: dtolnay/rust-toolchain@master
        with:
          toolchain: nightly
          components: rustfmt, clippy
      - name: Run cargo test
        run: cargo test --verbose --all-features
chumsky-0.9.3/.gitignore000064400000000000000000000000551046102023000133100ustar 00000000000000
/target
Cargo.lock
flamegraph.svg
perf.data*
chumsky-0.9.3/CHANGELOG.md000064400000000000000000000143401046102023000131330ustar 00000000000000
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

# Unreleased

### Added

### Removed

### Changed

### Fixed

# [0.9.2] - 2023-03-02

### Fixed

- Properly fixed `skip_then_retry_until` regression

# [0.9.1] - 2023-03-02

### Fixed

- Regression in `skip_then_retry_until` recovery strategy

# [0.9.0] - 2023-02-07

### Added

- A `spill-stack` feature that uses `stacker` to avoid stack overflow errors for deeply recursive parsers
- The ability to access the token span when using `select!` like `select!
{ |span| Token::Num(x) => (x, span) }`
- Added a `skip_parser` recovery strategy that allows you to implement your own recovery strategies in terms of other parsers. For example, `.recover_with(skip_parser(take_until(just(';'))))` skips tokens until after the next semicolon
- A `not` combinator that consumes a single token if it is *not* the start of a given pattern. For example, `just("\\n").or(just('"')).not()` matches any `char` that is neither the final quote of a string nor the start of a newline escape sequence
- A `semantic_indentation` parser for parsing indentation-sensitive languages. Note that this is likely to be deprecated/removed in the future in favour of a more powerful solution
- `#[must_use]` attribute for parsers to ensure that they're not accidentally created without being used
- `Option<Vec<T>>` and `Vec<Option<T>>` now implement `Chain<T>` and `Option<T>` implements `Chain<T>`
- `choice` now supports both arrays and vectors of parsers in addition to tuples
- The `Simple` error type now implements `Eq`

### Changed

- `text::whitespace` returns a `Repeated` instead of an `impl Parser`, allowing you to call methods like `at_least` and `exactly` on it.
- Improved `no_std` support
- Improved examples and documentation
- Use zero-width spans for EoI by default
- Don't allow defining a recursive parser more than once
- Various minor bug fixes
- Improved `Display` implementations for various built-in error types and `SimpleReason`
- Use an `OrderedContainer` trait to avoid unexpected behaviour for unordered containers in combination with `just`

### Fixed

- Made several parsers (`todo`, `unwrapped`, etc.) more useful by reporting the parser's location on panic
- Boxing a parser that is already boxed just gives you the original parser to avoid double indirection
- Improved compilation speeds

# [0.8.0] - 2022-02-07

### Added

- `then_with` combinator to allow limited support for parsing nested patterns
- impl From<&[T; N]> for Stream
- `SkipUntil/SkipThenRetryUntil::skip_start/consume_end` for more precise control over skip-based recovery

### Changed

- Allowed `Validate` to map the output type
- Switched to zero-size End Of Input spans for default implementations of `Stream`
- Made `delimited_by` take combinators instead of specific tokens
- Minor optimisations
- Documentation improvements

### Fixed

- Compilation error with `--no-default-features`
- Made default behaviour of `skip_until` more sensible

# [0.7.0] - 2021-12-16

### Added

- A new [tutorial](tutorial.md) to help new users
- `select` macro, a wrapper over `filter_map` that makes extracting data from specific tokens easy
- `choice` parser, a better alternative to long `or` chains (which sometimes have poor compilation performance)
- `todo` parser, that panics when used (but not when created) (akin to Rust's `todo!` macro, but for parsers)
- `keyword` parser, that parses *exact* identifiers
- `from_str` combinator to allow converting a pattern to a value inline, using `std::str::FromStr`
- `unwrapped` combinator, to automatically unwrap an output value inline
- `rewind` combinator, that allows reverting the input stream on success. It's most useful when requiring that a pattern is followed by some terminating pattern without the first parser greedily consuming it
- `map_err_with_span` combinator, to allow fetching the span of the input that was parsed by a parser before an error was encountered
- `or_else` combinator, to allow processing and potentially recovering from a parser error
- `SeparatedBy::at_most` to require that a separated pattern appear at most a specific number of times
- `SeparatedBy::exactly` to require that a separated pattern be repeated exactly a specific number of times
- `Repeated::exactly` to require that a pattern be repeated exactly a specific number of times
- More trait implementations for various things, making the crate more useful

### Changed

- Made `just`, `one_of`, and `none_of` significantly more useful. They can now accept strings, arrays, slices, vectors, sets, or just single tokens as before
- Added the return type of each parser to its documentation
- More explicit documentation of parser behaviour
- More doc examples
- Deprecated `seq` (`just` has been generalised and can now be used to parse specific input sequences)
- Sealed the `Character` trait so that future changes are not breaking
- Sealed the `Chain` trait and made it more powerful
- Moved trait constraints on `Parser` to where clauses for improved readability

### Fixed

- Fixed a subtle bug that allowed `separated_by` to parse an extra trailing separator when it shouldn't
- Filled a 'hole' in the `Error` trait's API that conflated a lack of expected tokens with expectation of end of input
- Made recursive parsers use weak reference-counting to avoid memory leaks

# [0.6.0] - 2021-11-22

### Added

- `skip_until` error recovery strategy
- `SeparatedBy::at_least` and `SeparatedBy::at_most` for parsing a specific number of separated items
- `Parser::validate` for integrated AST validation
- `Recursive::declare` and `Recursive::define` for more precise control over recursive declarations

### Changed

-
Improved `separated_by` error messages - Improved documentation - Hid a new (probably) unused implementation details # [0.5.0] - 2021-10-30 ### Added - `take_until` primitive ### Changed - Added span to fallback output function in `nested_delimiters` # [0.4.0] - 2021-10-28 ### Added - Support for LL(k) parsing - Custom error recovery strategies - Debug mode - Nested input flattening ### Changed - Radically improved error quality chumsky-0.9.3/Cargo.lock0000644000000144220000000000100105060ustar # This file is automatically @generated by Cargo. # It is not intended for manual editing. version = 3 [[package]] name = "ahash" version = "0.8.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "cd7d5a2cecb58716e47d67d5703a249964b14c7be1ec3cad3affc295b2d1c35d" dependencies = [ "cfg-if", "once_cell", "version_check", "zerocopy", ] [[package]] name = "allocator-api2" version = "0.2.16" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0942ffc6dcaadf03badf6e6a2d0228460359d5e34b57ccdc720b7382dfbd5ec5" [[package]] name = "ariadne" version = "0.3.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "72fe02fc62033df9ba41cba57ee19acf5e742511a140c7dbc3a873e19a19a1bd" dependencies = [ "unicode-width", "yansi", ] [[package]] name = "bstr" version = "1.7.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "c79ad7fb2dd38f3dabd76b09c6a5a20c038fc0213ef1e9afd30eb777f120f019" dependencies = [ "memchr", "regex-automata", "serde", ] [[package]] name = "cc" version = "1.0.83" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "f1174fb0b6ec23863f8b971027804a42614e347eafb0a95bf0b12cdae21fc4d0" dependencies = [ "libc", ] [[package]] name = "cfg-if" version = "1.0.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd" [[package]] name = "chumsky" version = "0.9.3" dependencies 
= [ "ariadne", "hashbrown", "pom", "stacker", ] [[package]] name = "hashbrown" version = "0.14.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "f93e7192158dbcda357bdec5fb5788eebf8bbac027f3f33e719d29135ae84156" dependencies = [ "ahash", "allocator-api2", ] [[package]] name = "libc" version = "0.2.149" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "a08173bc88b7955d1b3145aa561539096c421ac8debde8cbc3612ec635fee29b" [[package]] name = "memchr" version = "2.6.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "f665ee40bc4a3c5590afb1e9677db74a508659dfd71e126420da8274909a0167" [[package]] name = "once_cell" version = "1.18.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "dd8b5dd2ae5ed71462c540258bedcb51965123ad7e7ccf4b9a8cafaa4a63576d" [[package]] name = "pom" version = "3.3.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "5c2d73a5fe10d458e77534589512104e5aa8ac480aa9ac30b74563274235cce4" dependencies = [ "bstr", ] [[package]] name = "proc-macro2" version = "1.0.69" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "134c189feb4956b20f6f547d2cf727d4c0fe06722b20a0eec87ed445a97f92da" dependencies = [ "unicode-ident", ] [[package]] name = "psm" version = "0.1.21" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "5787f7cda34e3033a72192c018bc5883100330f362ef279a8cbccfce8bb4e874" dependencies = [ "cc", ] [[package]] name = "quote" version = "1.0.33" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "5267fca4496028628a95160fc423a33e8b2e6af8a5302579e322e4b520293cae" dependencies = [ "proc-macro2", ] [[package]] name = "regex-automata" version = "0.4.3" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "5f804c7828047e88b2d32e2d7fe5a105da8ee3264f01902f796c8e067dc2483f" [[package]] name = "serde" version = "1.0.189" source = 
"registry+https://github.com/rust-lang/crates.io-index" checksum = "8e422a44e74ad4001bdc8eede9a4570ab52f71190e9c076d14369f38b9200537" dependencies = [ "serde_derive", ] [[package]] name = "serde_derive" version = "1.0.189" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1e48d1f918009ce3145511378cf68d613e3b3d9137d67272562080d68a2b32d5" dependencies = [ "proc-macro2", "quote", "syn", ] [[package]] name = "stacker" version = "0.1.15" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "c886bd4480155fd3ef527d45e9ac8dd7118a898a46530b7b94c3e21866259fce" dependencies = [ "cc", "cfg-if", "libc", "psm", "winapi", ] [[package]] name = "syn" version = "2.0.38" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "e96b79aaa137db8f61e26363a0c9b47d8b4ec75da28b7d1d614c2303e232408b" dependencies = [ "proc-macro2", "quote", "unicode-ident", ] [[package]] name = "unicode-ident" version = "1.0.12" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "3354b9ac3fae1ff6755cb6db53683adb661634f67557942dea4facebec0fee4b" [[package]] name = "unicode-width" version = "0.1.11" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "e51733f11c9c4f72aa0c160008246859e340b00807569a0da0e7a1079b27ba85" [[package]] name = "version_check" version = "0.9.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "49874b5167b65d7193b8aba1567f5c7d93d001cafc34600cee003eda787e483f" [[package]] name = "winapi" version = "0.3.9" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "5c839a674fcd7a98952e593242ea400abe93992746761e38641405d28b00f419" dependencies = [ "winapi-i686-pc-windows-gnu", "winapi-x86_64-pc-windows-gnu", ] [[package]] name = "winapi-i686-pc-windows-gnu" version = "0.4.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "ac3b87c63620426dd9b991e5ce0329eff545bccbbb34f3be09ff6fb6ab51b7b6" [[package]] name 
= "winapi-x86_64-pc-windows-gnu" version = "0.4.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f" [[package]] name = "yansi" version = "0.5.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "09041cd90cf85f7f8b2df60c646f853b7f535ce68f85244eb6731cf89fa498ec" [[package]] name = "zerocopy" version = "0.7.11" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "4c19fae0c8a9efc6a8281f2e623db8af1db9e57852e04cde3e754dd2dc29340f" dependencies = [ "zerocopy-derive", ] [[package]] name = "zerocopy-derive" version = "0.7.11" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "fc56589e9ddd1f1c28d4b4b5c773ce232910a6bb67a70133d61c9e347585efe9" dependencies = [ "proc-macro2", "quote", "syn", ] chumsky-0.9.3/Cargo.toml0000644000000023730000000000100105330ustar # THIS FILE IS AUTOMATICALLY GENERATED BY CARGO # # When uploading crates to the registry Cargo will automatically # "normalize" Cargo.toml files for maximal compatibility # with all versions of Cargo and also rewrite `path` dependencies # to registry (e.g., crates.io) dependencies. # # If you are reading this file be aware that the original Cargo.toml # will likely look very different (and much more reasonable). # See Cargo.toml.orig for the original contents. 
[package]
edition = "2018"
name = "chumsky"
version = "0.9.3"
authors = ["Joshua Barretto "]
exclude = [
    "/misc/*",
    "/benches/*",
]
description = "A parser library for humans with powerful error recovery"
readme = "README.md"
keywords = [
    "parser",
    "combinator",
    "token",
    "language",
    "syntax",
]
categories = [
    "parsing",
    "text-processing",
]
license = "MIT"
repository = "https://github.com/zesterer/chumsky"

[dependencies.hashbrown]
version = "0.14.2"

[dependencies.stacker]
version = "0.1"
optional = true

[dev-dependencies.ariadne]
version = "0.3.0"

[dev-dependencies.pom]
version = "3.0"

[features]
ahash = []
default = [
    "ahash",
    "std",
    "spill-stack",
]
nightly = []
spill-stack = [
    "stacker",
    "std",
]
std = []
chumsky-0.9.3/Cargo.toml.orig000064400000000000000000000021721046102023000142110ustar 00000000000000
[package]
name = "chumsky"
version = "0.9.3"
description = "A parser library for humans with powerful error recovery"
authors = ["Joshua Barretto "]
repository = "https://github.com/zesterer/chumsky"
license = "MIT"
keywords = ["parser", "combinator", "token", "language", "syntax"]
categories = ["parsing", "text-processing"]
edition = "2018"
exclude = ["/misc/*", "/benches/*"]

[features]
default = ["ahash", "std", "spill-stack"]

# Use `ahash` instead of the standard hasher for maintaining sets of expected inputs
# (Also used if `std` is disabled)
ahash = []

# Integrate with the standard library
std = []

# Enable nightly-only features like better compiler diagnostics
nightly = []

# Allows deeper recursion by dynamically spilling stack state on to the heap
spill-stack = ["stacker", "std"]

[dependencies]
# Used if `std` is disabled.
# Provides `ahash` for the corresponding feature as it uses it by default.
# Due to https://github.com/rust-lang/cargo/issues/1839, this can't be optional
# Due to https://github.com/rust-lang/cargo/issues/1839, this can't be optional hashbrown = "0.14.2" stacker = { version = "0.1", optional = true } [dev-dependencies] ariadne = "0.3.0" pom = "3.0" chumsky-0.9.3/LICENSE000064400000000000000000000020721046102023000123260ustar 00000000000000The MIT License (MIT) Copyright (c) 2021 Joshua Barretto Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. chumsky-0.9.3/README.md000064400000000000000000000222731046102023000126050ustar 00000000000000# Chumsky [![crates.io](https://img.shields.io/crates/v/chumsky.svg)](https://crates.io/crates/chumsky) [![crates.io](https://docs.rs/chumsky/badge.svg)](https://docs.rs/chumsky) [![License](https://img.shields.io/crates/l/chumsky.svg)](https://github.com/zesterer/chumsky) [![actions-badge](https://github.com/zesterer/chumsky/workflows/Rust/badge.svg?branch=master)](https://github.com/zesterer/chumsky/actions) A parser library for humans with powerful error recovery. 
*Example usage with my own language, Tao*

*Note: Error diagnostic rendering is performed by [Ariadne](https://github.com/zesterer/ariadne)*

## Contents

- [Features](#features)
- [Example Brainfuck Parser](#example-brainfuck-parser)
- [Tutorial](#tutorial)
- [*What* is a parser combinator?](#what-is-a-parser-combinator)
- [*Why* use parser combinators?](#why-use-parser-combinators)
- [Classification](#classification)
- [Error Recovery](#error-recovery)
- [Performance](#performance)
- [Planned Features](#planned-features)
- [Philosophy](#philosophy)
- [Notes](#notes)
- [License](#license)

## Features

- Lots of combinators!
- Generic across input, output, error, and span types
- Powerful error recovery strategies
- Inline mapping to your AST
- Text-specific parsers for both `u8`s and `char`s
- Recursive parsers
- Backtracking is fully supported, allowing the parsing of all known context-free grammars
- Parsing of nesting inputs, allowing you to move delimiter parsing to the lexical stage (as Rust does!)
- Built-in parser debugging

## Example [Brainfuck](https://en.wikipedia.org/wiki/Brainfuck) Parser

See [`examples/brainfuck.rs`](https://github.com/zesterer/chumsky/blob/master/examples/brainfuck.rs) for the full interpreter (`cargo run --example brainfuck -- examples/sample.bf`).
```rust
use chumsky::prelude::*;

#[derive(Clone)]
enum Instr {
    Left,
    Right,
    Incr,
    Decr,
    Read,
    Write,
    Loop(Vec<Instr>),
}

fn parser() -> impl Parser<char, Vec<Instr>, Error = Simple<char>> {
    recursive(|bf| choice((
        just('<').to(Instr::Left),
        just('>').to(Instr::Right),
        just('+').to(Instr::Incr),
        just('-').to(Instr::Decr),
        just(',').to(Instr::Read),
        just('.').to(Instr::Write),
        bf.delimited_by(just('['), just(']')).map(Instr::Loop),
    ))
    .repeated())
}
```

Other examples include:

- A [JSON parser](https://github.com/zesterer/chumsky/blob/master/examples/json.rs) (`cargo run --example json -- examples/sample.json`)
- An [interpreter for a simple Rust-y language](https://github.com/zesterer/chumsky/blob/master/examples/nano_rust.rs) (`cargo run --example nano_rust -- examples/sample.nrs`)

## Tutorial

Chumsky has [a tutorial](https://github.com/zesterer/chumsky/blob/master/tutorial.md) that teaches you how to write a parser and interpreter for a simple dynamic language with unary and binary operators, operator precedence, functions, let declarations, and calls.

## *What* is a parser combinator?

Parser combinators are a technique for implementing parsers by defining them in terms of other parsers. The resulting parsers use a [recursive descent](https://en.wikipedia.org/wiki/Recursive_descent_parser) strategy to transform a stream of tokens into an output. Using parser combinators to define parsers is roughly analogous to using Rust's [`Iterator`](https://doc.rust-lang.org/std/iter/trait.Iterator.html) trait to define iterative algorithms: the type-driven API of `Iterator` makes it more difficult to make mistakes and easier to encode complicated iteration logic than if one were to write the same code by hand. The same is true of parser combinators.

## *Why* use parser combinators?

Writing parsers with good error recovery is conceptually difficult and time-consuming. It requires understanding the intricacies of the recursive descent algorithm, and then implementing recovery strategies on top of it.
If you're developing a programming language, you'll almost certainly change your mind about syntax in the process, leading to some slow and painful parser refactoring. Parser combinators solve both problems by providing an ergonomic API that allows for rapidly iterating upon a syntax.

Parser combinators are also a great fit for domain-specific languages for which an existing parser does not exist. Writing a reliable, fault-tolerant parser for such situations can go from being a multi-day task to a half-hour task with the help of a decent parser combinator library.

## Classification

Chumsky's parsers are [recursive descent](https://en.wikipedia.org/wiki/Recursive_descent_parser) parsers and are capable of parsing [parsing expression grammars (PEGs)](https://en.wikipedia.org/wiki/Parsing_expression_grammar), which includes all known context-free languages. It is theoretically possible to extend Chumsky further to accept limited context-sensitive grammars too, although this is rarely required.

## Error Recovery

Chumsky has support for error recovery, meaning that it can encounter a syntax error, report the error, and then attempt to recover itself into a state in which it can continue parsing so that multiple errors can be produced at once and a partial [AST](https://en.wikipedia.org/wiki/Abstract_syntax_tree) can still be generated from the input for future compilation stages to consume.

However, there is no silver bullet strategy for error recovery. By definition, if the input to a parser is invalid then the parser can only make educated guesses as to the meaning of the input. Different recovery strategies will work better for different languages, and for different patterns within those languages.
Chumsky provides a variety of recovery strategies (each implementing the `Strategy` trait), but it's important to understand that all of

- which you apply
- where you apply them
- what order you apply them

will greatly affect the quality of the errors that Chumsky is able to produce, along with the extent to which it is able to recover a useful AST. Where possible, you should attempt more 'specific' recovery strategies first rather than those that mindlessly skip large swathes of the input.

It is recommended that you experiment with applying different strategies in different situations and at different levels of the parser to find a configuration that you are happy with. If none of the provided error recovery strategies cover the specific pattern you wish to catch, you can even create your own by digging into Chumsky's internals and implementing your own strategies! If you come up with a useful strategy, feel free to open a PR against the [main repository](https://github.com/zesterer/chumsky/)!

## Performance

Chumsky focuses on high-quality errors and ergonomics over performance. That said, it's important that Chumsky can keep up with the rest of your compiler! Unfortunately, it's *extremely* difficult to come up with sensible benchmarks given that exactly how Chumsky performs depends entirely on what you are parsing, how you structure your parser, which patterns the parser attempts to match first, how complex your error type is, what is involved in constructing your AST, etc. All that said, here are some numbers from the [JSON benchmark](https://github.com/zesterer/chumsky/blob/master/benches/json.rs) included in the repository running on my Ryzen 7 3700x.

```ignore
test chumsky ... bench:   4,782,390 ns/iter (+/- 997,208)
test pom     ... bench:  12,793,490 ns/iter (+/- 1,954,583)
```

I've included results from [`pom`](https://github.com/J-F-Liu/pom), another parser combinator crate with a similar design, as a point of reference.
The sample file being parsed is broadly representative of typical JSON data and has 3,018 lines. This translates to a little over 630,000 lines of JSON per second.

Clearly, this is a little slower than a well-optimised hand-written parser: but that's okay! Chumsky's goal is to be *fast enough*. If you've written enough code in your language that parsing performance even starts to be a problem, you've already committed enough time and resources to your language that hand-writing a parser is the best choice going!

## Planned Features

- An optimised 'happy path' parser mode that skips error recovery & error generation
- An even faster 'validation' parser mode, guaranteed to not allocate, that doesn't generate outputs but just verifies the validity of an input

## Philosophy

Chumsky should:

- Be easy to use, even if you don't understand exactly what the parser is doing under the hood
- Be type-driven, pushing users away from anti-patterns at compile-time
- Be a mature, 'batteries-included' solution for context-free parsing by default. If you need to implement either `Parser` or `Strategy` by hand, that's a problem that needs fixing
- Be 'fast enough', but no faster (i.e: when there is a tradeoff between error quality and performance, Chumsky will always take the former option)
- Be modular and extensible, allowing users to implement their own parsers, recovery strategies, error types, spans, and be generic over both input tokens and the output AST

## Notes

My apologies to Noam for choosing such an absurd name.

## License

Chumsky is licensed under the MIT license (see `LICENSE` in the main repository).
chumsky-0.9.3/examples/brainfuck.rs000064400000000000000000000037411046102023000154550ustar 00000000000000
//! This is a Brainfuck parser and interpreter
//! Run it with the following command:
//!
cargo run --example brainfuck -- examples/sample.bf

use chumsky::prelude::*;
use std::{
    env, fs,
    io::{self, Read},
};

#[derive(Clone)]
enum Instr {
    Invalid,
    Left,
    Right,
    Incr,
    Decr,
    Read,
    Write,
    Loop(Vec<Instr>),
}

fn parser() -> impl Parser<char, Vec<Instr>, Error = Simple<char>> {
    use Instr::*;
    recursive(|bf| {
        choice((
            just('<').to(Left),
            just('>').to(Right),
            just('+').to(Incr),
            just('-').to(Decr),
            just(',').to(Read),
            just('.').to(Write),
        ))
        .or(bf.delimited_by(just('['), just(']')).map(Loop))
        .recover_with(nested_delimiters('[', ']', [], |_| Invalid))
        .recover_with(skip_then_retry_until([']']))
        .repeated()
    })
    .then_ignore(end())
}

const TAPE_LEN: usize = 10_000;

fn execute(ast: &[Instr], ptr: &mut usize, tape: &mut [u8; TAPE_LEN]) {
    use Instr::*;
    for symbol in ast {
        match symbol {
            Invalid => unreachable!(),
            Left => *ptr = (*ptr + TAPE_LEN - 1).rem_euclid(TAPE_LEN),
            Right => *ptr = (*ptr + 1).rem_euclid(TAPE_LEN),
            Incr => tape[*ptr] = tape[*ptr].wrapping_add(1),
            Decr => tape[*ptr] = tape[*ptr].wrapping_sub(1),
            Read => tape[*ptr] = io::stdin().bytes().next().unwrap().unwrap(),
            Write => print!("{}", tape[*ptr] as char),
            Loop(ast) => {
                while tape[*ptr] != 0 {
                    execute(ast, ptr, tape)
                }
            }
        }
    }
}

fn main() {
    let src = fs::read_to_string(env::args().nth(1).expect("Expected file argument"))
        .expect("Failed to read file");

    // let src = "[!]+";

    match parser().parse(src.trim()) {
        Ok(ast) => execute(&ast, &mut 0, &mut [0; TAPE_LEN]),
        Err(errs) => errs.into_iter().for_each(|e| println!("{:?}", e)),
    }
}
chumsky-0.9.3/examples/foo.rs000064400000000000000000000136031046102023000142720ustar 00000000000000
/// This is the parser and interpreter for the 'Foo' language. See `tutorial.md` in the repository's root to learn
/// about it.
use chumsky::prelude::*;

#[derive(Debug)]
enum Expr {
    Num(f64),
    Var(String),

    Neg(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),

    Call(String, Vec<Expr>),
    Let {
        name: String,
        rhs: Box<Expr>,
        then: Box<Expr>,
    },
    Fn {
        name: String,
        args: Vec<String>,
        body: Box<Expr>,
        then: Box<Expr>,
    },
}

fn parser() -> impl Parser<char, Expr, Error = Simple<char>> {
    let ident = text::ident().padded();

    let expr = recursive(|expr| {
        let int = text::int(10)
            .map(|s: String| Expr::Num(s.parse().unwrap()))
            .padded();

        let call = ident
            .then(
                expr.clone()
                    .separated_by(just(','))
                    .allow_trailing()
                    .delimited_by(just('('), just(')')),
            )
            .map(|(f, args)| Expr::Call(f, args));

        let atom = int
            .or(expr.delimited_by(just('('), just(')')))
            .or(call)
            .or(ident.map(Expr::Var));

        let op = |c| just(c).padded();

        let unary = op('-')
            .repeated()
            .then(atom)
            .foldr(|_op, rhs| Expr::Neg(Box::new(rhs)));

        let product = unary
            .clone()
            .then(
                op('*')
                    .to(Expr::Mul as fn(_, _) -> _)
                    .or(op('/').to(Expr::Div as fn(_, _) -> _))
                    .then(unary)
                    .repeated(),
            )
            .foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));

        let sum = product
            .clone()
            .then(
                op('+')
                    .to(Expr::Add as fn(_, _) -> _)
                    .or(op('-').to(Expr::Sub as fn(_, _) -> _))
                    .then(product)
                    .repeated(),
            )
            .foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));

        sum
    });

    let decl = recursive(|decl| {
        let r#let = text::keyword("let")
            .ignore_then(ident)
            .then_ignore(just('='))
            .then(expr.clone())
            .then_ignore(just(';'))
            .then(decl.clone())
            .map(|((name, rhs), then)| Expr::Let {
                name,
                rhs: Box::new(rhs),
                then: Box::new(then),
            });

        let r#fn = text::keyword("fn")
            .ignore_then(ident)
            .then(ident.repeated())
            .then_ignore(just('='))
            .then(expr.clone())
            .then_ignore(just(';'))
            .then(decl)
            .map(|(((name, args), body), then)| Expr::Fn {
                name,
                args,
                body: Box::new(body),
                then: Box::new(then),
            });

        r#let.or(r#fn).or(expr).padded()
    });

    decl.then_ignore(end())
}

fn eval<'a>(
    expr: &'a Expr,
    vars: &mut Vec<(&'a String, f64)>,
    funcs: &mut Vec<(&'a String, &'a [String], &'a Expr)>,
) -> Result<f64, String> {
    match expr {
        Expr::Num(x) => Ok(*x),
        Expr::Neg(a) => Ok(-eval(a, vars, funcs)?),
        Expr::Add(a, b) => Ok(eval(a, vars, funcs)? + eval(b, vars, funcs)?),
        Expr::Sub(a, b) => Ok(eval(a, vars, funcs)? - eval(b, vars, funcs)?),
        Expr::Mul(a, b) => Ok(eval(a, vars, funcs)? * eval(b, vars, funcs)?),
        Expr::Div(a, b) => Ok(eval(a, vars, funcs)? / eval(b, vars, funcs)?),
        Expr::Var(name) => {
            if let Some((_, val)) = vars.iter().rev().find(|(var, _)| *var == name) {
                Ok(*val)
            } else {
                Err(format!("Cannot find variable `{}` in scope", name))
            }
        }
        Expr::Let { name, rhs, then } => {
            let rhs = eval(rhs, vars, funcs)?;
            vars.push((name, rhs));
            let output = eval(then, vars, funcs);
            vars.pop();
            output
        }
        Expr::Call(name, args) => {
            if let Some((_, arg_names, body)) =
                funcs.iter().rev().find(|(var, _, _)| *var == name).copied()
            {
                if arg_names.len() == args.len() {
                    let mut args = args
                        .iter()
                        .map(|arg| eval(arg, vars, funcs))
                        .zip(arg_names.iter())
                        .map(|(val, name)| Ok((name, val?)))
                        .collect::<Result<_, _>>()?;
                    // Remember the argument count before `append` drains `args`,
                    // so the scope is correctly popped afterwards
                    let n_args = args.len();
                    vars.append(&mut args);
                    let output = eval(body, vars, funcs);
                    vars.truncate(vars.len() - n_args);
                    output
                } else {
                    Err(format!(
                        "Wrong number of arguments for function `{}`: expected {}, found {}",
                        name,
                        arg_names.len(),
                        args.len(),
                    ))
                }
            } else {
                Err(format!("Cannot find function `{}` in scope", name))
            }
        }
        Expr::Fn {
            name,
            args,
            body,
            then,
        } => {
            funcs.push((name, args, body));
            let output = eval(then, vars, funcs);
            funcs.pop();
            output
        }
    }
}

fn main() {
    let src = std::fs::read_to_string(std::env::args().nth(1).unwrap()).unwrap();

    match parser().parse(src) {
        Ok(ast) => match eval(&ast, &mut Vec::new(), &mut Vec::new()) {
            Ok(output) => println!("{}", output),
            Err(eval_err) => println!("Evaluation error: {}", eval_err),
        },
        Err(parse_errs) => parse_errs
            .into_iter()
            .for_each(|e| println!("Parse error: {}", e)),
    }
}
chumsky-0.9.3/examples/json.rs000064400000000000000000000142141046102023000144570ustar 00000000000000
//! This is a parser for JSON.
//! Run it with the following command:
//!
cargo run --example json -- examples/sample.json use ariadne::{Color, Fmt, Label, Report, ReportKind, Source}; use chumsky::prelude::*; use std::{collections::HashMap, env, fs}; #[derive(Clone, Debug)] enum Json { Invalid, Null, Bool(bool), Str(String), Num(f64), Array(Vec), Object(HashMap), } fn parser() -> impl Parser> { recursive(|value| { let frac = just('.').chain(text::digits(10)); let exp = just('e') .or(just('E')) .chain(just('+').or(just('-')).or_not()) .chain::(text::digits(10)); let number = just('-') .or_not() .chain::(text::int(10)) .chain::(frac.or_not().flatten()) .chain::(exp.or_not().flatten()) .collect::() .from_str() .unwrapped() .labelled("number"); let escape = just('\\').ignore_then( just('\\') .or(just('/')) .or(just('"')) .or(just('b').to('\x08')) .or(just('f').to('\x0C')) .or(just('n').to('\n')) .or(just('r').to('\r')) .or(just('t').to('\t')) .or(just('u').ignore_then( filter(|c: &char| c.is_digit(16)) .repeated() .exactly(4) .collect::() .validate(|digits, span, emit| { char::from_u32(u32::from_str_radix(&digits, 16).unwrap()) .unwrap_or_else(|| { emit(Simple::custom(span, "invalid unicode character")); '\u{FFFD}' // unicode replacement character }) }), )), ); let string = just('"') .ignore_then(filter(|c| *c != '\\' && *c != '"').or(escape).repeated()) .then_ignore(just('"')) .collect::() .labelled("string"); let array = value .clone() .chain(just(',').ignore_then(value.clone()).repeated()) .or_not() .flatten() .delimited_by(just('['), just(']')) .map(Json::Array) .labelled("array"); let member = string.clone().then_ignore(just(':').padded()).then(value); let object = member .clone() .chain(just(',').padded().ignore_then(member).repeated()) .or_not() .flatten() .padded() .delimited_by(just('{'), just('}')) .collect::>() .map(Json::Object) .labelled("object"); just("null") .to(Json::Null) .labelled("null") .or(just("true").to(Json::Bool(true)).labelled("true")) .or(just("false").to(Json::Bool(false)).labelled("false")) 
.or(number.map(Json::Num)) .or(string.map(Json::Str)) .or(array) .or(object) .recover_with(nested_delimiters('{', '}', [('[', ']')], |_| Json::Invalid)) .recover_with(nested_delimiters('[', ']', [('{', '}')], |_| Json::Invalid)) .recover_with(skip_then_retry_until(['}', ']'])) .padded() }) .then_ignore(end().recover_with(skip_then_retry_until([]))) } fn main() { let src = fs::read_to_string(env::args().nth(1).expect("Expected file argument")) .expect("Failed to read file"); let (json, errs) = parser().parse_recovery(src.trim()); println!("{:#?}", json); errs.into_iter().for_each(|e| { let msg = if let chumsky::error::SimpleReason::Custom(msg) = e.reason() { msg.clone() } else { format!( "{}{}, expected {}", if e.found().is_some() { "Unexpected token" } else { "Unexpected end of input" }, if let Some(label) = e.label() { format!(" while parsing {}", label) } else { String::new() }, if e.expected().len() == 0 { "something else".to_string() } else { e.expected() .map(|expected| match expected { Some(expected) => expected.to_string(), None => "end of input".to_string(), }) .collect::>() .join(", ") }, ) }; let report = Report::build(ReportKind::Error, (), e.span().start) .with_code(3) .with_message(msg) .with_label( Label::new(e.span()) .with_message(match e.reason() { chumsky::error::SimpleReason::Custom(msg) => msg.clone(), _ => format!( "Unexpected {}", e.found() .map(|c| format!("token {}", c.fg(Color::Red))) .unwrap_or_else(|| "end of input".to_string()) ), }) .with_color(Color::Red), ); let report = match e.reason() { chumsky::error::SimpleReason::Unclosed { span, delimiter } => report.with_label( Label::new(span.clone()) .with_message(format!( "Unclosed delimiter {}", delimiter.fg(Color::Yellow) )) .with_color(Color::Yellow), ), chumsky::error::SimpleReason::Unexpected => report, chumsky::error::SimpleReason::Custom(_) => report, }; report.finish().print(Source::from(&src)).unwrap(); }); } 
chumsky-0.9.3/examples/nano_rust.rs000064400000000000000000000554131046102023000155240ustar 00000000000000//! This is an entire parser and interpreter for a dynamically-typed Rust-like expression-oriented //! programming language. See `sample.nrs` for sample source code. //! Run it with the following command: //! cargo run --example nano_rust -- examples/sample.nrs use ariadne::{Color, Fmt, Label, Report, ReportKind, Source}; use chumsky::{prelude::*, stream::Stream}; use std::{collections::HashMap, env, fmt, fs}; pub type Span = std::ops::Range; #[derive(Clone, Debug, PartialEq, Eq, Hash)] enum Token { Null, Bool(bool), Num(String), Str(String), Op(String), Ctrl(char), Ident(String), Fn, Let, Print, If, Else, } impl fmt::Display for Token { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { match self { Token::Null => write!(f, "null"), Token::Bool(x) => write!(f, "{}", x), Token::Num(n) => write!(f, "{}", n), Token::Str(s) => write!(f, "{}", s), Token::Op(s) => write!(f, "{}", s), Token::Ctrl(c) => write!(f, "{}", c), Token::Ident(s) => write!(f, "{}", s), Token::Fn => write!(f, "fn"), Token::Let => write!(f, "let"), Token::Print => write!(f, "print"), Token::If => write!(f, "if"), Token::Else => write!(f, "else"), } } } fn lexer() -> impl Parser, Error = Simple> { // A parser for numbers let num = text::int(10) .chain::(just('.').chain(text::digits(10)).or_not().flatten()) .collect::() .map(Token::Num); // A parser for strings let str_ = just('"') .ignore_then(filter(|c| *c != '"').repeated()) .then_ignore(just('"')) .collect::() .map(Token::Str); // A parser for operators let op = one_of("+-*/!=") .repeated() .at_least(1) .collect::() .map(Token::Op); // A parser for control characters (delimiters, semicolons, etc.) 
let ctrl = one_of("()[]{};,").map(|c| Token::Ctrl(c)); // A parser for identifiers and keywords let ident = text::ident().map(|ident: String| match ident.as_str() { "fn" => Token::Fn, "let" => Token::Let, "print" => Token::Print, "if" => Token::If, "else" => Token::Else, "true" => Token::Bool(true), "false" => Token::Bool(false), "null" => Token::Null, _ => Token::Ident(ident), }); // A single token can be one of the above let token = num .or(str_) .or(op) .or(ctrl) .or(ident) .recover_with(skip_then_retry_until([])); let comment = just("//").then(take_until(just('\n'))).padded(); token .map_with_span(|tok, span| (tok, span)) .padded_by(comment.repeated()) .padded() .repeated() } #[derive(Clone, Debug, PartialEq)] enum Value { Null, Bool(bool), Num(f64), Str(String), List(Vec), Func(String), } impl Value { fn num(self, span: Span) -> Result { if let Value::Num(x) = self { Ok(x) } else { Err(Error { span, msg: format!("'{}' is not a number", self), }) } } } impl std::fmt::Display for Value { fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result { match self { Self::Null => write!(f, "null"), Self::Bool(x) => write!(f, "{}", x), Self::Num(x) => write!(f, "{}", x), Self::Str(x) => write!(f, "{}", x), Self::List(xs) => write!( f, "[{}]", xs.iter() .map(|x| x.to_string()) .collect::>() .join(", ") ), Self::Func(name) => write!(f, "", name), } } } #[derive(Clone, Debug)] enum BinaryOp { Add, Sub, Mul, Div, Eq, NotEq, } pub type Spanned = (T, Span); // An expression node in the AST. Children are spanned so we can generate useful runtime errors. #[derive(Debug)] enum Expr { Error, Value(Value), List(Vec>), Local(String), Let(String, Box>, Box>), Then(Box>, Box>), Binary(Box>, BinaryOp, Box>), Call(Box>, Vec>), If(Box>, Box>, Box>), Print(Box>), } // A function node in the AST. 
#[derive(Debug)] struct Func { args: Vec, body: Spanned, } fn expr_parser() -> impl Parser, Error = Simple> + Clone { recursive(|expr| { let raw_expr = recursive(|raw_expr| { let val = select! { Token::Null => Expr::Value(Value::Null), Token::Bool(x) => Expr::Value(Value::Bool(x)), Token::Num(n) => Expr::Value(Value::Num(n.parse().unwrap())), Token::Str(s) => Expr::Value(Value::Str(s)), } .labelled("value"); let ident = select! { Token::Ident(ident) => ident.clone() }.labelled("identifier"); // A list of expressions let items = expr .clone() .separated_by(just(Token::Ctrl(','))) .allow_trailing(); // A let expression let let_ = just(Token::Let) .ignore_then(ident) .then_ignore(just(Token::Op("=".to_string()))) .then(raw_expr) .then_ignore(just(Token::Ctrl(';'))) .then(expr.clone()) .map(|((name, val), body)| Expr::Let(name, Box::new(val), Box::new(body))); let list = items .clone() .delimited_by(just(Token::Ctrl('[')), just(Token::Ctrl(']'))) .map(Expr::List); // 'Atoms' are expressions that contain no ambiguity let atom = val .or(ident.map(Expr::Local)) .or(let_) .or(list) // In Nano Rust, `print` is just a keyword, just like Python 2, for simplicity .or(just(Token::Print) .ignore_then( expr.clone() .delimited_by(just(Token::Ctrl('(')), just(Token::Ctrl(')'))), ) .map(|expr| Expr::Print(Box::new(expr)))) .map_with_span(|expr, span| (expr, span)) // Atoms can also just be normal expressions, but surrounded with parentheses .or(expr .clone() .delimited_by(just(Token::Ctrl('(')), just(Token::Ctrl(')')))) // Attempt to recover anything that looks like a parenthesised expression but contains errors .recover_with(nested_delimiters( Token::Ctrl('('), Token::Ctrl(')'), [ (Token::Ctrl('['), Token::Ctrl(']')), (Token::Ctrl('{'), Token::Ctrl('}')), ], |span| (Expr::Error, span), )) // Attempt to recover anything that looks like a list but contains errors .recover_with(nested_delimiters( Token::Ctrl('['), Token::Ctrl(']'), [ (Token::Ctrl('('), Token::Ctrl(')')), 
(Token::Ctrl('{'), Token::Ctrl('}')), ], |span| (Expr::Error, span), )); // Function calls have very high precedence so we prioritise them let call = atom .then( items .delimited_by(just(Token::Ctrl('(')), just(Token::Ctrl(')'))) .map_with_span(|args, span: Span| (args, span)) .repeated(), ) .foldl(|f, args| { let span = f.1.start..args.1.end; (Expr::Call(Box::new(f), args.0), span) }); // Product ops (multiply and divide) have equal precedence let op = just(Token::Op("*".to_string())) .to(BinaryOp::Mul) .or(just(Token::Op("/".to_string())).to(BinaryOp::Div)); let product = call .clone() .then(op.then(call).repeated()) .foldl(|a, (op, b)| { let span = a.1.start..b.1.end; (Expr::Binary(Box::new(a), op, Box::new(b)), span) }); // Sum ops (add and subtract) have equal precedence let op = just(Token::Op("+".to_string())) .to(BinaryOp::Add) .or(just(Token::Op("-".to_string())).to(BinaryOp::Sub)); let sum = product .clone() .then(op.then(product).repeated()) .foldl(|a, (op, b)| { let span = a.1.start..b.1.end; (Expr::Binary(Box::new(a), op, Box::new(b)), span) }); // Comparison ops (equal, not-equal) have equal precedence let op = just(Token::Op("==".to_string())) .to(BinaryOp::Eq) .or(just(Token::Op("!=".to_string())).to(BinaryOp::NotEq)); let compare = sum .clone() .then(op.then(sum).repeated()) .foldl(|a, (op, b)| { let span = a.1.start..b.1.end; (Expr::Binary(Box::new(a), op, Box::new(b)), span) }); compare }); // Blocks are expressions but delimited with braces let block = expr .clone() .delimited_by(just(Token::Ctrl('{')), just(Token::Ctrl('}'))) // Attempt to recover anything that looks like a block but contains errors .recover_with(nested_delimiters( Token::Ctrl('{'), Token::Ctrl('}'), [ (Token::Ctrl('('), Token::Ctrl(')')), (Token::Ctrl('['), Token::Ctrl(']')), ], |span| (Expr::Error, span), )); let if_ = recursive(|if_| { just(Token::If) .ignore_then(expr.clone()) .then(block.clone()) .then( just(Token::Else) .ignore_then(block.clone().or(if_)) .or_not(), ) 
.map_with_span(|((cond, a), b), span: Span| { ( Expr::If( Box::new(cond), Box::new(a), Box::new(match b { Some(b) => b, // If an `if` expression has no trailing `else` block, we magic up one that just produces null None => (Expr::Value(Value::Null), span.clone()), }), ), span, ) }) }); // Both blocks and `if` are 'block expressions' and can appear in the place of statements let block_expr = block.or(if_).labelled("block"); let block_chain = block_expr .clone() .then(block_expr.clone().repeated()) .foldl(|a, b| { let span = a.1.start..b.1.end; (Expr::Then(Box::new(a), Box::new(b)), span) }); block_chain // Expressions, chained by semicolons, are statements .or(raw_expr.clone()) .then(just(Token::Ctrl(';')).ignore_then(expr.or_not()).repeated()) .foldl(|a, b| { // This allows creating a span that covers the entire Then expression. // b_end is the end of b if it exists, otherwise it is the end of a. let a_start = a.1.start; let b_end = b.as_ref().map(|b| b.1.end).unwrap_or(a.1.end); ( Expr::Then( Box::new(a), Box::new(match b { Some(b) => b, // Since there is no b expression then its span is empty. 
None => (Expr::Value(Value::Null), b_end..b_end), }), ), a_start..b_end, ) }) }) } fn funcs_parser() -> impl Parser, Error = Simple> + Clone { let ident = filter_map(|span, tok| match tok { Token::Ident(ident) => Ok(ident.clone()), _ => Err(Simple::expected_input_found(span, Vec::new(), Some(tok))), }); // Argument lists are just identifiers separated by commas, surrounded by parentheses let args = ident .clone() .separated_by(just(Token::Ctrl(','))) .allow_trailing() .delimited_by(just(Token::Ctrl('(')), just(Token::Ctrl(')'))) .labelled("function args"); let func = just(Token::Fn) .ignore_then( ident .map_with_span(|name, span| (name, span)) .labelled("function name"), ) .then(args) .then( expr_parser() .delimited_by(just(Token::Ctrl('{')), just(Token::Ctrl('}'))) // Attempt to recover anything that looks like a function body but contains errors .recover_with(nested_delimiters( Token::Ctrl('{'), Token::Ctrl('}'), [ (Token::Ctrl('('), Token::Ctrl(')')), (Token::Ctrl('['), Token::Ctrl(']')), ], |span| (Expr::Error, span), )), ) .map(|((name, args), body)| (name, Func { args, body })) .labelled("function"); func.repeated() .try_map(|fs, _| { let mut funcs = HashMap::new(); for ((name, name_span), f) in fs { if funcs.insert(name.clone(), f).is_some() { return Err(Simple::custom( name_span.clone(), format!("Function '{}' already exists", name), )); } } Ok(funcs) }) .then_ignore(end()) } struct Error { span: Span, msg: String, } fn eval_expr( expr: &Spanned, funcs: &HashMap, stack: &mut Vec<(String, Value)>, ) -> Result { Ok(match &expr.0 { Expr::Error => unreachable!(), // Error expressions only get created by parser errors, so cannot exist in a valid AST Expr::Value(val) => val.clone(), Expr::List(items) => Value::List( items .iter() .map(|item| eval_expr(item, funcs, stack)) .collect::>()?, ), Expr::Local(name) => stack .iter() .rev() .find(|(l, _)| l == name) .map(|(_, v)| v.clone()) .or_else(|| Some(Value::Func(name.clone())).filter(|_| funcs.contains_key(name))) 
.ok_or_else(|| Error { span: expr.1.clone(), msg: format!("No such variable '{}' in scope", name), })?, Expr::Let(local, val, body) => { let val = eval_expr(val, funcs, stack)?; stack.push((local.clone(), val)); let res = eval_expr(body, funcs, stack)?; stack.pop(); res } Expr::Then(a, b) => { eval_expr(a, funcs, stack)?; eval_expr(b, funcs, stack)? } Expr::Binary(a, BinaryOp::Add, b) => Value::Num( eval_expr(a, funcs, stack)?.num(a.1.clone())? + eval_expr(b, funcs, stack)?.num(b.1.clone())?, ), Expr::Binary(a, BinaryOp::Sub, b) => Value::Num( eval_expr(a, funcs, stack)?.num(a.1.clone())? - eval_expr(b, funcs, stack)?.num(b.1.clone())?, ), Expr::Binary(a, BinaryOp::Mul, b) => Value::Num( eval_expr(a, funcs, stack)?.num(a.1.clone())? * eval_expr(b, funcs, stack)?.num(b.1.clone())?, ), Expr::Binary(a, BinaryOp::Div, b) => Value::Num( eval_expr(a, funcs, stack)?.num(a.1.clone())? / eval_expr(b, funcs, stack)?.num(b.1.clone())?, ), Expr::Binary(a, BinaryOp::Eq, b) => { Value::Bool(eval_expr(a, funcs, stack)? == eval_expr(b, funcs, stack)?) } Expr::Binary(a, BinaryOp::NotEq, b) => { Value::Bool(eval_expr(a, funcs, stack)? != eval_expr(b, funcs, stack)?) } Expr::Call(func, args) => { let f = eval_expr(func, funcs, stack)?; match f { Value::Func(name) => { let f = &funcs[&name]; let mut stack = if f.args.len() != args.len() { return Err(Error { span: expr.1.clone(), msg: format!("'{}' called with wrong number of arguments (expected {}, found {})", name, f.args.len(), args.len()), }); } else { f.args .iter() .zip(args.iter()) .map(|(name, arg)| Ok((name.clone(), eval_expr(arg, funcs, stack)?))) .collect::>()? }; eval_expr(&f.body, funcs, &mut stack)? 
} f => { return Err(Error { span: func.1.clone(), msg: format!("'{:?}' is not callable", f), }) } } } Expr::If(cond, a, b) => { let c = eval_expr(cond, funcs, stack)?; match c { Value::Bool(true) => eval_expr(a, funcs, stack)?, Value::Bool(false) => eval_expr(b, funcs, stack)?, c => { return Err(Error { span: cond.1.clone(), msg: format!("Conditions must be booleans, found '{:?}'", c), }) } } } Expr::Print(a) => { let val = eval_expr(a, funcs, stack)?; println!("{}", val); val } }) } fn main() { let src = fs::read_to_string(env::args().nth(1).expect("Expected file argument")) .expect("Failed to read file"); let (tokens, mut errs) = lexer().parse_recovery(src.as_str()); let parse_errs = if let Some(tokens) = tokens { //dbg!(tokens); let len = src.chars().count(); let (ast, parse_errs) = funcs_parser().parse_recovery(Stream::from_iter(len..len + 1, tokens.into_iter())); //dbg!(ast); if let Some(funcs) = ast.filter(|_| errs.len() + parse_errs.len() == 0) { if let Some(main) = funcs.get("main") { assert_eq!(main.args.len(), 0); match eval_expr(&main.body, &funcs, &mut Vec::new()) { Ok(val) => println!("Return value: {}", val), Err(e) => errs.push(Simple::custom(e.span, e.msg)), } } else { panic!("No main function!"); } } parse_errs } else { Vec::new() }; errs.into_iter() .map(|e| e.map(|c| c.to_string())) .chain(parse_errs.into_iter().map(|e| e.map(|tok| tok.to_string()))) .for_each(|e| { let report = Report::build(ReportKind::Error, (), e.span().start); let report = match e.reason() { chumsky::error::SimpleReason::Unclosed { span, delimiter } => report .with_message(format!( "Unclosed delimiter {}", delimiter.fg(Color::Yellow) )) .with_label( Label::new(span.clone()) .with_message(format!( "Unclosed delimiter {}", delimiter.fg(Color::Yellow) )) .with_color(Color::Yellow), ) .with_label( Label::new(e.span()) .with_message(format!( "Must be closed before this {}", e.found() .unwrap_or(&"end of file".to_string()) .fg(Color::Red) )) .with_color(Color::Red), ), 
chumsky::error::SimpleReason::Unexpected => report .with_message(format!( "{}, expected {}", if e.found().is_some() { "Unexpected token in input" } else { "Unexpected end of input" }, if e.expected().len() == 0 { "something else".to_string() } else { e.expected() .map(|expected| match expected { Some(expected) => expected.to_string(), None => "end of input".to_string(), }) .collect::>() .join(", ") } )) .with_label( Label::new(e.span()) .with_message(format!( "Unexpected token {}", e.found() .unwrap_or(&"end of file".to_string()) .fg(Color::Red) )) .with_color(Color::Red), ), chumsky::error::SimpleReason::Custom(msg) => report.with_message(msg).with_label( Label::new(e.span()) .with_message(format!("{}", msg.fg(Color::Red))) .with_color(Color::Red), ), }; report.finish().print(Source::from(&src)).unwrap(); }); } chumsky-0.9.3/examples/pythonic.rs000064400000000000000000000061561046102023000153510ustar 00000000000000use chumsky::{prelude::*, BoxStream, Flat}; use std::ops::Range; // Represents the different kinds of delimiters we care about #[derive(Copy, Clone, Debug)] enum Delim { Paren, Block, } // An 'atomic' token (i.e: it has no child tokens) #[derive(Clone, Debug)] enum Token { Int(u64), Ident(String), Op(String), Open(Delim), Close(Delim), } // The output of the lexer: a recursive tree of nested tokens #[derive(Debug)] enum TokenTree { Token(Token), Tree(Delim, Vec>), } type Span = Range; type Spanned = (T, Span); // A parser that turns pythonic code with semantic whitespace into a token tree fn lexer() -> impl Parser>, Error = Simple> { let tt = recursive(|tt| { // Define some atomic tokens let int = text::int(10).from_str().unwrapped().map(Token::Int); let ident = text::ident().map(Token::Ident); let op = one_of("=.:%,") .repeated() .at_least(1) .collect() .map(Token::Op); let single_token = int.or(op).or(ident).map(TokenTree::Token); // Tokens surrounded by parentheses get turned into parenthesised token trees let token_tree = tt .padded() .repeated() 
.delimited_by(just('('), just(')')) .map(|tts| TokenTree::Tree(Delim::Paren, tts)); single_token .or(token_tree) .map_with_span(|tt, span| (tt, span)) }); // Whitespace indentation creates code block token trees text::semantic_indentation(tt, |tts, span| (TokenTree::Tree(Delim::Block, tts), span)) .then_ignore(end()) } /// Flatten a series of token trees into a single token stream, ready for feeding into the main parser fn tts_to_stream( eoi: Span, token_trees: Vec>, ) -> BoxStream<'static, Token, Span> { use std::iter::once; BoxStream::from_nested(eoi, token_trees.into_iter(), |(tt, span)| match tt { // Single tokens remain unchanged TokenTree::Token(token) => Flat::Single((token, span)), // Nested token trees get flattened into their inner contents, surrounded by `Open` and `Close` tokens TokenTree::Tree(delim, tree) => Flat::Many( once((TokenTree::Token(Token::Open(delim)), span.clone())) .chain(tree.into_iter()) .chain(once((TokenTree::Token(Token::Close(delim)), span))), ), }) } fn main() { let code = include_str!("sample.py"); // First, lex the code into some nested token trees let tts = lexer().parse(code).unwrap(); println!("--- Token Trees ---\n{:#?}", tts); // Next, flatten let eoi = 0..code.chars().count(); let mut token_stream = tts_to_stream(eoi, tts); // At this point, we have a token stream that can be fed into the main parser! Because this is just an example, // we're instead going to just collect the token stream into a vector and print it. let flattened_trees = token_stream.fetch_tokens().collect::>(); println!("--- Flattened Token Trees ---\n{:?}", flattened_trees); } chumsky-0.9.3/examples/sample.bf000064400000000000000000000001341046102023000147260ustar 00000000000000--[>--->->->++>-<<<<<-------]>--.>---------.>--..+++.>----.>+++++++++.<<.+++.------.<-.>>+. 
chumsky-0.9.3/examples/sample.foo000064400000000000000000000001111046102023000151150ustar 00000000000000let five = 5; let eight = 3 + five; fn add x y = x + y; add(five, eight) chumsky-0.9.3/examples/sample.json000064400000000000000000000006641046102023000153200ustar 00000000000000{ "leaving": { "tail": [{ -2063823378.8597813, !true, false, !null,! -153646.6402, "board" ], "fed": -283765067.9149623, "cowboy": --355139449, "although": 794127593.3922591, "front": "college", "origin": 981339097 }, "though": ~true, "invalid": "\uDFFF", "activity": "value", "office": -342325541.1937506, "noise": fallse, "acres": "home", "foo": [}] } chumsky-0.9.3/examples/sample.nrs000064400000000000000000000016621046102023000151500ustar 00000000000000// Run this example with `cargo run --example nano_rust -- examples/sample.nrs` // Feel free to play around with this sample to see what errors you can generate! // Spans are propagated to the interpreted AST so you can even invoke runtime // errors and still have an error message that points to source code emitted! fn mul(x, y) { x * y } // Calculate the factorial of a number fn factorial(x) { // Conditionals are supported! if x == 0 { 1 } else { mul(x, factorial(x - 1)) } } // The main function fn main() { let three = 3; let meaning_of_life = three * 14 + 1; print("Hello, world!"); print("The meaning of life is..."); if meaning_of_life == 42 { print(meaning_of_life); } else { print("...something we cannot know"); print("However, I can tell you that the factorial of 10 is..."); // Function calling print(factorial(10)); } } chumsky-0.9.3/examples/sample.py000064400000000000000000000003111046102023000147640ustar 00000000000000import turtle board = turtle.Turtle( foo, bar, baz, ) for i in range(6): board.forward(50) if i % 2 == 0: board.right(144) else: board.left(72) turtle.done() chumsky-0.9.3/src/chain.rs000064400000000000000000000063631046102023000135470ustar 00000000000000//! Traits that allow chaining parser outputs together. //! //! 
*“And what’s happened to the Earth?” “Ah. It’s been demolished.” “Has it,” said Arthur levelly. “Yes. It just
//! boiled away into space.” “Look,” said Arthur, “I’m a bit upset about that.”*
//!
//! You usually don't need to interact with this trait, or even import it. It's only public so that you can see which
//! types implement it. See [`Parser::chain`](super::Parser) for examples of its usage.

use alloc::{string::String, vec::Vec};

mod private {
    use super::*;

    pub trait Sealed<T> {}

    impl<T> Sealed<T> for T {}
    impl<T, A: Chain<T>> Sealed<T> for (A, T) {}
    impl<T> Sealed<T> for Option<T> {}
    impl<T> Sealed<T> for Vec<T> {}
    impl<T> Sealed<T> for Option<Vec<T>> {}
    impl<T> Sealed<T> for Vec<Option<T>> {}
    impl<T, A: Chain<T>> Sealed<T> for Vec<(A, T)> {}
    impl Sealed<char> for String {}
    impl Sealed<char> for Option<String> {}
}

/// A utility trait that facilitates chaining parser outputs together into [`Vec`]s.
///
/// See [`Parser::chain`](super::Parser).
#[allow(clippy::len_without_is_empty)]
pub trait Chain<T>: private::Sealed<T> {
    /// The number of items that this chain link consists of.
    fn len(&self) -> usize;
    /// Append the elements in this link to the chain.
    fn append_to(self, v: &mut Vec<T>);
}

impl<T> Chain<T> for T {
    fn len(&self) -> usize {
        1
    }
    fn append_to(self, v: &mut Vec<T>) {
        v.push(self);
    }
}

impl<T, A: Chain<T>> Chain<T> for (A, T) {
    fn len(&self) -> usize {
        1
    }
    fn append_to(self, v: &mut Vec<T>) {
        self.0.append_to(v);
        v.push(self.1);
    }
}

impl<T> Chain<T> for Option<T> {
    fn len(&self) -> usize {
        self.is_some() as usize
    }
    fn append_to(self, v: &mut Vec<T>) {
        if let Some(x) = self {
            v.push(x);
        }
    }
}

impl<T> Chain<T> for Vec<T> {
    fn len(&self) -> usize {
        self.as_slice().len()
    }
    fn append_to(mut self, v: &mut Vec<T>) {
        v.append(&mut self);
    }
}

impl Chain<char> for String {
    // TODO: Quite inefficient
    fn len(&self) -> usize {
        self.chars().count()
    }
    fn append_to(self, v: &mut Vec<char>) {
        v.extend(self.chars());
    }
}

impl<T> Chain<T> for Option<Vec<T>> {
    fn len(&self) -> usize {
        self.as_ref().map_or(0, Chain::<T>::len)
    }
    fn append_to(self, v: &mut Vec<T>) {
        if let Some(x) = self {
            x.append_to(v);
        }
    }
}

impl Chain<char> for Option<String> {
    fn len(&self) -> usize {
        self.as_ref().map_or(0, Chain::<char>::len)
    }
    fn append_to(self, v: &mut Vec<char>) {
        if let Some(x) = self {
            x.append_to(v);
        }
    }
}

impl<T> Chain<T> for Vec<Option<T>> {
    fn len(&self) -> usize {
        self.iter().map(Chain::<T>::len).sum()
    }
    fn append_to(self, v: &mut Vec<T>) {
        self.into_iter().for_each(|x| x.append_to(v));
    }
}

impl<T, A: Chain<T>> Chain<T> for Vec<(A, T)> {
    fn len(&self) -> usize {
        self.iter().map(Chain::<T>::len).sum()
    }
    fn append_to(self, v: &mut Vec<T>) {
        self.into_iter().for_each(|x| x.append_to(v));
    }
}
chumsky-0.9.3/src/combinator.rs000064400000000000000000001403311046102023000146140ustar 00000000000000//! Combinators that allow combining and extending existing parsers.
//!
//! *“Ford... you're turning into a penguin. Stop it.”*
//!
//! Although it's *sometimes* useful to be able to name their type, most of these parsers are much easier to work with
//! when accessed through their respective methods on [`Parser`].

use super::*;

/// See [`Parser::ignored`].
pub type Ignored<A, O> = To<A, O, ()>;

/// See [`Parser::ignore_then`].
pub type IgnoreThen = Map, fn((O, U)) -> U, (O, U)>; /// See [`Parser::then_ignore`]. pub type ThenIgnore = Map, fn((O, U)) -> O, (O, U)>; /// See [`Parser::or`]. #[must_use] #[derive(Copy, Clone)] pub struct Or(pub(crate) A, pub(crate) B); impl, B: Parser, E: Error> Parser for Or { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { let pre_state = stream.save(); #[allow(deprecated)] let a_res = debugger.invoke(&self.0, stream); let a_state = stream.save(); // If the first parser succeeded and produced no secondary errors, don't bother trying the second parser // TODO: Perhaps we should *alwaus* take this route, even if recoverable errors did occur? Seems like an // inconsistent application of PEG rules... if a_res.0.is_empty() { if let (a_errors, Ok(a_out)) = a_res { return (a_errors, Ok(a_out)); } } stream.revert(pre_state); #[allow(deprecated)] let b_res = debugger.invoke(&self.1, stream); let b_state = stream.save(); if b_res.0.is_empty() { if let (b_errors, Ok(b_out)) = b_res { return (b_errors, Ok(b_out)); } } #[inline] fn choose_between>( a_res: PResult, a_state: usize, b_res: PResult, b_state: usize, stream: &mut StreamOf, ) -> PResult { fn zip_with R>( a: Option, b: Option, f: F, ) -> Option { match (a, b) { (Some(a), Some(b)) => Some(f(a, b)), _ => None, } } let is_a = match (&a_res, &b_res) { ((a_errors, Ok(a_out)), (b_errors, Ok(b_out))) => { match a_errors.len().cmp(&b_errors.len()) { Ordering::Greater => false, Ordering::Less => true, Ordering::Equal => { match zip_with(a_errors.last(), b_errors.last(), |a, b| a.at.cmp(&b.at)) { Some(Ordering::Greater) => true, Some(Ordering::Less) => false, _ => match zip_with(a_out.1.as_ref(), b_out.1.as_ref(), |a, b| { a.at.cmp(&b.at) }) { Some(Ordering::Greater) => true, Some(Ordering::Less) => false, _ => true, }, } } } } // ((a_errors, Ok(_)), (b_errors, Err(_))) if !a_errors.is_empty() => panic!("a_errors = {:?}", a_errors.iter().map(|e| 
e.debug()).collect::>()), ((_a_errors, Ok(_)), (_b_errors, Err(_))) => true, // ((a_errors, Err(_)), (b_errors, Ok(_))) if !b_errors.is_empty() => panic!("b_errors = {:?}", b_errors.iter().map(|e| e.debug()).collect::>()), ((_a_errors, Err(_)), (_b_errors, Ok(_))) => false, ((a_errors, Err(a_err)), (b_errors, Err(b_err))) => match a_err.at.cmp(&b_err.at) { Ordering::Greater => true, Ordering::Less => false, Ordering::Equal => match a_errors.len().cmp(&b_errors.len()) { Ordering::Greater => false, Ordering::Less => true, Ordering::Equal => { match zip_with(a_errors.last(), b_errors.last(), |a, b| a.at.cmp(&b.at)) { Some(Ordering::Greater) => true, Some(Ordering::Less) => false, // If the branches really do seem to be equally valid as parse options, try to unify them // We already know that both parsers produces hard errors, so unwrapping cannot fail here _ => { return ( a_res.0, Err(a_res.1.err().unwrap().max(b_res.1.err().unwrap())), ) } } } }, }, }; if is_a { stream.revert(a_state); ( a_res.0, a_res.1.map(|(out, alt)| { ( out, merge_alts(alt, b_res.1.map(|(_, alt)| alt).unwrap_or_else(Some)), ) }), ) } else { stream.revert(b_state); ( b_res.0, b_res.1.map(|(out, alt)| { ( out, merge_alts(alt, a_res.1.map(|(_, alt)| alt).unwrap_or_else(Some)), ) }), ) } } choose_between(a_res, a_state, b_res, b_state, stream) } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::or_not`]. 
#[must_use] #[derive(Copy, Clone)] pub struct OrNot(pub(crate) A); impl, E: Error> Parser> for OrNot { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult, E> { match stream.try_parse(|stream| { #[allow(deprecated)] debugger.invoke(&self.0, stream) }) { (errors, Ok((out, alt))) => (errors, Ok((Some(out), alt))), (_, Err(err)) => (Vec::new(), Ok((None, Some(err)))), } } #[inline] fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult, E> { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent( &self, d: &mut Silent, s: &mut StreamOf, ) -> PResult, E> { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::not`]. #[must_use] pub struct Not(pub(crate) A, pub(crate) PhantomData); impl Copy for Not {} impl Clone for Not { fn clone(&self) -> Self { Self(self.0.clone(), PhantomData) } } impl, E: Error> Parser for Not { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { let before = stream.save(); match stream.try_parse(|stream| { #[allow(deprecated)] debugger.invoke(&self.0, stream) }) { (_, Ok(_)) => { stream.revert(before); let (at, span, found) = stream.next(); ( Vec::new(), Err(Located::at( at, E::expected_input_found(span, Vec::new(), found), )), ) } (_, Err(_)) => { stream.revert(before); let (at, span, found) = stream.next(); ( Vec::new(), if let Some(found) = found { Ok((found, None)) } else { Err(Located::at( at, E::expected_input_found(span, Vec::new(), None), )) }, ) } } } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::then`]. 
#[must_use] #[derive(Copy, Clone)] pub struct Then(pub(crate) A, pub(crate) B); impl, B: Parser, E: Error> Parser for Then { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { match { #[allow(deprecated)] debugger.invoke(&self.0, stream) } { (mut a_errors, Ok((a_out, a_alt))) => match { #[allow(deprecated)] debugger.invoke(&self.1, stream) } { (mut b_errors, Ok((b_out, b_alt))) => { a_errors.append(&mut b_errors); (a_errors, Ok(((a_out, b_out), merge_alts(a_alt, b_alt)))) } (mut b_errors, Err(b_err)) => { a_errors.append(&mut b_errors); (a_errors, Err(b_err.max(a_alt))) } }, (a_errors, Err(a_err)) => (a_errors, Err(a_err)), } } #[inline] fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::then_with`] #[must_use] pub struct ThenWith( pub(crate) A, pub(crate) F, pub(crate) PhantomData<(I, O1, O2, B)>, ); impl Clone for ThenWith { fn clone(&self) -> Self { ThenWith(self.0.clone(), self.1.clone(), PhantomData) } } impl Copy for ThenWith {} impl< I: Clone, O1, O2, A: Parser, B: Parser, F: Fn(O1) -> B, E: Error, > Parser for ThenWith { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { let state = stream.save(); match { #[allow(deprecated)] debugger.invoke(&self.0, stream) } { (mut first_errs, Ok((first_out, first_alt))) => { let second_out = self.1(first_out); match { #[allow(deprecated)] debugger.invoke(&second_out, stream) } { (second_errs, Ok((second_out, second_alt))) => { first_errs.extend(second_errs); (first_errs, Ok((second_out, first_alt.or(second_alt)))) } (second_errs, Err(e)) => { stream.revert(state); first_errs.extend(second_errs); (first_errs, Err(e)) } } } (errs, Err(e)) => { stream.revert(state); (errs, Err(e)) } } } 
#[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::delimited_by`]. #[must_use] #[derive(Copy, Clone)] pub struct DelimitedBy { pub(crate) item: A, pub(crate) start: L, pub(crate) end: R, pub(crate) phantom: PhantomData<(U, V)>, } impl< I: Clone, O, A: Parser, L: Parser + Clone, R: Parser + Clone, U, V, E: Error, > Parser for DelimitedBy { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { // TODO: Don't clone! #[allow(deprecated)] let (errors, res) = debugger.invoke( &self .start .clone() .ignore_then(&self.item) .then_ignore(self.end.clone()), stream, ); (errors, res) } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::repeated`]. #[must_use] #[derive(Copy, Clone)] pub struct Repeated(pub(crate) A, pub(crate) usize, pub(crate) Option); impl Repeated { /// Require that the pattern appear at least a minimum number of times. pub fn at_least(mut self, min: usize) -> Self { self.1 = min; self } /// Require that the pattern appear at most a maximum number of times. pub fn at_most(mut self, max: usize) -> Self { self.2 = Some(max); self } /// Require that the pattern appear exactly the given number of times. 
    ///
    /// ```
    /// # use chumsky::prelude::*;
    /// let ring = just::<_, _, Simple<char>>('O');
    ///
    /// let for_the_elves = ring
    ///     .repeated()
    ///     .exactly(3);
    ///
    /// let for_the_dwarves = ring
    ///     .repeated()
    ///     .exactly(6);
    ///
    /// let for_the_humans = ring
    ///     .repeated()
    ///     .exactly(9);
    ///
    /// let for_sauron = ring
    ///     .repeated()
    ///     .exactly(1);
    ///
    /// let rings = for_the_elves
    ///     .then(for_the_dwarves)
    ///     .then(for_the_humans)
    ///     .then(for_sauron)
    ///     .then_ignore(end());
    ///
    /// assert!(rings.parse("OOOOOOOOOOOOOOOOOO").is_err()); // Too few rings!
    /// assert!(rings.parse("OOOOOOOOOOOOOOOOOOOO").is_err()); // Too many rings!
    /// // The perfect number of rings
    /// assert_eq!(
    ///     rings.parse("OOOOOOOOOOOOOOOOOOO"),
    ///     Ok(((((vec!['O'; 3]), vec!['O'; 6]), vec!['O'; 9]), vec!['O'; 1])),
    /// );
    /// ```
    pub fn exactly(mut self, n: usize) -> Self {
        self.1 = n;
        self.2 = Some(n);
        self
    }
}

impl<I: Clone, O, A: Parser<I, O, Error = E>, E: Error<I>> Parser<I, Vec<O>> for Repeated<A> {
    type Error = E;

    #[inline]
    fn parse_inner<D: Debugger>(
        &self,
        debugger: &mut D,
        stream: &mut StreamOf<I, E>,
    ) -> PResult<I, Vec<O>, E> {
        let mut errors = Vec::new();
        let mut outputs = Vec::new();
        let mut alt = None;
        let mut old_offset = None;
        loop {
            if self.2.map_or(false, |max| outputs.len() >= max) {
                break (errors, Ok((outputs, alt)));
            }

            if let ControlFlow::Break(b) = stream.attempt(|stream| match {
                #[allow(deprecated)]
                debugger.invoke(&self.0, stream)
            } {
                (mut a_errors, Ok((a_out, a_alt))) => {
                    errors.append(&mut a_errors);
                    alt = merge_alts(alt.take(), a_alt);
                    outputs.push(a_out);
                    if old_offset == Some(stream.offset()) {
                        panic!("Repeated parser iteration succeeded but consumed no inputs (i.e: continuing \
                        iteration would likely lead to an infinite loop, if the parser is pure). This is \
                        likely indicative of a parser bug.
Consider using a more specific error recovery \ strategy."); } else { old_offset = Some(stream.offset()); } (true, ControlFlow::Continue(())) }, (mut a_errors, Err(a_err)) if outputs.len() < self.1 => { errors.append(&mut a_errors); (true, ControlFlow::Break(( core::mem::take(&mut errors), Err(a_err), ))) }, (a_errors, Err(a_err)) => { // Find furthest alternative error // TODO: Handle multiple alternative errors // TODO: Should we really be taking *all* of these into consideration? let alt = merge_alts( alt.take(), merge_alts( Some(a_err), a_errors.into_iter().next(), ), ); (false, ControlFlow::Break(( core::mem::take(&mut errors), Ok((core::mem::take(&mut outputs), alt)), ))) }, }) { break b; } } } #[inline] fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult, E> { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult, E> { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::separated_by`]. #[must_use] pub struct SeparatedBy { pub(crate) item: A, pub(crate) delimiter: B, pub(crate) at_least: usize, pub(crate) at_most: Option, pub(crate) allow_leading: bool, pub(crate) allow_trailing: bool, pub(crate) phantom: PhantomData, } impl SeparatedBy { /// Allow a leading separator to appear before the first item. /// /// Note that even if no items are parsed, a leading separator *is* permitted. 
    ///
    /// # Examples
    ///
    /// ```
    /// # use chumsky::prelude::*;
    /// let r#enum = text::keyword::<_, _, Simple<char>>("enum")
    ///     .padded()
    ///     .ignore_then(text::ident()
    ///         .padded()
    ///         .separated_by(just('|'))
    ///         .allow_leading());
    ///
    /// assert_eq!(r#enum.parse("enum True | False"), Ok(vec!["True".to_string(), "False".to_string()]));
    /// assert_eq!(r#enum.parse("
    ///     enum
    ///     | True
    ///     | False
    /// "), Ok(vec!["True".to_string(), "False".to_string()]));
    /// ```
    pub fn allow_leading(mut self) -> Self {
        self.allow_leading = true;
        self
    }

    /// Allow a trailing separator to appear after the last item.
    ///
    /// Note that if no items are parsed, no trailing separator is permitted.
    ///
    /// # Examples
    ///
    /// ```
    /// # use chumsky::prelude::*;
    /// let numbers = text::int::<_, Simple<char>>(10)
    ///     .padded()
    ///     .separated_by(just(','))
    ///     .allow_trailing()
    ///     .delimited_by(just('('), just(')'));
    ///
    /// assert_eq!(numbers.parse("(1, 2)"), Ok(vec!["1".to_string(), "2".to_string()]));
    /// assert_eq!(numbers.parse("(1, 2,)"), Ok(vec!["1".to_string(), "2".to_string()]));
    /// ```
    pub fn allow_trailing(mut self) -> Self {
        self.allow_trailing = true;
        self
    }

    /// Require that the pattern appear at least a minimum number of times.
    ///
    /// ```
    /// # use chumsky::prelude::*;
    /// let numbers = just::<_, _, Simple<char>>('-')
    ///     .separated_by(just('.'))
    ///     .at_least(2);
    ///
    /// assert!(numbers.parse("").is_err());
    /// assert!(numbers.parse("-").is_err());
    /// assert_eq!(numbers.parse("-.-"), Ok(vec!['-', '-']));
    /// ```
    pub fn at_least(mut self, n: usize) -> Self {
        self.at_least = n;
        self
    }

    /// Require that the pattern appear at most a maximum number of times.
    ///
    /// ```
    /// # use chumsky::prelude::*;
    /// let row_4 = text::int::<_, Simple<char>>(10)
    ///     .padded()
    ///     .separated_by(just(','))
    ///     .at_most(4);
    ///
    /// let matrix_4x4 = row_4
    ///     .separated_by(just(','))
    ///     .at_most(4);
    ///
    /// assert_eq!(
    ///     matrix_4x4.parse("0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15"),
    ///     Ok(vec![
    ///         vec!["0".to_string(), "1".to_string(), "2".to_string(), "3".to_string()],
    ///         vec!["4".to_string(), "5".to_string(), "6".to_string(), "7".to_string()],
    ///         vec!["8".to_string(), "9".to_string(), "10".to_string(), "11".to_string()],
    ///         vec!["12".to_string(), "13".to_string(), "14".to_string(), "15".to_string()],
    ///     ]),
    /// );
    /// ```
    pub fn at_most(mut self, n: usize) -> Self {
        self.at_most = Some(n);
        self
    }

    /// Require that the pattern appear exactly the given number of times.
    ///
    /// ```
    /// # use chumsky::prelude::*;
    /// let coordinate_3d = text::int::<_, Simple<char>>(10)
    ///     .padded()
    ///     .separated_by(just(','))
    ///     .exactly(3)
    ///     .then_ignore(end());
    ///
    /// // Not enough elements
    /// assert!(coordinate_3d.parse("4, 3").is_err());
    /// // Too many elements
    /// assert!(coordinate_3d.parse("7, 2, 13, 4").is_err());
    /// // Just the right number of elements
    /// assert_eq!(coordinate_3d.parse("5, 0, 12"), Ok(vec!["5".to_string(), "0".to_string(), "12".to_string()]));
    /// ```
    pub fn exactly(mut self, n: usize) -> Self {
        self.at_least = n;
        self.at_most = Some(n);
        self
    }
}

impl<A: Copy, B: Copy, U> Copy for SeparatedBy<A, B, U> {}
impl<A: Clone, B: Clone, U> Clone for SeparatedBy<A, B, U> {
    fn clone(&self) -> Self {
        Self {
            item: self.item.clone(),
            delimiter: self.delimiter.clone(),
            at_least: self.at_least,
            at_most: self.at_most,
            allow_leading: self.allow_leading,
            allow_trailing: self.allow_trailing,
            phantom: PhantomData,
        }
    }
}

impl<I: Clone, O, U, A: Parser<I, O, Error = E>, B: Parser<I, U, Error = E>, E: Error<I>>
    Parser<I, Vec<O>> for SeparatedBy<A, B, U>
{
    type Error = E;

    #[inline]
    fn parse_inner<D: Debugger>(
        &self,
        debugger: &mut D,
        stream: &mut StreamOf<I, E>,
    ) -> PResult<I, Vec<O>, E> {
        if let Some(at_most) = self.at_most {
            assert!(
                self.at_least <= at_most,
                "SeparatedBy cannot parse at least {} and
at most {}", self.at_least, at_most ); } enum State { Terminated(Located), Continue, } fn parse_or_not, I: Clone, E: Error, D: Debugger>( delimiter: &B, stream: &mut StreamOf, debugger: &mut D, alt: Option>, ) -> Option> { match stream.try_parse(|stream| { #[allow(deprecated)] debugger.invoke(&delimiter, stream) }) { // These two paths are successful path so the furthest errors are merged with the alt. (d_errors, Ok((_, d_alt))) => merge_alts(alt, merge_alts(d_alt, d_errors)), (d_errors, Err(d_err)) => merge_alts(alt, merge_alts(Some(d_err), d_errors)), } } fn parse, I: Clone, E: Error, D: Debugger>( item: &A, stream: &mut StreamOf, debugger: &mut D, outputs: &mut Vec, errors: &mut Vec>, alt: Option>, ) -> (State, Option>) { match stream.try_parse(|stream| { #[allow(deprecated)] debugger.invoke(item, stream) }) { (mut i_errors, Ok((i_out, i_alt))) => { outputs.push(i_out); errors.append(&mut i_errors); (State::Continue, merge_alts(alt, i_alt)) } (mut i_errors, Err(i_err)) => { errors.append(&mut i_errors); (State::Terminated(i_err), alt) } } } let mut outputs = Vec::new(); let mut errors = Vec::new(); let mut alt = None; if self.allow_leading { alt = parse_or_not(&self.delimiter, stream, debugger, alt); } let (mut state, mut alt) = parse(&self.item, stream, debugger, &mut outputs, &mut errors, alt); let mut offset = stream.save(); let error: Option>; loop { if let State::Terminated(err) = state { error = Some(err); break; } offset = stream.save(); if self .at_most .map_or(false, |at_most| outputs.len() >= at_most) { error = None; break; } match stream.try_parse(|stream| { #[allow(deprecated)] debugger.invoke(&self.delimiter, stream) }) { (mut d_errors, Ok((_, d_alt))) => { errors.append(&mut d_errors); alt = merge_alts(alt, d_alt); let (i_state, i_alt) = parse(&self.item, stream, debugger, &mut outputs, &mut errors, alt); state = i_state; alt = i_alt; } (mut d_errors, Err(d_err)) => { errors.append(&mut d_errors); state = State::Terminated(d_err); } } } 
stream.revert(offset); if self.allow_trailing && !outputs.is_empty() { alt = parse_or_not(&self.delimiter, stream, debugger, alt); } if outputs.len() >= self.at_least { alt = merge_alts(alt, error); (errors, Ok((outputs, alt))) } else if let Some(error) = error { // In all paths where `State = State::Terminated`, Some(err) is inserted into alt. (errors, Err(error)) } else { (errors, Ok((outputs, alt))) } } #[inline] fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult, E> { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult, E> { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::debug`]. #[must_use] pub struct Debug( pub(crate) A, pub(crate) Rc, pub(crate) core::panic::Location<'static>, ); impl Clone for Debug { fn clone(&self) -> Self { Self(self.0.clone(), self.1.clone(), self.2) } } impl, E: Error> Parser for Debug { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { debugger.scope( || ParserInfo::new("Name", self.1.clone(), self.2), |debugger| { #[allow(deprecated)] let (errors, res) = debugger.invoke(&self.0, stream); (errors, res) }, ) } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::map`]. 
#[must_use] pub struct Map(pub(crate) A, pub(crate) F, pub(crate) PhantomData); impl Copy for Map {} impl Clone for Map { fn clone(&self) -> Self { Self(self.0.clone(), self.1.clone(), PhantomData) } } impl, U, F: Fn(O) -> U, E: Error> Parser for Map { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { #[allow(deprecated)] let (errors, res) = debugger.invoke(&self.0, stream); (errors, res.map(|(out, alt)| ((&self.1)(out), alt))) } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::map_with_span`]. #[must_use] pub struct MapWithSpan(pub(crate) A, pub(crate) F, pub(crate) PhantomData); impl Copy for MapWithSpan {} impl Clone for MapWithSpan { fn clone(&self) -> Self { Self(self.0.clone(), self.1.clone(), PhantomData) } } impl, U, F: Fn(O, E::Span) -> U, E: Error> Parser for MapWithSpan { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { let start = stream.save(); #[allow(deprecated)] let (errors, res) = debugger.invoke(&self.0, stream); ( errors, res.map(|(out, alt)| ((self.1)(out, stream.span_since(start)), alt)), ) } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::validate`]. 
#[must_use] pub struct Validate(pub(crate) A, pub(crate) F, pub(crate) PhantomData); impl Copy for Validate {} impl Clone for Validate { fn clone(&self) -> Self { Self(self.0.clone(), self.1.clone(), PhantomData) } } impl< I: Clone, O, U, A: Parser, F: Fn(O, E::Span, &mut dyn FnMut(E)) -> U, E: Error, > Parser for Validate { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { let start = stream.save(); #[allow(deprecated)] let (mut errors, res) = debugger.invoke(&self.0, stream); let pos = stream.save(); let span = stream.span_since(start); let res = res.map(|(out, alt)| { ( (&self.1)(out, span, &mut |e| errors.push(Located::at(pos, e))), alt, ) }); (errors, res) } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::foldl`]. #[must_use] pub struct Foldl(pub(crate) A, pub(crate) F, pub(crate) PhantomData<(O, U)>); impl Copy for Foldl {} impl Clone for Foldl { fn clone(&self) -> Self { Self(self.0.clone(), self.1.clone(), PhantomData) } } impl< I: Clone, O, A: Parser, U: IntoIterator, F: Fn(O, U::Item) -> O, E: Error, > Parser for Foldl { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { #[allow(deprecated)] debugger.invoke( &(&self.0).map(|(head, tail)| tail.into_iter().fold(head, &self.1)), stream, ) } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::foldr`]. 
#[must_use] pub struct Foldr(pub(crate) A, pub(crate) F, pub(crate) PhantomData<(O, U)>); impl Copy for Foldr {} impl Clone for Foldr { fn clone(&self) -> Self { Self(self.0.clone(), self.1.clone(), PhantomData) } } impl< I: Clone, O: IntoIterator, A: Parser, U, F: Fn(O::Item, U) -> U, E: Error, > Parser for Foldr where O::IntoIter: DoubleEndedIterator, { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { #[allow(deprecated)] debugger.invoke( &(&self.0).map(|(init, end)| init.into_iter().rev().fold(end, |b, a| (&self.1)(a, b))), stream, ) } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::map_err`]. #[must_use] #[derive(Copy, Clone)] pub struct MapErr(pub(crate) A, pub(crate) F); impl, F: Fn(E) -> E, E: Error> Parser for MapErr { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { #[allow(deprecated)] let (errors, res) = debugger.invoke(&self.0, stream); let mapper = |e: Located| e.map(&self.1); ( errors, //errors.into_iter().map(mapper).collect(), res /*.map(|(out, alt)| (out, alt.map(mapper)))*/ .map_err(mapper), ) } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::map_err_with_span`]. 
#[must_use] #[derive(Copy, Clone)] pub struct MapErrWithSpan(pub(crate) A, pub(crate) F); impl, F: Fn(E, E::Span) -> E, E: Error> Parser for MapErrWithSpan { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { let start = stream.save(); #[allow(deprecated)] let (errors, res) = debugger.invoke(&self.0, stream); let mapper = |e: Located| { let at = e.at; e.map(|e| { let span = stream.attempt(|stream| { stream.revert(at); (false, stream.span_since(start)) }); (self.1)(e, span) }) }; (errors, res.map_err(mapper)) } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::try_map`]. #[must_use] pub struct TryMap(pub(crate) A, pub(crate) F, pub(crate) PhantomData); impl Copy for TryMap {} impl Clone for TryMap { fn clone(&self) -> Self { Self(self.0.clone(), self.1.clone(), PhantomData) } } impl< I: Clone, O, A: Parser, U, F: Fn(O, E::Span) -> Result, E: Error, > Parser for TryMap { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { let start = stream.save(); #[allow(deprecated)] let (errors, res) = debugger.invoke(&self.0, stream); let res = match res.map(|(out, alt)| ((&self.1)(out, stream.span_since(start)), alt)) { Ok((Ok(out), alt)) => Ok((out, alt)), Ok((Err(a_err), _)) => Err(Located::at(stream.save(), a_err)), Err(err) => Err(err), }; (errors, res) } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::or_else`]. 
#[must_use] #[derive(Copy, Clone)] pub struct OrElse(pub(crate) A, pub(crate) F); impl, F: Fn(E) -> Result, E: Error> Parser for OrElse { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { let start = stream.save(); #[allow(deprecated)] let (errors, res) = debugger.invoke(&self.0, stream); let res = match res { Ok(out) => Ok(out), Err(err) => match (&self.1)(err.error) { Err(e) => Err(Located { at: err.at, error: e, phantom: PhantomData, }), Ok(out) => { stream.revert(start); Ok((out, None)) }, }, }; (errors, res) } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::labelled`]. #[must_use] #[derive(Copy, Clone)] pub struct Label(pub(crate) A, pub(crate) L); impl, L: Into + Clone, E: Error> Parser for Label { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { // let pre_state = stream.save(); #[allow(deprecated)] let (errors, res) = debugger.invoke(&self.0, stream); let res = res .map(|(o, alt)| { ( o, alt.map(|l| l.map(|e| e.with_label(self.1.clone().into()))), ) }) .map_err(|e| { /* TODO: Not this? */ /*if e.at > pre_state {*/ // Only add the label if we committed to this pattern somewhat e.map(|e| e.with_label(self.1.clone().into())) /*} else { e }*/ }); ( errors .into_iter() .map(|e| e.map(|e| e.with_label(self.1.clone().into()))) .collect(), res, ) } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::to`]. 
#[must_use] pub struct To(pub(crate) A, pub(crate) U, pub(crate) PhantomData); impl Copy for To {} impl Clone for To { fn clone(&self) -> Self { Self(self.0.clone(), self.1.clone(), PhantomData) } } impl, U: Clone, E: Error> Parser for To { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { #[allow(deprecated)] debugger.invoke(&(&self.0).map(|_| self.1.clone()), stream) } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::rewind`]. #[must_use] #[derive(Copy, Clone)] pub struct Rewind(pub(crate) A); impl, A> Parser for Rewind where A: Parser, { type Error = E; fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult where Self: Sized, { let rewind_from = stream.save(); match { #[allow(deprecated)] debugger.invoke(&self.0, stream) } { (errors, Ok((out, alt))) => { stream.revert(rewind_from); (errors, Ok((out, alt))) } (errors, Err(err)) => (errors, Err(err)), } } fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent( &self, d: &mut Silent, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::unwrapped`] #[must_use] pub struct Unwrapped( pub(crate) &'static Location<'static>, pub(crate) A, pub(crate) PhantomData<(U, E)>, ); impl Clone for Unwrapped { fn clone(&self) -> Self { Unwrapped(self.0, self.1.clone(), PhantomData) } } impl Copy for Unwrapped {} impl, Error = E>, U: fmt::Debug, E: Error> Parser for Unwrapped { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { #[allow(deprecated)] let (errors, res) = debugger.invoke(&self.1, stream); ( errors, res.map(|(out, alt)| { 
                (
                    out.unwrap_or_else(|err| {
                        panic!(
                            "Parser defined at {} failed to unwrap. Error: {:?}",
                            self.0, err
                        )
                    }),
                    alt,
                )
            }),
        )
    }

    #[inline]
    fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf<I, E>) -> PResult<I, U, E> {
        #[allow(deprecated)]
        self.parse_inner(d, s)
    }

    #[inline]
    fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf<I, E>) -> PResult<I, U, E> {
        #[allow(deprecated)]
        self.parse_inner(d, s)
    }
}

#[cfg(test)]
mod tests {
    use alloc::vec;
    use error::Simple;
    use text::TextParser;

    use super::*;

    #[test]
    fn delimited_by_complex() {
        let parser = just::<_, _, Simple<char>>('-')
            .delimited_by(text::ident().padded(), text::int(10).padded())
            .separated_by(just(','));
        assert_eq!(
            parser.parse("one - 1,two - 2,three - 3"),
            Ok(vec!['-', '-', '-'])
        );
    }

    #[test]
    fn separated_by_at_least() {
        let parser = just::<_, _, Simple<char>>('-')
            .separated_by(just(','))
            .at_least(3);
        assert_eq!(parser.parse("-,-,-"), Ok(vec!['-', '-', '-']));
    }

    #[test]
    fn separated_by_at_least_without_leading() {
        let parser = just::<_, _, Simple<char>>('-')
            .separated_by(just(','))
            .at_least(3);
        assert!(parser.parse(",-,-,-").is_err());
    }

    #[test]
    fn separated_by_at_least_without_trailing() {
        let parser = just::<_, _, Simple<char>>('-')
            .separated_by(just(','))
            .at_least(3)
            .then(end());
        assert!(parser.parse("-,-,-,").is_err());
    }

    #[test]
    fn separated_by_at_least_with_leading() {
        let parser = just::<_, _, Simple<char>>('-')
            .separated_by(just(','))
            .allow_leading()
            .at_least(3);
        assert_eq!(parser.parse(",-,-,-"), Ok(vec!['-', '-', '-']));
        assert!(parser.parse(",-,-").is_err());
    }

    #[test]
    fn separated_by_at_least_with_trailing() {
        let parser = just::<_, _, Simple<char>>('-')
            .separated_by(just(','))
            .allow_trailing()
            .at_least(3);
        assert_eq!(parser.parse("-,-,-,"), Ok(vec!['-', '-', '-']));
        assert!(parser.parse("-,-,").is_err());
    }

    #[test]
    fn separated_by_leaves_last_separator() {
        let parser = just::<_, _, Simple<char>>('-')
            .separated_by(just(','))
            .chain(just(','));
        assert_eq!(parser.parse("-,-,-,"), Ok(vec!['-', '-', '-', ',']))
    }
}
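The `separated_by` tests above exercise how `at_least`, `allow_leading`, and `allow_trailing` interact. As a crate-free illustration of those counting rules (a hypothetical mini-implementation over plain `char`s, not chumsky's actual machinery), the same acceptance behaviour can be sketched as:

```rust
/// Parse `item` characters separated by `sep`, returning `None` unless at
/// least `at_least` items are found and the whole input is consumed.
/// A leading/trailing separator is accepted only when the matching flag is set.
fn separated_by(
    input: &str,
    item: char,
    sep: char,
    at_least: usize,
    allow_leading: bool,
    allow_trailing: bool,
) -> Option<Vec<char>> {
    let mut rest = input.chars().peekable();
    let mut out = Vec::new();

    // Optional leading separator.
    if allow_leading && rest.peek() == Some(&sep) {
        rest.next();
    }

    // First item, then `(sep item)*` pairs. A separator is only consumed when
    // it is immediately followed by another item, which is why a trailing
    // separator is rejected by default.
    if rest.peek() == Some(&item) {
        rest.next();
        out.push(item);
        loop {
            let mut look = rest.clone();
            if look.next() == Some(sep) && look.next() == Some(item) {
                rest.next();
                rest.next();
                out.push(item);
            } else {
                break;
            }
        }
    }

    // Optional trailing separator, permitted only after at least one item.
    if allow_trailing && !out.is_empty() && rest.peek() == Some(&sep) {
        rest.next();
    }

    // Succeed only on full consumption and a satisfied lower bound.
    if rest.next().is_none() && out.len() >= at_least {
        Some(out)
    } else {
        None
    }
}

fn main() {
    // Mirrors `separated_by_at_least`: three items succeed...
    assert_eq!(separated_by("-,-,-", '-', ',', 3, false, false), Some(vec!['-', '-', '-']));
    // ...but a leading separator fails unless `allow_leading` is set...
    assert_eq!(separated_by(",-,-,-", '-', ',', 3, false, false), None);
    assert_eq!(separated_by(",-,-,-", '-', ',', 3, true, false), Some(vec!['-', '-', '-']));
    // ...and a trailing separator needs `allow_trailing`.
    assert_eq!(separated_by("-,-,-,", '-', ',', 3, false, true), Some(vec!['-', '-', '-']));
}
```

The real combinator additionally threads accumulated errors and "alternative" furthest-error state through every step, but the acceptance rules for separators and minimum counts match the tests above.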
chumsky-0.9.3/src/debug.rs

//! Utilities for debugging parsers.
//!
//! *“He was staring at the instruments with the air of one who is trying to convert Fahrenheit to centigrade in his
//! head while his house is burning down.”*

use super::*;
use alloc::borrow::Cow;
use core::panic::Location;

/// Information about a specific parser.
#[allow(dead_code)]
pub struct ParserInfo {
    name: Cow<'static, str>,
    display: Rc<dyn fmt::Display>,
    location: Location<'static>,
}

impl ParserInfo {
    pub(crate) fn new(
        name: impl Into<Cow<'static, str>>,
        display: Rc<dyn fmt::Display>,
        location: Location<'static>,
    ) -> Self {
        Self {
            name: name.into(),
            display,
            location,
        }
    }
}

/// An event that occurred during parsing.
pub enum ParseEvent {
    /// Debugging information was emitted.
    Info(String),
}

/// A trait implemented by parser debuggers.
#[deprecated(
    note = "This trait is excluded from the semver guarantees of chumsky. If you decide to use it, broken builds are your fault."
)]
pub trait Debugger {
    /// Create a new debugging scope.
    fn scope<R, Info: FnOnce() -> ParserInfo, F: FnOnce(&mut Self) -> R>(
        &mut self,
        info: Info,
        f: F,
    ) -> R;
    /// Emit a parse event, if the debugger supports them.
    fn emit_with<F: FnOnce() -> ParseEvent>(&mut self, f: F);
    /// Invoke the given parser with a mode specific to this debugger.
    fn invoke<I: Clone, O, P: Parser<I, O> + ?Sized>(
        &mut self,
        parser: &P,
        stream: &mut StreamOf<I, P::Error>,
    ) -> PResult<I, O, P::Error>;
}

/// A verbose debugger that emits debugging messages to the console.
pub struct Verbose {
    // TODO: Don't use `Result`, that's silly
    events: Vec<Result<ParseEvent, (ParserInfo, Verbose)>>,
}

impl Verbose {
    pub(crate) fn new() -> Self {
        Self { events: Vec::new() }
    }

    #[allow(unused_variables)]
    fn print_inner(&self, depth: usize) {
        // a no-op on no_std!
#[cfg(feature = "std")] for event in &self.events { for _ in 0..depth * 4 { print!(" "); } match event { Ok(ParseEvent::Info(s)) => println!("{}", s), Err((info, scope)) => { println!( "Entered {} at line {} in {}", info.display, info.location.line(), info.location.file() ); scope.print_inner(depth + 1); } } } } pub(crate) fn print(&self) { self.print_inner(0) } } impl Debugger for Verbose { fn scope ParserInfo, F: FnOnce(&mut Self) -> R>( &mut self, info: Info, f: F, ) -> R { let mut verbose = Verbose { events: Vec::new() }; let res = f(&mut verbose); self.events.push(Err((info(), verbose))); res } fn emit_with ParseEvent>(&mut self, f: F) { self.events.push(Ok(f())); } fn invoke + ?Sized>( &mut self, parser: &P, stream: &mut StreamOf, ) -> PResult { parser.parse_inner_verbose(self, stream) } } /// A silent debugger that emits no debugging messages nor collects any debugging data. pub struct Silent { phantom: PhantomData<()>, } impl Silent { pub(crate) fn new() -> Self { Self { phantom: PhantomData, } } } impl Debugger for Silent { fn scope ParserInfo, F: FnOnce(&mut Self) -> R>( &mut self, _: Info, f: F, ) -> R { f(self) } fn emit_with ParseEvent>(&mut self, _: F) {} fn invoke + ?Sized>( &mut self, parser: &P, stream: &mut StreamOf, ) -> PResult { parser.parse_inner_silent(self, stream) } } chumsky-0.9.3/src/error.rs000064400000000000000000000401571046102023000136150ustar 00000000000000//! Error types, traits and utilities. //! //! *“I like the cover," he said. "Don't Panic. It's the first helpful or intelligible thing anybody's said to me all //! day.”* //! //! You can implement the [`Error`] trait to create your own parser errors, or you can use one provided by the crate //! like [`Simple`] or [`Cheap`]. 
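Before the trait definition itself, here is a crate-free sketch of the merging behaviour that the `Error::merge` method below is meant to provide (the names here are illustrative, not chumsky's API): when two alternatives of a parser fail at the same input position, their expected-token sets are pooled so the reported error covers both branches rather than only the last one tried.

```rust
/// A toy "expected X, found Y" error, in the spirit of the trait below.
#[derive(Debug, PartialEq)]
struct ExpectedFound {
    span: std::ops::Range<usize>,
    expected: Vec<Option<char>>, // `None` means "end of input was expected"
    found: Option<char>,         // `None` means "end of input was found"
}

impl ExpectedFound {
    /// Combine two errors that point at the same position by pooling their
    /// expectations (deduplicated, so repeated alternatives appear once).
    fn merge(mut self, mut other: Self) -> Self {
        self.expected.append(&mut other.expected);
        self.expected.sort();
        self.expected.dedup();
        self
    }
}

fn main() {
    // Two branches of an `or` both fail on 'x' at position 3..4:
    let from_digit_branch = ExpectedFound { span: 3..4, expected: vec![Some('0')], found: Some('x') };
    let from_ident_branch = ExpectedFound { span: 3..4, expected: vec![Some('a')], found: Some('x') };
    let merged = from_digit_branch.merge(from_ident_branch);
    // The merged error mentions both expectations.
    assert_eq!(merged.expected, vec![Some('0'), Some('a')]);
}
```

Chumsky's built-in `Simple` error merges in essentially this way (using a hash set of expectations), which is why alternation reports "expected one of ..." rather than a single token.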
use super::*;
use alloc::{format, string::ToString};
use core::hash::Hash;

#[cfg(not(feature = "std"))]
use hashbrown::HashSet;
#[cfg(feature = "std")]
use std::collections::HashSet;

// (ahash + std) => ahash
// (ahash) => ahash
// (std) => std
// () => ahash
#[cfg(any(feature = "ahash", not(feature = "std")))]
type RandomState = hashbrown::hash_map::DefaultHashBuilder;
#[cfg(all(not(feature = "ahash"), feature = "std"))]
type RandomState = std::collections::hash_map::RandomState;

/// A trait that describes parser error types.
///
/// If you have a custom error type in your compiler, or your needs are not sufficiently met by [`Simple`], you should
/// implement this trait. If your error type has 'extra' features that allow for more specific error messages, you can
/// use the [`Parser::map_err`] or [`Parser::try_map`] functions to take advantage of these inline within your parser.
///
/// # Examples
///
/// ```
/// # use chumsky::{prelude::*, error::Cheap};
/// type Span = std::ops::Range<usize>;
///
/// // A custom error type
/// #[derive(Debug, PartialEq)]
/// enum MyError {
///     ExpectedFound(Span, Vec<Option<char>>, Option<char>),
///     NotADigit(Span, char),
/// }
///
/// impl chumsky::Error<char> for MyError {
///     type Span = Span;
///     type Label = ();
///
///     fn expected_input_found<Iter: IntoIterator<Item = Option<char>>>(
///         span: Span,
///         expected: Iter,
///         found: Option<char>,
///     ) -> Self {
///         Self::ExpectedFound(span, expected.into_iter().collect(), found)
///     }
///
///     fn with_label(mut self, label: Self::Label) -> Self { self }
///
///     fn merge(mut self, mut other: Self) -> Self {
///         if let (Self::ExpectedFound(_, expected, _), Self::ExpectedFound(_, expected_other, _)) = (
///             &mut self,
///             &mut other,
///         ) {
///             expected.append(expected_other);
///         }
///         self
///     }
/// }
///
/// let numeral = filter_map(|span, c: char| match c.to_digit(10) {
///     Some(x) => Ok(x),
///     None => Err(MyError::NotADigit(span, c)),
/// });
///
/// assert_eq!(numeral.parse("3"), Ok(3));
/// assert_eq!(numeral.parse("7"), Ok(7));
///
assert_eq!(numeral.parse("f"), Err(vec![MyError::NotADigit(0..1, 'f')])); /// ``` pub trait Error: Sized { /// The type of spans to be used in the error. type Span: Span; // TODO: Default to = Range; /// The label used to describe a syntactic structure currently being parsed. /// /// This can be used to generate errors that tell the user what syntactic structure was currently being parsed when /// the error occurred. type Label; // TODO: Default to = &'static str; /// Create a new error describing a conflict between expected inputs and that which was actually found. /// /// `found` having the value `None` indicates that the end of input was reached, but was not expected. /// /// An expected input having the value `None` indicates that the end of input was expected. fn expected_input_found>>( span: Self::Span, expected: Iter, found: Option, ) -> Self; /// Create a new error describing a delimiter that was not correctly closed. /// /// Provided to this function is the span of the unclosed delimiter, the delimiter itself, the span of the input /// that was found in its place, the closing delimiter that was expected but not found, and the input that was /// found in its place. /// /// The default implementation of this function uses [`Error::expected_input_found`], but you'll probably want to /// implement it yourself to take full advantage of the extra diagnostic information. fn unclosed_delimiter( unclosed_span: Self::Span, unclosed: I, span: Self::Span, expected: I, found: Option, ) -> Self { #![allow(unused_variables)] Self::expected_input_found(span, Some(Some(expected)), found) } /// Indicate that the error occurred while parsing a particular syntactic structure. /// /// How the error handles this information is up to it. It can append it to a list of structures to get a sort of /// 'parse backtrace', or it can just keep only the most recent label. If the latter, this method should have no /// effect when the error already has a label. 
fn with_label(self, label: Self::Label) -> Self; /// Merge two errors that point to the same input together, combining their information. fn merge(self, other: Self) -> Self; } // /// A simple default input pattern that allows describing inputs and input patterns in error messages. // #[derive(Clone, Debug, PartialEq, Eq, Hash)] // pub enum SimplePattern { // /// A pattern with the given name was expected. // Labelled(&'static str), // /// A specific input was expected. // Token(I), // } // impl From<&'static str> for SimplePattern { // fn from(s: &'static str) -> Self { Self::Labelled(s) } // } // impl fmt::Display for SimplePattern { // fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { // match self { // Self::Labelled(s) => write!(f, "{}", s), // Self::Token(x) => write!(f, "'{}'", x), // } // } // } /// A type representing possible reasons for an error. #[derive(Clone, Debug, PartialEq, Eq)] pub enum SimpleReason { /// An unexpected input was found. Unexpected, /// An unclosed delimiter was found. Unclosed { /// The span of the unclosed delimiter. span: S, /// The unclosed delimiter. delimiter: I, }, /// An error with a custom message occurred. 
Custom(String), } impl fmt::Display for SimpleReason { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { const DEFAULT_DISPLAY_UNEXPECTED: &str = "unexpected input"; match self { Self::Unexpected => write!(f, "{}", DEFAULT_DISPLAY_UNEXPECTED), Self::Unclosed { span, delimiter } => { write!(f, "unclosed delimiter ({}) in {}", span, delimiter) } Self::Custom(string) => write!(f, "error {}", string), } } } /// A type representing zero, one, or many labels applied to an error #[derive(Clone, Copy, Debug, PartialEq)] enum SimpleLabel { Some(&'static str), None, Multi, } impl SimpleLabel { fn merge(self, other: Self) -> Self { match (self, other) { (SimpleLabel::Some(a), SimpleLabel::Some(b)) if a == b => SimpleLabel::Some(a), (SimpleLabel::Some(_), SimpleLabel::Some(_)) => SimpleLabel::Multi, (SimpleLabel::Multi, _) => SimpleLabel::Multi, (_, SimpleLabel::Multi) => SimpleLabel::Multi, (SimpleLabel::None, x) => x, (x, SimpleLabel::None) => x, } } } impl From for Option<&'static str> { fn from(label: SimpleLabel) -> Self { match label { SimpleLabel::Some(s) => Some(s), _ => None, } } } /// A simple default error type that tracks error spans, expected inputs, and the actual input found at an error site. /// /// Please note that it uses a [`HashSet`] to remember expected symbols. If you find this to be too slow, you can /// implement [`Error`] for your own error type or use [`Cheap`] instead. #[derive(Clone, Debug)] pub struct Simple> { span: S, reason: SimpleReason, expected: HashSet, RandomState>, found: Option, label: SimpleLabel, } impl Simple { /// Create an error with a custom error message. pub fn custom(span: S, msg: M) -> Self { Self { span, reason: SimpleReason::Custom(msg.to_string()), expected: HashSet::default(), found: None, label: SimpleLabel::None, } } /// Returns the span that the error occurred at. pub fn span(&self) -> S { self.span.clone() } /// Returns an iterator over possible expected patterns. 
pub fn expected(&self) -> impl ExactSizeIterator> + '_ { self.expected.iter() } /// Returns the input, if any, that was found instead of an expected pattern. pub fn found(&self) -> Option<&I> { self.found.as_ref() } /// Returns the reason for the error. pub fn reason(&self) -> &SimpleReason { &self.reason } /// Returns the error's label, if any. pub fn label(&self) -> Option<&'static str> { self.label.into() } /// Map the error's inputs using the given function. /// /// This can be used to unify the errors between parsing stages that operate upon two forms of input (for example, /// the initial lexing stage and the parsing stage in most compilers). pub fn map U>(self, mut f: F) -> Simple { Simple { span: self.span, reason: match self.reason { SimpleReason::Unclosed { span, delimiter } => SimpleReason::Unclosed { span, delimiter: f(delimiter), }, SimpleReason::Unexpected => SimpleReason::Unexpected, SimpleReason::Custom(msg) => SimpleReason::Custom(msg), }, expected: self.expected.into_iter().map(|e| e.map(&mut f)).collect(), found: self.found.map(f), label: self.label, } } } impl Error for Simple { type Span = S; type Label = &'static str; fn expected_input_found>>( span: Self::Span, expected: Iter, found: Option, ) -> Self { Self { span, reason: SimpleReason::Unexpected, expected: expected.into_iter().collect(), found, label: SimpleLabel::None, } } fn unclosed_delimiter( unclosed_span: Self::Span, delimiter: I, span: Self::Span, expected: I, found: Option, ) -> Self { Self { span, reason: SimpleReason::Unclosed { span: unclosed_span, delimiter, }, expected: core::iter::once(Some(expected)).collect(), found, label: SimpleLabel::None, } } fn with_label(mut self, label: Self::Label) -> Self { match self.label { SimpleLabel::Some(_) => {} _ => { self.label = SimpleLabel::Some(label); } } self } fn merge(mut self, other: Self) -> Self { // TODO: Assert that `self.span == other.span` here? self.reason = match (&self.reason, &other.reason) { (SimpleReason::Unclosed { .. 
}, _) => self.reason, (_, SimpleReason::Unclosed { .. }) => other.reason, _ => self.reason, }; self.label = self.label.merge(other.label); for expected in other.expected { self.expected.insert(expected); } self } } impl PartialEq for Simple { fn eq(&self, other: &Self) -> bool { self.span == other.span && self.found == other.found && self.reason == other.reason && self.label == other.label } } impl Eq for Simple {} impl fmt::Display for Simple { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { // TODO: Take `self.reason` into account if let Some(found) = &self.found { write!(f, "found {:?}", found.to_string())?; } else { write!(f, "found end of input")?; }; match self.expected.len() { 0 => {} //write!(f, " but end of input was expected")?, 1 => write!( f, " but expected {}", match self.expected.iter().next().unwrap() { Some(x) => format!("{:?}", x.to_string()), None => "end of input".to_string(), }, )?, _ => { write!( f, " but expected one of {}", self.expected .iter() .map(|expected| match expected { Some(x) => format!("{:?}", x.to_string()), None => "end of input".to_string(), }) .collect::>() .join(", ") )?; } } Ok(()) } } #[cfg(feature = "std")] impl std::error::Error for Simple { } /// A minimal error type that tracks only the error span and label. This type is most useful when you want fast parsing /// but do not particularly care about the quality of error messages. #[derive(Clone, Debug, PartialEq, Eq)] pub struct Cheap> { span: S, label: Option<&'static str>, phantom: PhantomData, } impl Cheap { /// Returns the span that the error occurred at. pub fn span(&self) -> S { self.span.clone() } /// Returns the error's label, if any. 
pub fn label(&self) -> Option<&'static str> { self.label } } impl Error for Cheap { type Span = S; type Label = &'static str; fn expected_input_found>>( span: Self::Span, _: Iter, _: Option, ) -> Self { Self { span, label: None, phantom: PhantomData, } } fn with_label(mut self, label: Self::Label) -> Self { self.label.get_or_insert(label); self } fn merge(self, _: Self) -> Self { self } } /// An internal type used to facilitate error prioritisation. You shouldn't need to interact with this type during /// normal use of the crate. pub struct Located { pub(crate) at: usize, pub(crate) error: E, pub(crate) phantom: PhantomData, } impl> Located { /// Create a new [`Located`] with the give input position and error. pub fn at(at: usize, error: E) -> Self { Self { at, error, phantom: PhantomData, } } /// Get the maximum of two located errors. If they hold the same position in the input, merge them. pub fn max(self, other: impl Into>) -> Self { let other = match other.into() { Some(other) => other, None => return self, }; match self.at.cmp(&other.at) { Ordering::Greater => self, Ordering::Less => other, Ordering::Equal => Self { error: self.error.merge(other.error), ..self }, } } /// Map the error with the given function. 
pub fn map U>(self, f: F) -> Located { Located { at: self.at, error: f(self.error), phantom: PhantomData, } } } // Merge two alternative errors pub(crate) fn merge_alts, T: IntoIterator>>( mut error: Option>, errors: T, ) -> Option> { for other in errors { match (error, other) { (Some(a), b) => { error = Some(b.max(a)); } (None, b) => { error = Some(b); } } } error } chumsky-0.9.3/src/lib.rs000064400000000000000000001501341046102023000132270ustar 00000000000000#![cfg_attr(feature = "nightly", feature(rustc_attrs))] #![cfg_attr(not(any(doc, feature = "std")), no_std)] #![doc = include_str!("../README.md")] #![deny(missing_docs)] #![allow(deprecated)] // TODO: Don't allow this extern crate alloc; pub mod chain; pub mod combinator; pub mod debug; pub mod error; pub mod primitive; pub mod recovery; pub mod recursive; pub mod span; pub mod stream; pub mod text; pub use crate::{error::Error, span::Span}; pub use crate::stream::{BoxStream, Flat, Stream}; use crate::{ chain::Chain, combinator::*, debug::*, error::{merge_alts, Located}, primitive::*, recovery::*, }; use alloc::{boxed::Box, rc::Rc, string::String, sync::Arc, vec, vec::Vec}; use core::{ cmp::Ordering, // TODO: Enable when stable //lazy::OnceCell, fmt, marker::PhantomData, ops::Range, panic::Location, str::FromStr, }; #[cfg(doc)] use std::{ collections::HashMap, // TODO: Remove when switching to 2021 edition iter::FromIterator, }; /// Commonly used functions, traits and types. 
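The `Located::max` logic above is the heart of chumsky's error prioritisation: between two candidate errors, prefer the one that made it furthest into the input, and merge their information (as `Error::merge` does) when they tie. A minimal, self-contained illustration of that rule, using simplified stand-in types (`Located` here carries a `BTreeSet<char>` of expected characters rather than chumsky's generic error):

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;

#[derive(Clone, Debug, PartialEq)]
struct Located {
    at: usize,                // input offset the error occurred at
    expected: BTreeSet<char>, // what gets merged on a tie
}

impl Located {
    // Prefer the error that reached the furthest position; merge on a tie.
    fn max(self, other: Located) -> Located {
        match self.at.cmp(&other.at) {
            Ordering::Greater => self,
            Ordering::Less => other,
            Ordering::Equal => Located {
                at: self.at,
                expected: self.expected.union(&other.expected).copied().collect(),
            },
        }
    }
}

fn furthest_at(a_at: usize, b_at: usize) -> usize {
    let a = Located { at: a_at, expected: BTreeSet::new() };
    let b = Located { at: b_at, expected: BTreeSet::new() };
    a.max(b).at
}

fn merged_expected() -> String {
    let b = Located { at: 5, expected: BTreeSet::from([')']) };
    let c = Located { at: 5, expected: BTreeSet::from([',']) };
    b.max(c).expected.into_iter().collect()
}

fn main() {
    // The error at offset 5 wins outright over the one at offset 3.
    assert_eq!(furthest_at(3, 5), 5);
    // Two errors at the same offset have their expectations merged.
    assert_eq!(merged_expected(), "),");
}
```

This "furthest error wins" heuristic is why alternation (`a.or(b)`) tends to report the most specific failure rather than the first one tried.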
/// /// *Listen, three eyes,” he said, “don’t you try to outweird me, I get stranger things than you free with my breakfast /// cereal.”* pub mod prelude { pub use super::{ error::{Error as _, Simple}, primitive::{ any, choice, empty, end, filter, filter_map, just, none_of, one_of, seq, take_until, todo, }, recovery::{nested_delimiters, skip_parser, skip_then_retry_until, skip_until}, recursive::{recursive, Recursive}, select, span::Span as _, text, text::TextParser as _, BoxedParser, Parser, }; } // TODO: Replace with `std::ops::ControlFlow` when stable enum ControlFlow { Continue(C), Break(B), } // ([], Ok((out, alt_err))) => parsing successful, // alt_err = potential alternative error should a different number of optional patterns be parsed // ([x, ...], Ok((out, alt_err)) => parsing failed, but recovery occurred so parsing may continue // ([...], Err(err)) => parsing failed, recovery failed, and one or more errors were produced // TODO: Change `alt_err` from `Option>` to `Vec>` type PResult = ( Vec>, Result<(O, Option>), Located>, ); // Shorthand for a stream with the given input and error type. type StreamOf<'a, I, E> = Stream<'a, I, >::Span>; // [`Parser::parse_recovery`], but generic across the debugger. fn parse_recovery_inner< 'a, I: Clone, O, P: Parser, D: Debugger, Iter: Iterator>::Span)> + 'a, S: Into>::Span, Iter>>, >( parser: &P, debugger: &mut D, stream: S, ) -> (Option, Vec) where P: Sized, { #[allow(deprecated)] let (mut errors, res) = parser.parse_inner(debugger, &mut stream.into()); let out = match res { Ok((out, _)) => Some(out), Err(err) => { errors.push(err); None } }; (out, errors.into_iter().map(|e| e.error).collect()) } /// A trait implemented by parsers. /// /// Parsers take a stream of tokens of type `I` and attempt to parse them into a value of type `O`. In doing so, they /// may encounter errors. 
These need not be fatal to the parsing process: syntactic errors can be recovered from and a /// valid output may still be generated alongside any syntax errors that were encountered along the way. Usually, this /// output comes in the form of an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree) (AST). /// /// You should not need to implement this trait by hand. If you cannot combine existing combintors (and in particular /// [`custom`]) to create the combinator you're looking for, please /// [open an issue](https://github.com/zesterer/chumsky/issues/new)! If you *really* need to implement this trait, /// please check the documentation in the source: some implementation details have been deliberately hidden. #[cfg_attr( feature = "nightly", rustc_on_unimplemented( message = "`{Self}` is not a parser from `{I}` to `{O}`", label = "This parser is not compatible because it does not implement `Parser<{I}, {O}>`", note = "You should check that the output types of your parsers are consistent with combinator you're using", ) )] #[allow(clippy::type_complexity)] pub trait Parser { /// The type of errors emitted by this parser. type Error: Error; // TODO when default associated types are stable: = Cheap; /// Parse a stream with all the bells & whistles. You can use this to implement your own parser combinators. Note /// that both the signature and semantic requirements of this function are very likely to change in later versions. /// Where possible, prefer more ergonomic combinators provided elsewhere in the crate rather than implementing your /// own. For example, [`custom`] provides a flexible, ergonomic way API for process input streams that likely /// covers your use-case. #[doc(hidden)] #[deprecated( note = "This method is excluded from the semver guarantees of chumsky. If you decide to use it, broken builds are your fault." 
)] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult where Self: Sized; /// [`Parser::parse_inner`], but specialised for verbose output. Do not call this method directly. /// /// If you *really* need to implement this trait, this method should just directly invoke [`Parser::parse_inner`]. #[doc(hidden)] #[deprecated( note = "This method is excluded from the semver guarantees of chumsky. If you decide to use it, broken builds are your fault." )] fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult; /// [`Parser::parse_inner`], but specialised for silent output. Do not call this method directly. /// /// If you *really* need to implement this trait, this method should just directly invoke [`Parser::parse_inner`]. #[doc(hidden)] #[deprecated( note = "This method is excluded from the semver guarantees of chumsky. If you decide to use it, broken builds are your fault." )] fn parse_inner_silent( &self, d: &mut Silent, s: &mut StreamOf, ) -> PResult; /// Parse a stream of tokens, yielding an output if possible, and any errors encountered along the way. /// /// If `None` is returned (i.e: parsing failed) then there will *always* be at least one item in the error `Vec`. /// If you don't care about producing an output if errors are encountered, use [`Parser::parse`] instead. /// /// Although the signature of this function looks complicated, it's simpler than you think! You can pass a /// `&[I]`, a [`&str`], or a [`Stream`] to it. fn parse_recovery<'a, Iter, S>(&self, stream: S) -> (Option, Vec) where Self: Sized, Iter: Iterator>::Span)> + 'a, S: Into>::Span, Iter>>, { parse_recovery_inner(self, &mut Silent::new(), stream) } /// Parse a stream of tokens, yielding an output if possible, and any errors encountered along the way. Unlike /// [`Parser::parse_recovery`], this function will produce verbose debugging output as it executes. 
/// /// If `None` is returned (i.e: parsing failed) then there will *always* be at least one item in the error `Vec`. /// If you don't care about producing an output if errors are encountered, use `Parser::parse` instead. /// /// Although the signature of this function looks complicated, it's simpler than you think! You can pass a /// `&[I]`, a [`&str`], or a [`Stream`] to it. /// /// You'll probably want to make sure that this doesn't end up in production code: it exists only to help you debug /// your parser. Additionally, its API is quite likely to change in future versions. /// /// This method will receive more extensive documentation as the crate's debugging features mature. fn parse_recovery_verbose<'a, Iter, S>(&self, stream: S) -> (Option, Vec) where Self: Sized, Iter: Iterator>::Span)> + 'a, S: Into>::Span, Iter>>, { let mut debugger = Verbose::new(); let res = parse_recovery_inner(self, &mut debugger, stream); debugger.print(); res } /// Parse a stream of tokens, yielding an output *or* any errors that were encountered along the way. /// /// If you wish to attempt to produce an output even if errors are encountered, use [`Parser::parse_recovery`]. /// /// Although the signature of this function looks complicated, it's simpler than you think! You can pass a /// [`&[I]`], a [`&str`], or a [`Stream`] to it. fn parse<'a, Iter, S>(&self, stream: S) -> Result> where Self: Sized, Iter: Iterator>::Span)> + 'a, S: Into>::Span, Iter>>, { let (output, errors) = self.parse_recovery(stream); if errors.is_empty() { Ok(output.expect( "Parsing failed, but no errors were emitted. This is troubling, to say the least.", )) } else { Err(errors) } } /// Include this parser in the debugging output produced by [`Parser::parse_recovery_verbose`]. /// /// You'll probably want to make sure that this doesn't end up in production code: it exists only to help you debug /// your parser. Additionally, its API is quite likely to change in future versions. 
/// Use this parser like a print statement, to display whatever you pass as the argument 'x' /// /// This method will receive more extensive documentation as the crate's debugging features mature. #[track_caller] fn debug(self, x: T) -> Debug where Self: Sized, T: fmt::Display + 'static, { Debug(self, Rc::new(x), *core::panic::Location::caller()) } /// Map the output of this parser to another value. /// /// The output type of this parser is `U`, the same as the function's output. /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// #[derive(Debug, PartialEq)] /// enum Token { Word(String), Num(u64) } /// /// let word = filter::<_, _, Cheap>(|c: &char| c.is_alphabetic()) /// .repeated().at_least(1) /// .collect::() /// .map(Token::Word); /// /// let num = filter::<_, _, Cheap>(|c: &char| c.is_ascii_digit()) /// .repeated().at_least(1) /// .collect::() /// .map(|s| Token::Num(s.parse().unwrap())); /// /// let token = word.or(num); /// /// assert_eq!(token.parse("test"), Ok(Token::Word("test".to_string()))); /// assert_eq!(token.parse("42"), Ok(Token::Num(42))); /// ``` fn map(self, f: F) -> Map where Self: Sized, F: Fn(O) -> U, { Map(self, f, PhantomData) } /// Map the output of this parser to another value, making use of the pattern's span when doing so. /// /// This is very useful when generating an AST that attaches a span to each AST node. /// /// The output type of this parser is `U`, the same as the function's output. 
/// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// use std::ops::Range; /// /// // It's common for AST nodes to use a wrapper type that allows attaching span information to them /// #[derive(Debug, PartialEq)] /// pub struct Spanned(T, Range); /// /// let ident = text::ident::<_, Simple>() /// .map_with_span(|ident, span| Spanned(ident, span)) /// .padded(); /// /// assert_eq!(ident.parse("hello"), Ok(Spanned("hello".to_string(), 0..5))); /// assert_eq!(ident.parse(" hello "), Ok(Spanned("hello".to_string(), 7..12))); /// ``` fn map_with_span(self, f: F) -> MapWithSpan where Self: Sized, F: Fn(O, >::Span) -> U, { MapWithSpan(self, f, PhantomData) } /// Map the primary error of this parser to another value. /// /// This function is most useful when using a custom error type, allowing you to augment errors according to /// context. /// /// The output type of this parser is `O`, the same as the original parser. // TODO: Map E -> D, not E -> E fn map_err(self, f: F) -> MapErr where Self: Sized, F: Fn(Self::Error) -> Self::Error, { MapErr(self, f) } /// Map the primary error of this parser to a result. If the result is [`Ok`], the parser succeeds with that value. /// /// Note that even if the function returns an [`Ok`], the input stream will still be 'stuck' at the input following /// the input that triggered the error. You'll need to follow uses of this combinator with a parser that resets /// the input stream to a known-good state (for example, [`take_until`]). /// /// The output type of this parser is `U`, the [`Ok`] type of the result. fn or_else(self, f: F) -> OrElse where Self: Sized, F: Fn(Self::Error) -> Result, { OrElse(self, f) } /// Map the primary error of this parser to another value, making use of the span from the start of the attempted /// to the point at which the error was encountered. 
/// /// This function is useful for augmenting errors to allow them to display the span of the initial part of a /// pattern, for example to add a "while parsing" clause to your error messages. /// /// The output type of this parser is `O`, the same as the original parser. /// // TODO: Map E -> D, not E -> E fn map_err_with_span(self, f: F) -> MapErrWithSpan where Self: Sized, F: Fn(Self::Error, >::Span) -> Self::Error, { MapErrWithSpan(self, f) } /// After a successful parse, apply a fallible function to the output. If the function produces an error, treat it /// as a parsing error. /// /// If you wish parsing of this pattern to continue when an error is generated instead of halting, consider using /// [`Parser::validate`] instead. /// /// The output type of this parser is `U`, the [`Ok`] return value of the function. /// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// let byte = text::int::<_, Simple>(10) /// .try_map(|s, span| s /// .parse::() /// .map_err(|e| Simple::custom(span, format!("{}", e)))); /// /// assert!(byte.parse("255").is_ok()); /// assert!(byte.parse("256").is_err()); // Out of range /// ``` fn try_map(self, f: F) -> TryMap where Self: Sized, F: Fn(O, >::Span) -> Result, { TryMap(self, f, PhantomData) } /// Validate an output, producing non-terminal errors if it does not fulfil certain criteria. /// /// This function also permits mapping the output to a value of another type, similar to [`Parser::map`]. /// /// If you wish parsing of this pattern to halt when an error is generated instead of continuing, consider using /// [`Parser::try_map`] instead. /// /// The output type of this parser is `O`, the same as the original parser. 
/// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// let large_int = text::int::(10) /// .from_str() /// .unwrapped() /// .validate(|x: u32, span, emit| { /// if x < 256 { emit(Simple::custom(span, format!("{} must be 256 or higher.", x))) } /// x /// }); /// /// assert_eq!(large_int.parse("537"), Ok(537)); /// assert!(large_int.parse("243").is_err()); /// ``` fn validate(self, f: F) -> Validate where Self: Sized, F: Fn(O, >::Span, &mut dyn FnMut(Self::Error)) -> U, { Validate(self, f, PhantomData) } /// Label the pattern parsed by this parser for more useful error messages. /// /// This is useful when you want to give users a more useful description of an expected pattern than simply a list /// of possible initial tokens. For example, it's common to use the term "expression" at a catch-all for a number /// of different constructs in many languages. /// /// This does not label recovered errors generated by sub-patterns within the parser, only error *directly* emitted /// by the parser. /// /// This does not label errors where the labelled pattern consumed input (i.e: in unambiguous cases). /// /// The output type of this parser is `O`, the same as the original parser. 
/// /// *Note: There is a chance that this method will be deprecated in favour of a more general solution in later /// versions of the crate.* /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let frac = text::digits(10) /// .chain(just('.')) /// .chain::(text::digits(10)) /// .collect::() /// .then_ignore(end()) /// .labelled("number"); /// /// assert_eq!(frac.parse("42.3"), Ok("42.3".to_string())); /// assert_eq!(frac.parse("hello"), Err(vec![Cheap::expected_input_found(0..1, Vec::new(), Some('h')).with_label("number")])); /// assert_eq!(frac.parse("42!"), Err(vec![Cheap::expected_input_found(2..3, vec![Some('.')], Some('!')).with_label("number")])); /// ``` fn labelled(self, label: L) -> Label where Self: Sized, L: Into<>::Label> + Clone, { Label(self, label) } /// Transform all outputs of this parser to a pretermined value. /// /// The output type of this parser is `U`, the type of the predetermined value. /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// #[derive(Clone, Debug, PartialEq)] /// enum Op { Add, Sub, Mul, Div } /// /// let op = just::<_, _, Cheap>('+').to(Op::Add) /// .or(just('-').to(Op::Sub)) /// .or(just('*').to(Op::Mul)) /// .or(just('/').to(Op::Div)); /// /// assert_eq!(op.parse("+"), Ok(Op::Add)); /// assert_eq!(op.parse("/"), Ok(Op::Div)); /// ``` fn to(self, x: U) -> To where Self: Sized, U: Clone, { To(self, x, PhantomData) } /// Left-fold the output of the parser into a single value. /// /// The output of the original parser must be of type `(A, impl IntoIterator)`. /// /// The output type of this parser is `A`, the left-hand component of the original parser's output. 
/// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let int = text::int::>(10) /// .from_str() /// .unwrapped(); /// /// let sum = int /// .then(just('+').ignore_then(int).repeated()) /// .foldl(|a, b| a + b); /// /// assert_eq!(sum.parse("1+12+3+9"), Ok(25)); /// assert_eq!(sum.parse("6"), Ok(6)); /// ``` fn foldl(self, f: F) -> Foldl where Self: Parser + Sized, B: IntoIterator, F: Fn(A, B::Item) -> A, { Foldl(self, f, PhantomData) } /// Right-fold the output of the parser into a single value. /// /// The output of the original parser must be of type `(impl IntoIterator, B)`. Because right-folds work /// backwards, the iterator must implement [`DoubleEndedIterator`] so that it can be reversed. /// /// The output type of this parser is `B`, the right-hand component of the original parser's output. /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let int = text::int::>(10) /// .from_str() /// .unwrapped(); /// /// let signed = just('+').to(1) /// .or(just('-').to(-1)) /// .repeated() /// .then(int) /// .foldr(|a, b| a * b); /// /// assert_eq!(signed.parse("3"), Ok(3)); /// assert_eq!(signed.parse("-17"), Ok(-17)); /// assert_eq!(signed.parse("--+-+-5"), Ok(5)); /// ``` fn foldr<'a, A, B, F>(self, f: F) -> Foldr where Self: Parser + Sized, A: IntoIterator, A::IntoIter: DoubleEndedIterator, F: Fn(A::Item, B) -> B + 'a, { Foldr(self, f, PhantomData) } /// Ignore the output of this parser, yielding `()` as an output instead. /// /// This can be used to reduce the cost of parsing by avoiding unnecessary allocations (most collections containing /// [ZSTs](https://doc.rust-lang.org/nomicon/exotic-sizes.html#zero-sized-types-zsts) /// [do not allocate](https://doc.rust-lang.org/std/vec/struct.Vec.html#guarantees)). For example, it's common to /// want to ignore whitespace in many grammars (see [`text::whitespace`]). /// /// The output type of this parser is `()`. 
/// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// // A parser that parses any number of whitespace characters without allocating /// let whitespace = filter::<_, _, Cheap>(|c: &char| c.is_whitespace()) /// .ignored() /// .repeated(); /// /// assert_eq!(whitespace.parse(" "), Ok(vec![(); 4])); /// assert_eq!(whitespace.parse(" hello"), Ok(vec![(); 2])); /// ``` fn ignored(self) -> Ignored where Self: Sized, { To(self, (), PhantomData) } /// Collect the output of this parser into a type implementing [`FromIterator`]. /// /// This is commonly useful for collecting [`Vec`] outputs into [`String`]s, or [`(T, U)`] into a /// [`HashMap`] and is analogous to [`Iterator::collect`]. /// /// The output type of this parser is `C`, the type being collected into. /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let word = filter::<_, _, Cheap>(|c: &char| c.is_alphabetic()) // This parser produces an output of `char` /// .repeated() // This parser produces an output of `Vec` /// .collect::(); // But `Vec` is less useful than `String`, so convert to the latter /// /// assert_eq!(word.parse("hello"), Ok("hello".to_string())); /// ``` // TODO: Make `Parser::repeated` generic over an `impl FromIterator` to reduce required allocations fn collect(self) -> Map C, O> where Self: Sized, O: IntoIterator, C: core::iter::FromIterator, { self.map(|items| C::from_iter(items.into_iter())) } /// Parse one thing and then another thing, yielding a tuple of the two outputs. /// /// The output type of this parser is `(O, U)`, a combination of the outputs of both parsers. 
/// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let word = filter::<_, _, Cheap>(|c: &char| c.is_alphabetic()) /// .repeated().at_least(1) /// .collect::(); /// let two_words = word.then_ignore(just(' ')).then(word); /// /// assert_eq!(two_words.parse("dog cat"), Ok(("dog".to_string(), "cat".to_string()))); /// assert!(two_words.parse("hedgehog").is_err()); /// ``` fn then(self, other: P) -> Then where Self: Sized, P: Parser, { Then(self, other) } /// Parse one thing and then another thing, creating the second parser from the result of /// the first. If you only have a couple cases to handle, prefer [`Parser::or`]. /// /// The output of this parser is `U`, the result of the second parser /// /// Error recovery for this parser may be sub-optimal, as if the first parser succeeds on /// recovery then the second produces an error, the primary error will point to the location in /// the second parser which failed, ignoring that the first parser may be the root cause. There /// may be other pathological errors cases as well. /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// // A parser that parses a single letter and then its successor /// let successive_letters = one_of::<_, _, Cheap>((b'a'..=b'z').collect::>()) /// .then_with(|letter: u8| just(letter + 1)); /// /// assert_eq!(successive_letters.parse(*b"ab"), Ok(b'b')); // 'b' follows 'a' /// assert!(successive_letters.parse(*b"ac").is_err()); // 'c' does not follow 'a' /// ``` fn then_with P>(self, other: F) -> ThenWith where Self: Sized, P: Parser, { ThenWith(self, other, PhantomData) } /// Parse one thing and then another thing, attempting to chain the two outputs into a [`Vec`]. /// /// The output type of this parser is `Vec`, composed of the elements of the outputs of both parsers. 
/// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let int = just('-').or_not() /// .chain(filter::<_, _, Cheap<char>>(|c: &char| c.is_ascii_digit() && *c != '0') /// .chain(filter::<_, _, Cheap<char>>(|c: &char| c.is_ascii_digit()).repeated())) /// .or(just('0').map(|c| vec![c])) /// .then_ignore(end()) /// .collect::<String>() /// .from_str() /// .unwrapped(); /// /// assert_eq!(int.parse("0"), Ok(0)); /// assert_eq!(int.parse("415"), Ok(415)); /// assert_eq!(int.parse("-50"), Ok(-50)); /// assert!(int.parse("-0").is_err()); /// assert!(int.parse("05").is_err()); /// ``` fn chain<T, U, P>(self, other: P) -> Map<Then<Self, P>, fn((O, U)) -> Vec<T>, (O, U)> where Self: Sized, U: Chain<T>, O: Chain<T>, P: Parser<I, U, Error = Self::Error>, { self.then(other).map(|(a, b)| { let mut v = Vec::with_capacity(a.len() + b.len()); a.append_to(&mut v); b.append_to(&mut v); v }) } /// Flatten a nested collection. /// /// The use-cases of this method are broadly similar to those of [`Iterator::flatten`]. /// /// The output type of this parser is `Vec<T>`, where the original parser output was /// `impl IntoIterator<Item = impl IntoIterator<Item = T>>`. fn flatten<T, Inner>(self) -> Map<Self, fn(O) -> Vec<T>, O> where Self: Sized, O: IntoIterator<Item = Inner>, Inner: IntoIterator<Item = T>, { self.map(|xs| xs.into_iter().flat_map(|xs| xs.into_iter()).collect()) } /// Parse one thing and then another thing, yielding only the output of the latter. /// /// The output type of this parser is `U`, the same as the second parser.
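The docs above compare `Parser::flatten` to `Iterator::flatten`; the analogy is exact at the output level. A std-only sketch (not crate code) of the same flattening applied to a nested collection:

```rust
fn main() {
    // Mirrors `impl IntoIterator<Item = impl IntoIterator<Item = T>>` -> `Vec<T>`
    let nested: Vec<Vec<u32>> = vec![vec![1, 2], vec![], vec![3]];
    let flat: Vec<u32> = nested.into_iter().flatten().collect();
    assert_eq!(flat, vec![1, 2, 3]);
}
```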
/// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let zeroes = filter::<_, _, Cheap>(|c: &char| *c == '0').ignored().repeated(); /// let digits = filter(|c: &char| c.is_ascii_digit()).repeated(); /// let integer = zeroes /// .ignore_then(digits) /// .collect::() /// .from_str() /// .unwrapped(); /// /// assert_eq!(integer.parse("00064"), Ok(64)); /// assert_eq!(integer.parse("32"), Ok(32)); /// ``` fn ignore_then(self, other: P) -> IgnoreThen where Self: Sized, P: Parser, { Map(Then(self, other), |(_, u)| u, PhantomData) } /// Parse one thing and then another thing, yielding only the output of the former. /// /// The output type of this parser is `O`, the same as the original parser. /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let word = filter::<_, _, Cheap>(|c: &char| c.is_alphabetic()) /// .repeated().at_least(1) /// .collect::(); /// /// let punctuated = word /// .then_ignore(just('!').or(just('?')).or_not()); /// /// let sentence = punctuated /// .padded() // Allow for whitespace gaps /// .repeated(); /// /// assert_eq!( /// sentence.parse("hello! how are you?"), /// Ok(vec![ /// "hello".to_string(), /// "how".to_string(), /// "are".to_string(), /// "you".to_string(), /// ]), /// ); /// ``` fn then_ignore(self, other: P) -> ThenIgnore where Self: Sized, P: Parser, { Map(Then(self, other), |(o, _)| o, PhantomData) } /// Parse a pattern, but with an instance of another pattern on either end, yielding the output of the inner. /// /// The output type of this parser is `O`, the same as the original parser. 
/// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let ident = text::ident::<_, Simple>() /// .padded_by(just('!')); /// /// assert_eq!(ident.parse("!hello!"), Ok("hello".to_string())); /// assert!(ident.parse("hello!").is_err()); /// assert!(ident.parse("!hello").is_err()); /// assert!(ident.parse("hello").is_err()); /// ``` fn padded_by(self, other: P) -> ThenIgnore, P, O, U> where Self: Sized, P: Parser + Clone, { other.clone().ignore_then(self).then_ignore(other) } /// Parse the pattern surrounded by the given delimiters. /// /// The output type of this parser is `O`, the same as the original parser. /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// // A LISP-style S-expression /// #[derive(Debug, PartialEq)] /// enum SExpr { /// Ident(String), /// Num(u64), /// List(Vec), /// } /// /// let ident = filter::<_, _, Cheap>(|c: &char| c.is_alphabetic()) /// .repeated().at_least(1) /// .collect::(); /// /// let num = text::int(10) /// .from_str() /// .unwrapped(); /// /// let s_expr = recursive(|s_expr| s_expr /// .padded() /// .repeated() /// .map(SExpr::List) /// .delimited_by(just('('), just(')')) /// .or(ident.map(SExpr::Ident)) /// .or(num.map(SExpr::Num))); /// /// // A valid input /// assert_eq!( /// s_expr.parse_recovery("(add (mul 42 3) 15)"), /// ( /// Some(SExpr::List(vec![ /// SExpr::Ident("add".to_string()), /// SExpr::List(vec![ /// SExpr::Ident("mul".to_string()), /// SExpr::Num(42), /// SExpr::Num(3), /// ]), /// SExpr::Num(15), /// ])), /// Vec::new(), // No errors! /// ), /// ); /// ``` fn delimited_by(self, start: L, end: R) -> DelimitedBy where Self: Sized, L: Parser, R: Parser, { DelimitedBy { item: self, start, end, phantom: PhantomData, } } /// Parse one thing or, on failure, another thing. /// /// The output of both parsers must be of the same type, because either output can be produced. 
/// /// If both parsers succeed, the output of the first parser is guaranteed to be prioritised over the output of the /// second. /// /// If both parsers produce errors, the combinator will attempt to select from or combine the errors to produce an /// error that is most likely to be useful to a human attempting to understand the problem. The exact algorithm /// used is left unspecified, and is not part of the crate's semver guarantees, although regressions in error /// quality should be reported in the issue tracker of the main repository. /// /// Please note that long chains of [`Parser::or`] combinators have been known to result in poor compilation times. /// If you feel you are experiencing this, consider using [`choice`] instead. /// /// The output type of this parser is `O`, the output of both parsers. /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let op = just::<_, _, Cheap<char>>('+') /// .or(just('-')) /// .or(just('*')) /// .or(just('/')); /// /// assert_eq!(op.parse("+"), Ok('+')); /// assert_eq!(op.parse("/"), Ok('/')); /// assert!(op.parse("!").is_err()); /// ``` fn or<P>

(self, other: P) -> Or where Self: Sized, P: Parser, { Or(self, other) } /// Apply a fallback recovery strategy to this parser should it fail. /// /// There is no silver bullet for error recovery, so this function allows you to specify one of several different /// strategies at the location of your choice. Prefer an error recovery strategy that more precisely mirrors valid /// syntax where possible to make error recovery more reliable. /// /// Because chumsky is a [PEG](https://en.m.wikipedia.org/wiki/Parsing_expression_grammar) parser, which always /// take the first successful parsing route through a grammar, recovering from an error may cause the parser to /// erroneously miss alternative valid routes through the grammar that do not generate recoverable errors. If you /// run into cases where valid syntax fails to parse without errors, this might be happening: consider removing /// error recovery or switching to a more specific error recovery strategy. /// /// The output type of this parser is `O`, the same as the original parser. /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// #[derive(Debug, PartialEq)] /// enum Expr { /// Error, /// Int(String), /// List(Vec), /// } /// /// let expr = recursive::<_, _, _, _, Simple>(|expr| expr /// .separated_by(just(',')) /// .delimited_by(just('['), just(']')) /// .map(Expr::List) /// // If parsing a list expression fails, recover at the next delimiter, generating an error AST node /// .recover_with(nested_delimiters('[', ']', [], |_| Expr::Error)) /// .or(text::int(10).map(Expr::Int)) /// .padded()); /// /// assert!(expr.parse("five").is_err()); // Text is not a valid expression in this language... /// assert!(expr.parse("[1, 2, 3]").is_ok()); // ...but lists and numbers are! /// /// // This input has two syntax errors... /// let (ast, errors) = expr.parse_recovery("[[1, two], [3, four]]"); /// // ...and error recovery allows us to catch both of them! 
/// assert_eq!(errors.len(), 2); /// // Additionally, the AST we get back still has useful information. /// assert_eq!(ast, Some(Expr::List(vec![Expr::Error, Expr::Error]))); /// ``` fn recover_with(self, strategy: S) -> Recovery where Self: Sized, S: Strategy, { Recovery(self, strategy) } /// Attempt to parse something, but only if it exists. /// /// If parsing of the pattern is successful, the output is `Some(_)`. Otherwise, the output is `None`. /// /// The output type of this parser is `Option`. /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let word = filter::<_, _, Cheap>(|c: &char| c.is_alphabetic()) /// .repeated().at_least(1) /// .collect::(); /// /// let word_or_question = word /// .then(just('?').or_not()); /// /// assert_eq!(word_or_question.parse("hello?"), Ok(("hello".to_string(), Some('?')))); /// assert_eq!(word_or_question.parse("wednesday"), Ok(("wednesday".to_string(), None))); /// ``` fn or_not(self) -> OrNot where Self: Sized, { OrNot(self) } /// Parses a single token if, and only if, the pattern fails to parse. /// /// The output type of this parser is `I`. /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// /// #[derive(Debug, PartialEq)] /// enum Tree { /// Text(String), /// Group(Vec), /// } /// /// // Arbitrary text, nested in a tree with { ... 
} delimiters /// let tree = recursive::<_, _, _, _, Cheap>(|tree| { /// let text = one_of("{}") /// .not() /// .repeated() /// .at_least(1) /// .collect() /// .map(Tree::Text); /// /// let group = tree /// .repeated() /// .delimited_by(just('{'), just('}')) /// .map(Tree::Group); /// /// text.or(group) /// }); /// /// assert_eq!( /// tree.parse("{abcd{efg{hijk}lmn{opq}rs}tuvwxyz}"), /// Ok(Tree::Group(vec![ /// Tree::Text("abcd".to_string()), /// Tree::Group(vec![ /// Tree::Text("efg".to_string()), /// Tree::Group(vec![ /// Tree::Text("hijk".to_string()), /// ]), /// Tree::Text("lmn".to_string()), /// Tree::Group(vec![ /// Tree::Text("opq".to_string()), /// ]), /// Tree::Text("rs".to_string()), /// ]), /// Tree::Text("tuvwxyz".to_string()), /// ])), /// ); /// ``` fn not(self) -> Not where Self: Sized, { Not(self, PhantomData) } /// Parse a pattern any number of times (including zero times). /// /// Input is eagerly parsed. Be aware that the parser will accept no occurrences of the pattern too. Consider using /// [`Repeated::at_least`] instead if it better suits your use-case. /// /// The output type of this parser is `Vec`. /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let num = filter::<_, _, Cheap>(|c: &char| c.is_ascii_digit()) /// .repeated().at_least(1) /// .collect::() /// .from_str() /// .unwrapped(); /// /// let sum = num.then(just('+').ignore_then(num).repeated()) /// .foldl(|a, b| a + b); /// /// assert_eq!(sum.parse("2+13+4+0+5"), Ok(24)); /// ``` fn repeated(self) -> Repeated where Self: Sized, { Repeated(self, 0, None) } /// Parse a pattern, separated by another, any number of times. /// /// You can use [`SeparatedBy::allow_leading`] or [`SeparatedBy::allow_trailing`] to allow leading or trailing /// separators. /// /// The output type of this parser is `Vec`. 
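The `foldl` reduction used in the `repeated` sum example above ("2+13+4+0+5") behaves like `Iterator::fold`: the first parser's output seeds the accumulator and each repeat is folded in left-to-right. A std-only sketch of that reduction, not the crate's own code:

```rust
fn main() {
    let first = 2u32;               // output of the leading `num` parser
    let rest = vec![13u32, 4, 0, 5]; // outputs of the repeated `'+' num` parser
    let sum = rest.into_iter().fold(first, |a, b| a + b);
    assert_eq!(sum, 24);
}
```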
/// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let shopping = text::ident::<_, Simple>() /// .padded() /// .separated_by(just(',')); /// /// assert_eq!(shopping.parse("eggs"), Ok(vec!["eggs".to_string()])); /// assert_eq!(shopping.parse("eggs, flour, milk"), Ok(vec!["eggs".to_string(), "flour".to_string(), "milk".to_string()])); /// ``` /// /// See [`SeparatedBy::allow_leading`] and [`SeparatedBy::allow_trailing`] for more examples. fn separated_by(self, other: P) -> SeparatedBy where Self: Sized, P: Parser, { SeparatedBy { item: self, delimiter: other, at_least: 0, at_most: None, allow_leading: false, allow_trailing: false, phantom: PhantomData, } } /// Parse a pattern. Afterwards, the input stream will be rewound to its original state, as if parsing had not /// occurred. /// /// This combinator is useful for cases in which you wish to avoid a parser accidentally consuming too much input, /// causing later parsers to fail as a result. A typical use-case of this is that you want to parse something that /// is not followed by something else. /// /// The output type of this parser is `O`, the same as the original parser. /// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// let just_numbers = text::digits::<_, Simple>(10) /// .padded() /// .then_ignore(none_of("+-*/").rewind()) /// .separated_by(just(',')); /// // 3 is not parsed because it's followed by '+'. /// assert_eq!(just_numbers.parse("1, 2, 3 + 4"), Ok(vec!["1".to_string(), "2".to_string()])); /// ``` fn rewind(self) -> Rewind where Self: Sized, { Rewind(self) } /// Box the parser, yielding a parser that performs parsing through dynamic dispatch. 
/// /// Boxing a parser might be useful for: /// /// - Dynamically building up parsers at runtime /// /// - Improving compilation times (Rust can struggle to compile code containing very long types) /// /// - Passing a parser over an FFI boundary /// /// - Getting around compiler implementation problems with long types such as /// [this](https://github.com/rust-lang/rust/issues/54540). /// /// - Places where you need to name the type of a parser /// /// Boxing a parser is broadly equivalent to boxing other combinators via dynamic dispatch, such as [`Iterator`]. /// /// The output type of this parser is `O`, the same as the original parser. fn boxed<'a>(self) -> BoxedParser<'a, I, O, Self::Error> where Self: Sized + 'a, { BoxedParser(Rc::new(self)) } /// Attempt to convert the output of this parser into something else using Rust's [`FromStr`] trait. /// /// This is most useful when wanting to convert literal values into their corresponding Rust type, such as when /// parsing integers. /// /// The output type of this parser is `Result<U, U::Err>`, the result of attempting to parse the output, `O`, into /// the value `U`. /// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// let uint64 = text::int::<_, Simple<char>>(10) /// .from_str::<u64>() /// .unwrapped(); /// /// assert_eq!(uint64.parse("7"), Ok(7)); /// assert_eq!(uint64.parse("42"), Ok(42)); /// ``` #[allow(clippy::wrong_self_convention)] fn from_str<U>(self) -> Map<Self, fn(O) -> Result<U, U::Err>, O> where Self: Sized, U: FromStr, O: AsRef<str>, { self.map(|o| o.as_ref().parse()) } /// For parsers that produce a [`Result`] as their output, unwrap the result (panicking if an [`Err`] is /// encountered). /// /// In general, this method should be avoided except in cases where all possible values that the parser might produce can /// be parsed using [`FromStr`] without producing an error. /// /// This combinator is not named `unwrap` to avoid confusion: it unwraps *during parsing*, not immediately.
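`Parser::from_str` above is a thin wrapper over Rust's `FromStr` trait via `str::parse`. A std-only sketch (not crate code) of the conversion it performs on the parser's string output:

```rust
fn main() {
    // What `.from_str::<u64>()` does to its `O: AsRef<str>` output
    let n: Result<u64, _> = "7".parse();
    assert_eq!(n, Ok(7));
    // Failed conversions surface as `Err`, which `unwrapped` would panic on
    assert!("not a number".parse::<u64>().is_err());
}
```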
/// /// The output type of this parser is `U`, the [`Ok`] value of the [`Result`]. /// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// let boolean = just::<_, _, Simple>("true") /// .or(just("false")) /// .from_str::() /// .unwrapped(); // Cannot panic: the only possible outputs generated by the parser are "true" or "false" /// /// assert_eq!(boolean.parse("true"), Ok(true)); /// assert_eq!(boolean.parse("false"), Ok(false)); /// // Does not panic, because the original parser only accepts "true" or "false" /// assert!(boolean.parse("42").is_err()); /// ``` #[track_caller] fn unwrapped(self) -> Unwrapped>::Error> where Self: Sized + Parser>, E: fmt::Debug, { Unwrapped(Location::caller(), self, PhantomData) } } impl<'a, I: Clone, O, T: Parser + ?Sized> Parser for &'a T { type Error = T::Error; fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { debugger.invoke::<_, _, T>(*self, stream) } fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent( &self, d: &mut Silent, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } impl + ?Sized> Parser for Box { type Error = T::Error; fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { debugger.invoke::<_, _, T>(&*self, stream) } fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent( &self, d: &mut Silent, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } impl + ?Sized> Parser for Rc { type Error = T::Error; fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { debugger.invoke::<_, _, T>(&*self, stream) } fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent( &self, d: &mut Silent, s: &mut StreamOf, ) 
-> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } impl + ?Sized> Parser for Arc { type Error = T::Error; fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { debugger.invoke::<_, _, T>(&*self, stream) } fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent( &self, d: &mut Silent, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`Parser::boxed`]. /// /// This type is a [`repr(transparent)`](https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent) wrapper /// around its inner value. /// /// Due to current implementation details, the inner value is not, in fact, a [`Box`], but is an [`Rc`] to facilitate /// efficient cloning. This is likely to change in the future. Unlike [`Box`], [`Rc`] has no size guarantees: although /// it is *currently* the same size as a raw pointer. // TODO: Don't use an Rc #[must_use] #[repr(transparent)] pub struct BoxedParser<'a, I, O, E: Error>(Rc + 'a>); impl<'a, I, O, E: Error> Clone for BoxedParser<'a, I, O, E> { fn clone(&self) -> Self { Self(self.0.clone()) } } impl<'a, I: Clone, O, E: Error> Parser for BoxedParser<'a, I, O, E> { type Error = E; fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { #[allow(deprecated)] debugger.invoke(&self.0, stream) } fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent( &self, d: &mut Silent, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn boxed<'b>(self) -> BoxedParser<'b, I, O, Self::Error> where Self: Sized + 'b, { // Avoid boxing twice. self } } /// Create a parser that selects one or more input patterns and map them to an output value. 
/// /// This is most useful when turning the tokens of a previous compilation pass (such as lexing) into data that can be /// used for parsing, although it can also generally be used to select inputs and map them to outputs. Any unmapped /// input patterns will become syntax errors, just as with [`filter`]. /// /// The macro is semantically similar to a `match` expression and so supports /// [pattern guards](https://doc.rust-lang.org/reference/expressions/match-expr.html#match-guards) too. /// /// ```ignore /// select! { /// Token::Bool(x) if x => Expr::True, /// Token::Bool(x) if !x => Expr::False, /// } /// ``` /// /// If you require access to the input's span, you may add an argument before the patterns to gain access to it. /// /// ```ignore /// select! { |span| /// Token::Num(x) => Expr::Num(x).spanned(span), /// Token::Str(s) => Expr::Str(s).spanned(span), /// } /// ``` /// /// Internally, [`select!`] is a loose wrapper around [`filter_map`] and thinking of it as such might make it less /// confusing. /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// // The type of our parser's input (tokens like this might be emitted by your compiler's lexer) /// #[derive(Clone, Debug, PartialEq)] /// enum Token { /// Num(u64), /// Bool(bool), /// LParen, /// RParen, /// } /// /// // The type of our parser's output, a syntax tree /// #[derive(Debug, PartialEq)] /// enum Ast { /// Num(u64), /// Bool(bool), /// List(Vec), /// } /// /// // Our parser converts a stream of input tokens into an AST /// // `select!` is used to deconstruct some of the tokens and turn them into AST nodes /// let ast = recursive::<_, _, _, _, Cheap>(|ast| { /// let literal = select! 
{ /// Token::Num(x) => Ast::Num(x), /// Token::Bool(x) => Ast::Bool(x), /// }; /// /// literal.or(ast /// .repeated() /// .delimited_by(just(Token::LParen), just(Token::RParen)) /// .map(Ast::List)) /// }); /// /// use Token::*; /// assert_eq!( /// ast.parse(vec![LParen, Num(5), LParen, Bool(false), Num(42), RParen, RParen]), /// Ok(Ast::List(vec![ /// Ast::Num(5), /// Ast::List(vec![ /// Ast::Bool(false), /// Ast::Num(42), /// ]), /// ])), /// ); /// ``` #[macro_export] macro_rules! select { (|$span:ident| $($p:pat $(if $guard:expr)? => $out:expr),+ $(,)?) => ({ $crate::primitive::filter_map(move |$span, x| match x { $($p $(if $guard)? => ::core::result::Result::Ok($out)),+, _ => ::core::result::Result::Err($crate::error::Error::expected_input_found($span, ::core::option::Option::None, ::core::option::Option::Some(x))), }) }); ($($p:pat $(if $guard:expr)? => $out:expr),+ $(,)?) => (select!(|_span| $($p $(if $guard)? => $out),+)); } chumsky-0.9.3/src/primitive.rs //! Parser primitives that accept specific token patterns. //! //! *“These creatures you call mice, you see, they are not quite as they appear. They are merely the protrusion into //! our dimension of vastly hyperintelligent pandimensional beings.”* //! //! Chumsky parsers are created by combining together smaller parsers. Right at the bottom of the pile are the parser //! primitives, a parser developer's bread & butter. Each of these primitives is very easy to understand in isolation, //! usually only doing one thing. //! //! ## The Important Ones //! //! - [`just`]: parses a specific input or sequence of inputs //! - [`filter`]: parses a single input, if the given filter function returns `true` //! - [`end`]: parses the end of input (i.e: if there are any more inputs, this parse fails) use super::*; use core::panic::Location; /// See [`custom`].
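The `select!` macro defined earlier expands to a `match` (with optional guards) inside `filter_map`, where unmatched tokens become errors. A std-only sketch of that dispatch shape, not crate code (`Token` and `to_ast` here are illustrative names):

```rust
#[derive(Clone, Debug, PartialEq)]
enum Token {
    Num(u64),
    Bool(bool),
    LParen,
}

// Matched patterns map to outputs; anything unmatched is rejected,
// just as `select!` turns unmapped tokens into syntax errors.
fn to_ast(tok: Token) -> Result<String, Token> {
    match tok {
        Token::Num(x) => Ok(format!("num({})", x)),
        Token::Bool(b) if b => Ok("true".into()),
        Token::Bool(_) => Ok("false".into()),
        other => Err(other),
    }
}

fn main() {
    assert_eq!(to_ast(Token::Num(5)), Ok("num(5)".to_string()));
    assert_eq!(to_ast(Token::Bool(true)), Ok("true".to_string()));
    assert_eq!(to_ast(Token::LParen), Err(Token::LParen));
}
```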
#[must_use] pub struct Custom(F, PhantomData); impl Copy for Custom {} impl Clone for Custom { fn clone(&self) -> Self { Self(self.0.clone(), PhantomData) } } impl) -> PResult, E: Error> Parser for Custom { type Error = E; fn parse_inner( &self, _debugger: &mut D, stream: &mut StreamOf, ) -> PResult { (self.0)(stream) } fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// A parser primitive that allows you to define your own custom parsers. /// /// In theory you shouldn't need to use this unless you have particularly bizarre requirements, but it's a cleaner and /// more sustainable alternative to implementing [`Parser`] by hand. /// /// The output type of this parser is determined by the parse result of the function. pub fn custom(f: F) -> Custom { Custom(f, PhantomData) } /// See [`end`]. #[must_use] pub struct End(PhantomData); impl Clone for End { fn clone(&self) -> Self { Self(PhantomData) } } impl> Parser for End { type Error = E; fn parse_inner( &self, _debugger: &mut D, stream: &mut StreamOf, ) -> PResult { match stream.next() { (_, _, None) => (Vec::new(), Ok(((), None))), (at, span, found) => ( Vec::new(), Err(Located::at( at, E::expected_input_found(span, Some(None), found), )), ), } } fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// A parser that accepts only the end of input. /// /// This parser is very useful when you wish to force a parser to consume *all* of the input. It is typically combined /// with [`Parser::then_ignore`]. /// /// The output type of this parser is `()`. 
/// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// assert_eq!(end::>().parse(""), Ok(())); /// assert!(end::>().parse("hello").is_err()); /// ``` /// /// ``` /// # use chumsky::prelude::*; /// let digits = text::digits::<_, Simple>(10); /// /// // This parser parses digits! /// assert_eq!(digits.parse("1234"), Ok("1234".to_string())); /// /// // However, parsers are lazy and do not consume trailing input. /// // This can be inconvenient if we want to validate all of the input. /// assert_eq!(digits.parse("1234AhasjADSJAlaDJKSDAK"), Ok("1234".to_string())); /// /// // To fix this problem, we require that the end of input follows any successfully parsed input /// let only_digits = digits.then_ignore(end()); /// /// // Now our parser correctly produces an error if any trailing input is found... /// assert!(only_digits.parse("1234AhasjADSJAlaDJKSDAK").is_err()); /// // ...while still behaving correctly for inputs that only consist of valid patterns /// assert_eq!(only_digits.parse("1234"), Ok("1234".to_string())); /// ``` pub fn end() -> End { End(PhantomData) } mod private { pub trait Sealed {} impl Sealed for T {} impl Sealed for alloc::string::String {} impl<'a> Sealed for &'a str {} impl<'a, T> Sealed for &'a [T] {} impl Sealed for [T; N] {} impl<'a, T, const N: usize> Sealed for &'a [T; N] {} impl Sealed for alloc::vec::Vec {} impl Sealed for alloc::collections::LinkedList {} impl Sealed for alloc::collections::VecDeque {} impl Sealed for alloc::collections::BTreeSet {} impl Sealed for alloc::collections::BinaryHeap {} #[cfg(feature = "std")] impl Sealed for std::collections::HashSet {} #[cfg(not(feature = "std"))] impl Sealed for hashbrown::HashSet {} } /// A utility trait to abstract over container-like things. /// /// This trait is sealed and an implementation detail - its internals should not be relied on by users. pub trait Container: private::Sealed { /// An iterator over the items within this container, by value. 
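The `private::Sealed` module above is the classic sealed-trait pattern: a public trait bounded on a supertrait living in a private module, so downstream crates can use the trait but never implement it. A minimal std-only sketch of the pattern (the trait and method names here are illustrative, not the crate's):

```rust
mod private {
    // Not nameable outside this crate, so `Sealed` cannot be implemented externally.
    pub trait Sealed {}
    impl Sealed for u8 {}
    impl Sealed for String {}
}

// Public trait, but only types implementing the private supertrait may implement it.
pub trait Container: private::Sealed {
    fn len_hint(&self) -> usize;
}

impl Container for u8 {
    fn len_hint(&self) -> usize {
        1
    }
}

impl Container for String {
    fn len_hint(&self) -> usize {
        self.chars().count()
    }
}

fn main() {
    assert_eq!(1u8.len_hint(), 1);
    assert_eq!(String::from("abc").len_hint(), 3);
}
```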
type Iter: Iterator; /// Iterate over the elements of the container (using internal iteration because GATs are unstable). fn get_iter(&self) -> Self::Iter; } impl Container for T { type Iter = core::iter::Once; fn get_iter(&self) -> Self::Iter { core::iter::once(self.clone()) } } impl Container for String { type Iter = alloc::vec::IntoIter; fn get_iter(&self) -> Self::Iter { self.chars().collect::>().into_iter() } } impl<'a> Container for &'a str { type Iter = alloc::str::Chars<'a>; fn get_iter(&self) -> Self::Iter { self.chars() } } impl<'a, T: Clone> Container for &'a [T] { type Iter = core::iter::Cloned>; fn get_iter(&self) -> Self::Iter { self.iter().cloned() } } impl<'a, T: Clone, const N: usize> Container for &'a [T; N] { type Iter = core::iter::Cloned>; fn get_iter(&self) -> Self::Iter { self.iter().cloned() } } impl Container for [T; N] { type Iter = core::array::IntoIter; fn get_iter(&self) -> Self::Iter { core::array::IntoIter::new(self.clone()) } } impl Container for Vec { type Iter = alloc::vec::IntoIter; fn get_iter(&self) -> Self::Iter { self.clone().into_iter() } } impl Container for alloc::collections::LinkedList { type Iter = alloc::collections::linked_list::IntoIter; fn get_iter(&self) -> Self::Iter { self.clone().into_iter() } } impl Container for alloc::collections::VecDeque { type Iter = alloc::collections::vec_deque::IntoIter; fn get_iter(&self) -> Self::Iter { self.clone().into_iter() } } #[cfg(feature = "std")] impl Container for std::collections::HashSet { type Iter = std::collections::hash_set::IntoIter; fn get_iter(&self) -> Self::Iter { self.clone().into_iter() } } #[cfg(not(feature = "std"))] impl Container for hashbrown::HashSet { type Iter = hashbrown::hash_set::IntoIter; fn get_iter(&self) -> Self::Iter { self.clone().into_iter() } } impl Container for alloc::collections::BTreeSet { type Iter = alloc::collections::btree_set::IntoIter; fn get_iter(&self) -> Self::Iter { self.clone().into_iter() } } impl Container for 
alloc::collections::BinaryHeap { type Iter = alloc::collections::binary_heap::IntoIter; fn get_iter(&self) -> Self::Iter { self.clone().into_iter() } } /// A utility trait to abstract over linear and ordered container-like things, excluding things such /// as sets and heaps. /// /// This trait is sealed and an implementation detail - its internals should not be relied on by users. pub trait OrderedContainer: Container {} impl OrderedContainer for T {} impl OrderedContainer for String {} impl<'a> OrderedContainer for &'a str {} impl<'a, T: Clone> OrderedContainer for &'a [T] {} impl<'a, T: Clone, const N: usize> OrderedContainer for &'a [T; N] {} impl OrderedContainer for [T; N] {} impl OrderedContainer for Vec {} impl OrderedContainer for alloc::collections::LinkedList {} impl OrderedContainer for alloc::collections::VecDeque {} /// See [`just`]. #[must_use] pub struct Just, E>(C, PhantomData<(I, E)>); impl, E> Copy for Just {} impl, E> Clone for Just { fn clone(&self) -> Self { Self(self.0.clone(), PhantomData) } } impl + Clone, E: Error> Parser for Just { type Error = E; fn parse_inner( &self, _debugger: &mut D, stream: &mut StreamOf, ) -> PResult { for expected in self.0.get_iter() { match stream.next() { (_, _, Some(tok)) if tok == expected => {} (at, span, found) => { return ( Vec::new(), Err(Located::at( at, E::expected_input_found(span, Some(Some(expected)), found), )), ) } } } (Vec::new(), Ok((self.0.clone(), None))) } fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// A parser that accepts only the given input. /// /// The output type of this parser is `C`, the input or sequence that was provided. 
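The matching loop inside `Just::parse_inner` above walks the expected container and the input stream in lockstep, failing on the first mismatch and ignoring trailing input. A std-only sketch of that loop over slices (not the crate's code; `matches_seq` is a hypothetical helper):

```rust
fn matches_seq<I: PartialEq>(expected: &[I], input: &[I]) -> bool {
    let mut inp = input.iter();
    for exp in expected {
        match inp.next() {
            Some(tok) if tok == exp => {} // token matches, keep going
            _ => return false,            // mismatch or early end of input
        }
    }
    true // trailing input is fine: parsers are lazy
}

fn main() {
    let hello: Vec<char> = "Hello".chars().collect();
    let input1: Vec<char> = "Hello, world!".chars().collect();
    let input2: Vec<char> = "Goodbye".chars().collect();
    assert!(matches_seq(&hello, &input1));
    assert!(!matches_seq(&hello, &input2));
}
```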
/// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let question = just::<_, _, Cheap>('?'); /// /// assert_eq!(question.parse("?"), Ok('?')); /// assert!(question.parse("!").is_err()); /// // This works because parsers do not eagerly consume input, so the '!' is not parsed /// assert_eq!(question.parse("?!"), Ok('?')); /// // This fails because the parser expects an end to the input after the '?' /// assert!(question.then(end()).parse("?!").is_err()); /// ``` pub fn just, E: Error>(inputs: C) -> Just { Just(inputs, PhantomData) } /// See [`seq`]. #[must_use] pub struct Seq(Vec, PhantomData); impl Clone for Seq { fn clone(&self) -> Self { Self(self.0.clone(), PhantomData) } } impl> Parser for Seq { type Error = E; fn parse_inner( &self, _debugger: &mut D, stream: &mut StreamOf, ) -> PResult { for expected in &self.0 { match stream.next() { (_, _, Some(tok)) if &tok == expected => {} (at, span, found) => { return ( Vec::new(), Err(Located::at( at, E::expected_input_found(span, Some(Some(expected.clone())), found), )), ) } } } (Vec::new(), Ok(((), None))) } fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// A parser that accepts only a sequence of specific inputs. /// /// The output type of this parser is `()`. 
/// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let hello = seq::<_, _, Cheap>("Hello".chars()); /// /// assert_eq!(hello.parse("Hello"), Ok(())); /// assert_eq!(hello.parse("Hello, world!"), Ok(())); /// assert!(hello.parse("Goodbye").is_err()); /// /// let onetwothree = seq::<_, _, Cheap>([1, 2, 3]); /// /// assert_eq!(onetwothree.parse([1, 2, 3]), Ok(())); /// assert_eq!(onetwothree.parse([1, 2, 3, 4, 5]), Ok(())); /// assert!(onetwothree.parse([2, 1, 3]).is_err()); /// ``` #[deprecated( since = "0.7.0", note = "Use `just` instead: it now works for many sequence-like types!" )] pub fn seq, E>(xs: Iter) -> Seq { Seq(xs.into_iter().collect(), PhantomData) } /// See [`one_of`]. #[must_use] pub struct OneOf(C, PhantomData<(I, E)>); impl Clone for OneOf { fn clone(&self) -> Self { Self(self.0.clone(), PhantomData) } } impl, E: Error> Parser for OneOf { type Error = E; fn parse_inner( &self, _debugger: &mut D, stream: &mut StreamOf, ) -> PResult { match stream.next() { (_, _, Some(tok)) if self.0.get_iter().any(|not| not == tok) => { (Vec::new(), Ok((tok, None))) } (at, span, found) => ( Vec::new(), Err(Located::at( at, E::expected_input_found(span, self.0.get_iter().map(Some), found), )), ), } } fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// A parser that accepts one of a sequence of specific inputs. /// /// The output type of this parser is `I`, the input that was found. 
/// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let digits = one_of::<_, _, Cheap>("0123456789") /// .repeated().at_least(1) /// .then_ignore(end()) /// .collect::(); /// /// assert_eq!(digits.parse("48791"), Ok("48791".to_string())); /// assert!(digits.parse("421!53").is_err()); /// ``` pub fn one_of, E: Error>(inputs: C) -> OneOf { OneOf(inputs, PhantomData) } /// See [`empty`]. #[must_use] pub struct Empty(PhantomData); impl Clone for Empty { fn clone(&self) -> Self { Self(PhantomData) } } impl> Parser for Empty { type Error = E; fn parse_inner( &self, _debugger: &mut D, _: &mut StreamOf, ) -> PResult { (Vec::new(), Ok(((), None))) } fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// A parser that parses no inputs. /// /// The output type of this parser is `()`. pub fn empty() -> Empty { Empty(PhantomData) } /// See [`none_of`]. #[must_use] pub struct NoneOf(C, PhantomData<(I, E)>); impl Clone for NoneOf { fn clone(&self) -> Self { Self(self.0.clone(), PhantomData) } } impl, E: Error> Parser for NoneOf { type Error = E; fn parse_inner( &self, _debugger: &mut D, stream: &mut StreamOf, ) -> PResult { match stream.next() { (_, _, Some(tok)) if self.0.get_iter().all(|not| not != tok) => { (Vec::new(), Ok((tok, None))) } (at, span, found) => ( Vec::new(), Err(Located::at( at, E::expected_input_found(span, Vec::new(), found), )), ), } } fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// A parser that accepts any input that is *not* in a sequence of specific inputs. /// /// The output type of this parser is `I`, the input that was found. 
/// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let string = one_of::<_, _, Cheap>("\"'") /// .ignore_then(none_of("\"'").repeated()) /// .then_ignore(one_of("\"'")) /// .then_ignore(end()) /// .collect::(); /// /// assert_eq!(string.parse("'hello'"), Ok("hello".to_string())); /// assert_eq!(string.parse("\"world\""), Ok("world".to_string())); /// assert!(string.parse("\"421!53").is_err()); /// ``` pub fn none_of, E: Error>(inputs: C) -> NoneOf { NoneOf(inputs, PhantomData) } /// See [`take_until`]. #[must_use] #[derive(Copy, Clone)] pub struct TakeUntil(A); impl> Parser, O)> for TakeUntil { type Error = A::Error; fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult, O), A::Error> { let mut outputs = Vec::new(); let mut alt = None; loop { let (errors, err) = match stream.try_parse(|stream| { #[allow(deprecated)] self.0.parse_inner(debugger, stream) }) { (errors, Ok((out, a_alt))) => { break (errors, Ok(((outputs, out), merge_alts(alt, a_alt)))) } (errors, Err(err)) => (errors, err), }; match stream.next() { (_, _, Some(tok)) => outputs.push(tok), (_, _, None) => break (errors, Err(err)), } alt = merge_alts(alt.take(), Some(err)); } } fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult, O), A::Error> { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent( &self, d: &mut Silent, s: &mut StreamOf, ) -> PResult, O), A::Error> { #[allow(deprecated)] self.parse_inner(d, s) } } /// A parser that accepts any number of inputs until a terminating pattern is reached. /// /// The output type of this parser is `(Vec, O)`, a combination of the preceding inputs and the output of the /// final patterns. 
/// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let single_line = just::<_, _, Simple>("//") /// .then(take_until(text::newline())) /// .ignored(); /// /// let multi_line = just::<_, _, Simple>("/*") /// .then(take_until(just("*/"))) /// .ignored(); /// /// let comment = single_line.or(multi_line); /// /// let tokens = text::ident() /// .padded() /// .padded_by(comment /// .padded() /// .repeated()) /// .repeated(); /// /// assert_eq!(tokens.parse(r#" /// // These tokens... /// these are /// /* /// ...have some /// multi-line... /// */ /// // ...and single-line... /// tokens /// // ...comments between them /// "#), Ok(vec!["these".to_string(), "are".to_string(), "tokens".to_string()])); /// ``` pub fn take_until(until: A) -> TakeUntil { TakeUntil(until) } /// See [`filter`]. #[must_use] pub struct Filter(F, PhantomData); impl Copy for Filter {} impl Clone for Filter { fn clone(&self) -> Self { Self(self.0.clone(), PhantomData) } } impl bool, E: Error> Parser for Filter { type Error = E; fn parse_inner( &self, _debugger: &mut D, stream: &mut StreamOf, ) -> PResult { match stream.next() { (_, _, Some(tok)) if (self.0)(&tok) => (Vec::new(), Ok((tok, None))), (at, span, found) => ( Vec::new(), Err(Located::at( at, E::expected_input_found(span, Vec::new(), found), )), ), } } fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// A parser that accepts only inputs that match the given predicate. /// /// The output type of this parser is `I`, the input that was found. 
/// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let lowercase = filter::<_, _, Cheap>(char::is_ascii_lowercase) /// .repeated().at_least(1) /// .then_ignore(end()) /// .collect::(); /// /// assert_eq!(lowercase.parse("hello"), Ok("hello".to_string())); /// assert!(lowercase.parse("Hello").is_err()); /// ``` pub fn filter bool, E>(f: F) -> Filter { Filter(f, PhantomData) } /// See [`filter_map`]. #[must_use] pub struct FilterMap(F, PhantomData); impl Copy for FilterMap {} impl Clone for FilterMap { fn clone(&self) -> Self { Self(self.0.clone(), PhantomData) } } impl Result, E: Error> Parser for FilterMap { type Error = E; fn parse_inner( &self, _debugger: &mut D, stream: &mut StreamOf, ) -> PResult { let (at, span, tok) = stream.next(); match tok.map(|tok| (self.0)(span.clone(), tok)) { Some(Ok(tok)) => (Vec::new(), Ok((tok, None))), Some(Err(err)) => (Vec::new(), Err(Located::at(at, err))), None => ( Vec::new(), Err(Located::at( at, E::expected_input_found(span, Vec::new(), None), )), ), } } fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// A parser that accepts an input and tests it against the given fallible function. /// /// This function allows integration with custom error types to allow for custom parser errors. /// /// Before using this function, consider whether the [`select`] macro would serve you better. /// /// The output type of this parser is `I`, the input that was found.
/// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let numeral = filter_map(|span, c: char| match c.to_digit(10) { /// Some(x) => Ok(x), /// None => Err(Simple::custom(span, format!("'{}' is not a digit", c))), /// }); /// /// assert_eq!(numeral.parse("3"), Ok(3)); /// assert_eq!(numeral.parse("7"), Ok(7)); /// assert_eq!(numeral.parse("f"), Err(vec![Simple::custom(0..1, "'f' is not a digit")])); /// ``` pub fn filter_map Result, E: Error>(f: F) -> FilterMap { FilterMap(f, PhantomData) } /// See [`any`]. pub type Any = Filter bool, E>; /// A parser that accepts any input (but not the end of input). /// /// The output type of this parser is `I`, the input that was found. /// /// # Examples /// /// ``` /// # use chumsky::{prelude::*, error::Cheap}; /// let any = any::>(); /// /// assert_eq!(any.parse("a"), Ok('a')); /// assert_eq!(any.parse("7"), Ok('7')); /// assert_eq!(any.parse("\t"), Ok('\t')); /// assert!(any.parse("").is_err()); /// ``` pub fn any() -> Any { Filter(|_| true, PhantomData) } /// See [`fn@todo`]. #[must_use] pub struct Todo(&'static Location<'static>, PhantomData<(I, O, E)>); /// A parser that can be used wherever you need to implement a parser later. /// /// This parser is analogous to the [`todo!`] and [`unimplemented!`] macros, but will produce a panic when used to /// parse input, not immediately when invoked. /// /// This function is useful when developing your parser, allowing you to prototype and run parts of your parser without /// committing to implementing the entire thing immediately. /// /// The output type of this parser is whatever you want it to be: it'll never produce output! 
/// /// # Examples /// /// ```should_panic /// # use chumsky::prelude::*; /// let int = just::<_, _, Simple>("0x").ignore_then(todo()) /// .or(just("0b").ignore_then(text::digits(2))) /// .or(text::int(10)); /// /// // Decimal numbers are parsed /// assert_eq!(int.parse("12"), Ok("12".to_string())); /// // Binary numbers are parsed /// assert_eq!(int.parse("0b00101"), Ok("00101".to_string())); /// // Parsing hexadecimal numbers results in a panic because the parser is unimplemented /// int.parse("0xd4"); /// ``` #[track_caller] pub fn todo() -> Todo { Todo(Location::caller(), PhantomData) } impl Copy for Todo {} impl Clone for Todo { fn clone(&self) -> Self { Self(self.0, PhantomData) } } impl> Parser for Todo { type Error = E; fn parse_inner( &self, _debugger: &mut D, _stream: &mut StreamOf, ) -> PResult { todo!( "Attempted to use an unimplemented parser. Parser defined at {}", self.0 ) } fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent( &self, d: &mut Silent, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// See [`choice`]. 
#[must_use] pub struct Choice(pub(crate) T, pub(crate) PhantomData); impl Copy for Choice {} impl Clone for Choice { fn clone(&self) -> Self { Self(self.0.clone(), PhantomData) } } impl, A: Parser, const N: usize> Parser for Choice<[A; N], E> { type Error = E; fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { let Choice(parsers, _) = self; let mut alt = None; for parser in parsers { match stream.try_parse(|stream| { #[allow(deprecated)] debugger.invoke(parser, stream) }) { (errors, Ok(out)) => return (errors, Ok(out)), (_, Err(a_alt)) => { alt = merge_alts(alt.take(), Some(a_alt)); } }; } (Vec::new(), Err(alt.unwrap())) } fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent( &self, d: &mut Silent, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } impl, A: Parser> Parser for Choice, E> { type Error = E; fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { let Choice(parsers, _) = self; let mut alt = None; for parser in parsers { match stream.try_parse(|stream| { #[allow(deprecated)] debugger.invoke(parser, stream) }) { (errors, Ok(out)) => return (errors, Ok(out)), (_, Err(a_alt)) => { alt = merge_alts(alt.take(), Some(a_alt)); } }; } (Vec::new(), Err(alt.unwrap())) } fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent( &self, d: &mut Silent, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } macro_rules! 
impl_for_tuple { () => {}; ($head:ident $($X:ident)*) => { impl_for_tuple!($($X)*); impl_for_tuple!(~ $head $($X)*); }; (~ $($X:ident)*) => { #[allow(unused_variables, non_snake_case)] impl, $($X: Parser),*> Parser for Choice<($($X,)*), E> { type Error = E; fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { let Choice(($($X,)*), _) = self; let mut alt = None; $( match stream.try_parse(|stream| { #[allow(deprecated)] debugger.invoke($X, stream) }) { (errors, Ok(out)) => return (errors, Ok(out)), (errors, Err(a_alt)) => { alt = merge_alts(alt.take(), Some(a_alt)); }, }; )* (Vec::new(), Err(alt.unwrap())) } fn parse_inner_verbose( &self, d: &mut Verbose, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent( &self, d: &mut Silent, s: &mut StreamOf, ) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } }; } impl_for_tuple!(A_ B_ C_ D_ E_ F_ G_ H_ I_ J_ K_ L_ M_ N_ O_ P_ Q_ S_ T_ U_ V_ W_ X_ Y_ Z_); /// Parse using a tuple or array of many parsers, producing the output of the first to successfully parse. /// /// This primitive has a twofold improvement over a chain of [`Parser::or`] calls: /// /// - Rust's trait solver seems to resolve the [`Parser`] impl for this type much faster, significantly reducing /// compilation times. /// /// - Parsing is likely a little faster in some cases because the resulting parser is 'less careful' about error /// routing, and doesn't perform the same fine-grained error prioritisation that [`Parser::or`] does. /// /// These qualities make this parser ideal for lexers. /// /// The output type of this parser is the output type of the inner parsers. 
/// /// # Examples /// ``` /// # use chumsky::prelude::*; /// #[derive(Clone, Debug, PartialEq)] /// enum Token { /// If, /// For, /// While, /// Fn, /// Int(u64), /// Ident(String), /// } /// /// let tokens = choice::<_, Simple>(( /// text::keyword("if").to(Token::If), /// text::keyword("for").to(Token::For), /// text::keyword("while").to(Token::While), /// text::keyword("fn").to(Token::Fn), /// text::int(10).from_str().unwrapped().map(Token::Int), /// text::ident().map(Token::Ident), /// )) /// .padded() /// .repeated(); /// /// use Token::*; /// assert_eq!( /// tokens.parse("if 56 for foo while 42 fn bar"), /// Ok(vec![If, Int(56), For, Ident("foo".to_string()), While, Int(42), Fn, Ident("bar".to_string())]), /// ); /// ``` /// /// If you have more than 26 choices, the array form of choice will work for any length. The downside /// is that the contained parsers must all be of the same type. /// ``` /// # use chumsky::prelude::*; /// #[derive(Clone, Debug, PartialEq)] /// enum Token { /// If, /// For, /// While, /// Fn, /// Def, /// } /// /// let tokens = choice::<_, Simple>([ /// text::keyword("if").to(Token::If), /// text::keyword("for").to(Token::For), /// text::keyword("while").to(Token::While), /// text::keyword("fn").to(Token::Fn), /// text::keyword("def").to(Token::Def), /// ]) /// .padded() /// .repeated(); /// /// use Token::*; /// assert_eq!( /// tokens.parse("def fn while if for"), /// Ok(vec![Def, Fn, While, If, For]), /// ); /// ``` pub fn choice(parsers: T) -> Choice { Choice(parsers, PhantomData) } chumsky-0.9.3/src/recovery.rs //! Types and traits that facilitate error recovery. //! //! *“Do you find coming to terms with the mindless tedium of it all presents an interesting challenge?”* use super::*; /// A trait implemented by error recovery strategies. pub trait Strategy> { /// Recover from a parsing failure.
fn recover>( &self, recovered_errors: Vec>, fatal_error: Located, parser: P, debugger: &mut D, stream: &mut StreamOf, ) -> PResult; } /// See [`skip_then_retry_until`]. #[must_use] #[derive(Copy, Clone)] pub struct SkipThenRetryUntil( pub(crate) [I; N], pub(crate) bool, pub(crate) bool, ); impl SkipThenRetryUntil { /// Alters this recovery strategy so that the first token will always be skipped. /// /// This is useful when the input being searched for also appears at the beginning of the pattern that failed to /// parse. pub fn skip_start(self) -> Self { Self(self.0, self.1, true) } /// Alters this recovery strategy so that the synchronisation token will be consumed during recovery. /// /// This is useful when the input being searched for is a delimiter of a prior pattern rather than the start of a /// new pattern and hence is no longer important once recovery has occurred. pub fn consume_end(self) -> Self { Self(self.0, true, self.2) } } impl, const N: usize> Strategy for SkipThenRetryUntil { fn recover>( &self, a_errors: Vec>, a_err: Located, parser: P, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { if self.2 { let _ = stream.next(); } loop { #[allow(deprecated)] let (mut errors, res) = stream.try_parse(|stream| { #[allow(deprecated)] debugger.invoke(&parser, stream) }); if let Ok(out) = res { errors.push(a_err); break (errors, Ok(out)); } #[allow(clippy::blocks_in_if_conditions)] if !stream.attempt( |stream| match stream.next().2.map(|tok| self.0.contains(&tok)) { Some(true) => (self.1, false), Some(false) => (true, true), None => (false, false), }, ) { break (a_errors, Err(a_err)); } } } } /// A recovery mode that simply skips to the next input on parser failure and tries again, until reaching one of /// several inputs. /// /// Also see [`SkipThenRetryUntil::consume_end`]. /// /// This strategy is very 'stupid' and can result in very poor error generation in some languages. 
Place this strategy /// after others as a last resort, and be careful about over-using it. pub fn skip_then_retry_until(until: [I; N]) -> SkipThenRetryUntil { SkipThenRetryUntil(until, false, true) } /// See [`skip_until`]. #[must_use] #[derive(Copy, Clone)] pub struct SkipUntil( pub(crate) [I; N], pub(crate) F, pub(crate) bool, pub(crate) bool, ); impl SkipUntil { /// Alters this recovery strategy so that the first token will always be skipped. /// /// This is useful when the input being searched for also appears at the beginning of the pattern that failed to /// parse. pub fn skip_start(self) -> Self { Self(self.0, self.1, self.2, true) } /// Alters this recovery strategy so that the synchronisation token will be consumed during recovery. /// /// This is useful when the input being searched for is a delimiter of a prior pattern rather than the start of a /// new pattern and hence is no longer important once recovery has occurred. pub fn consume_end(self) -> Self { Self(self.0, self.1, true, self.3) } } impl O, E: Error, const N: usize> Strategy for SkipUntil { fn recover>( &self, mut a_errors: Vec>, a_err: Located, _parser: P, _debugger: &mut D, stream: &mut StreamOf, ) -> PResult { let pre_state = stream.save(); if self.3 { let _ = stream.next(); } a_errors.push(a_err); loop { match stream.attempt(|stream| { let (at, span, tok) = stream.next(); match tok.map(|tok| self.0.contains(&tok)) { Some(true) => (self.2, Ok(true)), Some(false) => (true, Ok(false)), None => (true, Err((at, span))), } }) { Ok(true) => break (a_errors, Ok(((self.1)(stream.span_since(pre_state)), None))), Ok(false) => {} Err(_) if stream.save() > pre_state => { break (a_errors, Ok(((self.1)(stream.span_since(pre_state)), None))) } Err((at, span)) => { break ( a_errors, Err(Located::at( at, E::expected_input_found(span, self.0.iter().cloned().map(Some), None), )), ) } } } } } /// A recovery mode that skips input until one of several inputs is found. /// /// Also see [`SkipUntil::consume_end`]. 
/// /// This strategy is very 'stupid' and can result in very poor error generation in some languages. Place this strategy /// after others as a last resort, and be careful about over-using it. pub fn skip_until(until: [I; N], fallback: F) -> SkipUntil { SkipUntil(until, fallback, false, false) } /// See [`nested_delimiters`]. #[must_use] #[derive(Copy, Clone)] pub struct NestedDelimiters( pub(crate) I, pub(crate) I, pub(crate) [(I, I); N], pub(crate) F, ); impl O, E: Error, const N: usize> Strategy for NestedDelimiters { // This looks like something weird with clippy, it warns in a weird spot and isn't fixed by // marking it at the spot. #[allow(clippy::blocks_in_if_conditions)] fn recover>( &self, mut a_errors: Vec>, a_err: Located, _parser: P, _debugger: &mut D, stream: &mut StreamOf, ) -> PResult { let mut balance = 0; let mut balance_others = [0; N]; let mut starts = Vec::new(); let mut error = None; let pre_state = stream.save(); let recovered = loop { if match stream.next() { (_, span, Some(t)) if t == self.0 => { balance += 1; starts.push(span); true } (_, _, Some(t)) if t == self.1 => { balance -= 1; starts.pop(); true } (at, span, Some(t)) => { for (balance_other, others) in balance_others.iter_mut().zip(self.2.iter()) { if t == others.0 { *balance_other += 1; } else if t == others.1 { *balance_other -= 1; if *balance_other < 0 && balance == 1 { // stream.revert(pre_state); error.get_or_insert_with(|| { Located::at( at, P::Error::unclosed_delimiter( starts.pop().unwrap(), self.0.clone(), span.clone(), self.1.clone(), Some(t.clone()), ), ) }); } } } false } (at, span, None) => { if balance > 0 && balance == 1 { error.get_or_insert_with(|| match starts.pop() { Some(start) => Located::at( at, P::Error::unclosed_delimiter( start, self.0.clone(), span, self.1.clone(), None, ), ), None => Located::at( at, P::Error::expected_input_found( span, Some(Some(self.1.clone())), None, ), ), }); } break false; } } { match balance.cmp(&0) { Ordering::Equal => break true, 
// The end of a delimited section is not a valid recovery pattern Ordering::Less => break false, Ordering::Greater => (), } } else if balance == 0 { // A non-delimiter input before anything else is not a valid recovery pattern break false; } }; if let Some(e) = error { a_errors.push(e); } if recovered { if a_errors.last().map_or(true, |e| a_err.at < e.at) { a_errors.push(a_err); } (a_errors, Ok(((self.3)(stream.span_since(pre_state)), None))) } else { (a_errors, Err(a_err)) } } } /// A recovery strategy that searches for a start and end delimiter, respecting nesting. /// /// It is possible to specify additional delimiter pairs that are valid in the pattern's context for better errors. For /// example, you might want to also specify `[('[', ']'), ('{', '}')]` when recovering a parenthesised expression as /// this can aid in detecting delimiter mismatches. /// /// A function that generates a fallback output on recovery is also required. pub fn nested_delimiters( start: I, end: I, others: [(I, I); N], fallback: F, ) -> NestedDelimiters { assert!( start != end, "Start and end delimiters cannot be the same when using `NestedDelimiters`" ); NestedDelimiters(start, end, others, fallback) } /// See [`skip_parser`]. #[derive(Copy, Clone)] pub struct SkipParser(pub(crate) R); impl, E: Error> Strategy for SkipParser { fn recover>( &self, mut a_errors: Vec>, a_err: Located, _parser: P, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { a_errors.push(a_err); let (mut errors, res) = self.0.parse_inner(debugger, stream); a_errors.append(&mut errors); (a_errors, res) } } /// A recovery mode that applies the provided recovery parser to determine the content to skip. /// /// ``` /// # use chumsky::prelude::*; /// #[derive(Clone, Debug, PartialEq, Eq, Hash)] /// enum Token { /// GoodKeyword, /// BadKeyword, /// Newline, /// } /// /// #[derive(Clone, Debug, PartialEq, Eq, Hash)] /// enum AST { /// GoodLine, /// Error, /// } /// /// // The happy path... 
/// let goodline = just::>(Token::GoodKeyword) /// .ignore_then(none_of(Token::Newline).repeated().to(AST::GoodLine)) /// .then_ignore(just(Token::Newline)); /// /// // If it fails, swallow everything up to a newline, but only if the line /// // didn't contain BadKeyword which marks an alternative parse route that /// // we want to accept instead. /// let goodline_with_recovery = goodline.recover_with(skip_parser( /// none_of([Token::Newline, Token::BadKeyword]) /// .repeated() /// .then_ignore(just(Token::Newline)) /// .to(AST::Error), /// )); /// ``` pub fn skip_parser(recovery_parser: R) -> SkipParser { SkipParser(recovery_parser) } /// A parser that includes a fallback recovery strategy should parsing result in an error. #[must_use] #[derive(Copy, Clone)] pub struct Recovery(pub(crate) A, pub(crate) S); impl, S: Strategy, E: Error> Parser for Recovery { type Error = E; fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { match stream.try_parse(|stream| { #[allow(deprecated)] debugger.invoke(&self.0, stream) }) { (a_errors, Ok(a_out)) => (a_errors, Ok(a_out)), (a_errors, Err(a_err)) => self.1.recover(a_errors, a_err, &self.0, debugger, stream), } } fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } #[cfg(test)] mod tests { use crate::error::Cheap; use crate::prelude::*; #[test] fn recover_with_skip_then_retry_until() { let parser = just::<_, _, Cheap<_>>('a') .recover_with(skip_then_retry_until([','])) .separated_by(just(',')); { let (result, errors) = parser.parse_recovery("a,a,2a,a"); assert_eq!(result, Some(vec!['a', 'a', 'a', 'a'])); assert_eq!(errors.len(), 1) } { let (result, errors) = parser.parse_recovery("a,a,2 a,a"); assert_eq!(result, Some(vec!['a', 'a', 'a', 'a'])); assert_eq!(errors.len(), 1) } { let (result, errors) = 
parser.parse_recovery("a,a,2 a,a"); assert_eq!(result, Some(vec!['a', 'a', 'a', 'a'])); assert_eq!(errors.len(), 1) } } #[test] fn until_nothing() { #[derive(Debug, Clone, Copy, PartialEq)] pub enum Token { Foo, Bar, } fn lexer() -> impl Parser> { let foo = just("foo").to(Token::Foo); let bar = just("bar").to(Token::Bar); choice((foo, bar)).recover_with(skip_then_retry_until([])) } let (result, errors) = lexer().parse_recovery("baz"); assert_eq!(result, None); assert_eq!(errors.len(), 1); } } chumsky-0.9.3/src/recursive.rs //! Recursive parsers (parsers that include themselves within their patterns). //! //! *“It's unpleasantly like being drunk." //! "What's so unpleasant about being drunk?" //! "You ask a glass of water.”* //! //! The [`recursive()`] function covers most cases, but sometimes it's necessary to manually control the declaration and //! definition of parsers more carefully, particularly for mutually-recursive parsers. In such cases, the functions on //! [`Recursive`] allow for this. use super::*; use alloc::rc::{Rc, Weak}; // TODO: Remove when `OnceCell` is stable struct OnceCell(core::cell::RefCell>); impl OnceCell { pub fn new() -> Self { Self(core::cell::RefCell::new(None)) } pub fn set(&self, x: T) -> Result<(), ()> { let mut inner = self.0.try_borrow_mut().map_err(|_| ())?; if inner.is_none() { *inner = Some(x); Ok(()) } else { Err(()) } } pub fn get(&self) -> Option> { Some(core::cell::Ref::map(self.0.borrow(), |x| { x.as_ref().unwrap() })) } } enum RecursiveInner { Owned(Rc), Unowned(Weak), } type OnceParser<'a, I, O, E> = OnceCell + 'a>>; /// A parser that can be defined in terms of itself by separating its [declaration](Recursive::declare) from its /// [definition](Recursive::define). /// /// Prefer to use [`recursive()`], which exists as a convenient wrapper around both operations, if possible.
#[must_use] pub struct Recursive<'a, I, O, E: Error>(RecursiveInner>); impl<'a, I: Clone, O, E: Error> Recursive<'a, I, O, E> { fn cell(&self) -> Rc> { match &self.0 { RecursiveInner::Owned(x) => x.clone(), RecursiveInner::Unowned(x) => x .upgrade() .expect("Recursive parser used before being defined"), } } /// Declare the existence of a recursive parser, allowing it to be used to construct parser combinators before /// being fully defined. /// /// Declaring a parser before defining it is required for a parser to reference itself. /// /// This should be followed by **exactly one** call to the [`Recursive::define`] method prior to using the parser /// for parsing (i.e. via the [`Parser::parse`] method or similar). /// /// Prefer to use [`recursive()`], which is a convenient wrapper around this method and [`Recursive::define`], if /// possible. /// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// #[derive(Debug, PartialEq)] /// enum Chain { /// End, /// Link(char, Box), /// } /// /// // Declare the existence of the parser before defining it so that it can reference itself /// let mut chain = Recursive::<_, _, Simple>::declare(); /// /// // Define the parser in terms of itself. /// // In this case, the parser parses a right-recursive list of '+' into a singly linked list /// chain.define(just('+') /// .then(chain.clone()) /// .map(|(c, chain)| Chain::Link(c, Box::new(chain))) /// .or_not() /// .map(|chain| chain.unwrap_or(Chain::End))); /// /// assert_eq!(chain.parse(""), Ok(Chain::End)); /// assert_eq!( /// chain.parse("++"), /// Ok(Chain::Link('+', Box::new(Chain::Link('+', Box::new(Chain::End))))), /// ); /// ``` pub fn declare() -> Self { Recursive(RecursiveInner::Owned(Rc::new(OnceCell::new()))) } /// Defines the parser after declaring it, allowing it to be used for parsing.
pub fn define + 'a>(&mut self, parser: P) { self.cell() .set(Box::new(parser)) .unwrap_or_else(|_| panic!("Parser defined more than once")); } } impl<'a, I: Clone, O, E: Error> Clone for Recursive<'a, I, O, E> { fn clone(&self) -> Self { Self(match &self.0 { RecursiveInner::Owned(x) => RecursiveInner::Owned(x.clone()), RecursiveInner::Unowned(x) => RecursiveInner::Unowned(x.clone()), }) } } impl<'a, I: Clone, O, E: Error> Parser for Recursive<'a, I, O, E> { type Error = E; fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { #[cfg(feature = "stacker")] #[inline(always)] fn recurse R>(f: F) -> R { stacker::maybe_grow(1024 * 1024, 1024 * 1024, f) } #[cfg(not(feature = "stacker"))] #[inline(always)] fn recurse R>(f: F) -> R { f() } recurse(|| { #[allow(deprecated)] debugger.invoke( self.cell() .get() .expect("Recursive parser used before being defined") .as_ref(), stream, ) }) } fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } /// Construct a recursive parser (i.e: a parser that may contain itself as part of its pattern). /// /// The given function must create the parser. The parser must not be used to parse input before this function returns. /// /// This is a wrapper around [`Recursive::declare`] and [`Recursive::define`]. /// /// The output type of this parser is `O`, the same as the inner parser. 
/// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// #[derive(Debug, PartialEq)] /// enum Tree { /// Leaf(String), /// Branch(Vec), /// } /// /// // Parser that recursively parses nested lists /// let tree = recursive::<_, _, _, _, Simple>(|tree| tree /// .separated_by(just(',')) /// .delimited_by(just('['), just(']')) /// .map(Tree::Branch) /// .or(text::ident().map(Tree::Leaf)) /// .padded()); /// /// assert_eq!(tree.parse("hello"), Ok(Tree::Leaf("hello".to_string()))); /// assert_eq!(tree.parse("[a, b, c]"), Ok(Tree::Branch(vec![ /// Tree::Leaf("a".to_string()), /// Tree::Leaf("b".to_string()), /// Tree::Leaf("c".to_string()), /// ]))); /// // The parser can deal with arbitrarily complex nested lists /// assert_eq!(tree.parse("[[a, b], c, [d, [e, f]]]"), Ok(Tree::Branch(vec![ /// Tree::Branch(vec![ /// Tree::Leaf("a".to_string()), /// Tree::Leaf("b".to_string()), /// ]), /// Tree::Leaf("c".to_string()), /// Tree::Branch(vec![ /// Tree::Leaf("d".to_string()), /// Tree::Branch(vec![ /// Tree::Leaf("e".to_string()), /// Tree::Leaf("f".to_string()), /// ]), /// ]), /// ]))); /// ``` pub fn recursive< 'a, I: Clone, O, P: Parser + 'a, F: FnOnce(Recursive<'a, I, O, E>) -> P, E: Error, >( f: F, ) -> Recursive<'a, I, O, E> { let mut parser = Recursive::declare(); parser.define(f(Recursive(match &parser.0 { RecursiveInner::Owned(x) => RecursiveInner::Unowned(Rc::downgrade(x)), RecursiveInner::Unowned(_) => unreachable!(), }))); parser } chumsky-0.9.3/src/span.rs //! Types and traits related to spans. //! //! *“We demand rigidly defined areas of doubt and uncertainty!”* //! //! You can use the [`Span`] trait to connect up chumsky to your compiler's knowledge of the input source. use core::ops::Range; /// A trait that describes a span over a particular range of inputs. /// /// Spans typically consist of some context, such as the file they originated from, and a start/end offset.
Spans are /// permitted to overlap one another. The end offset must always be greater than or equal to the start offset. /// /// Span is automatically implemented for [`Range`] and [`(C, Range)`]. pub trait Span: Clone { /// Extra context used in a span. /// /// This is usually some way to uniquely identify the source file that a span originated in, such as the file's /// path, URL, etc. /// /// NOTE: Span contexts have no inherent meaning to Chumsky and can be anything. For example, [`Range`]'s /// implementation of [`Span`] simply uses [`()`] as its context. type Context: Clone; /// A type representing a span's start or end offset from the start of the input. /// /// Typically, [`usize`] is used. /// /// NOTE: Offsets have no inherent meaning to Chumsky and are not used to decide how to prioritise errors. This /// means that it's perfectly fine for tokens to have non-continuous spans that bear no relation to their actual /// location in the input stream. This is useful for languages with an AST-level macro system that need to /// correctly point to symbols in the macro input when producing errors. type Offset: Clone; /// Create a new span given a context and an offset range. fn new(context: Self::Context, range: Range) -> Self; /// Return the span's context. fn context(&self) -> Self::Context; /// Return the start offset of the span. fn start(&self) -> Self::Offset; /// Return the end offset of the span.
fn end(&self) -> Self::Offset; } impl Span for Range { type Context = (); type Offset = T; fn new((): Self::Context, range: Self) -> Self { range } fn context(&self) -> Self::Context {} fn start(&self) -> Self::Offset { self.start.clone() } fn end(&self) -> Self::Offset { self.end.clone() } } impl Span for (C, Range) { type Context = C; type Offset = T; fn new(context: Self::Context, range: Range) -> Self { (context, range) } fn context(&self) -> Self::Context { self.0.clone() } fn start(&self) -> Self::Offset { self.1.start.clone() } fn end(&self) -> Self::Offset { self.1.end.clone() } } chumsky-0.9.3/src/stream.rs
//! Token streams and tools for converting to and from them. //! //! *“What’s up?” “I don’t know,” said Marvin, “I’ve never been there.”* //! //! [`Stream`] is the primary type used to feed input data into a chumsky parser. You can create them in a number of //! ways: from strings, iterators, arrays, etc. use super::*; use alloc::vec; trait StreamExtend: Iterator { /// Extend the vector with input. The actual amount can be more or less than `n`, but must be at least 1 (0 implies /// that the stream has been exhausted). fn extend(&mut self, v: &mut Vec, n: usize); } #[allow(deprecated)] impl StreamExtend for I { fn extend(&mut self, v: &mut Vec, n: usize) { v.reserve(n); v.extend(self.take(n)); } } /// A utility type used to flatten input trees. See [`Stream::from_nested`]. pub enum Flat { /// The input tree flattens into a single input. Single(I), /// The input tree flattens into many sub-trees. Many(Iter), } /// A type that represents a stream of input tokens. Unlike [`Iterator`], this type supports backtracking and a few /// other features required by the crate.
#[allow(deprecated)] pub struct Stream< 'a, I, S: Span, Iter: Iterator + ?Sized = dyn Iterator + 'a, > { pub(crate) phantom: PhantomData<&'a ()>, pub(crate) eoi: S, pub(crate) offset: usize, pub(crate) buffer: Vec<(I, S)>, pub(crate) iter: Iter, } /// A [`Stream`] that pulls tokens from a boxed [`Iterator`]. pub type BoxStream<'a, I, S> = Stream<'a, I, S, Box + 'a>>; impl<'a, I, S: Span, Iter: Iterator> Stream<'a, I, S, Iter> { /// Create a new stream from an iterator of `(Token, Span)` pairs. A span representing the end of input must also /// be provided. /// /// There is no requirement that spans must map exactly to the position of inputs in the stream, but they should /// be non-overlapping and should appear in a monotonically-increasing order. pub fn from_iter(eoi: S, iter: Iter) -> Self { Self { phantom: PhantomData, eoi, offset: 0, buffer: Vec::new(), iter, } } /// Eagerly evaluate the token stream, returning an iterator over the tokens in it (but without modifying the /// stream's state so that it can still be used for parsing). /// /// This is most useful when you wish to check the input of a parser during debugging. pub fn fetch_tokens(&mut self) -> impl Iterator + '_ where (I, S): Clone, { self.buffer.extend(&mut self.iter); self.buffer.iter().cloned() } } impl<'a, I: Clone, S: Span + 'a> BoxStream<'a, I, S> { /// Create a new `Stream` from an iterator of nested tokens and a function that flattens them. /// /// It's not uncommon for compilers to perform delimiter parsing during the lexing stage (Rust does this!). When /// this is done, the output of the lexing stage is usually a series of nested token trees. This function allows /// you to easily flatten such token trees into a linear token stream so that they can be parsed (Chumsky currently /// only supports parsing linear streams of inputs). /// /// For reference, [here](https://docs.rs/syn/0.11.1/syn/enum.TokenTree.html) is `syn`'s `TokenTree` type that it /// uses when parsing Rust syntax.
/// /// # Examples /// /// ``` /// # use chumsky::{Stream, BoxStream, Flat}; /// type Span = std::ops::Range; /// /// fn span_at(at: usize) -> Span { at..at + 1 } /// /// #[derive(Clone)] /// enum Token { /// Local(String), /// Int(i64), /// Bool(bool), /// Add, /// Sub, /// OpenParen, /// CloseParen, /// OpenBrace, /// CloseBrace, /// // etc. /// } /// /// enum Delimiter { /// Paren, // ( ... ) /// Brace, // { ... } /// } /// /// // The structure of this token tree is very similar to that which Rust uses. /// // See: https://docs.rs/syn/0.11.1/syn/enum.TokenTree.html /// enum TokenTree { /// Token(Token), /// Tree(Delimiter, Vec<(TokenTree, Span)>), /// } /// /// // A function that turns a series of nested token trees into a linear stream that can be used for parsing. /// fn flatten_tts(eoi: Span, token_trees: Vec<(TokenTree, Span)>) -> BoxStream<'static, Token, Span> { /// use std::iter::once; /// // Currently, this is quite an explicit process: it will likely become easier in future versions of Chumsky. /// Stream::from_nested( /// eoi, /// token_trees.into_iter(), /// |(tt, span)| match tt { /// // For token trees that contain just a single token, no flattening needs to occur! 
/// TokenTree::Token(token) => Flat::Single((token, span)), /// // Flatten a parenthesised token tree into an iterator of the inner token trees, surrounded by parenthesis tokens /// TokenTree::Tree(Delimiter::Paren, tree) => Flat::Many(once((TokenTree::Token(Token::OpenParen), span_at(span.start))) /// .chain(tree.into_iter()) /// .chain(once((TokenTree::Token(Token::CloseParen), span_at(span.end - 1))))), /// // Flatten a braced token tree into an iterator of the inner token trees, surrounded by brace tokens /// TokenTree::Tree(Delimiter::Brace, tree) => Flat::Many(once((TokenTree::Token(Token::OpenBrace), span_at(span.start))) /// .chain(tree.into_iter()) /// .chain(once((TokenTree::Token(Token::CloseBrace), span_at(span.end - 1))))), /// } /// ) /// } /// ``` pub fn from_nested< P: 'a, Iter: Iterator, Many: Iterator, F: FnMut((P, S)) -> Flat<(I, S), Many> + 'a, >( eoi: S, iter: Iter, mut flatten: F, ) -> Self { let mut v: Vec> = vec![iter.collect()]; Self::from_iter( eoi, Box::new(core::iter::from_fn(move || loop { if let Some(many) = v.last_mut() { match many.pop_front().map(&mut flatten) { Some(Flat::Single(input)) => break Some(input), Some(Flat::Many(many)) => v.push(many.collect()), None => { v.pop(); } } } else { break None; } })), ) } } impl<'a, I: Clone, S: Span> Stream<'a, I, S> { pub(crate) fn offset(&self) -> usize { self.offset } pub(crate) fn save(&self) -> usize { self.offset } pub(crate) fn revert(&mut self, offset: usize) { self.offset = offset; } fn pull_until(&mut self, offset: usize) -> Option<&(I, S)> { let additional = offset.saturating_sub(self.buffer.len()) + 1024; #[allow(deprecated)] (&mut &mut self.iter as &mut dyn StreamExtend<_>).extend(&mut self.buffer, additional); self.buffer.get(offset) } pub(crate) fn skip_if(&mut self, f: impl FnOnce(&I) -> bool) -> bool { match self.pull_until(self.offset).cloned() { Some((out, _)) if f(&out) => { self.offset += 1; true } Some(_) => false, None => false, } } pub(crate) fn next(&mut self) -> 
(usize, S, Option) { match self.pull_until(self.offset).cloned() { Some((out, span)) => { self.offset += 1; (self.offset - 1, span, Some(out)) } None => (self.offset, self.eoi.clone(), None), } } pub(crate) fn span_since(&mut self, start_offset: usize) -> S { debug_assert!( start_offset <= self.offset, "{} > {}", self.offset, start_offset ); let start = self .pull_until(start_offset) .as_ref() .map(|(_, s)| s.start()) .unwrap_or_else(|| self.eoi.start()); let end = self .pull_until(self.offset.saturating_sub(1).max(start_offset)) .as_ref() .map(|(_, s)| s.end()) .unwrap_or_else(|| self.eoi.end()); S::new(self.eoi.context(), start..end) } pub(crate) fn attempt (bool, R)>(&mut self, f: F) -> R { let old_offset = self.offset; let (commit, out) = f(self); if !commit { self.offset = old_offset; } out } pub(crate) fn try_parse PResult>( &mut self, f: F, ) -> PResult { self.attempt(move |stream| { let out = f(stream); (out.1.is_ok(), out) }) } } impl<'a> From<&'a str> for Stream<'a, char, Range, Box)> + 'a>> { /// Please note that Chumsky currently uses character indices and not byte offsets in this impl. This is likely to /// change in the future. If you wish to use byte offsets, you can do so with [`Stream::from_iter`]. fn from(s: &'a str) -> Self { let len = s.chars().count(); Self::from_iter( len..len, Box::new(s.chars().enumerate().map(|(i, c)| (c, i..i + 1))), ) } } impl<'a> From for Stream<'a, char, Range, Box)>>> { /// Please note that Chumsky currently uses character indices and not byte offsets in this impl. This is likely to /// change in the future. If you wish to use byte offsets, you can do so with [`Stream::from_iter`]. 
fn from(s: String) -> Self { let chars = s.chars().collect::>(); Self::from_iter( chars.len()..chars.len(), Box::new(chars.into_iter().enumerate().map(|(i, c)| (c, i..i + 1))), ) } } impl<'a, T: Clone> From<&'a [T]> for Stream<'a, T, Range, Box)> + 'a>> { fn from(s: &'a [T]) -> Self { let len = s.len(); Self::from_iter( len..len, Box::new(s.iter().cloned().enumerate().map(|(i, x)| (x, i..i + 1))), ) } } impl<'a, T: Clone + 'a> From> for Stream<'a, T, Range, Box)> + 'a>> { fn from(s: Vec) -> Self { let len = s.len(); Self::from_iter( len..len, Box::new(s.into_iter().enumerate().map(|(i, x)| (x, i..i + 1))), ) } } impl<'a, T: Clone + 'a, const N: usize> From<[T; N]> for Stream<'a, T, Range, Box)> + 'a>> { fn from(s: [T; N]) -> Self { Self::from_iter( N..N, Box::new( core::array::IntoIter::new(s) .enumerate() .map(|(i, x)| (x, i..i + 1)), ), ) } } impl<'a, T: Clone, const N: usize> From<&'a [T; N]> for Stream<'a, T, Range, Box)> + 'a>> { fn from(s: &'a [T; N]) -> Self { Self::from_iter( N..N, Box::new(s.iter().cloned().enumerate().map(|(i, x)| (x, i..i + 1))), ) } } // impl<'a, T: Clone, S: Clone + Span> From<&'a [(T, S)]> for Stream<'a, T, S, Box + 'a>> // where S::Offset: Default // { // fn from(s: &'a [(T, S)]) -> Self { // Self::from_iter(Default::default(), Box::new(s.iter().cloned())) // } // } chumsky-0.9.3/src/text.rs000064400000000000000000000345701046102023000134520ustar 00000000000000//! Text-specific parsers and utilities. //! //! *“Ford!" he said, "there's an infinite number of monkeys outside who want to talk to us about this script for //! Hamlet they've worked out.”* //! //! The parsers in this module are generic over both Unicode ([`char`]) and ASCII ([`u8`]) characters. Most parsers take //! a type parameter, `C`, that can be either [`u8`] or [`char`] in order to handle either case. //! //! The [`TextParser`] trait is an extension on top of the main [`Parser`] trait that adds combinators unique to the //! parsing of text. 
use super::*; use core::iter::FromIterator; /// The type of a parser that accepts (and ignores) any number of whitespace characters. pub type Padding = Custom) -> PResult, E>; /// The type of a parser that accepts (and ignores) any number of whitespace characters before or after another /// pattern. // pub type Padded = ThenIgnore< // IgnoreThen>::Error>, P, (), O>, // Padding>::Error>, // O, // (), // >; /// A parser that accepts (and ignores) any number of whitespace characters before or after another pattern. #[must_use] #[derive(Copy, Clone)] pub struct Padded(A); impl, E: Error> Parser for Padded { type Error = E; #[inline] fn parse_inner( &self, debugger: &mut D, stream: &mut StreamOf, ) -> PResult { while stream.skip_if(|c| c.is_whitespace()) {} match self.0.parse_inner(debugger, stream) { (a_errors, Ok((a_out, a_alt))) => { while stream.skip_if(|c| c.is_whitespace()) {} (a_errors, Ok((a_out, a_alt))) } (a_errors, Err(err)) => (a_errors, Err(err)), } } #[inline] fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } #[inline] fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf) -> PResult { #[allow(deprecated)] self.parse_inner(d, s) } } mod private { pub trait Sealed {} impl Sealed for u8 {} impl Sealed for char {} } /// A trait implemented by textual character types (currently, [`u8`] and [`char`]). /// /// Avoid implementing this trait yourself if you can: it's *very* likely to be expanded in future versions! pub trait Character: private::Sealed + Copy + PartialEq { /// The default unsized [`str`]-like type of a linear sequence of this character. /// /// For [`char`], this is [`str`]. For [`u8`], this is [`[u8]`]. type Str: ?Sized + PartialEq; /// The default type that this character collects into. /// /// For [`char`], this is [`String`]. For [`u8`], this is [`Vec`]. 
type Collection: Chain + FromIterator + AsRef + 'static; /// Convert the given ASCII character to this character type. fn from_ascii(c: u8) -> Self; /// Returns true if the character is canonically considered to be inline whitespace (i.e: not part of a newline). fn is_inline_whitespace(&self) -> bool; /// Returns true if the character is canonically considered to be whitespace. fn is_whitespace(&self) -> bool; /// Return the '0' digit of the character. fn digit_zero() -> Self; /// Returns true if the character is canonically considered to be a numeric digit. fn is_digit(&self, radix: u32) -> bool; /// Returns this character as a [`char`]. fn to_char(&self) -> char; } impl Character for u8 { type Str = [u8]; type Collection = Vec; fn from_ascii(c: u8) -> Self { c } fn is_inline_whitespace(&self) -> bool { *self == b' ' || *self == b'\t' } fn is_whitespace(&self) -> bool { self.is_ascii_whitespace() } fn digit_zero() -> Self { b'0' } fn is_digit(&self, radix: u32) -> bool { (*self as char).is_digit(radix) } fn to_char(&self) -> char { *self as char } } impl Character for char { type Str = str; type Collection = String; fn from_ascii(c: u8) -> Self { c as char } fn is_inline_whitespace(&self) -> bool { *self == ' ' || *self == '\t' } fn is_whitespace(&self) -> bool { char::is_whitespace(*self) } fn digit_zero() -> Self { '0' } fn is_digit(&self, radix: u32) -> bool { char::is_digit(*self, radix) } fn to_char(&self) -> char { *self } } /// A trait containing text-specific functionality that extends the [`Parser`] trait. pub trait TextParser: Parser { /// Parse a pattern, ignoring any amount of whitespace both before and after the pattern. /// /// The output type of this parser is `O`, the same as the original parser. 
/// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// let ident = text::ident::<_, Simple>().padded(); /// /// // A pattern with no whitespace surrounding it is accepted /// assert_eq!(ident.parse("hello"), Ok("hello".to_string())); /// // A pattern with arbitrary whitespace surrounding it is also accepted /// assert_eq!(ident.parse(" \t \n \t world \t "), Ok("world".to_string())); /// ``` fn padded(self) -> Padded where Self: Sized, { Padded(self) // whitespace().ignore_then(self).then_ignore(whitespace()) } } impl> TextParser for P {} /// A parser that accepts (and ignores) any number of whitespace characters. /// /// This parser is a `Parser::Repeated` and so methods such as `at_least()` can be called on it. /// /// The output type of this parser is `Vec<()>`. /// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// let whitespace = text::whitespace::<_, Simple>(); /// /// // Any amount of whitespace is parsed... /// assert_eq!(whitespace.parse("\t \n \r "), Ok(vec![(), (), (), (), (), (), ()])); /// // ...including none at all! /// assert_eq!(whitespace.parse(""), Ok(vec![])); /// ``` pub fn whitespace<'a, C: Character + 'a, E: Error + 'a>( ) -> Repeated + Copy + Clone + 'a> { filter(|c: &C| c.is_whitespace()).ignored().repeated() } /// A parser that accepts (and ignores) any newline characters or character sequences. /// /// The output type of this parser is `()`. 
/// /// This parser is quite extensive, recognising: /// /// - Line feed (`\n`) /// - Carriage return (`\r`) /// - Carriage return + line feed (`\r\n`) /// - Vertical tab (`\x0B`) /// - Form feed (`\x0C`) /// - Next line (`\u{0085}`) /// - Line separator (`\u{2028}`) /// - Paragraph separator (`\u{2029}`) /// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// let newline = text::newline::>() /// .then_ignore(end()); /// /// assert_eq!(newline.parse("\n"), Ok(())); /// assert_eq!(newline.parse("\r"), Ok(())); /// assert_eq!(newline.parse("\r\n"), Ok(())); /// assert_eq!(newline.parse("\x0B"), Ok(())); /// assert_eq!(newline.parse("\x0C"), Ok(())); /// assert_eq!(newline.parse("\u{0085}"), Ok(())); /// assert_eq!(newline.parse("\u{2028}"), Ok(())); /// assert_eq!(newline.parse("\u{2029}"), Ok(())); /// ``` #[must_use] pub fn newline<'a, C: Character + 'a, E: Error + 'a>( ) -> impl Parser + Copy + Clone + 'a { just(C::from_ascii(b'\r')) .or_not() .ignore_then(just(C::from_ascii(b'\n'))) .or(filter(|c: &C| { [ '\r', // Carriage return '\x0B', // Vertical tab '\x0C', // Form feed '\u{0085}', // Next line '\u{2028}', // Line separator '\u{2029}', // Paragraph separator ] .contains(&c.to_char()) })) .ignored() } /// A parser that accepts one or more ASCII digits. /// /// The output type of this parser is [`Character::Collection`] (i.e: [`String`] when `C` is [`char`], and [`Vec`] /// when `C` is [`u8`]). /// /// The `radix` parameter functions identically to [`char::is_digit`]. If in doubt, choose `10`. /// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// let digits = text::digits::<_, Simple>(10); /// /// assert_eq!(digits.parse("0"), Ok("0".to_string())); /// assert_eq!(digits.parse("1"), Ok("1".to_string())); /// assert_eq!(digits.parse("01234"), Ok("01234".to_string())); /// assert_eq!(digits.parse("98345"), Ok("98345".to_string())); /// // A string of zeroes is still valid. Use `int` if this is not desirable. 
/// assert_eq!(digits.parse("0000"), Ok("0000".to_string())); /// assert!(digits.parse("").is_err()); /// ``` #[must_use] pub fn digits>( radix: u32, ) -> impl Parser + Copy + Clone { filter(move |c: &C| c.is_digit(radix)) .repeated() .at_least(1) .collect() } /// A parser that accepts a non-negative integer. /// /// An integer is defined as a non-empty sequence of ASCII digits, where the first digit is non-zero or the sequence /// has length one. /// /// The output type of this parser is [`Character::Collection`] (i.e: [`String`] when `C` is [`char`], and [`Vec`] /// when `C` is [`u8`]). /// /// The `radix` parameter functions identically to [`char::is_digit`]. If in doubt, choose `10`. /// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// let dec = text::int::<_, Simple>(10) /// .then_ignore(end()); /// /// assert_eq!(dec.parse("0"), Ok("0".to_string())); /// assert_eq!(dec.parse("1"), Ok("1".to_string())); /// assert_eq!(dec.parse("1452"), Ok("1452".to_string())); /// // No leading zeroes are permitted! /// assert!(dec.parse("04").is_err()); /// /// let hex = text::int::<_, Simple>(16) /// .then_ignore(end()); /// /// assert_eq!(hex.parse("2A"), Ok("2A".to_string())); /// assert_eq!(hex.parse("d"), Ok("d".to_string())); /// assert_eq!(hex.parse("b4"), Ok("b4".to_string())); /// assert!(hex.parse("0B").is_err()); /// ``` #[must_use] pub fn int>( radix: u32, ) -> impl Parser + Copy + Clone { filter(move |c: &C| c.is_digit(radix) && c != &C::digit_zero()) .map(Some) .chain::, _>(filter(move |c: &C| c.is_digit(radix)).repeated()) .collect() .or(just(C::digit_zero()).map(|c| core::iter::once(c).collect())) } /// A parser that accepts a C-style identifier. /// /// The output type of this parser is [`Character::Collection`] (i.e: [`String`] when `C` is [`char`], and [`Vec`] /// when `C` is [`u8`]). /// /// An identifier is defined as an ASCII alphabetic character or an underscore followed by any number of alphanumeric /// characters or underscores. 
The regex pattern for it is `[a-zA-Z_][a-zA-Z0-9_]*`. #[must_use] pub fn ident>() -> impl Parser + Copy + Clone { filter(|c: &C| c.to_char().is_ascii_alphabetic() || c.to_char() == '_') .map(Some) .chain::, _>( filter(|c: &C| c.to_char().is_ascii_alphanumeric() || c.to_char() == '_').repeated(), ) .collect() } /// Like [`ident`], but only accepts an exact identifier while ignoring trailing identifier characters. /// /// The output type of this parser is `()`. /// /// # Examples /// /// ``` /// # use chumsky::prelude::*; /// let def = text::keyword::<_, _, Simple>("def"); /// /// // Exactly 'def' was found /// assert_eq!(def.parse("def"), Ok(())); /// // Exactly 'def' was found, with non-identifier trailing characters /// assert_eq!(def.parse("def(foo, bar)"), Ok(())); /// // 'def' was found, but only as part of a larger identifier, so this fails to parse /// assert!(def.parse("define").is_err()); /// ``` #[must_use] pub fn keyword<'a, C: Character + 'a, S: AsRef + 'a + Clone, E: Error + 'a>( keyword: S, ) -> impl Parser + Clone + 'a { // TODO: use .filter(...), improve error messages ident().try_map(move |s: C::Collection, span| { if s.as_ref() == keyword.as_ref() { Ok(()) } else { Err(E::expected_input_found(span, None, None)) } }) } /// A parser that consumes text and generates tokens using semantic whitespace rules and the given token parser. /// /// Also required is a function that collects a [`Vec`] of tokens into a whitespace-indicated token tree. 
#[must_use] pub fn semantic_indentation<'a, C, Tok, T, F, E: Error + 'a>( token: T, make_group: F, ) -> impl Parser, Error = E> + Clone + 'a where C: Character + 'a, Tok: 'a, T: Parser + Clone + 'a, F: Fn(Vec, E::Span) -> Tok + Clone + 'a, { let line_ws = filter(|c: &C| c.is_inline_whitespace()); let line = token.padded_by(line_ws.ignored().repeated()).repeated(); let lines = line_ws .repeated() .then(line.map_with_span(|line, span| (line, span))) .separated_by(newline()) .padded(); lines.map(move |lines| { fn collapse( mut tree: Vec<(Vec, Vec, Option)>, make_group: &F, ) -> Option where F: Fn(Vec, S) -> Tok, { while let Some((_, tts, line_span)) = tree.pop() { let tt = make_group(tts, line_span?); if let Some(last) = tree.last_mut() { last.1.push(tt); } else { return Some(tt); } } None } let mut nesting = vec![(Vec::new(), Vec::new(), None)]; for (indent, (mut line, line_span)) in lines { let mut indent = indent.as_slice(); let mut i = 0; while let Some(tail) = nesting .get(i) .and_then(|(n, _, _)| indent.strip_prefix(n.as_slice())) { indent = tail; i += 1; } if let Some(tail) = collapse(nesting.split_off(i), &make_group) { nesting.last_mut().unwrap().1.push(tail); } if !indent.is_empty() { nesting.push((indent.to_vec(), line, Some(line_span))); } else { nesting.last_mut().unwrap().1.append(&mut line); } } nesting.remove(0).1 }) } chumsky-0.9.3/tutorial.md000064400000000000000000001035261046102023000135140ustar 00000000000000# Chumsky: A Tutorial *Please note that this tutorial is kept up to date with the `master` branch and not the most stable release: small details may differ!* In this tutorial, we'll develop a parser (and interpreter!) for a programming language called 'Foo'. Foo is a small language, but it's enough for us to have some fun. 
It isn't [Turing-complete](https://en.wikipedia.org/wiki/Turing_completeness), but it is complex enough to allow us to get to grips with parsing using Chumsky, containing many of the elements you'd find in a 'real' programming language. Here's some sample code written in Foo:

```
let seven = 7;
fn add x y = x + y;
add(2, 3) * -seven
```

By the end of this tutorial, you'll have an interpreter that will let you run this code, and more.

This tutorial should take somewhere between 30 and 100 minutes to complete, depending on factors such as knowledge of Rust and compiler theory.

*You can find the source code for the full interpreter in [`examples/foo.rs`](https://github.com/zesterer/chumsky/blob/master/examples/foo.rs) in the main repository.*

## Assumptions

This tutorial is here to show you how to use Chumsky: it's not a general-purpose introduction to language development as a whole. For that reason, we make a few assumptions about things you should know before jumping in:

- You should be happy reading and writing Rust. Particularly obscure syntax will be explained, but you should already be reasonably confident with concepts like functions, types, pattern matching, and error handling (`Result`, `?`, etc.).
- You should be familiar with data structures like trees and vectors.
- You should have some awareness of basic compiler theory concepts like [Abstract Syntax Trees (ASTs)](https://en.wikipedia.org/wiki/Abstract_syntax_tree), the difference between parsing and evaluation, [Backus Naur Form (BNF)](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form), etc.

## Documentation

As we go, we'll be encountering many functions and concepts from Chumsky. I strongly recommend you keep [Chumsky's documentation](https://docs.rs/chumsky/) open in another browser tab and use it to cross-reference your understanding or gain more insight into specific things that you'd like more clarification on.
In particular, most of the functions we'll be using come from the [`Parser`](https://docs.rs/chumsky/latest/chumsky/trait.Parser.html) trait. Chumsky's docs include extensive doc examples for almost every function, so be sure to make use of them!

Chumsky also has [several longer examples](https://github.com/zesterer/chumsky/tree/master/examples) in the main repository: looking at these may help improve your understanding if you get stuck.

## A note on imperative vs declarative parsers

If you've tried hand-writing a parser before, you're probably expecting lots of flow control: splitting text by whitespace, matching/switching/branching on things, making a decision about whether to recurse into a function or expect another token, etc. This is an [*imperative*](https://en.wikipedia.org/wiki/Imperative_programming) approach to parser development and can be very time-consuming to write, maintain, and test.

In contrast, Chumsky parsers are [*declarative*](https://en.wikipedia.org/wiki/Declarative_programming): they still perform intricate flow control internally, but it's all hidden away so you don't need to think about it. Instead of describing *how* to parse a particular grammar, Chumsky parsers simply *describe* a grammar: it is then Chumsky's job to figure out how to efficiently parse it. If you've ever seen [Backus Naur Form (BNF)](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) used to describe a language's syntax, you'll have a good sense of what this means: if you squint, you'll find that a lot of parsers written in Chumsky look pretty close to the BNF definition.

Another consequence of creating parsers in a declarative style is that *defining* a parser and *using* a parser are two different things: once created, parsers won't do anything on their own unless you give them an input to parse.
## Similarities between `Parser` and `Iterator`

The most important API in Chumsky is the [`Parser`](https://docs.rs/chumsky/latest/chumsky/trait.Parser.html) trait, implemented by all parsers. Because parsers don't do anything by themselves, writing Chumsky parsers often feels very similar to writing iterators in Rust using the [`Iterator`](https://doc.rust-lang.org/std/iter/trait.Iterator.html) trait. If you've enjoyed writing iterators in Rust before, you'll hopefully find the same satisfaction writing parsers with Chumsky. They even [share](https://docs.rs/chumsky/latest/chumsky/trait.Parser.html#method.map) [several](https://docs.rs/chumsky/latest/chumsky/trait.Parser.html#method.flatten) [functions](https://docs.rs/chumsky/latest/chumsky/trait.Parser.html#method.collect) with each other!

## Setting up

Create a new project with `cargo new --bin foo`, add the latest version of Chumsky as a dependency, and place the following in your `main.rs`:

```rust
use chumsky::prelude::*;

fn main() {
    let src = std::fs::read_to_string(std::env::args().nth(1).unwrap()).unwrap();

    println!("{}", src);
}
```

This code has one purpose: it treats the first command-line argument as a path, reads the corresponding file, then prints the contents to the terminal. We don't really care for handling IO errors in this tutorial, so `.unwrap()` will suffice.

Create a file named `test.foo` and run `cargo run -- test.foo` (the `--` tells cargo to pass the remaining arguments to the program instead of cargo itself). You should see that the contents of `test.foo`, if any, get printed to the console.

Next, we'll create a data type that represents a program written in Foo. All programs in Foo are expressions, so we'll call it `Expr`.
```rust
#[derive(Debug)]
enum Expr {
    Num(f64),
    Var(String),

    Neg(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),

    Call(String, Vec<Expr>),
    Let {
        name: String,
        rhs: Box<Expr>,
        then: Box<Expr>,
    },
    Fn {
        name: String,
        args: Vec<String>,
        body: Box<Expr>,
        then: Box<Expr>,
    },
}
```

This is Foo's [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree) (AST). It represents all possible Foo programs and is defined recursively in terms of itself (`Box` is used to avoid the type being infinitely large). Each expression may itself contain sub-expressions.

As an example, the expression `let x = 5; x * 3` is encoded as follows using the `Expr` type:

```rs
Expr::Let {
    name: "x",
    rhs: Expr::Num(5.0),
    then: Expr::Mul(
        Expr::Var("x"),
        Expr::Num(3.0),
    ),
}
```

The purpose of our parser will be to perform this conversion, from source code to AST.

We're also going to create a function that creates Foo's parser. Our parser takes in a `char` stream and produces an `Expr`, so we'll use those types for the `I` (input) and `O` (output) type parameters.

```rust
fn parser() -> impl Parser<char, Expr, Error = Simple<char>> {
    // To be filled in later...
}
```

The `Error` associated type allows us to customise the error type that Chumsky uses. For now, we'll stick to `Simple<char>`, a built-in error type that does everything we need.

In `main`, we'll alter the `println!` as follows:

```rust
println!("{:?}", parser().parse(src));
```

## Parsing digits

Chumsky is a 'parser combinator' library. It allows the creation of parsers by combining together many smaller parsers. The very smallest parsers are called 'primitives' and live in the [`primitive`](https://docs.rs/chumsky/latest/chumsky/primitive/index.html) module.

We're going to want to start by parsing the simplest element of Foo's syntax: numbers.

```rust
// In `parser`...
filter(|c: &char| c.is_ascii_digit())
```

The `filter` primitive allows us to read a single input and accept it if it passes a condition. In our case, that condition simply checks that the character is a digit.
If we compile this code now, we'll encounter an error. Why? Although we promised that our parser would produce an `Expr`, the `filter` primitive only outputs the input it found. Right now, all we have is a parser from `char` to `char` instead of a parser from `char` to `Expr`!

To solve this, we need to crack open the 'combinator' part of parser combinators. We'll use Chumsky's `map` method to convert the output of the parser to an `Expr`. This method is very similar to its namesake on `Iterator`.

```rust
filter(|c: &char| c.is_ascii_digit())
    .map(|c| Expr::Num(c.to_digit(10).unwrap() as f64))
```

Here, we're converting the `char` digit to an `f64` (unwrapping is fine: `map` only gets applied to outputs that successfully parsed!) and then wrapping it in `Expr::Num(_)` to convert it to a Foo expression.

Try running the code. You'll see that you can type a digit into `test.foo` and have our interpreter generate an AST like so:

```
Ok(Num(5.0))
```

## Parsing numbers

If you're more than a little adventurous, you'll quickly notice that typing in a multi-digit number doesn't quite behave as expected. Inputting `42` will only produce a `Num(4.0)` AST. This is because `filter` only accepts a *single* input.

But now another question arises: why did our interpreter *not* complain at the trailing digits that didn't get parsed? The answer is that Chumsky's parsers are *lazy*: they will consume all of the input that they can and then stop. If there's any trailing input, it'll be ignored.

This is obviously not always desirable. If the user places random nonsense at the end of the file, we want to be able to generate an error about it! Worse still, that 'nonsense' could be input the user intended to be part of the program, but that contained a syntax error and so was not properly parsed.

How can we force the parser to consume all of the input? To do this, we can make use of two new parsers: the `then_ignore` combinator and the `end` primitive.
```rust
filter(|c: &char| c.is_ascii_digit())
    .map(|c| Expr::Num(c.to_digit(10).unwrap() as f64))
    .then_ignore(end())
```

The `then_ignore` combinator parses a second pattern after the first, but ignores its output in favour of that of the first. The `end` primitive succeeds if it encounters only the end of input.

Combining these together, we now get an error for longer inputs. Unfortunately, this just reveals another problem (particularly if you're working on a Unix-like platform): any whitespace before or after our digit will upset our parser and trigger an error. We can handle whitespace by adding a call to `padded_by` (which ignores a given pattern before and after the first) after our digit parser, and a repeating filter for any whitespace characters.

```rust
filter(|c: &char| c.is_ascii_digit())
    .map(|c| Expr::Num(c.to_digit(10).unwrap() as f64))
    .padded_by(filter(|c: &char| c.is_whitespace()).repeated())
    .then_ignore(end())
```

This example should have taught you a few important things about Chumsky's parsers:

1. Parsers are lazy: trailing input is ignored

2. Whitespace is not automatically ignored. Chumsky is a general-purpose parsing library, and some languages care very much about the structure of whitespace, so Chumsky does too

## Cleaning up and taking shortcuts

At this point, things are starting to look a little messy. We've ended up writing 4 lines of code to properly parse a single digit. Let's clean things up a bit. We'll also make use of a bunch of text-based parser primitives that come with Chumsky to get rid of some of this cruft.

```rust
let int = text::int(10)
    .map(|s: String| Expr::Num(s.parse().unwrap()))
    .padded();

int.then_ignore(end())
```

That's better. We've also swapped out our custom digit parser with a built-in parser that parses any non-negative integer.

## Evaluating simple expressions

We'll now take a diversion away from the parser to create a function that can evaluate our AST.
This is the 'heart' of our interpreter and is the thing that actually performs the computation of programs.

```rust
fn eval(expr: &Expr) -> Result<f64, String> {
    match expr {
        Expr::Num(x) => Ok(*x),
        Expr::Neg(a) => Ok(-eval(a)?),
        Expr::Add(a, b) => Ok(eval(a)? + eval(b)?),
        Expr::Sub(a, b) => Ok(eval(a)? - eval(b)?),
        Expr::Mul(a, b) => Ok(eval(a)? * eval(b)?),
        Expr::Div(a, b) => Ok(eval(a)? / eval(b)?),
        _ => todo!(), // We'll handle other cases later
    }
}
```

This function might look scary at first glance, but there's not too much going on here: it just recursively calls itself, evaluating each node of the AST, combining the results via operators, until it has a final result. Any runtime errors simply get thrown back down the stack using `?`.

We'll also change our `main` function a little so that we can pass our AST to `eval`.

```rust
fn main() {
    let src = std::fs::read_to_string(std::env::args().nth(1).unwrap()).unwrap();

    match parser().parse(src) {
        Ok(ast) => match eval(&ast) {
            Ok(output) => println!("{}", output),
            Err(eval_err) => println!("Evaluation error: {}", eval_err),
        },
        Err(parse_errs) => parse_errs
            .into_iter()
            .for_each(|e| println!("Parse error: {}", e)),
    }
}
```

This looks like a big change, but it's mostly just an extension of the previous code to pass the AST on to `eval` if parsing is successful. If unsuccessful, we just print the errors generated by the parser. Right now, none of our operators can produce errors when evaluated, but this will change in the future so we make sure to handle them in preparation.

## Parsing unary operators

Jumping back to our parser, let's handle unary operators. Currently, our only unary operator is `-`, the negation operator. We're looking to parse any number of `-`, followed by a number. More formally:

```
expr = op* + int
```

We'll also give our `int` parser a new name, 'atom', for reasons that will become clear later.
```rust
let int = text::int(10)
    .map(|s: String| Expr::Num(s.parse().unwrap()))
    .padded();

let atom = int;

let op = |c| just(c).padded();

let unary = op('-')
    .repeated()
    .then(atom)
    .foldr(|_op, rhs| Expr::Neg(Box::new(rhs)));

unary.then_ignore(end())
```

Here, we meet a few new combinators:

- `repeated` will parse a given pattern any number of times (including zero!), collecting the outputs into a `Vec`
- `then` will parse one pattern and then another immediately afterwards, collecting both outputs into a tuple pair
- `foldr` will take an output of the form `(Vec<T>, U)` and will fold it into a single `U` by repeatedly applying the given function to each element of the `Vec<T>`

This last combinator is worth a little more consideration. We're trying to parse *any number* of negation operators, followed by a single atom (for now, just a number). For example, the input `---42` would generate the following input to `foldr`:

```rust
(['-', '-', '-'], Num(42.0))
```

The `foldr` function repeatedly applies the function to 'fold' the elements into a single element, like so:

```
(['-', '-', '-'], Num(42.0))
   ---  ---  ---  ---------
    |    |    \       /
    |    |     \     /
    |    |  Neg(Num(42.0))
    |    |        |
    |     \      /
    |      \    /
    |  Neg(Neg(Num(42.0)))
    |          |
     \        /
      \      /
  Neg(Neg(Neg(Num(42.0))))
```

This may be a little hard to conceptualise for those used to imperative programming, but for functional programmers it should come naturally: `foldr` is just equivalent to `reduce`!

Give the interpreter a try. You'll be able to enter inputs as before, but also values like `-17`. You can even apply the negation operator multiple times: `--9` will yield a value of `9` in the command line. This is exciting: we've finally started to see our interpreter perform useful (sort of) computations!

## Parsing binary operators

Let's keep the momentum going and move over to binary operators. Traditionally, these pose quite a problem for parsers.
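Before tackling them, it's worth convincing yourself of how the previous section's fold actually behaves. The sketch below is standalone (no chumsky involved): the cut-down `Expr` and the `fold_negs` helper are stand-ins invented purely for illustration, mimicking what `foldr` does with the `(Vec<char>, Expr)` pair the parser produces.

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Num(f64),
    Neg(Box<Expr>),
}

// Right-fold a list of `-` operators over an atom, as `foldr` does:
// each operator wraps the accumulated expression in one layer of `Neg`.
fn fold_negs(ops: Vec<char>, atom: Expr) -> Expr {
    ops.into_iter()
        .rev()
        .fold(atom, |rhs, _op| Expr::Neg(Box::new(rhs)))
}

fn main() {
    // What the parser would hand to `foldr` for the input `---42`:
    let expr = fold_negs(vec!['-', '-', '-'], Expr::Num(42.0));
    println!("{:?}", expr); // Neg(Neg(Neg(Num(42.0))))
}
```

Reversing and then folding leftwards is equivalent to a right fold here; since the operator itself is ignored, each `-` simply adds one layer of nesting.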
To parse an expression like `3 + 4 * 2`, it's necessary to understand that multiplication [binds more eagerly than addition](https://en.wikipedia.org/wiki/Order_of_operations) and hence is applied first. Therefore, the result of this expression is `11` and not `14`. Parsers employ a range of strategies to handle these cases, but for Chumsky things are simple: the most eagerly binding (highest 'precedence') operators should be those that get considered first when parsing.

It's worth noting that summation operators (`+` and `-`) are typically considered to have the *same* precedence as one-another. The same also applies to product operators (`*` and `/`). For this reason, we treat each group as a single pattern.

At each stage, we're looking for a simple pattern: a unary expression, followed by any number of a combination of an operator and a unary expression. More formally:

```
expr = unary + (op + unary)*
```

Let's expand our parser.

```rust
let int = text::int(10)
    .map(|s: String| Expr::Num(s.parse().unwrap()))
    .padded();

let atom = int;

let op = |c| just(c).padded();

let unary = op('-')
    .repeated()
    .then(atom)
    .foldr(|_op, rhs| Expr::Neg(Box::new(rhs)));

let product = unary.clone()
    .then(op('*').to(Expr::Mul as fn(_, _) -> _)
        .or(op('/').to(Expr::Div as fn(_, _) -> _))
        .then(unary)
        .repeated())
    .foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));

let sum = product.clone()
    .then(op('+').to(Expr::Add as fn(_, _) -> _)
        .or(op('-').to(Expr::Sub as fn(_, _) -> _))
        .then(product)
        .repeated())
    .foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));

sum.then_ignore(end())
```

The `Expr::Mul as fn(_, _) -> _` syntax might look a little unfamiliar, but don't worry! In Rust, [tuple enum variants are implicitly functions](https://stackoverflow.com/questions/54802045/what-is-this-strange-syntax-where-an-enum-variant-is-used-as-a-function).
All we're doing here is making sure that Rust treats each of them as if they had the same type using the `as` cast, and then letting type inference do the rest. Those functions then get passed through the internals of the parser and end up in `op` within the `foldl` call.

Another three combinators are introduced here:

- `or` attempts to parse a pattern and, if unsuccessful, instead attempts another pattern
- `to` is similar to `map`, but instead of mapping the output, entirely overrides the output with a new value. In our case, we use it to convert each binary operator to a function that produces the relevant AST node for that operator.
- `foldl` is very similar to `foldr` in the last section but, instead of operating on a `(Vec<_>, _)`, it operates upon a `(_, Vec<_>)`, going forwards through the vector to combine values together with the function

In a similar manner to `foldr` in the previous section on unary expressions, `foldl` is used to fold chains of binary operators into a single expression tree. For example, the input `2 + 3 - 7 + 5` would generate the following input to `foldl`:

```rust
(Num(2.0), [(Add, Num(3.0)), (Sub, Num(7.0)), (Add, Num(5.0))])
```

This then gets folded together by `foldl` like so:

```
(Num(2.0), [(Add, Num(3.0)), (Sub, Num(7.0)), (Add, Num(5.0))])
 --------   ---------------  ---------------  ---------------
     \             /                |                 |
      \           /                 |                 |
   Add(Num(2.0), Num(3.0))          |                 |
              \                     /                 |
               \                   /                  |
      Sub(Add(Num(2.0), Num(3.0)), Num(7.0))          |
                       \                              /
                        \                            /
        Add(Sub(Add(Num(2.0), Num(3.0)), Num(7.0)), Num(5.0))
```

Give the interpreter a try. You should find that it can correctly handle both unary and binary operations combined in arbitrary configurations, correctly handling precedence. You can use it as a pocket calculator!

## Parsing parentheses

A new challenger approaches: *nested expressions*. Sometimes, we want to override the default operator precedence rules entirely. We can do this by nesting expressions within parentheses, like `(3 + 4) * 2`. How do we handle this?
The creation of the `atom` pattern a few sections before was no accident: parentheses have a greater precedence than any operator, so we should treat a parenthesised expression as if it were equivalent to a single value. We call things that behave like single values 'atoms' by convention.

We're going to hoist our entire parser up into a closure, allowing us to define it in terms of itself.

```rust
recursive(|expr| {
    let int = text::int(10)
        .map(|s: String| Expr::Num(s.parse().unwrap()))
        .padded();

    let atom = int
        .or(expr.delimited_by(just('('), just(')')))
        .padded();

    let op = |c| just(c).padded();

    let unary = op('-')
        .repeated()
        .then(atom)
        .foldr(|_op, rhs| Expr::Neg(Box::new(rhs)));

    let product = unary.clone()
        .then(op('*').to(Expr::Mul as fn(_, _) -> _)
            .or(op('/').to(Expr::Div as fn(_, _) -> _))
            .then(unary)
            .repeated())
        .foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));

    let sum = product.clone()
        .then(op('+').to(Expr::Add as fn(_, _) -> _)
            .or(op('-').to(Expr::Sub as fn(_, _) -> _))
            .then(product)
            .repeated())
        .foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));

    sum
})
.then_ignore(end())
```

There are a few things worth paying attention to here.

1. `recursive` allows us to define a parser recursively in terms of itself by giving us a copy of it within the closure's scope

2. We use the recursive definition of `expr` within the definition of `atom`. We use the new `delimited_by` combinator to allow it to sit nested within a pair of parentheses

3. The `then_ignore(end())` call has *not* been hoisted inside the `recursive` call. This is because we only want to parse an end of input on the outermost expression, not at every level of nesting

Try running the interpreter. You'll find that it can handle a surprising number of cases elegantly.
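The evaluation side of these cases can also be checked independently of the parser by hand-building ASTs. The sketch below is standalone: rather than importing anything, it inlines a cut-down `Expr` and `eval` that mirror the earlier definitions (only the variants needed here are included).

```rust
// Standalone check of `eval` on hand-built ASTs, mirroring the
// tutorial's definitions with a reduced set of variants.
#[derive(Debug)]
enum Expr {
    Num(f64),
    Neg(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn eval(expr: &Expr) -> Result<f64, String> {
    match expr {
        Expr::Num(x) => Ok(*x),
        Expr::Neg(a) => Ok(-eval(a)?),
        Expr::Add(a, b) => Ok(eval(a)? + eval(b)?),
        Expr::Mul(a, b) => Ok(eval(a)? * eval(b)?),
    }
}

fn main() {
    // `-(4 + 2)`: negation applied to the parenthesised sum.
    let ast = Expr::Neg(Box::new(Expr::Add(
        Box::new(Expr::Num(4.0)),
        Box::new(Expr::Num(2.0)),
    )));
    assert_eq!(eval(&ast), Ok(-6.0));

    // `3 * (4 + 2)`: the parentheses force the addition to happen first.
    let ast = Expr::Mul(
        Box::new(Expr::Num(3.0)),
        Box::new(Expr::Add(
            Box::new(Expr::Num(4.0)),
            Box::new(Expr::Num(2.0)),
        )),
    );
    assert_eq!(eval(&ast), Ok(18.0));
}
```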
Make sure that the following cases work correctly:

| Expression    | Expected result |
|---------------|-----------------|
| `3 * 4 + 2`   | `14`            |
| `3 * (4 + 2)` | `18`            |
| `-4 + 2`      | `-2`            |
| `-(4 + 2)`    | `-6`            |

## Parsing lets

Our next step is to handle `let`. Unlike Rust and other imperative languages, `let` in Foo is an expression and not a statement (Foo has no statements) that takes the following form:

```
let <ident> = <expr>;
<expr>
```

We only want `let`s to appear at the outermost level of the expression, so we leave it out of the original recursive expression definition. However, we also want to be able to chain `let`s together, so we put them in their own recursive definition. We call it `decl` ('declaration') because we're eventually going to be adding `fn` syntax too.

```rust
let ident = text::ident()
    .padded();

let expr = recursive(|expr| {
    let int = text::int(10)
        .map(|s: String| Expr::Num(s.parse().unwrap()))
        .padded();

    let atom = int
        .or(expr.delimited_by(just('('), just(')')))
        .or(ident.map(Expr::Var));

    let op = |c| just(c).padded();

    let unary = op('-')
        .repeated()
        .then(atom)
        .foldr(|_op, rhs| Expr::Neg(Box::new(rhs)));

    let product = unary.clone()
        .then(op('*').to(Expr::Mul as fn(_, _) -> _)
            .or(op('/').to(Expr::Div as fn(_, _) -> _))
            .then(unary)
            .repeated())
        .foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));

    let sum = product.clone()
        .then(op('+').to(Expr::Add as fn(_, _) -> _)
            .or(op('-').to(Expr::Sub as fn(_, _) -> _))
            .then(product)
            .repeated())
        .foldl(|lhs, (op, rhs)| op(Box::new(lhs), Box::new(rhs)));

    sum
});

let decl = recursive(|decl| {
    let r#let = text::keyword("let")
        .ignore_then(ident)
        .then_ignore(just('='))
        .then(expr.clone())
        .then_ignore(just(';'))
        .then(decl)
        .map(|((name, rhs), then)| Expr::Let {
            name,
            rhs: Box::new(rhs),
            then: Box::new(then),
        });

    r#let
        // Must be later in the chain than `r#let` to avoid ambiguity
        .or(expr)
        .padded()
});

decl
    .then_ignore(end())
```

`keyword` is simply a parser that looks for an exact identifier (i.e: it
doesn't match identifiers that only start with a keyword). Other than that, there's nothing in the definition of `r#let` that you haven't seen before: familiar combinators, but combined in different ways. It selectively ignores parts of the syntax that we don't care about after validating that it exists, then uses those elements that it does care about to create an `Expr::Let` AST node.

Another thing to note is that the definition of `ident` will parse `"let"`. To avoid the parser accidentally deciding that `"let"` is a variable, we place `r#let` earlier in the or chain than `expr` so that it prioritises the correct interpretation. As mentioned in previous sections, Chumsky handles ambiguity simply by choosing the first successful parse it encounters, so making sure that we declare things in the right order can sometimes be important.

You should now be able to run the interpreter and have it accept an input such as

```
let five = 5;
five * 3
```

Unfortunately, the `eval` function will panic because we've not yet handled `Expr::Var` or `Expr::Let`. Let's do that now.

```rust
fn eval<'a>(expr: &'a Expr, vars: &mut Vec<(&'a String, f64)>) -> Result<f64, String> {
    match expr {
        Expr::Num(x) => Ok(*x),
        Expr::Neg(a) => Ok(-eval(a, vars)?),
        Expr::Add(a, b) => Ok(eval(a, vars)? + eval(b, vars)?),
        Expr::Sub(a, b) => Ok(eval(a, vars)? - eval(b, vars)?),
        Expr::Mul(a, b) => Ok(eval(a, vars)? * eval(b, vars)?),
        Expr::Div(a, b) => Ok(eval(a, vars)? / eval(b, vars)?),
        Expr::Var(name) => if let Some((_, val)) = vars.iter().rev().find(|(var, _)| *var == name) {
            Ok(*val)
        } else {
            Err(format!("Cannot find variable `{}` in scope", name))
        },
        Expr::Let { name, rhs, then } => {
            let rhs = eval(rhs, vars)?;
            vars.push((name, rhs));
            let output = eval(then, vars);
            vars.pop();
            output
        },
        _ => todo!(),
    }
}
```

Woo! That got a bit more complicated. Don't fear, there are only 3 important changes:

1. Because we need to keep track of variables that were previously defined, we use a `Vec` to remember them.
Because `eval` is a recursive function, we also need to pass it to all recursive calls.

2. When we encounter an `Expr::Let`, we first evaluate the right-hand side (`rhs`). Once evaluated, we push it to the `vars` stack and evaluate the trailing `then` expression (i.e: all of the remaining code that appears after the semicolon). Popping it afterwards is not *technically* necessary because Foo does not permit nested declarations, but we do it anyway because it's good practice and it's what we'd want to do if we ever decided to add nesting.

3. When we encounter an `Expr::Var` (i.e: an inline variable) we search the stack *backwards* (because Foo permits [variable shadowing](https://en.wikipedia.org/wiki/Variable_shadowing) and we only want to find the most recently declared variable with the same name) to find the variable's value. If we can't find a variable of that name, we generate a runtime error which gets propagated back up the stack.

Obviously, the signature of `eval` has changed so we'll update the call in `main` to become:

```rust
eval(&ast, &mut Vec::new())
```

Make sure to test the interpreter. Try experimenting with `let` declarations to make sure things aren't broken. In particular, it's worth testing variable shadowing by ensuring that the following program produces `8`:

```
let x = 5;
let x = 3 + x;
x
```

## Parsing functions

We're almost at a complete implementation of Foo. There's just one thing left: *functions*. Surprisingly, parsing functions is the easy part. All we need to modify is the definition of `decl` to add `r#fn`.
It looks very much like the existing definition of `r#let`:

```rust
let decl = recursive(|decl| {
    let r#let = text::keyword("let")
        .ignore_then(ident)
        .then_ignore(just('='))
        .then(expr.clone())
        .then_ignore(just(';'))
        .then(decl.clone())
        .map(|((name, rhs), then)| Expr::Let {
            name,
            rhs: Box::new(rhs),
            then: Box::new(then),
        });

    let r#fn = text::keyword("fn")
        .ignore_then(ident)
        .then(ident.repeated())
        .then_ignore(just('='))
        .then(expr.clone())
        .then_ignore(just(';'))
        .then(decl)
        .map(|(((name, args), body), then)| Expr::Fn {
            name,
            args,
            body: Box::new(body),
            then: Box::new(then),
        });

    r#let
        .or(r#fn)
        .or(expr)
        .padded()
});
```

There's nothing new here, you understand this all already.

Obviously, we also need to add support for *calling* functions by modifying `atom`:

```rust
let call = ident
    .then(expr.clone()
        .separated_by(just(','))
        .allow_trailing() // Foo is Rust-like, so allow trailing commas to appear in arg lists
        .delimited_by(just('('), just(')')))
    .map(|(f, args)| Expr::Call(f, args));

let atom = int
    .or(expr.delimited_by(just('('), just(')')))
    .or(call)
    .or(ident.map(Expr::Var));
```

The only new combinator here is `separated_by` which behaves like `repeated`, but requires a separator pattern between each element. It has a method called `allow_trailing` which allows for parsing a trailing separator at the end of the elements.

Next, we modify our `eval` function to support a function stack.

```rust
fn eval<'a>(
    expr: &'a Expr,
    vars: &mut Vec<(&'a String, f64)>,
    funcs: &mut Vec<(&'a String, &'a [String], &'a Expr)>,
) -> Result<f64, String> {
    match expr {
        Expr::Num(x) => Ok(*x),
        Expr::Neg(a) => Ok(-eval(a, vars, funcs)?),
        Expr::Add(a, b) => Ok(eval(a, vars, funcs)? + eval(b, vars, funcs)?),
        Expr::Sub(a, b) => Ok(eval(a, vars, funcs)? - eval(b, vars, funcs)?),
        Expr::Mul(a, b) => Ok(eval(a, vars, funcs)? * eval(b, vars, funcs)?),
        Expr::Div(a, b) => Ok(eval(a, vars, funcs)? / eval(b, vars, funcs)?),
        Expr::Var(name) => if let Some((_, val)) = vars.iter().rev().find(|(var, _)| *var == name) {
            Ok(*val)
        } else {
            Err(format!("Cannot find variable `{}` in scope", name))
        },
        Expr::Let { name, rhs, then } => {
            let rhs = eval(rhs, vars, funcs)?;
            vars.push((name, rhs));
            let output = eval(then, vars, funcs);
            vars.pop();
            output
        },
        Expr::Call(name, args) => if let Some((_, arg_names, body)) = funcs
            .iter()
            .rev()
            .find(|(var, _, _)| *var == name)
            .copied()
        {
            if arg_names.len() == args.len() {
                let mut args = args
                    .iter()
                    .map(|arg| eval(arg, vars, funcs))
                    .zip(arg_names.iter())
                    .map(|(val, name)| Ok((name, val?)))
                    .collect::<Result<Vec<_>, String>>()?;
                vars.append(&mut args);
                let output = eval(body, vars, funcs);
                vars.truncate(vars.len() - args.len());
                output
            } else {
                Err(format!(
                    "Wrong number of arguments for function `{}`: expected {}, found {}",
                    name,
                    arg_names.len(),
                    args.len(),
                ))
            }
        } else {
            Err(format!("Cannot find function `{}` in scope", name))
        },
        Expr::Fn { name, args, body, then } => {
            funcs.push((name, args, body));
            let output = eval(then, vars, funcs);
            funcs.pop();
            output
        },
    }
}
```

Another big change! On closer inspection, however, this looks a lot like the change we made previously when we added support for `let` declarations. Whenever we encounter an `Expr::Fn`, we just push the function to the `funcs` stack and continue. Whenever we encounter an `Expr::Call`, we search the function stack backwards, as we did for variables, and then execute the body of the function (making sure to evaluate and push the arguments!).

As before, we'll need to change the `eval` call in `main` to:

```rust
eval(&ast, &mut Vec::new(), &mut Vec::new())
```

Give the interpreter a test - see what you can do with it! Here's an example program to get you started:

```
let five = 5;
let eight = 3 + five;
fn add x y = x + y;
add(five, eight)
```

## Conclusion

Here ends our exploration into Chumsky's API.
We only scratched the surface of what Chumsky can do, but now you'll need to rely on the examples in the repository and the API doc examples for further help. Nonetheless, I hope it was an interesting foray into the use of parser combinators for the development of parsers. If nothing else, you've now got a neat little calculator language to play with.

Interestingly, there is a subtle bug in Foo's `eval` function that produces unexpected scoping behaviour with function calls. I'll leave finding it as an exercise for the reader.

## Extension tasks

- Find the interesting function scoping bug and consider how it could be fixed
- Split token lexing into a separate compilation stage to avoid the need for `.padded()` in the parser
- Add more operators
- Add an `if <x> then <y> else <z>` ternary operator
- Add values of different types by turning `f64` into an `enum`
- Add lambdas to the language
- Format the error message in a more useful way, perhaps by providing a reference to the original code