replace-megaparsec-1.5.0.1/0000755000000000000000000000000007346545000013565 5ustar0000000000000000
replace-megaparsec-1.5.0.1/CHANGELOG.md0000644000000000000000000000466507346545000015411 0ustar0000000000000000
# Revision history for replace-megaparsec

## 1.5.0.0 -- 2023-05-30

Upgrade to GHC v9.4.4, text v2.0.1.

Text does not work with GHC v9.4.3.

Test:

* exitcode-stdio-1.0 instead of detailed-0.9
* HSpec instead of Cabal Distribution.TestSuite

Added megaparsec version bounds #36.

## 1.4.5.0 -- 2022-04-14

Minor documentation changes.

Confirmed tests pass for text v2.

## 1.4.4.0 -- 2020-12-04

Add `splitCapT` and `breakCapT`.

## 1.4.3.0 -- 2020-09-28

Bugfix: `sepCap` backtracking when `sep` fails.

See [#33](https://github.com/jamesdbrock/replace-megaparsec/issues/33)

## 1.4.1.0 -- 2020-05-07

`anyTill` uses `getInput` instead of `takeRest`.

## 1.4.0.0 -- 2020-05-06

__Running Parsers__: Add `splitCap` and `breakCap`.

__Parser Combinators__: Add `anyTill`.

Remove `Show` and `Typeable` constraints on `streamEditT`.

## 1.3.0.0 -- 2020-03-06

`sepCap` won't throw.

Don't throw an exception on an unreachable error case, just bottom.

Remove type constraints for `Exception`.

## 1.2.1.0 -- 2020-01-01

Allow any error parameter, not just `Void`.

## 1.2.0.0 -- 2019-10-31

Benchmark improvements

Specializations of the `sepCap` function, guided by [replace-benchmark](https://github.com/jamesdbrock/replace-benchmark).
### New benchmarks

| Program | dense | sparse |
| :--- | ---: | ---: |
| `Replace.Megaparsec.streamEdit` `String` | 454.95ms | 375.04ms |
| `Replace.Megaparsec.streamEdit` `ByteString` | 529.99ms | 73.76ms |
| `Replace.Megaparsec.streamEdit` `Text` | 547.47ms | 139.21ms |

### Old benchmarks

| Program | dense | sparse |
| :--- | ---: | ---: |
| `Replace.Megaparsec.streamEdit` `String` | 454.95ms | 375.04ms |
| `Replace.Megaparsec.streamEdit` `ByteString` | 611.98ms | 433.26ms |
| `Replace.Megaparsec.streamEdit` `Text` | 592.66ms | 353.32ms |

## 1.1.5.0 -- 2019-10-08

* Move benchmarks to [__replace-benchmark__](https://github.com/jamesdbrock/replace-benchmark)

## 1.1.0.0 -- 2019-09-01

* Add benchmark suite.
* In `streamEditT`, replace `fold` with `mconcat`. The benchmarks now show linear scaling instead of quadratic.

## 1.0.1.0 -- 2019-08-28

* Add test suite.
* `sepCap` will treat `sep` as failing if it succeeds but consumes no input.

## 1.0.0.0 -- 2019-08-24

* First version.

replace-megaparsec-1.5.0.1/LICENSE0000644000000000000000000000242507346545000014575 0ustar0000000000000000
Copyright (c) 2019, James Brock
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

replace-megaparsec-1.5.0.1/README.md0000644000000000000000000004110707346545000015047 0ustar0000000000000000
# replace-megaparsec

[![Hackage](https://img.shields.io/hackage/v/replace-megaparsec.svg?style=flat)](https://hackage.haskell.org/package/replace-megaparsec)
[![Stackage Nightly](http://stackage.org/package/replace-megaparsec/badge/nightly)](http://stackage.org/nightly/package/replace-megaparsec)
[![Stackage LTS](http://stackage.org/package/replace-megaparsec/badge/lts)](http://stackage.org/lts/package/replace-megaparsec)

* [Usage Examples](#usage-examples)
* [In the Shell](#in-the-shell)
* [Alternatives](#alternatives)
* [Benchmarks](#benchmarks)
* [Hypothetically Asked Questions](#hypothetically-asked-questions)

__replace-megaparsec__ is for finding text patterns, and also replacing or splitting on the found patterns. This activity is traditionally done with regular expressions, but __replace-megaparsec__ uses [__megaparsec__](http://hackage.haskell.org/package/megaparsec) parsers instead for the pattern matching.

__replace-megaparsec__ can be used in the same sort of “pattern capture” or “find all” situations in which one would use Python [`re.findall`](https://docs.python.org/3/library/re.html#re.findall) or Perl [`m//`](https://perldoc.perl.org/functions/m.html), or Unix [`grep`](https://www.gnu.org/software/grep/).
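
As a rough analog of Python's `re.findall`, the sketch below collects every leftmost, non-overlapping match with `splitCap` and then drops the unmatched sections with `rights`. The names `number`, `findAllText`, and `findAllValues` are hypothetical helpers for illustration, not part of the library, and the sketch assumes the __megaparsec__ and __replace-megaparsec__ packages are installed.

```haskell
import Data.Either (rights)
import Data.Void (Void)
import Replace.Megaparsec (splitCap)
import Text.Megaparsec (Parsec, match)
import Text.Megaparsec.Char.Lexer (decimal)

-- Hypothetical pattern: a non-negative decimal integer.
number :: Parsec Void String Integer
number = decimal

-- Roughly Python's re.findall(r"\d+", s): every leftmost, non-overlapping
-- matched substring, with the unmatched sections dropped.
findAllText :: String -> [String]
findAllText = map fst . rights . splitCap (match number)

-- Unlike re.findall, the same scan can return the typed parse results.
findAllValues :: String -> [Integer]
findAllValues = rights . splitCap number
```

For example, `findAllText "a1 b22 c333"` gives `["1","22","333"]`, and `findAllValues` gives the same matches already converted to `[1,22,333]`.
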

__replace-megaparsec__ can be used in the same sort of “stream editing” or “search-and-replace” situations in which one would use Python [`re.sub`](https://docs.python.org/3/library/re.html#re.sub), or Perl [`s///`](https://perldoc.perl.org/functions/s.html), or Unix [`sed`](https://www.gnu.org/software/sed/manual/html_node/The-_0022s_0022-Command.html), or [`awk`](https://www.gnu.org/software/gawk/manual/gawk.html).

__replace-megaparsec__ can be used in the same sort of “string splitting” situations in which one would use Python [`re.split`](https://docs.python.org/3/library/re.html#re.split) or Perl [`split`](https://perldoc.perl.org/functions/split.html).

See [__replace-attoparsec__](https://hackage.haskell.org/package/replace-attoparsec) for the [__attoparsec__](http://hackage.haskell.org/package/attoparsec) version.

## Why would we want to do pattern matching and substitution with parsers instead of regular expressions?

* Haskell parsers have a nicer syntax than [regular expressions](https://en.wikipedia.org/wiki/Regular_expression), which are notoriously [difficult to read](https://en.wikipedia.org/wiki/Write-only_language).

* Regular expressions can do “group capture” on sections of the matched pattern, but they can only return stringy lists of the capture groups. Parsers can construct typed data structures based on the capture groups, guaranteeing no disagreement between the pattern rules and the rules that we're using to build data structures based on the pattern matches.

  For example, consider scanning a string for numbers. A lot of different things can look like a number, and can have leading plus or minus signs, or be in scientific notation, or have commas, or whatever. If we try to parse all of the numbers out of a string using regular expressions, then we have to make sure that the regular expression and the string-to-number conversion function agree about exactly what is and what isn't a numeric string.
  We can get into an awkward situation in which the regular expression says it has found a numeric string but the string-to-number conversion function fails. A typed parser will perform both the pattern match and the conversion, so it will never be in that situation. [Parse, don't validate.](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/)

* Regular expressions are only able to pattern-match [regular](https://en.wikipedia.org/wiki/Chomsky_hierarchy#The_hierarchy) grammars. Megaparsec parsers are able to pattern-match context-free grammars, and even context-sensitive grammars, if needed. See below for an example of running a parser over a `State` monad for context-sensitive pattern-matching.

* The replacement expression for a traditional regular expression-based substitution command is usually just a string template in which the *Nth* “capture group” can be inserted with the syntax `\N`. With this library, instead of a template, we get an `editor` function which can perform any computation, including IO.

# Usage Examples

The examples depend on these imports.

```haskell
import Data.Void
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Text.Megaparsec.Char.Lexer
```

## Split strings with `splitCap`

### Find all pattern matches, capture the matched text and the parsed result

Separate the input string into sections which can be parsed as a hexadecimal number with a prefix `"0x"`, and sections which can't. Parse the numbers.

```haskell
let hexparser = chunk "0x" *> hexadecimal :: Parsec Void String Integer

splitCap (match hexparser) "0xA 000 0xFFFF"
```
```haskell
[Right ("0xA",10), Left " 000 ", Right ("0xFFFF",65535)]
```

### Find all pattern matches, capture only the locations of the matched patterns

Find all of the sections of the stream which are letters. Capture a list of the offsets of the beginning of every pattern match.
```haskell
import Data.Either

let letterOffset = getOffset <* some letterChar :: Parsec Void String Int

rights $ splitCap letterOffset " a  bc "
```
```haskell
[1,4]
```

### Pattern match balanced parentheses

Find groups of balanced nested parentheses. This is an example of a “context-free” grammar, a pattern that can't be expressed by a regular expression. We can express the pattern with a recursive parser.

```haskell
import Data.Functor (void)
import Data.Bifunctor (second)

let parens :: Parsec Void String ()
    parens = do
        char '('
        manyTill
            (void (noneOf "()") <|> void parens)
            (char ')')
        pure ()

second fst <$> splitCap (match parens) "(()) (()())"
```
```haskell
[Right "(())",Left " ",Right "(()())"]
```

## Edit strings with `streamEdit`

The following examples show how to search for a pattern in a string of text and then edit the string of text to substitute in some replacement text for the matched patterns.

### Pattern match and replace with a constant

Replace all carriage-return-newline occurrences with newline.

```haskell
let crnl = chunk "\r\n" :: Parsec Void String String

streamEdit crnl (const "\n") "1\r\n2\r\n"
```
```haskell
"1\n2\n"
```

### Pattern match and edit the matches

Replace alphabetic characters with the next character in the alphabet.

```haskell
let somelet = some letterChar :: Parsec Void String String

streamEdit somelet (fmap succ) "HAL 9000"
```
```haskell
"IBM 9000"
```

### Pattern match and maybe edit the matches, or maybe leave them alone

Find all of the string sections *`s`* which can be parsed as a hexadecimal number *`r`*, and if *`r≤16`*, then replace *`s`* with a decimal number. Uses the [`match`](https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec.html#v:match) combinator.
```haskell
let hexparser = chunk "0x" *> hexadecimal :: Parsec Void String Integer

streamEdit (match hexparser) (\(s,r) -> if r<=16 then show r else s) "0xA 000 0xFFFF"
```
```haskell
"10 000 0xFFFF"
```

### Pattern match and edit the matches in IO with [`streamEditT`](https://hackage.haskell.org/package/replace-megaparsec/docs/Replace-Megaparsec.html#v:streamEditT)

Find an environment variable in curly braces and replace it with its value from the environment.

```haskell
import System.Environment (getEnv)

let bracevar = char '{' *> manyTill anySingle (char '}') :: ParsecT Void String IO String

streamEditT bracevar getEnv "- {HOME} -"
```
```haskell
"- /home/jbrock -"
```

### Context-sensitive pattern match and edit the matches with [`streamEditT`](https://hackage.haskell.org/package/replace-megaparsec/docs/Replace-Megaparsec.html#v:streamEditT)

Capitalize the third letter in a string. The `capThird` parser searches for individual letters, and it needs to remember how many times it has run so that it can match successfully only on the third time that it finds a letter. To enable the parser to remember how many times it has run, we'll compose the parser with a `State` monad from the `mtl` package. (Run in `ghci` with `cabal v2-repl -b mtl`.) Because it has stateful memory, this parser is an example of a “context-sensitive” grammar.

```haskell
import qualified Control.Monad.State.Strict as MTL
import Control.Monad.State.Strict (get, put, evalState)
import Data.Char (toUpper)

let capThird :: ParsecT Void String (MTL.State Int) String
    capThird = do
        x <- letterChar
        i <- get
        let i' = i+1
        put i'
        if i'==3 then pure [x] else empty

flip evalState 0 $ streamEditT capThird (pure . fmap toUpper) "a a a a a"
```
```haskell
"a a A a a"
```

### Pattern match, edit the matches, and count the edits with [`streamEditT`](https://hackage.haskell.org/package/replace-megaparsec/docs/Replace-Megaparsec.html#v:streamEditT)

Find and capitalize no more than three letters in a string, and return the edited string along with the number of letters capitalized. To enable the editor function to remember how many letters it has capitalized, we'll run `streamEditT` in the `State` monad from the `mtl` package. Use this technique to get the same functionality as Python [`re.subn`](https://docs.python.org/3/library/re.html#re.subn).

```haskell
import qualified Control.Monad.State.Strict as MTL
import Control.Monad.State.Strict (get, put, runState)
import Data.Char (toUpper)

let editThree :: Char -> MTL.State Int String
    editThree x = do
        i <- get
        if i<3
            then do
                put $ i+1
                pure [toUpper x]
            else pure [x]

flip runState 0 $ streamEditT letterChar editThree "a a a a a"
```
```haskell
("A A A a a",3)
```

### Non-greedy pattern repetition

This is not a feature of this library, but it’s a useful technique to know.

How do we do non-greedy repetition of a pattern `p`, like we would in Regex by writing `p*?`?

By using the [`manyTill_`](https://hackage.haskell.org/package/parser-combinators/docs/Control-Monad-Combinators.html#v:manyTill_) combinator. To repeat pattern `p` non-greedily, write `manyTill_ p q` where `q` is the entire rest of the parser.

For example, this parse fails because `many` repeats the pattern `letterChar` greedily.

```haskell
flip parseMaybe "aab" $ do
    a <- many letterChar
    b <- single 'b'
    pure (a,b)
```
```haskell
Nothing
```

To repeat pattern `letterChar` non-greedily, use `manyTill_`.

```haskell
flip parseMaybe "aab" $ do
    (a,b) <- manyTill_ letterChar $ do
        single 'b'
    pure (a,b)
```
```haskell
Just ("aa",'b')
```

# In the Shell

If we're going to have a viable `sed` replacement then we want to be able to use it easily from the command line.

This [Stack script interpreter](https://docs.haskellstack.org/en/stable/GUIDE/#script-interpreter) script will find decimal numbers in a stream and replace them with their double.

```haskell
#!/usr/bin/env stack
{- stack script --resolver lts-16 --package megaparsec --package replace-megaparsec -}
-- https://docs.haskellstack.org/en/stable/GUIDE/#script-interpreter

import Data.Void
import Text.Megaparsec
import Text.Megaparsec.Char
import Text.Megaparsec.Char.Lexer
import Replace.Megaparsec

main = interact $ streamEdit (decimal :: Parsec Void String Int) (show . (*2))
```

If you have [The Haskell Tool Stack](https://docs.haskellstack.org/en/stable/README/) installed then you can just copy-paste this into a file named `doubler.hs` and run it. (On the first run Stack may need to download the dependencies.)

```bash
$ chmod u+x doubler.hs
$ echo "1 6 21 107" | ./doubler.hs
2 12 42 214
```

# Alternatives

Some libraries that one might consider instead of this one.

# Benchmarks

These benchmarks are intended to measure the wall-clock speed of *everything except the actual pattern-matching*. Speed of the pattern-matching is the responsibility of the [__megaparsec__](http://hackage.haskell.org/package/megaparsec) and [__attoparsec__](http://hackage.haskell.org/package/attoparsec) libraries.

The benchmark task is to find all of the one-character patterns `x` in a text stream and replace them by a function which returns the constant string `oo`. So, like the regex `s/x/oo/g`.

We have two benchmark input cases, which we call __dense__ and __sparse__.

The __dense__ case is one megabyte of alternating spaces and `x`s like

```
x x x x x x x x x x x x x x x x x x x x x x x x x x x x
```

The __sparse__ case is one megabyte of spaces with a single `x` in the middle like

```
                         x
```

Each benchmark program reads the input from `stdin`, replaces `x` with `oo`, and writes the result to `stdout`. The time elapsed is measured by `perf stat`, and the best observed time is recorded.
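
The benchmark task described above can be sketched with `streamEdit`. This is a minimal, hypothetical version of the `String` variant only, assuming the __megaparsec__ and __replace-megaparsec__ packages; it is not the source of the programs in replace-benchmark, and the names `xParser` and `xToOo` are illustrative.

```haskell
import Data.Void (Void)
import Replace.Megaparsec (streamEdit)
import Text.Megaparsec (Parsec, single)

-- Hypothetical pattern: the single character 'x'.
xParser :: Parsec Void String Char
xParser = single 'x'

-- Like s/x/oo/g: replace every 'x' with the constant string "oo",
-- leaving all non-matching sections of the input unchanged.
xToOo :: String -> String
xToOo = streamEdit xParser (const "oo")
```

Wrapping it as `main = interact xToOo` would give a `stdin`-to-`stdout` filter of the kind the benchmark programs use.
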
See [replace-benchmark](https://github.com/jamesdbrock/replace-benchmark) for details.

| Program | dense | sparse |
| :--- | ---: | ---: |
| [Python 3.7.4 `re.sub`][sub] *repl* function | 89.23ms | 23.98ms |
| [Perl 5 `s///ge`][s] | 180.65ms | 5.02ms |
| [`Replace.Megaparsec.streamEdit`][m] `String` | 441.94ms | 375.04ms |
| [`Replace.Megaparsec.streamEdit`][m] `ByteString` | 529.99ms | 73.76ms |
| [`Replace.Megaparsec.streamEdit`][m] `Text` | 547.47ms | 139.21ms |
| [`Replace.Attoparsec.ByteString.streamEdit`][ab] | 394.12ms | 41.13ms |
| [`Replace.Attoparsec.Text.streamEdit`][at] | 515.26ms | 46.10ms |
| [`Text.Regex.Applicative.replace`][ra] `String` | 1083.98ms | 646.40ms |
| [`Text.Regex.PCRE.Heavy.gsub`][ph] `Text` | > 10min | 14.29ms |
| [`Control.Lens.Regex.ByteString.match`][lb] | > 10min | 4.27ms |
| [`Control.Lens.Regex.Text.match`][lt] | > 10min | 14.74ms |

[sub]: https://docs.python.org/3/library/re.html#re.sub
[s]: https://perldoc.perl.org/functions/s.html
[m]: https://hackage.haskell.org/package/replace-megaparsec/docs/Replace-Megaparsec.html#v:streamEdit
[ab]: https://hackage.haskell.org/package/replace-attoparsec/docs/Replace-Attoparsec-ByteString.html#v:streamEdit
[at]: https://hackage.haskell.org/package/replace-attoparsec/docs/Replace-Attoparsec-Text.html#v:streamEdit
[ra]: http://hackage.haskell.org/package/regex-applicative/docs/Text-Regex-Applicative.html#v:replace
[ss]: http://hackage.haskell.org/package/stringsearch/docs/Data-ByteString-Search.html#v:replace
[ph]: http://hackage.haskell.org/package/pcre-heavy/docs/Text-Regex-PCRE-Heavy.html#v:gsub
[lb]: https://hackage.haskell.org/package/lens-regex-pcre/docs/Control-Lens-Regex-ByteString.html#v:match
[lt]: https://hackage.haskell.org/package/lens-regex-pcre/docs/Control-Lens-Regex-Text.html#v:match

# Hypothetically Asked Questions

1.
   *Could we write this library for __parsec__?*

   No, because the [`match`](https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec.html#v:match) combinator doesn't exist for __parsec__. (I can't find it anywhere. [Can it be written?](http://www.serpentine.com/blog/2014/05/31/attoparsec/#from-strings-to-buffers-and-cursors))

2. *Is this a good idea?*

   You may have [heard it suggested](https://stackoverflow.com/questions/57667534/how-can-i-use-a-parser-in-haskell-to-find-the-locations-of-some-substrings-in-a/57712672#comment101804063_57667534) that monadic parsers are better for pattern-matching when the input stream is mostly signal, and regular expressions are better when the input stream is mostly noise.

   The premise of this library is that monadic parsers are great for finding small signal patterns in a stream of otherwise noisy text.

   Our reluctance to forgo the speedup opportunities afforded by restricting ourselves to regular grammars is an old superstition about opportunities which [remain mostly unexploited anyway](https://swtch.com/~rsc/regexp/regexp1.html). The performance compromise of allowing stack memory allocation (a.k.a. pushdown automata, a.k.a. context-free grammar) was once considered [controversial for *general-purpose* programming languages](https://vanemden.wordpress.com/2014/06/18/how-recursion-got-into-programming-a-comedy-of-errors-3/). I think we can now resolve that controversy the same way for pattern matching languages.

replace-megaparsec-1.5.0.1/Setup.hs0000644000000000000000000000005607346545000015222 0ustar0000000000000000
import Distribution.Simple
main = defaultMain
replace-megaparsec-1.5.0.1/replace-megaparsec.cabal0000644000000000000000000000345707346545000020302 0ustar0000000000000000
cabal-version:       3.0
name:                replace-megaparsec
version:             1.5.0.1
synopsis:            Find, replace, split string patterns with Megaparsec parsers (instead of regex)
homepage:            https://github.com/jamesdbrock/replace-megaparsec
bug-reports:         https://github.com/jamesdbrock/replace-megaparsec/issues
license:             BSD-2-Clause
license-file:        LICENSE
author:              James Brock
maintainer:          James Brock
build-type:          Simple
category:            Parsing
description:
  Find text patterns, replace the patterns, split on the patterns. Use
  Megaparsec monadic parsers instead of regular expressions for pattern matching.
extra-doc-files:     README.md
                   , CHANGELOG.md

source-repository head
  type:                git
  location:            https://github.com/jamesdbrock/replace-megaparsec.git

library
  hs-source-dirs:      src
  build-depends:       base >=4.0 && <5.0
                     , megaparsec >=7.0.0 && <10.0.0
                     , bytestring >=0.2 && <1.0
                     , text >=0.2 && <3.0
                     , parser-combinators >=1.2.0 && < 2.0.0
  default-language:    Haskell2010
  exposed-modules:     Replace.Megaparsec
                     , Replace.Megaparsec.Internal.ByteString
                     , Replace.Megaparsec.Internal.Text

test-suite test
  type:                exitcode-stdio-1.0
  main-is:             Test.hs
  hs-source-dirs:      tests
  other-modules:       TestString, TestByteString, TestText
  default-language:    Haskell2010
  build-depends:       base
                     , replace-megaparsec
                     , megaparsec
                     , hspec >=2.0.0 && <3.0.0
                     , text
                     , bytestring
replace-megaparsec-1.5.0.1/src/Replace/0000755000000000000000000000000007346545000015727 5ustar0000000000000000
replace-megaparsec-1.5.0.1/src/Replace/Megaparsec.hs0000644000000000000000000004321307346545000020335 0ustar0000000000000000
-- |
-- Module    : Replace.Megaparsec
-- Copyright : ©2019 James Brock
-- License   : BSD2
-- Maintainer: James Brock
--
-- __Replace.Megaparsec__ is for finding text patterns, and also
-- replacing
-- or splitting on the found patterns.
-- This activity is traditionally done with regular expressions,
-- but __Replace.Megaparsec__ uses "Text.Megaparsec" parsers instead for
-- the pattern matching.
--
-- __Replace.Megaparsec__ can be used in the same sort of “pattern capture”
-- or “find all” situations in which one would use Python
-- ,
-- or Perl
-- ,
-- or Unix
-- .
--
-- __Replace.Megaparsec__ can be used in the same sort of “stream editing”
-- or “search-and-replace” situations in which one would use Python
-- ,
-- or Perl
-- ,
-- or Unix
-- ,
-- or
-- .
--
-- __Replace.Megaparsec__ can be used in the same sort of “string splitting”
-- situations in which one would use Python
--
-- or Perl
-- .
--
-- See the __replace-megaparsec__ package README for usage examples.
--
-- == Type constraints
--
-- === output stream type @Tokens s@ = input stream type @s@
--
-- All functions in the __Running parser__ section require the type of the
-- stream of text that is input to be
-- @'Text.Megaparsec.Stream.Stream' s@
-- such that
-- @'Text.Megaparsec.Stream.Tokens' s ~ s@,
-- because we want to output the same type of stream that was input.
-- That requirement is satisfied for all the 'Text.Megaparsec.Stream' instances
-- included with "Text.Megaparsec":
--
-- * "Data.String"
-- * "Data.Text"
-- * "Data.Text.Lazy"
-- * "Data.ByteString"
-- * "Data.ByteString.Lazy"
--
-- === Custom error type @e@ should be 'Data.Void'
--
-- Megaparsec parsers have a custom error data component @e@. When writing parsers
-- to be used by this module, the custom error type @e@ should usually
-- be 'Data.Void', because every function in this module expects a parser
-- failure to occur on every token in a non-matching section of the input
-- stream, so parser failure error descriptions are not returned, and you'll
-- never see the custom error information.
--
-- == Special fast input types
--
-- Functions in this module will be “fast” when the input stream
-- type @s@ is:
--
-- * "Data.Text"
-- * "Data.ByteString"
--
-- We mean “fast” in the same sense as 'Text.Megaparsec.MonadParsec':
-- when returning subsections of the input stream,
-- we return slices of the input stream data, rather than constructing a list
-- of tokens and then building a new stream subsection from that list.
-- This relies on implementation details of the stream representation,
-- so there are specialization re-write rules in this module to make
-- that possible without adding new typeclasses.

{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE CPP #-}
{-# LANGUAGE TypeOperators #-}

module Replace.Megaparsec
  (
    -- * Running parser
    --
    -- | Functions in this section are /ways to run parsers/
    -- (like 'Text.Megaparsec.runParser'). They take
    -- as arguments a @sep@ parser and some input, run the parser on the input,
    -- and return a result.
    breakCap
  , breakCapT
  , splitCap
  , splitCapT
  , streamEdit
  , streamEditT
    -- * Parser combinator
    --
    -- | Functions in this section are /parser combinators/. They take
    -- a @sep@ parser for an argument, combine @sep@ with another parser,
    -- and return a new parser.
  , anyTill
  , sepCap
  , findAll
  , findAllCap
  ) where

import Data.Bifunctor
import Data.Functor.Identity
import Data.Proxy
import Control.Monad
import qualified Data.ByteString as B
import qualified Data.Text as T
import Text.Megaparsec
import Replace.Megaparsec.Internal.ByteString
import Replace.Megaparsec.Internal.Text

-- |
-- === Break on and capture one pattern
--
-- Find the first occurrence of a pattern in a text stream, capture the found
-- pattern, and break the input text stream on the found pattern.
--
-- The 'breakCap' function is like 'Data.List.takeWhile', but can be predicated
-- beyond more than just the next one token. It's also like 'Data.Text.breakOn',
-- but the @needle@ can be a pattern instead of a constant string.
--
-- Be careful not to look too far
-- ahead; if the @sep@ parser looks to the end of the input then 'breakCap'
-- could be /O(n²)/.
--
-- The pattern parser @sep@ may match a zero-width pattern (a pattern which
-- consumes no parser input on success).
--
-- ==== Output
--
--  * @Nothing@ when no pattern match was found.
--  * @Just (prefix, parse_result, suffix)@ for the result of parsing the
--    pattern match, and the @prefix@ string before and the @suffix@ string
--    after the pattern match. @prefix@ and @suffix@ may be zero-length strings.
--
-- ==== Access the matched section of text
--
-- If you want to capture the matched string, then combine the pattern
-- parser @sep@ with 'Text.Megaparsec.match'.
--
-- With the matched string, we can reconstruct the input string.
-- For all @input@, @sep@, if
--
-- @
-- let ('Just' (prefix, (infix_, _), suffix)) = breakCap ('Text.Megaparsec.match' sep) input
-- @
--
-- then
--
-- @
-- input == prefix '<>' infix_ '<>' suffix
-- @
breakCap :: forall e s a.
    (Ord e, Stream s, Tokens s ~ s)
    => Parsec e s a -- ^ The pattern matching parser @sep@
    -> s -- ^ The input stream of text
    -> Maybe (s, a, s) -- ^ Maybe (prefix, parse_result, suffix)
breakCap sep input = runIdentity $ breakCapT sep input
{-# INLINABLE breakCap #-}

-- |
-- === Break on and capture one pattern
--
-- Monad transformer version of 'breakCap'.
--
-- The parser @sep@ will run in the underlying monad context.
breakCapT :: forall m e s a.
    (Ord e, Stream s, Tokens s ~ s, Monad m)
    => ParsecT e s m a -- ^ The pattern matching parser @sep@
    -> s -- ^ The input stream of text
    -> m (Maybe (s, a, s)) -- ^ Maybe (prefix, parse_result, suffix)
breakCapT sep input =
    runParserT pser "" input >>= \case
        (Left _) -> pure Nothing
        (Right x) -> pure $ Just x
  where
    pser = do
        (prefix, cap) <- anyTill sep
        suffix <- getInput
        pure (prefix, cap, suffix)
{-# INLINABLE breakCapT #-}

-- |
-- === Split on and capture all patterns
--
-- Find all occurrences of the pattern @sep@, split the input string, capture
-- all the patterns and the splits.
--
-- The input string will be split on every leftmost non-overlapping occurrence
-- of the pattern @sep@. The output list will contain
-- the parsed result of input string sections which match the @sep@ pattern
-- in 'Right', and non-matching sections in 'Left'.
--
-- 'splitCap' depends on 'sepCap'; see 'sepCap' for more details.
--
-- ==== Access the matched section of text
--
-- If you want to capture the matched strings, then combine the pattern
-- parser @sep@ with 'Text.Megaparsec.match'.
--
-- With the matched strings, we can reconstruct the input string.
-- For all @input@, @sep@, if
--
-- @
-- let output = splitCap ('Text.Megaparsec.match' sep) input
-- @
--
-- then
--
-- @
-- input == 'Data.Monoid.mconcat' ('Data.Either.either' 'Data.Function.id' 'Data.Tuple.fst' '<$>' output)
-- @
splitCap :: forall e s a.
    (Ord e, Stream s, Tokens s ~ s)
    => Parsec e s a -- ^ The pattern matching parser @sep@
    -> s -- ^ The input stream of text
    -> [Either s a] -- ^ List of matching and non-matching input sections.
splitCap sep input = runIdentity $ splitCapT sep input
{-# INLINABLE splitCap #-}

-- |
-- === Split on and capture all patterns
--
-- Monad transformer version of 'splitCap'.
--
-- The parser @sep@ will run in the underlying monad context.
splitCapT :: forall e s m a.
    (Ord e, Stream s, Tokens s ~ s, Monad m)
    => ParsecT e s m a -- ^ The pattern matching parser @sep@
    -> s -- ^ The input stream of text
    -> m [Either s a] -- ^ List of matching and non-matching input sections.
splitCapT sep input =
    runParserT (sepCap sep) "" input >>= \case
        (Left _) -> undefined -- sepCap can never fail
        (Right r) -> pure r
{-# INLINABLE splitCapT #-}

-- |
-- === Stream editor
--
-- Also known as “find-and-replace”, or “match-and-substitute”. Finds all
-- non-overlapping sections of the stream which match the pattern @sep@,
-- and replaces them with the result of the @editor@ function.
--
-- ==== Access the matched section of text in the @editor@
--
-- If you want access to the matched string in the @editor@ function,
-- then combine the pattern parser @sep@ with 'Text.Megaparsec.match'.
-- This will effectively change the type of the @editor@ function
-- to @(s,a) -> s@.
--
-- This allows us to write an @editor@ function which can choose to not
-- edit the match and just leave it as it is. If the @editor@ function
-- returns the first item in the tuple, then @streamEdit@ will not change
-- the matched string.
--
-- So, for all @sep@:
--
-- @
-- streamEdit ('Text.Megaparsec.match' sep) 'Data.Tuple.fst' ≡ 'Data.Function.id'
-- @
streamEdit :: forall e s a.
    (Ord e, Stream s, Monoid s, Tokens s ~ s)
    => Parsec e s a -- ^ The pattern matching parser @sep@
    -> (a -> s) -- ^ The @editor@ function. Takes a parsed result of @sep@
                -- and returns a new stream section for the replacement.
    -> s -- ^ The input stream of text to be edited
    -> s -- ^ The edited input stream
streamEdit sep editor = runIdentity . streamEditT sep (Identity . editor)
{-# INLINABLE streamEdit #-}

-- |
-- === Stream editor
--
-- Monad transformer version of 'streamEdit'.
--
-- Both the parser @sep@ and the @editor@ function will run in the underlying
-- monad context.
--
-- If you want to do 'IO' operations in the @editor@ function or the
-- parser @sep@, then run this in 'IO'.
--
-- If you want the @editor@ function or the parser @sep@ to remember some state,
-- then run this in a stateful monad.
streamEditT :: forall e s m a.
    (Ord e, Stream s, Monad m, Monoid s, Tokens s ~ s)
    => ParsecT e s m a -- ^ The pattern matching parser @sep@
    -> (a -> m s) -- ^ The @editor@ function. Takes a parsed result of @sep@
                  -- and returns a new stream section for the replacement.
    -> s -- ^ The input stream of text to be edited
    -> m s -- ^ The edited input stream
streamEditT sep editor input = do
    runParserT (sepCap sep) "" input >>= \case
        (Left _) -> undefined -- sepCap can never fail
        (Right r) -> mconcat <$> traverse (either return editor) r
{-# INLINABLE streamEditT #-}

-- |
-- === Specialized
--
-- Parser combinator to consume input until the @sep@ pattern matches,
-- equivalent to
-- @'Control.Monad.Combinators.manyTill_' 'Text.Megaparsec.anySingle' sep@.
-- On success, returns the prefix before the pattern match and the parsed match.
--
-- @sep@ may be a zero-width parser; it may succeed without consuming any
-- input.
--
-- This combinator will produce a parser which
-- acts like 'Text.Megaparsec.takeWhileP' but is predicated beyond more than
-- just the next one token. 'anyTill' is also like 'Text.Megaparsec.takeWhileP'
-- in that it will be “fast” when applied to an input stream type @s@
-- for which there are specialization re-write rules.
anyTill :: forall e s m a.
    (MonadParsec e s m)
    => m a -- ^ The pattern matching parser @sep@
    -> m (Tokens s, a) -- ^ parser
anyTill sep = do
    (as, end) <- manyTill_ anySingle sep
    pure (tokensToChunk (Proxy::Proxy s) as, end)
{-# INLINE [1] anyTill #-}

#if MIN_VERSION_GLASGOW_HASKELL(8,8,1,0)
{-# RULES "anyTill/ByteString" [2]
  forall e. forall.
  anyTill @e @B.ByteString =
  anyTillByteString @e @B.ByteString #-}
{-# RULES "anyTill/Text" [2]
  forall e. forall.
  anyTill @e @T.Text =
  anyTillText @e @T.Text #-}
#elif MIN_VERSION_GLASGOW_HASKELL(8,0,2,0)
{-# RULES "anyTill/ByteString" [2]
  forall (pa :: ParsecT e B.ByteString m a).
  anyTill @e @B.ByteString @(ParsecT e B.ByteString m) @a pa =
  anyTillByteString @e @B.ByteString @(ParsecT e B.ByteString m) @a pa #-}
{-# RULES "anyTill/Text" [2]
  forall (pa :: ParsecT e T.Text m a).
  anyTill @e @T.Text @(ParsecT e T.Text m) @a pa =
  anyTillText @e @T.Text @(ParsecT e T.Text m) @a pa #-}
#endif

-- |
-- === Separate and capture
--
-- Parser combinator to find all of the leftmost non-overlapping occurrences
-- of the pattern parser @sep@ in a text stream.
-- The 'sepCap' parser will always consume its entire input and can never fail.
--
-- @sepCap@ is similar to the @sep*@ family of parser combinators
-- found in
--
-- and
-- ,
-- but it returns the parsed result of the @sep@ parser instead
-- of throwing it away.
--
-- ==== Output
--
-- The input stream is separated and output into a list of sections:
--
-- * Sections which can be parsed by the pattern @sep@ will be parsed and captured
--   as 'Right'.
-- * Non-matching sections of the stream will be captured in 'Left'.
--
-- The output list also has these properties:
--
-- * If the input is @""@ then the output list will be @[]@.
-- * If there are no pattern matches, then
--   the entire input stream will be returned as one non-matching 'Left' section.
-- * The output list will not contain two consecutive 'Left' sections.
--
-- ==== Zero-width matches forbidden
--
-- If the pattern matching parser @sep@ would succeed without consuming any
-- input then 'sepCap' will force it to fail.
-- If we allow @sep@ to match a zero-width pattern,
-- then it can match the same zero-width pattern again at the same position
-- on the next iteration, which would result in an infinite number of
-- overlapping pattern matches.
sepCap :: forall e s m a.
(MonadParsec e s m) => m a -- ^ The pattern matching parser @sep@ -> m [Either (Tokens s) a] -- ^ parser sepCap sep = (fmap.fmap) (first $ tokensToChunk (Proxy::Proxy s)) $ fmap sequenceLeft $ many $ fmap Right (try nonZeroSep) <|> fmap Left anySingle where sequenceLeft :: [Either l r] -> [Either [l] r] sequenceLeft = {-# SCC sequenceLeft #-} foldr consLeft [] where consLeft :: Either l r -> [Either [l] r] -> [Either [l] r] consLeft (Left l) ((Left ls):xs) = {-# SCC consLeft #-} (Left (l:ls)):xs consLeft (Left l) xs = {-# SCC consLeft #-} (Left [l]):xs consLeft (Right r) xs = {-# SCC consLeft #-} (Right r):xs -- If sep succeeds and consumes 0 input tokens, we must force it to fail, -- otherwise sepCap would loop forever. nonZeroSep = {-# SCC nonZeroSep #-} do offset1 <- getOffset x <- {-# SCC sep #-} sep offset2 <- getOffset when (offset1 >= offset2) empty return x {-# INLINE [1] sepCap #-} -- https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#specialisation -- What we're missing here is a rule that can pick up non-ParsecT instances -- of MonadParsec for GHC < 8.8. #if MIN_VERSION_GLASGOW_HASKELL(8,8,1,0) {-# RULES "sepCap/ByteString" [2] forall e. forall. sepCap @e @B.ByteString = sepCapByteString @e @B.ByteString #-} {-# RULES "sepCap/Text" [2] forall e. forall. sepCap @e @T.Text = sepCapText @e @T.Text #-} #elif MIN_VERSION_GLASGOW_HASKELL(8,0,2,0) {-# RULES "sepCap/ByteString" [2] forall (pa :: ParsecT e B.ByteString m a). sepCap @e @B.ByteString @(ParsecT e B.ByteString m) @a pa = sepCapByteString @e @B.ByteString @(ParsecT e B.ByteString m) @a pa #-} {-# RULES "sepCap/Text" [2] forall (pa :: ParsecT e T.Text m a). sepCap @e @T.Text @(ParsecT e T.Text m) @a pa = sepCapText @e @T.Text @(ParsecT e T.Text m) @a pa #-} #endif -- | -- === Find all occurrences, parse and capture pattern matches -- -- Parser combinator for finding all occurrences of a pattern in a stream.
-- -- Will call 'sepCap' with the 'Text.Megaparsec.match' combinator so that -- the text which matched the pattern parser @sep@ will be returned in -- the 'Right' sections, along with the result of the parse of @sep@. -- -- Definition: -- -- @ -- findAllCap sep = 'sepCap' ('Text.Megaparsec.match' sep) -- @ findAllCap :: MonadParsec e s m => m a -- ^ The pattern matching parser @sep@ -> m [Either (Tokens s) (Tokens s, a)] -- ^ parser findAllCap sep = sepCap (match sep) {-# INLINABLE findAllCap #-} {-# DEPRECATED findAllCap "replace with `findAllCap sep = sepCap (match sep)`" #-} -- | -- === Find all occurrences -- -- Parser combinator for finding all occurrences of a pattern in a stream. -- -- Will call 'sepCap' with the 'Text.Megaparsec.match' combinator and -- return the text which matched the pattern parser @sep@ in -- the 'Right' sections. -- -- Definition: -- -- @ -- findAll sep = (fmap.fmap) ('Data.Bifunctor.second' fst) $ 'sepCap' ('Text.Megaparsec.match' sep) -- @ findAll :: MonadParsec e s m => m a -- ^ The pattern matching parser @sep@ -> m [Either (Tokens s) (Tokens s)] -- ^ parser findAll sep = (fmap.fmap) (second fst) $ sepCap (match sep) {-# INLINABLE findAll #-} {-# DEPRECATED findAll "replace with `findAll sep = (fmap.fmap) (second fst) $ sepCap (match sep)`" #-} replace-megaparsec-1.5.0.1/src/Replace/Megaparsec/Internal/0000755000000000000000000000000007346545000021552 5ustar0000000000000000replace-megaparsec-1.5.0.1/src/Replace/Megaparsec/Internal/ByteString.hs0000644000000000000000000000657207346545000024212 0ustar0000000000000000-- | -- Module : Replace.Megaparsec.Internal.ByteString -- Copyright : ©2019 James Brock -- License : BSD2 -- Maintainer: James Brock -- -- This internal module is for 'Data.ByteString.ByteString' specializations. -- -- The functions in this module are intended to be chosen automatically -- by rewrite rules in the "Replace.Megaparsec" module, so you should never -- need to import this module.
-- -- Names in this module may change without a major version increment. {-# LANGUAGE RankNTypes #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeOperators #-} module Replace.Megaparsec.Internal.ByteString ( -- * Parser combinator sepCapByteString , anyTillByteString ) where import Control.Monad import qualified Data.ByteString as B import Text.Megaparsec {-# INLINE [1] sepCapByteString #-} sepCapByteString :: forall e s m a. (MonadParsec e s m, s ~ B.ByteString) => m a -- ^ The pattern matching parser @sep@ -> m [Either (Tokens s) a] sepCapByteString sep = getInput >>= go where -- the go function will search for the first pattern match, -- and then capture the pattern match along with the preceding -- unmatched string, and then recurse. -- restBegin is the rest of the buffer after the last pattern -- match. go restBegin = do (<|>) ( do restThis <- getInput -- About 'thisiter': -- It looks stupid and introduces a completely unnecessary -- Maybe, but when I refactor to eliminate 'thisiter' and -- the Maybe then the benchmarks get dramatically worse. thisiter <- (<|>) ( do x <- try sep restAfter <- getInput -- Don't allow a match of a zero-width pattern when (B.length restAfter >= B.length restThis) empty pure $ Just (x, restAfter) ) (anySingle >> pure Nothing) case thisiter of (Just (x, restAfter)) | B.length restThis < B.length restBegin -> do -- we've got a match with some preceding unmatched string let unmatched = B.take (B.length restBegin - B.length restThis) restBegin (Left unmatched:) <$> (Right x:) <$> go restAfter (Just (x, restAfter)) -> do -- we've got a match with no preceding unmatched string (Right x:) <$> go restAfter Nothing -> go restBegin -- no match, try again ) ( do -- We're at the end of the input, so return -- whatever unmatched string we've got since restBegin if B.length restBegin > 0 then pure [Left restBegin] else pure [] ) {-# INLINE [1] anyTillByteString #-} anyTillByteString :: forall e s m a.
(MonadParsec e s m, s ~ B.ByteString) => m a -- ^ The pattern matching parser @sep@ -> m (Tokens s, a) anyTillByteString sep = do begin <- getInput (end, x) <- go pure (B.take (B.length begin - B.length end) begin, x) where go = do end <- getInput r <- optional $ try sep case r of Nothing -> anySingle >> go Just x -> pure (end, x) replace-megaparsec-1.5.0.1/src/Replace/Megaparsec/Internal/Text.hs0000644000000000000000000000657407346545000023046 0ustar0000000000000000-- | -- Module : Replace.Megaparsec.Internal.Text -- Copyright : ©2019 James Brock -- License : BSD2 -- Maintainer: James Brock -- -- This internal module is for 'Data.Text.Text' specializations. -- -- The functions in this module are intended to be chosen automatically -- by rewrite rules in the "Replace.Megaparsec" module, so you should never -- need to import this module. -- -- Names in this module may change without a major version increment. {-# LANGUAGE RankNTypes #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeOperators #-} module Replace.Megaparsec.Internal.Text ( -- * Parser combinator sepCapText , anyTillText ) where import Control.Monad import qualified Data.Text as T import Data.Text.Internal (Text(..)) import Text.Megaparsec {-# INLINE [1] sepCapText #-} sepCapText :: forall e s m a. (MonadParsec e s m, s ~ T.Text) => m a -- ^ The pattern matching parser @sep@ -> m [Either (Tokens s) a] sepCapText sep = getInput >>= go where -- the go function will search for the first pattern match, -- and then capture the pattern match along with the preceding -- unmatched string, and then recurse. -- restBegin is the rest of the buffer after the last pattern -- match. go restBegin@(Text tarray beginIndx beginLen) = do (<|>) ( do (Text _ _ thisLen) <- getInput -- About 'thisiter': -- It looks stupid and introduces a completely unnecessary -- Maybe, but when I refactor to eliminate 'thisiter' and -- the Maybe then the benchmarks get dramatically worse. 
thisiter <- (<|>) ( do x <- try sep restAfter@(Text _ _ afterLen) <- getInput -- Don't allow a match of a zero-width pattern when (afterLen >= thisLen) empty pure $ Just (x, restAfter) ) (anySingle >> pure Nothing) case thisiter of (Just (x, restAfter)) | thisLen < beginLen -> do -- we've got a match with some preceding unmatched string let unmatched = Text tarray beginIndx (beginLen - thisLen) (Left unmatched:) <$> (Right x:) <$> go restAfter (Just (x, restAfter)) -> do -- we've got a match with no preceding unmatched string (Right x:) <$> go restAfter Nothing -> go restBegin -- no match, try again ) ( do -- We're at the end of the input, so return -- whatever unmatched string we've got since restBegin if beginLen > 0 then pure [Left restBegin] else pure [] ) {-# INLINE [1] anyTillText #-} anyTillText :: forall e s m a. (MonadParsec e s m, s ~ T.Text) => m a -- ^ The pattern matching parser @sep@ -> m (Tokens s, a) anyTillText sep = do (Text tarray beginIndx beginLen) <- getInput (thisLen, x) <- go pure (Text tarray beginIndx (beginLen - thisLen), x) where go = do (Text _ _ thisLen) <- getInput r <- optional $ try sep case r of Nothing -> anySingle >> go Just x -> pure (thisLen, x) replace-megaparsec-1.5.0.1/tests/0000755000000000000000000000000007346545000014727 5ustar0000000000000000replace-megaparsec-1.5.0.1/tests/Test.hs0000644000000000000000000000034107346545000016200 0ustar0000000000000000{-# LANGUAGE BlockArguments #-} module Main ( main ) where import Test.Hspec import TestString import TestByteString import TestText main :: IO () main = hspec do TestString.tests TestByteString.tests TestText.tests replace-megaparsec-1.5.0.1/tests/TestByteString.hs0000644000000000000000000000704307346545000020221 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE CPP #-} {-# LANGUAGE BlockArguments #-} module TestByteString ( tests ) where import Replace.Megaparsec import Text.Megaparsec
import Text.Megaparsec.Byte import Text.Megaparsec.Byte.Lexer import Data.Void import qualified Data.ByteString as B import Data.ByteString.Internal (c2w) import GHC.Word import Test.Hspec (describe, shouldBe, it, SpecWith) type Parser = Parsec Void B.ByteString findAllCap' :: MonadParsec e s m => m a -> m [Either (Tokens s) (Tokens s, a)] findAllCap' sep = sepCap (match sep) tests :: SpecWith () tests = describe "input ByteString" do runParserTest "findAll upperChar" (findAllCap' (upperChar :: Parser Word8)) ("aBcD" :: B.ByteString) [Left "a", Right ("B", c2w 'B'), Left "c", Right ("D", c2w 'D')] -- check that sepCap can progress even when parser consumes nothing -- and succeeds. runParserTest "zero-consumption parser" (sepCap (many (upperChar :: Parser Word8))) ("aBcD" :: B.ByteString) [Left "a", Right [c2w 'B'], Left "c", Right [c2w 'D']] runParserTest "scinum" (sepCap scinum) "1E3" [Right (1,3)] runParserTest "getOffset" (sepCap offsetA) "xxAxx" [Left "xx", Right 2, Left "xx"] runParserTest "monad fail" (sepCap (fail "" :: Parser ())) "xxx" [Left "xxx"] #if MIN_VERSION_GLASGOW_HASKELL(8,6,0,0) runParserTest "read fail" (sepCap (return (read "a" :: Int) :: Parser Int)) "a" [Left "a"] #endif runParserTest "empty input" (sepCap (fail "" :: Parser ())) "" [] streamEditTest "x to o" (string "x" :: Parser B.ByteString) (const "o") "x x x" "o o o" streamEditTest "x to o inner" (string "x" :: Parser B.ByteString) (const "o") " x x x " " o o o " streamEditTest "ordering" (string "456" :: Parser B.ByteString) (const "ABC") "123456789" "123ABC789" streamEditTest "empty input" (match (fail "" :: Parser ())) fst "" "" breakCapTest "basic" (upperChar :: Parser Word8) "aAa" (Just ("a", c2w 'A', "a")) breakCapTest "first" (upperChar :: Parser Word8) "Aa" (Just ("", c2w 'A', "a")) breakCapTest "last" (upperChar :: Parser Word8) "aA" (Just ("a", c2w 'A', "")) breakCapTest "fail" (upperChar :: Parser Word8) "aaa" Nothing breakCapTest "match" (match (upperChar :: Parser Word8)) 
"aAa" (Just ("a", ("A",c2w 'A'), "a")) breakCapTest "zero-width" (lookAhead (upperChar :: Parser Word8)) "aAa" (Just ("a", c2w 'A', "Aa")) breakCapTest "empty input" (upperChar :: Parser Word8) "" Nothing breakCapTest "empty input zero-width" (return () :: Parser ()) "" (Just ("", (), "")) where runParserTest nam p input expected = it nam $ shouldBe (runParser p "" input) (Right expected) streamEditTest nam sep editor input expected = it nam $ shouldBe (streamEdit sep editor input) expected breakCapTest nam sep input expected = it nam $ shouldBe (breakCap sep input) expected scinum :: Parser (Double, Integer) scinum = do -- This won't parse mantissas that contain a decimal point, -- but if we use the Text.Megaparsec.Byte.Lexer.float, then it consumes -- the "E" and the exponent. Whatever, doesn't really matter for this test. m <- (fromIntegral :: Integer -> Double) <$> decimal _ <- chunk "E" e <- decimal return (m, e) offsetA :: Parser Int offsetA = getOffset <* chunk "A" replace-megaparsec-1.5.0.1/tests/TestString.hs0000644000000000000000000000672607346545000017404 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE CPP #-} {-# LANGUAGE BlockArguments #-} module TestString ( tests ) where import Replace.Megaparsec import Text.Megaparsec import Text.Megaparsec.Char import Data.Void import Data.Bifunctor (second) import Test.Hspec (describe, shouldBe, it, SpecWith) type Parser = Parsec Void String findAllCap' :: MonadParsec e s m => m a -> m [Either (Tokens s) (Tokens s, a)] findAllCap' sep = sepCap (match sep) findAll' :: MonadParsec e s f => f b -> f [Either (Tokens s) (Tokens s)] findAll' sep = (fmap.fmap) (second fst) $ sepCap (match sep) tests :: SpecWith () tests = describe "input String" do runParserTest "findAll upperChar" (findAllCap' (upperChar :: Parser Char)) ("aBcD" :: String) [Left "a", Right ("B", 'B'), Left "c", Right ("D", 'D')] -- check that sepCap can progress even when parser consumes nothing -- and 
succeeds. runParserTest "zero-consumption parser" (sepCap (many (upperChar :: Parser Char))) ("aBcD" :: String) [Left "a", Right "B", Left "c", Right "D"] runParserTest "scinum" (sepCap scinum) "1E3" [Right (1,3)] runParserTest "getOffset" (sepCap offsetA) "xxAxx" [Left "xx", Right 2, Left "xx"] runParserTest "monad fail" (sepCap (fail "" :: Parser ())) "xxx" [Left "xxx"] #if MIN_VERSION_GLASGOW_HASKELL(8,6,0,0) runParserTest "read fail" (sepCap (return (read "a" :: Int) :: Parser Int)) ("a") ([Left "a"]) #endif runParserTest "findAll astral" (findAll' (takeWhileP Nothing (=='𝅘𝅥𝅯') :: Parser String)) "𝄞𝅘𝅥𝅘𝅥𝅘𝅥𝅘𝅥𝅘𝅥𝅯𝅘𝅥𝅯𝅘𝅥𝅯𝅘𝅥𝅯𝅘𝅥𝅘𝅥𝅘𝅥𝅘𝅥" [Left "𝄞𝅘𝅥𝅘𝅥𝅘𝅥𝅘𝅥", Right "𝅘𝅥𝅯𝅘𝅥𝅯𝅘𝅥𝅯𝅘𝅥𝅯", Left "𝅘𝅥𝅘𝅥𝅘𝅥𝅘𝅥"] runParserTest "empty input" (sepCap (fail "" :: Parser ())) "" [] streamEditTest "x to o" (string "x" :: Parser String) (const "o") "x x x" "o o o" streamEditTest "x to o inner" (string "x" :: Parser String) (const "o") " x x x " " o o o " streamEditTest "ordering" (string "456" :: Parser String) (const "ABC") "123456789" "123ABC789" streamEditTest "empty input" (match (fail "" :: Parser ())) fst "" "" breakCapTest "basic" (upperChar :: Parser Char) "aAa" (Just ("a", 'A', "a")) breakCapTest "first" (upperChar :: Parser Char) "Aa" (Just ("", 'A', "a")) breakCapTest "last" (upperChar :: Parser Char) "aA" (Just ("a", 'A', "")) breakCapTest "fail" (upperChar :: Parser Char) "aaa" Nothing breakCapTest "match" (match (upperChar :: Parser Char)) "aAa" (Just ("a", ("A",'A'), "a")) breakCapTest "zero-width" (lookAhead (upperChar :: Parser Char)) "aAa" (Just ("a", 'A', "Aa")) breakCapTest "empty input" (upperChar :: Parser Char) "" Nothing breakCapTest "empty input zero-width" (return () :: Parser ()) "" (Just ("", (), "")) where runParserTest nam p input expected = it nam $ shouldBe (runParser p "" input) (Right expected) streamEditTest nam sep editor input expected = it nam $ shouldBe (streamEdit sep editor input) expected breakCapTest nam sep input expected = it nam $ shouldBe (breakCap 
sep input) expected scinum :: Parser (Double, Integer) scinum = do m <- some digitChar _ <- chunk "E" e <- some digitChar return (read m, read e) offsetA :: Parser Int offsetA = getOffset <* chunk "A" replace-megaparsec-1.5.0.1/tests/TestText.hs0000644000000000000000000000774307346545000017062 0ustar0000000000000000{-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE CPP #-} {-# LANGUAGE BlockArguments #-} module TestText ( tests ) where import Replace.Megaparsec import Text.Megaparsec import Text.Megaparsec.Char import Data.Void import qualified Data.Text as T import Data.Bifunctor (second) import Test.Hspec (describe, shouldBe, it, SpecWith) type Parser = Parsec Void T.Text findAllCap' :: MonadParsec e s m => m a -> m [Either (Tokens s) (Tokens s, a)] findAllCap' sep = sepCap (match sep) findAll' :: MonadParsec e s f => f b -> f [Either (Tokens s) (Tokens s)] findAll' sep = (fmap.fmap) (second fst) $ sepCap (match sep) tests :: SpecWith () tests = describe "input Text" do runParserTest "findAll upperChar" (findAllCap' (upperChar :: Parser Char)) ("aBcD" :: T.Text) [Left "a", Right ("B", 'B'), Left "c", Right ("D", 'D')] -- check that sepCap can progress even when parser consumes nothing -- and succeeds. 
runParserTest "zero-consumption parser" (sepCap (many (upperChar :: Parser Char))) ("aBcD" :: T.Text) [Left "a", Right "B", Left "c", Right "D"] runParserTest "scinum" (sepCap scinum) ("1E3") ([Right (1,3)]) runParserTest "getOffset" (sepCap offsetA) ("xxAxx") ([Left "xx", Right 2, Left "xx"]) runParserTest "monad fail" (sepCap (fail "" :: Parser ())) ("xxx") ([Left "xxx"]) #if MIN_VERSION_GLASGOW_HASKELL(8,6,0,0) runParserTest "read fail" (sepCap (return (read "a" :: Int) :: Parser Int)) ("a") ([Left "a"]) #endif runParserTest "findAll astral" (findAll' (takeWhileP Nothing (=='𝅘𝅥𝅯') :: Parser T.Text)) ("𝄞𝅘𝅥𝅘𝅥𝅘𝅥𝅘𝅥𝅘𝅥𝅯𝅘𝅥𝅯𝅘𝅥𝅯𝅘𝅥𝅯𝅘𝅥𝅘𝅥𝅘𝅥𝅘𝅥" :: T.Text) [Left "𝄞𝅘𝅥𝅘𝅥𝅘𝅥𝅘𝅥", Right "𝅘𝅥𝅯𝅘𝅥𝅯𝅘𝅥𝅯𝅘𝅥𝅯", Left "𝅘𝅥𝅘𝅥𝅘𝅥𝅘𝅥"] runParserTest "empty input" (sepCap (fail "" :: Parser ())) "" [] streamEditTest "x to o" (string "x" :: Parser T.Text) (const "o") "x x x" "o o o" streamEditTest "x to o inner" (string "x" :: Parser T.Text) (const "o") " x x x " " o o o " streamEditTest "ordering" (string "456" :: Parser T.Text) (const "ABC") "123456789" "123ABC789" streamEditTest "empty input" (match (fail "" :: Parser ())) fst "" "" breakCapTest "basic" (upperChar :: Parser Char) "aAa" (Just ("a", 'A', "a")) breakCapTest "first" (upperChar :: Parser Char) "Aa" (Just ("", 'A', "a")) breakCapTest "last" (upperChar :: Parser Char) "aA" (Just ("a", 'A', "")) breakCapTest "fail" (upperChar :: Parser Char) "aaa" Nothing breakCapTest "match" (match (upperChar :: Parser Char)) "aAa" (Just ("a", ("A",'A'), "a")) breakCapTest "zero-width" (lookAhead (upperChar :: Parser Char)) "aAa" (Just ("a", 'A', "Aa")) breakCapTest "empty input" (upperChar :: Parser Char) "" Nothing breakCapTest "empty input zero-width" (return () :: Parser ()) "" (Just ("", (), "")) -- I was unable to write a failing test for -- https://github.com/jamesdbrock/replace-megaparsec/issues/33 -- but adding this "sep backtrack" test anyway splitCapTest "sep backtrack" (match $ between (string "{{") (string "}}") (T.pack <$> many 
(alphaNumChar <|> spaceChar) :: Parser T.Text)) "{{foo.}}" [Left "{{foo.}}"] where runParserTest nam p input expected = it nam $ shouldBe (runParser p "" input) (Right expected) streamEditTest nam sep editor input expected = it nam $ shouldBe (streamEdit sep editor input) expected breakCapTest nam sep input expected = it nam $ shouldBe (breakCap sep input) expected splitCapTest nam sep input expected = it nam $ shouldBe (splitCap sep input) expected scinum :: Parser (Double, Integer) scinum = do m <- some digitChar _ <- chunk "E" e <- some digitChar return (read m, read e) offsetA :: Parser Int offsetA = getOffset <* chunk "A"