widestring-1.1.0/.cargo_vcs_info.json0000644000000001360000000000100132120ustar { "git": { "sha1": "c73148638884c4fc65cc475d88d57c88cebccc3f" }, "path_in_vcs": "" }widestring-1.1.0/.reuse/dep5000064400000000000000000000003541046102023000137650ustar 00000000000000Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ Copyright: Kathryn Long License: MIT OR Apache-2.0 Files: * Copyright: 2021 Kathryn Long License: MIT OR Apache-2.0 widestring-1.1.0/CHANGELOG.md000064400000000000000000000411301046102023000136120ustar 00000000000000# Changelog The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html). ## [Unreleased] ## [1.1.0] - 2022-04-06 ### Added - `Utf32String::into_char_vec` and missing conversion to `Vec` for `Utf32String`. Fixes [#37]. - `include_utf16str!` macro to include UTF-16 file at compile-time as `Utf16Str`. By [@daxpedda]. ### Fixed - `U16String::pop_char` panics with surrogate string. Fixes [#38]. - Various import warnings and new clippy warnings, plus stabilized debugger visualizer warnings. ## [1.0.2] - 2022-07-15 ### Fixed - Correctly check for and error on nul values in C-string macros `u16cstr!`, `u32cstr!`, and `widecstr!`. Fixes [#28]. ## [1.0.1] - 2022-06-24 ### Fixed - Reduce collision potential for macros. By [@OpenByteDev]. ## [1.0.0] - 2022-06-21 ### Changed - **Breaking Change** Minimum supported Rust version is now 1.58. - Added `#[must_use]` attributes to many crate functions, as appropriate. - Remove `unsafe` qualifiers from `as_mut_ptr` and `as_mut_ptr_range` to match standard library. By [@yescallop]. ### Added - Added `new` function that creates an empty string to `U16CString` and `U32CString` to match other string types. - Additional `From` implementations for conversion to `OsString`. ## [1.0.0-beta.1] - 2021-11-08 ### Changed - **Breaking Change** Minimum supported Rust version is now 1.56. 
- **Breaking Change** The following methods on `U16String` and `U32String` have been renamed and replaced by functions with different semantics: - `pop` is now `pop_char` - `remove` is now `remove_char` - `insert` is now `insert_char` - **Breaking Change** Moved and renamed the following iterator types: - `iter::Utf16Chars` renamed to `CharsUtf16` and moved to `ustr` and `ucstr` - `iter::Utf32Chars` renamed to `CharsUtf32` and moved to `ustr` and `ucstr` - `iter::CharsLossy` split and renamed to `CharsLossyUtf16` and `CharsLossyUtf32` and moved to `ustr` and `ucstr` - `iter::Utf16CharIndices` renamed to `CharIndicesUtf16` and moved to `ustr` and `ucstr` - `iter::Utf16CharIndicesLossy` renamed to `CharIndicesLossyUtf16` and moved to `ustr` and `ucstr` - **Breaking Change** `error::FromUtf16Error` and `error::FromUtf32Error` have been renamed to `Utf16Error` and `Utf32Error` respectively and expanded with more details about the error. - Migrated crate to Rust 2021 edition. - The following methods on `U16Str` and `U32Str` are now `const`: - `from_slice` - `as_slice` - `as_ptr` - `len` - `is_empty` - The following methods on `U16CStr` and `U32CStr` are now `const`: - `from_slice_unchecked` - `as_slice_with_nul` - `as_ptr` - `len` - `is_empty` - The following methods on `U16String` and `U32String` are now `const`: - `new` ### Added - Added new UTF-encoded string types and associated types: - `Utf16Str` - `Utf32Str` - `Utf16String` - `Utf32String` - Added macros to convert string literals into `const` wide string slices: - `u16str!` - `u16cstr!` - `u32str!` - `u32cstr!` - `widestr!` - `widecstr!` - `utf16str!` - `utf32str!` - Added `NUL_TERMINATOR` associated constant to `U16CStr`, `U32CStr`, `U16CString`, and `U32CString`. - Added `DoubleEndedIterator` and `ExactSizeIterator` implementations to a number of iterator types. 
- Added new UTF encoding functions alongside existing decode functions: - `encode_utf8` - `encode_utf16` - `encode_utf32` - Added various methods: - `repeat` on `U16Str`, `U32Str`, `U16CStr`, and `U32CStr` - `shrink_to` on `U16String` and `U32String` - `retain` on `U16String` and `U32String` - `drain` on `U16String` and `U32String` - `replace_range` on `U16String` and `U32String` - `get`, `get_mut`, `get_unchecked`, and `get_unchecked_mut` on `U16CStr` and `U32CStr` - `split_at` and `split_at_mut` on `U16CStr` and `U32CStr` - Added more trait implementations. ### Removed - **Breaking Change** Functions and types deprecated in 0.5 have been removed. - **Breaking Change** The following types and traits, which were implementation details, have been removed. Use the existing non-generic types instead (e.g. use `U16Str` instead of `UStr`). - `UChar` - `UStr` - `UCStr` - `UString` - `UCString` - **Breaking Change** Removed `IndexMut` trait implementation of `U16CString` and `U32CString`. Use the unsafe `get_mut` method instead, which also supports more ranges. ### Fixed - **Breaking Change** The iterator returned by `U16Str::char_indices` and `U16CStr::char_indices` is now over `(usize, Result)` tuples instead of the reverse order, to better match standard library string iterators. The same is true of `U16Str::char_indices_lossy` and `U16CStr::char_indices_lossy`. This matches what was stated in original documentation. - `U32Str::to_string` and `U32CStr::to_string` now only allocate once instead of twice. ## [0.5.1] - 2021-10-23 ### Fixed - Fixed a regression in 0.5.0 where zero-length vectors and strings were incorrectly causing panics in `UCString::from_vec` and `UCString::from_str`. Fixes [#22]. - Modified an implementation detail in `ustr::to_string` & `ustr::to_string_lossy` to remove possibly unsafe behaviour. ## [0.5.0] - 2021-10-12 ### Changed - **Breaking Change** Minimum supported Rust version is now 1.48. 
- **Breaking Change** Renamed a number of types and functions to increase consistency and clarity. This also meant renaming errors to more clearly convey the error and trying to be more consistent with name conventions and functionality across types. Check renamed function docs for any changes in functionality, as there have been some minor tweaks (mostly relaxing/removing error conditions and reducing panics). Old names have been deprecated to ease transition and will be removed in a future release. Fixes [#18]. - `MissingNulError` => `error::MissingNulTerminator` - `FromUtf32Error` => `error::FromUtf32Error` - `NulError` => `error::ContainsNul` - `UCStr::from_ptr_with_nul` => `from_ptr_unchecked` - `UCStr::from_slice_with_nul` => `from_slice_truncate` - `UCStr::from_slice_with_nul_unchecked` => `from_slice_unchecked` - `U32CStr::from_char_ptr_with_nul` => `from_char_ptr_unchecked` - `U32CStr::from_char_slice_with_nul` => `from_char_slice_truncate` - `U32CStr::from_char_slice_with_nul_unchecked` => `from_char_slice_unchecked` - `UCString::new` => `from_vec` - `UCString::from_vec_with_nul` => `from_vec_truncate` - `UCString::from_ustr_with_nul` => `from_ustr_truncate` - `UCString::from_ptr_with_nul` => `from_ptr_truncate` - `UCString::from_str_with_nul` => `from_str_truncate` - `UCString::from_os_str_with_nul` => `from_os_str_truncate` - `U32CString::from_chars_with_nul` => `from_chars_truncate` - `U32CString::from_char_ptr_with_nul` => `from_char_ptr_truncate` - Improved implementations in some areas to reduce unnecessary double allocations. - Improved `Debug` implementations. No more debugging lists of raw integer values. - Migrated crate to Rust 2018 edition. - Made crate package [REUSE compliant](https://reuse.software/). - Improved documentation and used intra-doc links. ### Added - Added crate-level functions `decode_utf16`, `decode_utf16_lossy`, `decode_utf32`, and `decode_utf32_lossy` and associated iterators. 
Note that `decode_utf16` is an alias of `core::char::decode_utf16`, but provided for consistency. - Added `display` method to both `UStr` and `UCStr` to display strings in formatting without heap allocations, similar to `Path::display`. Fixes [#20]. - Added more trait implementations, including more index operations and string formatting via `Write` trait. Fixes [#19]. - Added new functions: - `UStr::from_ptr_mut` - `UStr::from_slice_mut` - `UStr::as_mut_slice` - `UStr::as_mut_ptr` - `UStr::as_ptr_range` - `UStr::as_mut_ptr_range` - `UStr::get` - `UStr::get_mut` - `UStr::get_unchecked` - `UStr::get_unchecked_mut` - `UStr::split_at` - `UStr::split_at_mut` - `UStr::chars` - `UStr::chars_lossy` - `U16Str::char_indices` - `U16Str::char_indices_lossy` - `U32Str::from_char_ptr_mut` - `U32Str::from_char_slice_mut` - `UCStr::from_ptr` - `UCStr::from_ptr_truncate` - `UCStr::from_slice` - `UCStr::as_ustr` - `UCStr::from_ptr_str_mut` - `UCStr::from_ptr_mut` - `UCStr::from_ptr_truncate_mut` - `UCStr::from_ptr_unchecked_mut` - `UCStr::from_slice_mut` - `UCStr::from_slice_truncate_mut` - `UCStr::from_slice_unchecked_mut` - `UCStr::as_mut_slice` - `UCStr::as_mut_ptr` - `UCStr::as_ustr_with_nul` - `UCStr::as_mut_ustr` - `UCStr::as_ptr_range` - `UCStr::as_mut_ptr_range` - `UCStr::chars` - `UCStr::chars_lossy` - `U16CStr::char_indices` - `U16CStr::char_indices_lossy` - `U32CStr::from_char_ptr_str_mut` - `U32CStr::from_char_ptr_mut` - `U32CStr::from_char_ptr_truncate_mut` - `U32CStr::from_char_ptr_unchecked_mut` - `U32CStr::from_char_slice_mut` - `U32CStr::from_char_slice_truncate_mut` - `U32CStr::from_char_slice_unchecked_mut` - `U32CStr::from_char_ptr` - `U32CStr::from_char_ptr_truncate` - `U32CStr::from_char_slice` - `UString::as_vec` - `UString::as_mut_vec` - `UString::push_char` - `UString::truncate` - `UString::pop` - `UString::remove` - `UString::insert` - `UString::insert_ustr` - `UString::split_off` - `UCString::as_mut_ucstr` - `UCString::into_ustring` - 
`UCString::into_ustring_with_nul` - `U32CString::from_char_ptr_str` ### Deprecated - Deprecated functions as part of simplifying to increase clarity. These will be removed entirely in a future release. - `MissingNulError`. Use `error::MissingNulTerminator` instead. - `FromUtf32Error`. Use `error::FromUtf32Error` instead. - `NulError`. Use `error::ContainsNul` instead. - `UCStr::from_ptr_with_nul`. Use `from_ptr_unchecked` instead. - `UCStr::from_slice_with_nul`. Use `from_slice_truncate` instead. - `UCStr::from_slice_with_nul_unchecked`. Use `from_slice_unchecked` instead. - `U32CStr::from_char_ptr_with_nul`. Use `from_char_ptr_unchecked` instead. - `U32CStr::from_char_slice_with_nul`. Use `from_char_slice_truncate` instead. - `U32CStr::from_char_slice_with_nul_unchecked`. Use `from_char_slice_unchecked` instead. - `UCString::new`. Use `from_vec` instead. - `UCString::from_vec_with_nul_unchecked`. Use `from_vec_unchecked` instead. - `UCString::from_ustr_with_nul_unchecked`. Use `from_ustr_unchecked` instead. - `UCString::from_ptr_with_nul_unchecked`. Use `from_ptr_unchecked` instead. - `UCString::from_str_with_nul_unchecked`. Use `from_str_unchecked` instead. - `UCString::from_os_str_with_nul_unchecked`. Use `from_os_str_unchecked` instead. - `UCString::from_vec_with_nul`. Use `from_vec_truncate` instead. - `UCString::from_ustr_with_nul`. Use `from_ustr_truncate` instead. - `UCString::from_ptr_with_nul`. Use `from_ptr_truncate` instead. - `UCString::from_str_with_nul`. Use `from_str_truncate` instead. - `UCString::from_os_str_with_nul`. Use `from_os_str_truncate` instead. - `U32CString::from_chars_with_nul_unchecked`. Use `from_chars_unchecked` instead. - `U32CString::from_char_ptr_with_nul_unchecked`. Use `from_char_ptr_unchecked` instead. - `U32CString::from_chars_with_nul`. Use `from_chars_truncate` instead. - `U32CString::from_char_ptr_with_nul`. Use `from_char_ptr_truncate` instead. - Deprecated error types in the crate root. 
Use the errors directly from `error` module instead. ## [0.4.3] - 2020-10-05 ### Fixed - Fixed undefined behaviours and cleaned up clippy warnings. By [@joshwd36]. ## [0.4.2] - 2020-06-09 ### Fixed - Fixed compile errors on pre-1.36.0 Rust due to unstable `alloc` crate. Minimum supported version is Rust 1.34.2, the rust version for Debian stable. Fixes [#14]. ## [0.4.1] - 2020-06-08 ### ***Yanked*** ### Changed - Now supports `no_std`. Added the `std` and `alloc` features, enabled by default. `U16String`, `U32String`, `U16CString`, and `U32CString` and their aliases all require the `alloc` or `std` feature. By [@nicbn]. ## [0.4.0] - 2018-08-18 ### Added - New `U32String`, `U32Str`, `U32CString`, and `U32CStr` types for dealing with UTF-32 FFI. These new types are roughly equivalent to the existing UTF-16 types. - `WideChar` is a type alias to `u16` on Windows but `u32` on non-Windows platforms. - The generic types `UString`, `UStr`, `UCString` and `UCStr` are used to implement the string types. ### Changed - **Breaking Change** Existing wide string types have been renamed to `U16String`, `U16Str`, `U16CString`, and `U16CStr` (previously `WideString`, `WideStr`, etc.). Some functions have also been renamed to reflect this change (`wide_str` to `u16_str`, etc.). - **Breaking Change** `WideString`, `WideStr`, `WideCString`, and `WideCStr` are now type aliases that vary between platforms. On Windows, these are aliases to the `U16` types and are equivalent to the previous version, but on non-Windows platforms these alias the new `U32` types instead. See crate documentation for more details. ## [0.3.0] - 2018-03-17 ### Added - Additional unchecked functions on `WideCString`. - All types now implement `Default`. - `WideString::shrink_to_fit` - `WideString::into_boxed_wide_str` and `Box::into_wide_string`. - `WideCString::into_boxed_wide_c_str` and `Box::into_wide_c_string`. - `From` and `Default` implementations for boxed `WideStr` and boxed `WideCStr`. 
### Changed - Renamed `WideCString::from_vec` to replace `WideCString::new`. To create empty string, use `WideCString::default()` now. - `WideCString` now implements `Drop`, which sets the string to an empty string to prevent invalid unsafe code from working correctly when it should otherwise break. Also see `Drop` implementation of `CString`. - Writing changelog manually. - Upgraded winapi dev dependency. - Now requires at least Rust 1.17+ to compile (previously, was Rust 1.8). ## [0.2.2] - 2016-09-09 ### Fixed - Make `WideCString::into_raw` correctly forget the original self. ## [0.2.1] - 2016-08-12 ### Added - `into_raw`/`from_raw` on `WideCString`. Closes [#2]. ## [0.2.0] - 2016-05-31 ### Added - `Default` trait to wide strings. - Traits for conversion of strings to `Cow`. ### Changed - Methods & traits to bring to parity with Rust 1.9 string APIs. ## 0.1.0 - 2016-02-06 ### Added - Initial release. [#2]: https://github.com/starkat99/widestring-rs/issues/2 [#14]: https://github.com/starkat99/widestring-rs/issues/14 [#18]: https://github.com/starkat99/widestring-rs/issues/18 [#19]: https://github.com/starkat99/widestring-rs/issues/19 [#20]: https://github.com/starkat99/widestring-rs/issues/20 [#22]: https://github.com/starkat99/widestring-rs/issues/22 [#28]: https://github.com/starkat99/widestring-rs/issues/28 [#37]: https://github.com/starkat99/widestring-rs/issues/37 [#38]: https://github.com/starkat99/widestring-rs/issues/38 [@nicbn]: https://github.com/nicbn [@joshwd36]: https://github.com/joshwb36 [@yescallop]: https://github.com/yescallop [@OpenByteDev]: https://github.com/OpenByteDev [@daxpedda]: https://github.com/daxpedda [Unreleased]: https://github.com/starkat99/widestring-rs/compare/v1.1.0...HEAD [1.1.0]: https://github.com/starkat99/widestring-rs/compare/v1.0.2...v1.1.0 [1.0.2]: https://github.com/starkat99/widestring-rs/compare/v1.0.1...v1.0.2 [1.0.1]: https://github.com/starkat99/widestring-rs/compare/v1.0.0...v1.0.1 [1.0.0]: 
https://github.com/starkat99/widestring-rs/compare/v1.0.0-beta.1...v1.0.0 [1.0.0-beta.1]: https://github.com/starkat99/widestring-rs/compare/v0.5.1...v1.0.0-beta.1 [0.5.1]: https://github.com/starkat99/widestring-rs/compare/v0.5.0...v0.5.1 [0.5.0]: https://github.com/starkat99/widestring-rs/compare/v0.4.3...v0.5.0 [0.4.3]: https://github.com/starkat99/widestring-rs/compare/v0.4.2...v0.4.3 [0.4.2]: https://github.com/starkat99/widestring-rs/compare/v0.4.1...v0.4.2 [0.4.1]: https://github.com/starkat99/widestring-rs/compare/v0.4.0...v0.4.1 [0.4.0]: https://github.com/starkat99/widestring-rs/compare/v0.3.0...v0.4.0 [0.3.0]: https://github.com/starkat99/widestring-rs/compare/v0.2.2...v0.3.0 [0.2.2]: https://github.com/starkat99/widestring-rs/compare/v0.2.1...v0.2.2 [0.2.1]: https://github.com/starkat99/widestring-rs/compare/v0.2.0...v0.2.1 [0.2.0]: https://github.com/starkat99/widestring-rs/compare/v0.1.0...v0.2.0 widestring-1.1.0/Cargo.toml0000644000000031470000000000100112150ustar # THIS FILE IS AUTOMATICALLY GENERATED BY CARGO # # When uploading crates to the registry Cargo will automatically # "normalize" Cargo.toml files for maximal compatibility # with all versions of Cargo and also rewrite `path` dependencies # to registry (e.g., crates.io) dependencies. # # If you are reading this file be aware that the original Cargo.toml # will likely look very different (and much more reasonable). # See Cargo.toml.orig for the original contents. [package] edition = "2021" rust-version = "1.58" name = "widestring" version = "1.1.0" exclude = [ ".git*", ".editorconfig", ] description = "A wide string Rust library for converting to and from wide strings, such as those often used in Windows API or other FFI libraries. Both `u16` and `u32` string types are provided, including support for UTF-16 and UTF-32, malformed encoding, C-style strings, etc." 
readme = "README.md" keywords = [ "wide", "string", "win32", "utf-16", "utf-32", ] categories = [ "text-processing", "encoding", "development-tools::ffi", "no-std", ] license = "MIT OR Apache-2.0" repository = "https://github.com/starkat99/widestring-rs" [package.metadata.docs.rs] rustc-args = [ "--cfg", "docsrs", ] [[test]] name = "debugger_visualizer" path = "tests/debugger_visualizer.rs" test = false required-features = ["debugger_visualizer"] [dev-dependencies.debugger_test] version = "0.1" [dev-dependencies.debugger_test_parser] version = "0.1" [dev-dependencies.winapi] version = "0.3" features = ["winbase"] [features] alloc = [] debugger_visualizer = ["alloc"] default = ["std"] std = ["alloc"] widestring-1.1.0/Cargo.toml.orig000064400000000000000000000022541046102023000146740ustar 00000000000000[package] name = "widestring" # Remember to keep in sync with html_root_url crate attribute version = "1.1.0" description = "A wide string Rust library for converting to and from wide strings, such as those often used in Windows API or other FFI libraries. Both `u16` and `u32` string types are provided, including support for UTF-16 and UTF-32, malformed encoding, C-style strings, etc." repository = "https://github.com/starkat99/widestring-rs" readme = "README.md" keywords = ["wide", "string", "win32", "utf-16", "utf-32"] categories = ["text-processing", "encoding", "development-tools::ffi", "no-std"] license = "MIT OR Apache-2.0" edition = "2021" rust-version = "1.58" exclude = [".git*", ".editorconfig"] [features] default = ["std"] std = ["alloc"] alloc = [] # Enable to use the #[debugger_visualizer] attribute. 
Requires Rust 1.71+ debugger_visualizer = ["alloc"] [dev-dependencies] debugger_test = "0.1" debugger_test_parser = "0.1" winapi = { version = "0.3", features = ["winbase"] } [package.metadata.docs.rs] rustc-args = ["--cfg", "docsrs"] [[test]] name = "debugger_visualizer" path = "tests/debugger_visualizer.rs" required-features = ["debugger_visualizer"] test = false widestring-1.1.0/LICENSE000064400000000000000000000000221046102023000130010ustar 00000000000000MIT OR Apache-2.0 widestring-1.1.0/LICENSES/Apache-2.0.txt000064400000000000000000000240501046102023000154270ustar 00000000000000Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. 
"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. 
You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. 
Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. 
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. 
Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. widestring-1.1.0/LICENSES/MIT.txt000064400000000000000000000020661046102023000144050ustar 00000000000000MIT License Copyright (c) Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
widestring-1.1.0/Makefile.toml000064400000000000000000000057421046102023000144240ustar 00000000000000[config] min_version = "0.35.0" [env] CI_CARGO_TEST_FLAGS = { value = "--locked -- --nocapture", condition = { env_true = [ "CARGO_MAKE_CI", ] } } CARGO_MAKE_CARGO_ALL_FEATURES = { source = "${CARGO_MAKE_RUST_CHANNEL}", default_value = "--features=std", mapping = { "nightly" = "--all-features" } } CARGO_MAKE_CLIPPY_ARGS = { value = "${CARGO_MAKE_CLIPPY_ALL_FEATURES_WARN}", condition = { env_true = [ "CARGO_MAKE_CI", ] } } # Override for CI flag additions [tasks.test] args = [ "test", "@@remove-empty(CARGO_MAKE_CARGO_VERBOSE_FLAGS)", "@@split(CARGO_MAKE_CARGO_BUILD_TEST_FLAGS, )", "@@split(CI_CARGO_TEST_FLAGS, )", ] # Let clippy run on non-nightly CI [tasks.clippy-ci-flow] condition = { env_set = ["CARGO_MAKE_RUN_CLIPPY"] } # Let format check run on non-nightly CI [tasks.check-format-ci-flow] condition = { env_set = ["CARGO_MAKE_RUN_CHECK_FORMAT"] } [tasks.check-docs] description = "Checks docs for errors." 
category = "Documentation"
install_crate = false
env = { RUSTDOCFLAGS = "-D warnings" }
command = "cargo"
args = [
    "doc",
    "--workspace",
    "--no-deps",
    "@@remove-empty(CARGO_MAKE_CARGO_VERBOSE_FLAGS)",
    "${CARGO_MAKE_CARGO_ALL_FEATURES}",
]

# Build & Test with no features enabled
[tasks.post-ci-flow]
run_task = [{ name = ["check-docs", "build-no-std", "test-no-std", "build-alloc", "test-alloc", "build-debugger-visualizer", "test-debugger-visualizer"]}]

[tasks.build-no-std]
description = "Build without any features"
category = "Build"
env = { CARGO_MAKE_CARGO_BUILD_TEST_FLAGS = "--no-default-features" }
run_task = "build"

[tasks.test-no-std]
description = "Run tests without any features"
category = "Test"
env = { CARGO_MAKE_CARGO_BUILD_TEST_FLAGS = "--no-default-features" }
run_task = "test"

[tasks.build-alloc]
description = "Build with only the alloc feature"
category = "Build"
env = { CARGO_MAKE_CARGO_BUILD_TEST_FLAGS = "--no-default-features --features=alloc" }
run_task = "build"

[tasks.test-alloc]
description = "Run tests with only the alloc feature"
category = "Test"
env = { CARGO_MAKE_CARGO_BUILD_TEST_FLAGS = "--no-default-features --features=alloc" }
run_task = "test"

[tasks.build-debugger-visualizer]
condition = { channels = ["nightly"] }
description = "Build with only the debugger_visualizer feature"
category = "Build"
env = { CARGO_MAKE_CARGO_BUILD_TEST_FLAGS = "--no-default-features --features=debugger_visualizer" }
run_task = "build"

# The debugger_visualizer tests rely on debug information being present.
# Update the debuginfo level for this task only.
[tasks.test-debugger-visualizer]
condition = { channels = ["nightly"] }
description = "Run tests with only the debugger_visualizer feature which includes the alloc feature"
category = "Test"
env = { CARGO_MAKE_CARGO_BUILD_TEST_FLAGS = "--test debugger_visualizer --no-default-features --features=debugger_visualizer", CI_CARGO_TEST_FLAGS = "--locked -- --nocapture --test-threads=1", CARGO_PROFILE_TEST_DEBUG = 2 }
run_task = "test"
dependencies = ["build-debugger-visualizer"]
widestring-1.1.0/README.md000064400000000000000000000045151046102023000132660ustar 00000000000000
# widestring

[![Crates.io](https://img.shields.io/crates/v/widestring.svg)](https://crates.io/crates/widestring/) [![Documentation](https://docs.rs/widestring/badge.svg)](https://docs.rs/widestring/) ![Crates.io](https://img.shields.io/crates/l/widestring) [![Build status](https://github.com/starkat99/widestring-rs/actions/workflows/rust.yml/badge.svg?branch=main&event=push)](https://github.com/starkat99/widestring-rs/actions/workflows/rust.yml)

A wide string Rust library for converting to and from wide strings, such as those often used in the Windows API or other FFI libraries. Both `u16` and `u32` string types are provided, including support for UTF-16 and UTF-32, malformed encoding, C-style strings, etc.

Macros for converting string literals to UTF-16 and UTF-32 strings at compile time are also included.

*Requires Rust 1.58 or greater.* If you need support for older versions of Rust, use 0.x versions of this crate.

## Documentation

- [Crate API Reference](https://docs.rs/widestring/)
- [Latest Changes](CHANGELOG.md)

### Optional Features

- **`alloc`** - Enabled by default. Enable use of the [`alloc`](https://doc.rust-lang.org/alloc/) crate when not using the `std` library.

  This enables the owned string types and aliases.

- **`std`** - Enabled by default. Enable features that depend on the Rust `std` library, including everything in the `alloc` feature.
- **`debugger_visualizer`** - Add debugger visualizer data for crate types. _Requires Rust 1.71 or newer._

## License

This library is distributed under the terms of either of:

* [MIT License](LICENSES/MIT.txt) ([http://opensource.org/licenses/MIT](http://opensource.org/licenses/MIT))
* [Apache License, Version 2.0](LICENSES/Apache-2.0.txt) ([http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0))

at your option.

This project is [REUSE-compliant](https://reuse.software/spec/). Copyrights are retained by their contributors. Some files may include explicit copyright notices and/or license [SPDX identifiers](https://spdx.dev/ids/). For full authorship information, see the version control history.

### Contributing

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
widestring-1.1.0/debug_metadata/README.md000064400000000000000000000121641046102023000162130ustar 00000000000000
## Debugger Visualizers

Many languages and debuggers enable developers to control how a type is displayed in a debugger. These are called "debugger visualizations" or "debugger views".

The Windows debuggers (WinDbg\CDB) support defining custom debugger visualizations using the `Natvis` framework. To use Natvis, developers write XML documents with the `.natvis` extension, following the Natvis schema, that describe how types should be displayed in the debugger. (See: https://docs.microsoft.com/en-us/visualstudio/debugger/create-custom-views-of-native-objects?view=vs-2019) The Natvis files provide patterns which match type names and a description of how to display those types.

The Natvis schema can be found either online (See: https://code.visualstudio.com/docs/cpp/natvis#_schema) or locally at `\Xml\Schemas\1033\natvis.xsd`.

The GNU debugger (GDB) supports defining custom debugger views using Pretty Printers.
Pretty printers are written as Python scripts that describe how a type should be displayed when loaded up in GDB/LLDB. (See: https://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing.html#Pretty-Printing) The pretty printers provide patterns, which match type names, and for matching types, describe how to display those types. (For writing a pretty printer, see: https://sourceware.org/gdb/onlinedocs/gdb/Writing-a-Pretty_002dPrinter.html#Writing-a-Pretty_002dPrinter).

### Embedding Visualizers

Through the use of the currently unstable `#[debugger_visualizer]` attribute, the `widestring` crate can embed debugger visualizers into the crate metadata.

Currently the two types of visualizers supported are Natvis and Pretty printers.

For Natvis files, when linking an executable with a crate that includes Natvis files, the MSVC linker will embed the contents of all Natvis files into the generated `PDB`.

For pretty printers, the compiler will encode the contents of the pretty printer in the `.debug_gdb_scripts` section of the generated `ELF`.

### Testing Visualizers

The `widestring` crate supports testing debugger visualizers defined for this crate. The entry point for these tests is `tests/debugger_visualizer.rs`. These tests are defined using the `debugger_test` and `debugger_test_parser` crates. The `debugger_test` crate is a proc macro crate which defines a single proc macro attribute, `#[debugger_test]`. For more detailed information about this crate, see https://crates.io/crates/debugger_test. The CI pipeline for the `widestring` crate has been updated to run the debugger visualizer tests to ensure debugger visualizers do not become broken/stale.

The `#[debugger_test]` proc macro attribute may only be used on test functions and will run the function under the debugger specified by the `debugger` meta item.

This proc macro attribute has 3 required values:

1. The first required meta item, `debugger`, takes a string value which specifies the debugger to launch.
2. The second required meta item, `commands`, takes a string containing a newline (`\n`) separated list of debugger commands to run.
3. The third required meta item, `expected_statements`, takes a string containing a newline (`\n`) separated list of statements that must exist in the debugger output. Pattern matching through regular expressions is also supported by using the `pattern:` prefix for each expected statement.

#### Example:

```rust
#[debugger_test(
    debugger = "cdb",
    commands = "command1\ncommand2\ncommand3",
    expected_statements = "statement1\nstatement2\nstatement3")]
fn test() {

}
```

Using a multiline string is also supported, with a single debugger command/expected statement per line:

```rust
#[debugger_test(
    debugger = "cdb",
    commands = "
command1
command2
command3",
    expected_statements = "
statement1
pattern:statement[0-9]+
statement3")]
fn test() {

}
```

In the example above, the second expected statement uses pattern matching through a regular expression by using the `pattern:` prefix.

#### Testing Locally

Currently, only Natvis visualizations have been defined for the `widestring` crate via `debug_metadata/widestring.natvis`, which means the `tests/debugger_visualizer.rs` tests need to be run on Windows using the `*-pc-windows-msvc` targets. To run these tests locally, first ensure the debugging tools for Windows are installed or install them following the steps listed here, [Debugging Tools for Windows](https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/). Once the debugging tools have been installed, the tests can be run in the same manner as they are in the CI pipeline.

#### Note

When running the debugger visualizer tests, `tests/debugger_visualizer.rs`, they need to be run consecutively and not in parallel. This can be achieved by passing the flag `--test-threads=1` to the test binary (after the `--` separator in `cargo test`). This is due to how the debugger tests are run. Each test marked with the `#[debugger_test]` attribute launches a debugger and attaches it to the current test process.
If tests are running in parallel, the test will try to attach a debugger to the current process, which may already have a debugger attached, causing the test to fail.

For example:

```
cargo test --test debugger_visualizer --features debugger_visualizer -- --test-threads=1
```
widestring-1.1.0/debug_metadata/widestring.natvis000064400000000000000000000063171046102023000203440ustar 00000000000000
{(char16_t*)this,su} {(char32_t*)this,s32} {(char16_t*)inner.data_ptr,[inner.length]su} inner.length inner.length (char16_t*)inner.data_ptr {(char32_t*)inner.data_ptr,[inner.length]s32} inner.length inner.length (char32_t*)inner.data_ptr {(char16_t*)inner.buf.ptr.pointer.pointer,[inner.len]su} inner.len inner.len (char16_t*)inner.buf.ptr.pointer.pointer {(char32_t*)inner.buf.ptr.pointer.pointer,[inner.len]s32} inner.len inner.len (char32_t*)inner.buf.ptr.pointer.pointer {(char16_t*)inner.buf.ptr.pointer.pointer,[inner.len]su} inner.len inner.len (char16_t*)inner.buf.ptr.pointer.pointer {(char32_t*)inner.buf.ptr.pointer.pointer,[inner.len]s32} inner.len inner.len (char32_t*)inner.buf.ptr.pointer.pointer
widestring-1.1.0/src/error.rs000064400000000000000000000231601046102023000142720ustar 00000000000000
//! Errors returned by functions in this crate.

#[cfg(feature = "alloc")]
#[allow(unused_imports)]
use alloc::vec::Vec;

/// An error returned to indicate a problem with nul values occurred.
///
/// The error will either be a [`MissingNulTerminator`] or [`ContainsNul`].
/// The error optionally returns the ownership of the invalid vector whenever a vector was owned.
#[derive(Debug, Clone)]
pub enum NulError<C> {
    /// A terminating nul value was missing.
    MissingNulTerminator(MissingNulTerminator),
    /// An interior nul value was found.
    ContainsNul(ContainsNul<C>),
}

impl<C> NulError<C> {
    /// Consumes this error, returning the underlying vector of values which generated the error in
    /// the first place.
    #[inline]
    #[cfg(feature = "alloc")]
    #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
    #[must_use]
    pub fn into_vec(self) -> Option<Vec<C>> {
        match self {
            Self::MissingNulTerminator(_) => None,
            Self::ContainsNul(e) => e.into_vec(),
        }
    }
}

impl<C> core::fmt::Display for NulError<C> {
    #[inline]
    fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
        match self {
            Self::MissingNulTerminator(e) => e.fmt(f),
            Self::ContainsNul(e) => e.fmt(f),
        }
    }
}

#[cfg(feature = "std")]
impl<C> std::error::Error for NulError<C>
where
    C: core::fmt::Debug + 'static,
{
    #[inline]
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::MissingNulTerminator(e) => Some(e),
            Self::ContainsNul(e) => Some(e),
        }
    }
}

impl<C> From<MissingNulTerminator> for NulError<C> {
    #[inline]
    fn from(value: MissingNulTerminator) -> Self {
        Self::MissingNulTerminator(value)
    }
}

impl<C> From<ContainsNul<C>> for NulError<C> {
    #[inline]
    fn from(value: ContainsNul<C>) -> Self {
        Self::ContainsNul(value)
    }
}

/// An error returned to indicate that a terminating nul value was missing.
#[derive(Debug, Clone)]
pub struct MissingNulTerminator {
    _unused: (),
}

impl MissingNulTerminator {
    pub(crate) fn new() -> Self {
        Self { _unused: () }
    }
}

impl core::fmt::Display for MissingNulTerminator {
    fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
        write!(f, "missing terminating nul value")
    }
}

#[cfg(feature = "std")]
impl std::error::Error for MissingNulTerminator {}

/// An error returned to indicate that an invalid nul value was found in a string.
///
/// The error indicates the position in the vector where the nul value was found, as well as
/// returning the ownership of the invalid vector.
#[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
#[derive(Debug, Clone)]
pub struct ContainsNul<C> {
    index: usize,
    #[cfg(feature = "alloc")]
    pub(crate) inner: Option<Vec<C>>,
    #[cfg(not(feature = "alloc"))]
    _p: core::marker::PhantomData<C>,
}

impl<C> ContainsNul<C> {
    #[cfg(feature = "alloc")]
    pub(crate) fn new(index: usize, v: Vec<C>) -> Self {
        Self {
            index,
            inner: Some(v),
        }
    }

    #[cfg(feature = "alloc")]
    pub(crate) fn empty(index: usize) -> Self {
        Self { index, inner: None }
    }

    #[cfg(not(feature = "alloc"))]
    pub(crate) fn empty(index: usize) -> Self {
        Self {
            index,
            _p: core::marker::PhantomData,
        }
    }

    /// Returns the index of the invalid nul value in the slice.
    #[inline]
    #[must_use]
    pub fn nul_position(&self) -> usize {
        self.index
    }

    /// Consumes this error, returning the underlying vector of values which generated the error in
    /// the first place.
    ///
    /// If the sequence that generated the error was a reference to a slice instead of a [`Vec`],
    /// this will return [`None`].
    #[inline]
    #[cfg(feature = "alloc")]
    #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
    #[must_use]
    pub fn into_vec(self) -> Option<Vec<C>> {
        self.inner
    }
}

impl<C> core::fmt::Display for ContainsNul<C> {
    fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
        write!(f, "invalid nul value found at position {}", self.index)
    }
}

#[cfg(feature = "std")]
impl<C> std::error::Error for ContainsNul<C> where C: core::fmt::Debug {}

/// An error that can be returned when decoding UTF-16 code points.
///
/// This struct is created when using the [`DecodeUtf16`][crate::iter::DecodeUtf16] iterator.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DecodeUtf16Error {
    unpaired_surrogate: u16,
}

impl DecodeUtf16Error {
    pub(crate) fn new(unpaired_surrogate: u16) -> Self {
        Self { unpaired_surrogate }
    }

    /// Returns the unpaired surrogate which caused this error.
    #[must_use]
    pub fn unpaired_surrogate(&self) -> u16 {
        self.unpaired_surrogate
    }
}

impl core::fmt::Display for DecodeUtf16Error {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        write!(f, "unpaired surrogate found: {:x}", self.unpaired_surrogate)
    }
}

#[cfg(feature = "std")]
impl std::error::Error for DecodeUtf16Error {}

/// An error that can be returned when decoding UTF-32 code points.
///
/// This error occurs when a [`u32`] value is outside the 21-bit Unicode code point range
/// (>`U+10FFFF`) or is a UTF-16 surrogate value.
#[derive(Debug, Clone)]
pub struct DecodeUtf32Error {
    code: u32,
}

impl DecodeUtf32Error {
    pub(crate) fn new(code: u32) -> Self {
        Self { code }
    }

    /// Returns the invalid code point value which caused the error.
    #[must_use]
    pub fn invalid_code_point(&self) -> u32 {
        self.code
    }
}

impl core::fmt::Display for DecodeUtf32Error {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        write!(f, "invalid UTF-32 code point: {:x}", self.code)
    }
}

#[cfg(feature = "std")]
impl std::error::Error for DecodeUtf32Error {}

/// Errors which can occur when attempting to interpret a sequence of `u16` as UTF-16.
#[derive(Debug, Clone)]
pub struct Utf16Error {
    index: usize,
    source: DecodeUtf16Error,
    #[cfg(feature = "alloc")]
    inner: Option<Vec<u16>>,
}

impl Utf16Error {
    #[cfg(feature = "alloc")]
    pub(crate) fn new(inner: Vec<u16>, index: usize, source: DecodeUtf16Error) -> Self {
        Self {
            inner: Some(inner),
            index,
            source,
        }
    }

    #[cfg(feature = "alloc")]
    pub(crate) fn empty(index: usize, source: DecodeUtf16Error) -> Self {
        Self {
            index,
            source,
            inner: None,
        }
    }

    #[cfg(not(feature = "alloc"))]
    pub(crate) fn empty(index: usize, source: DecodeUtf16Error) -> Self {
        Self { index, source }
    }

    /// Returns the index in the given string at which the invalid UTF-16 value occurred.
    #[must_use]
    pub fn index(&self) -> usize {
        self.index
    }

    /// Consumes this error, returning the underlying vector of values which generated the error in
    /// the first place.
    ///
    /// If the sequence that generated the error was a reference to a slice instead of a [`Vec`],
    /// this will return [`None`].
    #[inline]
    #[cfg(feature = "alloc")]
    #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
    #[must_use]
    pub fn into_vec(self) -> Option<Vec<u16>> {
        self.inner
    }
}

impl core::fmt::Display for Utf16Error {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        write!(
            f,
            "unpaired UTF-16 surrogate {:x} at index {}",
            self.source.unpaired_surrogate(),
            self.index
        )
    }
}

#[cfg(feature = "std")]
impl std::error::Error for Utf16Error {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&self.source)
    }
}

/// Errors which can occur when attempting to interpret a sequence of `u32` as UTF-32.
#[derive(Debug, Clone)]
pub struct Utf32Error {
    index: usize,
    source: DecodeUtf32Error,
    #[cfg(feature = "alloc")]
    inner: Option<Vec<u32>>,
}

impl Utf32Error {
    #[cfg(feature = "alloc")]
    pub(crate) fn new(inner: Vec<u32>, index: usize, source: DecodeUtf32Error) -> Self {
        Self {
            inner: Some(inner),
            index,
            source,
        }
    }

    #[cfg(feature = "alloc")]
    pub(crate) fn empty(index: usize, source: DecodeUtf32Error) -> Self {
        Self {
            index,
            source,
            inner: None,
        }
    }

    #[cfg(not(feature = "alloc"))]
    pub(crate) fn empty(index: usize, source: DecodeUtf32Error) -> Self {
        Self { index, source }
    }

    /// Returns the index in the given string at which the invalid UTF-32 value occurred.
    #[must_use]
    pub fn index(&self) -> usize {
        self.index
    }

    /// Consumes this error, returning the underlying vector of values which generated the error in
    /// the first place.
    ///
    /// If the sequence that generated the error was a reference to a slice instead of a [`Vec`],
    /// this will return [`None`].
    #[inline]
    #[cfg(feature = "alloc")]
    #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
    #[must_use]
    pub fn into_vec(self) -> Option<Vec<u32>> {
        self.inner
    }
}

impl core::fmt::Display for Utf32Error {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        write!(
            f,
            "invalid UTF-32 value {:x} at index {}",
            self.source.invalid_code_point(),
            self.index
        )
    }
}

#[cfg(feature = "std")]
impl std::error::Error for Utf32Error {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&self.source)
    }
}
widestring-1.1.0/src/example.txt000064400000000000000000000000241046102023000147610ustar 00000000000000
My string
widestring-1.1.0/src/iter.rs000064400000000000000000000264001046102023000141040ustar 00000000000000
//! Iterators for encoding and decoding slices of string data.

use crate::{
    decode_utf16_surrogate_pair,
    error::{DecodeUtf16Error, DecodeUtf32Error},
    is_utf16_high_surrogate, is_utf16_low_surrogate, is_utf16_surrogate,
};
#[allow(unused_imports)]
use core::{
    char,
    iter::{DoubleEndedIterator, ExactSizeIterator, FusedIterator},
};

/// An iterator that decodes UTF-16 encoded code points from an iterator of [`u16`]s.
///
/// This struct is created by [`decode_utf16`][crate::decode_utf16]. See its documentation for more.
///
/// This struct is identical to [`char::DecodeUtf16`] except it is a [`DoubleEndedIterator`] if
/// `I` is.
#[derive(Debug, Clone)]
pub struct DecodeUtf16<I>
where
    I: Iterator<Item = u16>,
{
    iter: I,
    forward_buf: Option<u16>,
    back_buf: Option<u16>,
}

impl<I> DecodeUtf16<I>
where
    I: Iterator<Item = u16>,
{
    pub(crate) fn new(iter: I) -> Self {
        Self {
            iter,
            forward_buf: None,
            back_buf: None,
        }
    }
}

impl<I> Iterator for DecodeUtf16<I>
where
    I: Iterator<Item = u16>,
{
    type Item = Result<char, DecodeUtf16Error>;

    fn next(&mut self) -> Option<Self::Item> {
        // Copied from char::DecodeUtf16
        let u = match self.forward_buf.take() {
            Some(buf) => buf,
            None => self.iter.next().or_else(|| self.back_buf.take())?,
        };

        if !is_utf16_surrogate(u) {
            // SAFETY: not a surrogate
            Some(Ok(unsafe { char::from_u32_unchecked(u as u32) }))
        } else if is_utf16_low_surrogate(u) {
            // a trailing surrogate
            Some(Err(DecodeUtf16Error::new(u)))
        } else {
            let u2 = match self.iter.next().or_else(|| self.back_buf.take()) {
                Some(u2) => u2,
                // eof
                None => return Some(Err(DecodeUtf16Error::new(u))),
            };
            if !is_utf16_low_surrogate(u2) {
                // not a trailing surrogate so we're not a valid
                // surrogate pair, so rewind to redecode u2 next time.
                self.forward_buf = Some(u2);
                return Some(Err(DecodeUtf16Error::new(u)));
            }

            // all ok, so lets decode it.
// SAFETY: verified the surrogate pair unsafe { Some(Ok(decode_utf16_surrogate_pair(u, u2))) } } } #[inline] fn size_hint(&self) -> (usize, Option) { let (low, high) = self.iter.size_hint(); // we could be entirely valid surrogates (2 elements per // char), or entirely non-surrogates (1 element per char) (low / 2, high) } } impl DoubleEndedIterator for DecodeUtf16 where I: Iterator + DoubleEndedIterator, { fn next_back(&mut self) -> Option { let u2 = match self.back_buf.take() { Some(buf) => buf, None => self.iter.next_back().or_else(|| self.forward_buf.take())?, }; if !is_utf16_surrogate(u2) { // SAFETY: not a surrogate Some(Ok(unsafe { char::from_u32_unchecked(u2 as u32) })) } else if is_utf16_high_surrogate(u2) { // a leading surrogate Some(Err(DecodeUtf16Error::new(u2))) } else { let u = match self.iter.next_back().or_else(|| self.forward_buf.take()) { Some(u) => u, // eof None => return Some(Err(DecodeUtf16Error::new(u2))), }; if !is_utf16_high_surrogate(u) { // not a leading surrogate so we're not a valid // surrogate pair, so rewind to redecode u next time. self.back_buf = Some(u); return Some(Err(DecodeUtf16Error::new(u2))); } // all ok, so lets decode it. // SAFETY: verified the surrogate pair unsafe { Some(Ok(decode_utf16_surrogate_pair(u, u2))) } } } } impl FusedIterator for DecodeUtf16 where I: Iterator + FusedIterator {} /// An iterator that lossily decodes possibly ill-formed UTF-16 encoded code points from an iterator /// of [`u16`]s. /// /// Any unpaired UTF-16 surrogate values are replaced by /// [`U+FFFD REPLACEMENT_CHARACTER`][char::REPLACEMENT_CHARACTER] (๏ฟฝ). 
#[derive(Debug, Clone)] pub struct DecodeUtf16Lossy where I: Iterator, { pub(crate) iter: DecodeUtf16, } impl Iterator for DecodeUtf16Lossy where I: Iterator, { type Item = char; #[inline] fn next(&mut self) -> Option { self.iter .next() .map(|res| res.unwrap_or(char::REPLACEMENT_CHARACTER)) } #[inline] fn size_hint(&self) -> (usize, Option) { self.iter.size_hint() } } impl DoubleEndedIterator for DecodeUtf16Lossy where I: Iterator + DoubleEndedIterator, { #[inline] fn next_back(&mut self) -> Option { self.iter .next_back() .map(|res| res.unwrap_or(char::REPLACEMENT_CHARACTER)) } } impl FusedIterator for DecodeUtf16Lossy where I: Iterator + FusedIterator {} /// An iterator that decodes UTF-32 encoded code points from an iterator of `u32`s. #[derive(Debug, Clone)] pub struct DecodeUtf32 where I: Iterator, { pub(crate) iter: I, } impl Iterator for DecodeUtf32 where I: Iterator, { type Item = Result; #[inline] fn next(&mut self) -> Option { self.iter .next() .map(|u| char::from_u32(u).ok_or_else(|| DecodeUtf32Error::new(u))) } #[inline] fn size_hint(&self) -> (usize, Option) { self.iter.size_hint() } } impl DoubleEndedIterator for DecodeUtf32 where I: Iterator + DoubleEndedIterator, { #[inline] fn next_back(&mut self) -> Option { self.iter .next_back() .map(|u| char::from_u32(u).ok_or_else(|| DecodeUtf32Error::new(u))) } } impl FusedIterator for DecodeUtf32 where I: Iterator + FusedIterator {} impl ExactSizeIterator for DecodeUtf32 where I: Iterator + ExactSizeIterator, { #[inline] fn len(&self) -> usize { self.iter.len() } } /// An iterator that lossily decodes possibly ill-formed UTF-32 encoded code points from an iterator /// of `u32`s. /// /// Any invalid UTF-32 values are replaced by /// [`U+FFFD REPLACEMENT_CHARACTER`][core::char::REPLACEMENT_CHARACTER] (๏ฟฝ). 
#[derive(Debug, Clone)]
pub struct DecodeUtf32Lossy<I>
where
    I: Iterator<Item = u32>,
{
    pub(crate) iter: DecodeUtf32<I>,
}

impl<I> Iterator for DecodeUtf32Lossy<I>
where
    I: Iterator<Item = u32>,
{
    type Item = char;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        self.iter
            .next()
            .map(|res| res.unwrap_or(core::char::REPLACEMENT_CHARACTER))
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.iter.size_hint()
    }
}

impl<I> DoubleEndedIterator for DecodeUtf32Lossy<I>
where
    I: Iterator<Item = u32> + DoubleEndedIterator,
{
    #[inline]
    fn next_back(&mut self) -> Option<Self::Item> {
        self.iter
            .next_back()
            .map(|res| res.unwrap_or(core::char::REPLACEMENT_CHARACTER))
    }
}

impl<I> FusedIterator for DecodeUtf32Lossy<I> where I: Iterator<Item = u32> + FusedIterator {}

impl<I> ExactSizeIterator for DecodeUtf32Lossy<I>
where
    I: Iterator<Item = u32> + ExactSizeIterator,
{
    #[inline]
    fn len(&self) -> usize {
        self.iter.len()
    }
}

/// An iterator that encodes an iterator of [`char`][prim@char]s into UTF-8 bytes.
///
/// This struct is created by [`encode_utf8`][crate::encode_utf8]. See its documentation for more.
#[derive(Debug, Clone)]
pub struct EncodeUtf8<I>
where
    I: Iterator<Item = char>,
{
    iter: I,
    buf: [u8; 4],
    idx: u8,
    len: u8,
}

impl<I> EncodeUtf8<I>
where
    I: Iterator<Item = char>,
{
    pub(crate) fn new(iter: I) -> Self {
        Self {
            iter,
            buf: [0; 4],
            idx: 0,
            len: 0,
        }
    }
}

impl<I> Iterator for EncodeUtf8<I>
where
    I: Iterator<Item = char>,
{
    type Item = u8;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        if self.idx >= self.len {
            let c = self.iter.next()?;
            self.idx = 0;
            self.len = c.encode_utf8(&mut self.buf).len() as u8;
        }
        self.idx += 1;
        let idx = (self.idx - 1) as usize;
        Some(self.buf[idx])
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        let (lower, upper) = self.iter.size_hint();
        (lower, upper.and_then(|len| len.checked_mul(4))) // Max 4 UTF-8 bytes per char
    }
}

impl<I> FusedIterator for EncodeUtf8<I> where I: Iterator<Item = char> + FusedIterator {}

/// An iterator that encodes an iterator of [`char`][prim@char]s into UTF-16 [`u16`] code units.
///
/// This struct is created by [`encode_utf16`][crate::encode_utf16]. See its documentation for more.
#[derive(Debug, Clone)] pub struct EncodeUtf16 where I: Iterator, { iter: I, buf: Option, } impl EncodeUtf16 where I: Iterator, { pub(crate) fn new(iter: I) -> Self { Self { iter, buf: None } } } impl Iterator for EncodeUtf16 where I: Iterator, { type Item = u16; #[inline] fn next(&mut self) -> Option { self.buf.take().or_else(|| { let c = self.iter.next()?; let mut buf = [0; 2]; let buf = c.encode_utf16(&mut buf); if buf.len() > 1 { self.buf = Some(buf[1]); } Some(buf[0]) }) } #[inline] fn size_hint(&self) -> (usize, Option) { let (lower, upper) = self.iter.size_hint(); (lower, upper.and_then(|len| len.checked_mul(2))) // Max 2 UTF-16 code units per char } } impl FusedIterator for EncodeUtf16 where I: Iterator + FusedIterator {} /// An iterator that encodes an iterator of [`char`][prim@char]s into UTF-32 [`u32`] values. /// /// This struct is created by [`encode_utf32`][crate::encode_utf32]. See its documentation for more. #[derive(Debug, Clone)] pub struct EncodeUtf32 where I: Iterator, { iter: I, } impl EncodeUtf32 where I: Iterator, { pub(crate) fn new(iter: I) -> Self { Self { iter } } } impl Iterator for EncodeUtf32 where I: Iterator, { type Item = u32; #[inline] fn next(&mut self) -> Option { self.iter.next().map(|c| c as u32) } #[inline] fn size_hint(&self) -> (usize, Option) { self.iter.size_hint() } } impl FusedIterator for EncodeUtf32 where I: Iterator + FusedIterator {} impl ExactSizeIterator for EncodeUtf32 where I: Iterator + ExactSizeIterator, { #[inline] fn len(&self) -> usize { self.iter.len() } } impl DoubleEndedIterator for EncodeUtf32 where I: Iterator + DoubleEndedIterator, { #[inline] fn next_back(&mut self) -> Option { self.iter.next_back().map(|c| c as u32) } } widestring-1.1.0/src/lib.rs000064400000000000000000000651451046102023000137200ustar 00000000000000//! A wide string library for converting to and from wide string variants. //! //! This library provides multiple types of wide strings, each corresponding to a string types in //! 
the Rust standard library. [`Utf16String`] and [`Utf32String`] are analogous to the standard //! [`String`] type, providing a similar interface, and are always encoded as valid UTF-16 and //! UTF-32, respectively. They are the only type in this library that can losslessly and infallibly //! convert to and from [`String`], and are the easiest type to work with. They are not designed for //! working with FFI, but do support efficient conversions from the FFI types. //! //! [`U16String`] and [`U32String`], on the other hand, are similar to (but not the same as), //! [`OsString`], and are designed around working with FFI. Unlike the UTF variants, these strings //! do not have a defined encoding, and can work with any wide character strings, regardless of //! the encoding. They can be converted to and from [`OsString`] (but may require an encoding //! conversion depending on the platform), although that string type is an OS-specified //! encoding, so take special care. //! //! [`U16String`] and [`U32String`] also allow access and mutation that relies on the user //! to enforce any constraints on the data. Some methods do assume a UTF encoding, but do so in a //! way that handles malformed encoding data. For FFI, use [`U16String`] or [`U32String`] when you //! simply need to pass-through string data, or when you're not dealing with a nul-terminated data. //! //! Finally, [`U16CString`] and [`U32CString`] are wide version of the standard [`CString`] type. //! Like [`U16String`] and [`U32String`], they do not have defined encoding, but are designed to //! work with FFI, particularly C-style nul-terminated wide string data. These C-style strings are //! always terminated in a nul value, and are guaranteed to contain no interior nul values (unless //! unchecked methods are used). Again, these types may contain ill-formed encoding data, and //! methods handle it appropriately. Use [`U16CString`] or [`U32CString`] anytime you must properly //! 
handle nul values for when dealing with wide string C FFI. //! //! Like the standard Rust string types, each wide string type has its corresponding wide string //! slice type, as shown in the following table: //! //! | String Type | Slice Type | //! |-----------------|--------------| //! | [`Utf16String`] | [`Utf16Str`] | //! | [`Utf32String`] | [`Utf32Str`] | //! | [`U16String`] | [`U16Str`] | //! | [`U32String`] | [`U32Str`] | //! | [`U16CString`] | [`U16CStr`] | //! | [`U32CString`] | [`U32CStr`] | //! //! All the string types in this library can be converted between string types of the same bit //! width, as well as appropriate standard Rust types, but be lossy and/or require knowledge of the //! underlying encoding. The UTF strings additionally can be converted between the two sizes of //! string, re-encoding the strings. //! //! # Wide string literals //! //! Macros are provided for each wide string slice type that convert standard Rust [`str`] literals //! into UTF-16 or UTF-32 encoded versions of the slice type at *compile time*. //! //! ``` //! use widestring::u16str; //! let hello = u16str!("Hello, world!"); // `hello` will be a &U16Str value //! ``` //! //! These can be used anywhere a `const` function can be used, and provide a convenient method of //! specifying wide string literals instead of coding values by hand. The resulting string slices //! are always valid UTF encoding, and the [`u16cstr!`] and [`u32cstr!`] macros are automatically //! nul-terminated. //! //! # Cargo features //! //! This crate supports `no_std` when default cargo features are disabled. The `std` and `alloc` //! cargo features (enabled by default) enable the owned string types: [`U16String`], [`U32String`], //! [`U16CString`], [`U32CString`], [`Utf16String`], and [`Utf32String`] types and their modules. //! Other types such as the string slices do not require allocation and can be used in a `no_std` //! 
environment, even without the [`alloc`](https://doc.rust-lang.org/stable/alloc/index.html) //! crate. //! //! # Remarks on UTF-16 and UTF-32 //! //! UTF-16 encoding is a variable-length encoding. The 16-bit code units can specify Unicode code //! points either as single units or in _surrogate pairs_. Because every value might be part of a //! surrogate pair, many regular string operations on UTF-16 data, including indexing, writing, or //! even iterating, require considering either one or two values at a time. This library provides //! safe methods for these operations when the data is known to be UTF-16, such as with //! [`Utf16String`]. In those cases, keep in mind that the number of elements (`len()`) of the //! wide string is _not_ equivalent to the number of Unicode code points in the string, but is //! instead the number of code unit values. //! //! For [`U16String`] and [`U16CString`], which do not define an encoding, these same operations //! (indexing, mutating, iterating) do _not_ take into account UTF-16 encoding and may result in //! sequences that are ill-formed UTF-16. Some methods are provided that do make an exception to //! this and treat the strings as potentially malformed UTF-16; their documentation specifies how //! they handle the invalid data. //! //! UTF-32 simply encodes Unicode code points as-is in 32-bit Unicode Scalar Values, but only //! 21 bits are reserved for Unicode code points, and UTF-16 surrogates are invalid in //! UTF-32. Since UTF-32 is a fixed-width encoding, it is much easier to deal with, but equivalent //! methods to the 16-bit strings are provided for compatibility. //! //! All the 32-bit wide strings provide efficient methods to convert to and from sequences of //! [`char`] data, as the representation of UTF-32 strings is functionally equivalent to sequences //! of [`char`]s. Keep in mind that only [`Utf32String`] guarantees this equivalence, however, since //! 
the other strings may contain invalid values. //! //! # FFI with C/C++ `wchar_t` //! //! C/C++'s `wchar_t` (and C++'s corresponding `std::wstring`) varies in size depending on compiler //! and platform. Typically, `wchar_t` is 16 bits on Windows and 32 bits on most Unix-based //! platforms. For convenience when using `wchar_t`-based FFIs, type aliases for the corresponding //! string types are provided: [`WideString`] aliases [`U16String`] on Windows or [`U32String`] //! elsewhere, [`WideCString`] aliases [`U16CString`] or [`U32CString`], and [`WideUtfString`] //! aliases [`Utf16String`] or [`Utf32String`]. [`WideStr`], [`WideCStr`], and [`WideUtfStr`] are //! provided for the string slice types. The [`WideChar`] alias is also provided, aliasing [`u16`] //! or [`u32`] depending on platform. //! //! When not interacting with an FFI that uses `wchar_t`, it is recommended to use the string types //! directly rather than via the wide alias. //! //! # Nul values //! //! This crate uses the legacy ASCII term "nul" to refer to Unicode code point `U+0000 NULL` //! and its associated code unit representation as zero-value bytes. This is to disambiguate this //! zero value from null pointer values. C-style strings end in a nul value, while regular Rust //! strings allow interior nul values and are not terminated with nul. //! //! # Examples //! //! The following example uses [`U16String`] to get Windows error messages, since `FormatMessageW` //! returns a string length for us and we don't need to pass error messages into other FFI //! functions so we don't need to worry about nul values. //! //! ```rust //! # #[cfg(any(not(windows), not(feature = "alloc")))] //! # fn main() {} //! # extern crate winapi; //! # extern crate widestring; //! # #[cfg(all(windows, feature = "alloc"))] //! # fn main() { //! use winapi::um::winbase::{FormatMessageW, LocalFree, FORMAT_MESSAGE_FROM_SYSTEM, //! FORMAT_MESSAGE_ALLOCATE_BUFFER, FORMAT_MESSAGE_IGNORE_INSERTS}; //! 
use winapi::shared::ntdef::LPWSTR; //! use winapi::shared::minwindef::HLOCAL; //! use std::ptr; //! use widestring::U16String; //! # use winapi::shared::minwindef::DWORD; //! # let error_code: DWORD = 0; //! //! let s: U16String; //! unsafe { //! // First, get a string buffer from some Windows API such as FormatMessageW... //! let mut buffer: LPWSTR = ptr::null_mut(); //! let strlen = FormatMessageW(FORMAT_MESSAGE_FROM_SYSTEM | //! FORMAT_MESSAGE_ALLOCATE_BUFFER | //! FORMAT_MESSAGE_IGNORE_INSERTS, //! ptr::null(), //! error_code, // error code from GetLastError() //! 0, //! (&mut buffer as *mut LPWSTR) as LPWSTR, //! 0, //! ptr::null_mut()); //! //! // Get the buffer as a wide string //! s = U16String::from_ptr(buffer, strlen as usize); //! // Since U16String creates an owned copy, it's safe to free original buffer now //! // If you didn't want an owned copy, you could use &U16Str. //! LocalFree(buffer as HLOCAL); //! } //! // Convert to a regular Rust String and use it to your heart's desire! //! let message = s.to_string_lossy(); //! # assert_eq!(message, "The operation completed successfully.\r\n"); //! # } //! ``` //! //! The following example is functionally the same, only using [`U16CString`] instead. //! //! ```rust //! # #[cfg(any(not(windows), not(feature = "alloc")))] //! # fn main() {} //! # extern crate winapi; //! # extern crate widestring; //! # #[cfg(all(windows, feature = "alloc"))] //! # fn main() { //! use winapi::um::winbase::{FormatMessageW, LocalFree, FORMAT_MESSAGE_FROM_SYSTEM, //! FORMAT_MESSAGE_ALLOCATE_BUFFER, FORMAT_MESSAGE_IGNORE_INSERTS}; //! use winapi::shared::ntdef::LPWSTR; //! use winapi::shared::minwindef::HLOCAL; //! use std::ptr; //! use widestring::U16CString; //! # use winapi::shared::minwindef::DWORD; //! # let error_code: DWORD = 0; //! //! let s: U16CString; //! unsafe { //! // First, get a string buffer from some Windows API such as FormatMessageW... //! let mut buffer: LPWSTR = ptr::null_mut(); //! 
FormatMessageW(FORMAT_MESSAGE_FROM_SYSTEM | //! FORMAT_MESSAGE_ALLOCATE_BUFFER | //! FORMAT_MESSAGE_IGNORE_INSERTS, //! ptr::null(), //! error_code, // error code from GetLastError() //! 0, //! (&mut buffer as *mut LPWSTR) as LPWSTR, //! 0, //! ptr::null_mut()); //! //! // Get the buffer as a wide string //! s = U16CString::from_ptr_str(buffer); //! // Since U16CString creates an owned copy, it's safe to free original buffer now //! // If you didn't want an owned copy, you could use &U16CStr. //! LocalFree(buffer as HLOCAL); //! } //! // Convert to a regular Rust String and use it to your heart's desire! //! let message = s.to_string_lossy(); //! # assert_eq!(message, "The operation completed successfully.\r\n"); //! # } //! ``` //! //! [`OsString`]: std::ffi::OsString //! [`OsStr`]: std::ffi::OsStr //! [`CString`]: std::ffi::CString //! [`CStr`]: std::ffi::CStr #![warn( missing_docs, missing_debug_implementations, trivial_casts, trivial_numeric_casts, future_incompatible )] #![allow(renamed_and_removed_lints, stable_features)] // Until min version gets bumped #![cfg_attr(not(feature = "std"), no_std)] #![doc(html_root_url = "https://docs.rs/widestring/1.1.0")] #![doc(test(attr(deny(warnings), allow(unused))))] #![cfg_attr(docsrs, feature(doc_cfg))] #![cfg_attr( feature = "debugger_visualizer", feature(debugger_visualizer), debugger_visualizer(natvis_file = "../debug_metadata/widestring.natvis") )] #[cfg(feature = "alloc")] extern crate alloc; use crate::error::{DecodeUtf16Error, DecodeUtf32Error}; #[cfg(feature = "alloc")] #[allow(unused_imports)] use alloc::vec::Vec; use core::fmt::Write; pub mod error; pub mod iter; mod macros; #[cfg(feature = "std")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] mod platform; pub mod ucstr; #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] pub mod ucstring; pub mod ustr; #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] pub mod ustring; pub mod utfstr; #[cfg(feature = 
"alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] pub mod utfstring; #[doc(hidden)] pub use macros::internals; pub use ucstr::{U16CStr, U32CStr, WideCStr}; #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] pub use ucstring::{U16CString, U32CString, WideCString}; pub use ustr::{U16Str, U32Str, WideStr}; #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] pub use ustring::{U16String, U32String, WideString}; pub use utfstr::{Utf16Str, Utf32Str, WideUtfStr}; #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] pub use utfstring::{Utf16String, Utf32String, WideUtfString}; #[cfg(not(windows))] /// Alias for [`u16`] or [`u32`] depending on platform. Intended to match typical C `wchar_t` size /// on platform. pub type WideChar = u32; #[cfg(windows)] /// Alias for [`u16`] or [`u32`] depending on platform. Intended to match typical C `wchar_t` size /// on platform. pub type WideChar = u16; /// Creates an iterator over the UTF-16 encoded code points in `iter`, returning unpaired surrogates /// as `Err`s. 
/// /// # Examples /// /// Basic usage: /// /// ``` /// use std::char::decode_utf16; /// /// // ๐„žmusic /// let v = [ /// 0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834, /// ]; /// /// assert_eq!( /// decode_utf16(v.iter().cloned()) /// .map(|r| r.map_err(|e| e.unpaired_surrogate())) /// .collect::>(), /// vec![ /// Ok('๐„ž'), /// Ok('m'), Ok('u'), Ok('s'), /// Err(0xDD1E), /// Ok('i'), Ok('c'), /// Err(0xD834) /// ] /// ); /// ``` /// /// A lossy decoder can be obtained by replacing Err results with the replacement character: /// /// ``` /// use std::char::{decode_utf16, REPLACEMENT_CHARACTER}; /// /// // ๐„žmusic /// let v = [ /// 0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834, /// ]; /// /// assert_eq!( /// decode_utf16(v.iter().cloned()) /// .map(|r| r.unwrap_or(REPLACEMENT_CHARACTER)) /// .collect::(), /// "๐„žmus๏ฟฝic๏ฟฝ" /// ); /// ``` #[must_use] pub fn decode_utf16>(iter: I) -> iter::DecodeUtf16 { iter::DecodeUtf16::new(iter.into_iter()) } /// Creates a lossy decoder iterator over the possibly ill-formed UTF-16 encoded code points in /// `iter`. /// /// This is equivalent to [`char::decode_utf16`][core::char::decode_utf16] except that any unpaired /// UTF-16 surrogate values are replaced by /// [`U+FFFD REPLACEMENT_CHARACTER`][core::char::REPLACEMENT_CHARACTER] (๏ฟฝ) instead of returning /// errors. /// /// # Examples /// /// ``` /// use widestring::decode_utf16_lossy; /// /// // ๐„žmusic /// let v = [ /// 0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834, /// ]; /// /// assert_eq!( /// decode_utf16_lossy(v.iter().copied()).collect::(), /// "๐„žmus๏ฟฝic๏ฟฝ" /// ); /// ``` #[inline] #[must_use] pub fn decode_utf16_lossy>( iter: I, ) -> iter::DecodeUtf16Lossy { iter::DecodeUtf16Lossy { iter: decode_utf16(iter), } } /// Creates a decoder iterator over UTF-32 encoded code points in `iter`, returning invalid values /// as `Err`s. 
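///
/// As a rough illustration of which values count as invalid, the sketch below uses only the
/// standard library. It assumes that "invalid" means "not a Unicode scalar value" (a UTF-16
/// surrogate, or a value above `U+10FFFF`), which matches the errors shown in the example for
/// this function; it is not necessarily how this crate performs the check internally.
///
/// ```
/// // A code point outside the Basic Multilingual Plane is still a valid scalar value.
/// assert_eq!(char::from_u32(0x1D11E), Some('\u{1D11E}'));
/// // UTF-16 surrogate values are not valid on their own.
/// assert_eq!(char::from_u32(0xDD1E), None);
/// // Values above U+10FFFF are outside the Unicode code space.
/// assert_eq!(char::from_u32(0x23FD5A), None);
/// ```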
/// /// # Examples /// /// ``` /// use widestring::decode_utf32; /// /// // ๐„žmusic /// let v = [ /// 0x1D11E, 0x6d, 0x75, 0x73, 0xDD1E, 0x69, 0x63, 0x23FD5A, /// ]; /// /// assert_eq!( /// decode_utf32(v.iter().copied()) /// .map(|r| r.map_err(|e| e.invalid_code_point())) /// .collect::>(), /// vec![ /// Ok('๐„ž'), /// Ok('m'), Ok('u'), Ok('s'), /// Err(0xDD1E), /// Ok('i'), Ok('c'), /// Err(0x23FD5A) /// ] /// ); /// ``` #[inline] #[must_use] pub fn decode_utf32>(iter: I) -> iter::DecodeUtf32 { iter::DecodeUtf32 { iter: iter.into_iter(), } } /// Creates a lossy decoder iterator over the possibly ill-formed UTF-32 encoded code points in /// `iter`. /// /// This is equivalent to [`decode_utf32`] except that any invalid UTF-32 values are replaced by /// [`U+FFFD REPLACEMENT_CHARACTER`][core::char::REPLACEMENT_CHARACTER] (๏ฟฝ) instead of returning /// errors. /// /// # Examples /// /// ``` /// use widestring::decode_utf32_lossy; /// /// // ๐„žmusic /// let v = [ /// 0x1D11E, 0x6d, 0x75, 0x73, 0xDD1E, 0x69, 0x63, 0x23FD5A, /// ]; /// /// assert_eq!( /// decode_utf32_lossy(v.iter().copied()).collect::(), /// "๐„žmus๏ฟฝic๏ฟฝ" /// ); /// ``` #[inline] #[must_use] pub fn decode_utf32_lossy>( iter: I, ) -> iter::DecodeUtf32Lossy { iter::DecodeUtf32Lossy { iter: decode_utf32(iter), } } /// Creates an iterator that encodes an iterator over [`char`]s into UTF-8 bytes. /// /// # Examples /// /// ``` /// use widestring::encode_utf8; /// /// let music = "๐„žmusic"; /// /// let encoded: Vec = encode_utf8(music.chars()).collect(); /// /// assert_eq!(encoded, music.as_bytes()); /// ``` #[must_use] pub fn encode_utf8>(iter: I) -> iter::EncodeUtf8 { iter::EncodeUtf8::new(iter.into_iter()) } /// Creates an iterator that encodes an iterator over [`char`]s into UTF-16 [`u16`] code units. 
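///
/// For readers unfamiliar with the encoding, the per-character step can be sketched with the
/// standard library alone. `encode_one` below is a hypothetical helper written only for this
/// illustration (it is not part of this crate); it follows the usual UTF-16 rule that code
/// points above `U+FFFF` are split into a high and a low surrogate.
///
/// ```
/// // Hypothetical helper for illustration only.
/// fn encode_one(ch: char) -> (u16, Option<u16>) {
///     let code = ch as u32;
///     if code <= 0xFFFF {
///         // Fits in a single 16-bit code unit.
///         (code as u16, None)
///     } else {
///         // Split the remaining 20 bits into two 10-bit halves.
///         let code = code - 0x1_0000;
///         let high = 0xD800 | (code >> 10) as u16;
///         let low = 0xDC00 | (code & 0x3FF) as u16;
///         (high, Some(low))
///     }
/// }
///
/// assert_eq!(encode_one('m'), (0x006d, None));
/// assert_eq!(encode_one('\u{1D11E}'), (0xD834, Some(0xDD1E)));
/// ```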
///
/// # Examples
///
/// ```
/// use widestring::encode_utf16;
///
/// let encoded: Vec<u16> = encode_utf16("𝄞music".chars()).collect();
///
/// let v = [
///     0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0x0069, 0x0063,
/// ];
///
/// assert_eq!(encoded, v);
/// ```
#[must_use]
pub fn encode_utf16<I: IntoIterator<Item = char>>(iter: I) -> iter::EncodeUtf16<I::IntoIter> {
    iter::EncodeUtf16::new(iter.into_iter())
}

/// Creates an iterator that encodes an iterator over [`char`]s into UTF-32 [`u32`] values.
///
/// This iterator is a simple type cast from [`char`] to [`u32`], as any sequence of [`char`]s is
/// valid UTF-32.
///
/// # Examples
///
/// ```
/// use widestring::encode_utf32;
///
/// let encoded: Vec<u32> = encode_utf32("𝄞music".chars()).collect();
///
/// let v = [
///     0x1D11E, 0x006d, 0x0075, 0x0073, 0x0069, 0x0063,
/// ];
///
/// assert_eq!(encoded, v);
/// ```
#[must_use]
pub fn encode_utf32<I: IntoIterator<Item = char>>(iter: I) -> iter::EncodeUtf32<I::IntoIter> {
    iter::EncodeUtf32::new(iter.into_iter())
}

/// Debug implementation for any U16 string slice.
///
/// Properly encoded input data will output valid strings with escape sequences, however invalid
/// encoding will purposefully output any unpaired surrogate as \<XXXX> which is not a valid escape
/// sequence. This is intentional, as debug output is not meant to be parsed but read by humans.
fn debug_fmt_u16(s: &[u16], fmt: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
    debug_fmt_utf16_iter(decode_utf16(s.iter().copied()), fmt)
}

/// Debug implementation for any U16 string iterator.
///
/// Properly encoded input data will output valid strings with escape sequences, however invalid
/// encoding will purposefully output any unpaired surrogate as \<XXXX> which is not a valid escape
/// sequence. This is intentional, as debug output is not meant to be parsed but read by humans.
fn debug_fmt_utf16_iter(
    iter: impl Iterator<Item = Result<char, DecodeUtf16Error>>,
    fmt: &mut core::fmt::Formatter<'_>,
) -> core::fmt::Result {
    fmt.write_char('"')?;
    for res in iter {
        match res {
            Ok(ch) => {
                for c in ch.escape_debug() {
                    fmt.write_char(c)?;
                }
            }
            Err(e) => {
                write!(fmt, "\\<{:X}>", e.unpaired_surrogate())?;
            }
        }
    }
    fmt.write_char('"')
}

/// Debug implementation for any U32 string slice.
///
/// Properly encoded input data will output valid strings with escape sequences, however invalid
/// encoding will purposefully output any invalid code point as \<XXXX> which is not a valid escape
/// sequence. This is intentional, as debug output is not meant to be parsed but read by humans.
fn debug_fmt_u32(s: &[u32], fmt: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
    debug_fmt_utf32_iter(decode_utf32(s.iter().copied()), fmt)
}

/// Debug implementation for any U32 string iterator.
///
/// Properly encoded input data will output valid strings with escape sequences, however invalid
/// encoding will purposefully output any invalid code point as \<XXXX> which is not a valid escape
/// sequence. This is intentional, as debug output is not meant to be parsed but read by humans.
fn debug_fmt_utf32_iter(
    iter: impl Iterator<Item = Result<char, DecodeUtf32Error>>,
    fmt: &mut core::fmt::Formatter<'_>,
) -> core::fmt::Result {
    fmt.write_char('"')?;
    for res in iter {
        match res {
            Ok(ch) => {
                for c in ch.escape_debug() {
                    fmt.write_char(c)?;
                }
            }
            Err(e) => {
                write!(fmt, "\\<{:X}>", e.invalid_code_point())?;
            }
        }
    }
    fmt.write_char('"')
}

/// Debug implementation for any `char` iterator.
fn debug_fmt_char_iter(
    iter: impl Iterator<Item = char>,
    fmt: &mut core::fmt::Formatter<'_>,
) -> core::fmt::Result {
    fmt.write_char('"')?;
    iter.flat_map(|c| c.escape_debug())
        .try_for_each(|c| fmt.write_char(c))?;
    fmt.write_char('"')
}

/// Returns whether the code unit is a UTF-16 surrogate value.
#[inline(always)]
#[allow(dead_code)]
const fn is_utf16_surrogate(u: u16) -> bool {
    u >= 0xD800 && u <= 0xDFFF
}

/// Returns whether the code unit is a UTF-16 high surrogate value.
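///
/// As a standalone illustration of how the surrogate ranges fit together (standard library only;
/// `combine_surrogates` is a hypothetical name used just for this sketch and is not part of this
/// crate), a high surrogate must be followed by a low surrogate, and the two combine into one
/// code point above `U+FFFF`:
///
/// ```
/// // Hypothetical helper for illustration only.
/// fn combine_surrogates(high: u16, low: u16) -> Option<char> {
///     let is_high = (0xD800..=0xDBFF).contains(&high);
///     let is_low = (0xDC00..=0xDFFF).contains(&low);
///     if !is_high || !is_low {
///         return None; // not a well-formed surrogate pair
///     }
///     // Each surrogate carries 10 bits of the code point, offset by 0x1_0000.
///     let c = ((((high - 0xD800) as u32) << 10) | (low - 0xDC00) as u32) + 0x1_0000;
///     char::from_u32(c)
/// }
///
/// assert_eq!(combine_surrogates(0xD834, 0xDD1E), Some('\u{1D11E}'));
/// // Swapping the order puts a low surrogate first, which is ill-formed.
/// assert_eq!(combine_surrogates(0xDD1E, 0xD834), None);
/// ```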
#[inline(always)]
#[allow(dead_code)]
const fn is_utf16_high_surrogate(u: u16) -> bool {
    u >= 0xD800 && u <= 0xDBFF
}

/// Returns whether the code unit is a UTF-16 low surrogate value.
#[inline(always)]
const fn is_utf16_low_surrogate(u: u16) -> bool {
    u >= 0xDC00 && u <= 0xDFFF
}

/// Convert a UTF-16 surrogate pair to a `char`. Does not validate if the surrogates are valid.
#[inline(always)]
unsafe fn decode_utf16_surrogate_pair(high: u16, low: u16) -> char {
    let c: u32 = (((high - 0xD800) as u32) << 10 | ((low) - 0xDC00) as u32) + 0x1_0000;
    // SAFETY: the caller must guarantee that `high` and `low` form a valid surrogate pair, in
    // which case `c` is a legal Unicode scalar value
    core::char::from_u32_unchecked(c)
}

/// Validates whether a slice of 16-bit values is valid UTF-16, returning an error if it is not.
#[inline(always)]
fn validate_utf16(s: &[u16]) -> Result<(), crate::error::Utf16Error> {
    for (index, result) in crate::decode_utf16(s.iter().copied()).enumerate() {
        if let Err(e) = result {
            return Err(crate::error::Utf16Error::empty(index, e));
        }
    }
    Ok(())
}

/// Validates whether a vector of 16-bit values is valid UTF-16, returning an error if it is not.
#[inline(always)]
#[cfg(feature = "alloc")]
fn validate_utf16_vec(v: Vec<u16>) -> Result<Vec<u16>, crate::error::Utf16Error> {
    for (index, result) in crate::decode_utf16(v.iter().copied()).enumerate() {
        if let Err(e) = result {
            return Err(crate::error::Utf16Error::new(v, index, e));
        }
    }
    Ok(v)
}

/// Validates whether a slice of 32-bit values is valid UTF-32, returning an error if it is not.
#[inline(always)]
fn validate_utf32(s: &[u32]) -> Result<(), crate::error::Utf32Error> {
    for (index, result) in crate::decode_utf32(s.iter().copied()).enumerate() {
        if let Err(e) = result {
            return Err(crate::error::Utf32Error::empty(index, e));
        }
    }
    Ok(())
}

/// Validates whether a vector of 32-bit values is valid UTF-32, returning an error if it is not.
#[inline(always)]
#[cfg(feature = "alloc")]
fn validate_utf32_vec(v: Vec<u32>) -> Result<Vec<u32>, crate::error::Utf32Error> {
    for (index, result) in crate::decode_utf32(v.iter().copied()).enumerate() {
        if let Err(e) = result {
            return Err(crate::error::Utf32Error::new(v, index, e));
        }
    }
    Ok(v)
}

/// Copy of unstable core::slice::range to soundly handle ranges
/// TODO: Replace with core::slice::range when it is stabilized
#[track_caller]
#[allow(dead_code, clippy::redundant_closure)]
fn range<R>(range: R, bounds: core::ops::RangeTo<usize>) -> core::ops::Range<usize>
where
    R: core::ops::RangeBounds<usize>,
{
    #[inline(never)]
    #[cold]
    #[track_caller]
    fn slice_end_index_len_fail(index: usize, len: usize) -> ! {
        panic!(
            "range end index {} out of range for slice of length {}",
            index, len
        );
    }

    #[inline(never)]
    #[cold]
    #[track_caller]
    fn slice_index_order_fail(index: usize, end: usize) -> ! {
        panic!("slice index starts at {} but ends at {}", index, end);
    }

    #[inline(never)]
    #[cold]
    #[track_caller]
    fn slice_start_index_overflow_fail() -> ! {
        panic!("attempted to index slice from after maximum usize");
    }

    #[inline(never)]
    #[cold]
    #[track_caller]
    fn slice_end_index_overflow_fail() -> ! {
        panic!("attempted to index slice up to maximum usize");
    }

    use core::ops::Bound::*;

    let len = bounds.end;

    let start = range.start_bound();
    let start = match start {
        Included(&start) => start,
        Excluded(start) => start
            .checked_add(1)
            .unwrap_or_else(|| slice_start_index_overflow_fail()),
        Unbounded => 0,
    };

    let end = range.end_bound();
    let end = match end {
        Included(end) => end
            .checked_add(1)
            .unwrap_or_else(|| slice_end_index_overflow_fail()),
        Excluded(&end) => end,
        Unbounded => len,
    };

    if start > end {
        slice_index_order_fail(start, end);
    }
    if end > len {
        slice_end_index_len_fail(end, len);
    }

    core::ops::Range { start, end }
}

/// Similar to core::slice::range, but returns [`None`] instead of panicking.
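///
/// The intended contract is comparable to the difference between indexing a slice and calling
/// `slice::get` in the standard library. The sketch below uses only `slice::get` (not this
/// function) to illustrate, under that assumption, the three situations that yield `None`: a
/// start past the end, an end past the length, and an inclusive end that would overflow.
///
/// ```
/// let v = [0u16; 5];
/// let (start, end) = (3, 1);
///
/// // A range that lies within the slice succeeds.
/// assert_eq!(v.get(1..3).map(|s| s.len()), Some(2));
/// // Start after end.
/// assert!(v.get(start..end).is_none());
/// // End beyond the length of the slice.
/// assert!(v.get(2..9).is_none());
/// // Inclusive end of `usize::MAX` cannot be converted to an exclusive bound.
/// assert!(v.get(4..=usize::MAX).is_none());
/// ```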
fn range_check(range: R, bounds: core::ops::RangeTo) -> Option> where R: core::ops::RangeBounds, { use core::ops::Bound::*; let len = bounds.end; let start = range.start_bound(); let start = match start { Included(&start) => start, Excluded(start) => start.checked_add(1)?, Unbounded => 0, }; let end = range.end_bound(); let end = match end { Included(end) => end.checked_add(1)?, Excluded(&end) => end, Unbounded => len, }; if start > end || end > len { return None; } Some(core::ops::Range { start, end }) } widestring-1.1.0/src/macros.rs000064400000000000000000000456311046102023000144340ustar 00000000000000macro_rules! implement_utf16_macro { ($(#[$m:meta])* $name:ident $extra_len:literal $str:ident $fn:ident) => { $(#[$m])* #[macro_export] macro_rules! $name { ($text:expr) => {{ const _WIDESTRING_U16_MACRO_UTF8: &$crate::internals::core::primitive::str = $text; const _WIDESTRING_U16_MACRO_LEN: $crate::internals::core::primitive::usize = $crate::internals::length_as_utf16(_WIDESTRING_U16_MACRO_UTF8) + $extra_len; const _WIDESTRING_U16_MACRO_UTF16: [$crate::internals::core::primitive::u16; _WIDESTRING_U16_MACRO_LEN] = { let mut _widestring_buffer: [$crate::internals::core::primitive::u16; _WIDESTRING_U16_MACRO_LEN] = [0; _WIDESTRING_U16_MACRO_LEN]; let mut _widestring_bytes = _WIDESTRING_U16_MACRO_UTF8.as_bytes(); let mut _widestring_i = 0; while let $crate::internals::core::option::Option::Some((_widestring_ch, _widestring_rest)) = $crate::internals::next_code_point(_widestring_bytes) { _widestring_bytes = _widestring_rest; if $extra_len > 0 && _widestring_ch == 0 { panic!("invalid NUL value found in string literal"); } // https://doc.rust-lang.org/std/primitive.char.html#method.encode_utf16 if _widestring_ch & 0xFFFF == _widestring_ch { _widestring_buffer[_widestring_i] = _widestring_ch as $crate::internals::core::primitive::u16; _widestring_i += 1; } else { let _widestring_code = _widestring_ch - 0x1_0000; _widestring_buffer[_widestring_i] = 0xD800 | 
((_widestring_code >> 10) as $crate::internals::core::primitive::u16); _widestring_buffer[_widestring_i + 1] = 0xDC00 | ((_widestring_code as $crate::internals::core::primitive::u16) & 0x3FF); _widestring_i += 2; } } _widestring_buffer }; #[allow(unused_unsafe)] unsafe { $crate::$str::$fn(&_WIDESTRING_U16_MACRO_UTF16) } }}; } } } implement_utf16_macro! { /// Converts a string literal into a `const` UTF-16 string slice of type /// [`Utf16Str`][crate::Utf16Str]. /// /// # Examples /// /// ``` /// # #[cfg(feature = "alloc")] { /// use widestring::{utf16str, Utf16Str, Utf16String}; /// /// const STRING: &Utf16Str = utf16str!("My string"); /// assert_eq!(Utf16String::from_str("My string"), STRING); /// # } /// ``` utf16str 0 Utf16Str from_slice_unchecked } implement_utf16_macro! { /// Converts a string literal into a `const` UTF-16 string slice of type /// [`U16Str`][crate::U16Str]. /// /// The resulting `const` string slice will always be valid UTF-16. /// /// # Examples /// /// ``` /// # #[cfg(feature = "alloc")] { /// use widestring::{u16str, U16Str, U16String}; /// /// const STRING: &U16Str = u16str!("My string"); /// assert_eq!(U16String::from_str("My string"), STRING); /// # } /// ``` u16str 0 U16Str from_slice } implement_utf16_macro! { /// Converts a string literal into a `const` UTF-16 string slice of type /// [`U16CStr`][crate::U16CStr]. /// /// The resulting `const` string slice will always be valid UTF-16 and include a nul terminator. /// /// # Examples /// /// ``` /// # #[cfg(feature = "alloc")] { /// use widestring::{u16cstr, U16CStr, U16CString}; /// /// const STRING: &U16CStr = u16cstr!("My string"); /// assert_eq!(U16CString::from_str("My string").unwrap(), STRING); /// # } /// ``` u16cstr 1 U16CStr from_slice_unchecked } macro_rules! implement_utf32_macro { ($(#[$m:meta])* $name:ident $extra_len:literal $str:ident $fn:ident) => { $(#[$m])* #[macro_export] macro_rules! 
$name { ($text:expr) => {{ const _WIDESTRING_U32_MACRO_UTF8: &$crate::internals::core::primitive::str = $text; const _WIDESTRING_U32_MACRO_LEN: $crate::internals::core::primitive::usize = $crate::internals::length_as_utf32(_WIDESTRING_U32_MACRO_UTF8) + $extra_len; const _WIDESTRING_U32_MACRO_UTF32: [$crate::internals::core::primitive::u32; _WIDESTRING_U32_MACRO_LEN] = { let mut _widestring_buffer: [$crate::internals::core::primitive::u32; _WIDESTRING_U32_MACRO_LEN] = [0; _WIDESTRING_U32_MACRO_LEN]; let mut _widestring_bytes = _WIDESTRING_U32_MACRO_UTF8.as_bytes(); let mut _widestring_i = 0; while let $crate::internals::core::option::Option::Some((_widestring_ch, _widestring_rest)) = $crate::internals::next_code_point(_widestring_bytes) { if $extra_len > 0 && _widestring_ch == 0 { panic!("invalid NUL value found in string literal"); } _widestring_bytes = _widestring_rest; _widestring_buffer[_widestring_i] = _widestring_ch; _widestring_i += 1; } _widestring_buffer }; #[allow(unused_unsafe)] unsafe { $crate::$str::$fn(&_WIDESTRING_U32_MACRO_UTF32) } }}; } } } implement_utf32_macro! { /// Converts a string literal into a `const` UTF-32 string slice of type /// [`Utf32Str`][crate::Utf32Str]. /// /// # Examples /// /// ``` /// # #[cfg(feature = "alloc")] { /// use widestring::{utf32str, Utf32Str, Utf32String}; /// /// const STRING: &Utf32Str = utf32str!("My string"); /// assert_eq!(Utf32String::from_str("My string"), STRING); /// # } /// ``` utf32str 0 Utf32Str from_slice_unchecked } implement_utf32_macro! { /// Converts a string literal into a `const` UTF-32 string slice of type /// [`U32Str`][crate::U32Str]. /// /// The resulting `const` string slice will always be valid UTF-32. /// /// # Examples /// /// ``` /// # #[cfg(feature = "alloc")] { /// use widestring::{u32str, U32Str, U32String}; /// /// const STRING: &U32Str = u32str!("My string"); /// assert_eq!(U32String::from_str("My string"), STRING); /// # } /// ``` u32str 0 U32Str from_slice } implement_utf32_macro! 
{ /// Converts a string literal into a `const` UTF-32 string slice of type /// [`U32CStr`][crate::U32CStr]. /// /// The resulting `const` string slice will always be valid UTF-32 and include a nul terminator. /// /// # Examples /// /// ``` /// # #[cfg(feature = "alloc")] { /// use widestring::{u32cstr, U32CStr, U32CString}; /// /// const STRING: &U32CStr = u32cstr!("My string"); /// assert_eq!(U32CString::from_str("My string").unwrap(), STRING); /// # } /// ``` u32cstr 1 U32CStr from_slice_unchecked } /// Alias for [`u16str`] or [`u32str`] macros depending on platform. Intended to be used when using /// [`WideStr`][crate::WideStr] type alias. #[cfg(not(windows))] #[macro_export] macro_rules! widestr { ($text:expr) => {{ use $crate::*; u32str!($text) }}; } /// Alias for [`utf16str`] or [`utf32str`] macros depending on platform. Intended to be used when /// using [`WideUtfStr`][crate::WideUtfStr] type alias. #[cfg(not(windows))] #[macro_export] macro_rules! wideutfstr { ($text:expr) => {{ use $crate::*; utf32str!($text) }}; } /// Alias for [`u16cstr`] or [`u32cstr`] macros depending on platform. Intended to be used when /// using [`WideCStr`][crate::WideCStr] type alias. #[cfg(not(windows))] #[macro_export] macro_rules! widecstr { ($text:expr) => {{ use $crate::*; u32cstr!($text) }}; } /// Alias for [`u16str`] or [`u32str`] macros depending on platform. Intended to be used when using /// [`WideStr`][crate::WideStr] type alias. #[cfg(windows)] #[macro_export] macro_rules! widestr { ($text:expr) => {{ use $crate::*; u16str!($text) }}; } /// Alias for [`utf16str`] or [`utf32str`] macros depending on platform. Intended to be used when /// using [`WideUtfStr`][crate::WideUtfStr] type alias. #[cfg(windows)] #[macro_export] macro_rules! wideutfstr { ($text:expr) => {{ use $crate::*; utf16str!($text) }}; } /// Alias for [`u16cstr`] or [`u32cstr`] macros depending on platform. Intended to be used when /// using [`WideCStr`][crate::WideCStr] type alias. 
#[cfg(windows)] #[macro_export] macro_rules! widecstr { ($text:expr) => {{ use $crate::*; u16cstr!($text) }}; } /// Includes a UTF-16 encoded file as a [`Utf16Str`][crate::Utf16Str]. /// /// This uses [`include_bytes`](core::include_bytes) to accomplish this. /// /// # Examples /// /// ``` /// # #[cfg(feature = "alloc")] { /// use widestring::{include_utf16str, Utf16Str, Utf16String}; /// /// const STRING: &Utf16Str = include_utf16str!("example.txt"); /// assert_eq!(Utf16String::from_str("My string"), STRING); /// # } /// ``` #[macro_export] macro_rules! include_utf16str { ($text:expr) => {{ const _WIDESTRING_U16_INCLUDE_MACRO_U8: &[$crate::internals::core::primitive::u8] = $crate::internals::core::include_bytes!($text); const _WIDESTRING_U16_INCLUDE_MACRO_LEN: $crate::internals::core::primitive::usize = { let _widestring_len = <[$crate::internals::core::primitive::u8]>::len(_WIDESTRING_U16_INCLUDE_MACRO_U8); if _widestring_len % $crate::internals::core::mem::size_of::() != 0 { panic!("file not encoded as UTF-16") } _widestring_len / 2 }; const _WIDESTRING_U16_INCLUDE_MACRO_UTF16: ( [$crate::internals::core::primitive::u16; _WIDESTRING_U16_INCLUDE_MACRO_LEN], bool, bool, ) = { let mut _widestring_buffer: [$crate::internals::core::primitive::u16; _WIDESTRING_U16_INCLUDE_MACRO_LEN] = [0; _WIDESTRING_U16_INCLUDE_MACRO_LEN]; let mut _widestring_bytes = _WIDESTRING_U16_INCLUDE_MACRO_U8; let mut _widestring_i = 0; let mut _widestring_decode = $crate::internals::DecodeUtf16 { bom: $crate::internals::core::option::Option::None, eof: false, next: $crate::internals::core::option::Option::None, forward_buf: $crate::internals::core::option::Option::None, back_buf: $crate::internals::core::option::Option::None, }; loop { match $crate::internals::DecodeUtf16::next_code_point( _widestring_decode, _widestring_bytes, ) { Ok((_widestring_new_decode, _widestring_ch, _widestring_rest)) => { _widestring_decode = _widestring_new_decode; _widestring_bytes = _widestring_rest; 
_widestring_buffer[_widestring_i] = _widestring_ch; _widestring_i += 1; } Err(_widestring_new_decode) => { _widestring_decode = _widestring_new_decode; break; } } } ( _widestring_buffer, if let Some(Some(_)) = _widestring_decode.bom { true } else { false }, _widestring_decode.eof, ) }; const _WIDESTRING_U16_INCLUDE_MACRO_UTF16_TRIMMED: &[$crate::internals::core::primitive::u16] = { match &_WIDESTRING_U16_INCLUDE_MACRO_UTF16 { (buffer, false, false) => buffer, ([_bom, rest @ ..], true, false) => rest, ([rest @ .., _eof], false, true) => rest, ([_bom, rest @ .., _eof], true, true) => rest, } }; #[allow(unused_unsafe)] unsafe { $crate::Utf16Str::from_slice_unchecked(_WIDESTRING_U16_INCLUDE_MACRO_UTF16_TRIMMED) } }}; } #[doc(hidden)] #[allow(missing_debug_implementations)] pub mod internals { pub use core; // A const implementation of https://github.com/rust-lang/rust/blob/d902752866cbbdb331e3cf28ff6bba86ab0f6c62/library/core/src/str/mod.rs#L509-L537 // Assumes `utf8` is a valid &str pub const fn next_code_point(utf8: &[u8]) -> Option<(u32, &[u8])> { const CONT_MASK: u8 = 0b0011_1111; match utf8 { [one @ 0..=0b0111_1111, rest @ ..] => Some((*one as u32, rest)), [one @ 0b1100_0000..=0b1101_1111, two, rest @ ..] => Some(( (((*one & 0b0001_1111) as u32) << 6) | ((*two & CONT_MASK) as u32), rest, )), [one @ 0b1110_0000..=0b1110_1111, two, three, rest @ ..] => Some(( (((*one & 0b0000_1111) as u32) << 12) | (((*two & CONT_MASK) as u32) << 6) | ((*three & CONT_MASK) as u32), rest, )), [one, two, three, four, rest @ ..] => Some(( (((*one & 0b0000_0111) as u32) << 18) | (((*two & CONT_MASK) as u32) << 12) | (((*three & CONT_MASK) as u32) << 6) | ((*four & CONT_MASK) as u32), rest, )), [..] 
=> None, } } pub enum BoM { Little, Big, } pub struct DecodeUtf16 { pub bom: Option>, pub eof: bool, pub next: Option, pub forward_buf: Option, pub back_buf: Option, } impl DecodeUtf16 { pub const fn next_code_point( mut self, mut utf16: &[u8], ) -> Result<(Self, u16, &[u8]), Self> { if let [one, two] = utf16 { if u16::from_le_bytes([*one, *two]) == 0x0000 { self.eof = true; } } if self.bom.is_none() { if let [one, two, ..] = utf16 { let ch = u16::from_le_bytes([*one, *two]); if ch == 0xfeff { self.bom = Some(Some(BoM::Little)); } else if ch == 0xfffe { self.bom = Some(Some(BoM::Big)); } else { self.bom = Some(None); } } } // Copied from `DecodeUtf16` if let Some(u) = self.next { self.next = None; return Ok((self, u, utf16)); } let u = if let Some(u) = self.forward_buf { self.forward_buf = None; u } else if let [one, two, rest @ ..] = utf16 { utf16 = rest; match self.bom { Some(Some(BoM::Big)) => u16::from_be_bytes([*one, *two]), _ => u16::from_le_bytes([*one, *two]), } } else if let Some(u) = self.back_buf { self.back_buf = None; u } else { return Err(self); }; if !crate::is_utf16_surrogate(u) { Ok((self, u, utf16)) } else if crate::is_utf16_low_surrogate(u) { panic!("unpaired surrogate found") } else { let u2 = if let [one, two, rest @ ..] 
= utf16 { utf16 = rest; match self.bom { Some(Some(BoM::Big)) => u16::from_be_bytes([*one, *two]), _ => u16::from_le_bytes([*one, *two]), } } else if let Some(u) = self.back_buf { self.back_buf = None; u } else { panic!("unpaired surrogate found") }; if !crate::is_utf16_low_surrogate(u2) { panic!("unpaired surrogate found") } self.next = Some(u2); Ok((self, u, utf16)) } } } // A const implementation of `s.chars().map(|ch| ch.len_utf16()).sum()` pub const fn length_as_utf16(s: &str) -> usize { let mut bytes = s.as_bytes(); let mut len = 0; while let Some((ch, rest)) = next_code_point(bytes) { bytes = rest; len += if (ch & 0xFFFF) == ch { 1 } else { 2 }; } len } // A const implementation of `s.chars().count()` pub const fn length_as_utf32(s: &str) -> usize { let mut bytes = s.as_bytes(); let mut len = 0; while let Some((_, rest)) = next_code_point(bytes) { bytes = rest; len += 1; } len } } #[cfg(all(test, feature = "alloc"))] mod test { use crate::{ U16CStr, U16Str, U32CStr, U32Str, Utf16Str, Utf16String, Utf32Str, Utf32String, WideCStr, WideStr, WideString, }; const UTF16STR_TEST: &Utf16Str = utf16str!("⚧️🏳️‍⚧️➡️s"); const UTF16STR_INCLUDE_LE_TEST: &Utf16Str = include_utf16str!("test_le.txt"); const UTF16STR_INCLUDE_BE_TEST: &Utf16Str = include_utf16str!("test_be.txt"); const U16STR_TEST: &U16Str = u16str!("⚧️🏳️‍⚧️➡️s"); const U16CSTR_TEST: &U16CStr = u16cstr!("⚧️🏳️‍⚧️➡️s"); const UTF32STR_TEST: &Utf32Str = utf32str!("⚧️🏳️‍⚧️➡️s"); const U32STR_TEST: &U32Str = u32str!("⚧️🏳️‍⚧️➡️s"); const U32CSTR_TEST: &U32CStr = u32cstr!("⚧️🏳️‍⚧️➡️s"); const WIDESTR_TEST: &WideStr = widestr!("⚧️🏳️‍⚧️➡️s"); const WIDECSTR_TEST: &WideCStr = widecstr!("⚧️🏳️‍⚧️➡️s"); #[test] fn str_macros() { let str = Utf16String::from_str("⚧️🏳️‍⚧️➡️s"); assert_eq!(&str, UTF16STR_TEST); assert_eq!(&str, UTF16STR_INCLUDE_LE_TEST); 
assert_eq!(&str, UTF16STR_INCLUDE_BE_TEST); assert_eq!(&str, U16STR_TEST); assert_eq!(&str, U16CSTR_TEST); assert!(matches!(U16CSTR_TEST.as_slice_with_nul().last(), Some(&0))); let str = Utf32String::from_str("⚧️🏳️‍⚧️➡️s"); assert_eq!(&str, UTF32STR_TEST); assert_eq!(&str, U32STR_TEST); assert_eq!(&str, U32CSTR_TEST); assert!(matches!(U32CSTR_TEST.as_slice_with_nul().last(), Some(&0))); let str = WideString::from_str("⚧️🏳️‍⚧️➡️s"); assert_eq!(&str, WIDESTR_TEST); assert_eq!(&str, WIDECSTR_TEST); assert!(matches!(WIDECSTR_TEST.as_slice_with_nul().last(), Some(&0))); } } widestring-1.1.0/src/platform/mod.rs000064400000000000000000000002431046102023000155410ustar 00000000000000#[cfg(windows)] mod windows; #[cfg(windows)] pub(crate) use self::windows::*; #[cfg(not(windows))] mod other; #[cfg(not(windows))] pub(crate) use self::other::*; widestring-1.1.0/src/platform/other.rs000064400000000000000000000003561046102023000161100ustar 00000000000000use std::ffi::{OsStr, OsString}; pub(crate) fn os_to_wide(s: &OsStr) -> Vec<u16> { s.to_string_lossy().encode_utf16().collect() } pub(crate) fn os_from_wide(s: &[u16]) -> OsString { OsString::from(String::from_utf16_lossy(s)) } widestring-1.1.0/src/platform/windows.rs000064400000000000000000000004131046102023000164530ustar 00000000000000#![cfg(windows)] use std::ffi::{OsStr, OsString}; use std::os::windows::ffi::{OsStrExt, OsStringExt}; pub(crate) fn os_to_wide(s: &OsStr) -> Vec<u16> { s.encode_wide().collect() } pub(crate) fn os_from_wide(s: &[u16]) -> OsString { OsString::from_wide(s) } widestring-1.1.0/src/test_be.txt000064400000000000000000000000301046102023000147500ustar 00000000000000&งุ<฿๓ &ง'กswidestring-1.1.0/src/test_le.txt000064400000000000000000000000301046102023000147620ustar 00000000000000ง&<ุ๓฿ ง&ก'swidestring-1.1.0/src/ucstr.rs000064400000000000000000002504231046102023000143050ustar 00000000000000//! C-style wide string slices. //! //! 
This module contains wide C string slices and related types. use crate::{ error::{ContainsNul, MissingNulTerminator, NulError}, U16Str, U32Str, }; #[cfg(feature = "alloc")] #[allow(unused_imports)] use alloc::{borrow::ToOwned, boxed::Box, string::String}; use core::{ fmt::Write, ops::{Index, Range}, slice::{self, SliceIndex}, }; #[doc(inline)] pub use crate::ustr::{ CharIndicesLossyUtf16, CharIndicesLossyUtf32, CharIndicesUtf16, CharIndicesUtf32, CharsLossyUtf16, CharsLossyUtf32, CharsUtf16, CharsUtf32, }; macro_rules! ucstr_common_impl { { $(#[$ucstr_meta:meta])* struct $ucstr:ident([$uchar:ty]); type UCString = $ucstring:ident; type UStr = $ustr:ident; type UString = $ustring:ident; $(#[$to_ustring_meta:meta])* fn to_ustring() -> {} $(#[$into_ucstring_meta:meta])* fn into_ucstring() -> {} $(#[$display_meta:meta])* fn display() -> {} } => { $(#[$ucstr_meta])* #[allow(clippy::derive_hash_xor_eq)] #[derive(PartialEq, Eq, PartialOrd, Ord, Hash)] pub struct $ucstr { inner: [$uchar], } impl $ucstr { /// The nul terminator character value. pub const NUL_TERMINATOR: $uchar = 0; /// Coerces a value into a wide C string slice. #[inline] #[must_use] pub fn new<S: AsRef<$ucstr> + ?Sized>(s: &S) -> &Self { s.as_ref() } /// Constructs a wide C string slice from a nul-terminated string pointer. /// /// This will scan for nul values beginning with `p`. The first nul value will be used /// as the nul terminator for the string, similar to how libc string functions such as /// `strlen` work. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid or /// has a nul terminator, and the function could scan past the underlying buffer. /// /// In addition, the data must meet the safety conditions of /// [std::slice::from_raw_parts]. In particular, the returned string reference *must not /// be mutated* for the duration of lifetime `'a`, except inside an /// [`UnsafeCell`][std::cell::UnsafeCell]. 
/// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent /// accidental misuse, it's suggested to tie the lifetime to whichever source lifetime /// is safe in the context, such as by providing a helper function taking the lifetime /// of a host value for the string, or by explicit annotation. #[must_use] pub unsafe fn from_ptr_str<'a>(p: *const $uchar) -> &'a Self { assert!(!p.is_null()); let mut i = 0; while *p.add(i) != Self::NUL_TERMINATOR { i += 1; } Self::from_ptr_unchecked(p, i) } /// Constructs a mutable wide C string slice from a mutable nul-terminated string /// pointer. /// /// This will scan for nul values beginning with `p`. The first nul value will be used /// as the nul terminator for the string, similar to how libc string functions such as /// `strlen` work. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid or /// has a nul terminator, and the function could scan past the underlying buffer. /// /// In addition, the data must meet the safety conditions of /// [std::slice::from_raw_parts_mut]. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent /// accidental misuse, it's suggested to tie the lifetime to whichever source lifetime /// is safe in the context, such as by providing a helper function taking the lifetime /// of a host value for the string, or by explicit annotation. #[must_use] pub unsafe fn from_ptr_str_mut<'a>(p: *mut $uchar) -> &'a mut Self { assert!(!p.is_null()); let mut i = 0; while *p.add(i) != Self::NUL_TERMINATOR { i += 1; } Self::from_ptr_unchecked_mut(p, i) } /// Constructs a wide C string slice from a pointer and a length. 
/// /// The `len` argument is the number of elements, **not** the number of bytes, and does /// **not** include the nul terminator of the string. Thus, a `len` of 0 is valid and /// means that `p` is a pointer directly to the nul terminator of the string. /// /// # Errors /// /// This will scan the pointer string for an interior nul value and error if one is /// found before the nul terminator at `len` offset. To avoid scanning for interior /// nuls, [`from_ptr_unchecked`][Self::from_ptr_unchecked] may be used instead. /// /// An error is returned if the value at `len` offset is not a nul terminator. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid for /// `len + 1` elements. /// /// In addition, the data must meet the safety conditions of /// [std::slice::from_raw_parts]. In particular, the returned string reference *must not /// be mutated* for the duration of lifetime `'a`, except inside an /// [`UnsafeCell`][std::cell::UnsafeCell]. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent /// accidental misuse, it's suggested to tie the lifetime to whichever source lifetime /// is safe in the context, such as by providing a helper function taking the lifetime /// of a host value for the string, or by explicit annotation. pub unsafe fn from_ptr<'a>( p: *const $uchar, len: usize, ) -> Result<&'a Self, NulError<$uchar>> { assert!(!p.is_null()); if *p.add(len) != Self::NUL_TERMINATOR { return Err(MissingNulTerminator::new().into()); } for i in 0..len { if *p.add(i) == Self::NUL_TERMINATOR { return Err(ContainsNul::empty(i).into()); } } Ok(Self::from_ptr_unchecked(p, len)) } /// Constructs a mutable wide C string slice from a mutable pointer and a length. /// /// The `len` argument is the number of elements, **not** the number of bytes, and does /// **not** include the nul terminator of the string. 
Thus, a `len` of 0 is valid and /// means that `p` is a pointer directly to the nul terminator of the string. /// /// # Errors /// /// This will scan the pointer string for an interior nul value and error if one is /// found before the nul terminator at `len` offset. To avoid scanning for interior /// nuls, [`from_ptr_unchecked_mut`][Self::from_ptr_unchecked_mut] may be used instead. /// /// An error is returned if the value at `len` offset is not a nul terminator. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid for /// `len + 1` elements. /// /// In addition, the data must meet the safety conditions of /// [std::slice::from_raw_parts_mut]. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent /// accidental misuse, it's suggested to tie the lifetime to whichever source lifetime /// is safe in the context, such as by providing a helper function taking the lifetime /// of a host value for the string, or by explicit annotation. pub unsafe fn from_ptr_mut<'a>( p: *mut $uchar, len: usize, ) -> Result<&'a mut Self, NulError<$uchar>> { assert!(!p.is_null()); if *p.add(len) != Self::NUL_TERMINATOR { return Err(MissingNulTerminator::new().into()); } for i in 0..len { if *p.add(i) == Self::NUL_TERMINATOR { return Err(ContainsNul::empty(i).into()); } } Ok(Self::from_ptr_unchecked_mut(p, len)) } /// Constructs a wide C string slice from a pointer and a length, truncating at the /// first nul terminator. /// /// The `len` argument is the number of elements, **not** the number of bytes. This will /// scan for nul values beginning with `p` until offset `len`. The first nul value will /// be used as the nul terminator for the string, ignoring any remaining values left /// before `len`. /// /// # Errors /// /// If no nul terminator is found after `len` + 1 elements, an error is returned. 
/// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid or /// has a nul terminator, and the function could scan past the underlying buffer. /// /// In addition, the data must meet the safety conditions of /// [std::slice::from_raw_parts]. In particular, the returned string reference *must not /// be mutated* for the duration of lifetime `'a`, except inside an /// [`UnsafeCell`][std::cell::UnsafeCell]. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent /// accidental misuse, it's suggested to tie the lifetime to whichever source lifetime /// is safe in the context, such as by providing a helper function taking the lifetime /// of a host value for the string, or by explicit annotation. pub unsafe fn from_ptr_truncate<'a>( p: *const $uchar, len: usize, ) -> Result<&'a Self, MissingNulTerminator> { assert!(!p.is_null()); for i in 0..=len { if *p.add(i) == Self::NUL_TERMINATOR { return Ok(Self::from_ptr_unchecked(p, i)); } } Err(MissingNulTerminator::new()) } /// Constructs a mutable wide C string slice from a mutable pointer and a length, /// truncating at the first nul terminator. /// /// The `len` argument is the number of elements, **not** the number of bytes. This will /// scan for nul values beginning with `p` until offset `len`. The first nul value will /// be used as the nul terminator for the string, ignoring any remaining values left /// before `len`. /// /// # Errors /// /// If no nul terminator is found after `len` + 1 elements, an error is returned. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid or /// has a nul terminator, and the function could scan past the underlying buffer. /// /// In addition, the data must meet the safety conditions of /// [std::slice::from_raw_parts_mut]. 
/// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent /// accidental misuse, it's suggested to tie the lifetime to whichever source lifetime /// is safe in the context, such as by providing a helper function taking the lifetime /// of a host value for the string, or by explicit annotation. pub unsafe fn from_ptr_truncate_mut<'a>( p: *mut $uchar, len: usize, ) -> Result<&'a mut Self, MissingNulTerminator> { assert!(!p.is_null()); for i in 0..=len { if *p.add(i) == Self::NUL_TERMINATOR { return Ok(Self::from_ptr_unchecked_mut(p, i)); } } Err(MissingNulTerminator::new()) } /// Constructs a wide C string slice from a pointer and a length without checking for /// any nul values. /// /// The `len` argument is the number of elements, **not** the number of bytes, and does /// **not** include the nul terminator of the string. Thus, a `len` of 0 is valid and /// means that `p` is a pointer directly to the nul terminator of the string. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid for /// `len + 1` elements, nor that it has a terminating nul value. /// /// In addition, the data must meet the safety conditions of /// [std::slice::from_raw_parts]. In particular, the returned string reference *must not /// be mutated* for the duration of lifetime `'a`, except inside an /// [`UnsafeCell`][std::cell::UnsafeCell]. /// /// The interior values of the pointer are not scanned for nul. Any interior nul values /// or a missing nul terminator at pointer offset `len` + 1 will result in an invalid /// string slice. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. 
To prevent /// accidental misuse, it's suggested to tie the lifetime to whichever source lifetime /// is safe in the context, such as by providing a helper function taking the lifetime /// of a host value for the string, or by explicit annotation. #[must_use] pub unsafe fn from_ptr_unchecked<'a>(p: *const $uchar, len: usize) -> &'a Self { assert!(!p.is_null()); let ptr: *const [$uchar] = slice::from_raw_parts(p, len + 1); &*(ptr as *const Self) } /// Constructs a mutable wide C string slice from a mutable pointer and a length without /// checking for any nul values. /// /// The `len` argument is the number of elements, **not** the number of bytes, and does /// **not** include the nul terminator of the string. Thus, a `len` of 0 is valid and /// means that `p` is a pointer directly to the nul terminator of the string. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid for /// `len + 1` elements, nor that it has a terminating nul value. /// /// In addition, the data must meet the safety conditions of /// [std::slice::from_raw_parts_mut]. /// /// The interior values of the pointer are not scanned for nul. Any interior nul values /// or a missing nul terminator at pointer offset `len` + 1 will result in an invalid /// string slice. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent /// accidental misuse, it's suggested to tie the lifetime to whichever source lifetime /// is safe in the context, such as by providing a helper function taking the lifetime /// of a host value for the string, or by explicit annotation. 
#[must_use] pub unsafe fn from_ptr_unchecked_mut<'a>(p: *mut $uchar, len: usize) -> &'a mut Self { assert!(!p.is_null()); let ptr: *mut [$uchar] = slice::from_raw_parts_mut(p, len + 1); &mut *(ptr as *mut Self) } /// Constructs a wide C string slice from a slice of values with a terminating nul, /// checking for invalid interior nul values. /// /// The slice must have at least one item, the nul terminator, even for an empty string. /// /// # Errors /// /// If there are nul values in the slice except for the last value, an error is /// returned. /// /// An error is also returned if the last value of the slice is not a nul terminator. pub fn from_slice(slice: &[$uchar]) -> Result<&Self, NulError<$uchar>> { if slice.last() != Some(&Self::NUL_TERMINATOR) { return Err(MissingNulTerminator::new().into()); } match slice[..slice.len() - 1] .iter() .position(|x| *x == Self::NUL_TERMINATOR) { None => Ok(unsafe { Self::from_slice_unchecked(slice) }), Some(i) => Err(ContainsNul::empty(i).into()), } } /// Constructs a mutable wide C string slice from a mutable slice of values with a /// terminating nul, checking for invalid interior nul values. /// /// The slice must have at least one item, the nul terminator, even for an empty string. /// /// # Errors /// /// If there are nul values in the slice except for the last value, an error is /// returned. /// /// An error is also returned if the last value of the slice is not a nul terminator. pub fn from_slice_mut(slice: &mut [$uchar]) -> Result<&mut Self, NulError<$uchar>> { if slice.last() != Some(&Self::NUL_TERMINATOR) { return Err(MissingNulTerminator::new().into()); } match slice[..slice.len() - 1] .iter() .position(|x| *x == Self::NUL_TERMINATOR) { None => Ok(unsafe { Self::from_slice_unchecked_mut(slice) }), Some(i) => Err(ContainsNul::empty(i).into()), } } /// Constructs a wide C string slice from a slice of values, truncating at the first nul /// terminator. /// /// The slice will be scanned for nul values. 
When a nul value is found, it is treated /// as the terminator for the string, and the string slice will be truncated to that /// nul. /// /// # Errors /// /// If there are no nul values in the slice, an error is returned. pub fn from_slice_truncate(slice: &[$uchar]) -> Result<&Self, MissingNulTerminator> { match slice.iter().position(|x| *x == Self::NUL_TERMINATOR) { None => Err(MissingNulTerminator::new()), Some(i) => Ok(unsafe { Self::from_slice_unchecked(&slice[..i + 1]) }), } } /// Constructs a mutable wide C string slice from a mutable slice of values, truncating /// at the first nul terminator. /// /// The slice will be scanned for nul values. When a nul value is found, it is treated /// as the terminator for the string, and the string slice will be truncated to that /// nul. /// /// # Errors /// /// If there are no nul values in the slice, an error is returned. pub fn from_slice_truncate_mut( slice: &mut [$uchar], ) -> Result<&mut Self, MissingNulTerminator> { match slice.iter().position(|x| *x == Self::NUL_TERMINATOR) { None => Err(MissingNulTerminator::new()), Some(i) => Ok(unsafe { Self::from_slice_unchecked_mut(&mut slice[..i + 1]) }), } } /// Constructs a wide C string slice from a slice of values without checking for a /// terminating or interior nul values. /// /// # Safety /// /// This function is unsafe because it can lead to invalid string slice values when the /// slice is missing a terminating nul value or there are non-terminating interior nul /// values in the slice. In particular, an empty slice will result in an invalid /// string slice. #[must_use] pub const unsafe fn from_slice_unchecked(slice: &[$uchar]) -> &Self { let ptr: *const [$uchar] = slice; &*(ptr as *const Self) } /// Constructs a mutable wide C string slice from a mutable slice of values without /// checking for a terminating or interior nul values. 
/// /// # Safety /// /// This function is unsafe because it can lead to invalid string slice values when the /// slice is missing a terminating nul value or there are non-terminating interior nul /// values in the slice. In particular, an empty slice will result in an invalid /// string slice. #[must_use] pub unsafe fn from_slice_unchecked_mut(slice: &mut [$uchar]) -> &mut Self { let ptr: *mut [$uchar] = slice; &mut *(ptr as *mut Self) } /// Copies the string reference to a new owned wide C string. #[inline] #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub fn to_ucstring(&self) -> crate::$ucstring { unsafe { crate::$ucstring::from_vec_unchecked(self.inner.to_owned()) } } $(#[$to_ustring_meta])* #[inline] #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub fn to_ustring(&self) -> crate::$ustring { crate::$ustring::from_vec(self.as_slice()) } /// Converts to a slice of the underlying elements. /// /// The slice will **not** include the nul terminator. #[inline] #[must_use] pub fn as_slice(&self) -> &[$uchar] { &self.inner[..self.len()] } /// Converts to a mutable slice of the underlying elements. /// /// The slice will **not** include the nul terminator. /// /// # Safety /// /// This method is unsafe because you can violate the invariants of this type when /// mutating the slice (i.e. by adding interior nul values). #[inline] #[must_use] pub unsafe fn as_mut_slice(&mut self) -> &mut [$uchar] { let len = self.len(); &mut self.inner[..len] } /// Converts to a slice of the underlying elements, including the nul terminator. #[inline] #[must_use] pub const fn as_slice_with_nul(&self) -> &[$uchar] { &self.inner } /// Returns a raw pointer to the string. /// /// The caller must ensure that the string outlives the pointer this function returns, /// or else it will end up pointing to garbage. 
/// /// The caller must also ensure that the memory the pointer (non-transitively) points to /// is never written to (except inside an `UnsafeCell`) using this pointer or any /// pointer derived from it. If you need to mutate the contents of the string, use /// [`as_mut_ptr`][Self::as_mut_ptr]. /// /// Modifying the container referenced by this string may cause its buffer to be /// reallocated, which would also make any pointers to it invalid. #[inline] #[must_use] pub const fn as_ptr(&self) -> *const $uchar { self.inner.as_ptr() } /// Returns a mutable raw pointer to the string. /// /// The caller must ensure that the string outlives the pointer this function returns, /// or else it will end up pointing to garbage. /// /// Modifying the container referenced by this string may cause its buffer to be /// reallocated, which would also make any pointers to it invalid. #[inline] #[must_use] pub fn as_mut_ptr(&mut self) -> *mut $uchar { self.inner.as_mut_ptr() } /// Returns the two raw pointers spanning the string slice. /// /// The returned range is half-open, which means that the end pointer points one past /// the last element of the slice. This way, an empty slice is represented by two equal /// pointers, and the difference between the two pointers represents the size of the /// slice. /// /// See [`as_ptr`][Self::as_ptr] for warnings on using these pointers. The end pointer /// requires extra caution, as it does not point to a valid element in the slice. /// /// This function is useful for interacting with foreign interfaces which use two /// pointers to refer to a range of elements in memory, as is common in C++. #[inline] #[must_use] pub fn as_ptr_range(&self) -> Range<*const $uchar> { self.inner.as_ptr_range() } /// Returns the two unsafe mutable pointers spanning the string slice. /// /// The returned range is half-open, which means that the end pointer points one past /// the last element of the slice. 
This way, an empty slice is represented by two equal /// pointers, and the difference between the two pointers represents the size of the /// slice. /// /// See [`as_mut_ptr`][Self::as_mut_ptr] for warnings on using these pointers. The end /// pointer requires extra caution, as it does not point to a valid element in the /// slice. /// /// This function is useful for interacting with foreign interfaces which use two /// pointers to refer to a range of elements in memory, as is common in C++. #[inline] #[must_use] pub fn as_mut_ptr_range(&mut self) -> Range<*mut $uchar> { self.inner.as_mut_ptr_range() } /// Returns the length of the string as number of elements (**not** number of bytes) /// **not** including nul terminator. #[inline] #[must_use] pub const fn len(&self) -> usize { self.inner.len() - 1 } /// Returns whether this string contains no data (i.e. is only the nul terminator). #[inline] #[must_use] pub const fn is_empty(&self) -> bool { self.len() == 0 } $(#[$into_ucstring_meta])* #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub fn into_ucstring(self: Box<Self>) -> crate::$ucstring { let raw = Box::into_raw(self) as *mut [$uchar]; crate::$ucstring { inner: unsafe { Box::from_raw(raw) }, } } /// Returns a wide string slice to this wide C string slice. /// /// The wide string slice will *not* include the nul-terminator. #[inline] #[must_use] pub fn as_ustr(&self) -> &$ustr { $ustr::from_slice(self.as_slice()) } /// Returns a wide string slice to this wide C string slice. /// /// The wide string slice will include the nul-terminator. #[inline] #[must_use] pub fn as_ustr_with_nul(&self) -> &$ustr { $ustr::from_slice(self.as_slice_with_nul()) } /// Returns a mutable wide string slice to this wide C string slice. /// /// The wide string slice will *not* include the nul-terminator. /// /// # Safety /// /// This method is unsafe because you can violate the invariants of this type when /// mutating the string (i.e. 
by adding interior nul values). #[inline] #[must_use] pub unsafe fn as_mut_ustr(&mut self) -> &mut $ustr { $ustr::from_slice_mut(self.as_mut_slice()) } #[cfg(feature = "alloc")] pub(crate) fn from_inner(slice: &[$uchar]) -> &$ucstr { let ptr: *const [$uchar] = slice; unsafe { &*(ptr as *const $ucstr) } } #[cfg(feature = "alloc")] pub(crate) fn from_inner_mut(slice: &mut [$uchar]) -> &mut $ucstr { let ptr: *mut [$uchar] = slice; unsafe { &mut *(ptr as *mut $ucstr) } } $(#[$display_meta])* #[inline] #[must_use] pub fn display(&self) -> Display<'_, $ucstr> { Display { str: self } } /// Returns a subslice of the string. /// /// This is the non-panicking alternative to indexing the string. Returns [`None`] /// whenever equivalent indexing operation would panic. #[inline] #[must_use] pub fn get<I>(&self, i: I) -> Option<&$ustr> where I: SliceIndex<[$uchar], Output = [$uchar]>, { self.as_slice().get(i).map($ustr::from_slice) } /// Returns a mutable subslice of the string. /// /// This is the non-panicking alternative to indexing the string. Returns [`None`] /// whenever equivalent indexing operation would panic. /// /// # Safety /// /// This method is unsafe because you can violate the invariants of this type when /// mutating the memory the pointer points to (i.e. by adding interior nul values). #[inline] #[must_use] pub unsafe fn get_mut<I>(&mut self, i: I) -> Option<&mut $ustr> where I: SliceIndex<[$uchar], Output = [$uchar]>, { self.as_mut_slice().get_mut(i).map($ustr::from_slice_mut) } /// Returns an unchecked subslice of the string. /// /// This is the unchecked alternative to indexing the string. /// /// # Safety /// /// Callers of this function are responsible that these preconditions are satisfied: /// /// - The starting index must not exceed the ending index; /// - Indexes must be within bounds of the original slice. /// /// Failing that, the returned string slice may reference invalid memory. 
#[inline] #[must_use] pub unsafe fn get_unchecked<I>(&self, i: I) -> &$ustr where I: SliceIndex<[$uchar], Output = [$uchar]>, { $ustr::from_slice(self.as_slice().get_unchecked(i)) } /// Returns a mutable, unchecked subslice of the string. /// /// This is the unchecked alternative to indexing the string. /// /// # Safety /// /// Callers of this function are responsible that these preconditions are satisfied: /// /// - The starting index must not exceed the ending index; /// - Indexes must be within bounds of the original slice. /// /// Failing that, the returned string slice may reference invalid memory. /// /// This method is unsafe because you can violate the invariants of this type when /// mutating the memory the pointer points to (i.e. by adding interior nul values). #[inline] #[must_use] pub unsafe fn get_unchecked_mut<I>(&mut self, i: I) -> &mut $ustr where I: SliceIndex<[$uchar], Output = [$uchar]>, { $ustr::from_slice_mut(self.as_mut_slice().get_unchecked_mut(i)) } /// Divide one string slice into two at an index. /// /// The argument, `mid`, should be an offset from the start of the string. /// /// The two slices returned go from the start of the string slice to `mid`, and from /// `mid` to the end of the string slice. /// /// To get mutable string slices instead, see the [`split_at_mut`][Self::split_at_mut] /// method. #[inline] #[must_use] pub fn split_at(&self, mid: usize) -> (&$ustr, &$ustr) { let split = self.as_slice().split_at(mid); ($ustr::from_slice(split.0), $ustr::from_slice(split.1)) } /// Divide one mutable string slice into two at an index. /// /// The argument, `mid`, should be an offset from the start of the string. /// /// The two slices returned go from the start of the string slice to `mid`, and from /// `mid` to the end of the string slice. /// /// To get immutable string slices instead, see the [`split_at`][Self::split_at] method. 
/// /// # Safety /// /// This method is unsafe because you can violate the invariants of this type when /// mutating the memory the pointer points to (i.e. by adding interior nul values). #[inline] #[must_use] pub unsafe fn split_at_mut(&mut self, mid: usize) -> (&mut $ustr, &mut $ustr) { let split = self.as_mut_slice().split_at_mut(mid); ($ustr::from_slice_mut(split.0), $ustr::from_slice_mut(split.1)) } /// Creates a new owned string by repeating this string `n` times. /// /// # Panics /// /// This function will panic if the capacity would overflow. #[inline] #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub fn repeat(&self, n: usize) -> crate::$ucstring { unsafe { crate::$ucstring::from_vec_unchecked(self.as_slice().repeat(n)) } } } impl AsMut<$ucstr> for $ucstr { #[inline] fn as_mut(&mut self) -> &mut $ucstr { self } } impl AsRef<$ucstr> for $ucstr { #[inline] fn as_ref(&self) -> &Self { self } } impl AsRef<[$uchar]> for $ucstr { #[inline] fn as_ref(&self) -> &[$uchar] { self.as_slice() } } impl AsRef<$ustr> for $ucstr { #[inline] fn as_ref(&self) -> &$ustr { self.as_ustr() } } impl<'a> Default for &'a $ucstr { #[inline] fn default() -> Self { const SLICE: &[$uchar] = &[$ucstr::NUL_TERMINATOR]; unsafe { $ucstr::from_slice_unchecked(SLICE) } } } #[cfg(feature = "alloc")] impl Default for Box<$ucstr> { #[inline] fn default() -> Box<$ucstr> { let boxed: Box<[$uchar]> = Box::from([$ucstr::NUL_TERMINATOR]); unsafe { Box::from_raw(Box::into_raw(boxed) as *mut $ucstr) } } } #[cfg(feature = "alloc")] impl<'a> From<&'a $ucstr> for Box<$ucstr> { #[inline] fn from(s: &'a $ucstr) -> Box<$ucstr> { let boxed: Box<[$uchar]> = Box::from(s.as_slice_with_nul()); unsafe { Box::from_raw(Box::into_raw(boxed) as *mut $ucstr) } } } #[cfg(feature = "std")] impl From<&$ucstr> for std::ffi::OsString { #[inline] fn from(s: &$ucstr) -> std::ffi::OsString { s.to_os_string() } } impl<I> Index<I> for $ucstr where I: SliceIndex<[$uchar], Output = [$uchar]>, { 
        type Output = $ustr;

        #[inline]
        fn index(&self, index: I) -> &Self::Output {
            $ustr::from_slice(&self.as_slice()[index])
        }
    }

    impl PartialEq<$ucstr> for &$ucstr {
        #[inline]
        fn eq(&self, other: &$ucstr) -> bool {
            self.as_slice() == other.as_slice()
        }
    }

    impl PartialEq<&$ucstr> for $ucstr {
        #[inline]
        fn eq(&self, other: &&$ucstr) -> bool {
            self.as_slice() == other.as_slice()
        }
    }

    impl PartialEq<$ustr> for $ucstr {
        #[inline]
        fn eq(&self, other: &$ustr) -> bool {
            self.as_slice() == other.as_slice()
        }
    }

    impl PartialEq<$ustr> for &$ucstr {
        #[inline]
        fn eq(&self, other: &$ustr) -> bool {
            self.as_slice() == other.as_slice()
        }
    }

    impl PartialEq<&$ustr> for $ucstr {
        #[inline]
        fn eq(&self, other: &&$ustr) -> bool {
            self.as_slice() == other.as_slice()
        }
    }

    impl PartialOrd<$ustr> for $ucstr {
        #[inline]
        fn partial_cmp(&self, other: &$ustr) -> Option<core::cmp::Ordering> {
            self.as_ustr().partial_cmp(other)
        }
    }
    };
}

ucstr_common_impl! {
    /// C-style 16-bit wide string slice for [`U16CString`][crate::U16CString].
    ///
    /// [`U16CStr`] is to [`U16CString`][crate::U16CString] as [`CStr`][std::ffi::CStr] is to
    /// [`CString`][std::ffi::CString].
    ///
    /// [`U16CStr`] are string slices that do not have a defined encoding. While it is sometimes
    /// assumed that they contain possibly invalid or ill-formed UTF-16 data, they may be used for
    /// any wide encoded string.
    ///
    /// # Nul termination
    ///
    /// [`U16CStr`] is aware of nul (`0`) values. Unless unchecked conversions are used, all
    /// [`U16CStr`] strings end with a nul-terminator in the underlying buffer and contain no
    /// internal nul values. These strings are intended to be used with C FFI functions that
    /// require nul-terminated strings.
    ///
    /// Because of the nul termination requirement, multiple classes of methods are provided for
    /// constructing a [`U16CStr`] under various scenarios.
    /// By default, methods such as
    /// [`from_ptr`][Self::from_ptr] and [`from_slice`][Self::from_slice] return an error if the
    /// input does not terminate with a nul value, or if it contains any interior nul values before
    /// the terminator.
    ///
    /// `_truncate` methods on the other hand, such as
    /// [`from_ptr_truncate`][Self::from_ptr_truncate] and
    /// [`from_slice_truncate`][Self::from_slice_truncate], construct a slice that terminates with
    /// the first nul value encountered in the string, only returning an error if the slice contains
    /// no nul values at all. Use this to mimic the behavior of C functions such as `strlen` when
    /// you don't know if the input is clean of interior nuls.
    ///
    /// Finally, unsafe `_unchecked` variants of these methods, such as
    /// [`from_ptr_unchecked`][Self::from_ptr_unchecked] and
    /// [`from_slice_unchecked`][Self::from_slice_unchecked] allow bypassing any checks for nul
    /// values, when the input has already been ensured to have a nul terminator and no interior
    /// nul values.
    ///
    /// # Examples
    ///
    /// The easiest way to use [`U16CStr`] outside of FFI is with the [`u16cstr!`][crate::u16cstr]
    /// macro to convert string literals into nul-terminated UTF-16 string slices at compile time:
    ///
    /// ```
    /// use widestring::u16cstr;
    /// let hello = u16cstr!("Hello, world!");
    /// ```
    ///
    /// You can also convert any [`u16`] slice directly, as long as it has a nul terminator:
    ///
    /// ```
    /// use widestring::{u16cstr, U16CStr};
    ///
    /// let sparkle_heart = [0xd83d, 0xdc96, 0x0];
    /// let sparkle_heart = U16CStr::from_slice(&sparkle_heart).unwrap();
    ///
    /// assert_eq!(u16cstr!("💖"), sparkle_heart);
    ///
    /// // This unpaired UTF-16 surrogate is invalid UTF-16, but is perfectly valid in U16CStr
    /// let malformed_utf16 = [0xd83d, 0x0];
    /// let s = U16CStr::from_slice(&malformed_utf16).unwrap();
    ///
    /// assert_eq!(s.len(), 1);
    /// ```
    ///
    /// When working with an FFI, it is useful to create a [`U16CStr`] from a pointer:
    ///
    /// ```
    /// use widestring::{u16cstr, U16CStr};
    ///
    /// let sparkle_heart = [0xd83d, 0xdc96, 0x0];
    /// let s = unsafe {
    ///     // Note the string and pointer length does not include the nul terminator
    ///     U16CStr::from_ptr(sparkle_heart.as_ptr(), sparkle_heart.len() - 1).unwrap()
    /// };
    /// assert_eq!(u16cstr!("💖"), s);
    ///
    /// // Alternatively, if the length of the pointer is unknown but definitely terminates in nul,
    /// // a C-style string version can be used
    /// let s = unsafe { U16CStr::from_ptr_str(sparkle_heart.as_ptr()) };
    ///
    /// assert_eq!(u16cstr!("💖"), s);
    /// ```
    struct U16CStr([u16]);

    type UCString = U16CString;
    type UStr = U16Str;
    type UString = U16String;

    /// Copies the string reference to a new owned wide string.
    ///
    /// The resulting wide string will **not** have a nul terminator.
/// /// # Examples /// /// ```rust /// use widestring::U16CString; /// let wcstr = U16CString::from_str("MyString").unwrap(); /// // Convert U16CString to a U16String /// let wstr = wcstr.to_ustring(); /// /// // U16CString will have a terminating nul /// let wcvec = wcstr.into_vec_with_nul(); /// assert_eq!(wcvec[wcvec.len()-1], 0); /// // The resulting U16String will not have the terminating nul /// let wvec = wstr.into_vec(); /// assert_ne!(wvec[wvec.len()-1], 0); /// ``` fn to_ustring() -> {} /// Converts a boxed wide C string slice into an wide C string without copying or /// allocating. /// /// # Examples /// /// ``` /// use widestring::U16CString; /// /// let v = vec![102u16, 111u16, 111u16]; // "foo" /// let c_string = U16CString::from_vec(v.clone()).unwrap(); /// let boxed = c_string.into_boxed_ucstr(); /// assert_eq!(boxed.into_ucstring(), U16CString::from_vec(v).unwrap()); /// ``` fn into_ucstring() -> {} /// Returns an object that implements [`Display`][std::fmt::Display] for printing /// strings that may contain non-Unicode data. /// /// A wide C string might data of any encoding. This function assumes the string is encoded in /// UTF-16, and returns a struct implements the /// [`Display`][std::fmt::Display] trait in a way that decoding the string is lossy but /// no heap allocations are performed, such as by /// [`to_string_lossy`][Self::to_string_lossy]. /// /// By default, invalid Unicode data is replaced with /// [`U+FFFD REPLACEMENT CHARACTER`][std::char::REPLACEMENT_CHARACTER] (๏ฟฝ). If you wish /// to simply skip any invalid Uncode data and forego the replacement, you may use the /// [alternate formatting][std::fmt#sign0] with `{:#}`. 
/// /// # Examples /// /// Basic usage: /// /// ``` /// use widestring::U16CStr; /// /// // ๐„žmusic /// let s = U16CStr::from_slice(&[ /// 0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834, 0x0000, /// ]).unwrap(); /// /// assert_eq!(format!("{}", s.display()), /// "๐„žmus๏ฟฝic๏ฟฝ" /// ); /// ``` /// /// Using alternate formatting style to skip invalid values entirely: /// /// ``` /// use widestring::U16CStr; /// /// // ๐„žmusic /// let s = U16CStr::from_slice(&[ /// 0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834, 0x0000, /// ]).unwrap(); /// /// assert_eq!(format!("{:#}", s.display()), /// "๐„žmusic" /// ); /// ``` fn display() -> {} } ucstr_common_impl! { /// C-style 32-bit wide string slice for [`U32CString`][crate::U32CString]. /// /// [`U32CStr`] is to [`U32CString`][crate::U32CString] as [`CStr`][std::ffi::CStr] is to /// [`CString`][std::ffi::CString]. /// /// [`U32CStr`] are string slices that do not have a defined encoding. While it is sometimes /// assumed that they contain possibly invalid or ill-formed UTF-32 data, they may be used for /// any wide encoded string. /// /// # Nul termination /// /// [`U32CStr`] is aware of nul (`0`) values. Unless unchecked conversions are used, all /// [`U32CStr`] strings end with a nul-terminator in the underlying buffer and contain no /// internal nul values. These strings are intended to be used with C FFI functions that /// require nul-terminated strings. /// /// Because of the nul termination requirement, multiple classes methods for provided for /// construction a [`U32CStr`] under various scenarios. By default, methods such as /// [`from_ptr`][Self::from_ptr] and [`from_slice`][Self::from_slice] return an error if the /// input does not terminate with a nul value, or if it contains any interior nul values before /// the terminator. 
    ///
    /// `_truncate` methods on the other hand, such as
    /// [`from_ptr_truncate`][Self::from_ptr_truncate] and
    /// [`from_slice_truncate`][Self::from_slice_truncate], construct a slice that terminates with
    /// the first nul value encountered in the string, only returning an error if the slice contains
    /// no nul values at all. Use this to mimic the behavior of C functions such as `strlen` when
    /// you don't know if the input is clean of interior nuls.
    ///
    /// Finally, unsafe `_unchecked` variants of these methods, such as
    /// [`from_ptr_unchecked`][Self::from_ptr_unchecked] and
    /// [`from_slice_unchecked`][Self::from_slice_unchecked] allow bypassing any checks for nul
    /// values, when the input has already been ensured to have a nul terminator and no interior
    /// nul values.
    ///
    /// # Examples
    ///
    /// The easiest way to use [`U32CStr`] outside of FFI is with the [`u32cstr!`][crate::u32cstr]
    /// macro to convert string literals into nul-terminated UTF-32 string slices at compile time:
    ///
    /// ```
    /// use widestring::u32cstr;
    /// let hello = u32cstr!("Hello, world!");
    /// ```
    ///
    /// You can also convert any [`u32`] slice directly, as long as it has a nul terminator:
    ///
    /// ```
    /// use widestring::{u32cstr, U32CStr};
    ///
    /// let sparkle_heart = [0x1f496, 0x0];
    /// let sparkle_heart = U32CStr::from_slice(&sparkle_heart).unwrap();
    ///
    /// assert_eq!(u32cstr!("💖"), sparkle_heart);
    ///
    /// // This UTF-16 surrogate is invalid UTF-32, but is perfectly valid in U32CStr
    /// let malformed_utf32 = [0xd83d, 0x0];
    /// let s = U32CStr::from_slice(&malformed_utf32).unwrap();
    ///
    /// assert_eq!(s.len(), 1);
    /// ```
    ///
    /// When working with an FFI, it is useful to create a [`U32CStr`] from a pointer:
    ///
    /// ```
    /// use widestring::{u32cstr, U32CStr};
    ///
    /// let sparkle_heart = [0x1f496, 0x0];
    /// let s = unsafe {
    ///     // Note the string and pointer length does not include the nul terminator
    ///     U32CStr::from_ptr(sparkle_heart.as_ptr(), sparkle_heart.len() - 1).unwrap()
    /// };
    /// assert_eq!(u32cstr!("💖"), s);
    ///
    /// // Alternatively, if the length of the pointer is unknown but definitely terminates in nul,
    /// // a C-style string version can be used
    /// let s = unsafe { U32CStr::from_ptr_str(sparkle_heart.as_ptr()) };
    ///
    /// assert_eq!(u32cstr!("💖"), s);
    /// ```
    struct U32CStr([u32]);

    type UCString = U32CString;
    type UStr = U32Str;
    type UString = U32String;

    /// Copies the string reference to a new owned wide string.
    ///
    /// The resulting wide string will **not** have a nul terminator.
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let wcstr = U32CString::from_str("MyString").unwrap();
    /// // Convert U32CString to a U32String
    /// let wstr = wcstr.to_ustring();
    ///
    /// // U32CString will have a terminating nul
    /// let wcvec = wcstr.into_vec_with_nul();
    /// assert_eq!(wcvec[wcvec.len()-1], 0);
    /// // The resulting U32String will not have the terminating nul
    /// let wvec = wstr.into_vec();
    /// assert_ne!(wvec[wvec.len()-1], 0);
    /// ```
    fn to_ustring() -> {}

    /// Converts a boxed wide C string slice into an owned wide C string without copying or
    /// allocating.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::U32CString;
    ///
    /// let v = vec![102u32, 111u32, 111u32]; // "foo"
    /// let c_string = U32CString::from_vec(v.clone()).unwrap();
    /// let boxed = c_string.into_boxed_ucstr();
    /// assert_eq!(boxed.into_ucstring(), U32CString::from_vec(v).unwrap());
    /// ```
    fn into_ucstring() -> {}

    /// Returns an object that implements [`Display`][std::fmt::Display] for printing
    /// strings that may contain non-Unicode data.
    ///
    /// A wide C string might contain data of any encoding. This function assumes the string is
    /// encoded in UTF-32, and returns a struct that implements the
    /// [`Display`][std::fmt::Display] trait. Decoding through it is lossy but performs no heap
    /// allocations, unlike [`to_string_lossy`][Self::to_string_lossy].
/// /// By default, invalid Unicode data is replaced with /// [`U+FFFD REPLACEMENT CHARACTER`][std::char::REPLACEMENT_CHARACTER] (๏ฟฝ). If you wish /// to simply skip any invalid Uncode data and forego the replacement, you may use the /// [alternate formatting][std::fmt#sign0] with `{:#}`. /// /// # Examples /// /// Basic usage: /// /// ``` /// use widestring::U32CStr; /// /// // ๐„žmusic /// let s = U32CStr::from_slice(&[ /// 0x1d11e, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834, 0x0000, /// ]).unwrap(); /// /// assert_eq!(format!("{}", s.display()), /// "๐„žmus๏ฟฝic๏ฟฝ" /// ); /// ``` /// /// Using alternate formatting style to skip invalid values entirely: /// /// ``` /// use widestring::U32CStr; /// /// // ๐„žmusic /// let s = U32CStr::from_slice(&[ /// 0x1d11e, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834, 0x0000, /// ]).unwrap(); /// /// assert_eq!(format!("{:#}", s.display()), /// "๐„žmusic" /// ); /// ``` fn display() -> {} } impl U16CStr { /// Copys a string to an owned [`OsString`][std::ffi::OsString]. /// /// This makes a string copy of the [`U16CStr`]. Since [`U16CStr`] makes no guarantees that it /// is valid UTF-16, there is no guarantee that the resulting [`OsString`][std::ffi::OsString] /// will be valid data. The [`OsString`][std::ffi::OsString] will **not** have a nul /// terminator. /// /// Note that the encoding of [`OsString`][std::ffi::OsString] is platform-dependent, so on /// some platforms this may make an encoding conversions, while on other platforms (such as /// windows) no changes to the string will be made. 
/// /// # Examples /// /// ```rust /// use widestring::U16CString; /// use std::ffi::OsString; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U16CString::from_str(s).unwrap(); /// // Create an OsString from the wide string /// let osstr = wstr.to_os_string(); /// /// assert_eq!(osstr, OsString::from(s)); /// ``` #[inline] #[cfg(feature = "std")] #[cfg_attr(docsrs, doc(cfg(feature = "std")))] #[must_use] pub fn to_os_string(&self) -> std::ffi::OsString { crate::platform::os_from_wide(self.as_slice()) } /// Copies the string to a [`String`] if it contains valid UTF-16 data. /// /// This method assumes this string is encoded as UTF-16 and attempts to decode it as such. It /// will **not* have a nul terminator. /// /// # Errors /// /// Returns an error if the string contains any invalid UTF-16 data. /// /// # Examples /// /// ```rust /// use widestring::U16CString; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U16CString::from_str(s).unwrap(); /// // Create a regular string from the wide string /// let s2 = wstr.to_string().unwrap(); /// /// assert_eq!(s2, s); /// ``` #[inline] #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] pub fn to_string(&self) -> Result { self.as_ustr().to_string() } /// Decodes the string reference to a [`String`] even if it is invalid UTF-16 data. /// /// This method assumes this string is encoded as UTF-16 and attempts to decode it as such. Any /// invalid sequences are replaced with /// [`U+FFFD REPLACEMENT CHARACTER`][core::char::REPLACEMENT_CHARACTER], which looks like this: /// ๏ฟฝ. It will **not* have a nul terminator. 
/// /// # Examples /// /// ```rust /// use widestring::U16CString; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U16CString::from_str(s).unwrap(); /// // Create a regular string from the wide string /// let s2 = wstr.to_string_lossy(); /// /// assert_eq!(s2, s); /// ``` #[inline] #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub fn to_string_lossy(&self) -> String { String::from_utf16_lossy(self.as_slice()) } /// Returns an iterator over the [`char`][prim@char]s of a string slice. /// /// As this string has no defined encoding, this method assumes the string is UTF-16. Since it /// may consist of invalid UTF-16, the iterator returned by this method /// is an iterator over `Result` instead of [`char`][prim@char]s /// directly. If you would like a lossy iterator over [`chars`][prim@char]s directly, instead /// use [`chars_lossy`][Self::chars_lossy]. /// /// It's important to remember that [`char`][prim@char] represents a Unicode Scalar Value, and /// may not match your idea of what a 'character' is. Iteration over grapheme clusters may be /// what you actually want. That functionality is not provided by by this crate. #[inline] #[must_use] pub fn chars(&self) -> CharsUtf16<'_> { CharsUtf16::new(self.as_slice()) } /// Returns a lossy iterator over the [`char`][prim@char]s of a string slice. /// /// As this string has no defined encoding, this method assumes the string is UTF-16. Since it /// may consist of invalid UTF-16, the iterator returned by this method will replace unpaired /// surrogates with /// [`U+FFFD REPLACEMENT CHARACTER`][std::char::REPLACEMENT_CHARACTER] (๏ฟฝ). This is a lossy /// version of [`chars`][Self::chars]. /// /// It's important to remember that [`char`][prim@char] represents a Unicode Scalar Value, and /// may not match your idea of what a 'character' is. Iteration over grapheme clusters may be /// what you actually want. 
That functionality is not provided by by this crate. #[inline] #[must_use] pub fn chars_lossy(&self) -> CharsLossyUtf16<'_> { CharsLossyUtf16::new(self.as_slice()) } /// Returns an iterator over the chars of a string slice, and their positions. /// /// As this string has no defined encoding, this method assumes the string is UTF-16. Since it /// may consist of invalid UTF-16, the iterator returned by this method is an iterator over /// is an iterator over `Result` as well as their positions, instead of /// [`char`][prim@char]s directly. If you would like a lossy indices iterator over /// [`chars`][prim@char]s directly, instead use /// [`char_indices_lossy`][Self::char_indices_lossy]. /// /// The iterator yields tuples. The position is first, the [`char`][prim@char] is second. #[inline] #[must_use] pub fn char_indices(&self) -> CharIndicesUtf16<'_> { CharIndicesUtf16::new(self.as_slice()) } /// Returns a lossy iterator over the chars of a string slice, and their positions. /// /// As this string slice may consist of invalid UTF-16, the iterator returned by this method /// will replace unpaired surrogates with /// [`U+FFFD REPLACEMENT CHARACTER`][std::char::REPLACEMENT_CHARACTER] (๏ฟฝ), as well as the /// positions of all characters. This is a lossy version of /// [`char_indices`][Self::char_indices]. /// /// The iterator yields tuples. The position is first, the [`char`][prim@char] is second. #[inline] #[must_use] pub fn char_indices_lossy(&self) -> CharIndicesLossyUtf16<'_> { CharIndicesLossyUtf16::new(self.as_slice()) } } impl U32CStr { /// Constructs a string reference from a [`char`] nul-terminated string pointer. /// /// This will scan for nul values beginning with `p`. The first nul value will be used as the /// nul terminator for the string, similar to how libc string functions such as `strlen` work. 
/// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid or has a /// nul terminator, and the function could scan past the underlying buffer. /// /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts]. /// In particular, the returned string reference *must not be mutated* for the duration of /// lifetime `'a`, except inside an [`UnsafeCell`][std::cell::UnsafeCell]. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent accidental /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the /// context, such as by providing a helper function taking the lifetime of a host value for the /// string, or by explicit annotation. #[inline] #[must_use] pub unsafe fn from_char_ptr_str<'a>(p: *const char) -> &'a Self { Self::from_ptr_str(p as *const u32) } /// Constructs a mutable string reference from a mutable [`char`] nul-terminated string pointer. /// /// This will scan for nul values beginning with `p`. The first nul value will be used as the /// nul terminator for the string, similar to how libc string functions such as `strlen` work. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid or has a /// nul terminator, and the function could scan past the underlying buffer. /// /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts_mut]. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent accidental /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the /// context, such as by providing a helper function taking the lifetime of a host value for the /// string, or by explicit annotation. 
#[inline] #[must_use] pub unsafe fn from_char_ptr_str_mut<'a>(p: *mut char) -> &'a mut Self { Self::from_ptr_str_mut(p as *mut u32) } /// Constructs a string reference from a [`char`] pointer and a length. /// /// The `len` argument is the number of elements, **not** the number of bytes, and does /// **not** include the nul terminator of the string. Thus, a `len` of 0 is valid and means /// that `p` is a pointer directly to the nul terminator of the string. /// /// # Errors /// /// This will scan the pointer string for an interior nul value and error if one is found /// before the nul terminator at `len` offset. To avoid scanning for interior nuls, /// [`from_ptr_unchecked`][Self::from_ptr_unchecked] may be used instead. /// /// An error is returned if the value at `len` offset is not a nul terminator. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid for `len + /// 1` elements. /// /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts]. /// In particular, the returned string reference *must not be mutated* for the duration of /// lifetime `'a`, except inside an [`UnsafeCell`][std::cell::UnsafeCell]. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent accidental /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the /// context, such as by providing a helper function taking the lifetime of a host value for the /// string, or by explicit annotation. pub unsafe fn from_char_ptr<'a>(p: *const char, len: usize) -> Result<&'a Self, NulError> { Self::from_ptr(p as *const u32, len) } /// Constructs a mutable string reference from a mutable [`char`] pointer and a length. /// /// The `len` argument is the number of elements, **not** the number of bytes, and does /// **not** include the nul terminator of the string. 
Thus, a `len` of 0 is valid and means /// that `p` is a pointer directly to the nul terminator of the string. /// /// # Errors /// /// This will scan the pointer string for an interior nul value and error if one is found /// before the nul terminator at `len` offset. To avoid scanning for interior nuls, /// [`from_ptr_unchecked_mut`][Self::from_ptr_unchecked_mut] may be used instead. /// /// An error is returned if the value at `len` offset is not a nul terminator. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid for `len + /// 1` elements. /// /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts_mut]. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent accidental /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the /// context, such as by providing a helper function taking the lifetime of a host value for the /// string, or by explicit annotation. pub unsafe fn from_char_ptr_mut<'a>( p: *mut char, len: usize, ) -> Result<&'a mut Self, NulError> { Self::from_ptr_mut(p as *mut u32, len) } /// Constructs a string reference from a [`char`] pointer and a length, truncating at the first /// nul terminator. /// /// The `len` argument is the number of elements, **not** the number of bytes. This will scan /// for nul values beginning with `p` until offset `len`. The first nul value will be used as /// the nul terminator for the string, ignoring any remaining values left before `len`. /// /// # Errors /// /// If no nul terminator is found after `len` + 1 elements, an error is returned. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid or has a /// nul terminator, and the function could scan past the underlying buffer. 
/// /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts]. /// In particular, the returned string reference *must not be mutated* for the duration of /// lifetime `'a`, except inside an [`UnsafeCell`][std::cell::UnsafeCell]. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent accidental /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the /// context, such as by providing a helper function taking the lifetime of a host value for the /// string, or by explicit annotation. pub unsafe fn from_char_ptr_truncate<'a>( p: *const char, len: usize, ) -> Result<&'a Self, MissingNulTerminator> { Self::from_ptr_truncate(p as *const u32, len) } /// Constructs a mutable string reference from a mutable [`char`] pointer and a length, /// truncating at the first nul terminator. /// /// The `len` argument is the number of elements, **not** the number of bytes. This will scan /// for nul values beginning with `p` until offset `len`. The first nul value will be used as /// the nul terminator for the string, ignoring any remaining values left before `len`. /// /// # Errors /// /// If no nul terminator is found after `len` + 1 elements, an error is returned. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid or has a /// nul terminator, and the function could scan past the underlying buffer. /// /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts_mut]. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. 
    /// To prevent accidental
    /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the
    /// context, such as by providing a helper function taking the lifetime of a host value for the
    /// string, or by explicit annotation.
    pub unsafe fn from_char_ptr_truncate_mut<'a>(
        p: *mut char,
        len: usize,
    ) -> Result<&'a mut Self, MissingNulTerminator> {
        Self::from_ptr_truncate_mut(p as *mut u32, len)
    }

    /// Constructs a string reference from a [`char`] pointer and a length without checking for any
    /// nul values.
    ///
    /// The `len` argument is the number of elements, **not** the number of bytes, and does
    /// **not** include the nul terminator of the string. Thus, a `len` of 0 is valid and means
    /// that `p` is a pointer directly to the nul terminator of the string.
    ///
    /// # Safety
    ///
    /// This function is unsafe as there is no guarantee that the given pointer is valid for `len +
    /// 1` elements, nor that it has a terminating nul value.
    ///
    /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts].
    /// In particular, the returned string reference *must not be mutated* for the duration of
    /// lifetime `'a`, except inside an [`UnsafeCell`][std::cell::UnsafeCell].
    ///
    /// The interior values of the pointer are not scanned for nul. Any interior nul values or
    /// a missing nul terminator at pointer offset `len` will result in an invalid string slice.
    ///
    /// # Panics
    ///
    /// This function panics if `p` is null.
    ///
    /// # Caveat
    ///
    /// The lifetime for the returned string is inferred from its usage. To prevent accidental
    /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the
    /// context, such as by providing a helper function taking the lifetime of a host value for the
    /// string, or by explicit annotation.
    #[inline]
    #[must_use]
    pub unsafe fn from_char_ptr_unchecked<'a>(p: *const char, len: usize) -> &'a Self {
        Self::from_ptr_unchecked(p as *const u32, len)
    }

    /// Constructs a mutable string reference from a mutable [`char`] pointer and a length without
    /// checking for any nul values.
    ///
    /// The `len` argument is the number of elements, **not** the number of bytes, and does
    /// **not** include the nul terminator of the string. Thus, a `len` of 0 is valid and means
    /// that `p` is a pointer directly to the nul terminator of the string.
    ///
    /// # Safety
    ///
    /// This function is unsafe as there is no guarantee that the given pointer is valid for `len +
    /// 1` elements, nor that it has a terminating nul value.
    ///
    /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts_mut].
    ///
    /// The interior values of the pointer are not scanned for nul. Any interior nul values or
    /// a missing nul terminator at pointer offset `len` will result in an invalid string slice.
    ///
    /// # Panics
    ///
    /// This function panics if `p` is null.
    ///
    /// # Caveat
    ///
    /// The lifetime for the returned string is inferred from its usage. To prevent accidental
    /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the
    /// context, such as by providing a helper function taking the lifetime of a host value for the
    /// string, or by explicit annotation.
    #[inline]
    #[must_use]
    pub unsafe fn from_char_ptr_unchecked_mut<'a>(p: *mut char, len: usize) -> &'a mut Self {
        Self::from_ptr_unchecked_mut(p as *mut u32, len)
    }

    /// Constructs a string reference from a [`char`] slice with a terminating nul, checking for
    /// invalid interior nul values.
    ///
    /// The slice must have at least one item, the nul terminator, even for an empty string.
    ///
    /// # Errors
    ///
    /// If there are nul values in the slice except for the last value, an error is returned.
    ///
    /// An error is also returned if the last value of the slice is not a nul terminator.
pub fn from_char_slice(slice: &[char]) -> Result<&Self, NulError> { let ptr: *const [char] = slice; Self::from_slice(unsafe { &*(ptr as *const [u32]) }) } /// Constructs a mutable string reference from a mutable [`char`] slice with a terminating nul, /// checking for invalid interior nul values. /// /// The slice must have at least one item, the nul terminator, even for an empty string. /// /// # Errors /// /// If there are nul values in the slice except for the last value, an error is returned. /// /// An error is also returned if the last value of the slice is not a nul terminator. pub fn from_char_slice_mut(slice: &mut [char]) -> Result<&mut Self, NulError> { let ptr: *mut [char] = slice; Self::from_slice_mut(unsafe { &mut *(ptr as *mut [u32]) }) } /// Constructs a string reference from a slice of [`char`] values, truncating at the first nul /// terminator. /// /// The slice will be scanned for nul values. When a nul value is found, it is treated as the /// terminator for the string, and the string slice will be truncated to that nul. /// /// # Errors /// /// If there are no nul values in the slice, an error is returned. #[inline] pub fn from_char_slice_truncate(slice: &[char]) -> Result<&Self, MissingNulTerminator> { let ptr: *const [char] = slice; Self::from_slice_truncate(unsafe { &*(ptr as *const [u32]) }) } /// Constructs a mutable string reference from a mutable slice of [`char`] values, truncating at /// the first nul terminator. /// /// The slice will be scanned for nul values. When a nul value is found, it is treated as the /// terminator for the string, and the string slice will be truncated to that nul. /// /// # Errors /// /// If there are no nul values in the slice, an error is returned. 
#[inline] pub fn from_char_slice_truncate_mut( slice: &mut [char], ) -> Result<&mut Self, MissingNulTerminator> { let ptr: *mut [char] = slice; Self::from_slice_truncate_mut(unsafe { &mut *(ptr as *mut [u32]) }) } /// Constructs a string reference from a [`char`] slice without checking for a terminating or /// interior nul values. /// /// # Safety /// /// This function is unsafe because it can lead to invalid C string slice values when the slice /// is missing a terminating nul value or there are non-terminating interior nul values /// in the slice. In particular, an empty slice will result in an invalid string slice. #[inline] #[must_use] pub unsafe fn from_char_slice_unchecked(slice: &[char]) -> &Self { let ptr: *const [char] = slice; Self::from_slice_unchecked(&*(ptr as *const [u32])) } /// Constructs a mutable string reference from a mutable [`char`] slice without checking for a /// terminating or interior nul values. /// /// # Safety /// /// This function is unsafe because it can lead to invalid C string slice values when the slice /// is missing a terminating nul value or there are non-terminating interior nul values /// in the slice. In particular, an empty slice will result in an invalid string slice. #[inline] #[must_use] pub unsafe fn from_char_slice_unchecked_mut(slice: &mut [char]) -> &mut Self { let ptr: *mut [char] = slice; Self::from_slice_unchecked_mut(&mut *(ptr as *mut [u32])) } /// Decodes a string reference to an owned [`OsString`][std::ffi::OsString]. /// /// This makes a string copy of this reference. Since [`U32CStr`] makes no guarantees that it /// is valid UTF-32, there is no guarantee that the resulting [`OsString`][std::ffi::OsString] /// will be valid data. The [`OsString`][std::ffi::OsString] will **not** have a nul /// terminator. 
/// /// Note that the encoding of [`OsString`][std::ffi::OsString] is platform-dependent, so on /// some platforms this may make an encoding conversions, while on other platforms no changes to /// the string will be made. /// /// # Examples /// /// ```rust /// use widestring::U32CString; /// use std::ffi::OsString; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U32CString::from_str(s).unwrap(); /// // Create an OsString from the wide string /// let osstr = wstr.to_os_string(); /// /// assert_eq!(osstr, OsString::from(s)); /// ``` #[inline] #[cfg(feature = "std")] #[cfg_attr(docsrs, doc(cfg(feature = "std")))] #[must_use] pub fn to_os_string(&self) -> std::ffi::OsString { self.as_ustr().to_os_string() } /// Decodes the string reference to a [`String`] if it contains valid UTF-32 data. /// /// This method assumes this string is encoded as UTF-32 and attempts to decode it as such. It /// will **not* have a nul terminator. /// /// # Errors /// /// Returns an error if the string contains any invalid UTF-32 data. /// /// # Examples /// /// ```rust /// use widestring::U32CString; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U32CString::from_str(s).unwrap(); /// // Create a regular string from the wide string /// let s2 = wstr.to_string().unwrap(); /// /// assert_eq!(s2, s); /// ``` #[inline] #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] pub fn to_string(&self) -> Result { self.as_ustr().to_string() } /// Decodes the string reference to a [`String`] even if it is invalid UTF-32 data. /// /// This method assumes this string is encoded as UTF-16 and attempts to decode it as such. Any /// invalid sequences are replaced with /// [`U+FFFD REPLACEMENT CHARACTER`][core::char::REPLACEMENT_CHARACTER], which looks like this: /// ๏ฟฝ. It will **not* have a nul terminator. 
/// /// # Examples /// /// ```rust /// use widestring::U32CString; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U32CString::from_str(s).unwrap(); /// // Create a regular string from the wide string /// let s2 = wstr.to_string_lossy(); /// /// assert_eq!(s2, s); /// ``` #[inline] #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub fn to_string_lossy(&self) -> String { self.as_ustr().to_string_lossy() } /// Returns an iterator over the [`char`][prim@char]s of a string slice. /// /// As this string has no defined encoding, this method assumes the string is UTF-32. Since it /// may consist of invalid UTF-32, the iterator returned by this method /// is an iterator over `Result` instead of [`char`][prim@char]s /// directly. If you would like a lossy iterator over [`chars`][prim@char]s directly, instead /// use [`chars_lossy`][Self::chars_lossy]. /// /// It's important to remember that [`char`][prim@char] represents a Unicode Scalar Value, and /// may not match your idea of what a 'character' is. Iteration over grapheme clusters may be /// what you actually want. That functionality is not provided by by this crate. #[inline] #[must_use] pub fn chars(&self) -> CharsUtf32<'_> { CharsUtf32::new(self.as_slice()) } /// Returns a lossy iterator over the [`char`][prim@char]s of a string slice. /// /// As this string has no defined encoding, this method assumes the string is UTF-32. Since it /// may consist of invalid UTF-32, the iterator returned by this method will replace invalid /// data with /// [`U+FFFD REPLACEMENT CHARACTER`][std::char::REPLACEMENT_CHARACTER] (๏ฟฝ). This is a lossy /// version of [`chars`][Self::chars]. /// /// It's important to remember that [`char`][prim@char] represents a Unicode Scalar Value, and /// may not match your idea of what a 'character' is. Iteration over grapheme clusters may be /// what you actually want. That functionality is not provided by by this crate. 
#[inline] #[must_use] pub fn chars_lossy(&self) -> CharsLossyUtf32<'_> { CharsLossyUtf32::new(self.as_slice()) } /// Returns an iterator over the chars of a string slice, and their positions. /// /// As this string has no defined encoding, this method assumes the string is UTF-32. Since it /// may consist of invalid UTF-32, the iterator returned by this method is an iterator over /// `Result` as well as their positions, instead of /// [`char`][prim@char]s directly. If you would like a lossy indices iterator over /// [`chars`][prim@char]s directly, instead use /// [`char_indices_lossy`][Self::char_indices_lossy]. /// /// The iterator yields tuples. The position is first, the [`char`][prim@char] is second. #[inline] #[must_use] pub fn char_indices(&self) -> CharIndicesUtf32<'_> { CharIndicesUtf32::new(self.as_slice()) } /// Returns a lossy iterator over the chars of a string slice, and their positions. /// /// As this string slice may consist of invalid UTF-32, the iterator returned by this method /// will replace invalid values with /// [`U+FFFD REPLACEMENT CHARACTER`][std::char::REPLACEMENT_CHARACTER] (๏ฟฝ), as well as the /// positions of all characters. This is a lossy version of /// [`char_indices`][Self::char_indices]. /// /// The iterator yields tuples. The position is first, the [`char`][prim@char] is second. #[inline] #[must_use] pub fn char_indices_lossy(&self) -> CharIndicesLossyUtf32<'_> { CharIndicesLossyUtf32::new(self.as_slice()) } } impl core::fmt::Debug for U16CStr { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { crate::debug_fmt_u16(self.as_slice_with_nul(), f) } } impl core::fmt::Debug for U32CStr { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { crate::debug_fmt_u32(self.as_slice_with_nul(), f) } } /// Alias for [`U16CStr`] or [`U32CStr`] depending on platform. Intended to match typical C /// `wchar_t` size on platform. 
#[cfg(not(windows))] pub type WideCStr = U32CStr; /// Alias for [`U16CStr`] or [`U32CStr`] depending on platform. Intended to match typical C /// `wchar_t` size on platform. #[cfg(windows)] pub type WideCStr = U16CStr; /// Helper struct for printing wide C string values with [`format!`] and `{}`. /// /// A wide C string might contain ill-formed UTF encoding. This struct implements the /// [`Display`][std::fmt::Display] trait in a way that decoding the string is lossy but no heap /// allocations are performed, such as by [`to_string_lossy`][U16CStr::to_string_lossy]. It is /// created by the [`display`][U16CStr::display] method on [`U16CStr`] and [`U32CStr`]. /// /// By default, invalid Unicode data is replaced with /// [`U+FFFD REPLACEMENT CHARACTER`][std::char::REPLACEMENT_CHARACTER] (๏ฟฝ). If you wish to simply /// skip any invalid Uncode data and forego the replacement, you may use the /// [alternate formatting][std::fmt#sign0] with `{:#}`. pub struct Display<'a, S: ?Sized> { str: &'a S, } impl<'a> core::fmt::Debug for Display<'a, U16CStr> { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { core::fmt::Debug::fmt(&self.str, f) } } impl<'a> core::fmt::Debug for Display<'a, U32CStr> { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { core::fmt::Debug::fmt(&self.str, f) } } impl<'a> core::fmt::Display for Display<'a, U16CStr> { fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { for c in crate::decode_utf16_lossy(self.str.as_slice().iter().copied()) { // Allow alternate {:#} format which skips replacment chars entirely if c != core::char::REPLACEMENT_CHARACTER || !f.alternate() { f.write_char(c)?; } } Ok(()) } } impl<'a> core::fmt::Display for Display<'a, U32CStr> { fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { for c in crate::decode_utf32_lossy(self.str.as_slice().iter().copied()) { // Allow alternate {:#} format which skips replacment chars entirely if c != 
                core::char::REPLACEMENT_CHARACTER || !f.alternate() {
                f.write_char(c)?;
            }
        }
        Ok(())
    }
}

// widestring-1.1.0/src/ucstring.rs

//! C-style owned, growable wide strings.
//!
//! This module contains wide C strings and related types.

use crate::{error::ContainsNul, U16CStr, U16Str, U16String, U32CStr, U32Str, U32String};
#[allow(unused_imports)]
use alloc::{
    borrow::{Cow, ToOwned},
    boxed::Box,
    vec::Vec,
};
use core::{
    borrow::{Borrow, BorrowMut},
    cmp,
    mem::{self, ManuallyDrop},
    ops::{Deref, DerefMut, Index},
    ptr,
    slice::{self, SliceIndex},
};

macro_rules! ucstring_common_impl {
    {
        $(#[$ucstring_meta:meta])*
        struct $ucstring:ident([$uchar:ty]);
        type UCStr = $ucstr:ident;
        type UString = $ustring:ident;
        type UStr = $ustr:ident;

        $(#[$from_vec_meta:meta])*
        fn from_vec() -> {}

        $(#[$from_vec_truncate_meta:meta])*
        fn from_vec_truncate() -> {}

        $(#[$into_boxed_ucstr_meta:meta])*
        fn into_boxed_ucstr() -> {}
    } => {
        $(#[$ucstring_meta])*
        #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
        #[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
        pub struct $ucstring {
            pub(crate) inner: Box<[$uchar]>,
        }

        impl $ucstring {
            /// The nul terminator character value.
            pub const NUL_TERMINATOR: $uchar = 0;

            /// Constructs a new empty wide C string.
            #[inline]
            #[must_use]
            pub fn new() -> Self {
                unsafe { Self::from_vec_unchecked(Vec::new()) }
            }

            $(#[$from_vec_meta])*
            pub fn from_vec(v: impl Into<Vec<$uchar>>) -> Result<Self, ContainsNul<$uchar>> {
                let v = v.into();
                // Check for nul vals, ignoring nul terminator
                match v.iter().position(|&val| val == Self::NUL_TERMINATOR) {
                    None => Ok(unsafe { Self::from_vec_unchecked(v) }),
                    Some(pos) if pos == v.len() - 1 => Ok(unsafe {
                        Self::from_vec_unchecked(v)
                    }),
                    Some(pos) => Err(ContainsNul::new(pos, v)),
                }
            }

            $(#[$from_vec_truncate_meta])*
            #[must_use]
            pub fn from_vec_truncate(v: impl Into<Vec<$uchar>>) -> Self {
                let mut v = v.into();
                // Check for nul vals
                if let Some(pos) = v.iter().position(|&val| val == Self::NUL_TERMINATOR) {
                    v.truncate(pos + 1);
                }
                unsafe { Self::from_vec_unchecked(v) }
            }

            /// Constructs a wide C string from a vector without checking for interior nul values.
            ///
            /// A terminating nul value will be appended if the vector does not already have a
            /// terminating nul.
            ///
            /// # Safety
            ///
            /// This method is equivalent to [`from_vec`][Self::from_vec] except that no runtime
            /// assertion is made that `v` contains no interior nul values. Providing a vector with
            /// any nul values that are not the last value in the vector will result in an invalid
            /// C string.
            #[must_use]
            pub unsafe fn from_vec_unchecked(v: impl Into<Vec<$uchar>>) -> Self {
                let mut v = v.into();
                match v.last() {
                    None => v.push(Self::NUL_TERMINATOR),
                    Some(&c) if c != Self::NUL_TERMINATOR => v.push(Self::NUL_TERMINATOR),
                    Some(_) => (),
                }
                Self {
                    inner: v.into_boxed_slice(),
                }
            }

            /// Constructs a wide C string from anything that can be converted to a wide string
            /// slice.
            ///
            /// The string will be scanned for invalid interior nul values.
            ///
            /// # Errors
            ///
            /// This function will return an error if the data contains a nul value that is not the
            /// terminating nul.
            /// The returned error will contain a [`Vec`] as well as the position of the nul value.
            #[inline]
            pub fn from_ustr(s: impl AsRef<$ustr>) -> Result<Self, ContainsNul<$uchar>> {
                Self::from_vec(s.as_ref().as_slice())
            }

            /// Constructs a wide C string from anything that can be converted to a wide string
            /// slice, truncating at the first nul terminator.
            ///
            /// The string will be truncated at the first nul value in the string.
            #[inline]
            #[must_use]
            pub fn from_ustr_truncate(s: impl AsRef<$ustr>) -> Self {
                Self::from_vec_truncate(s.as_ref().as_slice())
            }

            /// Constructs a wide C string from anything that can be converted to a wide string
            /// slice, without scanning for invalid nul values.
            ///
            /// # Safety
            ///
            /// This method is equivalent to [`from_ustr`][Self::from_ustr] except that no runtime
            /// assertion is made that `s` contains no interior nul values. Providing a string with
            /// any nul values that are not the last value in the string will result in an invalid
            /// C string.
            #[inline]
            #[must_use]
            pub unsafe fn from_ustr_unchecked(s: impl AsRef<$ustr>) -> Self {
                Self::from_vec_unchecked(s.as_ref().as_slice())
            }

            /// Constructs a new wide C string copied from a nul-terminated string pointer.
            ///
            /// This will scan for nul values beginning with `p`. The first nul value will be used
            /// as the nul terminator for the string, similar to how libc string functions such as
            /// `strlen` work.
            ///
            /// If you wish to avoid copying the string pointer, use [`U16CStr::from_ptr_str`] or
            /// [`U32CStr::from_ptr_str`] instead.
            ///
            /// # Safety
            ///
            /// This function is unsafe as there is no guarantee that the given pointer is valid or
            /// has a nul terminator, and the function could scan past the underlying buffer.
            ///
            /// In addition, the data must meet the safety conditions of
            /// [std::slice::from_raw_parts].
            ///
            /// # Panics
            ///
            /// This function panics if `p` is null.
            ///
            /// # Caveat
            ///
            /// The lifetime for the returned string is inferred from its usage.
            /// To prevent accidental misuse, it's suggested to tie the lifetime to whichever
            /// source lifetime is safe in the context, such as by providing a helper function
            /// taking the lifetime of a host value for the string, or by explicit annotation.
            #[inline]
            #[must_use]
            pub unsafe fn from_ptr_str(p: *const $uchar) -> Self {
                $ucstr::from_ptr_str(p).to_ucstring()
            }

            /// Constructs a wide C string copied from a pointer and a length, checking for invalid
            /// interior nul values.
            ///
            /// The `len` argument is the number of elements, **not** the number of bytes, and does
            /// **not** include the nul terminator of the string. If `len` is `0`, `p` is allowed to
            /// be a null pointer.
            ///
            /// The resulting string will always be nul-terminated even if the pointer data is not.
            ///
            /// # Errors
            ///
            /// This will scan the pointer string for an interior nul value and error if one is
            /// found. To avoid scanning for interior nuls,
            /// [`from_ptr_unchecked`][Self::from_ptr_unchecked] may be used instead.
            /// The returned error will contain a [`Vec`] as well as the position of the nul value.
            ///
            /// # Safety
            ///
            /// This function is unsafe as there is no guarantee that the given pointer is valid for
            /// `len` elements.
            ///
            /// In addition, the data must meet the safety conditions of
            /// [std::slice::from_raw_parts].
            ///
            /// # Panics
            ///
            /// Panics if `len` is greater than 0 but `p` is a null pointer.
            pub unsafe fn from_ptr(
                p: *const $uchar,
                len: usize,
            ) -> Result<Self, ContainsNul<$uchar>> {
                if len == 0 {
                    return Ok(Self::default());
                }
                assert!(!p.is_null());
                let slice = slice::from_raw_parts(p, len);
                Self::from_vec(slice)
            }

            /// Constructs a wide C string copied from a pointer and a length, truncating at the
            /// first nul terminator.
            ///
            /// The `len` argument is the number of elements, **not** the number of bytes. This will
            /// scan for nul values beginning with `p` until offset `len`.
            /// The first nul value will be used as the nul terminator for the string, ignoring
            /// any remaining values left before `len`. If no nul value is found, the whole string
            /// of length `len` is used, and a new nul-terminator will be added to the resulting
            /// string. If `len` is `0`, `p` is allowed to be a null pointer.
            ///
            /// # Safety
            ///
            /// This function is unsafe as there is no guarantee that the given pointer is valid for
            /// `len` elements.
            ///
            /// In addition, the data must meet the safety conditions of
            /// [std::slice::from_raw_parts].
            ///
            /// # Panics
            ///
            /// Panics if `len` is greater than 0 but `p` is a null pointer.
            #[must_use]
            pub unsafe fn from_ptr_truncate(p: *const $uchar, len: usize) -> Self {
                if len == 0 {
                    return Self::default();
                }
                assert!(!p.is_null());
                let slice = slice::from_raw_parts(p, len);
                Self::from_vec_truncate(slice)
            }

            /// Constructs a wide C string copied from a pointer and a length without checking for
            /// any nul values.
            ///
            /// The `len` argument is the number of elements, **not** the number of bytes, and does
            /// **not** include the nul terminator of the string. If `len` is `0`, `p` is allowed to
            /// be a null pointer.
            ///
            /// The resulting string will always be nul-terminated even if the pointer data is not.
            ///
            /// # Safety
            ///
            /// This function is unsafe as there is no guarantee that the given pointer is valid for
            /// `len` elements.
            ///
            /// In addition, the data must meet the safety conditions of
            /// [std::slice::from_raw_parts].
            ///
            /// The interior values of the pointer are not scanned for nul. Any interior nul values
            /// will result in an invalid C string.
            ///
            /// # Panics
            ///
            /// Panics if `len` is greater than 0 but `p` is a null pointer.
            #[must_use]
            pub unsafe fn from_ptr_unchecked(p: *const $uchar, len: usize) -> Self {
                if len == 0 {
                    return Self::default();
                }
                assert!(!p.is_null());
                let slice = slice::from_raw_parts(p, len);
                Self::from_vec_unchecked(slice)
            }

            /// Converts to a wide C string slice.
#[inline] #[must_use] pub fn as_ucstr(&self) -> &$ucstr { $ucstr::from_inner(&self.inner) } /// Converts to a mutable wide C string slice. #[inline] #[must_use] pub fn as_mut_ucstr(&mut self) -> &mut $ucstr { $ucstr::from_inner_mut(&mut self.inner) } /// Converts this string into a wide string without a nul terminator. /// /// The resulting string will **not** contain a nul-terminator, and will contain no /// other nul values. #[inline] #[must_use] pub fn into_ustring(self) -> $ustring { $ustring::from_vec(self.into_vec()) } /// Converts this string into a wide string with a nul terminator. /// /// The resulting vector will contain a nul-terminator and no interior nul values. #[inline] #[must_use] pub fn into_ustring_with_nul(self) -> $ustring { $ustring::from_vec(self.into_vec_with_nul()) } /// Converts the string into a [`Vec`] without a nul terminator, consuming the string in /// the process. /// /// The resulting vector will **not** contain a nul-terminator, and will contain no /// other nul values. #[inline] #[must_use] pub fn into_vec(self) -> Vec<$uchar> { let mut v = self.into_inner().into_vec(); v.pop(); v } /// Converts the string into a [`Vec`], consuming the string in the process. /// /// The resulting vector will contain a nul-terminator and no interior nul values. #[inline] #[must_use] pub fn into_vec_with_nul(self) -> Vec<$uchar> { self.into_inner().into_vec() } /// Transfers ownership of the string to a C caller. /// /// # Safety /// /// The pointer _must_ be returned to Rust and reconstituted using /// [`from_raw`][Self::from_raw] to be properly deallocated. Specifically, one should /// _not_ use the standard C `free` function to deallocate this string. Failure to call /// [`from_raw`][Self::from_raw] will lead to a memory leak. #[inline] #[must_use] pub fn into_raw(self) -> *mut $uchar { Box::into_raw(self.into_inner()) as *mut $uchar } /// Retakes ownership of a wide C string that was transferred to C. 
/// /// This should only be used in combination with [`into_raw`][Self::into_raw]. To /// construct a new wide C string from a pointer, use /// [`from_ptr_str`][Self::from_ptr_str]. /// /// # Safety /// /// This should only ever be called with a pointer that was earlier obtained by calling /// [`into_raw`][Self::into_raw]. Additionally, the length of the string will be /// recalculated from the pointer by scanning for the nul-terminator. /// /// # Panics /// /// Panics if `p` is a null pointer. #[must_use] pub unsafe fn from_raw(p: *mut $uchar) -> Self { assert!(!p.is_null()); let mut i: isize = 0; while *p.offset(i) != Self::NUL_TERMINATOR { i += 1; } let slice = slice::from_raw_parts_mut(p, i as usize + 1); Self { inner: Box::from_raw(slice), } } $(#[$into_boxed_ucstr_meta])* #[inline] #[must_use] pub fn into_boxed_ucstr(self) -> Box<$ucstr> { unsafe { Box::from_raw(Box::into_raw(self.into_inner()) as *mut $ucstr) } } /// Bypass "move out of struct which implements [`Drop`] trait" restriction. 
fn into_inner(self) -> Box<[$uchar]> { let v = ManuallyDrop::new(self); unsafe { ptr::read(&v.inner) } } } impl AsMut<$ucstr> for $ucstring { fn as_mut(&mut self) -> &mut $ucstr { self.as_mut_ucstr() } } impl AsRef<$ucstr> for $ucstring { #[inline] fn as_ref(&self) -> &$ucstr { self.as_ucstr() } } impl AsRef<[$uchar]> for $ucstring { #[inline] fn as_ref(&self) -> &[$uchar] { self.as_slice() } } impl AsRef<$ustr> for $ucstring { #[inline] fn as_ref(&self) -> &$ustr { self.as_ustr() } } impl Borrow<$ucstr> for $ucstring { #[inline] fn borrow(&self) -> &$ucstr { self.as_ucstr() } } impl BorrowMut<$ucstr> for $ucstring { #[inline] fn borrow_mut(&mut self) -> &mut $ucstr { self.as_mut_ucstr() } } impl Default for $ucstring { #[inline] fn default() -> Self { unsafe { Self::from_vec_unchecked(Vec::new()) } } } impl Deref for $ucstring { type Target = $ucstr; #[inline] fn deref(&self) -> &$ucstr { self.as_ucstr() } } impl DerefMut for $ucstring { #[inline] fn deref_mut(&mut self) -> &mut Self::Target { self.as_mut_ucstr() } } // Turns this `UCString` into an empty string to prevent // memory unsafe code from working by accident. Inline // to prevent LLVM from optimizing it away in debug builds. 
        impl Drop for $ucstring {
            #[inline]
            fn drop(&mut self) {
                unsafe {
                    *self.inner.get_unchecked_mut(0) = Self::NUL_TERMINATOR;
                }
            }
        }

        impl From<$ucstring> for Vec<$uchar> {
            #[inline]
            fn from(value: $ucstring) -> Self {
                value.into_vec()
            }
        }

        impl<'a> From<$ucstring> for Cow<'a, $ucstr> {
            #[inline]
            fn from(s: $ucstring) -> Cow<'a, $ucstr> {
                Cow::Owned(s)
            }
        }

        #[cfg(feature = "std")]
        impl From<$ucstring> for std::ffi::OsString {
            #[inline]
            fn from(s: $ucstring) -> std::ffi::OsString {
                s.to_os_string()
            }
        }

        impl From<$ucstring> for $ustring {
            #[inline]
            fn from(s: $ucstring) -> Self {
                s.to_ustring()
            }
        }

        impl<'a, T: ?Sized + AsRef<$ucstr>> From<&'a T> for $ucstring {
            #[inline]
            fn from(s: &'a T) -> Self {
                s.as_ref().to_ucstring()
            }
        }

        impl<'a> From<&'a $ucstr> for Cow<'a, $ucstr> {
            #[inline]
            fn from(s: &'a $ucstr) -> Cow<'a, $ucstr> {
                Cow::Borrowed(s)
            }
        }

        impl From<Box<$ucstr>> for $ucstring {
            #[inline]
            fn from(s: Box<$ucstr>) -> Self {
                s.into_ucstring()
            }
        }

        impl From<$ucstring> for Box<$ucstr> {
            #[inline]
            fn from(s: $ucstring) -> Box<$ucstr> {
                s.into_boxed_ucstr()
            }
        }

        impl<I> Index<I> for $ucstring
        where
            I: SliceIndex<[$uchar], Output = [$uchar]>,
        {
            type Output = $ustr;

            #[inline]
            fn index(&self, index: I) -> &Self::Output {
                &self.as_ucstr()[index]
            }
        }

        impl PartialEq<$ustr> for $ucstring {
            #[inline]
            fn eq(&self, other: &$ustr) -> bool {
                self.as_ucstr() == other
            }
        }

        impl PartialEq<$ucstr> for $ucstring {
            #[inline]
            fn eq(&self, other: &$ucstr) -> bool {
                self.as_ucstr() == other
            }
        }

        impl<'a> PartialEq<&'a $ustr> for $ucstring {
            #[inline]
            fn eq(&self, other: &&'a $ustr) -> bool {
                self.as_ucstr() == *other
            }
        }

        impl<'a> PartialEq<&'a $ucstr> for $ucstring {
            #[inline]
            fn eq(&self, other: &&'a $ucstr) -> bool {
                self.as_ucstr() == *other
            }
        }

        impl<'a> PartialEq<Cow<'a, $ustr>> for $ucstring {
            #[inline]
            fn eq(&self, other: &Cow<'a, $ustr>) -> bool {
                self.as_ucstr() == other.as_ref()
            }
        }

        impl<'a> PartialEq<Cow<'a, $ucstr>> for $ucstring {
            #[inline]
            fn eq(&self, other: &Cow<'a, $ucstr>) -> bool {
                self.as_ucstr() == other.as_ref()
            }
        }

        impl
        PartialEq<$ustring> for $ucstring {
            #[inline]
            fn eq(&self, other: &$ustring) -> bool {
                self.as_ustr() == other.as_ustr()
            }
        }

        impl PartialEq<$ucstring> for $ustr {
            #[inline]
            fn eq(&self, other: &$ucstring) -> bool {
                self == other.as_ustr()
            }
        }

        impl PartialEq<$ucstring> for $ucstr {
            #[inline]
            fn eq(&self, other: &$ucstring) -> bool {
                self == other.as_ucstr()
            }
        }

        impl PartialEq<$ucstring> for &$ucstr {
            #[inline]
            fn eq(&self, other: &$ucstring) -> bool {
                self == other.as_ucstr()
            }
        }

        impl PartialEq<$ucstring> for &$ustr {
            #[inline]
            fn eq(&self, other: &$ucstring) -> bool {
                self == other.as_ucstr()
            }
        }

        impl PartialOrd<$ustr> for $ucstring {
            #[inline]
            fn partial_cmp(&self, other: &$ustr) -> Option<cmp::Ordering> {
                self.as_ucstr().partial_cmp(other)
            }
        }

        impl PartialOrd<$ucstr> for $ucstring {
            #[inline]
            fn partial_cmp(&self, other: &$ucstr) -> Option<cmp::Ordering> {
                self.as_ucstr().partial_cmp(other)
            }
        }

        impl<'a> PartialOrd<&'a $ustr> for $ucstring {
            #[inline]
            fn partial_cmp(&self, other: &&'a $ustr) -> Option<cmp::Ordering> {
                self.as_ucstr().partial_cmp(*other)
            }
        }

        impl<'a> PartialOrd<&'a $ucstr> for $ucstring {
            #[inline]
            fn partial_cmp(&self, other: &&'a $ucstr) -> Option<cmp::Ordering> {
                self.as_ucstr().partial_cmp(*other)
            }
        }

        impl<'a> PartialOrd<Cow<'a, $ustr>> for $ucstring {
            #[inline]
            fn partial_cmp(&self, other: &Cow<'a, $ustr>) -> Option<cmp::Ordering> {
                self.as_ucstr().partial_cmp(other.as_ref())
            }
        }

        impl<'a> PartialOrd<Cow<'a, $ucstr>> for $ucstring {
            #[inline]
            fn partial_cmp(&self, other: &Cow<'a, $ucstr>) -> Option<cmp::Ordering> {
                self.as_ucstr().partial_cmp(other.as_ref())
            }
        }

        impl PartialOrd<$ustring> for $ucstring {
            #[inline]
            fn partial_cmp(&self, other: &$ustring) -> Option<cmp::Ordering> {
                self.as_ustr().partial_cmp(other.as_ustr())
            }
        }

        impl ToOwned for $ucstr {
            type Owned = $ucstring;

            #[inline]
            fn to_owned(&self) -> $ucstring {
                self.to_ucstring()
            }
        }
    };
}

ucstring_common_impl! {
    /// An owned, mutable C-style 16-bit wide string for FFI that is nul-aware and nul-terminated.
    ///
    /// The string slice of a [`U16CString`] is [`U16CStr`].
    ///
    /// [`U16CString`] strings do not have a defined encoding. While it is sometimes
    /// assumed that they contain possibly invalid or ill-formed UTF-16 data, they may be used for
    /// any wide encoded string.
    ///
    /// # Nul termination
    ///
    /// [`U16CString`] is aware of nul (`0`) values. Unless unchecked conversions are used, all
    /// [`U16CString`] strings end with a nul-terminator in the underlying buffer and contain no
    /// internal nul values. These strings are intended to be used with FFI functions that require
    /// nul-terminated strings.
    ///
    /// Because of the nul termination requirement, multiple classes of methods are provided for
    /// constructing a [`U16CString`] under various scenarios. By default, methods such as
    /// [`from_ptr`][Self::from_ptr] and [`from_vec`][Self::from_vec] return an error if the input
    /// contains any interior nul values before the terminator. For these methods, the input does
    /// not need to contain the terminating nul; it is added if it does not exist.
    ///
    /// `_truncate` methods on the other hand, such as
    /// [`from_ptr_truncate`][Self::from_ptr_truncate] and
    /// [`from_vec_truncate`][Self::from_vec_truncate], construct a string that terminates with
    /// the first nul value encountered in the string, and do not return an error. They
    /// automatically ensure the string is terminated in a nul value even if it was not originally.
    ///
    /// Finally, unsafe `_unchecked` variants of these methods, such as
    /// [`from_ptr_unchecked`][Self::from_ptr_unchecked] and
    /// [`from_vec_unchecked`][Self::from_vec_unchecked] allow bypassing any checks for nul
    /// values, when the input has already been ensured to have no interior nul values. Again, any
    /// missing nul terminator is automatically added if necessary.
    ///
    /// # Examples
    ///
    /// The easiest way to use [`U16CString`] outside of FFI is with the
    /// [`u16cstr!`][crate::u16cstr] macro to convert string literals into nul-terminated UTF-16
    /// strings at compile time:
    ///
    /// ```
    /// use widestring::{u16cstr, U16CString};
    /// let hello = U16CString::from(u16cstr!("Hello, world!"));
    /// ```
    ///
    /// You can also convert any [`u16`] slice or vector directly:
    ///
    /// ```
    /// use widestring::{u16cstr, U16CString};
    ///
    /// let sparkle_heart = vec![0xd83d, 0xdc96];
    /// let sparkle_heart = U16CString::from_vec(sparkle_heart).unwrap();
    /// // The string will add the missing nul terminator
    ///
    /// assert_eq!(u16cstr!("💖"), sparkle_heart);
    ///
    /// // This unpaired UTF-16 surrogate is invalid UTF-16, but is perfectly valid in U16CString
    /// let malformed_utf16 = vec![0xd83d, 0x0];
    /// let s = U16CString::from_vec(malformed_utf16).unwrap();
    ///
    /// assert_eq!(s.len(), 1); // Note the terminating nul is not counted in the length
    /// ```
    ///
    /// When working with a FFI, it is useful to create a [`U16CString`] from a pointer:
    ///
    /// ```
    /// use widestring::{u16cstr, U16CString};
    ///
    /// let sparkle_heart = [0xd83d, 0xdc96, 0x0];
    /// let s = unsafe {
    ///     // Note the string and pointer length does not include the nul terminator
    ///     U16CString::from_ptr(sparkle_heart.as_ptr(), sparkle_heart.len() - 1).unwrap()
    /// };
    /// assert_eq!(u16cstr!("💖"), s);
    ///
    /// // Alternatively, if the length of the pointer is unknown but definitely terminates in nul,
    /// // a C-style string version can be used
    /// let s = unsafe { U16CString::from_ptr_str(sparkle_heart.as_ptr()) };
    ///
    /// assert_eq!(u16cstr!("💖"), s);
    /// ```
    struct U16CString([u16]);

    type UCStr = U16CStr;
    type UString = U16String;
    type UStr = U16Str;

    /// Constructs a wide C string from a container of wide character data.
    ///
    /// This method will consume the provided data and use the underlying elements to
    /// construct a new string.
The data will be scanned for invalid interior nul values. /// /// # Errors /// /// This function will return an error if the data contains a nul value that is not the /// terminating nul. /// The returned error will contain the original [`Vec`] as well as the position of the /// nul value. /// /// # Examples /// /// ```rust /// use widestring::U16CString; /// let v = vec![84u16, 104u16, 101u16]; // 'T' 'h' 'e' /// # let cloned = v.clone(); /// // Create a wide string from the vector /// let wcstr = U16CString::from_vec(v).unwrap(); /// # assert_eq!(wcstr.into_vec(), cloned); /// ``` /// /// Empty vectors are valid and will return an empty string with a nul terminator: /// /// ``` /// use widestring::U16CString; /// let wcstr = U16CString::from_vec(vec![]).unwrap(); /// assert_eq!(wcstr, U16CString::default()); /// ``` /// /// The following example demonstrates errors from nul values in a vector. /// /// ```rust /// use widestring::U16CString; /// let v = vec![84u16, 0u16, 104u16, 101u16]; // 'T' NUL 'h' 'e' /// // Create a wide string from the vector /// let res = U16CString::from_vec(v); /// assert!(res.is_err()); /// assert_eq!(res.err().unwrap().nul_position(), 1); /// ``` fn from_vec() -> {} /// Constructs a wide C string from a container of wide character data, truncating at /// the first nul terminator. /// /// The string will be truncated at the first nul value in the data. /// /// # Examples /// /// ```rust /// use widestring::U16CString; /// let v = vec![84u16, 104u16, 101u16, 0u16]; // 'T' 'h' 'e' NUL /// # let cloned = v[..3].to_owned(); /// // Create a wide string from the vector /// let wcstr = U16CString::from_vec_truncate(v); /// # assert_eq!(wcstr.into_vec(), cloned); /// ``` fn from_vec_truncate() -> {} /// Converts this wide C string into a boxed wide C string slice. 
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::{U16CString, U16CStr};
    ///
    /// let mut v = vec![102u16, 111u16, 111u16]; // "foo"
    /// let c_string = U16CString::from_vec(v.clone()).unwrap();
    /// let boxed = c_string.into_boxed_ucstr();
    /// v.push(0);
    /// assert_eq!(&*boxed, U16CStr::from_slice(&v).unwrap());
    /// ```
    fn into_boxed_ucstr() -> {}
}

ucstring_common_impl! {
    /// An owned, mutable C-style 32-bit wide string for FFI that is nul-aware and nul-terminated.
    ///
    /// The string slice of a [`U32CString`] is [`U32CStr`].
    ///
    /// [`U32CString`] strings do not have a defined encoding. While it is sometimes
    /// assumed that they contain possibly invalid or ill-formed UTF-32 data, they may be used for
    /// any wide encoded string.
    ///
    /// # Nul termination
    ///
    /// [`U32CString`] is aware of nul (`0`) values. Unless unchecked conversions are used, all
    /// [`U32CString`] strings end with a nul-terminator in the underlying buffer and contain no
    /// internal nul values. These strings are intended to be used with FFI functions that require
    /// nul-terminated strings.
    ///
    /// Because of the nul termination requirement, multiple classes of methods are provided for
    /// constructing a [`U32CString`] under various scenarios. By default, methods such as
    /// [`from_ptr`][Self::from_ptr] and [`from_vec`][Self::from_vec] return an error if the input
    /// contains any interior nul values before the terminator. For these methods, the input does
    /// not need to contain the terminating nul; it is added if it does not exist.
    ///
    /// `_truncate` methods on the other hand, such as
    /// [`from_ptr_truncate`][Self::from_ptr_truncate] and
    /// [`from_vec_truncate`][Self::from_vec_truncate], construct a string that terminates with
    /// the first nul value encountered in the string, and do not return an error. They
    /// automatically ensure the string is terminated in a nul value even if it was not originally.
    ///
    /// Finally, unsafe `_unchecked` variants of these methods, such as
    /// [`from_ptr_unchecked`][Self::from_ptr_unchecked] and
    /// [`from_vec_unchecked`][Self::from_vec_unchecked] allow bypassing any checks for nul
    /// values, when the input has already been ensured to have no interior nul values. Again, any
    /// missing nul terminator is automatically added if necessary.
    ///
    /// # Examples
    ///
    /// The easiest way to use [`U32CString`] outside of FFI is with the
    /// [`u32cstr!`][crate::u32cstr] macro to convert string literals into nul-terminated UTF-32
    /// strings at compile time:
    ///
    /// ```
    /// use widestring::{u32cstr, U32CString};
    /// let hello = U32CString::from(u32cstr!("Hello, world!"));
    /// ```
    ///
    /// You can also convert any [`u32`] slice or vector directly:
    ///
    /// ```
    /// use widestring::{u32cstr, U32CString};
    ///
    /// let sparkle_heart = vec![0x1f496];
    /// let sparkle_heart = U32CString::from_vec(sparkle_heart).unwrap();
    /// // The string will add the missing nul terminator
    ///
    /// assert_eq!(u32cstr!("💖"), sparkle_heart);
    ///
    /// // This UTF-16 surrogate is invalid UTF-32, but is perfectly valid in U32CString
    /// let malformed_utf32 = vec![0xd83d, 0x0];
    /// let s = U32CString::from_vec(malformed_utf32).unwrap();
    ///
    /// assert_eq!(s.len(), 1); // Note the terminating nul is not counted in the length
    /// ```
    ///
    /// When working with an FFI, it is useful to create a [`U32CString`] from a pointer:
    ///
    /// ```
    /// use widestring::{u32cstr, U32CString};
    ///
    /// let sparkle_heart = [0x1f496, 0x0];
    /// let s = unsafe {
    ///     // Note that the pointer length does not include the nul terminator
    ///     U32CString::from_ptr(sparkle_heart.as_ptr(), sparkle_heart.len() - 1).unwrap()
    /// };
    /// assert_eq!(u32cstr!("💖"), s);
    ///
    /// // Alternatively, if the length of the pointer is unknown but definitely terminates in nul,
    /// // a C-style string version can be used
    /// let s = unsafe { U32CString::from_ptr_str(sparkle_heart.as_ptr()) };
    ///
    /// assert_eq!(u32cstr!("💖"), s);
    /// ```
    struct U32CString([u32]);

    type UCStr = U32CStr;
    type UString = U32String;
    type UStr = U32Str;

    /// Constructs a wide C string from a container of wide character data.
    ///
    /// This method will consume the provided data and use the underlying elements to
    /// construct a new string. The data will be scanned for invalid interior nul values.
    ///
    /// # Errors
    ///
    /// This function will return an error if the data contains a nul value that is not the
    /// terminating nul.
    /// The returned error will contain the original [`Vec`] as well as the position of the
    /// nul value.
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let v = vec![84u32, 104u32, 101u32]; // 'T' 'h' 'e'
    /// # let cloned = v.clone();
    /// // Create a wide string from the vector
    /// let wcstr = U32CString::from_vec(v).unwrap();
    /// # assert_eq!(wcstr.into_vec(), cloned);
    /// ```
    ///
    /// Empty vectors are valid and will return an empty string with a nul terminator:
    ///
    /// ```
    /// use widestring::U32CString;
    /// let wcstr = U32CString::from_vec(vec![]).unwrap();
    /// assert_eq!(wcstr, U32CString::default());
    /// ```
    ///
    /// The following example demonstrates errors from nul values in a vector.
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let v = vec![84u32, 0u32, 104u32, 101u32]; // 'T' NUL 'h' 'e'
    /// // Create a wide string from the vector
    /// let res = U32CString::from_vec(v);
    /// assert!(res.is_err());
    /// assert_eq!(res.err().unwrap().nul_position(), 1);
    /// ```
    fn from_vec() -> {}

    /// Constructs a wide C string from a container of wide character data, truncating at
    /// the first nul terminator.
    ///
    /// The string will be truncated at the first nul value in the data.
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let v = vec![84u32, 104u32, 101u32, 0u32]; // 'T' 'h' 'e' NUL
    /// # let cloned = v[..3].to_owned();
    /// // Create a wide string from the vector
    /// let wcstr = U32CString::from_vec_truncate(v);
    /// # assert_eq!(wcstr.into_vec(), cloned);
    /// ```
    fn from_vec_truncate() -> {}

    /// Converts this wide C string into a boxed wide C string slice.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::{U32CString, U32CStr};
    ///
    /// let mut v = vec![102u32, 111u32, 111u32]; // "foo"
    /// let c_string = U32CString::from_vec(v.clone()).unwrap();
    /// let boxed = c_string.into_boxed_ucstr();
    /// v.push(0);
    /// assert_eq!(&*boxed, U32CStr::from_slice(&v).unwrap());
    /// ```
    fn into_boxed_ucstr() -> {}
}

impl U16CString {
    /// Constructs a [`U16CString`] copy from a [`str`], encoding it as UTF-16.
    ///
    /// This makes a string copy of the [`str`]. Since [`str`] will always be valid UTF-8, the
    /// resulting [`U16CString`] will also be valid UTF-16.
    ///
    /// The string will be scanned for nul values, which are invalid anywhere except the final
    /// character.
    ///
    /// The resulting string will always be nul-terminated even if the original string is not.
    ///
    /// # Errors
    ///
    /// This function will return an error if the data contains a nul value anywhere except the
    /// final position.
    /// The returned error will contain a [`Vec<u16>`] as well as the position of the nul value.
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U16CString;
    /// let s = "MyString";
    /// // Create a wide string from the string
    /// let wcstr = U16CString::from_str(s).unwrap();
    /// # assert_eq!(wcstr.to_string_lossy(), s);
    /// ```
    ///
    /// The following example demonstrates errors from nul values in a string.
    ///
    /// ```rust
    /// use widestring::U16CString;
    /// let s = "My\u{0}String";
    /// // Create a wide string from the string
    /// let res = U16CString::from_str(s);
    /// assert!(res.is_err());
    /// assert_eq!(res.err().unwrap().nul_position(), 2);
    /// ```
    #[allow(clippy::should_implement_trait)]
    #[inline]
    pub fn from_str(s: impl AsRef<str>) -> Result<Self, ContainsNul<u16>> {
        let v: Vec<u16> = s.as_ref().encode_utf16().collect();
        Self::from_vec(v)
    }

    /// Constructs a [`U16CString`] copy from a [`str`], encoding it as UTF-16, without checking for
    /// interior nul values.
    ///
    /// This makes a string copy of the [`str`]. Since [`str`] will always be valid UTF-8, the
    /// resulting [`U16CString`] will also be valid UTF-16.
    ///
    /// The resulting string will always be nul-terminated even if the original string is not.
    ///
    /// # Safety
    ///
    /// This method is equivalent to [`from_str`][Self::from_str] except that no runtime assertion
    /// is made that `s` contains no interior nul values. Providing a string with nul values that
    /// are not the last character will result in an invalid [`U16CString`].
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U16CString;
    /// let s = "MyString";
    /// // Create a wide string from the string
    /// let wcstr = unsafe { U16CString::from_str_unchecked(s) };
    /// # assert_eq!(wcstr.to_string_lossy(), s);
    /// ```
    #[inline]
    #[must_use]
    pub unsafe fn from_str_unchecked(s: impl AsRef<str>) -> Self {
        let v: Vec<u16> = s.as_ref().encode_utf16().collect();
        Self::from_vec_unchecked(v)
    }

    /// Constructs a [`U16CString`] copy from a [`str`], encoding it as UTF-16, truncating at the
    /// first nul terminator.
    ///
    /// This makes a string copy of the [`str`]. Since [`str`] will always be valid UTF-8, the
    /// resulting [`U16CString`] will also be valid UTF-16.
    ///
    /// The string will be truncated at the first nul value in the string.
    /// The resulting string will always be nul-terminated even if the original string is not.
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U16CString;
    /// let s = "My\u{0}String";
    /// // Create a wide string from the string
    /// let wcstr = U16CString::from_str_truncate(s);
    /// assert_eq!(wcstr.to_string_lossy(), "My");
    /// ```
    #[inline]
    #[must_use]
    pub fn from_str_truncate(s: impl AsRef<str>) -> Self {
        let v: Vec<u16> = s.as_ref().encode_utf16().collect();
        Self::from_vec_truncate(v)
    }

    /// Constructs a [`U16CString`] copy from an [`OsStr`][std::ffi::OsStr].
    ///
    /// This makes a string copy of the [`OsStr`][std::ffi::OsStr]. Since [`OsStr`][std::ffi::OsStr]
    /// makes no guarantees that it is valid data, there is no guarantee that the resulting
    /// [`U16CString`] will be valid UTF-16.
    ///
    /// The string will be scanned for nul values, which are invalid anywhere except the final
    /// character.
    /// The resulting string will always be nul-terminated even if the original string is not.
    ///
    /// Note that the encoding of [`OsStr`][std::ffi::OsStr] is platform-dependent, so on
    /// some platforms this may perform an encoding conversion, while on other platforms (such as
    /// Windows) no changes to the string will be made.
    ///
    /// # Errors
    ///
    /// This function will return an error if the data contains a nul value anywhere except the
    /// last character.
    /// The returned error will contain a [`Vec<u16>`] as well as the position of the nul value.
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U16CString;
    /// let s = "MyString";
    /// // Create a wide string from the string
    /// let wcstr = U16CString::from_os_str(s).unwrap();
    /// # assert_eq!(wcstr.to_string_lossy(), s);
    /// ```
    ///
    /// The following example demonstrates errors from nul values in the string.
    ///
    /// ```rust
    /// use widestring::U16CString;
    /// let s = "My\u{0}String";
    /// // Create a wide string from the string
    /// let res = U16CString::from_os_str(s);
    /// assert!(res.is_err());
    /// assert_eq!(res.err().unwrap().nul_position(), 2);
    /// ```
    #[inline]
    #[cfg(feature = "std")]
    #[cfg_attr(docsrs, doc(cfg(feature = "std")))]
    pub fn from_os_str(s: impl AsRef<std::ffi::OsStr>) -> Result<Self, ContainsNul<u16>> {
        let v = crate::platform::os_to_wide(s.as_ref());
        Self::from_vec(v)
    }

    /// Constructs a [`U16CString`] copy from an [`OsStr`][std::ffi::OsStr], without checking for nul
    /// values.
    ///
    /// This makes a string copy of the [`OsStr`][std::ffi::OsStr]. Since [`OsStr`][std::ffi::OsStr]
    /// makes no guarantees that it is valid data, there is no guarantee that the resulting
    /// [`U16CString`] will be valid UTF-16.
    ///
    /// The resulting string will always be nul-terminated even if the original string is not.
    ///
    /// Note that the encoding of [`OsStr`][std::ffi::OsStr] is platform-dependent, so on
    /// some platforms this may perform an encoding conversion, while on other platforms (such as
    /// Windows) no changes to the string will be made.
    ///
    /// # Safety
    ///
    /// This method is equivalent to [`from_os_str`][Self::from_os_str] except that no runtime
    /// assertion is made that `s` contains no interior nul values. Providing a string with nul
    /// values anywhere but the last character will result in an invalid [`U16CString`].
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U16CString;
    /// let s = "MyString";
    /// // Create a wide string from the string
    /// let wcstr = unsafe { U16CString::from_os_str_unchecked(s) };
    /// # assert_eq!(wcstr.to_string_lossy(), s);
    /// ```
    #[cfg(feature = "std")]
    #[cfg_attr(docsrs, doc(cfg(feature = "std")))]
    #[must_use]
    pub unsafe fn from_os_str_unchecked(s: impl AsRef<std::ffi::OsStr>) -> Self {
        let v = crate::platform::os_to_wide(s.as_ref());
        Self::from_vec_unchecked(v)
    }

    /// Constructs a [`U16CString`] copy from an [`OsStr`][std::ffi::OsStr], truncating at the first
    /// nul terminator.
    ///
    /// This makes a string copy of the [`OsStr`][std::ffi::OsStr]. Since [`OsStr`][std::ffi::OsStr]
    /// makes no guarantees that it is valid data, there is no guarantee that the resulting
    /// [`U16CString`] will be valid UTF-16.
    ///
    /// The string will be truncated at the first nul value in the string.
    /// The resulting string will always be nul-terminated even if the original string is not.
    ///
    /// Note that the encoding of [`OsStr`][std::ffi::OsStr] is platform-dependent, so on
    /// some platforms this may perform an encoding conversion, while on other platforms (such as
    /// Windows) no changes to the string will be made.
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U16CString;
    /// let s = "My\u{0}String";
    /// // Create a wide string from the string
    /// let wcstr = U16CString::from_os_str_truncate(s);
    /// assert_eq!(wcstr.to_string_lossy(), "My");
    /// ```
    #[inline]
    #[cfg(feature = "std")]
    #[cfg_attr(docsrs, doc(cfg(feature = "std")))]
    #[must_use]
    pub fn from_os_str_truncate(s: impl AsRef<std::ffi::OsStr>) -> Self {
        let v = crate::platform::os_to_wide(s.as_ref());
        Self::from_vec_truncate(v)
    }
}

impl U32CString {
    /// Constructs a [`U32CString`] from a container of character data, checking for invalid nul
    /// values.
    ///
    /// This method will consume the provided data and use the underlying elements to construct a
    /// new string. The data will be scanned for invalid nul values anywhere except the last
    /// character.
    /// The resulting string will always be nul-terminated even if the original string is not.
    ///
    /// # Errors
    ///
    /// This function will return an error if the data contains a nul value anywhere except the
    /// last character.
    /// The returned error will contain the [`Vec<u32>`] as well as the position of the nul value.
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let v: Vec<char> = "Test".chars().collect();
    /// # let cloned: Vec<u32> = v.iter().map(|&c| c as u32).collect();
    /// // Create a wide string from the vector
    /// let wcstr = U32CString::from_chars(v).unwrap();
    /// # assert_eq!(wcstr.into_vec(), cloned);
    /// ```
    ///
    /// The following example demonstrates errors from nul values in a vector.
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let v: Vec<char> = "T\u{0}est".chars().collect();
    /// // Create a wide string from the vector
    /// let res = U32CString::from_chars(v);
    /// assert!(res.is_err());
    /// assert_eq!(res.err().unwrap().nul_position(), 1);
    /// ```
    pub fn from_chars(v: impl Into<Vec<char>>) -> Result<Self, ContainsNul<u32>> {
        let mut chars = v.into();
        // Reinterpret the `Vec<char>` buffer as `Vec<u32>` without copying; `char` and `u32` have
        // the same size and alignment, and every `char` is a valid `u32`.
        let v: Vec<u32> = unsafe {
            let ptr = chars.as_mut_ptr() as *mut u32;
            let len = chars.len();
            let cap = chars.capacity();
            mem::forget(chars);
            Vec::from_raw_parts(ptr, len, cap)
        };
        Self::from_vec(v)
    }

    /// Constructs a [`U32CString`] from a container of character data, truncating at the first nul
    /// value.
    ///
    /// This method will consume the provided data and use the underlying elements to construct a
    /// new string. The string will be truncated at the first nul value in the string.
    /// The resulting string will always be nul-terminated even if the original string is not.
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let v: Vec<char> = "Test\u{0}".chars().collect();
    /// # let cloned: Vec<u32> = v[..4].iter().map(|&c| c as u32).collect();
    /// // Create a wide string from the vector
    /// let wcstr = U32CString::from_chars_truncate(v);
    /// # assert_eq!(wcstr.into_vec(), cloned);
    /// ```
    #[must_use]
    pub fn from_chars_truncate(v: impl Into<Vec<char>>) -> Self {
        let mut chars = v.into();
        let v: Vec<u32> = unsafe {
            let ptr = chars.as_mut_ptr() as *mut u32;
            let len = chars.len();
            let cap = chars.capacity();
            mem::forget(chars);
            Vec::from_raw_parts(ptr, len, cap)
        };
        Self::from_vec_truncate(v)
    }

    /// Constructs a [`U32CString`] from character data without checking for nul values.
    ///
    /// A terminating nul value will be appended if the vector does not already have a terminating
    /// nul.
    ///
    /// # Safety
    ///
    /// This method is equivalent to [`from_chars`][Self::from_chars] except that no runtime
    /// assertion is made that `v` contains no interior nul values. Providing a vector with nul
    /// values anywhere but the last character will result in an invalid [`U32CString`].
    #[must_use]
    pub unsafe fn from_chars_unchecked(v: impl Into<Vec<char>>) -> Self {
        let mut chars = v.into();
        let v: Vec<u32> = {
            let ptr = chars.as_mut_ptr() as *mut u32;
            let len = chars.len();
            let cap = chars.capacity();
            mem::forget(chars);
            Vec::from_raw_parts(ptr, len, cap)
        };
        Self::from_vec_unchecked(v)
    }

    /// Constructs a [`U32CString`] copy from a [`str`], encoding it as UTF-32 and checking for
    /// invalid interior nul values.
    ///
    /// This makes a string copy of the [`str`]. Since [`str`] will always be valid UTF-8, the
    /// resulting [`U32CString`] will also be valid UTF-32.
    ///
    /// The string will be scanned for nul values, which are invalid anywhere except the last
    /// character.
    /// The resulting string will always be nul-terminated even if the original string is not.
    ///
    /// # Errors
    ///
    /// This function will return an error if the data contains a nul value anywhere except the
    /// last character.
    /// The returned error will contain a [`Vec<u32>`] as well as the position of the nul value.
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let s = "MyString";
    /// // Create a wide string from the string
    /// let wcstr = U32CString::from_str(s).unwrap();
    /// # assert_eq!(wcstr.to_string_lossy(), s);
    /// ```
    ///
    /// The following example demonstrates errors from nul values in a string.
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let s = "My\u{0}String";
    /// // Create a wide string from the string
    /// let res = U32CString::from_str(s);
    /// assert!(res.is_err());
    /// assert_eq!(res.err().unwrap().nul_position(), 2);
    /// ```
    #[allow(clippy::should_implement_trait)]
    #[inline]
    pub fn from_str(s: impl AsRef<str>) -> Result<Self, ContainsNul<u32>> {
        let v: Vec<char> = s.as_ref().chars().collect();
        Self::from_chars(v)
    }

    /// Constructs a [`U32CString`] copy from a [`str`], encoding it as UTF-32, without checking for
    /// nul values.
    ///
    /// This makes a string copy of the [`str`]. Since [`str`] will always be valid UTF-8, the
    /// resulting [`U32CString`] will also be valid UTF-32.
    ///
    /// The resulting string will always be nul-terminated even if the original string is not.
    ///
    /// # Safety
    ///
    /// This method is equivalent to [`from_str`][Self::from_str] except that no runtime assertion
    /// is made that `s` contains no invalid nul values. Providing a string with nul values anywhere
    /// except the last character will result in an invalid [`U32CString`].
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let s = "MyString";
    /// // Create a wide string from the string
    /// let wcstr = unsafe { U32CString::from_str_unchecked(s) };
    /// # assert_eq!(wcstr.to_string_lossy(), s);
    /// ```
    #[inline]
    #[must_use]
    pub unsafe fn from_str_unchecked(s: impl AsRef<str>) -> Self {
        let v: Vec<char> = s.as_ref().chars().collect();
        Self::from_chars_unchecked(v)
    }

    /// Constructs a [`U32CString`] copy from a [`str`], encoding it as UTF-32, truncating at the
    /// first nul terminator.
    ///
    /// This makes a string copy of the [`str`]. Since [`str`] will always be valid UTF-8, the
    /// resulting [`U32CString`] will also be valid UTF-32.
    ///
    /// The string will be truncated at the first nul value in the string.
    /// The resulting string will always be nul-terminated even if the original string is not.
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let s = "My\u{0}String";
    /// // Create a wide string from the string
    /// let wcstr = U32CString::from_str_truncate(s);
    /// assert_eq!(wcstr.to_string_lossy(), "My");
    /// ```
    #[inline]
    #[must_use]
    pub fn from_str_truncate(s: impl AsRef<str>) -> Self {
        let v: Vec<char> = s.as_ref().chars().collect();
        Self::from_chars_truncate(v)
    }

    /// Constructs a new wide C string copied from a nul-terminated [`char`] string pointer.
    ///
    /// This will scan for nul values beginning with `p`. The first nul value will be used as the
    /// nul terminator for the string, similar to how libc string functions such as `strlen` work.
    ///
    /// If you wish to avoid copying the string pointer, use [`U32CStr::from_char_ptr_str`] instead.
    ///
    /// # Safety
    ///
    /// This function is unsafe as there is no guarantee that the given pointer is valid or has a
    /// nul terminator, and the function could scan past the underlying buffer.
    ///
    /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts].
    ///
    /// # Panics
    ///
    /// This function panics if `p` is null.
    ///
    /// # Caveat
    ///
    /// The lifetime for the returned string is inferred from its usage. To prevent accidental
    /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the
    /// context, such as by providing a helper function taking the lifetime of a host value for the
    /// string, or by explicit annotation.
    #[inline]
    #[must_use]
    pub unsafe fn from_char_ptr_str(p: *const char) -> Self {
        Self::from_ptr_str(p as *const u32)
    }

    /// Constructs a wide C string copied from a [`char`] pointer and a length, checking for invalid
    /// interior nul values.
    ///
    /// The `len` argument is the number of elements, **not** the number of bytes, and does
    /// **not** include the nul terminator of the string. If `len` is `0`, `p` is allowed to be a
    /// null pointer.
    ///
    /// The resulting string will always be nul-terminated even if the pointer data is not.
    ///
    /// # Errors
    ///
    /// This will scan the pointer string for an interior nul value and error if one is found. To
    /// avoid scanning for interior nuls, [`from_ptr_unchecked`][Self::from_ptr_unchecked] may be
    /// used instead.
    /// The returned error will contain a [`Vec<u32>`] as well as the position of the nul value.
    ///
    /// # Safety
    ///
    /// This function is unsafe as there is no guarantee that the given pointer is valid for `len`
    /// elements.
    ///
    /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts].
    ///
    /// # Panics
    ///
    /// Panics if `len` is greater than 0 but `p` is a null pointer.
    #[inline]
    pub unsafe fn from_char_ptr(p: *const char, len: usize) -> Result<Self, ContainsNul<u32>> {
        Self::from_ptr(p as *const u32, len)
    }

    /// Constructs a wide C string copied from a [`char`] pointer and a length, truncating at the
    /// first nul terminator.
    ///
    /// The `len` argument is the number of elements, **not** the number of bytes. This will scan
    /// for nul values beginning with `p` until offset `len`. The first nul value will be used as
    /// the nul terminator for the string, ignoring any remaining values left before `len`. If no
    /// nul value is found, the whole string of length `len` is used, and a new nul-terminator
    /// will be added to the resulting string. If `len` is `0`, `p` is allowed to be a null pointer.
    ///
    /// # Safety
    ///
    /// This function is unsafe as there is no guarantee that the given pointer is valid for `len`
    /// elements.
    ///
    /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts].
    ///
    /// # Panics
    ///
    /// Panics if `len` is greater than 0 but `p` is a null pointer.
    #[inline]
    #[must_use]
    pub unsafe fn from_char_ptr_truncate(p: *const char, len: usize) -> Self {
        Self::from_ptr_truncate(p as *const u32, len)
    }

    /// Constructs a wide C string copied from a [`char`] pointer and a length without checking for
    /// any nul values.
    ///
    /// The `len` argument is the number of elements, **not** the number of bytes, and does
    /// **not** include the nul terminator of the string. If `len` is `0`, `p` is allowed to be a
    /// null pointer.
    ///
    /// The resulting string will always be nul-terminated even if the pointer data is not.
    ///
    /// # Safety
    ///
    /// This function is unsafe as there is no guarantee that the given pointer is valid for `len`
    /// elements.
    ///
    /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts].
    ///
    /// The interior values of the pointer are not scanned for nul. Any interior nul values
    /// will result in an invalid C string.
    ///
    /// # Panics
    ///
    /// Panics if `len` is greater than 0 but `p` is a null pointer.
    #[must_use]
    pub unsafe fn from_char_ptr_unchecked(p: *const char, len: usize) -> Self {
        Self::from_ptr_unchecked(p as *const u32, len)
    }

    /// Constructs a [`U32CString`] copy from an [`OsStr`][std::ffi::OsStr], checking for invalid
    /// nul values.
    ///
    /// This makes a string copy of the [`OsStr`][std::ffi::OsStr]. Since [`OsStr`][std::ffi::OsStr]
    /// makes no guarantees that it is valid data, there is no guarantee that the resulting
    /// [`U32CString`] will be valid UTF-32.
    ///
    /// The string will be scanned for nul values, which are invalid anywhere except the last
    /// character.
    /// The resulting string will always be nul-terminated even if the string is not.
    ///
    /// Note that the encoding of [`OsStr`][std::ffi::OsStr] is platform-dependent, so on
    /// some platforms this may perform an encoding conversion, while on other platforms no changes
    /// to the string will be made.
    ///
    /// # Errors
    ///
    /// This function will return an error if the data contains a nul value anywhere except the
    /// last character.
    /// The returned error will contain a [`Vec<u32>`] as well as the position of the nul value.
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let s = "MyString";
    /// // Create a wide string from the string
    /// let wcstr = U32CString::from_os_str(s).unwrap();
    /// # assert_eq!(wcstr.to_string_lossy(), s);
    /// ```
    ///
    /// The following example demonstrates errors from nul values in a string.
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let s = "My\u{0}String";
    /// // Create a wide string from the string
    /// let res = U32CString::from_os_str(s);
    /// assert!(res.is_err());
    /// assert_eq!(res.err().unwrap().nul_position(), 2);
    /// ```
    #[cfg(feature = "std")]
    #[cfg_attr(docsrs, doc(cfg(feature = "std")))]
    #[inline]
    pub fn from_os_str(s: impl AsRef<std::ffi::OsStr>) -> Result<Self, ContainsNul<u32>> {
        let v: Vec<char> = s.as_ref().to_string_lossy().chars().collect();
        Self::from_chars(v)
    }

    /// Constructs a [`U32CString`] copy from an [`OsStr`][std::ffi::OsStr], without checking for
    /// nul values.
    ///
    /// This makes a string copy of the [`OsStr`][std::ffi::OsStr]. Since [`OsStr`][std::ffi::OsStr]
    /// makes no guarantees that it is valid data, there is no guarantee that the resulting
    /// [`U32CString`] will be valid UTF-32.
    ///
    /// The resulting string will always be nul-terminated even if the string is not.
    ///
    /// Note that the encoding of [`OsStr`][std::ffi::OsStr] is platform-dependent, so on
    /// some platforms this may perform an encoding conversion, while on other platforms no changes
    /// to the string will be made.
    ///
    /// # Safety
    ///
    /// This method is equivalent to [`from_os_str`][Self::from_os_str] except that no runtime
    /// assertion is made that `s` contains no invalid nul values. Providing a string with nul
    /// values anywhere except the last character will result in an invalid [`U32CString`].
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let s = "MyString";
    /// // Create a wide string from the string
    /// let wcstr = unsafe { U32CString::from_os_str_unchecked(s) };
    /// # assert_eq!(wcstr.to_string_lossy(), s);
    /// ```
    #[cfg(feature = "std")]
    #[cfg_attr(docsrs, doc(cfg(feature = "std")))]
    #[inline]
    #[must_use]
    pub unsafe fn from_os_str_unchecked(s: impl AsRef<std::ffi::OsStr>) -> Self {
        let v: Vec<char> = s.as_ref().to_string_lossy().chars().collect();
        Self::from_chars_unchecked(v)
    }

    /// Constructs a [`U32CString`] copy from an [`OsStr`][std::ffi::OsStr], truncating at the first
    /// nul terminator.
    ///
    /// This makes a string copy of the [`OsStr`][std::ffi::OsStr]. Since [`OsStr`][std::ffi::OsStr]
    /// makes no guarantees that it is valid data, there is no guarantee that the resulting
    /// [`U32CString`] will be valid UTF-32.
    ///
    /// The string will be truncated at the first nul value in the string.
    /// The resulting string will always be nul-terminated even if the string is not.
    ///
    /// Note that the encoding of [`OsStr`][std::ffi::OsStr] is platform-dependent, so on
    /// some platforms this may perform an encoding conversion, while on other platforms no changes
    /// to the string will be made.
    ///
    /// # Examples
    ///
    /// ```rust
    /// use widestring::U32CString;
    /// let s = "My\u{0}String";
    /// // Create a wide string from the string
    /// let wcstr = U32CString::from_os_str_truncate(s);
    /// assert_eq!(wcstr.to_string_lossy(), "My");
    /// ```
    #[cfg(feature = "std")]
    #[cfg_attr(docsrs, doc(cfg(feature = "std")))]
    #[inline]
    #[must_use]
    pub fn from_os_str_truncate(s: impl AsRef<std::ffi::OsStr>) -> Self {
        let v: Vec<char> = s.as_ref().to_string_lossy().chars().collect();
        Self::from_chars_truncate(v)
    }
}

impl core::fmt::Debug for U16CString {
    #[inline]
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        crate::debug_fmt_u16(self.as_slice_with_nul(), f)
    }
}

impl core::fmt::Debug for U32CString {
    #[inline]
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        crate::debug_fmt_u32(self.as_slice_with_nul(), f)
    }
}

/// Alias for `U16CString` or `U32CString` depending on platform. Intended to match typical C
/// `wchar_t` size on platform.
#[cfg(not(windows))]
pub type WideCString = U32CString;

/// Alias for `U16CString` or `U32CString` depending on platform. Intended to match typical C
/// `wchar_t` size on platform.
#[cfg(windows)]
pub type WideCString = U16CString;
widestring-1.1.0/src/ustr/iter.rs000064400000000000000000000304141046102023000151010ustar 00000000000000
use crate::{
    error::{DecodeUtf16Error, DecodeUtf32Error},
    iter::{DecodeUtf16, DecodeUtf16Lossy, DecodeUtf32, DecodeUtf32Lossy},
};
#[allow(unused_imports)]
use core::{
    iter::{Copied, DoubleEndedIterator, ExactSizeIterator, FusedIterator},
    slice::Iter,
};

/// An iterator over UTF-16 decoded [`char`][prim@char]s of a string slice.
///
/// This struct is created by the `chars` method on strings. See its documentation for more.
#[derive(Clone)]
pub struct CharsUtf16<'a> {
    inner: DecodeUtf16<Copied<Iter<'a, u16>>>,
}

impl<'a> CharsUtf16<'a> {
    pub(crate) fn new(s: &'a [u16]) -> Self {
        Self {
            inner: crate::decode_utf16(s.iter().copied()),
        }
    }
}

impl<'a> Iterator for CharsUtf16<'a> {
    type Item = Result<char, DecodeUtf16Error>;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        self.inner.next()
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.inner.size_hint()
    }
}

impl<'a> FusedIterator for CharsUtf16<'a> {}

impl<'a> DoubleEndedIterator for CharsUtf16<'a> {
    #[inline]
    fn next_back(&mut self) -> Option<Self::Item> {
        self.inner.next_back()
    }
}

impl<'a> core::fmt::Debug for CharsUtf16<'a> {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        crate::debug_fmt_utf16_iter(self.clone(), f)
    }
}

/// An iterator over UTF-32 decoded [`char`][prim@char]s of a string slice.
///
/// This struct is created by the `chars` method on strings. See its documentation for more.
#[derive(Clone)]
pub struct CharsUtf32<'a> {
    inner: DecodeUtf32<Copied<Iter<'a, u32>>>,
}

impl<'a> CharsUtf32<'a> {
    pub(crate) fn new(s: &'a [u32]) -> Self {
        Self {
            inner: crate::decode_utf32(s.iter().copied()),
        }
    }
}

impl<'a> Iterator for CharsUtf32<'a> {
    type Item = Result<char, DecodeUtf32Error>;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        self.inner.next()
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.inner.size_hint()
    }
}

impl<'a> FusedIterator for CharsUtf32<'a> {}

impl<'a> DoubleEndedIterator for CharsUtf32<'a> {
    #[inline]
    fn next_back(&mut self) -> Option<Self::Item> {
        self.inner.next_back()
    }
}

impl<'a> ExactSizeIterator for CharsUtf32<'a> {
    #[inline]
    fn len(&self) -> usize {
        self.inner.len()
    }
}

impl<'a> core::fmt::Debug for CharsUtf32<'a> {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        crate::debug_fmt_utf32_iter(self.clone(), f)
    }
}

/// A lossy iterator over UTF-16 decoded [`char`][prim@char]s of a string slice.
///
/// This struct is created by the `chars_lossy` method on strings. See its documentation for more.
#[derive(Clone)] pub struct CharsLossyUtf16<'a> { iter: DecodeUtf16Lossy>>, } impl<'a> CharsLossyUtf16<'a> { pub(crate) fn new(s: &'a [u16]) -> Self { Self { iter: crate::decode_utf16_lossy(s.iter().copied()), } } } impl<'a> Iterator for CharsLossyUtf16<'a> { type Item = char; #[inline] fn next(&mut self) -> Option { self.iter.next() } #[inline] fn size_hint(&self) -> (usize, Option) { self.iter.size_hint() } } impl<'a> FusedIterator for CharsLossyUtf16<'a> {} impl<'a> DoubleEndedIterator for CharsLossyUtf16<'a> { #[inline] fn next_back(&mut self) -> Option { self.iter.next_back() } } impl<'a> core::fmt::Debug for CharsLossyUtf16<'a> { fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { crate::debug_fmt_char_iter(self.clone(), f) } } /// A lossy iterator over UTF-32 decoded [`char`][prim@char]s of a string slice. /// /// This struct is created by the `chars_lossy` method on strings. See its documentation for more. #[derive(Clone)] pub struct CharsLossyUtf32<'a> { iter: DecodeUtf32Lossy>>, } impl<'a> CharsLossyUtf32<'a> { pub(crate) fn new(s: &'a [u32]) -> Self { Self { iter: crate::decode_utf32_lossy(s.iter().copied()), } } } impl<'a> Iterator for CharsLossyUtf32<'a> { type Item = char; #[inline] fn next(&mut self) -> Option { self.iter.next() } #[inline] fn size_hint(&self) -> (usize, Option) { self.iter.size_hint() } } impl<'a> FusedIterator for CharsLossyUtf32<'a> {} impl<'a> DoubleEndedIterator for CharsLossyUtf32<'a> { #[inline] fn next_back(&mut self) -> Option { self.iter.next_back() } } impl<'a> ExactSizeIterator for CharsLossyUtf32<'a> { #[inline] fn len(&self) -> usize { self.iter.len() } } impl<'a> core::fmt::Debug for CharsLossyUtf32<'a> { fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { crate::debug_fmt_char_iter(self.clone(), f) } } /// An iterator over the decoded [`char`][prim@char]s of a string slice, and their positions. /// /// This struct is created by the `char_indices` method on strings. 
See its documentation for /// more. #[derive(Debug, Clone)] pub struct CharIndicesUtf16<'a> { forward_offset: usize, back_offset: usize, iter: CharsUtf16<'a>, } impl<'a> CharIndicesUtf16<'a> { pub(crate) fn new(s: &'a [u16]) -> Self { Self { forward_offset: 0, back_offset: s.len(), iter: CharsUtf16::new(s), } } /// Returns the position of the next character, or the length of the underlying string if /// there are no more characters. #[inline] pub fn offset(&self) -> usize { self.forward_offset } } impl<'a> Iterator for CharIndicesUtf16<'a> { type Item = (usize, Result); fn next(&mut self) -> Option { match self.iter.next() { Some(Ok(c)) => { let idx = self.forward_offset; self.forward_offset += c.len_utf16(); Some((idx, Ok(c))) } Some(Err(e)) => { let idx = self.forward_offset; self.forward_offset += 1; Some((idx, Err(e))) } None => None, } } #[inline] fn size_hint(&self) -> (usize, Option) { self.iter.size_hint() } } impl<'a> FusedIterator for CharIndicesUtf16<'a> {} impl<'a> DoubleEndedIterator for CharIndicesUtf16<'a> { #[inline] fn next_back(&mut self) -> Option { match self.iter.next_back() { Some(Ok(c)) => { self.back_offset -= c.len_utf16(); Some((self.back_offset, Ok(c))) } Some(Err(e)) => { self.back_offset -= 1; Some((self.back_offset, Err(e))) } None => None, } } } /// An iterator over the decoded [`char`][prim@char]s of a string slice, and their positions. /// /// This struct is created by the `char_indices` method on strings. See its documentation for /// more. #[derive(Debug, Clone)] pub struct CharIndicesUtf32<'a> { forward_offset: usize, back_offset: usize, iter: CharsUtf32<'a>, } impl<'a> CharIndicesUtf32<'a> { pub(crate) fn new(s: &'a [u32]) -> Self { Self { forward_offset: 0, back_offset: s.len(), iter: CharsUtf32::new(s), } } /// Returns the position of the next character, or the length of the underlying string if /// there are no more characters. 
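///
/// # Examples
///
/// A minimal sketch, assuming `U32Str::char_indices` mirrors the UTF-16 method of the same
/// name. Every UTF-32 element is one position wide, so the offset advances by one per item:
///
/// ```rust
/// use widestring::U32Str;
///
/// let s = U32Str::from_slice(&[0x1d11e, 0x006d]);
/// let mut indices = s.char_indices();
///
/// assert_eq!(indices.offset(), 0);
/// let _ = indices.next();
/// assert_eq!(indices.offset(), 1);
/// let _ = indices.next();
/// // Once exhausted, the offset equals the length of the underlying string.
/// assert_eq!(indices.offset(), s.len());
/// ```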
#[inline] pub fn offset(&self) -> usize { self.forward_offset } } impl<'a> Iterator for CharIndicesUtf32<'a> { type Item = (usize, Result); fn next(&mut self) -> Option { match self.iter.next() { Some(Ok(c)) => { let idx = self.forward_offset; self.forward_offset += 1; Some((idx, Ok(c))) } Some(Err(e)) => { let idx = self.forward_offset; self.forward_offset += 1; Some((idx, Err(e))) } None => None, } } #[inline] fn size_hint(&self) -> (usize, Option) { self.iter.size_hint() } } impl<'a> FusedIterator for CharIndicesUtf32<'a> {} impl<'a> DoubleEndedIterator for CharIndicesUtf32<'a> { #[inline] fn next_back(&mut self) -> Option { match self.iter.next_back() { Some(Ok(c)) => { self.back_offset -= 1; Some((self.back_offset, Ok(c))) } Some(Err(e)) => { self.back_offset -= 1; Some((self.back_offset, Err(e))) } None => None, } } } impl<'a> ExactSizeIterator for CharIndicesUtf32<'a> { #[inline] fn len(&self) -> usize { self.iter.len() } } /// A lossy iterator over the [`char`][prim@char]s of a string slice, and their positions. /// /// This struct is created by the `char_indices_lossy` method on strings. See its documentation /// for more. #[derive(Debug, Clone)] pub struct CharIndicesLossyUtf16<'a> { forward_offset: usize, back_offset: usize, iter: CharsLossyUtf16<'a>, } impl<'a> CharIndicesLossyUtf16<'a> { pub(crate) fn new(s: &'a [u16]) -> Self { Self { forward_offset: 0, back_offset: s.len(), iter: CharsLossyUtf16::new(s), } } /// Returns the position of the next character, or the length of the underlying string if /// there are no more characters. 
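///
/// # Examples
///
/// A minimal sketch, assuming the iterator is obtained through `U16Str::char_indices_lossy`.
/// The offset counts `u16` elements, so a surrogate pair advances it by two:
///
/// ```rust
/// use widestring::U16Str;
///
/// // U+1D11E as a surrogate pair, followed by 'm'.
/// let s = U16Str::from_slice(&[0xD834, 0xDD1E, 0x006d]);
/// let mut indices = s.char_indices_lossy();
///
/// assert_eq!(indices.next(), Some((0, '\u{1D11E}')));
/// assert_eq!(indices.offset(), 2);
/// assert_eq!(indices.next(), Some((2, 'm')));
/// assert_eq!(indices.next(), None);
/// // Once exhausted, the offset equals the length of the underlying string.
/// assert_eq!(indices.offset(), s.len());
/// ```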
#[inline] pub fn offset(&self) -> usize { self.forward_offset } } impl<'a> Iterator for CharIndicesLossyUtf16<'a> { type Item = (usize, char); #[inline] fn next(&mut self) -> Option { match self.iter.next() { Some(c) => { let idx = self.forward_offset; self.forward_offset += c.len_utf16(); Some((idx, c)) } None => None, } } #[inline] fn size_hint(&self) -> (usize, Option) { self.iter.size_hint() } } impl<'a> FusedIterator for CharIndicesLossyUtf16<'a> {} impl<'a> DoubleEndedIterator for CharIndicesLossyUtf16<'a> { #[inline] fn next_back(&mut self) -> Option { match self.iter.next_back() { Some(c) => { self.back_offset -= c.len_utf16(); Some((self.back_offset, c)) } None => None, } } } /// A lossy iterator over the [`char`][prim@char]s of a string slice, and their positions. /// /// This struct is created by the `char_indices_lossy` method on strings. See its documentation /// for more. #[derive(Debug, Clone)] pub struct CharIndicesLossyUtf32<'a> { forward_offset: usize, back_offset: usize, iter: CharsLossyUtf32<'a>, } impl<'a> CharIndicesLossyUtf32<'a> { pub(crate) fn new(s: &'a [u32]) -> Self { Self { forward_offset: 0, back_offset: s.len(), iter: CharsLossyUtf32::new(s), } } /// Returns the position of the next character, or the length of the underlying string if /// there are no more characters. 
#[inline] pub fn offset(&self) -> usize { self.forward_offset } } impl<'a> Iterator for CharIndicesLossyUtf32<'a> { type Item = (usize, char); #[inline] fn next(&mut self) -> Option { match self.iter.next() { Some(c) => { let idx = self.forward_offset; self.forward_offset += 1; Some((idx, c)) } None => None, } } #[inline] fn size_hint(&self) -> (usize, Option) { self.iter.size_hint() } } impl<'a> FusedIterator for CharIndicesLossyUtf32<'a> {} impl<'a> DoubleEndedIterator for CharIndicesLossyUtf32<'a> { #[inline] fn next_back(&mut self) -> Option { match self.iter.next_back() { Some(c) => { self.back_offset -= 1; Some((self.back_offset, c)) } None => None, } } } impl<'a> ExactSizeIterator for CharIndicesLossyUtf32<'a> { #[inline] fn len(&self) -> usize { self.iter.len() } } widestring-1.1.0/src/ustr.rs000064400000000000000000001364021046102023000141420ustar 00000000000000//! Wide string slices with undefined encoding. //! //! This module contains wide string slices and related types. #[cfg(feature = "alloc")] use crate::{ error::{Utf16Error, Utf32Error}, U16String, U32String, }; #[cfg(feature = "alloc")] #[allow(unused_imports)] use alloc::{boxed::Box, string::String, vec::Vec}; use core::{ char, fmt::Write, ops::{Index, IndexMut, Range}, slice::{self, SliceIndex}, }; mod iter; pub use iter::*; macro_rules! ustr_common_impl { { $(#[$ustr_meta:meta])* struct $ustr:ident([$uchar:ty]); type UString = $ustring:ident; type UCStr = $ucstr:ident; $(#[$display_meta:meta])* fn display() -> {} } => { $(#[$ustr_meta])* #[allow(clippy::derive_hash_xor_eq)] #[derive(PartialEq, Eq, PartialOrd, Ord, Hash)] pub struct $ustr { pub(crate) inner: [$uchar], } impl $ustr { /// Coerces a value into a wide string slice. #[inline] #[must_use] pub fn new + ?Sized>(s: &S) -> &Self { s.as_ref() } /// Constructs a wide string slice from a pointer and a length. /// /// The `len` argument is the number of elements, **not** the number of bytes. 
No /// copying or allocation is performed, the resulting value is a direct reference to the /// pointer bytes. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid for /// `len` elements. /// /// In addition, the data must meet the safety conditions of /// [std::slice::from_raw_parts]. In particular, the returned string reference *must not /// be mutated* for the duration of lifetime `'a`, except inside an /// [`UnsafeCell`][std::cell::UnsafeCell]. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent /// accidental misuse, it's suggested to tie the lifetime to whichever source lifetime /// is safe in the context, such as by providing a helper function taking the lifetime /// of a host value for the string, or by explicit annotation. #[inline] #[must_use] pub unsafe fn from_ptr<'a>(p: *const $uchar, len: usize) -> &'a Self { assert!(!p.is_null()); let slice: *const [$uchar] = slice::from_raw_parts(p, len); &*(slice as *const $ustr) } /// Constructs a mutable wide string slice from a mutable pointer and a length. /// /// The `len` argument is the number of elements, **not** the number of bytes. No /// copying or allocation is performed, the resulting value is a direct reference to the /// pointer bytes. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid for /// `len` elements. /// /// In addition, the data must meet the safety conditions of /// [std::slice::from_raw_parts_mut]. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. 
To prevent /// accidental misuse, it's suggested to tie the lifetime to whichever source lifetime /// is safe in the context, such as by providing a helper function taking the lifetime /// of a host value for the string, or by explicit annotation. #[inline] #[must_use] pub unsafe fn from_ptr_mut<'a>(p: *mut $uchar, len: usize) -> &'a mut Self { assert!(!p.is_null()); let slice: *mut [$uchar] = slice::from_raw_parts_mut(p, len); &mut *(slice as *mut $ustr) } /// Constructs a wide string slice from a slice of character data. /// /// No checks are performed on the slice. It may be of any encoding and may contain /// invalid or malformed data for that encoding. #[inline] #[must_use] pub const fn from_slice(slice: &[$uchar]) -> &Self { let ptr: *const [$uchar] = slice; unsafe { &*(ptr as *const $ustr) } } /// Constructs a mutable wide string slice from a mutable slice of character data. /// /// No checks are performed on the slice. It may be of any encoding and may contain /// invalid or malformed data for that encoding. #[inline] #[must_use] pub fn from_slice_mut(slice: &mut [$uchar]) -> &mut Self { let ptr: *mut [$uchar] = slice; unsafe { &mut *(ptr as *mut $ustr) } } /// Copies the string reference to a new owned wide string. #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[inline] #[must_use] pub fn to_ustring(&self) -> $ustring { $ustring::from_vec(&self.inner) } /// Converts to a slice of the underlying elements of the string. #[inline] #[must_use] pub const fn as_slice(&self) -> &[$uchar] { &self.inner } /// Converts to a mutable slice of the underlying elements of the string. #[must_use] pub fn as_mut_slice(&mut self) -> &mut [$uchar] { &mut self.inner } /// Returns a raw pointer to the string. /// /// The caller must ensure that the string outlives the pointer this function returns, /// or else it will end up pointing to garbage. 
/// /// The caller must also ensure that the memory the pointer (non-transitively) points to /// is never written to (except inside an `UnsafeCell`) using this pointer or any /// pointer derived from it. If you need to mutate the contents of the string, use /// [`as_mut_ptr`][Self::as_mut_ptr]. /// /// Modifying the container referenced by this string may cause its buffer to be /// reallocated, which would also make any pointers to it invalid. #[inline] #[must_use] pub const fn as_ptr(&self) -> *const $uchar { self.inner.as_ptr() } /// Returns an unsafe mutable raw pointer to the string. /// /// The caller must ensure that the string outlives the pointer this function returns, /// or else it will end up pointing to garbage. /// /// Modifying the container referenced by this string may cause its buffer to be /// reallocated, which would also make any pointers to it invalid. #[inline] #[must_use] pub fn as_mut_ptr(&mut self) -> *mut $uchar { self.inner.as_mut_ptr() } /// Returns the two raw pointers spanning the string slice. /// /// The returned range is half-open, which means that the end pointer points one past /// the last element of the slice. This way, an empty slice is represented by two equal /// pointers, and the difference between the two pointers represents the size of the /// slice. /// /// See [`as_ptr`][Self::as_ptr] for warnings on using these pointers. The end pointer /// requires extra caution, as it does not point to a valid element in the slice. /// /// This function is useful for interacting with foreign interfaces which use two /// pointers to refer to a range of elements in memory, as is common in C++. #[inline] #[must_use] pub fn as_ptr_range(&self) -> Range<*const $uchar> { self.inner.as_ptr_range() } /// Returns the two unsafe mutable pointers spanning the string slice. /// /// The returned range is half-open, which means that the end pointer points one past /// the last element of the slice. 
This way, an empty slice is represented by two equal /// pointers, and the difference between the two pointers represents the size of the /// slice. /// /// See [`as_mut_ptr`][Self::as_mut_ptr] for warnings on using these pointers. The end /// pointer requires extra caution, as it does not point to a valid element in the /// slice. /// /// This function is useful for interacting with foreign interfaces which use two /// pointers to refer to a range of elements in memory, as is common in C++. #[inline] #[must_use] pub fn as_mut_ptr_range(&mut self) -> Range<*mut $uchar> { self.inner.as_mut_ptr_range() } /// Returns the length of the string as number of elements (**not** number of bytes). #[inline] #[must_use] pub const fn len(&self) -> usize { self.inner.len() } /// Returns whether this string contains no data. #[inline] #[must_use] pub const fn is_empty(&self) -> bool { self.inner.is_empty() } /// Converts a boxed wide string slice into an owned wide string without copying or /// allocating. #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub fn into_ustring(self: Box) -> $ustring { let boxed = unsafe { Box::from_raw(Box::into_raw(self) as *mut [$uchar]) }; $ustring { inner: boxed.into_vec(), } } $(#[$display_meta])* #[inline] #[must_use] pub fn display(&self) -> Display<'_, $ustr> { Display { str: self } } /// Returns a subslice of the string. /// /// This is the non-panicking alternative to indexing the string. Returns [`None`] /// whenever equivalent indexing operation would panic. #[inline] #[must_use] pub fn get(&self, i: I) -> Option<&Self> where I: SliceIndex<[$uchar], Output = [$uchar]>, { self.inner.get(i).map(Self::from_slice) } /// Returns a mutable subslice of the string. /// /// This is the non-panicking alternative to indexing the string. Returns [`None`] /// whenever equivalent indexing operation would panic. 
#[inline] #[must_use] pub fn get_mut<I>(&mut self, i: I) -> Option<&mut Self> where I: SliceIndex<[$uchar], Output = [$uchar]>, { self.inner.get_mut(i).map(Self::from_slice_mut) } /// Returns an unchecked subslice of the string. /// /// This is the unchecked alternative to indexing the string. /// /// # Safety /// /// Callers of this function are responsible for ensuring that these preconditions are satisfied: /// /// - The starting index must not exceed the ending index; /// - Indexes must be within bounds of the original slice. /// /// Failing that, the returned string slice may reference invalid memory. #[inline] #[must_use] pub unsafe fn get_unchecked<I>(&self, i: I) -> &Self where I: SliceIndex<[$uchar], Output = [$uchar]>, { Self::from_slice(self.inner.get_unchecked(i)) } /// Returns a mutable, unchecked subslice of the string. /// /// This is the unchecked alternative to indexing the string. /// /// # Safety /// /// Callers of this function are responsible for ensuring that these preconditions are satisfied: /// /// - The starting index must not exceed the ending index; /// - Indexes must be within bounds of the original slice. /// /// Failing that, the returned string slice may reference invalid memory. #[inline] #[must_use] pub unsafe fn get_unchecked_mut<I>(&mut self, i: I) -> &mut Self where I: SliceIndex<[$uchar], Output = [$uchar]>, { Self::from_slice_mut(self.inner.get_unchecked_mut(i)) } /// Divide one string slice into two at an index. /// /// The argument, `mid`, should be an offset from the start of the string. /// /// The two slices returned go from the start of the string slice to `mid`, and from /// `mid` to the end of the string slice. /// /// To get mutable string slices instead, see the [`split_at_mut`][Self::split_at_mut] /// method. #[inline] #[must_use] pub fn split_at(&self, mid: usize) -> (&Self, &Self) { let split = self.inner.split_at(mid); (Self::from_slice(split.0), Self::from_slice(split.1)) } /// Divide one mutable string slice into two at an index.
/// /// The argument, `mid`, should be an offset from the start of the string. /// /// The two slices returned go from the start of the string slice to `mid`, and from /// `mid` to the end of the string slice. /// /// To get immutable string slices instead, see the [`split_at`][Self::split_at] method. #[inline] #[must_use] pub fn split_at_mut(&mut self, mid: usize) -> (&mut Self, &mut Self) { let split = self.inner.split_at_mut(mid); (Self::from_slice_mut(split.0), Self::from_slice_mut(split.1)) } /// Creates a new owned string by repeating this string `n` times. /// /// # Panics /// /// This function will panic if the capacity would overflow. #[inline] #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub fn repeat(&self, n: usize) -> $ustring { $ustring::from_vec(self.as_slice().repeat(n)) } } impl AsMut<$ustr> for $ustr { #[inline] fn as_mut(&mut self) -> &mut $ustr { self } } impl AsMut<[$uchar]> for $ustr { #[inline] fn as_mut(&mut self) -> &mut [$uchar] { self.as_mut_slice() } } impl AsRef<$ustr> for $ustr { #[inline] fn as_ref(&self) -> &Self { self } } impl AsRef<[$uchar]> for $ustr { #[inline] fn as_ref(&self) -> &[$uchar] { self.as_slice() } } impl Default for &$ustr { #[inline] fn default() -> Self { $ustr::from_slice(&[]) } } impl Default for &mut $ustr { #[inline] fn default() -> Self { $ustr::from_slice_mut(&mut []) } } impl<'a> From<&'a [$uchar]> for &'a $ustr { #[inline] fn from(value: &'a [$uchar]) -> Self { $ustr::from_slice(value) } } impl<'a> From<&'a mut [$uchar]> for &'a $ustr { #[inline] fn from(value: &'a mut [$uchar]) -> Self { $ustr::from_slice(value) } } impl<'a> From<&'a mut [$uchar]> for &'a mut $ustr { #[inline] fn from(value: &'a mut [$uchar]) -> Self { $ustr::from_slice_mut(value) } } impl<'a> From<&'a $ustr> for &'a [$uchar] { #[inline] fn from(value: &'a $ustr) -> Self { value.as_slice() } } impl<'a> From<&'a mut $ustr> for &'a mut [$uchar] { #[inline] fn from(value: &'a mut $ustr) -> Self { 
value.as_mut_slice() } } #[cfg(feature = "std")] impl From<&$ustr> for std::ffi::OsString { #[inline] fn from(s: &$ustr) -> std::ffi::OsString { s.to_os_string() } } impl Index for $ustr where I: SliceIndex<[$uchar], Output = [$uchar]>, { type Output = Self; #[inline] fn index(&self, index: I) -> &Self::Output { Self::from_slice(&self.inner[index]) } } impl IndexMut for $ustr where I: SliceIndex<[$uchar], Output = [$uchar]>, { #[inline] fn index_mut(&mut self, index: I) -> &mut Self::Output { Self::from_slice_mut(&mut self.inner[index]) } } impl PartialEq<$ustr> for &$ustr { #[inline] fn eq(&self, other: &$ustr) -> bool { self.as_slice() == other.as_slice() } } impl PartialEq<&$ustr> for $ustr { #[inline] fn eq(&self, other: &&$ustr) -> bool { self.as_slice() == other.as_slice() } } impl PartialEq for $ustr { #[inline] fn eq(&self, other: &crate::$ucstr) -> bool { self.as_slice() == other.as_slice() } } impl PartialEq for &$ustr { #[inline] fn eq(&self, other: &crate::$ucstr) -> bool { self.as_slice() == other.as_slice() } } impl PartialEq<&crate::$ucstr> for $ustr { #[inline] fn eq(&self, other: &&crate::$ucstr) -> bool { self.as_slice() == other.as_slice() } } impl PartialOrd for $ustr { #[inline] fn partial_cmp(&self, other: &crate::$ucstr) -> Option { self.partial_cmp(other.as_ustr()) } } }; } ustr_common_impl! { /// 16-bit wide string slice with undefined encoding. /// /// [`U16Str`] is to [`U16String`][crate::U16String] as [`OsStr`][std::ffi::OsStr] is to /// [`OsString`][std::ffi::OsString]. /// /// [`U16Str`] are string slices that do not have a defined encoding. While it is sometimes /// assumed that they contain possibly invalid or ill-formed UTF-16 data, they may be used for /// any wide encoded string. This is because [`U16Str`] is intended to be used with FFI /// functions, where proper encoding cannot be guaranteed. If you need string slices that are /// always valid UTF-16 strings, use [`Utf16Str`][crate::Utf16Str] instead. 
/// /// Because [`U16Str`] does not have a defined encoding, no restrictions are placed on mutating /// or indexing the slice. This means that even if the string contained properly encoded UTF-16 /// or other encoding data, mutating or indexing may result in malformed data. Convert to a /// [`Utf16Str`][crate::Utf16Str] if retaining proper UTF-16 encoding is desired. /// /// # FFI considerations /// /// [`U16Str`] is not aware of nul values and may or may not be nul-terminated. It is intended /// to be used with FFI functions that directly use string length, where the strings are known /// to have proper nul-termination already, or where strings are merely being passed through /// without modification. /// /// [`U16CStr`][crate::U16CStr] should be used instead if nul-aware strings are required. /// /// # Examples /// /// The easiest way to use [`U16Str`] outside of FFI is with the [`u16str!`][crate::u16str] /// macro to convert string literals into UTF-16 string slices at compile time: /// /// ``` /// use widestring::u16str; /// let hello = u16str!("Hello, world!"); /// ``` /// /// You can also convert any [`u16`] slice directly: /// /// ``` /// use widestring::{u16str, U16Str}; /// /// let sparkle_heart = [0xd83d, 0xdc96]; /// let sparkle_heart = U16Str::from_slice(&sparkle_heart); /// /// assert_eq!(u16str!("💖"), sparkle_heart); /// /// // This unpaired UTF-16 surrogate is invalid UTF-16, but is perfectly valid in U16Str /// let malformed_utf16 = [0x0, 0xd83d]; // Note that nul values are also valid and untouched /// let s = U16Str::from_slice(&malformed_utf16); /// /// assert_eq!(s.len(), 2); /// ``` /// /// When working with an FFI, it is useful to create a [`U16Str`] from a pointer and a length: /// /// ``` /// use widestring::{u16str, U16Str}; /// /// let sparkle_heart = [0xd83d, 0xdc96]; /// let sparkle_heart = unsafe { /// U16Str::from_ptr(sparkle_heart.as_ptr(), sparkle_heart.len()) /// }; /// assert_eq!(u16str!("💖"), sparkle_heart); /// ``` struct
U16Str([u16]); type UString = U16String; type UCStr = U16CStr; /// Returns an object that implements [`Display`][std::fmt::Display] for printing /// strings that may contain non-Unicode data. /// /// This method assumes this string is intended to be UTF-16 encoding, but handles /// ill-formed UTF-16 sequences lossily. The returned struct implements /// the [`Display`][std::fmt::Display] trait in a way that decoding the string is lossy /// UTF-16 decoding but no heap allocations are performed, such as by /// [`to_string_lossy`][Self::to_string_lossy]. /// /// By default, invalid Unicode data is replaced with /// [`U+FFFD REPLACEMENT CHARACTER`][std::char::REPLACEMENT_CHARACTER] (๏ฟฝ). If you wish /// to simply skip any invalid Uncode data and forego the replacement, you may use the /// [alternate formatting][std::fmt#sign0] with `{:#}`. /// /// # Examples /// /// Basic usage: /// /// ``` /// use widestring::U16Str; /// /// // ๐„žmusic /// let s = U16Str::from_slice(&[ /// 0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834, /// ]); /// /// assert_eq!(format!("{}", s.display()), /// "๐„žmus๏ฟฝic๏ฟฝ" /// ); /// ``` /// /// Using alternate formatting style to skip invalid values entirely: /// /// ``` /// use widestring::U16Str; /// /// // ๐„žmusic /// let s = U16Str::from_slice(&[ /// 0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834, /// ]); /// /// assert_eq!(format!("{:#}", s.display()), /// "๐„žmusic" /// ); /// ``` fn display() -> {} } ustr_common_impl! { /// 32-bit wide string slice with undefined encoding. /// /// [`U32Str`] is to [`U32String`][crate::U32String] as [`OsStr`][std::ffi::OsStr] is to /// [`OsString`][std::ffi::OsString]. /// /// [`U32Str`] are string slices that do not have a defined encoding. While it is sometimes /// assumed that they contain possibly invalid or ill-formed UTF-32 data, they may be used for /// any wide encoded string. 
This is because [`U32Str`] is intended to be used with FFI /// functions, where proper encoding cannot be guaranteed. If you need string slices that are /// always valid UTF-32 strings, use [`Utf32Str`][crate::Utf32Str] instead. /// /// Because [`U32Str`] does not have a defined encoding, no restrictions are placed on mutating /// or indexing the slice. This means that even if the string contained properly encoded UTF-32 /// or other encoding data, mutating or indexing may result in malformed data. Convert to a /// [`Utf32Str`][crate::Utf32Str] if retaining proper UTF-32 encoding is desired. /// /// # FFI considerations /// /// [`U32Str`] is not aware of nul values and may or may not be nul-terminated. It is intended /// to be used with FFI functions that directly use string length, where the strings are known /// to have proper nul-termination already, or where strings are merely being passed through /// without modification. /// /// [`U32CStr`][crate::U32CStr] should be used instead if nul-aware strings are required.
/// /// # Examples /// /// The easiest way to use [`U32Str`] outside of FFI is with the [`u32str!`][crate::u32str] /// macro to convert string literals into UTF-32 string slices at compile time: /// /// ``` /// use widestring::u32str; /// let hello = u32str!("Hello, world!"); /// ``` /// /// You can also convert any [`u32`] slice directly: /// /// ``` /// use widestring::{u32str, U32Str}; /// /// let sparkle_heart = [0x1f496]; /// let sparkle_heart = U32Str::from_slice(&sparkle_heart); /// /// assert_eq!(u32str!("๐Ÿ’–"), sparkle_heart); /// /// // This UTf-16 surrogate is invalid UTF-32, but is perfectly valid in U32Str /// let malformed_utf32 = [0x0, 0xd83d]; // Note that nul values are also valid an untouched /// let s = U32Str::from_slice(&malformed_utf32); /// /// assert_eq!(s.len(), 2); /// ``` /// /// When working with a FFI, it is useful to create a [`U32Str`] from a pointer and a length: /// /// ``` /// use widestring::{u32str, U32Str}; /// /// let sparkle_heart = [0x1f496]; /// let sparkle_heart = unsafe { /// U32Str::from_ptr(sparkle_heart.as_ptr(), sparkle_heart.len()) /// }; /// assert_eq!(u32str!("๐Ÿ’–"), sparkle_heart); /// ``` struct U32Str([u32]); type UString = U32String; type UCStr = U32CStr; /// Returns an object that implements [`Display`][std::fmt::Display] for printing /// strings that may contain non-Unicode data. /// /// This method assumes this string is intended to be UTF-32 encoding, but handles /// ill-formed UTF-32 sequences lossily. The returned struct implements /// the [`Display`][std::fmt::Display] trait in a way that decoding the string is lossy /// UTF-32 decoding but no heap allocations are performed, such as by /// [`to_string_lossy`][Self::to_string_lossy]. /// /// By default, invalid Unicode data is replaced with /// [`U+FFFD REPLACEMENT CHARACTER`][std::char::REPLACEMENT_CHARACTER] (๏ฟฝ). 
If you wish /// to simply skip any invalid Uncode data and forego the replacement, you may use the /// [alternate formatting][std::fmt#sign0] with `{:#}`. /// /// # Examples /// /// Basic usage: /// /// ``` /// use widestring::U32Str; /// /// // ๐„žmusic /// let s = U32Str::from_slice(&[ /// 0x1d11e, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834, /// ]); /// /// assert_eq!(format!("{}", s.display()), /// "๐„žmus๏ฟฝic๏ฟฝ" /// ); /// ``` /// /// Using alternate formatting style to skip invalid values entirely: /// /// ``` /// use widestring::U32Str; /// /// // ๐„žmusic /// let s = U32Str::from_slice(&[ /// 0x1d11e, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834, /// ]); /// /// assert_eq!(format!("{:#}", s.display()), /// "๐„žmusic" /// ); /// ``` fn display() -> {} } impl U16Str { /// Decodes a string reference to an owned [`OsString`][std::ffi::OsString]. /// /// This makes a string copy of the [`U16Str`]. Since [`U16Str`] makes no guarantees that its /// encoding is UTF-16 or that the data valid UTF-16, there is no guarantee that the resulting /// [`OsString`][std::ffi::OsString] will have a valid underlying encoding either. /// /// Note that the encoding of [`OsString`][std::ffi::OsString] is platform-dependent, so on /// some platforms this may make an encoding conversions, while on other platforms (such as /// windows) no changes to the string will be made. 
/// /// # Examples /// /// ```rust /// use widestring::U16String; /// use std::ffi::OsString; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U16String::from_str(s); /// // Create an OsString from the wide string /// let osstr = wstr.to_os_string(); /// /// assert_eq!(osstr, OsString::from(s)); /// ``` #[cfg(feature = "std")] #[cfg_attr(docsrs, doc(cfg(feature = "std")))] #[inline] #[must_use] pub fn to_os_string(&self) -> std::ffi::OsString { crate::platform::os_from_wide(&self.inner) } /// Decodes this string to a [`String`] if it contains valid UTF-16 data. /// /// This method assumes this string is encoded as UTF-16 and attempts to decode it as such. /// /// # Failures /// /// Returns an error if the string contains any invalid UTF-16 data. /// /// # Examples /// /// ```rust /// use widestring::U16String; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U16String::from_str(s); /// // Create a regular string from the wide string /// let s2 = wstr.to_string().unwrap(); /// /// assert_eq!(s2, s); /// ``` #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[inline] pub fn to_string(&self) -> Result<String, Utf16Error> { // Perform conversion ourselves to use our own error types with additional info let mut s = String::with_capacity(self.len()); for (index, result) in self.chars().enumerate() { let c = result.map_err(|e| Utf16Error::empty(index, e))?; s.push(c); } Ok(s) } /// Decodes the string to a [`String`] even if it is invalid UTF-16 data. /// /// This method assumes this string is encoded as UTF-16 and attempts to decode it as such.
Any /// invalid sequences are replaced with /// [`U+FFFD REPLACEMENT CHARACTER`][core::char::REPLACEMENT_CHARACTER], which looks like this: /// ๏ฟฝ /// /// # Examples /// /// ```rust /// use widestring::U16String; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U16String::from_str(s); /// // Create a regular string from the wide string /// let lossy = wstr.to_string_lossy(); /// /// assert_eq!(lossy, s); /// ``` #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[inline] #[must_use] pub fn to_string_lossy(&self) -> String { String::from_utf16_lossy(&self.inner) } /// Returns an iterator over the [`char`][prim@char]s of a string slice. /// /// As this string has no defined encoding, this method assumes the string is UTF-16. Since it /// may consist of invalid UTF-16, the iterator returned by this method /// is an iterator over `Result` instead of [`char`][prim@char]s /// directly. If you would like a lossy iterator over [`chars`][prim@char]s directly, instead /// use [`chars_lossy`][Self::chars_lossy]. /// /// It's important to remember that [`char`][prim@char] represents a Unicode Scalar Value, and /// may not match your idea of what a 'character' is. Iteration over grapheme clusters may be /// what you actually want. That functionality is not provided by by this crate. #[inline] #[must_use] pub fn chars(&self) -> CharsUtf16<'_> { CharsUtf16::new(self.as_slice()) } /// Returns a lossy iterator over the [`char`][prim@char]s of a string slice. /// /// As this string has no defined encoding, this method assumes the string is UTF-16. Since it /// may consist of invalid UTF-16, the iterator returned by this method will replace unpaired /// surrogates with /// [`U+FFFD REPLACEMENT CHARACTER`][std::char::REPLACEMENT_CHARACTER] (๏ฟฝ). This is a lossy /// version of [`chars`][Self::chars]. 
/// /// It's important to remember that [`char`][prim@char] represents a Unicode Scalar Value, and /// may not match your idea of what a 'character' is. Iteration over grapheme clusters may be /// what you actually want. That functionality is not provided by by this crate. #[inline] #[must_use] pub fn chars_lossy(&self) -> CharsLossyUtf16<'_> { CharsLossyUtf16::new(self.as_slice()) } /// Returns an iterator over the chars of a string slice, and their positions. /// /// As this string has no defined encoding, this method assumes the string is UTF-16. Since it /// may consist of invalid UTF-16, the iterator returned by this method is an iterator over /// `Result` as well as their positions, instead of /// [`char`][prim@char]s directly. If you would like a lossy indices iterator over /// [`chars`][prim@char]s directly, instead use /// [`char_indices_lossy`][Self::char_indices_lossy]. /// /// The iterator yields tuples. The position is first, the [`char`][prim@char] is second. #[inline] #[must_use] pub fn char_indices(&self) -> CharIndicesUtf16<'_> { CharIndicesUtf16::new(self.as_slice()) } /// Returns a lossy iterator over the chars of a string slice, and their positions. /// /// As this string slice may consist of invalid UTF-16, the iterator returned by this method /// will replace unpaired surrogates with /// [`U+FFFD REPLACEMENT CHARACTER`][std::char::REPLACEMENT_CHARACTER] (๏ฟฝ), as well as the /// positions of all characters. This is a lossy version of /// [`char_indices`][Self::char_indices]. /// /// The iterator yields tuples. The position is first, the [`char`][prim@char] is second. #[inline] #[must_use] pub fn char_indices_lossy(&self) -> CharIndicesLossyUtf16<'_> { CharIndicesLossyUtf16::new(self.as_slice()) } } impl U32Str { /// Constructs a [`U32Str`] from a [`char`][prim@char] pointer and a length. /// /// The `len` argument is the number of `char` elements, **not** the number of bytes. 
No copying /// or allocation is performed, the resulting value is a direct reference to the pointer bytes. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` /// elements. /// /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts]. /// In particular, the returned string reference *must not be mutated* for the duration of /// lifetime `'a`, except inside an [`UnsafeCell`][std::cell::UnsafeCell]. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent accidental /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the /// context, such as by providing a helper function taking the lifetime of a host value for the /// string, or by explicit annotation. #[inline] #[must_use] pub unsafe fn from_char_ptr<'a>(p: *const char, len: usize) -> &'a Self { Self::from_ptr(p as *const u32, len) } /// Constructs a mutable [`U32Str`] from a mutable [`char`][prim@char] pointer and a length. /// /// The `len` argument is the number of `char` elements, **not** the number of bytes. No copying /// or allocation is performed, the resulting value is a direct reference to the pointer bytes. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` /// elements. /// /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts_mut]. /// /// # Panics /// /// This function panics if `p` is null. /// /// # Caveat /// /// The lifetime for the returned string is inferred from its usage. To prevent accidental /// misuse, it's suggested to tie the lifetime to whichever source lifetime is safe in the /// context, such as by providing a helper function taking the lifetime of a host value for the /// string, or by explicit annotation. 
#[inline] #[must_use] pub unsafe fn from_char_ptr_mut<'a>(p: *mut char, len: usize) -> &'a mut Self { Self::from_ptr_mut(p as *mut u32, len) } /// Constructs a [`U32Str`] from a [`char`][prim@char] slice. /// /// No checks are performed on the slice. #[inline] #[must_use] pub fn from_char_slice(slice: &[char]) -> &Self { let ptr: *const [char] = slice; unsafe { &*(ptr as *const Self) } } /// Constructs a mutable [`U32Str`] from a mutable [`char`][prim@char] slice. /// /// No checks are performed on the slice. #[inline] #[must_use] pub fn from_char_slice_mut(slice: &mut [char]) -> &mut Self { let ptr: *mut [char] = slice; unsafe { &mut *(ptr as *mut Self) } } /// Decodes a string to an owned [`OsString`][std::ffi::OsString]. /// /// This makes a string copy of the [`U32Str`]. Since [`U32Str`] makes no guarantees that its /// encoding is UTF-32 or that the data is valid UTF-32, there is no guarantee that the resulting /// [`OsString`][std::ffi::OsString] will have a valid underlying encoding either. /// /// Note that the encoding of [`OsString`][std::ffi::OsString] is platform-dependent, so on /// some platforms this may perform an encoding conversion, while on other platforms no changes to /// the string will be made. /// /// # Examples /// /// ```rust /// use widestring::U32String; /// use std::ffi::OsString; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U32String::from_str(s); /// // Create an OsString from the wide string /// let osstr = wstr.to_os_string(); /// /// assert_eq!(osstr, OsString::from(s)); /// ``` #[cfg(feature = "std")] #[cfg_attr(docsrs, doc(cfg(feature = "std")))] #[inline] #[must_use] pub fn to_os_string(&self) -> std::ffi::OsString { self.to_string_lossy().into() } /// Decodes the string to a [`String`] if it contains valid UTF-32 data. /// /// This method assumes this string is encoded as UTF-32 and attempts to decode it as such.
/// /// # Failures /// /// Returns an error if the string contains any invalid UTF-32 data. /// /// # Examples /// /// ```rust /// use widestring::U32String; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U32String::from_str(s); /// // Create a regular string from the wide string /// let s2 = wstr.to_string().unwrap(); /// /// assert_eq!(s2, s); /// ``` #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] pub fn to_string(&self) -> Result { let mut s = String::with_capacity(self.len()); for (index, result) in self.chars().enumerate() { let c = result.map_err(|e| Utf32Error::empty(index, e))?; s.push(c); } Ok(s) } /// Decodes the string reference to a [`String`] even if it is invalid UTF-32 data. /// /// This method assumes this string is encoded as UTF-16 and attempts to decode it as such. Any /// invalid sequences are replaced with /// [`U+FFFD REPLACEMENT CHARACTER`][core::char::REPLACEMENT_CHARACTER], which looks like this: /// ๏ฟฝ /// /// # Examples /// /// ```rust /// use widestring::U32String; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U32String::from_str(s); /// // Create a regular string from the wide string /// let lossy = wstr.to_string_lossy(); /// /// assert_eq!(lossy, s); /// ``` #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub fn to_string_lossy(&self) -> String { let chars: Vec = self .inner .iter() .map(|&c| char::from_u32(c).unwrap_or(char::REPLACEMENT_CHARACTER)) .collect(); let size = chars.iter().map(|c| c.len_utf8()).sum(); let mut vec = alloc::vec![0; size]; let mut i = 0; for c in chars { c.encode_utf8(&mut vec[i..]); i += c.len_utf8(); } unsafe { String::from_utf8_unchecked(vec) } } /// Returns an iterator over the [`char`][prim@char]s of a string slice. /// /// As this string has no defined encoding, this method assumes the string is UTF-32. 
Since it /// may consist of invalid UTF-32, the iterator returned by this method /// is an iterator over `Result` instead of [`char`][prim@char]s /// directly. If you would like a lossy iterator over [`chars`][prim@char]s directly, instead /// use [`chars_lossy`][Self::chars_lossy]. /// /// It's important to remember that [`char`][prim@char] represents a Unicode Scalar Value, and /// may not match your idea of what a 'character' is. Iteration over grapheme clusters may be /// what you actually want. That functionality is not provided by by this crate. #[inline] #[must_use] pub fn chars(&self) -> CharsUtf32<'_> { CharsUtf32::new(self.as_slice()) } /// Returns a lossy iterator over the [`char`][prim@char]s of a string slice. /// /// As this string has no defined encoding, this method assumes the string is UTF-32. Since it /// may consist of invalid UTF-32, the iterator returned by this method will replace unpaired /// surrogates with /// [`U+FFFD REPLACEMENT CHARACTER`][std::char::REPLACEMENT_CHARACTER] (๏ฟฝ). This is a lossy /// version of [`chars`][Self::chars]. /// /// It's important to remember that [`char`][prim@char] represents a Unicode Scalar Value, and /// may not match your idea of what a 'character' is. Iteration over grapheme clusters may be /// what you actually want. That functionality is not provided by by this crate. #[inline] #[must_use] pub fn chars_lossy(&self) -> CharsLossyUtf32<'_> { CharsLossyUtf32::new(self.as_slice()) } /// Returns an iterator over the chars of a string slice, and their positions. /// /// As this string has no defined encoding, this method assumes the string is UTF-32. Since it /// may consist of invalid UTF-32, the iterator returned by this method is an iterator over /// `Result` as well as their positions, instead of /// [`char`][prim@char]s directly. If you would like a lossy indices iterator over /// [`chars`][prim@char]s directly, instead use /// [`char_indices_lossy`][Self::char_indices_lossy]. 
/// /// The iterator yields tuples. The position is first, the [`char`][prim@char] is second. #[inline] #[must_use] pub fn char_indices(&self) -> CharIndicesUtf32<'_> { CharIndicesUtf32::new(self.as_slice()) } /// Returns a lossy iterator over the chars of a string slice, and their positions. /// /// As this string slice may consist of invalid UTF-32, the iterator returned by this method /// will replace invalid values with /// [`U+FFFD REPLACEMENT CHARACTER`][std::char::REPLACEMENT_CHARACTER] (๏ฟฝ), as well as the /// positions of all characters. This is a lossy version of /// [`char_indices`][Self::char_indices]. /// /// The iterator yields tuples. The position is first, the [`char`][prim@char] is second. #[inline] #[must_use] pub fn char_indices_lossy(&self) -> CharIndicesLossyUtf32<'_> { CharIndicesLossyUtf32::new(self.as_slice()) } } impl core::fmt::Debug for U16Str { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { crate::debug_fmt_u16(self.as_slice(), f) } } impl core::fmt::Debug for U32Str { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { crate::debug_fmt_u32(self.as_slice(), f) } } impl<'a> From<&'a [char]> for &'a U32Str { #[inline] fn from(value: &'a [char]) -> Self { U32Str::from_char_slice(value) } } impl<'a> From<&'a mut [char]> for &'a mut U32Str { #[inline] fn from(value: &'a mut [char]) -> Self { U32Str::from_char_slice_mut(value) } } /// Alias for [`U16Str`] or [`U32Str`] depending on platform. Intended to match typical C `wchar_t` /// size on platform. #[cfg(not(windows))] pub type WideStr = U32Str; /// Alias for [`U16Str`] or [`U32Str`] depending on platform. Intended to match typical C `wchar_t` /// size on platform. #[cfg(windows)] pub type WideStr = U16Str; /// Helper struct for printing wide string values with [`format!`] and `{}`. /// /// A wide string might contain ill-formed UTF encoding. 
This struct implements the /// [`Display`][std::fmt::Display] trait in a way that decoding the string is lossy but no heap /// allocations are performed, such as by [`to_string_lossy`][U16Str::to_string_lossy]. It is /// created by the [`display`][U16Str::display] method on [`U16Str`] and [`U32Str`]. /// /// By default, invalid Unicode data is replaced with /// [`U+FFFD REPLACEMENT CHARACTER`][std::char::REPLACEMENT_CHARACTER] (๏ฟฝ). If you wish to simply /// skip any invalid Uncode data and forego the replacement, you may use the /// [alternate formatting][std::fmt#sign0] with `{:#}`. pub struct Display<'a, S: ?Sized> { str: &'a S, } impl<'a> core::fmt::Debug for Display<'a, U16Str> { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { core::fmt::Debug::fmt(&self.str, f) } } impl<'a> core::fmt::Debug for Display<'a, U32Str> { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { core::fmt::Debug::fmt(&self.str, f) } } impl<'a> core::fmt::Display for Display<'a, U16Str> { fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { for c in crate::decode_utf16_lossy(self.str.as_slice().iter().copied()) { // Allow alternate {:#} format which skips replacment chars entirely if c != core::char::REPLACEMENT_CHARACTER || !f.alternate() { f.write_char(c)?; } } Ok(()) } } impl<'a> core::fmt::Display for Display<'a, U32Str> { fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { for c in crate::decode_utf32_lossy(self.str.as_slice().iter().copied()) { // Allow alternate {:#} format which skips replacment chars entirely if c != core::char::REPLACEMENT_CHARACTER || !f.alternate() { f.write_char(c)?; } } Ok(()) } } widestring-1.1.0/src/ustring/iter.rs000064400000000000000000000016651046102023000156050ustar 00000000000000#[allow(unused_imports)] use core::iter::{DoubleEndedIterator, ExactSizeIterator, FusedIterator}; /// A draining iterator for string data with unknown encoding. 
#[derive(Debug)] pub struct Drain<'a, T> { pub(crate) inner: alloc::vec::Drain<'a, T>, } impl<T> AsRef<[T]> for Drain<'_, T> { #[inline] fn as_ref(&self) -> &[T] { self.inner.as_ref() } } impl<T> Iterator for Drain<'_, T> { type Item = T; #[inline] fn next(&mut self) -> Option<T> { self.inner.next() } #[inline] fn size_hint(&self) -> (usize, Option<usize>) { self.inner.size_hint() } } impl<T> DoubleEndedIterator for Drain<'_, T> { #[inline] fn next_back(&mut self) -> Option<T> { self.inner.next_back() } } impl<T> ExactSizeIterator for Drain<'_, T> { #[inline] fn len(&self) -> usize { self.inner.len() } } impl<T> FusedIterator for Drain<'_, T> {} widestring-1.1.0/src/ustring.rs000064400000000000000000001472741046102023000146510ustar 00000000000000//! Owned, growable wide strings with undefined encoding. //! //! This module contains wide strings and related types. use crate::{U16CStr, U16CString, U16Str, U32CStr, U32CString, U32Str}; #[allow(unused_imports)] use alloc::{ borrow::{Cow, ToOwned}, boxed::Box, string::String, vec::Vec, }; #[allow(unused_imports)] use core::{ borrow::{Borrow, BorrowMut}, char, cmp, convert::Infallible, fmt::Write, iter::FromIterator, mem, ops::{Add, AddAssign, Deref, DerefMut, Index, IndexMut, RangeBounds}, slice::{self, SliceIndex}, str::FromStr, }; mod iter; pub use iter::*; macro_rules! ustring_common_impl { { $(#[$ustring_meta:meta])* struct $ustring:ident([$uchar:ty]); type UStr = $ustr:ident; type UCString = $ucstring:ident; type UCStr = $ucstr:ident; type UtfStr = $utfstr:ident; type UtfString = $utfstring:ident; $(#[$push_meta:meta])* fn push() -> {} $(#[$push_slice_meta:meta])* fn push_slice() -> {} $(#[$into_boxed_ustr_meta:meta])* fn into_boxed_ustr() -> {} } => { $(#[$ustring_meta])* #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[derive(Default, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)] pub struct $ustring { pub(crate) inner: Vec<$uchar>, } impl $ustring { /// Constructs a new empty wide string.
#[inline] #[must_use] pub const fn new() -> Self { Self { inner: Vec::new() } } /// Constructs a wide string from a vector. /// /// No checks are made on the contents of the vector. It may or may not be valid /// character data. /// /// # Examples /// /// ```rust /// use widestring::U16String; /// let v = vec![84u16, 104u16, 101u16]; // 'T' 'h' 'e' /// # let cloned = v.clone(); /// // Create a wide string from the vector /// let wstr = U16String::from_vec(v); /// # assert_eq!(wstr.into_vec(), cloned); /// ``` /// /// ```rust /// use widestring::U32String; /// let v = vec![84u32, 104u32, 101u32]; // 'T' 'h' 'e' /// # let cloned = v.clone(); /// // Create a wide string from the vector /// let wstr = U32String::from_vec(v); /// # assert_eq!(wstr.into_vec(), cloned); /// ``` #[inline] #[must_use] pub fn from_vec(raw: impl Into>) -> Self { Self { inner: raw.into() } } /// Constructs a wide string copy from a pointer and a length. /// /// The `len` argument is the number of elements, **not** the number of bytes. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid for /// `len` elements. /// /// In addition, the data must meet the safety conditions of /// [std::slice::from_raw_parts]. /// /// # Panics /// /// Panics if `len` is greater than 0 but `p` is a null pointer. #[must_use] pub unsafe fn from_ptr(p: *const $uchar, len: usize) -> Self { if len == 0 { return Self::new(); } assert!(!p.is_null()); let slice = slice::from_raw_parts(p, len); Self::from_vec(slice) } /// Constructs a wide string with the given capacity. /// /// The string will be able to hold exactly `capacity` elements without reallocating. /// If `capacity` is set to 0, the string will not initially allocate. #[inline] #[must_use] pub fn with_capacity(capacity: usize) -> Self { Self { inner: Vec::with_capacity(capacity), } } /// Returns the capacity this wide string can hold without reallocating. 
#[inline] #[must_use] pub fn capacity(&self) -> usize { self.inner.capacity() } /// Truncates the wide string to zero length. #[inline] pub fn clear(&mut self) { self.inner.clear() } /// Reserves the capacity for at least `additional` more capacity to be inserted in the /// given wide string. /// /// More space may be reserved to avoid frequent allocations. #[inline] pub fn reserve(&mut self, additional: usize) { self.inner.reserve(additional) } /// Reserves the minimum capacity for exactly `additional` more capacity to be inserted /// in the given wide string. Does nothing if the capacity is already sufficient. /// /// Note that the allocator may give more space than is requested. Therefore capacity /// can not be relied upon to be precisely minimal. Prefer [`reserve`][Self::reserve] if /// future insertions are expected. #[inline] pub fn reserve_exact(&mut self, additional: usize) { self.inner.reserve_exact(additional) } /// Converts the string into a [`Vec`], consuming the string in the process. #[inline] #[must_use] pub fn into_vec(self) -> Vec<$uchar> { self.inner } /// Converts to a wide string slice. #[inline] #[must_use] pub fn as_ustr(&self) -> &$ustr { $ustr::from_slice(&self.inner) } /// Converts to a mutable wide string slice. #[inline] #[must_use] pub fn as_mut_ustr(&mut self) -> &mut $ustr { $ustr::from_slice_mut(&mut self.inner) } /// Returns a [`Vec`] reference to the contents of this string. #[inline] #[must_use] pub fn as_vec(&self) -> &Vec<$uchar> { &self.inner } /// Returns a mutable reference to the contents of this string. #[inline] #[must_use] pub fn as_mut_vec(&mut self) -> &mut Vec<$uchar> { &mut self.inner } $(#[$push_meta])* #[inline] pub fn push(&mut self, s: impl AsRef<$ustr>) { self.inner.extend_from_slice(&s.as_ref().as_slice()) } $(#[$push_slice_meta])* #[inline] pub fn push_slice(&mut self, s: impl AsRef<[$uchar]>) { self.inner.extend_from_slice(s.as_ref()) } /// Shrinks the capacity of the wide string to match its length. 
#[inline] pub fn shrink_to_fit(&mut self) { self.inner.shrink_to_fit(); } /// Shrinks the capacity of this string with a lower bound. /// /// The capacity will remain at least as large as both the length and the supplied /// value. /// /// If the current capacity is less than the lower limit, this is a no-op. #[inline] pub fn shrink_to(&mut self, min_capacity: usize) { self.inner.shrink_to(min_capacity) } $(#[$into_boxed_ustr_meta])* #[must_use] pub fn into_boxed_ustr(self) -> Box<$ustr> { let rw = Box::into_raw(self.inner.into_boxed_slice()) as *mut $ustr; unsafe { Box::from_raw(rw) } } /// Shortens this string to the specified length. /// /// If `new_len` is greater than the string's current length, this has no effect. /// /// Note that this method has no effect on the allocated capacity of the string. #[inline] pub fn truncate(&mut self, new_len: usize) { self.inner.truncate(new_len) } /// Inserts a string slice into this string at a specified position. /// /// This is an _O(n)_ operation as it requires copying every element in the buffer. /// /// # Panics /// /// Panics if `idx` is larger than the string's length. pub fn insert_ustr(&mut self, idx: usize, string: &$ustr) { assert!(idx <= self.len()); self.inner .resize_with(self.len() + string.len(), Default::default); self.inner.copy_within(idx.., idx + string.len()); self.inner[idx..].copy_from_slice(string.as_slice()); } /// Splits the string into two at the given index. /// /// Returns a newly allocated string. `self` contains values `[0, at)`, and the returned /// string contains values `[at, len)`. /// /// Note that the capacity of `self` does not change. /// /// # Panics /// /// Panics if `at` is equal to or greater than the length of the string. #[inline] #[must_use] pub fn split_off(&mut self, at: usize) -> $ustring { Self::from_vec(self.inner.split_off(at)) } /// Retains only the elements specified by the predicate. /// /// In other words, remove all elements `e` such that `f(e)` returns `false`. 
This /// method operates in place, visiting each element exactly once in the original order, /// and preserves the order of the retained elements. pub fn retain(&mut self, mut f: F) where F: FnMut($uchar) -> bool, { self.inner.retain(|e| f(*e)) } /// Creates a draining iterator that removes the specified range in the string and /// yields the removed elements. /// /// Note: The element range is removed even if the iterator is not consumed until the /// end. /// /// # Panics /// /// Panics if the starting point or end point are out of bounds. pub fn drain(&mut self, range: R) -> Drain<'_, $uchar> where R: RangeBounds, { Drain { inner: self.inner.drain(range) } } /// Removes the specified range in the string, and replaces it with the given string. /// /// The given string doesn't need to be the same length as the range. /// /// # Panics /// /// Panics if the starting point or end point are out of bounds. pub fn replace_range(&mut self, range: R, replace_with: impl AsRef<$ustr>) where R: RangeBounds, { self.inner .splice(range, replace_with.as_ref().as_slice().iter().copied()); } } impl Add<&$ustr> for $ustring { type Output = $ustring; #[inline] fn add(mut self, rhs: &$ustr) -> Self::Output { self.push(rhs); self } } impl Add<&$ucstr> for $ustring { type Output = $ustring; #[inline] fn add(mut self, rhs: &$ucstr) -> Self::Output { self.push(rhs); self } } impl Add<&crate::$utfstr> for $ustring { type Output = $ustring; #[inline] fn add(mut self, rhs: &crate::$utfstr) -> Self::Output { self.push(rhs); self } } impl Add<&str> for $ustring { type Output = $ustring; #[inline] fn add(mut self, rhs: &str) -> Self::Output { self.push_str(rhs); self } } impl AddAssign<&$ustr> for $ustring { #[inline] fn add_assign(&mut self, rhs: &$ustr) { self.push(rhs) } } impl AddAssign<&$ucstr> for $ustring { #[inline] fn add_assign(&mut self, rhs: &$ucstr) { self.push(rhs) } } impl AddAssign<&crate::$utfstr> for $ustring { #[inline] fn add_assign(&mut self, rhs: &crate::$utfstr) { 
self.push(rhs) } } impl AddAssign<&str> for $ustring { #[inline] fn add_assign(&mut self, rhs: &str) { self.push_str(rhs); } } impl AsMut<$ustr> for $ustring { #[inline] fn as_mut(&mut self) -> &mut $ustr { self.as_mut_ustr() } } impl AsMut<[$uchar]> for $ustring { #[inline] fn as_mut(&mut self) -> &mut [$uchar] { self.as_mut_slice() } } impl AsRef<$ustr> for $ustring { #[inline] fn as_ref(&self) -> &$ustr { self.as_ustr() } } impl AsRef<[$uchar]> for $ustring { #[inline] fn as_ref(&self) -> &[$uchar] { self.as_slice() } } impl Borrow<$ustr> for $ustring { #[inline] fn borrow(&self) -> &$ustr { self.as_ustr() } } impl BorrowMut<$ustr> for $ustring { #[inline] fn borrow_mut(&mut self) -> &mut $ustr { self.as_mut_ustr() } } impl Default for Box<$ustr> { #[inline] fn default() -> Self { let boxed: Box<[$uchar]> = Box::from([]); let rw = Box::into_raw(boxed) as *mut $ustr; unsafe { Box::from_raw(rw) } } } impl Deref for $ustring { type Target = $ustr; #[inline] fn deref(&self) -> &$ustr { self.as_ustr() } } impl DerefMut for $ustring { #[inline] fn deref_mut(&mut self) -> &mut Self::Target { self.as_mut_ustr() } } impl<'a> Extend<&'a $ustr> for $ustring { #[inline] fn extend>(&mut self, iter: T) { iter.into_iter().for_each(|s| self.push(s)) } } impl<'a> Extend<&'a $ucstr> for $ustring { #[inline] fn extend>(&mut self, iter: T) { iter.into_iter().for_each(|s| self.push(s)) } } impl<'a> Extend<&'a crate::$utfstr> for $ustring { #[inline] fn extend>(&mut self, iter: T) { iter.into_iter().for_each(|s| self.push(s)) } } impl<'a> Extend<&'a str> for $ustring { #[inline] fn extend>(&mut self, iter: T) { iter.into_iter().for_each(|s| self.push_str(s)) } } impl Extend<$ustring> for $ustring { #[inline] fn extend>(&mut self, iter: T) { iter.into_iter().for_each(|s| self.push(s)) } } impl Extend<$ucstring> for $ustring { #[inline] fn extend>(&mut self, iter: T) { iter.into_iter().for_each(|s| self.push(s.as_ucstr())) } } impl Extend for $ustring { #[inline] fn extend>(&mut self, 
iter: T) { iter.into_iter().for_each(|s| self.push(s.as_ustr())) } } impl Extend for $ustring { #[inline] fn extend>(&mut self, iter: T) { iter.into_iter().for_each(|s| self.push_str(s)) } } impl Extend for $ustring { #[inline] fn extend>(&mut self, iter: T) { let iter = iter.into_iter(); let (lower_bound, _) = iter.size_hint(); self.reserve(lower_bound); iter.for_each(|c| self.push_char(c)); } } impl<'a> Extend<&'a char> for $ustring { #[inline] fn extend>(&mut self, iter: T) { self.extend(iter.into_iter().copied()) } } impl Extend> for $ustring { #[inline] fn extend>>(&mut self, iter: T) { iter.into_iter().for_each(|s| self.push(s)) } } impl<'a> Extend> for $ustring { #[inline] fn extend>>(&mut self, iter: T) { iter.into_iter().for_each(|s| self.push(s)) } } impl From<$ustring> for Vec<$uchar> { #[inline] fn from(value: $ustring) -> Self { value.into_vec() } } impl<'a> From<$ustring> for Cow<'a, $ustr> { #[inline] fn from(s: $ustring) -> Self { Cow::Owned(s) } } impl From> for $ustring { #[inline] fn from(value: Vec<$uchar>) -> Self { Self::from_vec(value) } } impl From for $ustring { #[inline] fn from(s: String) -> Self { Self::from_str(&s) } } impl From<&str> for $ustring { #[inline] fn from(s: &str) -> Self { Self::from_str(s) } } #[cfg(feature = "std")] impl From for $ustring { #[inline] fn from(s: std::ffi::OsString) -> Self { Self::from_os_str(&s) } } #[cfg(feature = "std")] impl From<$ustring> for std::ffi::OsString { #[inline] fn from(s: $ustring) -> Self { s.to_os_string() } } impl<'a, T: ?Sized + AsRef<$ustr>> From<&'a T> for $ustring { #[inline] fn from(s: &'a T) -> Self { s.as_ref().to_ustring() } } impl<'a> From<&'a $ustr> for Cow<'a, $ustr> { #[inline] fn from(s: &'a $ustr) -> Self { Cow::Borrowed(s) } } impl<'a> From<&'a $ustr> for Box<$ustr> { fn from(s: &'a $ustr) -> Self { let boxed: Box<[$uchar]> = Box::from(&s.inner); let rw = Box::into_raw(boxed) as *mut $ustr; unsafe { Box::from_raw(rw) } } } impl From> for $ustring { #[inline] fn 
from(boxed: Box<$ustr>) -> Self { boxed.into_ustring() } } impl From<$ustring> for Box<$ustr> { #[inline] fn from(s: $ustring) -> Self { s.into_boxed_ustr() } } impl<'a> FromIterator<&'a $ustr> for $ustring { #[inline] fn from_iter>(iter: T) -> Self { let mut string = Self::new(); string.extend(iter); string } } impl<'a> FromIterator<&'a $ucstr> for $ustring { #[inline] fn from_iter>(iter: T) -> Self { let mut string = Self::new(); string.extend(iter); string } } impl<'a> FromIterator<&'a crate::$utfstr> for $ustring { #[inline] fn from_iter>(iter: T) -> Self { let mut string = Self::new(); string.extend(iter); string } } impl<'a> FromIterator<&'a str> for $ustring { #[inline] fn from_iter>(iter: T) -> Self { let mut string = Self::new(); string.extend(iter); string } } impl FromIterator<$ustring> for $ustring { #[inline] fn from_iter>(iter: T) -> Self { let mut string = Self::new(); string.extend(iter); string } } impl FromIterator<$ucstring> for $ustring { #[inline] fn from_iter>(iter: T) -> Self { let mut string = Self::new(); string.extend(iter); string } } impl FromIterator for $ustring { #[inline] fn from_iter>(iter: T) -> Self { let mut string = Self::new(); string.extend(iter); string } } impl FromIterator for $ustring { #[inline] fn from_iter>(iter: T) -> Self { let mut string = Self::new(); string.extend(iter); string } } impl FromIterator for $ustring { #[inline] fn from_iter>(iter: T) -> Self { let mut string = Self::new(); string.extend(iter); string } } impl<'a> FromIterator<&'a char> for $ustring { #[inline] fn from_iter>(iter: T) -> Self { let mut string = Self::new(); string.extend(iter); string } } impl FromIterator> for $ustring { #[inline] fn from_iter>>(iter: T) -> Self { let mut string = Self::new(); string.extend(iter); string } } impl<'a> FromIterator> for $ustring { #[inline] fn from_iter>>(iter: T) -> Self { let mut string = Self::new(); string.extend(iter); string } } impl FromStr for $ustring { type Err = Infallible; #[inline] fn 
from_str(s: &str) -> Result { Ok(Self::from_str(s)) } } impl Index for $ustring where I: SliceIndex<[$uchar], Output = [$uchar]>, { type Output = $ustr; #[inline] fn index(&self, index: I) -> &$ustr { &self.as_ustr()[index] } } impl IndexMut for $ustring where I: SliceIndex<[$uchar], Output = [$uchar]>, { fn index_mut(&mut self, index: I) -> &mut Self::Output { &mut self.as_mut_ustr()[index] } } impl PartialEq<$ustr> for $ustring { #[inline] fn eq(&self, other: &$ustr) -> bool { self.as_ustr() == other } } impl PartialEq<$ucstr> for $ustring { #[inline] fn eq(&self, other: &$ucstr) -> bool { self.as_ustr() == other } } impl PartialEq<$ucstring> for $ustring { #[inline] fn eq(&self, other: &$ucstring) -> bool { self.as_ustr() == other.as_ucstr() } } impl<'a> PartialEq<&'a $ustr> for $ustring { #[inline] fn eq(&self, other: &&'a $ustr) -> bool { self.as_ustr() == *other } } impl<'a> PartialEq<&'a $ucstr> for $ustring { #[inline] fn eq(&self, other: &&'a $ucstr) -> bool { self.as_ustr() == *other } } impl<'a> PartialEq> for $ustring { #[inline] fn eq(&self, other: &Cow<'a, $ustr>) -> bool { self.as_ustr() == other.as_ref() } } impl<'a> PartialEq> for $ustring { #[inline] fn eq(&self, other: &Cow<'a, $ucstr>) -> bool { self.as_ustr() == other.as_ref() } } impl PartialEq<$ustring> for $ustr { #[inline] fn eq(&self, other: &$ustring) -> bool { self == other.as_ustr() } } impl PartialEq<$ustring> for $ucstr { #[inline] fn eq(&self, other: &$ustring) -> bool { self.as_ustr() == other.as_ustr() } } impl PartialEq<$ustring> for &$ustr { #[inline] fn eq(&self, other: &$ustring) -> bool { self == other.as_ustr() } } impl PartialEq<$ustring> for &$ucstr { #[inline] fn eq(&self, other: &$ustring) -> bool { self.as_ustr() == other.as_ustr() } } impl PartialOrd<$ustr> for $ustring { #[inline] fn partial_cmp(&self, other: &$ustr) -> Option { self.as_ustr().partial_cmp(other) } } impl PartialOrd<$ucstr> for $ustring { #[inline] fn partial_cmp(&self, other: &$ucstr) -> Option { 
self.as_ustr().partial_cmp(other) } } impl<'a> PartialOrd<&'a $ustr> for $ustring { #[inline] fn partial_cmp(&self, other: &&'a $ustr) -> Option { self.as_ustr().partial_cmp(*other) } } impl<'a> PartialOrd<&'a $ucstr> for $ustring { #[inline] fn partial_cmp(&self, other: &&'a $ucstr) -> Option { self.as_ustr().partial_cmp(*other) } } impl<'a> PartialOrd> for $ustring { #[inline] fn partial_cmp(&self, other: &Cow<'a, $ustr>) -> Option { self.as_ustr().partial_cmp(other.as_ref()) } } impl<'a> PartialOrd> for $ustring { #[inline] fn partial_cmp(&self, other: &Cow<'a, $ucstr>) -> Option { self.as_ustr().partial_cmp(other.as_ref()) } } impl PartialOrd<$ucstring> for $ustring { #[inline] fn partial_cmp(&self, other: &$ucstring) -> Option { self.as_ustr().partial_cmp(other.as_ucstr()) } } impl ToOwned for $ustr { type Owned = $ustring; #[inline] fn to_owned(&self) -> $ustring { self.to_ustring() } } impl Write for $ustring { #[inline] fn write_str(&mut self, s: &str) -> core::fmt::Result { self.push_str(s); Ok(()) } #[inline] fn write_char(&mut self, c: char) -> core::fmt::Result { self.push_char(c); Ok(()) } } }; } ustring_common_impl! { /// An owned, mutable 16-bit wide string with undefined encoding. /// /// The string slice of a [`U16String`] is [`U16Str`]. /// /// [`U16String`] are strings that do not have a defined encoding. While it is sometimes /// assumed that they contain possibly invalid or ill-formed UTF-16 data, they may be used for /// any wide encoded string. This is because [`U16String`] is intended to be used with FFI /// functions, where proper encoding cannot be guaranteed. If you need string slices that are /// always valid UTF-16 strings, use [`Utf16String`][crate::Utf16String] instead. /// /// Because [`U16String`] does not have a defined encoding, no restrictions are placed on /// mutating or indexing the string. 
This means that even if the string contained properly /// encoded UTF-16 or other encoding data, mutating or indexing may result in malformed data. /// Convert to a [`Utf16String`][crate::Utf16String] if retaining proper UTF-16 encoding is /// desired. /// /// # FFI considerations /// /// [`U16String`] is not aware of nul values. Strings may or may not be nul-terminated, and may /// contain invalid and ill-formed UTF-16. These strings are intended to be used with FFI functions /// that directly use string length, where the strings are known to have proper nul-termination /// already, or where strings are merely being passed through without modification. /// /// [`U16CString`][crate::U16CString] should be used instead if nul-aware strings are required. /// /// # Examples /// /// The easiest way to use [`U16String`] outside of FFI is with the [`u16str!`][crate::u16str] /// macro to convert string literals into UTF-16 string slices at compile time: /// /// ``` /// use widestring::{u16str, U16String}; /// let hello = U16String::from(u16str!("Hello, world!")); /// ``` /// /// You can also convert any [`u16`] slice or vector directly: /// /// ``` /// use widestring::{u16str, U16String}; /// /// let sparkle_heart = vec![0xd83d, 0xdc96]; /// let sparkle_heart = U16String::from_vec(sparkle_heart); /// /// assert_eq!(u16str!("💖"), sparkle_heart); /// /// // This unpaired UTF-16 surrogate is invalid UTF-16, but is perfectly valid in U16String /// let malformed_utf16 = vec![0x0, 0xd83d]; // Note that nul values are also valid and left untouched /// let s = U16String::from_vec(malformed_utf16); /// /// assert_eq!(s.len(), 2); /// ``` /// /// The following example constructs a [`U16String`] and shows how to convert a [`U16String`] to /// a regular Rust [`String`].
/// /// ```rust /// use widestring::U16String; /// let s = "Test"; /// // Create a wide string from the Rust string /// let wstr = U16String::from_str(s); /// // Convert back to a Rust string /// let rust_str = wstr.to_string_lossy(); /// assert_eq!(rust_str, "Test"); /// ``` struct U16String([u16]); type UStr = U16Str; type UCString = U16CString; type UCStr = U16CStr; type UtfStr = Utf16Str; type UtfString = Utf16String; /// Extends the string with the given string slice. /// /// No checks are performed on the strings. It is possible to end up with nul values inside /// the string, or with invalid encoding, and it is up to the caller to determine if that is /// acceptable. /// /// # Examples /// /// ```rust /// use widestring::U16String; /// let s = "MyString"; /// let mut wstr = U16String::from_str(s); /// let cloned = wstr.clone(); /// // Push the clone to the end, repeating the string twice. /// wstr.push(cloned); /// /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString"); /// ``` fn push() -> {} /// Extends the string with the given slice. /// /// No checks are performed on the strings. It is possible to end up with nul values inside /// the string, or with invalid encoding, and it is up to the caller to determine if that is /// acceptable. /// /// # Examples /// /// ```rust /// use widestring::U16String; /// let s = "MyString"; /// let mut wstr = U16String::from_str(s); /// let cloned = wstr.clone(); /// // Push the clone to the end, repeating the string twice. /// wstr.push_slice(cloned); /// /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString"); /// ``` fn push_slice() -> {} /// Converts this wide string into a boxed string slice. /// /// # Examples /// /// ``` /// use widestring::{U16String, U16Str}; /// /// let s = U16String::from_str("hello"); /// /// let b: Box<U16Str> = s.into_boxed_ustr(); /// ``` fn into_boxed_ustr() -> {} } ustring_common_impl! { /// An owned, mutable 32-bit wide string with undefined encoding.
/// /// The string slice of a [`U32String`] is [`U32Str`]. /// /// [`U32String`] are strings that do not have a defined encoding. While it is sometimes /// assumed that they contain possibly invalid or ill-formed UTF-32 data, they may be used for /// any wide encoded string. This is because [`U32String`] is intended to be used with FFI /// functions, where proper encoding cannot be guaranteed. If you need string slices that are /// always valid UTF-32 strings, use [`Utf32String`][crate::Utf32String] instead. /// /// Because [`U32String`] does not have a defined encoding, no restrictions are placed on /// mutating or indexing the string. This means that even if the string contained properly /// encoded UTF-32 or other encoding data, mutating or indexing may result in malformed data. /// Convert to a [`Utf32String`][crate::Utf32String] if retaining proper UTF-32 encoding is /// desired. /// /// # FFI considerations /// /// [`U32String`] is not aware of nul values. Strings may or may not be nul-terminated, and may /// contain invalid and ill-formed UTF-32. These strings are intended to be used with FFI functions /// that directly use string length, where the strings are known to have proper nul-termination /// already, or where strings are merely being passed through without modification. /// /// [`U32CString`][crate::U32CString] should be used instead if nul-aware strings are required.
/// /// # Examples /// /// The easiest way to use [`U32String`] outside of FFI is with the [`u32str!`][crate::u32str] /// macro to convert string literals into UTF-32 string slices at compile time: /// /// ``` /// use widestring::{u32str, U32String}; /// let hello = U32String::from(u32str!("Hello, world!")); /// ``` /// /// You can also convert any [`u32`] slice or vector directly: /// /// ``` /// use widestring::{u32str, U32String}; /// /// let sparkle_heart = vec![0x1f496]; /// let sparkle_heart = U32String::from_vec(sparkle_heart); /// /// assert_eq!(u32str!("💖"), sparkle_heart); /// /// // This UTF-16 surrogate is invalid UTF-32, but is perfectly valid in U32String /// let malformed_utf32 = vec![0x0, 0xd83d]; // Note that nul values are also valid and left untouched /// let s = U32String::from_vec(malformed_utf32); /// /// assert_eq!(s.len(), 2); /// ``` /// /// The following example constructs a [`U32String`] and shows how to convert a [`U32String`] to /// a regular Rust [`String`]. /// /// ```rust /// use widestring::U32String; /// let s = "Test"; /// // Create a wide string from the Rust string /// let wstr = U32String::from_str(s); /// // Convert back to a Rust string /// let rust_str = wstr.to_string_lossy(); /// assert_eq!(rust_str, "Test"); /// ``` struct U32String([u32]); type UStr = U32Str; type UCString = U32CString; type UCStr = U32CStr; type UtfStr = Utf32Str; type UtfString = Utf32String; /// Extends the string with the given string slice. /// /// No checks are performed on the strings. It is possible to end up with nul values inside /// the string, or with invalid encoding, and it is up to the caller to determine if that is /// acceptable. /// /// # Examples /// /// ```rust /// use widestring::U32String; /// let s = "MyString"; /// let mut wstr = U32String::from_str(s); /// let cloned = wstr.clone(); /// // Push the clone to the end, repeating the string twice.
/// wstr.push(cloned); /// /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString"); /// ``` fn push() -> {} /// Extends the string with the given slice. /// /// No checks are performed on the strings. It is possible to end up with nul values inside /// the string, or with invalid encoding, and it is up to the caller to determine if that is /// acceptable. /// /// # Examples /// /// ```rust /// use widestring::U32String; /// let s = "MyString"; /// let mut wstr = U32String::from_str(s); /// let cloned = wstr.clone(); /// // Push the clone to the end, repeating the string twice. /// wstr.push_slice(cloned); /// /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString"); /// ``` fn push_slice() -> {} /// Converts this wide string into a boxed string slice. /// /// # Examples /// /// ``` /// use widestring::{U32String, U32Str}; /// /// let s = U32String::from_str("hello"); /// /// let b: Box<U32Str> = s.into_boxed_ustr(); /// ``` fn into_boxed_ustr() -> {} } impl U16String { /// Constructs a [`U16String`] copy from a [`str`], encoding it as UTF-16. /// /// This makes a string copy of the [`str`]. Since [`str`] will always be valid UTF-8, the /// resulting [`U16String`] will also be valid UTF-16. /// /// # Examples /// /// ```rust /// use widestring::U16String; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U16String::from_str(s); /// /// assert_eq!(wstr.to_string().unwrap(), s); /// ``` #[allow(clippy::should_implement_trait)] #[inline] #[must_use] pub fn from_str<S: AsRef<str> + ?Sized>(s: &S) -> Self { Self { inner: s.as_ref().encode_utf16().collect(), } } /// Constructs a [`U16String`] copy from an [`OsStr`][std::ffi::OsStr]. /// /// This makes a string copy of the [`OsStr`][std::ffi::OsStr]. Since [`OsStr`][std::ffi::OsStr] /// makes no guarantees that it is valid data, there is no guarantee that the resulting /// [`U16String`] will be valid UTF-16.
/// /// Note that the encoding of [`OsStr`][std::ffi::OsStr] is platform-dependent, so on /// some platforms this may perform an encoding conversion, while on other platforms (such as /// Windows) no changes to the string will be made. /// /// # Examples /// /// ```rust /// use widestring::U16String; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U16String::from_os_str(s); /// /// assert_eq!(wstr.to_string().unwrap(), s); /// ``` #[cfg(feature = "std")] #[cfg_attr(docsrs, doc(cfg(feature = "std")))] #[inline] #[must_use] pub fn from_os_str<S: AsRef<std::ffi::OsStr> + ?Sized>(s: &S) -> Self { Self { inner: crate::platform::os_to_wide(s.as_ref()), } } /// Extends the string with the given string slice, encoding it as UTF-16. /// /// No checks are performed on the strings. It is possible to end up with nul values inside the /// string, and it is up to the caller to determine if that is acceptable. /// /// # Examples /// /// ```rust /// use widestring::U16String; /// let s = "MyString"; /// let mut wstr = U16String::from_str(s); /// // Push the original to the end, repeating the string twice. /// wstr.push_str(s); /// /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString"); /// ``` #[inline] pub fn push_str(&mut self, s: impl AsRef<str>) { self.inner.extend(s.as_ref().encode_utf16()) } /// Extends the string with the given string slice. /// /// No checks are performed on the strings. It is possible to end up with nul values inside the /// string, and it is up to the caller to determine if that is acceptable. /// /// # Examples /// /// ```rust /// use widestring::U16String; /// let s = "MyString"; /// let mut wstr = U16String::from_str(s); /// // Push the original to the end, repeating the string twice.
/// wstr.push_os_str(s); /// /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString"); /// ``` #[cfg(feature = "std")] #[cfg_attr(docsrs, doc(cfg(feature = "std")))] #[inline] pub fn push_os_str(&mut self, s: impl AsRef<std::ffi::OsStr>) { self.inner.extend(crate::platform::os_to_wide(s.as_ref())) } /// Appends the given [`char`][prim@char] encoded as UTF-16 to the end of this string. #[inline] pub fn push_char(&mut self, c: char) { let mut buf = [0; 2]; self.inner.extend_from_slice(c.encode_utf16(&mut buf)) } /// Removes the last character or unpaired surrogate from the string buffer and returns it. /// /// This method assumes UTF-16 encoding, but handles invalid UTF-16 by returning unpaired /// surrogates. /// /// Returns `None` if this String is empty. Otherwise, returns the character cast to a /// [`u32`][prim@u32] or the value of the unpaired surrogate as a [`u32`][prim@u32] value. pub fn pop_char(&mut self) -> Option<u32> { match self.inner.pop() { Some(low) if crate::is_utf16_surrogate(low) => { if !crate::is_utf16_low_surrogate(low) || self.inner.is_empty() { Some(low as u32) } else { let high = self.inner[self.len() - 1]; if crate::is_utf16_high_surrogate(high) { self.inner.pop(); let buf = [high, low]; Some( char::decode_utf16(buf.iter().copied()) .next() .unwrap() .unwrap() as u32, ) } else { Some(low as u32) } } } Some(u) => Some(u as u32), None => None, } } /// Removes a [`char`][prim@char] or unpaired surrogate from this string at a position and /// returns it as a [`u32`][prim@u32]. /// /// This method assumes UTF-16 encoding, but handles invalid UTF-16 by returning unpaired /// surrogates. /// /// This is an _O(n)_ operation, as it requires copying every element in the buffer. /// /// # Panics /// /// Panics if `idx` is larger than or equal to the string's length.
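/*
Illustrative sketch, not part of the crate: the removal rule documented above (take a whole
character when the code units at `idx` decode cleanly, otherwise take the single unpaired
surrogate) can be tried on a plain `Vec<u16>` using only the standard library. The name
`remove_char_at` is hypothetical and exists only for this sketch.

```rust
/// Removes the `char` or unpaired surrogate that starts at `idx` and returns it as a `u32`.
/// Hypothetical free-function stand-in; panics if `idx` is not less than `buf.len()`.
fn remove_char_at(buf: &mut Vec<u16>, idx: usize) -> u32 {
    // Decode only the first scalar value (or decoding error) starting at `idx`.
    let first = char::decode_utf16(buf[idx..].iter().copied())
        .next()
        .expect("idx must be less than the buffer length");
    let (value, units) = match first {
        // A well-formed character occupies one or two code units.
        Ok(c) => (c as u32, c.len_utf16()),
        // An unpaired surrogate is returned as-is and occupies exactly one unit.
        Err(e) => (u32::from(e.unpaired_surrogate()), 1),
    };
    // Drop the consumed units; everything after them shifts left.
    buf.drain(idx..idx + units);
    value
}
```

For the units of "a", U+1F496, "b", removing at index 1 takes both halves of the surrogate
pair at once, while a lone 0xD83D at the front of a buffer comes back unchanged as a `u32`.
*/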
pub fn remove_char(&mut self, idx: usize) -> u32 { let slice = &self.inner[idx..]; let c = char::decode_utf16(slice.iter().copied()).next().unwrap(); let clen = c.as_ref().map(|c| c.len_utf16()).unwrap_or(1); let c = c .map(|c| c as u32) .unwrap_or_else(|_| self.inner[idx] as u32); self.inner.drain(idx..idx + clen); c } /// Inserts a character encoded as UTF-16 into this string at a specified position. /// /// This is an _O(n)_ operation as it requires copying every element in the buffer. /// /// # Panics /// /// Panics if `idx` is larger than the string's length. pub fn insert_char(&mut self, idx: usize, c: char) { assert!(idx <= self.len()); let mut buf = [0; 2]; let slice = c.encode_utf16(&mut buf); let old_len = self.len(); self.inner.resize(old_len + slice.len(), 0); self.inner.copy_within(idx..old_len, idx + slice.len()); self.inner[idx..idx + slice.len()].copy_from_slice(slice); } } impl U32String { /// Constructs a [`U32String`] from a [`char`][prim@char] vector. /// /// No checks are made on the contents of the vector. /// /// # Examples /// /// ```rust /// use widestring::U32String; /// let v: Vec<char> = "Test".chars().collect(); /// # let cloned: Vec<u32> = v.iter().map(|&c| c as u32).collect(); /// // Create a wide string from the vector /// let wstr = U32String::from_chars(v); /// # assert_eq!(wstr.into_vec(), cloned); /// ``` #[must_use] pub fn from_chars(raw: impl Into<Vec<char>>) -> Self { let mut chars = raw.into(); Self { inner: unsafe { let ptr = chars.as_mut_ptr() as *mut u32; let len = chars.len(); let cap = chars.capacity(); mem::forget(chars); Vec::from_raw_parts(ptr, len, cap) }, } } /// Constructs a [`U32String`] copy from a [`str`], encoding it as UTF-32. /// /// This makes a string copy of the [`str`]. Since [`str`] will always be valid UTF-8, the /// resulting [`U32String`] will also be valid UTF-32.
/// /// # Examples /// /// ```rust /// use widestring::U32String; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U32String::from_str(s); /// /// assert_eq!(wstr.to_string().unwrap(), s); /// ``` #[allow(clippy::should_implement_trait)] #[inline] #[must_use] pub fn from_str<S: AsRef<str> + ?Sized>(s: &S) -> Self { let v: Vec<char> = s.as_ref().chars().collect(); Self::from_chars(v) } /// Constructs a [`U32String`] copy from an [`OsStr`][std::ffi::OsStr]. /// /// This makes a string copy of the [`OsStr`][std::ffi::OsStr]. Since [`OsStr`][std::ffi::OsStr] /// makes no guarantees that it is valid data, there is no guarantee that the resulting /// [`U32String`] will be valid UTF-32. /// /// Note that the encoding of [`OsStr`][std::ffi::OsStr] is platform-dependent, so on /// some platforms this may perform an encoding conversion, while on other platforms no changes to /// the string will be made. /// /// # Examples /// /// ```rust /// use widestring::U32String; /// let s = "MyString"; /// // Create a wide string from the string /// let wstr = U32String::from_os_str(s); /// /// assert_eq!(wstr.to_string().unwrap(), s); /// ``` #[cfg(feature = "std")] #[cfg_attr(docsrs, doc(cfg(feature = "std")))] #[must_use] pub fn from_os_str<S: AsRef<std::ffi::OsStr> + ?Sized>(s: &S) -> Self { let v: Vec<char> = s.as_ref().to_string_lossy().chars().collect(); Self::from_chars(v) } /// Constructs a [`U32String`] from a [`char`][prim@char] pointer and a length. /// /// The `len` argument is the number of `char` elements, **not** the number of bytes. /// /// # Safety /// /// This function is unsafe as there is no guarantee that the given pointer is valid for `len` /// elements. /// /// In addition, the data must meet the safety conditions of [std::slice::from_raw_parts]. /// /// # Panics /// /// Panics if `len` is greater than 0 but `p` is a null pointer.
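/*
Illustrative sketch, not part of the crate: the pointer-and-length contract described above
relies on `char` having the same size and alignment as `u32`, and on every `char` being a
valid `u32` value. A standard-library-only version of that reinterpretation might look like
this; `chars_to_u32_vec` is a hypothetical name used only here.

```rust
/// Copies `len` `char` values starting at `p` into a `Vec<u32>`.
///
/// # Safety
///
/// `p` must be valid for reads of `len` consecutive, initialized `char` values.
unsafe fn chars_to_u32_vec(p: *const char, len: usize) -> Vec<u32> {
    if len == 0 {
        // Nothing to read, so a null pointer is acceptable in this case.
        return Vec::new();
    }
    assert!(!p.is_null(), "non-zero length requires a non-null pointer");
    // `len` counts elements, not bytes; the cast changes only the pointee type.
    unsafe { std::slice::from_raw_parts(p as *const u32, len) }.to_vec()
}
```

The cast is sound only in this direction: a `u32` buffer may hold values that are not valid
`char`s, so the reverse reinterpretation would need validation first.
*/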
#[inline] #[must_use] pub unsafe fn from_char_ptr(p: *const char, len: usize) -> Self { Self::from_ptr(p as *const u32, len) } /// Extends the string with the given string slice, encoding it as UTF-32. /// /// No checks are performed on the strings. It is possible to end up with nul values inside the /// string, and it is up to the caller to determine if that is acceptable. /// /// # Examples /// /// ```rust /// use widestring::U32String; /// let s = "MyString"; /// let mut wstr = U32String::from_str(s); /// // Push the original to the end, repeating the string twice. /// wstr.push_str(s); /// /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString"); /// ``` #[inline] pub fn push_str(&mut self, s: impl AsRef<str>) { self.inner.extend(s.as_ref().chars().map(|c| c as u32)) } /// Extends the string with the given string slice. /// /// No checks are performed on the strings. It is possible to end up with nul values inside the /// string, and it is up to the caller to determine if that is acceptable. /// /// # Examples /// /// ```rust /// use widestring::U32String; /// let s = "MyString"; /// let mut wstr = U32String::from_str(s); /// // Push the original to the end, repeating the string twice. /// wstr.push_os_str(s); /// /// assert_eq!(wstr.to_string().unwrap(), "MyStringMyString"); /// ``` #[cfg(feature = "std")] #[cfg_attr(docsrs, doc(cfg(feature = "std")))] #[inline] pub fn push_os_str(&mut self, s: impl AsRef<std::ffi::OsStr>) { self.inner .extend(s.as_ref().to_string_lossy().chars().map(|c| c as u32)) } /// Appends the given [`char`][prim@char] encoded as UTF-32 to the end of this string. #[inline] pub fn push_char(&mut self, c: char) { self.inner.push(c as u32); } /// Removes the last value from the string buffer and returns it. /// /// This method assumes UTF-32 encoding. /// /// Returns `None` if this String is empty. #[inline] pub fn pop_char(&mut self) -> Option<u32> { self.inner.pop() } /// Removes a value from this string at a position and returns it.
/// /// This method assumes UTF-32 encoding. /// /// This is an _O(n)_ operation, as it requires copying every element in the buffer. /// /// # Panics /// /// Panics if `idx` is larger than or equal to the string's length. #[inline] pub fn remove_char(&mut self, idx: usize) -> u32 { self.inner.remove(idx) } /// Inserts a character encoded as UTF-32 into this string at a specified position. /// /// This is an _O(n)_ operation as it requires copying every element in the buffer. /// /// # Panics /// /// Panics if `idx` is larger than the string's length. #[inline] pub fn insert_char(&mut self, idx: usize, c: char) { self.inner.insert(idx, c as u32) } } impl core::fmt::Debug for U16String { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { crate::debug_fmt_u16(self.as_slice(), f) } } impl core::fmt::Debug for U32String { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { crate::debug_fmt_u32(self.as_slice(), f) } } impl From<Vec<char>> for U32String { #[inline] fn from(value: Vec<char>) -> Self { Self::from_chars(value) } } impl From<&[char]> for U32String { #[inline] fn from(value: &[char]) -> Self { U32String::from_chars(value) } } /// Alias for [`U16String`] or [`U32String`] depending on platform. Intended to match typical C /// `wchar_t` size on platform. #[cfg(not(windows))] pub type WideString = U32String; /// Alias for [`U16String`] or [`U32String`] depending on platform. Intended to match typical C /// `wchar_t` size on platform.
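/*
Illustrative sketch, not part of the crate: the platform-dependent alias pattern used for
`WideString` can be reproduced with the standard library alone. `WideUnit` and
`wide_unit_size` are hypothetical names; the sketch assumes the usual convention of a
16-bit `wchar_t` on Windows and a 32-bit one elsewhere.

```rust
// Exactly one of these two aliases is compiled in, depending on the target platform.
#[cfg(windows)]
type WideUnit = u16;
#[cfg(not(windows))]
type WideUnit = u32;

/// Returns the size in bytes of the code unit selected for the current target.
fn wide_unit_size() -> usize {
    std::mem::size_of::<WideUnit>()
}
```

Code written against such an alias stays portable as long as it never assumes a specific
unit width.
*/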
#[cfg(windows)] pub type WideString = U16String; #[cfg(test)] mod test { use super::*; #[test] #[allow(clippy::write_literal)] fn number_to_string() { let mut s = U16String::new(); write!(s, "{}", 1234).unwrap(); assert_eq!(s, U16String::from_str("1234")); } #[test] fn truncated_with_surrogate() { // Character U+24B62, encoded as D852 DF62 in UTF16 let buf = "𤭢"; let mut s = U16String::from_str(buf); assert_eq!(s.pop_char(), Some('𤭢' as u32)); } } widestring-1.1.0/src/utfstr/iter.rs000064400000000000000000000247511046102023000154420ustar 00000000000000use crate::{ debug_fmt_char_iter, decode_utf16, decode_utf32, iter::{DecodeUtf16, DecodeUtf32}, }; #[allow(unused_imports)] use core::{ fmt::Write, iter::{Copied, DoubleEndedIterator, ExactSizeIterator, FlatMap, FusedIterator}, slice::Iter, }; /// An iterator over the [`char`]s of a UTF-16 string slice /// /// This struct is created by the [`chars`][crate::Utf16Str::chars] method on /// [`Utf16Str`][crate::Utf16Str]. See its documentation for more.
#[derive(Clone)] pub struct CharsUtf16<'a> { iter: DecodeUtf16<Copied<Iter<'a, u16>>>, } impl<'a> CharsUtf16<'a> { pub(super) fn new(s: &'a [u16]) -> Self { Self { iter: decode_utf16(s.iter().copied()), } } } impl<'a> Iterator for CharsUtf16<'a> { type Item = char; #[inline] fn next(&mut self) -> Option<Self::Item> { // Utf16Str already ensures valid surrogate pairs self.iter.next().map(|r| r.unwrap()) } #[inline] fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() } } impl<'a> FusedIterator for CharsUtf16<'a> {} impl<'a> DoubleEndedIterator for CharsUtf16<'a> { #[inline] fn next_back(&mut self) -> Option<Self::Item> { self.iter.next_back().map(|r| r.unwrap()) } } impl<'a> core::fmt::Debug for CharsUtf16<'a> { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { debug_fmt_char_iter(self.clone(), f) } } impl<'a> core::fmt::Display for CharsUtf16<'a> { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { self.clone().try_for_each(|c| f.write_char(c)) } } /// An iterator over the [`char`]s of a UTF-32 string slice /// /// This struct is created by the [`chars`][crate::Utf32Str::chars] method on /// [`Utf32Str`][crate::Utf32Str]. See its documentation for more.
#[derive(Clone)] pub struct CharsUtf32<'a> { iter: DecodeUtf32<Copied<Iter<'a, u32>>>, } impl<'a> CharsUtf32<'a> { pub(super) fn new(s: &'a [u32]) -> Self { Self { iter: decode_utf32(s.iter().copied()), } } } impl<'a> Iterator for CharsUtf32<'a> { type Item = char; #[inline] fn next(&mut self) -> Option<Self::Item> { // Utf32Str already ensures valid code points self.iter.next().map(|r| r.unwrap()) } #[inline] fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() } } impl<'a> DoubleEndedIterator for CharsUtf32<'a> { #[inline] fn next_back(&mut self) -> Option<Self::Item> { // Utf32Str already ensures valid code points self.iter.next_back().map(|r| r.unwrap()) } } impl<'a> FusedIterator for CharsUtf32<'a> {} impl<'a> ExactSizeIterator for CharsUtf32<'a> { #[inline] fn len(&self) -> usize { self.iter.len() } } impl<'a> core::fmt::Debug for CharsUtf32<'a> { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { debug_fmt_char_iter(self.clone(), f) } } impl<'a> core::fmt::Display for CharsUtf32<'a> { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { self.clone().try_for_each(|c| f.write_char(c)) } } /// An iterator over the [`char`]s of a string slice, and their positions /// /// This struct is created by the [`char_indices`][crate::Utf16Str::char_indices] method on /// [`Utf16Str`][crate::Utf16Str]. See its documentation for more. #[derive(Debug, Clone)] pub struct CharIndicesUtf16<'a> { forward_offset: usize, back_offset: usize, iter: CharsUtf16<'a>, } impl<'a> CharIndicesUtf16<'a> { /// Returns the position of the next character, or the length of the underlying string if /// there are no more characters.
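/*
Illustrative sketch, not part of the crate: the offset bookkeeping described above (each
character is reported at the code-unit position where it starts, and the running offset
ends at the length of the input) can be reproduced with the standard library.
`char_offsets_utf16` is a hypothetical name, and the sketch assumes well-formed UTF-16.

```rust
/// Returns every decoded `char` paired with its starting code-unit offset, plus the
/// offset reached once the input is exhausted. Panics on ill-formed UTF-16.
fn char_offsets_utf16(units: &[u16]) -> (Vec<(usize, char)>, usize) {
    let mut offset = 0;
    let mut out = Vec::new();
    for decoded in char::decode_utf16(units.iter().copied()) {
        let c = decoded.expect("input must be valid UTF-16");
        out.push((offset, c));
        // Characters outside the Basic Multilingual Plane advance the offset by two units.
        offset += c.len_utf16();
    }
    (out, offset)
}
```

For the units of "a", U+1F496, "b" the starting offsets are 0, 1 and 3, and the final
offset is 4, which matches the length of the slice in code units.
*/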
#[inline] pub fn offset(&self) -> usize { self.forward_offset } } impl<'a> CharIndicesUtf16<'a> { pub(super) fn new(s: &'a [u16]) -> Self { Self { forward_offset: 0, back_offset: s.len(), iter: CharsUtf16::new(s), } } } impl<'a> Iterator for CharIndicesUtf16<'a> { type Item = (usize, char); #[inline] fn next(&mut self) -> Option<Self::Item> { let result = self.iter.next(); if let Some(c) = result { let offset = self.forward_offset; self.forward_offset += c.len_utf16(); Some((offset, c)) } else { None } } #[inline] fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() } } impl<'a> FusedIterator for CharIndicesUtf16<'a> {} impl<'a> DoubleEndedIterator for CharIndicesUtf16<'a> { #[inline] fn next_back(&mut self) -> Option<Self::Item> { let result = self.iter.next_back(); if let Some(c) = result { self.back_offset -= c.len_utf16(); Some((self.back_offset, c)) } else { None } } } /// An iterator over the [`char`]s of a string slice, and their positions /// /// This struct is created by the [`char_indices`][crate::Utf32Str::char_indices] method on /// [`Utf32Str`][crate::Utf32Str]. See its documentation for more. #[derive(Debug, Clone)] pub struct CharIndicesUtf32<'a> { forward_offset: usize, back_offset: usize, iter: CharsUtf32<'a>, } impl<'a> CharIndicesUtf32<'a> { /// Returns the position of the next character, or the length of the underlying string if /// there are no more characters.
#[inline] pub fn offset(&self) -> usize { self.forward_offset } } impl<'a> CharIndicesUtf32<'a> { pub(super) fn new(s: &'a [u32]) -> Self { Self { forward_offset: 0, back_offset: s.len(), iter: CharsUtf32::new(s), } } } impl<'a> Iterator for CharIndicesUtf32<'a> { type Item = (usize, char); #[inline] fn next(&mut self) -> Option<Self::Item> { let result = self.iter.next(); if let Some(c) = result { let offset = self.forward_offset; self.forward_offset += 1; Some((offset, c)) } else { None } } #[inline] fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() } } impl<'a> FusedIterator for CharIndicesUtf32<'a> {} impl<'a> DoubleEndedIterator for CharIndicesUtf32<'a> { #[inline] fn next_back(&mut self) -> Option<Self::Item> { let result = self.iter.next_back(); if let Some(c) = result { self.back_offset -= 1; Some((self.back_offset, c)) } else { None } } } impl<'a> ExactSizeIterator for CharIndicesUtf32<'a> { #[inline] fn len(&self) -> usize { self.iter.len() } } /// The return type of [`Utf16Str::escape_debug`][crate::Utf16Str::escape_debug]. #[derive(Debug, Clone)] pub struct EscapeDebug<I> { iter: FlatMap<I, core::char::EscapeDebug, fn(char) -> core::char::EscapeDebug>, } impl<'a> EscapeDebug<CharsUtf16<'a>> { pub(super) fn new(s: &'a [u16]) -> Self { Self { iter: CharsUtf16::new(s).flat_map(|c| c.escape_debug()), } } } impl<'a> EscapeDebug<CharsUtf32<'a>> { pub(super) fn new(s: &'a [u32]) -> Self { Self { iter: CharsUtf32::new(s).flat_map(|c| c.escape_debug()), } } } /// The return type of [`Utf16Str::escape_default`][crate::Utf16Str::escape_default]. #[derive(Debug, Clone)] pub struct EscapeDefault<I> { iter: FlatMap<I, core::char::EscapeDefault, fn(char) -> core::char::EscapeDefault>, } impl<'a> EscapeDefault<CharsUtf16<'a>> { pub(super) fn new(s: &'a [u16]) -> Self { Self { iter: CharsUtf16::new(s).flat_map(|c| c.escape_default()), } } } impl<'a> EscapeDefault<CharsUtf32<'a>> { pub(super) fn new(s: &'a [u32]) -> Self { Self { iter: CharsUtf32::new(s).flat_map(|c| c.escape_default()), } } } /// The return type of [`Utf16Str::escape_unicode`][crate::Utf16Str::escape_unicode].
#[derive(Debug, Clone)] pub struct EscapeUnicode<I> { iter: FlatMap<I, core::char::EscapeUnicode, fn(char) -> core::char::EscapeUnicode>, } impl<'a> EscapeUnicode<CharsUtf16<'a>> { pub(super) fn new(s: &'a [u16]) -> Self { Self { iter: CharsUtf16::new(s).flat_map(|c| c.escape_unicode()), } } } impl<'a> EscapeUnicode<CharsUtf32<'a>> { pub(super) fn new(s: &'a [u32]) -> Self { Self { iter: CharsUtf32::new(s).flat_map(|c| c.escape_unicode()), } } } macro_rules! escape_impls { ($($name:ident),+) => {$( impl<I> core::fmt::Display for $name<I> where I: Iterator<Item = char> + Clone { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { self.clone().try_for_each(|c| f.write_char(c)) } } impl<I> Iterator for $name<I> where I: Iterator<Item = char> { type Item = char; #[inline] fn next(&mut self) -> Option<Self::Item> { self.iter.next() } #[inline] fn size_hint(&self) -> (usize, Option<usize>) { let (lower, upper) = self.iter.size_hint(); // Worst case, every char has to be unicode escaped as \u{NNNNNN} (lower, upper.and_then(|len| len.checked_mul(10))) } } impl<I> FusedIterator for $name<I> where I: Iterator<Item = char> + FusedIterator {} )+} } escape_impls!(EscapeDebug, EscapeDefault, EscapeUnicode); /// An iterator over the [`u16`] code units of a UTF-16 string slice /// /// This struct is created by the [`code_units`][crate::Utf16Str::code_units] method on /// [`Utf16Str`][crate::Utf16Str]. See its documentation for more.
#[derive(Debug, Clone)] pub struct CodeUnits<'a> { iter: Copied<Iter<'a, u16>>, } impl<'a> CodeUnits<'a> { pub(super) fn new(s: &'a [u16]) -> Self { Self { iter: s.iter().copied(), } } } impl<'a> Iterator for CodeUnits<'a> { type Item = u16; #[inline] fn next(&mut self) -> Option<Self::Item> { self.iter.next() } #[inline] fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() } } impl<'a> FusedIterator for CodeUnits<'a> {} impl<'a> DoubleEndedIterator for CodeUnits<'a> { #[inline] fn next_back(&mut self) -> Option<Self::Item> { self.iter.next_back() } } impl<'a> ExactSizeIterator for CodeUnits<'a> { #[inline] fn len(&self) -> usize { self.iter.len() } } widestring-1.1.0/src/utfstr.rs000064400000000000000000002312551046102023000144760ustar 00000000000000//! UTF string slices. //! //! This module contains UTF string slices and related types. use crate::{ error::{Utf16Error, Utf32Error}, is_utf16_low_surrogate, iter::{EncodeUtf16, EncodeUtf32, EncodeUtf8}, validate_utf16, validate_utf32, U16Str, U32Str, }; #[cfg(feature = "alloc")] use crate::{Utf16String, Utf32String}; #[cfg(feature = "alloc")] #[allow(unused_imports)] use alloc::{borrow::Cow, boxed::Box, string::String}; #[allow(unused_imports)] use core::{ convert::{AsMut, AsRef, TryFrom}, fmt::Write, ops::{Index, IndexMut, RangeBounds}, slice::SliceIndex, }; mod iter; pub use iter::*; macro_rules! 
utfstr_common_impl { { $(#[$utfstr_meta:meta])* struct $utfstr:ident([$uchar:ty]); type UtfString = $utfstring:ident; type UStr = $ustr:ident; type UCStr = $ucstr:ident; type UtfError = $utferror:ident; $(#[$from_slice_unchecked_meta:meta])* fn from_slice_unchecked() -> {} $(#[$from_slice_unchecked_mut_meta:meta])* fn from_slice_unchecked_mut() -> {} $(#[$from_boxed_slice_unchecked_meta:meta])* fn from_boxed_slice_unchecked() -> {} $(#[$get_unchecked_meta:meta])* fn get_unchecked() -> {} $(#[$get_unchecked_mut_meta:meta])* fn get_unchecked_mut() -> {} $(#[$len_meta:meta])* fn len() -> {} } => { $(#[$utfstr_meta])* #[allow(clippy::derive_hash_xor_eq)] #[derive(PartialEq, Eq, PartialOrd, Ord, Hash)] pub struct $utfstr { pub(crate) inner: [$uchar], } impl $utfstr { $(#[$from_slice_unchecked_meta])* #[allow(trivial_casts)] #[inline] #[must_use] pub const unsafe fn from_slice_unchecked(s: &[$uchar]) -> &Self { &*(s as *const [$uchar] as *const Self) } $(#[$from_slice_unchecked_mut_meta])* #[allow(trivial_casts)] #[inline] #[must_use] pub unsafe fn from_slice_unchecked_mut(s: &mut [$uchar]) -> &mut Self { &mut *(s as *mut [$uchar] as *mut Self) } $(#[$from_boxed_slice_unchecked_meta])* #[inline] #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub unsafe fn from_boxed_slice_unchecked(s: Box<[$uchar]>) -> Box { Box::from_raw(Box::into_raw(s) as *mut Self) } $(#[$get_unchecked_meta])* #[inline] #[must_use] pub unsafe fn get_unchecked(&self, index: I) -> &Self where I: SliceIndex<[$uchar], Output = [$uchar]>, { Self::from_slice_unchecked(self.inner.get_unchecked(index)) } $(#[$get_unchecked_mut_meta])* #[inline] #[must_use] pub unsafe fn get_unchecked_mut(&mut self, index: I) -> &mut Self where I: SliceIndex<[$uchar], Output = [$uchar]>, { Self::from_slice_unchecked_mut(self.inner.get_unchecked_mut(index)) } $(#[$len_meta])* #[inline] #[must_use] pub const fn len(&self) -> usize { self.inner.len() } /// Returns `true` if the string has 
            /// a length of zero.
            #[inline]
            #[must_use]
            pub const fn is_empty(&self) -> bool {
                self.inner.is_empty()
            }

            /// Converts a string to a slice of its underlying elements.
            ///
            /// To convert the slice back into a string slice, use the
            /// [`from_slice`][Self::from_slice] function.
            #[inline]
            #[must_use]
            pub const fn as_slice(&self) -> &[$uchar] {
                &self.inner
            }

            /// Converts a mutable string to a mutable slice of its underlying elements.
            ///
            /// # Safety
            ///
            /// This function is unsafe because you can violate the invariants of this type when
            /// mutating the slice. The caller must ensure that the contents of the slice are valid
            /// UTF before the borrow ends and the underlying string is used.
            ///
            /// Use of this string type whose contents have been mutated to invalid UTF is
            /// undefined behavior.
            #[inline]
            #[must_use]
            pub unsafe fn as_mut_slice(&mut self) -> &mut [$uchar] {
                &mut self.inner
            }

            /// Converts a string slice to a raw pointer.
            ///
            /// This pointer will be pointing to the first element of the string slice.
            ///
            /// The caller must ensure that the returned pointer is never written to. If you need to
            /// mutate the contents of the string slice, use [`as_mut_ptr`][Self::as_mut_ptr].
            #[inline]
            #[must_use]
            pub const fn as_ptr(&self) -> *const $uchar {
                self.inner.as_ptr()
            }

            /// Converts a mutable string slice to a mutable pointer.
            ///
            /// This pointer will be pointing to the first element of the string slice.
            #[inline]
            #[must_use]
            pub fn as_mut_ptr(&mut self) -> *mut $uchar {
                self.inner.as_mut_ptr()
            }

            /// Returns this string as a wide string slice of undefined encoding.
            #[inline]
            #[must_use]
            pub const fn as_ustr(&self) -> &$ustr {
                $ustr::from_slice(self.as_slice())
            }

            /// Returns a string slice with leading and trailing whitespace removed.
            ///
            /// 'Whitespace' is defined according to the terms of the Unicode Derived Core Property
            /// `White_Space`.
            #[must_use]
            pub fn trim(&self) -> &Self {
                self.trim_start().trim_end()
            }

            /// Returns a string slice with leading whitespace removed.
            ///
            /// 'Whitespace' is defined according to the terms of the Unicode Derived Core Property
            /// `White_Space`.
            ///
            /// # Text directionality
            ///
            /// A string is a sequence of elements. `start` in this context means the first position
            /// of that sequence; for a left-to-right language like English or Russian, this will be
            /// the left side, and for right-to-left languages like Arabic or Hebrew, this will be
            /// the right side.
            #[must_use]
            pub fn trim_start(&self) -> &Self {
                if let Some((index, _)) = self.char_indices().find(|(_, c)| !c.is_whitespace()) {
                    &self[index..]
                } else {
                    <&Self as Default>::default()
                }
            }

            /// Returns a string slice with trailing whitespace removed.
            ///
            /// 'Whitespace' is defined according to the terms of the Unicode Derived Core Property
            /// `White_Space`.
            ///
            /// # Text directionality
            ///
            /// A string is a sequence of elements. `end` in this context means the last position of
            /// that sequence; for a left-to-right language like English or Russian, this will be
            /// the right side, and for right-to-left languages like Arabic or Hebrew, this will be
            /// the left side.
            #[must_use]
            pub fn trim_end(&self) -> &Self {
                if let Some((index, _)) = self.char_indices().rfind(|(_, c)| !c.is_whitespace()) {
                    &self[..=index]
                } else {
                    <&Self as Default>::default()
                }
            }

            /// Converts a boxed string into a boxed slice without copying or allocating.
            #[inline]
            #[cfg(feature = "alloc")]
            #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
            #[must_use]
            pub fn into_boxed_slice(self: Box<Self>) -> Box<[$uchar]> {
                // SAFETY: from_raw pointer is from into_raw
                unsafe { Box::from_raw(Box::into_raw(self) as *mut [$uchar]) }
            }

            /// Converts a boxed string slice into an owned UTF string without copying or
            /// allocating.
            #[inline]
            #[cfg(feature = "alloc")]
            #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
            #[must_use]
            pub fn into_utfstring(self: Box<Self>) -> $utfstring {
                unsafe { $utfstring::from_vec_unchecked(self.into_boxed_slice().into_vec()) }
            }

            /// Creates a new owned string by repeating this string `n` times.
            ///
            /// # Panics
            ///
            /// This function will panic if the capacity would overflow.
            #[inline]
            #[cfg(feature = "alloc")]
            #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
            #[must_use]
            pub fn repeat(&self, n: usize) -> $utfstring {
                unsafe { $utfstring::from_vec_unchecked(self.as_slice().repeat(n)) }
            }
        }

        impl AsMut<$utfstr> for $utfstr {
            #[inline]
            fn as_mut(&mut self) -> &mut $utfstr {
                self
            }
        }

        impl AsRef<$utfstr> for $utfstr {
            #[inline]
            fn as_ref(&self) -> &$utfstr {
                self
            }
        }

        impl AsRef<[$uchar]> for $utfstr {
            #[inline]
            fn as_ref(&self) -> &[$uchar] {
                self.as_slice()
            }
        }

        impl AsRef<$ustr> for $utfstr {
            #[inline]
            fn as_ref(&self) -> &$ustr {
                self.as_ustr()
            }
        }

        impl core::fmt::Debug for $utfstr {
            #[inline]
            fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
                f.write_char('"')?;
                self.escape_debug().try_for_each(|c| f.write_char(c))?;
                f.write_char('"')
            }
        }

        impl Default for &$utfstr {
            #[inline]
            fn default() -> Self {
                // SAFETY: Empty slice is always valid
                unsafe { $utfstr::from_slice_unchecked(&[]) }
            }
        }

        impl Default for &mut $utfstr {
            #[inline]
            fn default() -> Self {
                // SAFETY: Empty slice is always valid
                unsafe { $utfstr::from_slice_unchecked_mut(&mut []) }
            }
        }

        impl core::fmt::Display for $utfstr {
            #[inline]
            fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
                self.chars().try_for_each(|c| f.write_char(c))
            }
        }

        #[cfg(feature = "alloc")]
        impl From<Box<$utfstr>> for Box<[$uchar]> {
            #[inline]
            fn from(value: Box<$utfstr>) -> Self {
                value.into_boxed_slice()
            }
        }

        impl<'a> From<&'a $utfstr> for &'a $ustr {
            #[inline]
            fn from(value: &'a $utfstr) -> Self {
                value.as_ustr()
            }
        }

        impl<'a> From<&'a $utfstr> for &'a [$uchar] {
            #[inline]
            fn from(value: &'a $utfstr) -> Self {
                value.as_slice()
            }
        }

        #[cfg(feature = "std")]
        impl From<&$utfstr> for std::ffi::OsString {
            #[inline]
            fn from(value: &$utfstr) -> std::ffi::OsString {
                value.as_ustr().to_os_string()
            }
        }

        impl PartialEq<$utfstr> for &$utfstr {
            #[inline]
            fn eq(&self, other: &$utfstr) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        #[cfg(feature = "alloc")]
        impl<'a, 'b> PartialEq<Cow<'a, $utfstr>> for &'b $utfstr {
            #[inline]
            fn eq(&self, other: &Cow<'a, $utfstr>) -> bool {
                self == other.as_ref()
            }
        }

        #[cfg(feature = "alloc")]
        impl PartialEq<$utfstr> for Cow<'_, $utfstr> {
            #[inline]
            fn eq(&self, other: &$utfstr) -> bool {
                self.as_ref() == other
            }
        }

        #[cfg(feature = "alloc")]
        impl<'a, 'b> PartialEq<&'a $utfstr> for Cow<'b, $utfstr> {
            #[inline]
            fn eq(&self, other: &&'a $utfstr) -> bool {
                self.as_ref() == *other
            }
        }

        impl PartialEq<$ustr> for $utfstr {
            #[inline]
            fn eq(&self, other: &$ustr) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<$utfstr> for $ustr {
            #[inline]
            fn eq(&self, other: &$utfstr) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<crate::$ucstr> for $utfstr {
            #[inline]
            fn eq(&self, other: &crate::$ucstr) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<$utfstr> for crate::$ucstr {
            #[inline]
            fn eq(&self, other: &$utfstr) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<str> for $utfstr {
            #[inline]
            fn eq(&self, other: &str) -> bool {
                self.chars().eq(other.chars())
            }
        }

        impl PartialEq<&str> for $utfstr {
            #[inline]
            fn eq(&self, other: &&str) -> bool {
                self.chars().eq(other.chars())
            }
        }

        impl PartialEq<str> for &$utfstr {
            #[inline]
            fn eq(&self, other: &str) -> bool {
                self.chars().eq(other.chars())
            }
        }

        impl PartialEq<$utfstr> for str {
            #[inline]
            fn eq(&self, other: &$utfstr) -> bool {
                self.chars().eq(other.chars())
            }
        }

        impl PartialEq<$utfstr> for &str {
            #[inline]
            fn eq(&self, other: &$utfstr) -> bool {
                self.chars().eq(other.chars())
            }
        }

        #[cfg(feature = "alloc")]
        impl<'a, 'b> PartialEq<Cow<'a, str>> for &'b $utfstr {
            #[inline]
            fn eq(&self, other: &Cow<'a, str>) -> bool {
                self
                    == other.as_ref()
            }
        }

        #[cfg(feature = "alloc")]
        impl PartialEq<$utfstr> for Cow<'_, str> {
            #[inline]
            fn eq(&self, other: &$utfstr) -> bool {
                self.as_ref() == other
            }
        }

        #[cfg(feature = "alloc")]
        impl<'a, 'b> PartialEq<&'a $utfstr> for Cow<'b, str> {
            #[inline]
            fn eq(&self, other: &&'a $utfstr) -> bool {
                self.as_ref() == *other
            }
        }

        impl<'a> TryFrom<&'a $ustr> for &'a $utfstr {
            type Error = $utferror;

            #[inline]
            fn try_from(value: &'a $ustr) -> Result<Self, Self::Error> {
                $utfstr::from_ustr(value)
            }
        }

        impl<'a> TryFrom<&'a crate::$ucstr> for &'a $utfstr {
            type Error = $utferror;

            #[inline]
            fn try_from(value: &'a crate::$ucstr) -> Result<Self, Self::Error> {
                $utfstr::from_ucstr(value)
            }
        }
    };
}

utfstr_common_impl! {
    /// UTF-16 string slice for [`Utf16String`][crate::Utf16String].
    ///
    /// [`Utf16Str`] is to [`Utf16String`][crate::Utf16String] as [`str`] is to [`String`].
    ///
    /// [`Utf16Str`] slices are string slices that are always valid UTF-16 encoding. This is unlike
    /// the [`U16Str`][crate::U16Str] string slices, which may not have valid encoding. In this way,
    /// [`Utf16Str`] string slices most resemble native [`str`] slices of all the types in this
    /// crate.
    ///
    /// # Examples
    ///
    /// The easiest way to use [`Utf16Str`] is with the [`utf16str!`][crate::utf16str] macro to
    /// convert string literals into string slices at compile time:
    ///
    /// ```
    /// use widestring::utf16str;
    /// let hello = utf16str!("Hello, world!");
    /// ```
    ///
    /// You can also convert a [`u16`] slice directly, provided it is valid UTF-16:
    ///
    /// ```
    /// use widestring::Utf16Str;
    ///
    /// let sparkle_heart = [0xd83d, 0xdc96];
    /// let sparkle_heart = Utf16Str::from_slice(&sparkle_heart).unwrap();
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    struct Utf16Str([u16]);

    type UtfString = Utf16String;
    type UStr = U16Str;
    type UCStr = U16CStr;
    type UtfError = Utf16Error;

    /// Converts a slice to a string slice without checking that the string contains valid UTF-16.
    ///
    /// See the safe version, [`from_slice`][Self::from_slice], for more information.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the slice passed to it is valid
    /// UTF-16. If this constraint is violated, undefined behavior results as it is assumed the
    /// [`Utf16Str`] is always valid UTF-16.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf16Str;
    ///
    /// let sparkle_heart = vec![0xd83d, 0xdc96]; // Raw surrogate pair
    /// let sparkle_heart = unsafe { Utf16Str::from_slice_unchecked(&sparkle_heart) };
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    fn from_slice_unchecked() -> {}

    /// Converts a mutable slice to a mutable string slice without checking that the string contains
    /// valid UTF-16.
    ///
    /// See the safe version, [`from_slice_mut`][Self::from_slice_mut], for more information.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the slice passed to it is valid
    /// UTF-16. If this constraint is violated, undefined behavior results as it is assumed the
    /// [`Utf16Str`] is always valid UTF-16.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf16Str;
    ///
    /// let mut sparkle_heart = vec![0xd83d, 0xdc96]; // Raw surrogate pair
    /// let sparkle_heart = unsafe { Utf16Str::from_slice_unchecked_mut(&mut sparkle_heart) };
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    fn from_slice_unchecked_mut() -> {}

    /// Converts a boxed slice to a boxed string slice without checking that the string contains
    /// valid UTF-16.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check if the string slice is valid UTF-16, and
    /// [`Utf16Str`] must always be valid UTF-16.
    fn from_boxed_slice_unchecked() -> {}

    /// Returns an unchecked subslice of this string slice.
    ///
    /// This is the unchecked alternative to indexing the string slice.
    ///
    /// # Safety
    ///
    /// Callers of this function are responsible that these preconditions are satisfied:
    ///
    /// - The starting index must not exceed the ending index;
    /// - Indexes must be within bounds of the original slice;
    /// - Indexes must lie on UTF-16 sequence boundaries.
    ///
    /// Failing that, the returned string slice may reference invalid memory or violate the
    /// invariants communicated by the type.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::{utf16str};
    /// let v = utf16str!("⚧️🏳️‍⚧️➡️s");
    /// unsafe {
    ///     assert_eq!(utf16str!("⚧️"), v.get_unchecked(..2));
    ///     assert_eq!(utf16str!("🏳️‍⚧️"), v.get_unchecked(2..8));
    ///     assert_eq!(utf16str!("➡️"), v.get_unchecked(8..10));
    ///     assert_eq!(utf16str!("s"), v.get_unchecked(10..));
    /// }
    /// ```
    fn get_unchecked() -> {}

    /// Returns a mutable, unchecked subslice of this string slice.
    ///
    /// This is the unchecked alternative to indexing the string slice.
    ///
    /// # Safety
    ///
    /// Callers of this function are responsible that these preconditions are satisfied:
    ///
    /// - The starting index must not exceed the ending index;
    /// - Indexes must be within bounds of the original slice;
    /// - Indexes must lie on UTF-16 sequence boundaries.
    ///
    /// Failing that, the returned string slice may reference invalid memory or violate the
    /// invariants communicated by the type.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::{utf16str};
    /// # #[cfg(feature = "alloc")] {
    /// let mut v = utf16str!("⚧️🏳️‍⚧️➡️s").to_owned();
    /// unsafe {
    ///     assert_eq!(utf16str!("⚧️"), v.get_unchecked_mut(..2));
    ///     assert_eq!(utf16str!("🏳️‍⚧️"), v.get_unchecked_mut(2..8));
    ///     assert_eq!(utf16str!("➡️"), v.get_unchecked_mut(8..10));
    ///     assert_eq!(utf16str!("s"), v.get_unchecked_mut(10..));
    /// }
    /// # }
    /// ```
    fn get_unchecked_mut() -> {}

    /// Returns the length of `self`.
    ///
    /// This length is in `u16` values, not [`char`]s or graphemes.
    /// In other words, it may not be
    /// what a human considers the length of the string.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf16str;
    /// assert_eq!(utf16str!("foo").len(), 3);
    ///
    /// let complex = utf16str!("⚧️🏳️‍⚧️➡️s");
    /// assert_eq!(complex.len(), 11);
    /// assert_eq!(complex.chars().count(), 10);
    /// ```
    fn len() -> {}
}

utfstr_common_impl! {
    /// UTF-32 string slice for [`Utf32String`][crate::Utf32String].
    ///
    /// [`Utf32Str`] is to [`Utf32String`][crate::Utf32String] as [`str`] is to [`String`].
    ///
    /// [`Utf32Str`] slices are string slices that are always valid UTF-32 encoding. This is unlike
    /// the [`U32Str`][crate::U32Str] string slices, which may not have valid encoding. In this way,
    /// [`Utf32Str`] string slices most resemble native [`str`] slices of all the types in this
    /// crate.
    ///
    /// # Examples
    ///
    /// The easiest way to use [`Utf32Str`] is with the [`utf32str!`][crate::utf32str] macro to
    /// convert string literals into string slices at compile time:
    ///
    /// ```
    /// use widestring::utf32str;
    /// let hello = utf32str!("Hello, world!");
    /// ```
    ///
    /// You can also convert a [`u32`] slice directly, provided it is valid UTF-32:
    ///
    /// ```
    /// use widestring::Utf32Str;
    ///
    /// let sparkle_heart = [0x1f496];
    /// let sparkle_heart = Utf32Str::from_slice(&sparkle_heart).unwrap();
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    ///
    /// Since [`char`] slices are valid UTF-32, a slice of [`char`]s can be easily converted to a
    /// string slice:
    ///
    /// ```
    /// use widestring::Utf32Str;
    ///
    /// let sparkle_heart = ['💖'; 3];
    /// let sparkle_heart = Utf32Str::from_char_slice(&sparkle_heart);
    ///
    /// assert_eq!("💖💖💖", sparkle_heart);
    /// ```
    struct Utf32Str([u32]);

    type UtfString = Utf32String;
    type UStr = U32Str;
    type UCStr = U32CStr;
    type UtfError = Utf32Error;

    /// Converts a slice to a string slice without checking that the string contains valid UTF-32.
    ///
    /// See the safe version, [`from_slice`][Self::from_slice], for more information.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the slice passed to it is valid
    /// UTF-32. If this constraint is violated, undefined behavior results as it is assumed the
    /// [`Utf32Str`] is always valid UTF-32.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf32Str;
    ///
    /// let sparkle_heart = vec![0x1f496];
    /// let sparkle_heart = unsafe { Utf32Str::from_slice_unchecked(&sparkle_heart) };
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    fn from_slice_unchecked() -> {}

    /// Converts a mutable slice to a mutable string slice without checking that the string contains
    /// valid UTF-32.
    ///
    /// See the safe version, [`from_slice_mut`][Self::from_slice_mut], for more information.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the slice passed to it is valid
    /// UTF-32. If this constraint is violated, undefined behavior results as it is assumed the
    /// [`Utf32Str`] is always valid UTF-32.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf32Str;
    ///
    /// let mut sparkle_heart = vec![0x1f496];
    /// let sparkle_heart = unsafe { Utf32Str::from_slice_unchecked_mut(&mut sparkle_heart) };
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    fn from_slice_unchecked_mut() -> {}

    /// Converts a boxed slice to a boxed string slice without checking that the string contains
    /// valid UTF-32.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check if the string slice is valid UTF-32, and
    /// [`Utf32Str`] must always be valid UTF-32.
    fn from_boxed_slice_unchecked() -> {}

    /// Returns an unchecked subslice of this string slice.
    ///
    /// This is the unchecked alternative to indexing the string slice.
    ///
    /// # Safety
    ///
    /// Callers of this function are responsible that these preconditions are satisfied:
    ///
    /// - The starting index must not exceed the ending index;
    /// - Indexes must be within bounds of the original slice;
    ///
    /// Failing that, the returned string slice may reference invalid memory or violate the
    /// invariants communicated by the type.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf32str;
    /// let v = utf32str!("⚧️🏳️‍⚧️➡️s");
    /// unsafe {
    ///     assert_eq!(utf32str!("⚧️"), v.get_unchecked(..2));
    ///     assert_eq!(utf32str!("🏳️‍⚧️"), v.get_unchecked(2..7));
    ///     assert_eq!(utf32str!("➡️"), v.get_unchecked(7..9));
    ///     assert_eq!(utf32str!("s"), v.get_unchecked(9..))
    /// }
    /// ```
    fn get_unchecked() -> {}

    /// Returns a mutable, unchecked subslice of this string slice.
    ///
    /// This is the unchecked alternative to indexing the string slice.
    ///
    /// # Safety
    ///
    /// Callers of this function are responsible that these preconditions are satisfied:
    ///
    /// - The starting index must not exceed the ending index;
    /// - Indexes must be within bounds of the original slice;
    ///
    /// Failing that, the returned string slice may reference invalid memory or violate the
    /// invariants communicated by the type.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf32str;
    /// # #[cfg(feature = "alloc")] {
    /// let mut v = utf32str!("⚧️🏳️‍⚧️➡️s").to_owned();
    /// unsafe {
    ///     assert_eq!(utf32str!("⚧️"), v.get_unchecked_mut(..2));
    ///     assert_eq!(utf32str!("🏳️‍⚧️"), v.get_unchecked_mut(2..7));
    ///     assert_eq!(utf32str!("➡️"), v.get_unchecked_mut(7..9));
    ///     assert_eq!(utf32str!("s"), v.get_unchecked_mut(9..))
    /// }
    /// # }
    /// ```
    fn get_unchecked_mut() -> {}

    /// Returns the length of `self`.
    ///
    /// This length is in the number of [`char`]s in the slice, not graphemes. In other words, it
    /// may not be what a human considers the length of the string.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf32str;
    /// assert_eq!(utf32str!("foo").len(), 3);
    ///
    /// let complex = utf32str!("⚧️🏳️‍⚧️➡️s");
    /// assert_eq!(complex.len(), 10);
    /// assert_eq!(complex.chars().count(), 10);
    /// ```
    fn len() -> {}
}

impl Utf16Str {
    /// Converts a slice of UTF-16 data to a string slice.
    ///
    /// Not all slices of [`u16`] values are valid to convert, since [`Utf16Str`] requires that it
    /// is always valid UTF-16. This function checks to ensure that the values are valid UTF-16, and
    /// then does the conversion.
    ///
    /// If you are sure that the slice is valid UTF-16, and you don't want to incur the overhead of
    /// the validity check, there is an unsafe version of this function,
    /// [`from_slice_unchecked`][Self::from_slice_unchecked], which has the same behavior but skips
    /// the check.
    ///
    /// If you need an owned string, consider using [`Utf16String::from_vec`] instead.
    ///
    /// Because you can stack-allocate a `[u16; N]`, this function is one way to have a
    /// stack-allocated string. Indeed, the [`utf16str!`][crate::utf16str] macro does exactly this
    /// after converting from UTF-8 to UTF-16.
    ///
    /// # Errors
    ///
    /// Returns an error if the slice is not UTF-16 with a description as to why the provided slice
    /// is not UTF-16.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf16Str;
    ///
    /// let sparkle_heart = vec![0xd83d, 0xdc96]; // Raw surrogate pair
    /// let sparkle_heart = Utf16Str::from_slice(&sparkle_heart).unwrap();
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    ///
    /// With incorrect values that return an error:
    ///
    /// ```
    /// use widestring::Utf16Str;
    ///
    /// let sparkle_heart = vec![0xd83d, 0x0]; // This is an invalid unpaired surrogate
    ///
    /// assert!(Utf16Str::from_slice(&sparkle_heart).is_err());
    /// ```
    pub fn from_slice(s: &[u16]) -> Result<&Self, Utf16Error> {
        validate_utf16(s)?;
        // SAFETY: Just validated
        Ok(unsafe { Self::from_slice_unchecked(s) })
    }

    /// Converts a mutable slice of UTF-16 data to a mutable string slice.
    ///
    /// Not all slices of [`u16`] values are valid to convert, since [`Utf16Str`] requires that it
    /// is always valid UTF-16. This function checks to ensure that the values are valid UTF-16, and
    /// then does the conversion.
    ///
    /// If you are sure that the slice is valid UTF-16, and you don't want to incur the overhead of
    /// the validity check, there is an unsafe version of this function,
    /// [`from_slice_unchecked_mut`][Self::from_slice_unchecked_mut], which has the same behavior
    /// but skips the check.
    ///
    /// If you need an owned string, consider using [`Utf16String::from_vec`] instead.
    ///
    /// Because you can stack-allocate a `[u16; N]`, this function is one way to have a
    /// stack-allocated string. Indeed, the [`utf16str!`][crate::utf16str] macro does exactly this
    /// after converting from UTF-8 to UTF-16.
    ///
    /// # Errors
    ///
    /// Returns an error if the slice is not UTF-16 with a description as to why the provided slice
    /// is not UTF-16.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf16Str;
    ///
    /// let mut sparkle_heart = vec![0xd83d, 0xdc96]; // Raw surrogate pair
    /// let sparkle_heart = Utf16Str::from_slice_mut(&mut sparkle_heart).unwrap();
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    ///
    /// With incorrect values that return an error:
    ///
    /// ```
    /// use widestring::Utf16Str;
    ///
    /// let mut sparkle_heart = vec![0xd83d, 0x0]; // This is an invalid unpaired surrogate
    ///
    /// assert!(Utf16Str::from_slice_mut(&mut sparkle_heart).is_err());
    /// ```
    pub fn from_slice_mut(s: &mut [u16]) -> Result<&mut Self, Utf16Error> {
        validate_utf16(s)?;
        // SAFETY: Just validated
        Ok(unsafe { Self::from_slice_unchecked_mut(s) })
    }

    /// Converts a wide string slice of undefined encoding to a UTF-16 string slice without checking
    /// if the string slice is valid UTF-16.
    ///
    /// See the safe version, [`from_ustr`][Self::from_ustr], for more information.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the string slice passed to it is
    /// valid UTF-16. If this constraint is violated, undefined behavior results as it is assumed
    /// the [`Utf16Str`] is always valid UTF-16.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::{Utf16Str, u16str};
    ///
    /// let sparkle_heart = u16str!("💖");
    /// let sparkle_heart = unsafe { Utf16Str::from_ustr_unchecked(sparkle_heart) };
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    #[must_use]
    pub const unsafe fn from_ustr_unchecked(s: &U16Str) -> &Self {
        Self::from_slice_unchecked(s.as_slice())
    }

    /// Converts a mutable wide string slice of undefined encoding to a mutable UTF-16 string slice
    /// without checking if the string slice is valid UTF-16.
    ///
    /// See the safe version, [`from_ustr_mut`][Self::from_ustr_mut], for more information.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the string slice passed to it is
    /// valid UTF-16.
    /// If this constraint is violated, undefined behavior results as it is assumed
    /// the [`Utf16Str`] is always valid UTF-16.
    #[must_use]
    pub unsafe fn from_ustr_unchecked_mut(s: &mut U16Str) -> &mut Self {
        Self::from_slice_unchecked_mut(s.as_mut_slice())
    }

    /// Converts a wide string slice of undefined encoding to a UTF-16 string slice.
    ///
    /// Since [`U16Str`] does not have a specified encoding, this conversion may fail if the
    /// [`U16Str`] does not contain valid UTF-16 data.
    ///
    /// If you are sure that the slice is valid UTF-16, and you don't want to incur the overhead of
    /// the validity check, there is an unsafe version of this function,
    /// [`from_ustr_unchecked`][Self::from_ustr_unchecked], which has the same behavior
    /// but skips the check.
    ///
    /// # Errors
    ///
    /// Returns an error if the string slice is not UTF-16 with a description as to why the
    /// provided string slice is not UTF-16.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::{Utf16Str, u16str};
    ///
    /// let sparkle_heart = u16str!("💖");
    /// let sparkle_heart = Utf16Str::from_ustr(sparkle_heart).unwrap();
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    #[inline]
    pub fn from_ustr(s: &U16Str) -> Result<&Self, Utf16Error> {
        Self::from_slice(s.as_slice())
    }

    /// Converts a mutable wide string slice of undefined encoding to a mutable UTF-16 string slice.
    ///
    /// Since [`U16Str`] does not have a specified encoding, this conversion may fail if the
    /// [`U16Str`] does not contain valid UTF-16 data.
    ///
    /// If you are sure that the slice is valid UTF-16, and you don't want to incur the overhead of
    /// the validity check, there is an unsafe version of this function,
    /// [`from_ustr_unchecked_mut`][Self::from_ustr_unchecked_mut], which has the same behavior
    /// but skips the check.
    ///
    /// # Errors
    ///
    /// Returns an error if the string slice is not UTF-16 with a description as to why the
    /// provided string slice is not UTF-16.
    #[inline]
    pub fn from_ustr_mut(s: &mut U16Str) -> Result<&mut Self, Utf16Error> {
        Self::from_slice_mut(s.as_mut_slice())
    }

    /// Converts a wide C string slice to a UTF-16 string slice without checking if the
    /// string slice is valid UTF-16.
    ///
    /// The resulting string slice does *not* contain the nul terminator.
    ///
    /// See the safe version, [`from_ucstr`][Self::from_ucstr], for more information.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the string slice passed to it is
    /// valid UTF-16. If this constraint is violated, undefined behavior results as it is assumed
    /// the [`Utf16Str`] is always valid UTF-16.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::{Utf16Str, u16cstr};
    ///
    /// let sparkle_heart = u16cstr!("💖");
    /// let sparkle_heart = unsafe { Utf16Str::from_ucstr_unchecked(sparkle_heart) };
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    #[inline]
    #[must_use]
    pub unsafe fn from_ucstr_unchecked(s: &crate::U16CStr) -> &Self {
        Self::from_slice_unchecked(s.as_slice())
    }

    /// Converts a mutable wide C string slice to a mutable UTF-16 string slice without
    /// checking if the string slice is valid UTF-16.
    ///
    /// The resulting string slice does *not* contain the nul terminator.
    ///
    /// See the safe version, [`from_ucstr_mut`][Self::from_ucstr_mut], for more information.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the string slice passed to it is
    /// valid UTF-16. If this constraint is violated, undefined behavior results as it is assumed
    /// the [`Utf16Str`] is always valid UTF-16.
    #[inline]
    #[must_use]
    pub unsafe fn from_ucstr_unchecked_mut(s: &mut crate::U16CStr) -> &mut Self {
        Self::from_slice_unchecked_mut(s.as_mut_slice())
    }

    /// Converts a wide C string slice to a UTF-16 string slice.
    ///
    /// The resulting string slice does *not* contain the nul terminator.
    ///
    /// Since [`U16CStr`][crate::U16CStr] does not have a specified encoding, this conversion may
    /// fail if the [`U16CStr`][crate::U16CStr] does not contain valid UTF-16 data.
    ///
    /// If you are sure that the slice is valid UTF-16, and you don't want to incur the overhead of
    /// the validity check, there is an unsafe version of this function,
    /// [`from_ucstr_unchecked`][Self::from_ucstr_unchecked], which has the same behavior
    /// but skips the check.
    ///
    /// # Errors
    ///
    /// Returns an error if the string slice is not UTF-16 with a description as to why the
    /// provided string slice is not UTF-16.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::{Utf16Str, u16cstr};
    ///
    /// let sparkle_heart = u16cstr!("💖");
    /// let sparkle_heart = Utf16Str::from_ucstr(sparkle_heart).unwrap();
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    #[inline]
    pub fn from_ucstr(s: &crate::U16CStr) -> Result<&Self, Utf16Error> {
        Self::from_slice(s.as_slice())
    }

    /// Converts a mutable wide C string slice to a mutable UTF-16 string slice.
    ///
    /// The resulting string slice does *not* contain the nul terminator.
    ///
    /// Since [`U16CStr`][crate::U16CStr] does not have a specified encoding, this conversion may
    /// fail if the [`U16CStr`][crate::U16CStr] does not contain valid UTF-16 data.
    ///
    /// If you are sure that the slice is valid UTF-16, and you don't want to incur the overhead of
    /// the validity check, there is an unsafe version of this function,
    /// [`from_ucstr_unchecked_mut`][Self::from_ucstr_unchecked_mut], which has the same behavior
    /// but skips the check.
    ///
    /// # Safety
    ///
    /// This method is unsafe because you can violate the invariants of [`U16CStr`][crate::U16CStr]
    /// when mutating the slice (i.e. by adding interior nul values).
    ///
    /// # Errors
    ///
    /// Returns an error if the string slice is not UTF-16 with a description as to why the
    /// provided string slice is not UTF-16.
    #[inline]
    pub unsafe fn from_ucstr_mut(s: &mut crate::U16CStr) -> Result<&mut Self, Utf16Error> {
        Self::from_slice_mut(s.as_mut_slice())
    }

    /// Converts to a standard UTF-8 [`String`].
    ///
    /// Because this string is always valid UTF-16, the conversion is lossless and non-fallible.
    #[inline]
    #[allow(clippy::inherent_to_string_shadow_display)]
    #[cfg(feature = "alloc")]
    #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
    #[must_use]
    pub fn to_string(&self) -> String {
        String::from_utf16(self.as_slice()).unwrap()
    }

    /// Checks that the `index`-th value is the first value in a UTF-16 code point sequence or the
    /// end of the string.
    ///
    /// Returns `true` if the value at `index` is not a UTF-16 surrogate value, or if the value at
    /// `index` is the first value of a surrogate pair (the "high" surrogate). Returns `false` if
    /// the value at `index` is the second value of a surrogate pair (a.k.a. the "low" surrogate).
    ///
    /// The start and end of the string (when `index == self.len()`) are considered to be
    /// boundaries.
    ///
    /// Returns `false` if `index` is greater than `self.len()`.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf16str;
    /// let s = utf16str!("Sparkle 💖 Heart");
    /// assert!(s.is_char_boundary(0));
    ///
    /// // high surrogate of `💖`
    /// assert!(s.is_char_boundary(8));
    /// // low surrogate of `💖`
    /// assert!(!s.is_char_boundary(9));
    ///
    /// assert!(s.is_char_boundary(s.len()));
    /// ```
    #[inline]
    #[must_use]
    pub const fn is_char_boundary(&self, index: usize) -> bool {
        if index > self.len() {
            false
        } else if index == self.len() {
            true
        } else {
            !is_utf16_low_surrogate(self.inner[index])
        }
    }

    /// Returns a subslice of this string.
    ///
    /// This is the non-panicking alternative to indexing the string. Returns [`None`] whenever
    /// equivalent indexing operation would panic.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::{utf16str};
    /// let v = utf16str!("⚧️🏳️‍⚧️➡️s");
    ///
    /// assert_eq!(Some(utf16str!("⚧️")), v.get(..2));
    /// assert_eq!(Some(utf16str!("🏳️‍⚧️")), v.get(2..8));
    /// assert_eq!(Some(utf16str!("➡️")), v.get(8..10));
    /// assert_eq!(Some(utf16str!("s")), v.get(10..));
    ///
    /// assert!(v.get(3..4).is_none());
    /// ```
    #[inline]
    #[must_use]
    pub fn get<I>(&self, index: I) -> Option<&Self>
    where
        I: RangeBounds<usize> + SliceIndex<[u16], Output = [u16]>,
    {
        // TODO: Use SliceIndex directly when it is stabilized
        let range = crate::range_check(index, ..self.len())?;
        if !self.is_char_boundary(range.start) || !self.is_char_boundary(range.end) {
            return None;
        }

        // SAFETY: range_check verified bounds, and we just verified char boundaries
        Some(unsafe { self.get_unchecked(range) })
    }

    /// Returns a mutable subslice of this string.
    ///
    /// This is the non-panicking alternative to indexing the string. Returns [`None`] whenever
    /// equivalent indexing operation would panic.
/// /// # Examples /// /// ``` /// # use widestring::{utf16str}; /// # #[cfg(feature = "alloc")] { /// let mut v = utf16str!("⚧️🏳️‍⚧️➡️s").to_owned(); /// /// assert_eq!(utf16str!("⚧️"), v.get_mut(..2).unwrap()); /// assert_eq!(utf16str!("🏳️‍⚧️"), v.get_mut(2..8).unwrap()); /// assert_eq!(utf16str!("➡️"), v.get_mut(8..10).unwrap()); /// assert_eq!(utf16str!("s"), v.get_mut(10..).unwrap()); /// /// assert!(v.get_mut(3..4).is_none()); /// # } /// ``` #[inline] #[must_use] pub fn get_mut<I>(&mut self, index: I) -> Option<&mut Self> where I: RangeBounds<usize> + SliceIndex<[u16], Output = [u16]>, { // TODO: Use SliceIndex directly when it is stabilized let range = crate::range_check(index, ..self.len())?; if !self.is_char_boundary(range.start) || !self.is_char_boundary(range.end) { return None; } // SAFETY: range_check verified bounds, and we just verified char boundaries Some(unsafe { self.get_unchecked_mut(range) }) } /// Divide one string slice into two at an index. /// /// The argument, `mid`, should be an offset from the start of the string. It must also be on /// the boundary of a UTF-16 code point. /// /// The two slices returned go from the start of the string slice to `mid`, and from `mid` to /// the end of the string slice. /// /// To get mutable string slices instead, see the [`split_at_mut`][Self::split_at_mut] method. /// /// # Panics /// /// Panics if `mid` is not on a UTF-16 code point boundary, or if it is past the end of the last /// code point of the string slice.
/// /// # Examples /// /// ``` /// # use widestring::utf16str; /// let s = utf16str!("Per Martin-Löf"); /// /// let (first, last) = s.split_at(3); /// /// assert_eq!("Per", first); /// assert_eq!(" Martin-Löf", last); /// ``` #[inline] #[must_use] pub fn split_at(&self, mid: usize) -> (&Self, &Self) { assert!(self.is_char_boundary(mid)); let (a, b) = self.inner.split_at(mid); unsafe { (Self::from_slice_unchecked(a), Self::from_slice_unchecked(b)) } } /// Divide one mutable string slice into two at an index. /// /// The argument, `mid`, should be an offset from the start of the string. It must also be on /// the boundary of a UTF-16 code point. /// /// The two slices returned go from the start of the string slice to `mid`, and from `mid` to /// the end of the string slice. /// /// To get immutable string slices instead, see the [`split_at`][Self::split_at] method. /// /// # Panics /// /// Panics if `mid` is not on a UTF-16 code point boundary, or if it is past the end of the last /// code point of the string slice. /// /// # Examples /// /// ``` /// # use widestring::utf16str; /// # #[cfg(feature = "alloc")] { /// let mut s = utf16str!("Per Martin-Löf").to_owned(); /// /// let (first, last) = s.split_at_mut(3); /// /// assert_eq!("Per", first); /// assert_eq!(" Martin-Löf", last); /// # } /// ``` #[inline] #[must_use] pub fn split_at_mut(&mut self, mid: usize) -> (&mut Self, &mut Self) { assert!(self.is_char_boundary(mid)); let (a, b) = self.inner.split_at_mut(mid); unsafe { ( Self::from_slice_unchecked_mut(a), Self::from_slice_unchecked_mut(b), ) } } /// Returns an iterator over the [`char`]s of a string slice. /// /// As this string slice consists of valid UTF-16, we can iterate through a string slice by /// [`char`]. This method returns such an iterator. /// /// It's important to remember that [`char`] represents a Unicode Scalar Value, and might not /// match your idea of what a 'character' is.
Iteration over grapheme clusters may be what you /// actually want. This functionality is not provided by this crate. #[inline] #[must_use] pub fn chars(&self) -> CharsUtf16<'_> { CharsUtf16::new(self.as_slice()) } /// Returns an iterator over the [`char`]s of a string slice and their positions. /// /// As this string slice consists of valid UTF-16, we can iterate through a string slice by /// [`char`]. This method returns an iterator of both these [`char`]s as well as their offsets. /// /// The iterator yields tuples. The position is first, the [`char`] is second. #[inline] #[must_use] pub fn char_indices(&self) -> CharIndicesUtf16<'_> { CharIndicesUtf16::new(self.as_slice()) } /// Returns an iterator over the [`u16`] code units of a string slice. /// /// As a UTF-16 string slice consists of a sequence of [`u16`] code units, we can iterate /// through a string slice by each code unit. This method returns such an iterator. #[must_use] pub fn code_units(&self) -> CodeUnits<'_> { CodeUnits::new(self.as_slice()) } /// Returns an iterator of bytes over the string encoded as UTF-8. #[must_use] pub fn encode_utf8(&self) -> EncodeUtf8<CharsUtf16<'_>> { crate::encode_utf8(self.chars()) } /// Returns an iterator of [`u32`] over the string encoded as UTF-32. #[must_use] pub fn encode_utf32(&self) -> EncodeUtf32<CharsUtf16<'_>> { crate::encode_utf32(self.chars()) } /// Returns an iterator that escapes each [`char`] in `self` with [`char::escape_debug`]. #[inline] #[must_use] pub fn escape_debug(&self) -> EscapeDebug<CharsUtf16<'_>> { EscapeDebug::<CharsUtf16>::new(self.as_slice()) } /// Returns an iterator that escapes each [`char`] in `self` with [`char::escape_default`]. #[inline] #[must_use] pub fn escape_default(&self) -> EscapeDefault<CharsUtf16<'_>> { EscapeDefault::<CharsUtf16>::new(self.as_slice()) } /// Returns an iterator that escapes each [`char`] in `self` with [`char::escape_unicode`].
#[inline] #[must_use] pub fn escape_unicode(&self) -> EscapeUnicode<CharsUtf16<'_>> { EscapeUnicode::<CharsUtf16>::new(self.as_slice()) } /// Returns the lowercase equivalent of this string slice, as a new [`Utf16String`]. /// /// 'Lowercase' is defined according to the terms of the Unicode Derived Core Property /// `Lowercase`. /// /// Since some characters can expand into multiple characters when changing the case, this /// function returns a [`Utf16String`] instead of modifying the parameter in-place. #[inline] #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub fn to_lowercase(&self) -> Utf16String { let mut s = Utf16String::with_capacity(self.len()); for c in self.chars() { for lower in c.to_lowercase() { s.push(lower); } } s } /// Returns the uppercase equivalent of this string slice, as a new [`Utf16String`]. /// /// 'Uppercase' is defined according to the terms of the Unicode Derived Core Property /// `Uppercase`. /// /// Since some characters can expand into multiple characters when changing the case, this /// function returns a [`Utf16String`] instead of modifying the parameter in-place. #[inline] #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub fn to_uppercase(&self) -> Utf16String { let mut s = Utf16String::with_capacity(self.len()); for c in self.chars() { for upper in c.to_uppercase() { s.push(upper); } } s } } impl Utf32Str { /// Converts a slice of UTF-32 data to a string slice. /// /// Not all slices of [`u32`] values are valid to convert, since [`Utf32Str`] requires that it /// is always valid UTF-32. This function checks to ensure that the values are valid UTF-32, and /// then does the conversion. /// /// If you are sure that the slice is valid UTF-32, and you don't want to incur the overhead of /// the validity check, there is an unsafe version of this function, /// [`from_slice_unchecked`][Self::from_slice_unchecked], which has the same behavior but skips /// the check.
/// /// If you need an owned string, consider using [`Utf32String::from_vec`] instead. /// /// Because you can stack-allocate a `[u32; N]`, this function is one way to have a /// stack-allocated string. Indeed, the [`utf32str!`][crate::utf32str] macro does exactly this /// after converting from UTF-8 to UTF-32. /// /// # Errors /// /// Returns an error if the slice is not valid UTF-32, with a description as to why the provided /// slice is not valid UTF-32. /// /// # Examples /// /// ``` /// use widestring::Utf32Str; /// /// let sparkle_heart = vec![0x1f496]; /// let sparkle_heart = Utf32Str::from_slice(&sparkle_heart).unwrap(); /// /// assert_eq!("💖", sparkle_heart); /// ``` /// /// With incorrect values that return an error: /// /// ``` /// use widestring::Utf32Str; /// /// let sparkle_heart = vec![0xd83d, 0xdc96]; // UTF-16 surrogates are invalid /// /// assert!(Utf32Str::from_slice(&sparkle_heart).is_err()); /// ``` pub fn from_slice(s: &[u32]) -> Result<&Self, Utf32Error> { validate_utf32(s)?; // SAFETY: Just validated Ok(unsafe { Self::from_slice_unchecked(s) }) } /// Converts a mutable slice of UTF-32 data to a mutable string slice. /// /// Not all slices of [`u32`] values are valid to convert, since [`Utf32Str`] requires that it /// is always valid UTF-32. This function checks to ensure that the values are valid UTF-32, and /// then does the conversion. /// /// If you are sure that the slice is valid UTF-32, and you don't want to incur the overhead of /// the validity check, there is an unsafe version of this function, /// [`from_slice_unchecked_mut`][Self::from_slice_unchecked_mut], which has the same behavior /// but skips the check. /// /// If you need an owned string, consider using [`Utf32String::from_vec`] instead. /// /// Because you can stack-allocate a `[u32; N]`, this function is one way to have a /// stack-allocated string. Indeed, the [`utf32str!`][crate::utf32str] macro does exactly this /// after converting from UTF-8 to UTF-32.
/// /// # Errors /// /// Returns an error if the slice is not valid UTF-32, with a description as to why the provided /// slice is not valid UTF-32. /// /// # Examples /// /// ``` /// use widestring::Utf32Str; /// /// let mut sparkle_heart = vec![0x1f496]; /// let sparkle_heart = Utf32Str::from_slice_mut(&mut sparkle_heart).unwrap(); /// /// assert_eq!("💖", sparkle_heart); /// ``` /// /// With incorrect values that return an error: /// /// ``` /// use widestring::Utf32Str; /// /// let mut sparkle_heart = vec![0xd83d, 0xdc96]; // UTF-16 surrogates are invalid /// /// assert!(Utf32Str::from_slice_mut(&mut sparkle_heart).is_err()); /// ``` pub fn from_slice_mut(s: &mut [u32]) -> Result<&mut Self, Utf32Error> { validate_utf32(s)?; // SAFETY: Just validated Ok(unsafe { Self::from_slice_unchecked_mut(s) }) } /// Converts a wide string slice of undefined encoding to a UTF-32 string slice without checking /// if the string slice is valid UTF-32. /// /// See the safe version, [`from_ustr`][Self::from_ustr], for more information. /// /// # Safety /// /// This function is unsafe because it does not check that the string slice passed to it is /// valid UTF-32. If this constraint is violated, undefined behavior results as it is assumed /// the [`Utf32Str`] is always valid UTF-32. /// /// # Examples /// /// ``` /// use widestring::{Utf32Str, u32str}; /// /// let sparkle_heart = u32str!("💖"); /// let sparkle_heart = unsafe { Utf32Str::from_ustr_unchecked(sparkle_heart) }; /// /// assert_eq!("💖", sparkle_heart); /// ``` #[inline] #[must_use] pub const unsafe fn from_ustr_unchecked(s: &crate::U32Str) -> &Self { Self::from_slice_unchecked(s.as_slice()) } /// Converts a mutable wide string slice of undefined encoding to a mutable UTF-32 string slice /// without checking if the string slice is valid UTF-32. /// /// See the safe version, [`from_ustr_mut`][Self::from_ustr_mut], for more information.
/// /// # Safety /// /// This function is unsafe because it does not check that the string slice passed to it is /// valid UTF-32. If this constraint is violated, undefined behavior results as it is assumed /// the [`Utf32Str`] is always valid UTF-32. #[inline] #[must_use] pub unsafe fn from_ustr_unchecked_mut(s: &mut crate::U32Str) -> &mut Self { Self::from_slice_unchecked_mut(s.as_mut_slice()) } /// Converts a wide string slice of undefined encoding to a UTF-32 string slice. /// /// Since [`U32Str`] does not have a specified encoding, this conversion may fail if the /// [`U32Str`] does not contain valid UTF-32 data. /// /// If you are sure that the slice is valid UTF-32, and you don't want to incur the overhead of /// the validity check, there is an unsafe version of this function, /// [`from_ustr_unchecked`][Self::from_ustr_unchecked], which has the same behavior /// but skips the check. /// /// # Errors /// /// Returns an error if the string slice is not valid UTF-32, with a description as to why the /// provided string slice is not valid UTF-32. /// /// # Examples /// /// ``` /// use widestring::{Utf32Str, u32str}; /// /// let sparkle_heart = u32str!("💖"); /// let sparkle_heart = Utf32Str::from_ustr(sparkle_heart).unwrap(); /// /// assert_eq!("💖", sparkle_heart); /// ``` #[inline] pub fn from_ustr(s: &crate::U32Str) -> Result<&Self, Utf32Error> { Self::from_slice(s.as_slice()) } /// Converts a mutable wide string slice of undefined encoding to a mutable UTF-32 string slice. /// /// Since [`U32Str`] does not have a specified encoding, this conversion may fail if the /// [`U32Str`] does not contain valid UTF-32 data. /// /// If you are sure that the slice is valid UTF-32, and you don't want to incur the overhead of /// the validity check, there is an unsafe version of this function, /// [`from_ustr_unchecked_mut`][Self::from_ustr_unchecked_mut], which has the same behavior /// but skips the check.
/// /// # Errors /// /// Returns an error if the string slice is not valid UTF-32, with a description as to why the /// provided string slice is not valid UTF-32. #[inline] pub fn from_ustr_mut(s: &mut crate::U32Str) -> Result<&mut Self, Utf32Error> { Self::from_slice_mut(s.as_mut_slice()) } /// Converts a wide C string slice to a UTF-32 string slice without checking if the /// string slice is valid UTF-32. /// /// The resulting string slice does *not* contain the nul terminator. /// /// See the safe version, [`from_ucstr`][Self::from_ucstr], for more information. /// /// # Safety /// /// This function is unsafe because it does not check that the string slice passed to it is /// valid UTF-32. If this constraint is violated, undefined behavior results as it is assumed /// the [`Utf32Str`] is always valid UTF-32. /// /// # Examples /// /// ``` /// use widestring::{Utf32Str, u32cstr}; /// /// let sparkle_heart = u32cstr!("💖"); /// let sparkle_heart = unsafe { Utf32Str::from_ucstr_unchecked(sparkle_heart) }; /// /// assert_eq!("💖", sparkle_heart); /// ``` #[inline] #[must_use] pub unsafe fn from_ucstr_unchecked(s: &crate::U32CStr) -> &Self { Self::from_slice_unchecked(s.as_slice()) } /// Converts a mutable wide C string slice to a mutable UTF-32 string slice without /// checking if the string slice is valid UTF-32. /// /// The resulting string slice does *not* contain the nul terminator. /// /// See the safe version, [`from_ucstr_mut`][Self::from_ucstr_mut], for more information. /// /// # Safety /// /// This function is unsafe because it does not check that the string slice passed to it is /// valid UTF-32. If this constraint is violated, undefined behavior results as it is assumed /// the [`Utf32Str`] is always valid UTF-32. #[inline] #[must_use] pub unsafe fn from_ucstr_unchecked_mut(s: &mut crate::U32CStr) -> &mut Self { Self::from_slice_unchecked_mut(s.as_mut_slice()) } /// Converts a wide C string slice to a UTF-32 string slice.
/// /// The resulting string slice does *not* contain the nul terminator. /// /// Since [`U32CStr`][crate::U32CStr] does not have a specified encoding, this conversion may /// fail if the [`U32CStr`][crate::U32CStr] does not contain valid UTF-32 data. /// /// If you are sure that the slice is valid UTF-32, and you don't want to incur the overhead of /// the validity check, there is an unsafe version of this function, /// [`from_ucstr_unchecked`][Self::from_ucstr_unchecked], which has the same behavior /// but skips the check. /// /// # Errors /// /// Returns an error if the string slice is not valid UTF-32, with a description as to why the /// provided string slice is not valid UTF-32. /// /// # Examples /// /// ``` /// use widestring::{Utf32Str, u32cstr}; /// /// let sparkle_heart = u32cstr!("💖"); /// let sparkle_heart = Utf32Str::from_ucstr(sparkle_heart).unwrap(); /// /// assert_eq!("💖", sparkle_heart); /// ``` #[inline] pub fn from_ucstr(s: &crate::U32CStr) -> Result<&Self, Utf32Error> { Self::from_slice(s.as_slice()) } /// Converts a mutable wide C string slice to a mutable UTF-32 string slice. /// /// The resulting string slice does *not* contain the nul terminator. /// /// Since [`U32CStr`][crate::U32CStr] does not have a specified encoding, this conversion may /// fail if the [`U32CStr`][crate::U32CStr] does not contain valid UTF-32 data. /// /// If you are sure that the slice is valid UTF-32, and you don't want to incur the overhead of /// the validity check, there is an unsafe version of this function, /// [`from_ucstr_unchecked_mut`][Self::from_ucstr_unchecked_mut], which has the same behavior /// but skips the check. /// /// # Safety /// /// This method is unsafe because you can violate the invariants of [`U32CStr`][crate::U32CStr] /// when mutating the slice (i.e. by adding interior nul values). /// /// # Errors /// /// Returns an error if the string slice is not valid UTF-32, with a description as to why the /// provided string slice is not valid UTF-32.
#[inline] pub unsafe fn from_ucstr_mut(s: &mut crate::U32CStr) -> Result<&mut Self, Utf32Error> { Self::from_slice_mut(s.as_mut_slice()) } /// Converts a slice of [`char`]s to a string slice. /// /// Since [`char`] slices are always valid UTF-32, this conversion always succeeds. /// /// If you need an owned string, consider using [`Utf32String::from_chars`] instead. /// /// # Examples /// /// ``` /// use widestring::Utf32Str; /// /// let sparkle_heart = ['💖']; /// let sparkle_heart = Utf32Str::from_char_slice(&sparkle_heart); /// /// assert_eq!("💖", sparkle_heart); /// ``` #[allow(trivial_casts)] #[inline] #[must_use] pub const fn from_char_slice(s: &[char]) -> &Self { // SAFETY: char slice is always valid UTF-32 unsafe { Self::from_slice_unchecked(&*(s as *const [char] as *const [u32])) } } /// Converts a mutable slice of [`char`]s to a string slice. /// /// Since [`char`] slices are always valid UTF-32, this conversion always succeeds. /// /// If you need an owned string, consider using [`Utf32String::from_chars`] instead. /// /// # Examples /// /// ``` /// use widestring::Utf32Str; /// /// let mut sparkle_heart = ['💖']; /// let sparkle_heart = Utf32Str::from_char_slice_mut(&mut sparkle_heart); /// /// assert_eq!("💖", sparkle_heart); /// ``` #[allow(trivial_casts)] #[inline] #[must_use] pub fn from_char_slice_mut(s: &mut [char]) -> &mut Self { // SAFETY: char slice is always valid UTF-32 unsafe { Self::from_slice_unchecked_mut(&mut *(s as *mut [char] as *mut [u32])) } } /// Converts a string slice into a slice of [`char`]s. #[allow(trivial_casts)] #[inline] #[must_use] pub const fn as_char_slice(&self) -> &[char] { // SAFETY: Self should be valid UTF-32 so chars will be in range unsafe { &*(self.as_slice() as *const [u32] as *const [char]) } } /// Converts a mutable string slice into a mutable slice of [`char`]s.
#[allow(trivial_casts)] #[inline] #[must_use] pub fn as_char_slice_mut(&mut self) -> &mut [char] { // SAFETY: Self should be valid UTF-32 so chars will be in range unsafe { &mut *(self.as_mut_slice() as *mut [u32] as *mut [char]) } } /// Converts to a standard UTF-8 [`String`]. /// /// Because this string is always valid UTF-32, the conversion is lossless and infallible. #[inline] #[allow(clippy::inherent_to_string_shadow_display)] #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub fn to_string(&self) -> String { let mut s = String::with_capacity(self.len()); s.extend(self.as_char_slice()); s } /// Returns a subslice of this string. /// /// This is the non-panicking alternative to indexing the string. Returns [`None`] whenever the /// equivalent indexing operation would panic. /// /// # Examples /// /// ``` /// # use widestring::{utf32str}; /// let v = utf32str!("⚧️🏳️‍⚧️➡️s"); /// /// assert_eq!(Some(utf32str!("⚧️")), v.get(..2)); /// assert_eq!(Some(utf32str!("🏳️‍⚧️")), v.get(2..7)); /// assert_eq!(Some(utf32str!("➡️")), v.get(7..9)); /// assert_eq!(Some(utf32str!("s")), v.get(9..)); /// ``` #[inline] #[must_use] pub fn get<I>(&self, index: I) -> Option<&Self> where I: SliceIndex<[u32], Output = [u32]>, { // TODO: Use SliceIndex directly when it is stabilized // SAFETY: subslice has already been verified self.inner .get(index) .map(|s| unsafe { Self::from_slice_unchecked(s) }) } /// Returns a mutable subslice of this string. /// /// This is the non-panicking alternative to indexing the string. Returns [`None`] whenever the /// equivalent indexing operation would panic.
/// /// # Examples /// /// ``` /// # use widestring::{utf32str}; /// # #[cfg(feature = "alloc")] { /// let mut v = utf32str!("⚧️🏳️‍⚧️➡️s").to_owned(); /// /// assert_eq!(utf32str!("⚧️"), v.get_mut(..2).unwrap()); /// assert_eq!(utf32str!("🏳️‍⚧️"), v.get_mut(2..7).unwrap()); /// assert_eq!(utf32str!("➡️"), v.get_mut(7..9).unwrap()); /// assert_eq!(utf32str!("s"), v.get_mut(9..).unwrap()); /// # } /// ``` #[inline] #[must_use] pub fn get_mut<I>(&mut self, index: I) -> Option<&mut Self> where I: SliceIndex<[u32], Output = [u32]>, { // TODO: Use SliceIndex directly when it is stabilized // SAFETY: subslice has already been verified self.inner .get_mut(index) .map(|s| unsafe { Self::from_slice_unchecked_mut(s) }) } /// Divide one string slice into two at an index. /// /// The argument, `mid`, should be an offset from the start of the string. /// /// The two slices returned go from the start of the string slice to `mid`, and from `mid` to /// the end of the string slice. /// /// To get mutable string slices instead, see the [`split_at_mut`][Self::split_at_mut] method. /// /// # Panics /// /// Panics if `mid` is past the end of the last code point of the string slice. /// /// # Examples /// /// ``` /// # use widestring::utf32str; /// let s = utf32str!("Per Martin-Löf"); /// /// let (first, last) = s.split_at(3); /// /// assert_eq!("Per", first); /// assert_eq!(" Martin-Löf", last); /// ``` #[inline] #[must_use] pub fn split_at(&self, mid: usize) -> (&Self, &Self) { let (a, b) = self.inner.split_at(mid); unsafe { (Self::from_slice_unchecked(a), Self::from_slice_unchecked(b)) } } /// Divide one mutable string slice into two at an index. /// /// The argument, `mid`, should be an offset from the start of the string. /// /// The two slices returned go from the start of the string slice to `mid`, and from `mid` to /// the end of the string slice. /// /// To get immutable string slices instead, see the [`split_at`][Self::split_at] method.
/// /// # Panics /// /// Panics if `mid` is past the end of the last code point of the string slice. /// /// # Examples /// /// ``` /// # use widestring::utf32str; /// # #[cfg(feature = "alloc")] { /// let mut s = utf32str!("Per Martin-Löf").to_owned(); /// /// let (first, last) = s.split_at_mut(3); /// /// assert_eq!("Per", first); /// assert_eq!(" Martin-Löf", last); /// # } /// ``` #[inline] #[must_use] pub fn split_at_mut(&mut self, mid: usize) -> (&mut Self, &mut Self) { let (a, b) = self.inner.split_at_mut(mid); unsafe { ( Self::from_slice_unchecked_mut(a), Self::from_slice_unchecked_mut(b), ) } } /// Returns an iterator over the [`char`]s of a string slice. /// /// As this string slice consists of valid UTF-32, we can iterate through a string slice by /// [`char`]. This method returns such an iterator. /// /// It's important to remember that [`char`] represents a Unicode Scalar Value, and might not /// match your idea of what a 'character' is. Iteration over grapheme clusters may be what you /// actually want. This functionality is not provided by this crate. #[inline] #[must_use] pub fn chars(&self) -> CharsUtf32<'_> { CharsUtf32::new(self.as_slice()) } /// Returns an iterator over the [`char`]s of a string slice and their positions. /// /// As this string slice consists of valid UTF-32, we can iterate through a string slice by /// [`char`]. This method returns an iterator of both these [`char`]s as well as their offsets. /// /// The iterator yields tuples. The position is first, the [`char`] is second. #[inline] #[must_use] pub fn char_indices(&self) -> CharIndicesUtf32<'_> { CharIndicesUtf32::new(self.as_slice()) } /// Returns an iterator of bytes over the string encoded as UTF-8. #[must_use] pub fn encode_utf8(&self) -> EncodeUtf8<CharsUtf32<'_>> { crate::encode_utf8(self.chars()) } /// Returns an iterator of [`u16`] over the string encoded as UTF-16.
#[must_use] pub fn encode_utf16(&self) -> EncodeUtf16<CharsUtf32<'_>> { crate::encode_utf16(self.chars()) } /// Returns an iterator that escapes each [`char`] in `self` with [`char::escape_debug`]. #[inline] #[must_use] pub fn escape_debug(&self) -> EscapeDebug<CharsUtf32<'_>> { EscapeDebug::<CharsUtf32>::new(self.as_slice()) } /// Returns an iterator that escapes each [`char`] in `self` with [`char::escape_default`]. #[inline] #[must_use] pub fn escape_default(&self) -> EscapeDefault<CharsUtf32<'_>> { EscapeDefault::<CharsUtf32>::new(self.as_slice()) } /// Returns an iterator that escapes each [`char`] in `self` with [`char::escape_unicode`]. #[inline] #[must_use] pub fn escape_unicode(&self) -> EscapeUnicode<CharsUtf32<'_>> { EscapeUnicode::<CharsUtf32>::new(self.as_slice()) } /// Returns the lowercase equivalent of this string slice, as a new [`Utf32String`]. /// /// 'Lowercase' is defined according to the terms of the Unicode Derived Core Property /// `Lowercase`. /// /// Since some characters can expand into multiple characters when changing the case, this /// function returns a [`Utf32String`] instead of modifying the parameter in-place. #[inline] #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub fn to_lowercase(&self) -> Utf32String { let mut s = Utf32String::with_capacity(self.len()); for c in self.chars() { for lower in c.to_lowercase() { s.push(lower); } } s } /// Returns the uppercase equivalent of this string slice, as a new [`Utf32String`]. /// /// 'Uppercase' is defined according to the terms of the Unicode Derived Core Property /// `Uppercase`. /// /// Since some characters can expand into multiple characters when changing the case, this /// function returns a [`Utf32String`] instead of modifying the parameter in-place.
#[inline] #[cfg(feature = "alloc")] #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))] #[must_use] pub fn to_uppercase(&self) -> Utf32String { let mut s = Utf32String::with_capacity(self.len()); for c in self.chars() { for upper in c.to_uppercase() { s.push(upper); } } s } } impl AsMut<[char]> for Utf32Str { #[inline] fn as_mut(&mut self) -> &mut [char] { self.as_char_slice_mut() } } impl AsRef<[char]> for Utf32Str { #[inline] fn as_ref(&self) -> &[char] { self.as_char_slice() } } impl<'a> From<&'a [char]> for &'a Utf32Str { #[inline] fn from(value: &'a [char]) -> Self { Utf32Str::from_char_slice(value) } } impl<'a> From<&'a mut [char]> for &'a mut Utf32Str { #[inline] fn from(value: &'a mut [char]) -> Self { Utf32Str::from_char_slice_mut(value) } } impl<'a> From<&'a Utf32Str> for &'a [char] { #[inline] fn from(value: &'a Utf32Str) -> Self { value.as_char_slice() } } impl<'a> From<&'a mut Utf32Str> for &'a mut [char] { #[inline] fn from(value: &'a mut Utf32Str) -> Self { value.as_char_slice_mut() } } impl<I> Index<I> for Utf16Str where I: RangeBounds<usize> + SliceIndex<[u16], Output = [u16]>, { type Output = Utf16Str; #[inline] fn index(&self, index: I) -> &Self::Output { self.get(index) .expect("index out of bounds or not on char boundary") } } impl<I> Index<I> for Utf32Str where I: SliceIndex<[u32], Output = [u32]>, { type Output = Utf32Str; #[inline] fn index(&self, index: I) -> &Self::Output { self.get(index).expect("index out of bounds") } } impl<I> IndexMut<I> for Utf16Str where I: RangeBounds<usize> + SliceIndex<[u16], Output = [u16]>, { #[inline] fn index_mut(&mut self, index: I) -> &mut Self::Output { self.get_mut(index) .expect("index out of bounds or not on char boundary") } } impl<I> IndexMut<I> for Utf32Str where I: SliceIndex<[u32], Output = [u32]>, { #[inline] fn index_mut(&mut self, index: I) -> &mut Self::Output { self.get_mut(index).expect("index out of bounds") } } impl PartialEq<[char]> for Utf32Str { #[inline] fn eq(&self, other: &[char]) -> bool { self.as_char_slice() == other } }
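// Illustrative sketch, not part of this crate: the `Index` impls above panic for `Utf16Str` when
// an index is out of bounds or not on a char boundary. The std-only helpers below show that
// boundary rule on a plain `&[u16]`, assuming the same logic as `is_char_boundary`. The names
// `is_low_surrogate`, `is_char_boundary` and `checked_subslice` are hypothetical stand-ins, and
// the input is assumed to already be valid UTF-16.

```rust
/// Hypothetical helper: `true` if `unit` is a UTF-16 low (trailing) surrogate.
fn is_low_surrogate(unit: u16) -> bool {
    (0xDC00..=0xDFFF).contains(&unit)
}

/// Hypothetical helper: an index is a boundary if it is the end of the slice
/// or does not point at the second half of a surrogate pair.
fn is_char_boundary(units: &[u16], index: usize) -> bool {
    if index > units.len() {
        false
    } else if index == units.len() {
        true
    } else {
        !is_low_surrogate(units[index])
    }
}

/// Hypothetical helper: checked sub-slicing that returns `None` where
/// indexing would panic. Assumes `units` is valid UTF-16.
fn checked_subslice(units: &[u16], start: usize, end: usize) -> Option<&[u16]> {
    if start > end || !is_char_boundary(units, start) || !is_char_boundary(units, end) {
        return None;
    }
    // Both ends are boundaries, so `end <= units.len()` and the range is in bounds.
    Some(&units[start..end])
}
```

// With the code units `[0x61, 0xD83D, 0xDC96, 0x62]`, `checked_subslice(&units, 1, 3)` yields the
// whole surrogate pair, while `checked_subslice(&units, 1, 2)` is `None` because index 2 is the
// low surrogate.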
impl PartialEq<Utf32Str> for [char] { #[inline] fn eq(&self, other: &Utf32Str) -> bool { self == other.as_char_slice() } } impl PartialEq<Utf16Str> for Utf32Str { #[inline] fn eq(&self, other: &Utf16Str) -> bool { self.chars().eq(other.chars()) } } impl PartialEq<Utf32Str> for Utf16Str { #[inline] fn eq(&self, other: &Utf32Str) -> bool { self.chars().eq(other.chars()) } } impl PartialEq<&Utf16Str> for Utf32Str { #[inline] fn eq(&self, other: &&Utf16Str) -> bool { self.chars().eq(other.chars()) } } impl PartialEq<&Utf32Str> for Utf16Str { #[inline] fn eq(&self, other: &&Utf32Str) -> bool { self.chars().eq(other.chars()) } } impl PartialEq<Utf16Str> for &Utf32Str { #[inline] fn eq(&self, other: &Utf16Str) -> bool { self.chars().eq(other.chars()) } } impl PartialEq<Utf32Str> for &Utf16Str { #[inline] fn eq(&self, other: &Utf32Str) -> bool { self.chars().eq(other.chars()) } } impl<'a> TryFrom<&'a [u16]> for &'a Utf16Str { type Error = Utf16Error; #[inline] fn try_from(value: &'a [u16]) -> Result<Self, Self::Error> { Utf16Str::from_slice(value) } } impl<'a> TryFrom<&'a mut [u16]> for &'a mut Utf16Str { type Error = Utf16Error; #[inline] fn try_from(value: &'a mut [u16]) -> Result<Self, Self::Error> { Utf16Str::from_slice_mut(value) } } impl<'a> TryFrom<&'a [u32]> for &'a Utf32Str { type Error = Utf32Error; #[inline] fn try_from(value: &'a [u32]) -> Result<Self, Self::Error> { Utf32Str::from_slice(value) } } impl<'a> TryFrom<&'a mut [u32]> for &'a mut Utf32Str { type Error = Utf32Error; #[inline] fn try_from(value: &'a mut [u32]) -> Result<Self, Self::Error> { Utf32Str::from_slice_mut(value) } } /// Alias for [`Utf16Str`] or [`Utf32Str`] depending on platform. Intended to match typical C /// `wchar_t` size on platform. #[cfg(not(windows))] pub type WideUtfStr = Utf32Str; /// Alias for [`Utf16Str`] or [`Utf32Str`] depending on platform. Intended to match typical C /// `wchar_t` size on platform.
#[cfg(windows)] pub type WideUtfStr = Utf16Str; #[cfg(test)] mod test { use crate::*; #[test] fn utf16_trim() { let s = utf16str!(" Hello\tworld\t"); assert_eq!(utf16str!("Hello\tworld\t"), s.trim_start()); let s = utf16str!(" English "); assert!(Some('E') == s.trim_start().chars().next()); let s = utf16str!(" עברית "); assert!(Some('ע') == s.trim_start().chars().next()); } #[test] fn utf32_trim() { let s = utf32str!(" Hello\tworld\t"); assert_eq!(utf32str!("Hello\tworld\t"), s.trim_start()); let s = utf32str!(" English "); assert!(Some('E') == s.trim_start().chars().next()); let s = utf32str!(" עברית "); assert!(Some('ע') == s.trim_start().chars().next()); } } widestring-1.1.0/src/utfstring/iter.rs000064400000000000000000000070301046102023000161270ustar 00000000000000use super::{Utf16String, Utf32String}; use crate::utfstr::{CharsUtf16, CharsUtf32}; #[allow(unused_imports)] use core::iter::{DoubleEndedIterator, ExactSizeIterator, FusedIterator, Iterator}; /// A draining iterator for [`Utf16String`]. /// /// This struct is created by the [`drain`][Utf16String::drain] method on [`Utf16String`]. See its /// documentation for more. pub struct DrainUtf16<'a> { pub(super) start: usize, pub(super) end: usize, pub(super) iter: CharsUtf16<'a>, pub(super) string: *mut Utf16String, } unsafe impl Sync for DrainUtf16<'_> {} unsafe impl Send for DrainUtf16<'_> {} impl Drop for DrainUtf16<'_> { fn drop(&mut self) { unsafe { // Use Vec::drain. "Reaffirm" the bounds checks to avoid // panic code being inserted again.
let self_vec = (*self.string).as_mut_vec(); if self.start <= self.end && self.end <= self_vec.len() { self_vec.drain(self.start..self.end); } } } } impl core::fmt::Debug for DrainUtf16<'_> { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { core::fmt::Debug::fmt(&self.iter, f) } } impl core::fmt::Display for DrainUtf16<'_> { #[inline] fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { core::fmt::Display::fmt(&self.iter, f) } } impl Iterator for DrainUtf16<'_> { type Item = char; #[inline] fn next(&mut self) -> Option<Self::Item> { self.iter.next() } #[inline] fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() } } impl DoubleEndedIterator for DrainUtf16<'_> { #[inline] fn next_back(&mut self) -> Option<Self::Item> { self.iter.next_back() } } impl FusedIterator for DrainUtf16<'_> {} /// A draining iterator for [`Utf32String`]. /// /// This struct is created by the [`drain`][Utf32String::drain] method on [`Utf32String`]. See its /// documentation for more. pub struct DrainUtf32<'a> { pub(super) start: usize, pub(super) end: usize, pub(super) iter: CharsUtf32<'a>, pub(super) string: *mut Utf32String, } unsafe impl Sync for DrainUtf32<'_> {} unsafe impl Send for DrainUtf32<'_> {} impl Drop for DrainUtf32<'_> { fn drop(&mut self) { unsafe { // Use Vec::drain. "Reaffirm" the bounds checks to avoid // panic code being inserted again.
            let self_vec = (*self.string).as_mut_vec();
            if self.start <= self.end && self.end <= self_vec.len() {
                self_vec.drain(self.start..self.end);
            }
        }
    }
}

impl core::fmt::Debug for DrainUtf32<'_> {
    #[inline]
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        core::fmt::Debug::fmt(&self.iter, f)
    }
}

impl core::fmt::Display for DrainUtf32<'_> {
    #[inline]
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        core::fmt::Display::fmt(&self.iter, f)
    }
}

impl Iterator for DrainUtf32<'_> {
    type Item = char;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        self.iter.next()
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.iter.size_hint()
    }
}

impl DoubleEndedIterator for DrainUtf32<'_> {
    #[inline]
    fn next_back(&mut self) -> Option<Self::Item> {
        self.iter.next_back()
    }
}

impl FusedIterator for DrainUtf32<'_> {}

impl ExactSizeIterator for DrainUtf32<'_> {
    #[inline]
    fn len(&self) -> usize {
        self.iter.len()
    }
}

// widestring-1.1.0/src/utfstring.rs

//! Owned, growable UTF strings.
//!
//! This module contains UTF strings and related types.

use crate::{
    decode_utf16_surrogate_pair,
    error::{Utf16Error, Utf32Error},
    is_utf16_low_surrogate, is_utf16_surrogate, validate_utf16, validate_utf16_vec, validate_utf32,
    validate_utf32_vec, Utf16Str, Utf32Str,
};
#[allow(unused_imports)]
use alloc::{
    borrow::{Cow, ToOwned},
    boxed::Box,
    string::String,
    vec::Vec,
};
#[allow(unused_imports)]
use core::{
    borrow::{Borrow, BorrowMut},
    convert::{AsMut, AsRef, From, Infallible, TryFrom},
    fmt::Write,
    iter::FromIterator,
    mem,
    ops::{Add, AddAssign, Deref, DerefMut, Index, IndexMut, RangeBounds},
    ptr,
    slice::SliceIndex,
    str::FromStr,
};

mod iter;

pub use iter::*;

macro_rules!
utfstring_common_impl {
    {
        $(#[$utfstring_meta:meta])*
        struct $utfstring:ident([$uchar:ty]);
        type UtfStr = $utfstr:ident;
        type UStr = $ustr:ident;
        type UCStr = $ucstr:ident;
        type UString = $ustring:ident;
        type UCString = $ucstring:ident;
        type UtfError = $utferror:ident;
        $(#[$from_vec_unchecked_meta:meta])*
        fn from_vec_unchecked() -> {}
        $(#[$from_str_meta:meta])*
        fn from_str() -> {}
        $(#[$push_utfstr_meta:meta])*
        fn push_utfstr() -> {}
        $(#[$as_mut_vec_meta:meta])*
        fn as_mut_vec() -> {}
    } => {
        $(#[$utfstring_meta])*
        #[derive(Clone, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
        #[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
        pub struct $utfstring {
            inner: Vec<$uchar>,
        }

        impl $utfstring {
            /// Creates a new empty string.
            ///
            /// Given that the string is empty, this will not allocate any initial buffer. While
            /// that means this initial operation is very inexpensive, it may cause excessive
            /// allocations later when you add data. If you have an idea of how much data the
            /// string will hold, consider [`with_capacity`][Self::with_capacity] instead to
            /// prevent excessive re-allocation.
            #[inline]
            #[must_use]
            pub const fn new() -> Self {
                Self { inner: Vec::new() }
            }

            /// Creates a new empty string with a particular capacity.
            ///
            /// This string has an internal buffer to hold its data. The capacity is the length of
            /// that buffer, and can be queried with the [`capacity`][Self::capacity] method. This
            /// method creates an empty string, but one with an initial buffer that can hold
            /// `capacity` elements. This is useful when you may be appending a bunch of data to
            /// the string, reducing the number of reallocations it needs to do.
            ///
            /// If the given capacity is `0`, no allocation will occur, and this method is identical
            /// to the [`new`][Self::new] method.
            #[inline]
            #[must_use]
            pub fn with_capacity(capacity: usize) -> Self {
                Self {
                    inner: Vec::with_capacity(capacity),
                }
            }

            $(#[$from_vec_unchecked_meta])*
            #[inline]
            #[must_use]
            pub unsafe fn from_vec_unchecked(v: impl Into<Vec<$uchar>>) -> Self {
                Self { inner: v.into() }
            }

            $(#[$from_str_meta])*
            #[inline]
            #[allow(clippy::should_implement_trait)]
            #[must_use]
            pub fn from_str<S: AsRef<str> + ?Sized>(s: &S) -> Self {
                let s = s.as_ref();
                let mut string = Self::new();
                string.extend(s.chars());
                string
            }

            /// Converts a string into a string slice.
            #[inline]
            #[must_use]
            pub fn as_utfstr(&self) -> &$utfstr {
                unsafe { $utfstr::from_slice_unchecked(self.inner.as_slice()) }
            }

            /// Converts a string into a mutable string slice.
            #[inline]
            #[must_use]
            pub fn as_mut_utfstr(&mut self) -> &mut $utfstr {
                unsafe { $utfstr::from_slice_unchecked_mut(&mut self.inner) }
            }

            /// Converts this string into a wide string of undefined encoding.
            #[inline]
            #[must_use]
            pub fn as_ustr(&self) -> &crate::$ustr {
                crate::$ustr::from_slice(self.as_slice())
            }

            /// Converts a string into a vector of its elements.
            ///
            /// This consumes the string without copying its contents.
            #[inline]
            #[must_use]
            pub fn into_vec(self) -> Vec<$uchar> {
                self.inner
            }

            $(#[$push_utfstr_meta])*
            #[inline]
            pub fn push_utfstr<S: AsRef<$utfstr> + ?Sized>(&mut self, string: &S) {
                self.inner.extend_from_slice(string.as_ref().as_slice())
            }

            /// Returns this string's capacity, in number of elements.
            #[inline]
            #[must_use]
            pub fn capacity(&self) -> usize {
                self.inner.capacity()
            }

            /// Ensures that this string's capacity is at least `additional` elements larger than
            /// its length.
            ///
            /// The capacity may be increased by more than `additional` elements if it chooses, to
            /// prevent frequent reallocations.
            ///
            /// If you do not want this "at least" behavior, see the
            /// [`reserve_exact`][Self::reserve_exact] method.
            ///
            /// # Panics
            ///
            /// Panics if the new capacity overflows [`usize`].
            #[inline]
            pub fn reserve(&mut self, additional: usize) {
                self.inner.reserve(additional)
            }

            /// Ensures that this string's capacity is `additional` elements larger than its length.
            ///
            /// Consider using the [`reserve`][Self::reserve] method unless you absolutely know
            /// better than the allocator.
            ///
            /// # Panics
            ///
            /// Panics if the new capacity overflows [`usize`].
            #[inline]
            pub fn reserve_exact(&mut self, additional: usize) {
                self.inner.reserve_exact(additional)
            }

            /// Shrinks the capacity of this string to match its length.
            #[inline]
            pub fn shrink_to_fit(&mut self) {
                self.inner.shrink_to_fit()
            }

            /// Shrinks the capacity of this string with a lower bound.
            ///
            /// The capacity will remain at least as large as both the length and the supplied
            /// value.
            ///
            /// If the current capacity is less than the lower limit, this is a no-op.
            #[inline]
            pub fn shrink_to(&mut self, min_capacity: usize) {
                self.inner.shrink_to(min_capacity)
            }

            /// Returns a slice of this string's contents.
            #[inline]
            #[must_use]
            pub fn as_slice(&self) -> &[$uchar] {
                self.inner.as_slice()
            }

            unsafe fn insert_slice(&mut self, idx: usize, slice: &[$uchar]) {
                let len = self.inner.len();
                let amt = slice.len();
                self.inner.reserve(amt);

                ptr::copy(
                    self.inner.as_ptr().add(idx),
                    self.inner.as_mut_ptr().add(idx + amt),
                    len - idx,
                );
                ptr::copy_nonoverlapping(slice.as_ptr(), self.inner.as_mut_ptr().add(idx), amt);
                self.inner.set_len(len + amt);
            }

            $(#[$as_mut_vec_meta])*
            #[inline]
            #[must_use]
            pub unsafe fn as_mut_vec(&mut self) -> &mut Vec<$uchar> {
                &mut self.inner
            }

            /// Returns the length of this string in number of elements, not [`char`]s or
            /// graphemes.
            ///
            /// In other words, it might not be what a human considers the length of the string.
            #[inline]
            #[must_use]
            pub fn len(&self) -> usize {
                self.inner.len()
            }

            /// Returns `true` if this string has a length of zero, and `false` otherwise.
            #[inline]
            #[must_use]
            pub fn is_empty(&self) -> bool {
                self.inner.is_empty()
            }

            /// Truncates the string, removing all contents.
            ///
            /// While this means the string will have a length of zero, it does not touch its
            /// capacity.
            #[inline]
            pub fn clear(&mut self) {
                self.inner.clear()
            }

            /// Converts this string into a boxed string slice.
            ///
            /// This will drop excess capacity.
            #[inline]
            #[must_use]
            pub fn into_boxed_utfstr(self) -> Box<$utfstr> {
                let slice = self.inner.into_boxed_slice();
                // SAFETY: The contents are already valid for this string's UTF encoding
                unsafe { $utfstr::from_boxed_slice_unchecked(slice) }
            }

            /// Appends a given UTF-8 string slice onto the end of this string, converting it to
            /// this string's UTF encoding.
            #[inline]
            pub fn push_str<S: AsRef<str> + ?Sized>(&mut self, string: &S) {
                self.extend(string.as_ref().chars())
            }
        }

        impl Add<&$utfstr> for $utfstring {
            type Output = $utfstring;

            #[inline]
            fn add(mut self, rhs: &$utfstr) -> Self::Output {
                self.push_utfstr(rhs);
                self
            }
        }

        impl Add<&str> for $utfstring {
            type Output = $utfstring;

            #[inline]
            fn add(mut self, rhs: &str) -> Self::Output {
                self.push_str(rhs);
                self
            }
        }

        impl AddAssign<&$utfstr> for $utfstring {
            #[inline]
            fn add_assign(&mut self, rhs: &$utfstr) {
                self.push_utfstr(rhs)
            }
        }

        impl AddAssign<&str> for $utfstring {
            #[inline]
            fn add_assign(&mut self, rhs: &str) {
                self.push_str(rhs)
            }
        }

        impl AsMut<$utfstr> for $utfstring {
            #[inline]
            fn as_mut(&mut self) -> &mut $utfstr {
                self.as_mut_utfstr()
            }
        }

        impl AsRef<$utfstr> for $utfstring {
            #[inline]
            fn as_ref(&self) -> &$utfstr {
                self.as_utfstr()
            }
        }

        impl AsRef<[$uchar]> for $utfstring {
            #[inline]
            fn as_ref(&self) -> &[$uchar] {
                &self.inner
            }
        }

        impl AsRef<crate::$ustr> for $utfstring {
            #[inline]
            fn as_ref(&self) -> &crate::$ustr {
                self.as_ustr()
            }
        }

        impl Borrow<$utfstr> for $utfstring {
            #[inline]
            fn borrow(&self) -> &$utfstr {
                self.as_utfstr()
            }
        }

        impl BorrowMut<$utfstr> for $utfstring {
            #[inline]
            fn borrow_mut(&mut self) -> &mut $utfstr {
                self.as_mut_utfstr()
            }
        }

        impl core::fmt::Debug for $utfstring {
            #[inline]
            fn fmt(&self, f: &mut
                core::fmt::Formatter<'_>) -> core::fmt::Result {
                core::fmt::Debug::fmt(self.as_utfstr(), f)
            }
        }

        impl Deref for $utfstring {
            type Target = $utfstr;

            #[inline]
            fn deref(&self) -> &Self::Target {
                self.as_utfstr()
            }
        }

        impl DerefMut for $utfstring {
            #[inline]
            fn deref_mut(&mut self) -> &mut Self::Target {
                self.as_mut_utfstr()
            }
        }

        impl core::fmt::Display for $utfstring {
            #[inline]
            fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
                core::fmt::Display::fmt(self.as_utfstr(), f)
            }
        }

        impl Extend<char> for $utfstring {
            #[inline]
            fn extend<T: IntoIterator<Item = char>>(&mut self, iter: T) {
                let iter = iter.into_iter();
                let (lower_bound, _) = iter.size_hint();
                self.reserve(lower_bound);
                iter.for_each(|c| self.push(c));
            }
        }

        impl<'a> Extend<&'a char> for $utfstring {
            #[inline]
            fn extend<T: IntoIterator<Item = &'a char>>(&mut self, iter: T) {
                self.extend(iter.into_iter().copied())
            }
        }

        impl<'a> Extend<&'a $utfstr> for $utfstring {
            #[inline]
            fn extend<T: IntoIterator<Item = &'a $utfstr>>(&mut self, iter: T) {
                iter.into_iter().for_each(|s| self.push_utfstr(s))
            }
        }

        impl Extend<$utfstring> for $utfstring {
            #[inline]
            fn extend<T: IntoIterator<Item = $utfstring>>(&mut self, iter: T) {
                iter.into_iter()
                    .for_each(|s| self.push_utfstr(&s))
            }
        }

        impl<'a> Extend<Cow<'a, $utfstr>> for $utfstring {
            #[inline]
            fn extend<T: IntoIterator<Item = Cow<'a, $utfstr>>>(&mut self, iter: T) {
                iter.into_iter().for_each(|s| self.push_utfstr(&s))
            }
        }

        impl Extend<Box<$utfstr>> for $utfstring {
            #[inline]
            fn extend<T: IntoIterator<Item = Box<$utfstr>>>(&mut self, iter: T) {
                iter.into_iter().for_each(|s| self.push_utfstr(&s))
            }
        }

        impl<'a> Extend<&'a str> for $utfstring {
            #[inline]
            fn extend<T: IntoIterator<Item = &'a str>>(&mut self, iter: T) {
                iter.into_iter().for_each(|s| self.push_str(s))
            }
        }

        impl Extend<String> for $utfstring {
            #[inline]
            fn extend<T: IntoIterator<Item = String>>(&mut self, iter: T) {
                iter.into_iter().for_each(|s| self.push_str(&s))
            }
        }

        impl From<&mut $utfstr> for $utfstring {
            #[inline]
            fn from(value: &mut $utfstr) -> Self {
                value.to_owned()
            }
        }

        impl From<&$utfstr> for $utfstring {
            #[inline]
            fn from(value: &$utfstr) -> Self {
                value.to_owned()
            }
        }

        impl From<&$utfstring> for $utfstring {
            #[inline]
            fn from(value: &$utfstring) -> Self {
                value.clone()
            }
        }

        impl From<$utfstring> for Cow<'_,
            $utfstr> {
            #[inline]
            fn from(value: $utfstring) -> Self {
                Cow::Owned(value)
            }
        }

        impl<'a> From<&'a $utfstring> for Cow<'a, $utfstr> {
            #[inline]
            fn from(value: &'a $utfstring) -> Self {
                Cow::Borrowed(value)
            }
        }

        impl From<Cow<'_, $utfstr>> for $utfstring {
            #[inline]
            fn from(value: Cow<'_, $utfstr>) -> Self {
                value.into_owned()
            }
        }

        impl From<&str> for $utfstring {
            #[inline]
            fn from(value: &str) -> Self {
                Self::from_str(value)
            }
        }

        impl From<String> for $utfstring {
            #[inline]
            fn from(value: String) -> Self {
                Self::from_str(&value)
            }
        }

        impl From<$utfstring> for crate::$ustring {
            #[inline]
            fn from(value: $utfstring) -> Self {
                crate::$ustring::from_vec(value.into_vec())
            }
        }

        impl From<&$utfstr> for String {
            #[inline]
            fn from(value: &$utfstr) -> Self {
                value.to_string()
            }
        }

        impl From<$utfstring> for String {
            #[inline]
            fn from(value: $utfstring) -> Self {
                value.to_string()
            }
        }

        #[cfg(feature = "std")]
        impl From<$utfstring> for std::ffi::OsString {
            #[inline]
            fn from(value: $utfstring) -> std::ffi::OsString {
                value.as_ustr().to_os_string()
            }
        }

        impl FromIterator<char> for $utfstring {
            #[inline]
            fn from_iter<T: IntoIterator<Item = char>>(iter: T) -> Self {
                let mut s = Self::new();
                s.extend(iter);
                s
            }
        }

        impl<'a> FromIterator<&'a char> for $utfstring {
            #[inline]
            fn from_iter<T: IntoIterator<Item = &'a char>>(iter: T) -> Self {
                let mut s = Self::new();
                s.extend(iter);
                s
            }
        }

        impl<'a> FromIterator<&'a $utfstr> for $utfstring {
            #[inline]
            fn from_iter<T: IntoIterator<Item = &'a $utfstr>>(iter: T) -> Self {
                let mut s = Self::new();
                s.extend(iter);
                s
            }
        }

        impl FromIterator<$utfstring> for $utfstring {
            fn from_iter<T: IntoIterator<Item = $utfstring>>(iter: T) -> Self {
                let mut iterator = iter.into_iter();

                // Because we're iterating over `String`s, we can avoid at least
                // one allocation by getting the first string from the iterator
                // and appending to it all the subsequent strings.
                match iterator.next() {
                    None => Self::new(),
                    Some(mut buf) => {
                        buf.extend(iterator);
                        buf
                    }
                }
            }
        }

        impl FromIterator<Box<$utfstr>> for $utfstring {
            #[inline]
            fn from_iter<T: IntoIterator<Item = Box<$utfstr>>>(iter: T) -> Self {
                let mut s = Self::new();
                s.extend(iter);
                s
            }
        }

        impl<'a> FromIterator<Cow<'a, $utfstr>> for $utfstring {
            #[inline]
            fn from_iter<T: IntoIterator<Item = Cow<'a, $utfstr>>>(iter: T) -> Self {
                let mut s = Self::new();
                s.extend(iter);
                s
            }
        }

        impl<'a> FromIterator<&'a str> for $utfstring {
            #[inline]
            fn from_iter<T: IntoIterator<Item = &'a str>>(iter: T) -> Self {
                let mut s = Self::new();
                s.extend(iter);
                s
            }
        }

        impl FromIterator<String> for $utfstring {
            #[inline]
            fn from_iter<T: IntoIterator<Item = String>>(iter: T) -> Self {
                let mut s = Self::new();
                s.extend(iter);
                s
            }
        }

        impl FromStr for $utfstring {
            type Err = Infallible;

            #[inline]
            fn from_str(s: &str) -> Result<Self, Self::Err> {
                Ok($utfstring::from_str(s))
            }
        }

        impl<I> Index<I> for $utfstring
        where
            I: RangeBounds<usize> + SliceIndex<[$uchar], Output = [$uchar]>,
        {
            type Output = $utfstr;

            #[inline]
            fn index(&self, index: I) -> &Self::Output {
                &self.deref()[index]
            }
        }

        impl<I> IndexMut<I> for $utfstring
        where
            I: RangeBounds<usize> + SliceIndex<[$uchar], Output = [$uchar]>,
        {
            #[inline]
            fn index_mut(&mut self, index: I) -> &mut Self::Output {
                &mut self.deref_mut()[index]
            }
        }

        impl PartialEq<$utfstr> for $utfstring {
            #[inline]
            fn eq(&self, other: &$utfstr) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<&$utfstr> for $utfstring {
            #[inline]
            fn eq(&self, other: &&$utfstr) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<Cow<'_, $utfstr>> for $utfstring {
            #[inline]
            fn eq(&self, other: &Cow<'_, $utfstr>) -> bool {
                self == other.as_ref()
            }
        }

        impl PartialEq<$utfstring> for Cow<'_, $utfstr> {
            #[inline]
            fn eq(&self, other: &$utfstring) -> bool {
                self.as_ref() == other
            }
        }

        impl PartialEq<$utfstring> for $utfstr {
            #[inline]
            fn eq(&self, other: &$utfstring) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<$utfstring> for &$utfstr {
            #[inline]
            fn eq(&self, other: &$utfstring) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<str> for $utfstring {
            #[inline]
            fn eq(&self, other: &str) -> bool {
                self.chars().eq(other.chars())
            }
        }

        impl PartialEq<&str> for $utfstring {
            #[inline]
            fn eq(&self, other: &&str) -> bool {
                self.chars().eq(other.chars())
            }
        }

        impl PartialEq<$utfstring> for str {
            #[inline]
            fn eq(&self, other: &$utfstring) -> bool {
                self.chars().eq(other.chars())
            }
        }

        impl PartialEq<$utfstring> for &str {
            #[inline]
            fn eq(&self, other: &$utfstring) -> bool {
                self.chars().eq(other.chars())
            }
        }

        impl PartialEq<String> for $utfstring {
            #[inline]
            fn eq(&self, other: &String) -> bool {
                self.chars().eq(other.chars())
            }
        }

        impl PartialEq<$utfstring> for String {
            #[inline]
            fn eq(&self, other: &$utfstring) -> bool {
                self.chars().eq(other.chars())
            }
        }

        impl PartialEq<String> for $utfstr {
            #[inline]
            fn eq(&self, other: &String) -> bool {
                self.chars().eq(other.chars())
            }
        }

        impl PartialEq<$utfstr> for String {
            #[inline]
            fn eq(&self, other: &$utfstr) -> bool {
                self.chars().eq(other.chars())
            }
        }

        impl PartialEq<Cow<'_, str>> for $utfstring {
            #[inline]
            fn eq(&self, other: &Cow<'_, str>) -> bool {
                self == other.as_ref()
            }
        }

        impl PartialEq<$utfstring> for Cow<'_, str> {
            #[inline]
            fn eq(&self, other: &$utfstring) -> bool {
                self.as_ref() == other
            }
        }

        impl PartialEq<crate::$ustr> for $utfstring {
            #[inline]
            fn eq(&self, other: &crate::$ustr) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<$utfstring> for crate::$ustr {
            #[inline]
            fn eq(&self, other: &$utfstring) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<crate::$ustring> for $utfstring {
            #[inline]
            fn eq(&self, other: &crate::$ustring) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<$utfstring> for crate::$ustring {
            #[inline]
            fn eq(&self, other: &$utfstring) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<crate::$ustring> for $utfstr {
            #[inline]
            fn eq(&self, other: &crate::$ustring) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<$utfstr> for crate::$ustring {
            #[inline]
            fn eq(&self, other: &$utfstr) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<crate::$ucstr> for $utfstring {
            #[inline]
            fn eq(&self, other:
                &crate::$ucstr) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<$utfstring> for crate::$ucstr {
            #[inline]
            fn eq(&self, other: &$utfstring) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<crate::$ucstring> for $utfstring {
            #[inline]
            fn eq(&self, other: &crate::$ucstring) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<$utfstring> for crate::$ucstring {
            #[inline]
            fn eq(&self, other: &$utfstring) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<crate::$ucstring> for $utfstr {
            #[inline]
            fn eq(&self, other: &crate::$ucstring) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl PartialEq<$utfstr> for crate::$ucstring {
            #[inline]
            fn eq(&self, other: &$utfstr) -> bool {
                self.as_slice() == other.as_slice()
            }
        }

        impl ToOwned for $utfstr {
            type Owned = $utfstring;

            #[inline]
            fn to_owned(&self) -> Self::Owned {
                unsafe { $utfstring::from_vec_unchecked(&self.inner) }
            }
        }

        impl TryFrom<crate::$ustring> for $utfstring {
            type Error = $utferror;

            #[inline]
            fn try_from(value: crate::$ustring) -> Result<Self, Self::Error> {
                $utfstring::from_ustring(value)
            }
        }

        impl TryFrom<crate::$ucstring> for $utfstring {
            type Error = $utferror;

            #[inline]
            fn try_from(value: crate::$ucstring) -> Result<Self, Self::Error> {
                $utfstring::from_ustring(value)
            }
        }

        impl TryFrom<&crate::$ustr> for $utfstring {
            type Error = $utferror;

            #[inline]
            fn try_from(value: &crate::$ustr) -> Result<Self, Self::Error> {
                $utfstring::from_ustring(value)
            }
        }

        impl TryFrom<&crate::$ucstr> for $utfstring {
            type Error = $utferror;

            #[inline]
            fn try_from(value: &crate::$ucstr) -> Result<Self, Self::Error> {
                $utfstring::from_ustring(value)
            }
        }

        impl Write for $utfstring {
            #[inline]
            fn write_str(&mut self, s: &str) -> core::fmt::Result {
                self.push_str(s);
                Ok(())
            }

            #[inline]
            fn write_char(&mut self, c: char) -> core::fmt::Result {
                self.push(c);
                Ok(())
            }
        }
    };
}

utfstring_common_impl! {
    /// A UTF-16 encoded, growable owned string.
    ///
    /// [`Utf16String`] is a version of [`String`] that uses UTF-16 encoding instead of UTF-8
    /// encoding. The equivalent of [`str`] for [`Utf16String`] is [`Utf16Str`].
    ///
    /// Unlike [`U16String`][crate::U16String], which does not specify an encoding, [`Utf16String`]
    /// is always valid UTF-16 encoding. Using unsafe methods to construct a [`Utf16String`] with
    /// invalid UTF-16 encoding results in undefined behavior.
    ///
    /// # UTF-16
    ///
    /// [`Utf16String`] is always UTF-16. This means if you need non-UTF-16 wide strings, you should
    /// use [`U16String`][crate::U16String] instead. It is similar, but does not constrain the
    /// encoding.
    ///
    /// This also means you cannot directly index a single element of the string, as UTF-16 encoding
    /// may be a single `u16` value or a pair of `u16` surrogates. Instead, you can index subslices
    /// of the string, or use the [`chars`][Utf16Str::chars] iterator.
    ///
    /// # Examples
    ///
    /// The easiest way to use [`Utf16String`] is with the [`utf16str!`][crate::utf16str] macro to
    /// convert string literals into UTF-16 string slices at compile time:
    ///
    /// ```
    /// use widestring::{Utf16String, utf16str};
    /// let hello = Utf16String::from(utf16str!("Hello, world!"));
    /// ```
    ///
    /// Because this string is always valid UTF-16, it is a non-fallible, lossless conversion to and
    /// from standard Rust strings:
    ///
    /// ```
    /// use widestring::Utf16String;
    /// // Unlike the utf16str macro, this will do conversion at runtime instead of compile time
    /// let hello = Utf16String::from_str("Hello, world!");
    /// let hello_string: String = hello.to_string();
    /// assert_eq!(hello, hello_string); // Can easily compare between string types
    /// ```
    struct Utf16String([u16]);

    type UtfStr = Utf16Str;
    type UStr = U16Str;
    type UCStr = U16CStr;
    type UString = U16String;
    type UCString = U16CString;
    type UtfError = Utf16Error;

    /// Converts a [`u16`] vector to a string without checking that the string contains valid
    /// UTF-16.
    ///
    /// See the safe version, [`from_vec`][Self::from_vec], for more information.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the vector passed to it is valid
    /// UTF-16. If this constraint is violated, undefined behavior results as it is assumed the
    /// [`Utf16String`] is always valid UTF-16.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf16String;
    ///
    /// let sparkle_heart = vec![0xd83d, 0xdc96]; // Raw surrogate pair
    /// let sparkle_heart = unsafe { Utf16String::from_vec_unchecked(sparkle_heart) };
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    fn from_vec_unchecked() -> {}

    /// Re-encodes a UTF-8--encoded string slice into a UTF-16--encoded string.
    ///
    /// This operation is lossless and infallible, but requires a memory allocation.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf16str;
    /// use widestring::Utf16String;
    /// let music = Utf16String::from_str("𝄞music");
    /// assert_eq!(utf16str!("𝄞music"), music);
    /// ```
    fn from_str() -> {}

    /// Appends a given string slice onto the end of this string.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf16str;
    /// use widestring::Utf16String;
    /// let mut s = Utf16String::from_str("foo");
    /// s.push_utfstr(utf16str!("bar"));
    /// assert_eq!(utf16str!("foobar"), s);
    /// ```
    fn push_utfstr() -> {}

    /// Returns a mutable reference to the contents of this string.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the values in the vector are valid
    /// UTF-16. If this constraint is violated, it may cause undefined behavior with future
    /// users of the string, as it is assumed that this string is always valid UTF-16.
    fn as_mut_vec() -> {}
}

utfstring_common_impl! {
    /// A UTF-32 encoded, growable owned string.
    ///
    /// [`Utf32String`] is a version of [`String`] that uses UTF-32 encoding instead of UTF-8
    /// encoding. The equivalent of [`str`] for [`Utf32String`] is [`Utf32Str`].
    ///
    /// Unlike [`U32String`][crate::U32String], which does not specify an encoding, [`Utf32String`]
    /// is always valid UTF-32 encoding.
    /// Using unsafe methods to construct a [`Utf32String`] with
    /// invalid UTF-32 encoding results in undefined behavior.
    ///
    /// # UTF-32
    ///
    /// [`Utf32String`] is always UTF-32. This means if you need non-UTF-32 wide strings, you should
    /// use [`U32String`][crate::U32String] instead. It is similar, but does not constrain the
    /// encoding.
    ///
    /// Unlike UTF-16 or UTF-8 strings, you may index single elements of UTF-32 strings in addition
    /// to subslicing. This is due to it being a fixed-length encoding for [`char`]s. This also
    /// means that [`Utf32String`] is the same representation as a `Vec<char>`; indeed conversions
    /// between the two exist and are simple typecasts.
    ///
    /// # Examples
    ///
    /// The easiest way to use [`Utf32String`] is with the [`utf32str!`][crate::utf32str] macro to
    /// convert string literals into UTF-32 string slices at compile time:
    ///
    /// ```
    /// use widestring::{Utf32String, utf32str};
    /// let hello = Utf32String::from(utf32str!("Hello, world!"));
    /// ```
    ///
    /// Because this string is always valid UTF-32, it is a non-fallible, lossless conversion to and
    /// from standard Rust strings:
    ///
    /// ```
    /// use widestring::Utf32String;
    /// // Unlike the utf32str macro, this will do conversion at runtime instead of compile time
    /// let hello = Utf32String::from_str("Hello, world!");
    /// let hello_string: String = hello.to_string();
    /// assert_eq!(hello, hello_string); // Can easily compare between string types
    /// ```
    struct Utf32String([u32]);

    type UtfStr = Utf32Str;
    type UStr = U32Str;
    type UCStr = U32CStr;
    type UString = U32String;
    type UCString = U32CString;
    type UtfError = Utf32Error;

    /// Converts a [`u32`] vector to a string without checking that the string contains valid
    /// UTF-32.
    ///
    /// See the safe version, [`from_vec`][Self::from_vec], for more information.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the vector passed to it is valid
    /// UTF-32.
    /// If this constraint is violated, undefined behavior results as it is assumed the
    /// [`Utf32String`] is always valid UTF-32.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf32String;
    ///
    /// let sparkle_heart = vec![0x1f496];
    /// let sparkle_heart = unsafe { Utf32String::from_vec_unchecked(sparkle_heart) };
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    fn from_vec_unchecked() -> {}

    /// Re-encodes a UTF-8--encoded string slice into a UTF-32--encoded string.
    ///
    /// This operation is lossless and infallible, but requires a memory allocation.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf32str;
    /// use widestring::Utf32String;
    /// let music = Utf32String::from_str("𝄞music");
    /// assert_eq!(utf32str!("𝄞music"), music);
    /// ```
    fn from_str() -> {}

    /// Appends a given string slice onto the end of this string.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf32str;
    /// use widestring::Utf32String;
    /// let mut s = Utf32String::from_str("foo");
    /// s.push_utfstr(utf32str!("bar"));
    /// assert_eq!(utf32str!("foobar"), s);
    /// ```
    fn push_utfstr() -> {}

    /// Returns a mutable reference to the contents of this string.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the values in the vector are valid
    /// UTF-32. If this constraint is violated, it may cause undefined behavior with future
    /// users of the string, as it is assumed that this string is always valid UTF-32.
    fn as_mut_vec() -> {}
}

impl Utf16String {
    /// Converts a [`u16`] vector of UTF-16 data to a string.
    ///
    /// Not all slices of [`u16`] values are valid to convert, since [`Utf16String`] requires that
    /// it is always valid UTF-16. This function checks to ensure that the values are valid UTF-16,
    /// and then does the conversion. This does not do any copying.
/// /// If you are sure that the slice is valid UTF-16, and you don't want to incur the overhead of /// the validity check, there is an unsafe version of this function, /// [`from_vec_unchecked`][Self::from_vec_unchecked], which has the same behavior but skips /// the check. /// /// If you need a string slice, consider using [`Utf16Str::from_slice`] instead. /// /// The inverse of this method is [`into_vec`][Self::into_vec]. /// /// # Errors /// /// Returns an error if the vector is not UTF-16 with a description as to why the provided /// vector is not UTF-16. The error will contain the original [`Vec`] that can be reclaimed with /// [`into_vec`][Utf16Error::into_vec]. /// /// # Examples /// /// ``` /// use widestring::Utf16String; /// /// let sparkle_heart = vec![0xd83d, 0xdc96]; // Raw surrogate pair /// let sparkle_heart = Utf16String::from_vec(sparkle_heart).unwrap(); /// /// assert_eq!("๐Ÿ’–", sparkle_heart); /// ``` /// /// With incorrect values that return an error: /// /// ``` /// use widestring::Utf16String; /// /// let sparkle_heart = vec![0xd83d, 0x0]; // This is an invalid unpaired surrogate /// /// assert!(Utf16String::from_vec(sparkle_heart).is_err()); /// ``` pub fn from_vec(v: impl Into>) -> Result { let v = validate_utf16_vec(v.into())?; Ok(unsafe { Self::from_vec_unchecked(v) }) } /// Converts a slice of [`u16`] data to a string, including invalid characters. /// /// Since the given [`u16`] slice may not be valid UTF-16, and [`Utf16String`] requires that /// it is always valid UTF-16, during the conversion this function replaces any invalid UTF-16 /// sequences with [`U+FFFD REPLACEMENT CHARACTER`][core::char::REPLACEMENT_CHARACTER], which /// looks like this: ๏ฟฝ /// /// If you are sure that the slice is valid UTF-16, and you don't want to incur the overhead of /// the conversion, there is an unsafe version of this function, /// [`from_vec_unchecked`][Self::from_vec_unchecked], which has the same behavior but skips /// the checks. 
    ///
    /// This function returns a [`Cow<'_, Utf16Str>`][std::borrow::Cow]. If the given slice is
    /// invalid UTF-16, then we need to insert our replacement characters which will change the size
    /// of the string, and hence, require an owned [`Utf16String`]. But if it's already valid
    /// UTF-16, we don't need a new allocation. This return type allows us to handle both cases.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf16str;
    /// use widestring::Utf16String;
    ///
    /// let sparkle_heart = vec![0xd83d, 0xdc96]; // Raw surrogate pair
    /// let sparkle_heart = Utf16String::from_slice_lossy(&sparkle_heart);
    ///
    /// assert_eq!(utf16str!("💖"), sparkle_heart);
    /// ```
    ///
    /// With invalid values, which are replaced rather than rejected:
    ///
    /// ```
    /// # use widestring::utf16str;
    /// use widestring::Utf16String;
    ///
    /// let sparkle_heart = vec![0xd83d, 0x0]; // This is an invalid unpaired surrogate
    /// let sparkle_heart = Utf16String::from_slice_lossy(&sparkle_heart);
    ///
    /// assert_eq!(utf16str!("\u{fffd}\u{0000}"), sparkle_heart);
    /// ```
    #[must_use]
    pub fn from_slice_lossy(s: &[u16]) -> Cow<'_, Utf16Str> {
        match validate_utf16(s) {
            // SAFETY: validated as UTF-16
            Ok(()) => Cow::Borrowed(unsafe { Utf16Str::from_slice_unchecked(s) }),
            Err(e) => {
                let mut v = Vec::with_capacity(s.len());
                // Valid up until index
                v.extend_from_slice(&s[..e.index()]);
                let mut index = e.index();
                let mut replacement_char = [0; 2];
                let replacement_char =
                    char::REPLACEMENT_CHARACTER.encode_utf16(&mut replacement_char);
                while index < s.len() {
                    let u = s[index];
                    if is_utf16_surrogate(u) {
                        if is_utf16_low_surrogate(u) || index + 1 >= s.len() {
                            v.extend_from_slice(replacement_char);
                        } else {
                            let low = s[index + 1];
                            if is_utf16_low_surrogate(low) {
                                // Valid surrogate pair
                                v.push(u);
                                v.push(low);
                                index += 1;
                            } else {
                                v.extend_from_slice(replacement_char);
                            }
                        }
                    } else {
                        v.push(u);
                    }
                    index += 1;
                }
                // SAFETY: Is now valid UTF-16 with replacement chars
                Cow::Owned(unsafe { Self::from_vec_unchecked(v) })
            }
        }
    }

    /// Converts a wide string of undefined encoding to a UTF-16 string without checking that the
    /// string contains valid UTF-16.
    ///
    /// See the safe version, [`from_ustring`][Self::from_ustring], for more information.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the string passed to it is valid
    /// UTF-16. If this constraint is violated, undefined behavior results as it is assumed the
    /// [`Utf16String`] is always valid UTF-16.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::{U16String, Utf16String};
    ///
    /// let sparkle_heart = vec![0xd83d, 0xdc96]; // Raw surrogate pair
    /// let sparkle_heart = U16String::from_vec(sparkle_heart);
    /// let sparkle_heart = unsafe { Utf16String::from_ustring_unchecked(sparkle_heart) };
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    #[inline]
    #[must_use]
    pub unsafe fn from_ustring_unchecked(s: impl Into<crate::U16String>) -> Self {
        Self::from_vec_unchecked(s.into().into_vec())
    }

    /// Converts a wide string of undefined encoding into a UTF-16 string.
    ///
    /// Not all strings with undefined encoding are valid to convert, since [`Utf16String`] requires
    /// that it is always valid UTF-16. This function checks to ensure that the string is valid
    /// UTF-16, and then does the conversion. This does not do any copying.
    ///
    /// If you are sure that the string is valid UTF-16, and you don't want to incur the overhead of
    /// the validity check, there is an unsafe version of this function,
    /// [`from_ustring_unchecked`][Self::from_ustring_unchecked], which has the same behavior but
    /// skips the check.
    ///
    /// If you need a string slice, consider using [`Utf16Str::from_ustr`] instead.
    ///
    /// # Errors
    ///
    /// Returns an error if the string is not UTF-16 with a description as to why the provided
    /// string is not UTF-16.
/// /// # Examples /// /// ``` /// use widestring::{U16String, Utf16String}; /// /// let sparkle_heart = vec![0xd83d, 0xdc96]; // Raw surrogate pair /// let sparkle_heart = U16String::from_vec(sparkle_heart); /// let sparkle_heart = Utf16String::from_ustring(sparkle_heart).unwrap(); /// /// assert_eq!("๐Ÿ’–", sparkle_heart); /// ``` /// /// With incorrect values that return an error: /// /// ``` /// use widestring::{U16String, Utf16String}; /// /// let sparkle_heart = vec![0xd83d, 0x0]; // This is an invalid unpaired surrogate /// let sparkle_heart = U16String::from_vec(sparkle_heart); // Valid for a U16String /// /// assert!(Utf16String::from_ustring(sparkle_heart).is_err()); // But not for a Utf16String /// ``` #[inline] pub fn from_ustring(s: impl Into) -> Result { Self::from_vec(s.into().into_vec()) } /// Converts a wide string slice of undefined encoding of to a UTF-16 string, including invalid /// characters. /// /// Since the given string slice may not be valid UTF-16, and [`Utf16String`] requires that /// it is always valid UTF-16, during the conversion this function replaces any invalid UTF-16 /// sequences with [`U+FFFD REPLACEMENT CHARACTER`][core::char::REPLACEMENT_CHARACTER], which /// looks like this: ๏ฟฝ /// /// If you are sure that the slice is valid UTF-16, and you don't want to incur the overhead of /// the conversion, there is an unsafe version of this function, /// [`from_ustring_unchecked`][Self::from_ustring_unchecked], which has the same behavior but /// skips the checks. /// /// This function returns a [`Cow<'_, Utf16Str>`][std::borrow::Cow]. If the given slice is /// invalid UTF-16, then we need to insert our replacement characters which will change the size /// of the string, and hence, require an owned [`Utf16String`]. But if it's already valid /// UTF-16, we don't need a new allocation. This return type allows us to handle both cases. 
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf16str;
    /// use widestring::{U16Str, Utf16String};
    ///
    /// let sparkle_heart = vec![0xd83d, 0xdc96]; // Raw surrogate pair
    /// let sparkle_heart = U16Str::from_slice(&sparkle_heart);
    /// let sparkle_heart = Utf16String::from_ustr_lossy(sparkle_heart);
    ///
    /// assert_eq!(utf16str!("💖"), sparkle_heart);
    /// ```
    ///
    /// With invalid values, which are replaced with U+FFFD:
    ///
    /// ```
    /// # use widestring::utf16str;
    /// use widestring::{U16Str, Utf16String};
    ///
    /// let sparkle_heart = vec![0xd83d, 0x0]; // This is an invalid unpaired surrogate
    /// let sparkle_heart = U16Str::from_slice(&sparkle_heart);
    /// let sparkle_heart = Utf16String::from_ustr_lossy(sparkle_heart);
    ///
    /// assert_eq!(utf16str!("\u{fffd}\u{0000}"), sparkle_heart);
    /// ```
    #[inline]
    #[must_use]
    pub fn from_ustr_lossy(s: &crate::U16Str) -> Cow<'_, Utf16Str> {
        Self::from_slice_lossy(s.as_slice())
    }

    /// Converts a wide C string to a UTF-16 string without checking that the string contains
    /// valid UTF-16.
    ///
    /// The resulting string does *not* contain the nul terminator.
    ///
    /// See the safe version, [`from_ucstring`][Self::from_ucstring], for more information.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the string passed to it is valid
    /// UTF-16. If this constraint is violated, undefined behavior results as it is assumed the
    /// [`Utf16String`] is always valid UTF-16.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::{U16CString, Utf16String};
    ///
    /// let sparkle_heart = vec![0xd83d, 0xdc96]; // Raw surrogate pair
    /// let sparkle_heart = U16CString::from_vec(sparkle_heart).unwrap();
    /// let sparkle_heart = unsafe { Utf16String::from_ucstring_unchecked(sparkle_heart) };
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    #[inline]
    #[must_use]
    pub unsafe fn from_ucstring_unchecked(s: impl Into<crate::U16CString>) -> Self {
        Self::from_vec_unchecked(s.into().into_vec())
    }

    /// Converts a wide C string into a UTF-16 string.
    ///
    /// The resulting string does *not* contain the nul terminator.
    ///
    /// Not all wide C strings are valid to convert, since [`Utf16String`] requires that
    /// it is always valid UTF-16. This function checks to ensure that the string is valid UTF-16,
    /// and then does the conversion. This does not do any copying.
    ///
    /// If you are sure that the string is valid UTF-16, and you don't want to incur the overhead of
    /// the validity check, there is an unsafe version of this function,
    /// [`from_ucstring_unchecked`][Self::from_ucstring_unchecked], which has the same behavior but
    /// skips the check.
    ///
    /// If you need a string slice, consider using [`Utf16Str::from_ucstr`] instead.
    ///
    /// # Errors
    ///
    /// Returns an error if the string is not UTF-16 with a description as to why the provided
    /// string is not UTF-16.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::{U16CString, Utf16String};
    ///
    /// let sparkle_heart = vec![0xd83d, 0xdc96]; // Raw surrogate pair
    /// let sparkle_heart = U16CString::from_vec(sparkle_heart).unwrap();
    /// let sparkle_heart = Utf16String::from_ucstring(sparkle_heart).unwrap();
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    ///
    /// With incorrect values that return an error:
    ///
    /// ```
    /// use widestring::{U16CString, Utf16String};
    ///
    /// let sparkle_heart = vec![0xd83d]; // This is an invalid unpaired surrogate
    /// let sparkle_heart = U16CString::from_vec(sparkle_heart).unwrap(); // Valid for a U16CString
    ///
    /// assert!(Utf16String::from_ucstring(sparkle_heart).is_err()); // But not for a Utf16String
    /// ```
    #[inline]
    pub fn from_ucstring(s: impl Into<crate::U16CString>) -> Result<Self, Utf16Error> {
        Self::from_vec(s.into().into_vec())
    }

    /// Converts a wide C string slice to a UTF-16 string, including invalid characters.
    ///
    /// The resulting string does *not* contain the nul terminator.
/// /// Since the given string slice may not be valid UTF-16, and [`Utf16String`] requires that /// it is always valid UTF-16, during the conversion this function replaces any invalid UTF-16 /// sequences with [`U+FFFD REPLACEMENT CHARACTER`][core::char::REPLACEMENT_CHARACTER], which /// looks like this: ๏ฟฝ /// /// If you are sure that the slice is valid UTF-16, and you don't want to incur the overhead of /// the conversion, there is an unsafe version of this function, /// [`from_ucstring_unchecked`][Self::from_ucstring_unchecked], which has the same behavior but /// skips the checks. /// /// This function returns a [`Cow<'_, Utf16Str>`][std::borrow::Cow]. If the given slice is /// invalid UTF-16, then we need to insert our replacement characters which will change the size /// of the string, and hence, require an owned [`Utf16String`]. But if it's already valid /// UTF-16, we don't need a new allocation. This return type allows us to handle both cases. /// /// # Examples /// /// ``` /// # use widestring::utf16str; /// use widestring::{U16CStr, Utf16String}; /// /// let sparkle_heart = vec![0xd83d, 0xdc96, 0x0]; // Raw surrogate pair /// let sparkle_heart = U16CStr::from_slice(&sparkle_heart).unwrap(); /// let sparkle_heart = Utf16String::from_ucstr_lossy(sparkle_heart); /// /// assert_eq!(utf16str!("๐Ÿ’–"), sparkle_heart); /// ``` /// /// With incorrect values that return an error: /// /// ``` /// # use widestring::utf16str; /// use widestring::{U16CStr, Utf16String}; /// /// let sparkle_heart = vec![0xd83d, 0x0]; // This is an invalid unpaired surrogate /// let sparkle_heart = U16CStr::from_slice(&sparkle_heart).unwrap(); /// let sparkle_heart = Utf16String::from_ucstr_lossy(sparkle_heart); /// /// assert_eq!(utf16str!("\u{fffd}"), sparkle_heart); /// ``` #[inline] #[must_use] pub fn from_ucstr_lossy(s: &crate::U16CStr) -> Cow<'_, Utf16Str> { Self::from_slice_lossy(s.as_slice()) } /// Appends the given [`char`] to the end of this string. 
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf16String;
    /// let mut s = Utf16String::from_str("abc");
    ///
    /// s.push('1');
    /// s.push('2');
    /// s.push('3');
    ///
    /// assert_eq!("abc123", s);
    /// ```
    #[inline]
    pub fn push(&mut self, ch: char) {
        let mut buf = [0; 2];
        self.inner.extend_from_slice(ch.encode_utf16(&mut buf))
    }

    /// Shortens this string to the specified length.
    ///
    /// If `new_len` is greater than the string's current length, this has no effect.
    ///
    /// Note that this method has no effect on the allocated capacity of the string.
    ///
    /// # Panics
    ///
    /// Panics if `new_len` does not lie on a [`char`] boundary.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf16String;
    /// let mut s = Utf16String::from_str("hello");
    /// s.truncate(2);
    /// assert_eq!("he", s);
    /// ```
    #[inline]
    pub fn truncate(&mut self, new_len: usize) {
        assert!(self.is_char_boundary(new_len));
        self.inner.truncate(new_len)
    }

    /// Removes the last character from the string buffer and returns it.
    ///
    /// Returns [`None`] if this string is empty.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf16String;
    /// let mut s = Utf16String::from_str("foo𝄞");
    ///
    /// assert_eq!(s.pop(), Some('𝄞'));
    /// assert_eq!(s.pop(), Some('o'));
    /// assert_eq!(s.pop(), Some('o'));
    /// assert_eq!(s.pop(), Some('f'));
    ///
    /// assert_eq!(s.pop(), None);
    /// ```
    pub fn pop(&mut self) -> Option<char> {
        let c = self.inner.pop();
        if let Some(c) = c {
            if is_utf16_low_surrogate(c) {
                let high = self.inner.pop().unwrap();
                // SAFETY: string is always valid UTF-16, so pair is valid
                Some(unsafe { decode_utf16_surrogate_pair(high, c) })
            } else {
                // SAFETY: not a surrogate
                Some(unsafe { char::from_u32_unchecked(c as u32) })
            }
        } else {
            None
        }
    }

    /// Removes a [`char`] from this string at an offset and returns it.
    ///
    /// This is an _O(n)_ operation, as it requires copying every element in the buffer.
    ///
    /// # Panics
    ///
    /// Panics if `idx` is larger than or equal to the string's length, or if it does not lie on a
    /// [`char`] boundary.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf16String;
    /// let mut s = Utf16String::from_str("𝄞foo");
    ///
    /// assert_eq!(s.remove(0), '𝄞');
    /// assert_eq!(s.remove(1), 'o');
    /// assert_eq!(s.remove(0), 'f');
    /// assert_eq!(s.remove(0), 'o');
    /// ```
    #[inline]
    pub fn remove(&mut self, idx: usize) -> char {
        let c = self[idx..].chars().next().unwrap();
        let next = idx + c.len_utf16();
        let len = self.len();
        unsafe {
            ptr::copy(
                self.inner.as_ptr().add(next),
                self.inner.as_mut_ptr().add(idx),
                len - next,
            );
            self.inner.set_len(len - (next - idx));
        }
        c
    }

    /// Retains only the characters specified by the predicate.
    ///
    /// In other words, remove all characters `c` such that `f(c)` returns `false`. This method
    /// operates in place, visiting each character exactly once in the original order, and preserves
    /// the order of the retained characters.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf16String;
    /// let mut s = Utf16String::from_str("f_o_ob_ar");
    ///
    /// s.retain(|c| c != '_');
    ///
    /// assert_eq!(s, "foobar");
    /// ```
    ///
    /// Because the elements are visited exactly once in the original order, external state may be
    /// used to decide which elements to keep.
    ///
    /// ```
    /// use widestring::Utf16String;
    /// let mut s = Utf16String::from_str("abcde");
    /// let keep = [false, true, true, false, true];
    /// let mut iter = keep.iter();
    /// s.retain(|_| *iter.next().unwrap());
    /// assert_eq!(s, "bce");
    /// ```
    pub fn retain<F>(&mut self, mut f: F)
    where
        F: FnMut(char) -> bool,
    {
        let mut index = 0;
        while index < self.len() {
            // SAFETY: always in bounds and incremented by len_utf16 only
            let c = unsafe { self.get_unchecked(index..) }
                .chars()
                .next()
                .unwrap();
            if !f(c) {
                self.inner.drain(index..index + c.len_utf16());
            } else {
                index += c.len_utf16();
            }
        }
    }

    /// Inserts a character into this string at an offset.
    ///
    /// This is an _O(n)_ operation as it requires copying every element in the buffer.
    ///
    /// # Panics
    ///
    /// Panics if `idx` is larger than the string's length, or if it does not lie on a [`char`]
    /// boundary.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf16String;
    /// let mut s = Utf16String::with_capacity(5);
    ///
    /// s.insert(0, '𝄞');
    /// s.insert(0, 'f');
    /// s.insert(1, 'o');
    /// s.insert(4, 'o');
    ///
    /// assert_eq!("fo𝄞o", s);
    /// ```
    #[inline]
    pub fn insert(&mut self, idx: usize, ch: char) {
        assert!(self.is_char_boundary(idx));
        let mut bits = [0; 2];
        let bits = ch.encode_utf16(&mut bits);

        unsafe {
            self.insert_slice(idx, bits);
        }
    }

    /// Inserts a UTF-16 string slice into this string at an offset.
    ///
    /// This is an _O(n)_ operation as it requires copying every element in the buffer.
    ///
    /// # Panics
    ///
    /// Panics if `idx` is larger than the string's length, or if it does not lie on a [`char`]
    /// boundary.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf16str;
    /// use widestring::Utf16String;
    /// let mut s = Utf16String::from_str("bar");
    ///
    /// s.insert_utfstr(0, utf16str!("foo"));
    ///
    /// assert_eq!("foobar", s);
    /// ```
    #[inline]
    pub fn insert_utfstr(&mut self, idx: usize, string: &Utf16Str) {
        assert!(self.is_char_boundary(idx));

        unsafe {
            self.insert_slice(idx, string.as_slice());
        }
    }

    /// Splits the string into two at the given index.
    ///
    /// Returns a newly allocated string. `self` contains elements [0, at), and the returned string
    /// contains elements [at, len). `at` must be on the boundary of a UTF-16 code point.
    ///
    /// Note that the capacity of `self` does not change.
    ///
    /// # Panics
    ///
    /// Panics if `at` is not on a UTF-16 code point boundary, or if it is beyond the last code
    /// point of the string.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf16String;
    /// let mut hello = Utf16String::from_str("Hello, World!");
    /// let world = hello.split_off(7);
    /// assert_eq!(hello, "Hello, ");
    /// assert_eq!(world, "World!");
    /// ```
    #[inline]
    #[must_use]
    pub fn split_off(&mut self, at: usize) -> Self {
        assert!(self.is_char_boundary(at));
        unsafe { Self::from_vec_unchecked(self.inner.split_off(at)) }
    }

    /// Creates a draining iterator that removes the specified range in the string and yields the
    /// removed [`char`]s.
    ///
    /// Note: The element range is removed even if the iterator is not consumed until the end.
    ///
    /// # Panics
    ///
    /// Panics if the starting point or end point do not lie on a [`char`] boundary, or if they're
    /// out of bounds.
    ///
    /// # Examples
    ///
    /// Basic usage:
    ///
    /// ```
    /// use widestring::Utf16String;
    /// let mut s = Utf16String::from_str("α is alpha, β is beta");
    /// let beta_offset = 12;
    ///
    /// // Remove the range up until the β from the string
    /// let t: Utf16String = s.drain(..beta_offset).collect();
    /// assert_eq!(t, "α is alpha, ");
    /// assert_eq!(s, "β is beta");
    ///
    /// // A full range clears the string
    /// s.drain(..);
    /// assert_eq!(s, "");
    /// ```
    pub fn drain<R>(&mut self, range: R) -> DrainUtf16<'_>
    where
        R: RangeBounds<usize>,
    {
        // WARNING: Using range again would be unsound
        // TODO: replace with core::slice::range when it is stabilized
        let core::ops::Range { start, end } = crate::range(range, ..self.len());
        assert!(self.is_char_boundary(start));
        assert!(self.is_char_boundary(end));

        // Take out two simultaneous borrows. The self_ptr won't be accessed
        // until iteration is over, in Drop.
        let self_ptr: *mut _ = self;
        // SAFETY: `slice::range` and `is_char_boundary` do the appropriate bounds checks.
        let chars_iter = unsafe { self.get_unchecked(start..end) }.chars();

        DrainUtf16 {
            start,
            end,
            iter: chars_iter,
            string: self_ptr,
        }
    }

    /// Removes the specified range in the string, and replaces it with the given string.
    ///
    /// The given string doesn't need to be the same length as the range.
    ///
    /// # Panics
    ///
    /// Panics if the starting point or end point do not lie on a [`char`] boundary, or if they're
    /// out of bounds.
    ///
    /// # Examples
    ///
    /// Basic usage:
    ///
    /// ```
    /// use widestring::{utf16str, Utf16String};
    /// let mut s = Utf16String::from_str("α is alpha, β is beta");
    /// let beta_offset = 12;
    ///
    /// // Replace the range up until the β from the string
    /// s.replace_range(..beta_offset, utf16str!("Α is capital alpha; "));
    /// assert_eq!(s, "Α is capital alpha; β is beta");
    /// ```
    pub fn replace_range<R>(&mut self, range: R, replace_with: &Utf16Str)
    where
        R: RangeBounds<usize>,
    {
        use core::ops::Bound::*;

        // WARNING: Using range again would be unsound
        let start = range.start_bound();
        match start {
            Included(&n) => assert!(self.is_char_boundary(n)),
            Excluded(&n) => assert!(self.is_char_boundary(n + 1)),
            Unbounded => {}
        };
        // WARNING: Inlining this variable would be unsound
        let end = range.end_bound();
        match end {
            Included(&n) => assert!(self.is_char_boundary(n + 1)),
            Excluded(&n) => assert!(self.is_char_boundary(n)),
            Unbounded => {}
        };

        // Using `range` again would be unsound
        // We assume the bounds reported by `range` remain the same, but
        // an adversarial implementation could change between calls
        self.inner
            .splice((start, end), replace_with.as_slice().iter().copied());
    }
}

impl Utf32String {
    /// Converts a [`u32`] vector of UTF-32 data to a string.
    ///
    /// Not all slices of [`u32`] values are valid to convert, since [`Utf32String`] requires that
    /// it is always valid UTF-32. This function checks to ensure that the values are valid UTF-32,
    /// and then does the conversion. This does not do any copying.
/// /// If you are sure that the slice is valid UTF-32, and you don't want to incur the overhead of /// the validity check, there is an unsafe version of this function, /// [`from_vec_unchecked`][Self::from_vec_unchecked], which has the same behavior but skips /// the check. /// /// If you need a string slice, consider using [`Utf32Str::from_slice`] instead. /// /// The inverse of this method is [`into_vec`][Self::into_vec]. /// /// # Errors /// /// Returns an error if the vector is not UTF-32 with a description as to why the provided /// vector is not UTF-32. The error will contain the original [`Vec`] that can be reclaimed with /// [`into_vec`][Utf32Error::into_vec]. /// /// # Examples /// /// ``` /// use widestring::Utf32String; /// /// let sparkle_heart = vec![0x1f496]; /// let sparkle_heart = Utf32String::from_vec(sparkle_heart).unwrap(); /// /// assert_eq!("๐Ÿ’–", sparkle_heart); /// ``` /// /// With incorrect values that return an error: /// /// ``` /// use widestring::Utf32String; /// /// let sparkle_heart = vec![0xd83d, 0xdc96]; // UTF-16 surrogates are invalid /// /// assert!(Utf32String::from_vec(sparkle_heart).is_err()); /// ``` pub fn from_vec(v: impl Into>) -> Result { let v = validate_utf32_vec(v.into())?; Ok(unsafe { Self::from_vec_unchecked(v) }) } /// Converts a slice of [`u32`] data to a string, including invalid characters. /// /// Since the given [`u32`] slice may not be valid UTF-32, and [`Utf32String`] requires that /// it is always valid UTF-32, during the conversion this function replaces any invalid UTF-32 /// sequences with [`U+FFFD REPLACEMENT CHARACTER`][core::char::REPLACEMENT_CHARACTER], which /// looks like this: ๏ฟฝ /// /// If you are sure that the slice is valid UTF-32, and you don't want to incur the overhead of /// the conversion, there is an unsafe version of this function, /// [`from_vec_unchecked`][Self::from_vec_unchecked], which has the same behavior but skips /// the checks. 
    ///
    /// This function returns a [`Cow<'_, Utf32Str>`][std::borrow::Cow]. If the given slice is
    /// invalid UTF-32, then we need to insert our replacement characters which will change the size
    /// of the string, and hence, require an owned [`Utf32String`]. But if it's already valid
    /// UTF-32, we don't need a new allocation. This return type allows us to handle both cases.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf32str;
    /// use widestring::Utf32String;
    ///
    /// let sparkle_heart = vec![0x1f496];
    /// let sparkle_heart = Utf32String::from_slice_lossy(&sparkle_heart);
    ///
    /// assert_eq!(utf32str!("💖"), sparkle_heart);
    /// ```
    ///
    /// With invalid values, which are replaced with U+FFFD:
    ///
    /// ```
    /// # use widestring::utf32str;
    /// use widestring::Utf32String;
    ///
    /// let sparkle_heart = vec![0xd83d, 0xdc96]; // UTF-16 surrogates are invalid
    /// let sparkle_heart = Utf32String::from_slice_lossy(&sparkle_heart);
    ///
    /// assert_eq!(utf32str!("\u{fffd}\u{fffd}"), sparkle_heart);
    /// ```
    #[must_use]
    pub fn from_slice_lossy(s: &[u32]) -> Cow<'_, Utf32Str> {
        match validate_utf32(s) {
            // SAFETY: validated as UTF-32
            Ok(()) => Cow::Borrowed(unsafe { Utf32Str::from_slice_unchecked(s) }),
            Err(e) => {
                let mut v = Vec::with_capacity(s.len());
                // Valid up until index
                v.extend_from_slice(&s[..e.index()]);
                for u in s[e.index()..].iter().copied() {
                    if char::from_u32(u).is_some() {
                        v.push(u);
                    } else {
                        v.push(char::REPLACEMENT_CHARACTER as u32);
                    }
                }
                // SAFETY: Is now valid UTF-32 with replacement chars
                Cow::Owned(unsafe { Self::from_vec_unchecked(v) })
            }
        }
    }

    /// Converts a wide string of undefined encoding to a UTF-32 string without checking that the
    /// string contains valid UTF-32.
    ///
    /// See the safe version, [`from_ustring`][Self::from_ustring], for more information.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the string passed to it is valid
    /// UTF-32.
    /// If this constraint is violated, undefined behavior results as it is assumed the
    /// [`Utf32String`] is always valid UTF-32.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::{U32String, Utf32String};
    ///
    /// let sparkle_heart = vec![0x1f496];
    /// let sparkle_heart = U32String::from_vec(sparkle_heart);
    /// let sparkle_heart = unsafe { Utf32String::from_ustring_unchecked(sparkle_heart) };
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    #[inline]
    #[must_use]
    pub unsafe fn from_ustring_unchecked(s: impl Into<crate::U32String>) -> Self {
        Self::from_vec_unchecked(s.into().into_vec())
    }

    /// Converts a wide string of undefined encoding into a UTF-32 string.
    ///
    /// Not all strings of undefined encoding are valid to convert, since [`Utf32String`] requires
    /// that it is always valid UTF-32. This function checks to ensure that the string is valid
    /// UTF-32, and then does the conversion. This does not do any copying.
    ///
    /// If you are sure that the string is valid UTF-32, and you don't want to incur the overhead of
    /// the validity check, there is an unsafe version of this function,
    /// [`from_ustring_unchecked`][Self::from_ustring_unchecked], which has the same behavior but
    /// skips the check.
    ///
    /// If you need a string slice, consider using [`Utf32Str::from_ustr`] instead.
    ///
    /// # Errors
    ///
    /// Returns an error if the string is not UTF-32 with a description as to why the provided
    /// string is not UTF-32.
/// /// # Examples /// /// ``` /// use widestring::{U32String, Utf32String}; /// /// let sparkle_heart = vec![0x1f496]; /// let sparkle_heart = U32String::from_vec(sparkle_heart); /// let sparkle_heart = Utf32String::from_ustring(sparkle_heart).unwrap(); /// /// assert_eq!("๐Ÿ’–", sparkle_heart); /// ``` /// /// With incorrect values that return an error: /// /// ``` /// use widestring::{U32String, Utf32String}; /// /// let sparkle_heart = vec![0xd83d, 0xdc96]; // UTF-16 surrogates are invalid /// let sparkle_heart = U32String::from_vec(sparkle_heart); // Valid for a U32String /// /// assert!(Utf32String::from_ustring(sparkle_heart).is_err()); // But not for a Utf32String /// ``` #[inline] pub fn from_ustring(s: impl Into) -> Result { Self::from_vec(s.into().into_vec()) } /// Converts a wide string slice of undefined encoding to a UTF-32 string, including invalid /// characters. /// /// Since the given string slice may not be valid UTF-32, and [`Utf32String`] requires that /// it is always valid UTF-32, during the conversion this function replaces any invalid UTF-32 /// sequences with [`U+FFFD REPLACEMENT CHARACTER`][core::char::REPLACEMENT_CHARACTER], which /// looks like this: ๏ฟฝ /// /// If you are sure that the slice is valid UTF-32, and you don't want to incur the overhead of /// the conversion, there is an unsafe version of this function, /// [`from_ustring_unchecked`][Self::from_ustring_unchecked], which has the same behavior but /// skips the checks. /// /// This function returns a [`Cow<'_, Utf32Str>`][std::borrow::Cow]. If the given slice is /// invalid UTF-32, then we need to insert our replacement characters which will change the size /// of the string, and hence, require an owned [`Utf32String`]. But if it's already valid /// UTF-32, we don't need a new allocation. This return type allows us to handle both cases. 
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf32str;
    /// use widestring::{U32Str, Utf32String};
    ///
    /// let sparkle_heart = vec![0x1f496];
    /// let sparkle_heart = U32Str::from_slice(&sparkle_heart);
    /// let sparkle_heart = Utf32String::from_ustr_lossy(sparkle_heart);
    ///
    /// assert_eq!(utf32str!("💖"), sparkle_heart);
    /// ```
    ///
    /// With invalid values, which are replaced with U+FFFD:
    ///
    /// ```
    /// # use widestring::utf32str;
    /// use widestring::{U32Str, Utf32String};
    ///
    /// let sparkle_heart = vec![0xd83d, 0xdc96]; // UTF-16 surrogates are invalid
    /// let sparkle_heart = U32Str::from_slice(&sparkle_heart);
    /// let sparkle_heart = Utf32String::from_ustr_lossy(sparkle_heart);
    ///
    /// assert_eq!(utf32str!("\u{fffd}\u{fffd}"), sparkle_heart);
    /// ```
    #[inline]
    #[must_use]
    pub fn from_ustr_lossy(s: &crate::U32Str) -> Cow<'_, Utf32Str> {
        Self::from_slice_lossy(s.as_slice())
    }

    /// Converts a wide C string to a UTF-32 string without checking that the string contains
    /// valid UTF-32.
    ///
    /// The resulting string does *not* contain the nul terminator.
    ///
    /// See the safe version, [`from_ucstring`][Self::from_ucstring], for more information.
    ///
    /// # Safety
    ///
    /// This function is unsafe because it does not check that the string passed to it is valid
    /// UTF-32. If this constraint is violated, undefined behavior results as it is assumed the
    /// [`Utf32String`] is always valid UTF-32.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::{U32CString, Utf32String};
    ///
    /// let sparkle_heart = vec![0x1f496];
    /// let sparkle_heart = U32CString::from_vec(sparkle_heart).unwrap();
    /// let sparkle_heart = unsafe { Utf32String::from_ucstring_unchecked(sparkle_heart) };
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    #[inline]
    #[must_use]
    pub unsafe fn from_ucstring_unchecked(s: impl Into<crate::U32CString>) -> Self {
        Self::from_vec_unchecked(s.into().into_vec())
    }

    /// Converts a wide C string into a UTF-32 string.
    ///
    /// The resulting string does *not* contain the nul terminator.
    ///
    /// Not all wide C strings are valid to convert, since [`Utf32String`] requires that
    /// it is always valid UTF-32. This function checks to ensure that the string is valid UTF-32,
    /// and then does the conversion. This does not do any copying.
    ///
    /// If you are sure that the string is valid UTF-32, and you don't want to incur the overhead of
    /// the validity check, there is an unsafe version of this function,
    /// [`from_ucstring_unchecked`][Self::from_ucstring_unchecked], which has the same behavior but
    /// skips the check.
    ///
    /// If you need a string slice, consider using [`Utf32Str::from_ucstr`] instead.
    ///
    /// # Errors
    ///
    /// Returns an error if the string is not UTF-32 with a description as to why the provided
    /// string is not UTF-32.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::{U32CString, Utf32String};
    ///
    /// let sparkle_heart = vec![0x1f496];
    /// let sparkle_heart = U32CString::from_vec(sparkle_heart).unwrap();
    /// let sparkle_heart = Utf32String::from_ucstring(sparkle_heart).unwrap();
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    ///
    /// With incorrect values that return an error:
    ///
    /// ```
    /// use widestring::{U32CString, Utf32String};
    ///
    /// let sparkle_heart = vec![0xd83d, 0xdc96]; // UTF-16 surrogates are invalid
    /// let sparkle_heart = U32CString::from_vec(sparkle_heart).unwrap(); // Valid for a U32CString
    ///
    /// assert!(Utf32String::from_ucstring(sparkle_heart).is_err()); // But not for a Utf32String
    /// ```
    #[inline]
    pub fn from_ucstring(s: impl Into<crate::U32CString>) -> Result<Self, Utf32Error> {
        Self::from_vec(s.into().into_vec())
    }

    /// Converts a wide C string slice to a UTF-32 string, including invalid characters.
    ///
    /// The resulting string does *not* contain the nul terminator.
/// /// Since the given string slice may not be valid UTF-32, and [`Utf32String`] requires that /// it is always valid UTF-32, during the conversion this function replaces any invalid UTF-32 /// sequences with [`U+FFFD REPLACEMENT CHARACTER`][core::char::REPLACEMENT_CHARACTER], which /// looks like this: ๏ฟฝ /// /// If you are sure that the slice is valid UTF-32, and you don't want to incur the overhead of /// the conversion, there is an unsafe version of this function, /// [`from_ucstring_unchecked`][Self::from_ucstring_unchecked], which has the same behavior but /// skips the checks. /// /// This function returns a [`Cow<'_, Utf32Str>`][std::borrow::Cow]. If the given slice is /// invalid UTF-32, then we need to insert our replacement characters which will change the size /// of the string, and hence, require an owned [`Utf32String`]. But if it's already valid /// UTF-32, we don't need a new allocation. This return type allows us to handle both cases. /// /// # Examples /// /// ``` /// # use widestring::utf32str; /// use widestring::{U32CStr, Utf32String}; /// /// let sparkle_heart = vec![0x1f496, 0x0]; /// let sparkle_heart = U32CStr::from_slice(&sparkle_heart).unwrap(); /// let sparkle_heart = Utf32String::from_ucstr_lossy(sparkle_heart); /// /// assert_eq!(utf32str!("๐Ÿ’–"), sparkle_heart); /// ``` /// /// With incorrect values that return an error: /// /// ``` /// # use widestring::utf32str; /// use widestring::{U32CStr, Utf32String}; /// /// let sparkle_heart = vec![0xd83d, 0xdc96, 0x0]; // UTF-16 surrogates are invalid /// let sparkle_heart = U32CStr::from_slice(&sparkle_heart).unwrap(); /// let sparkle_heart = Utf32String::from_ucstr_lossy(sparkle_heart); /// /// assert_eq!(utf32str!("\u{fffd}\u{fffd}"), sparkle_heart); /// ``` #[inline] #[must_use] pub fn from_ucstr_lossy(s: &crate::U32CStr) -> Cow<'_, Utf32Str> { Self::from_slice_lossy(s.as_slice()) } /// Converts a vector of [`char`]s into a UTF-32 string. 
    ///
    /// Since [`char`]s are always valid UTF-32, this is infallible and efficient.
    ///
    /// If you need a string slice, consider using [`Utf32Str::from_char_slice`] instead.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf32String;
    ///
    /// let sparkle_heart = vec!['💖'];
    /// let sparkle_heart = Utf32String::from_chars(sparkle_heart);
    ///
    /// assert_eq!("💖", sparkle_heart);
    /// ```
    #[inline]
    #[must_use]
    pub fn from_chars(s: impl Into<Vec<char>>) -> Self {
        // SAFETY: Char slices are always valid UTF-32
        // TODO: replace mem::transmute when Vec::into_raw_parts is stabilized
        // Clippy reports this is unsound due to different sized types; but the sizes are the same
        // size. Still best to swap to Vec::into_raw_parts asap.
        #[allow(clippy::unsound_collection_transmute)]
        unsafe {
            let vec: Vec<u32> = mem::transmute(s.into());
            Self::from_vec_unchecked(vec)
        }
    }

    /// Appends the given [`char`] to the end of this string.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf32String;
    /// let mut s = Utf32String::from_str("abc");
    ///
    /// s.push('1');
    /// s.push('2');
    /// s.push('3');
    ///
    /// assert_eq!("abc123", s);
    /// ```
    #[inline]
    pub fn push(&mut self, ch: char) {
        self.inner.push(ch.into())
    }

    /// Shortens this string to the specified length.
    ///
    /// If `new_len` is greater than the string's current length, this has no effect.
    ///
    /// Note that this method has no effect on the allocated capacity of the string.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf32String;
    /// let mut s = Utf32String::from_str("hello");
    /// s.truncate(2);
    /// assert_eq!("he", s);
    /// ```
    #[inline]
    pub fn truncate(&mut self, new_len: usize) {
        self.inner.truncate(new_len)
    }

    /// Removes the last character from the string buffer and returns it.
    ///
    /// Returns [`None`] if this string is empty.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf32String;
    /// let mut s = Utf32String::from_str("foo");
    ///
    /// assert_eq!(s.pop(), Some('o'));
    /// assert_eq!(s.pop(), Some('o'));
    /// assert_eq!(s.pop(), Some('f'));
    ///
    /// assert_eq!(s.pop(), None);
    /// ```
    #[inline]
    pub fn pop(&mut self) -> Option<char> {
        // SAFETY: String is already valid UTF-32
        self.inner
            .pop()
            .map(|c| unsafe { core::char::from_u32_unchecked(c) })
    }

    /// Removes a [`char`] from this string at an offset and returns it.
    ///
    /// This is an _O(n)_ operation, as it requires copying every element in the buffer.
    ///
    /// # Panics
    ///
    /// Panics if `idx` is larger than or equal to the string's length.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf32String;
    /// let mut s = Utf32String::from_str("foo");
    ///
    /// assert_eq!(s.remove(1), 'o');
    /// assert_eq!(s.remove(0), 'f');
    /// assert_eq!(s.remove(0), 'o');
    /// ```
    #[inline]
    pub fn remove(&mut self, idx: usize) -> char {
        let next = idx + 1;
        let len = self.len();

        unsafe {
            let c = core::char::from_u32_unchecked(self.inner[idx]);
            ptr::copy(
                self.inner.as_ptr().add(next),
                self.inner.as_mut_ptr().add(idx),
                len - next,
            );
            self.inner.set_len(len - (next - idx));
            c
        }
    }

    /// Retains only the characters specified by the predicate.
    ///
    /// In other words, remove all characters `c` such that `f(c)` returns `false`. This method
    /// operates in place, visiting each character exactly once in the original order, and preserves
    /// the order of the retained characters.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf32String;
    /// let mut s = Utf32String::from_str("f_o_ob_ar");
    ///
    /// s.retain(|c| c != '_');
    ///
    /// assert_eq!(s, "foobar");
    /// ```
    ///
    /// Because the elements are visited exactly once in the original order, external state may be
    /// used to decide which elements to keep.
    ///
    /// ```
    /// use widestring::Utf32String;
    /// let mut s = Utf32String::from_str("abcde");
    /// let keep = [false, true, true, false, true];
    /// let mut iter = keep.iter();
    /// s.retain(|_| *iter.next().unwrap());
    /// assert_eq!(s, "bce");
    /// ```
    pub fn retain<F>(&mut self, mut f: F)
    where
        F: FnMut(char) -> bool,
    {
        let mut index = 0;
        while index < self.len() {
            // SAFETY: always in bounds
            let c = unsafe { self.get_unchecked(index..) }
                .chars()
                .next()
                .unwrap();
            if !f(c) {
                self.inner.remove(index);
            } else {
                index += 1;
            }
        }
    }

    /// Inserts a character into this string at an offset.
    ///
    /// This is an _O(n)_ operation as it requires copying every element in the buffer.
    ///
    /// # Panics
    ///
    /// Panics if `idx` is larger than the string's length.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf32String;
    /// let mut s = Utf32String::with_capacity(3);
    ///
    /// s.insert(0, 'f');
    /// s.insert(1, 'o');
    /// s.insert(1, 'o');
    ///
    /// assert_eq!("foo", s);
    /// ```
    #[inline]
    pub fn insert(&mut self, idx: usize, ch: char) {
        unsafe {
            self.insert_slice(idx, &[ch as u32]);
        }
    }

    /// Inserts a UTF-32 string slice into this string at an offset.
    ///
    /// This is an _O(n)_ operation as it requires copying every element in the buffer.
    ///
    /// # Panics
    ///
    /// Panics if `idx` is larger than the string's length.
    ///
    /// # Examples
    ///
    /// ```
    /// # use widestring::utf32str;
    /// use widestring::Utf32String;
    /// let mut s = Utf32String::from_str("bar");
    ///
    /// s.insert_utfstr(0, utf32str!("foo"));
    ///
    /// assert_eq!("foobar", s);
    /// ```
    #[inline]
    pub fn insert_utfstr(&mut self, idx: usize, string: &Utf32Str) {
        unsafe {
            self.insert_slice(idx, string.as_slice());
        }
    }

    /// Splits the string into two at the given index.
    ///
    /// Returns a newly allocated string. `self` contains elements [0, at), and the returned string
    /// contains elements [at, len).
    ///
    /// Note that the capacity of `self` does not change.
    ///
    /// # Panics
    ///
    /// Panics if `at` is beyond the last code point of the string.
    ///
    /// # Examples
    ///
    /// ```
    /// use widestring::Utf32String;
    /// let mut hello = Utf32String::from_str("Hello, World!");
    /// let world = hello.split_off(7);
    /// assert_eq!(hello, "Hello, ");
    /// assert_eq!(world, "World!");
    /// ```
    #[inline]
    #[must_use]
    pub fn split_off(&mut self, at: usize) -> Self {
        unsafe { Self::from_vec_unchecked(self.inner.split_off(at)) }
    }

    /// Creates a draining iterator that removes the specified range in the string and yields the
    /// removed [`char`]s.
    ///
    /// Note: The element range is removed even if the iterator is not consumed until the end.
    ///
    /// # Panics
    ///
    /// Panics if the starting point or end point are out of bounds.
    ///
    /// # Examples
    ///
    /// Basic usage:
    ///
    /// ```
    /// use widestring::Utf32String;
    /// let mut s = Utf32String::from_str("α is alpha, β is beta");
    /// let beta_offset = 12;
    ///
    /// // Remove the range up until the β from the string
    /// let t: Utf32String = s.drain(..beta_offset).collect();
    /// assert_eq!(t, "α is alpha, ");
    /// assert_eq!(s, "β is beta");
    ///
    /// // A full range clears the string
    /// s.drain(..);
    /// assert_eq!(s, "");
    /// ```
    pub fn drain<R>(&mut self, range: R) -> DrainUtf32<'_>
    where
        R: RangeBounds<usize>,
    {
        // WARNING: Using range again would be unsound
        // TODO: replace with core::slice::range when it is stabilized
        let core::ops::Range { start, end } = crate::range(range, ..self.len());

        // Take out two simultaneous borrows. The self_ptr won't be accessed
        // until iteration is over, in Drop.
        let self_ptr: *mut _ = self;
        // SAFETY: `slice::range` and `is_char_boundary` do the appropriate bounds checks.
        let chars_iter = unsafe { self.get_unchecked(start..end) }.chars();

        DrainUtf32 {
            start,
            end,
            iter: chars_iter,
            string: self_ptr,
        }
    }

    /// Removes the specified range in the string, and replaces it with the given string.
    ///
    /// The given string doesn't need to be the same length as the range.
    ///
    /// # Panics
    ///
    /// Panics if the starting point or end point are out of bounds.
    ///
    /// # Examples
    ///
    /// Basic usage:
    ///
    /// ```
    /// use widestring::{utf32str, Utf32String};
    /// let mut s = Utf32String::from_str("α is alpha, β is beta");
    /// let beta_offset = 12;
    ///
    /// // Replace the range up until the β from the string
    /// s.replace_range(..beta_offset, utf32str!("Α is capital alpha; "));
    /// assert_eq!(s, "Α is capital alpha; β is beta");
    /// ```
    #[inline]
    pub fn replace_range<R>(&mut self, range: R, replace_with: &Utf32Str)
    where
        R: RangeBounds<usize>,
    {
        self.inner
            .splice(range, replace_with.as_slice().iter().copied());
    }

    /// Converts string into a [`Vec`] of [`char`]s.
    ///
    /// This consumes the string without copying its contents.
    #[allow(trivial_casts)]
    #[inline]
    #[must_use]
    pub fn into_char_vec(self) -> Vec<char> {
        let mut v = mem::ManuallyDrop::new(self.into_vec());
        let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
        // SAFETY: Self should be valid UTF-32 so chars will be in range
        unsafe { Vec::from_raw_parts(ptr as *mut char, len, cap) }
    }
}

impl AsMut<[char]> for Utf32String {
    #[inline]
    fn as_mut(&mut self) -> &mut [char] {
        self.as_char_slice_mut()
    }
}

impl AsRef<[char]> for Utf32String {
    #[inline]
    fn as_ref(&self) -> &[char] {
        self.as_char_slice()
    }
}

impl From<Vec<char>> for Utf32String {
    #[inline]
    fn from(value: Vec<char>) -> Self {
        Utf32String::from_chars(value)
    }
}

impl From<&[char]> for Utf32String {
    #[inline]
    fn from(value: &[char]) -> Self {
        Utf32String::from_chars(value)
    }
}

impl From<Utf32String> for Vec<char> {
    #[inline]
    fn from(value: Utf32String) -> Self {
        value.into_char_vec()
    }
}

impl PartialEq<[char]> for Utf32String {
    #[inline]
    fn eq(&self, other: &[char]) -> bool {
        self.as_char_slice() == other
    }
}

impl PartialEq<Utf16String> for Utf32String {
    #[inline]
    fn eq(&self, other: &Utf16String) -> bool {
        self.chars().eq(other.chars())
    }
}

impl PartialEq<Utf32String> for Utf16String {
    #[inline]
    fn eq(&self, other: &Utf32String) -> bool {
        self.chars().eq(other.chars())
    }
}

impl PartialEq<&Utf16Str> for Utf32String {
    #[inline]
    fn eq(&self, other: &&Utf16Str) -> bool {
        self.chars().eq(other.chars())
    }
}

impl PartialEq<&Utf32Str> for Utf16String {
    #[inline]
    fn eq(&self, other: &&Utf32Str) -> bool {
        self.chars().eq(other.chars())
    }
}

impl PartialEq<Utf32String> for &Utf16Str {
    #[inline]
    fn eq(&self, other: &Utf32String) -> bool {
        self.chars().eq(other.chars())
    }
}

impl PartialEq<Utf16String> for &Utf32Str {
    #[inline]
    fn eq(&self, other: &Utf16String) -> bool {
        self.chars().eq(other.chars())
    }
}

impl TryFrom<Vec<u16>> for Utf16String {
    type Error = Utf16Error;

    #[inline]
    fn try_from(value: Vec<u16>) -> Result<Self, Self::Error> {
        Utf16String::from_vec(value)
    }
}

impl TryFrom<Vec<u32>> for Utf32String {
    type Error = Utf32Error;

    #[inline]
    fn try_from(value: Vec<u32>) -> Result<Self, Self::Error> {
        Utf32String::from_vec(value)
    }
}

impl TryFrom<&[u16]> for Utf16String {
    type Error = Utf16Error;

    #[inline]
    fn try_from(value: &[u16]) -> Result<Self, Self::Error> {
        Utf16String::from_vec(value)
    }
}

impl TryFrom<&[u32]> for Utf32String {
    type Error = Utf32Error;

    #[inline]
    fn try_from(value: &[u32]) -> Result<Self, Self::Error> {
        Utf32String::from_vec(value)
    }
}

/// Alias for [`Utf16String`] or [`Utf32String`] depending on platform. Intended to match typical C
/// `wchar_t` size on platform.
#[cfg(not(windows))]
pub type WideUtfString = Utf32String;

/// Alias for [`Utf16String`] or [`Utf32String`] depending on platform. Intended to match typical C
/// `wchar_t` size on platform.
#[cfg(windows)]
pub type WideUtfString = Utf16String;

widestring-1.1.0/tests/debugger_visualizer.rs

use debugger_test::debugger_test;
use widestring::*;

#[inline(never)]
fn __break() {}

#[debugger_test(
    debugger = "cdb",
    commands = r#"
.nvlist

dx u16_string
dx u32_string

dx u16_cstring
dx u32_cstring

dx utf16_string
dx utf32_string

dx u16_cstr
dx u32_cstr
"#,
    expected_statements = r#"
u16_string       : "my u16 string" [Type: widestring::ustring::U16String]
    [<Raw View>]     [Type: widestring::ustring::U16String]
    [len]            : 0xd [Type: unsigned __int64]
    [chars]

u32_string       : "my u32 string" [Type: widestring::ustring::U32String]
    [<Raw View>]     [Type: widestring::ustring::U32String]
    [len]            : 0xd [Type: unsigned __int64]
    [chars]

u16_cstring      : "my u16 cstring" [Type: widestring::ucstring::U16CString]
    [<Raw View>]     [Type: widestring::ucstring::U16CString]
    [len]            : 0xf [Type: unsigned __int64]
    [chars]

u32_cstring      : "my u32 cstring" [Type: widestring::ucstring::U32CString]
    [<Raw View>]     [Type: widestring::ucstring::U32CString]
    [len]            : 0xf [Type: unsigned __int64]
    [chars]

utf16_string     : "my utf16 string" [Type: widestring::utfstring::Utf16String]
    [<Raw View>]     [Type: widestring::utfstring::Utf16String]
    [len]            : 0xf [Type: unsigned __int64]
    [chars]

utf32_string     : "my utf32 string" [Type: widestring::utfstring::Utf32String]
    [<Raw View>]     [Type: widestring::utfstring::Utf32String]
    [len]            : 0xf [Type: unsigned __int64]
    [chars]

u16_cstr         [Type: ref$<widestring::ucstr::U16CStr>]
pattern:\[\+0x000\] data_ptr         : 0x[0-9a-f]+ : "my u16 cstr" \[Type: widestring::ucstr::U16CStr \*\]
    [+0x008] length           : 0xc [Type: unsigned __int64]

u32_cstr         [Type: ref$<widestring::ucstr::U32CStr>]
pattern:\[\+0x000\] data_ptr         : 0x[0-9a-f]+ : "my u32 cstr" \[Type: widestring::ucstr::U32CStr \*\]
    [+0x008] length           : 0xc [Type: unsigned __int64]
"#
)]
fn test_debugger_visualizer() {
    let u16_string = U16String::from_str("my u16 string");
    assert!(!u16_string.is_empty());

    let u32_string = U32String::from_str("my u32 string");
    assert!(!u32_string.is_empty());

    let u16_cstring =
        U16CString::from_str("my u16 cstring").unwrap();
    assert!(!u16_cstring.is_empty());

    let u32_cstring = U32CString::from_str("my u32 cstring").unwrap();
    assert!(!u32_cstring.is_empty());

    let utf16_string = Utf16String::from_str("my utf16 string");
    assert!(!utf16_string.is_empty());

    let utf32_string = Utf32String::from_str("my utf32 string");
    assert!(!utf32_string.is_empty());

    let u16_cstr = u16cstr!("my u16 cstr");
    assert!(!u16_cstr.is_empty());

    let u32_cstr = u32cstr!("my u32 cstr");
    assert!(!u32_cstr.is_empty());

    __break(); // #break
}
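
// Illustrative sketch only, not part of the widestring test suite. The `[len]` and `length`
// values in `expected_statements` are code unit counts. For the nul-terminated C-string types
// the expected values are one larger than the visible text, which is consistent with the nul
// terminator being counted. `expected_units` is a hypothetical helper that uses only `std`.
fn expected_units(text: &str, nul_terminated: bool) -> usize {
    // Every test string is ASCII, so UTF-16 and UTF-32 need the same number of code units.
    text.encode_utf16().count() + usize::from(nul_terminated)
}

#[test]
fn expected_units_match_debugger_output() {
    assert_eq!(expected_units("my u16 string", false), 0xd);
    assert_eq!(expected_units("my u16 cstring", true), 0xf);
    assert_eq!(expected_units("my utf16 string", false), 0xf);
    assert_eq!(expected_units("my u16 cstr", true), 0xc);
}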