unicode-display_width-1.6.1/ 0000755 0000041 0000041 00000000000 13626455771 016057 5 ustar www-data www-data unicode-display_width-1.6.1/README.md 0000644 0000041 0000041 00000011745 13626455771 017346 0 ustar www-data www-data
## Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [Travis CI build status](https://travis-ci.org/janlelis/unicode-display_width)
Determines the monospace display width of a string in Ruby. Implementation based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt) and other data, 100% in Ruby. Unlike [wcwidth()](https://github.com/janlelis/wcswidth-ruby), which fulfills a similar purpose, it does not rely on the OS vendor to provide an up-to-date method for measuring string width.
Unicode version: **12.1.0** (May 2019)
Supported Rubies: **2.7**, **2.6**, **2.5**, **2.4**
Old Rubies that might still work: **2.3**, **2.2**, **2.1**, **2.0**, **1.9**
## Introduction to Character Widths
Guessing the correct space a character will consume on terminals is not easy. There is no single standard. Most implementations combine data from [East Asian Width](https://www.unicode.org/reports/tr11/), some [General Categories](https://en.wikipedia.org/wiki/Unicode_character_property#General_Category), and hand-picked adjustments.
### How this Library Handles Widths
Rows nearer the top of the table take precedence. Please expect changes to this algorithm with every MINOR version update (the X in 1.X.0)!
Width | Characters | Comment
-------|------------------------------|--------------------------------------------------
X | (user defined) | Overwrites any other values
-1 | `"\b"` | Backspace (total width never below 0)
0 | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29) that do not change horizontal width
1 | `"\u{00AD}"` | SOFT HYPHEN
2 | `"\u{2E3A}"` | TWO-EM DASH
3 | `"\u{2E3B}"` | THREE-EM DASH
0 | General Categories: Mn, Me, Cf (non-arabic) | Excludes ARABIC format characters
0 | `"\u{1160}".."\u{11FF}"` | HANGUL JUNGSEONG
0 | `"\u{2060}".."\u{206F}"`, `"\u{FFF0}".."\u{FFF8}"`, `"\u{E0000}".."\u{E0FFF}"` | Ignorable ranges
2 | East Asian Width: F, W | Full-width characters
2 | `"\u{3400}".."\u{4DBF}"`, `"\u{4E00}".."\u{9FFF}"`, `"\u{F900}".."\u{FAFF}"`, `"\u{20000}".."\u{2FFFD}"`, `"\u{30000}".."\u{3FFFD}"` | Full-width ranges
1 or 2 | East Asian Width: A | Ambiguous characters, user defined, default: 1
1 | All other codepoints | -
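
The precedence order in the table can be read as a chain of checks that stops at the first match. The following is a minimal sketch of that idea and not the library's implementation, which reads widths from a precomputed index. The names `sketch_codepoint_width`, `sketch_width`, `ZERO_WIDTH_CONTROLS` and `FULL_WIDTH_RANGES` are hypothetical, and only a few rows of the table are modelled.

```ruby
# Simplified illustration of the precedence table above.
# NOT the library's implementation: names are hypothetical and only some rows are covered.
ZERO_WIDTH_CONTROLS = [0x00, 0x05, 0x07, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F].freeze
FULL_WIDTH_RANGES   = [0x3400..0x4DBF, 0x4E00..0x9FFF, 0xF900..0xFAFF].freeze

def sketch_codepoint_width(codepoint, overwrite = {})
  return overwrite[codepoint] if overwrite[codepoint]           # user defined values win
  return -1 if codepoint == 0x08                                # backspace
  return 0  if ZERO_WIDTH_CONTROLS.include?(codepoint)          # C0 controls without width
  return 1  if codepoint == 0x00AD                              # SOFT HYPHEN
  return 2  if codepoint == 0x2E3A                              # TWO-EM DASH
  return 3  if codepoint == 0x2E3B                              # THREE-EM DASH
  return 0  if (0x1160..0x11FF).cover?(codepoint)               # HANGUL JUNGSEONG
  return 2  if FULL_WIDTH_RANGES.any? { |range| range.cover?(codepoint) }
  1                                                             # all other code points
end

def sketch_width(string, overwrite = {})
  total = string.codepoints.inject(0) { |sum, cp| sum + sketch_codepoint_width(cp, overwrite) }
  total < 0 ? 0 : total # the total width never drops below 0
end

sketch_width("abc")    # => 3
sketch_width("a\bb")   # => 1
sketch_width("\b\b")   # => 0
```

Because the checks run top to bottom, a user-defined value for a code point is returned before any of the built-in rules are consulted.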
## Install
Install the gem with:
    $ gem install unicode-display_width
Or add to your Gemfile:
    gem 'unicode-display_width'
## Usage
```ruby
require 'unicode/display_width'
Unicode::DisplayWidth.of("⚀") # => 1
Unicode::DisplayWidth.of("一") # => 2
```
### Ambiguous Characters
The second parameter sets the width used for characters that Unicode classifies as ambiguous (default: 1):
```ruby
Unicode::DisplayWidth.of("·", 1) # => 1
Unicode::DisplayWidth.of("·", 2) # => 2
```
### Custom Overwrites
You can overwrite the width of specific code points by passing a hash (or even a proc) as the third parameter:
```ruby
Unicode::DisplayWidth.of("a\tb", 1, 0x09 => 10) # => 12
```
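
Both a hash and a proc work here because the `of` method in this release looks the override up with `overwrite[codepoint]`, and Ruby's `Hash#[]` and `Proc#[]` both answer that call. A small stand-alone sketch of the mechanism follows; the `pick_width` helper is hypothetical and only mirrors the `overwrite[codepoint] || width` fallback.

```ruby
# Hypothetical helper mirroring the `overwrite[codepoint] || width` fallback.
# It is not part of the library's API.
def pick_width(codepoint, default_width, overwrite)
  overwrite[codepoint] || default_width
end

tab_as_hash = { 0x09 => 10 }
# The proc returns nil for code points it does not want to override,
# so the default width is used for them.
tab_as_proc = proc { |codepoint| 10 if codepoint == 0x09 }

pick_width(0x09, 1, tab_as_hash) # => 10
pick_width(0x09, 1, tab_as_proc) # => 10
pick_width(0x61, 1, tab_as_proc) # => 1
```

A proc is handy when the override follows a rule (for example a whole range of code points) rather than a short list of individual values.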
### Emoji Support
Experimental emoji support is included. It will adjust the string's size for modifier and zero-width joiner sequences. You will need to add the [unicode-emoji](https://github.com/janlelis/unicode-emoji) gem to your Gemfile:
```ruby
gem 'unicode-display_width'
gem 'unicode-emoji'
```
You can then activate the emoji string width adjustments by passing `emoji: true` as the fourth parameter:
```ruby
Unicode::DisplayWidth.of "🤾🏽♀️" # => 5
Unicode::DisplayWidth.of "🤾🏽♀️", 1, {}, emoji: true # => 2
```
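
To see where the two numbers could come from, the sequence can be broken into its code points. This sketch assumes the example string consists of PERSON PLAYING HANDBALL, a skin tone modifier, ZERO WIDTH JOINER, FEMALE SIGN and VARIATION SELECTOR-16. It hard-codes per-code-point widths following the table above instead of calling the library. The subtraction mirrors the adjustment in this release: 2 per modifier, plus the width of the character following each joiner. `SEQUENCE_WIDTHS` and `adjusted_emoji_width` are hypothetical names.

```ruby
# Assumed code points of the example sequence with hard-coded widths
# (illustration only; ambiguous characters are counted as 1).
SEQUENCE_WIDTHS = {
  0x1F93E => 2, # PERSON PLAYING HANDBALL (wide)
  0x1F3FD => 2, # EMOJI MODIFIER FITZPATRICK TYPE-4 (wide)
  0x200D  => 0, # ZERO WIDTH JOINER (format character)
  0x2640  => 1, # FEMALE SIGN (ambiguous, default 1)
  0xFE0F  => 0, # VARIATION SELECTOR-16 (nonspacing mark)
}.freeze

# Hypothetical helper: subtracts 2 per modifier and the width of every
# character that directly follows a joiner from the plain per-code-point sum.
def adjusted_emoji_width(widths, modifier_count, after_joiner_codepoints)
  plain = widths.values.inject(0, :+)
  extra = 2 * modifier_count
  after_joiner_codepoints.each { |codepoint| extra += widths.fetch(codepoint) }
  plain - extra < 0 ? 0 : plain - extra
end

SEQUENCE_WIDTHS.values.inject(0, :+)                # => 5
adjusted_emoji_width(SEQUENCE_WIDTHS, 1, [0x2640])  # => 2
```

The result of 2 matches the width of a single wide emoji, which is how terminals with emoji sequence support are expected to draw the joined glyph.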
### Usage with String Extension
The string extension is activated by default. It will be deactivated in version 2.0:
```ruby
require 'unicode/display_width/string_ext'
"⚀".display_width #=> 1
'一'.display_width #=> 2
```
You can opt out of the string extension with: `require 'unicode/display_width/no_string_ext'`
### Usage From the CLI
Use this one-liner to print out display widths for strings from the command-line:
```
$ gem install unicode-display_width
$ ruby -r unicode/display_width -e 'puts Unicode::DisplayWidth.of $*[0]' -- "一"
```
Replace "一" with the actual string to measure.
## Other Implementations & Discussion
- Python: https://github.com/jquast/wcwidth
- JavaScript: https://github.com/mycoboco/wcwidth.js
- C: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
- C for Julia: https://github.com/JuliaLang/utf8proc/issues/2
See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related micro libraries.
## Copyright & Info
- Copyright (c) 2011, 2015-2020 Jan Lelis, https://janlelis.com, released under the MIT license
- Early versions based on runpaint's unicode-data interface: Copyright (c) 2009 Run Paint Run Run
- Unicode data: https://www.unicode.org/copyright.html#Exhibit1
unicode-display_width-1.6.1/CHANGELOG.md 0000644 0000041 0000041 00000005303 13626455771 017671 0 ustar www-data www-data
# CHANGELOG
## 1.6.1
- Fix that ambiguous and overwrite options were ignored for emoji-measuring
## 1.6.0
- Unicode 12.1
## 1.5.0
- Unicode 12
## 1.4.1
- Only bundle required lib/* and data/* files in actual rubygem, patch by @tas50
## 1.4.0
- Unicode 11
## 1.3.3
- Replace Gem::Util.gunzip with direct zlib implementation
This removes the dependency on rubygems, fixes #17
## 1.3.2
- Explicitly load rubygems/util, fixes regression in 1.3.1 (autoload issue)
## 1.3.1
- Use `Gem::Util` for `gunzip`, removes deprecation warning, patch by @Schwad
## 1.3.0
- Unicode 10
## 1.2.1
- Fix bug that `emoji: true` would fail for emoji without modifier
## 1.2.0
- Add zero-width codepoint ranges: U+2060..U+206F, U+FFF0..U+FFF8, U+E0000..U+E0FFF
- Add full-width codepoint ranges: U+3400..U+4DBF, U+4E00..U+9FFF, U+F900..U+FAFF, U+20000..U+2FFFD, U+30000..U+3FFFD
- Experimental emoji support using the [unicode-emoji](https://github.com/janlelis/unicode-emoji) gem
- Fix minor bug in index compression scheme
## 1.1.3
- Fix so that non-UTF-8 encodings do not throw errors, patch by @windwiny
## 1.1.2
- Reduce memory consumption and increase performance, patch by @rrosenblum
## 1.1.1
- Always load index into memory, fixes #9
## 1.1.0
- Support Unicode 9.0
## 1.0.5
- Actually include new index from 1.0.4
## 1.0.4
- New index format (much smaller) and internal API changes
- Move index generation to a builder plugin for the unicoder gem
- No public API changes
## 1.0.3
- Avoid circular dependency warning
## 1.0.2
- Fix error that gemspec might be invalid under some circumstances (see gh#6)
## 1.0.1
- Unofficially allow Ruby 1.9
## 1.0.0
- Faster than 0.3.1
- Advanced determination of character width
- This includes: Treat width of most chars of general categories (Mn, Me, Cf) as 0
- This includes: Introduce list of characters with special widths
- Allow custom overrides for specific codepoints
- Set required Ruby version to 2.0
- Add NO_STRING_EXT mode to disable monkey patching
- Internal API & index format changed drastically
- Remove require 'unicode/display_size' (use 'unicode/display_width' instead)
## 0.3.1
- Faster than 0.3.0
- Deprecate usage of aliases: String#display_size and String#display_length
- Eliminate Ruby warnings (@amatsuda)
## 0.3.0
- Update EastAsianWidth from 7.0 to 8.0
- Add rake task to update EastAsianWidth.txt
- Move code to generate index from library to Rakefile
- Update project's meta files
- Deprecate requiring 'unicode-display_size'
## 0.2.0
- Update EastAsianWidth from 6.0 to 7.0
- Don't build index table automatically when not available
- Don't include EastAsianWidth.txt in gem (only index)
## 0.1.0
- Fix github issue #1
## 0.1.0
- Initial release
unicode-display_width-1.6.1/data/ 0000755 0000041 0000041 00000000000 13626455771 016770 5 ustar www-data www-data unicode-display_width-1.6.1/data/display_width.marshal.gz 0000644 0000041 0000041 00000003106 13626455771 023624 0 ustar www-data www-data
[binary gzip data omitted]
unicode-display_width-1.6.1/unicode-display_width.gemspec 0000644 0000041 0000041 00000004422 13626455771 023716 0 ustar www-data www-data
#########################################################
# This file has been automatically generated by gem2tgz #
#########################################################
# -*- encoding: utf-8 -*-
# stub: unicode-display_width 1.6.1 ruby lib
Gem::Specification.new do |s|
  s.name = "unicode-display_width".freeze
  s.version = "1.6.1"

  s.required_rubygems_version = Gem::Requirement.new(">= 0".freeze) if s.respond_to? :required_rubygems_version=
  s.metadata = { "bug_tracker_uri" => "https://github.com/janlelis/unicode-display_width/issues", "changelog_uri" => "https://github.com/janlelis/unicode-display_width/blob/master/CHANGELOG.md", "source_code_uri" => "https://github.com/janlelis/unicode-display_width" } if s.respond_to? :metadata=
  s.require_paths = ["lib".freeze]
  s.authors = ["Jan Lelis".freeze]
  s.date = "2020-01-16"
  s.description = "[Unicode 12.1.0] Determines the monospace display width of a string using EastAsianWidth.txt, Unicode general category, and other data.".freeze
  s.email = ["hi@ruby.consulting".freeze]
  s.extra_rdoc_files = ["CHANGELOG.md".freeze, "MIT-LICENSE.txt".freeze, "README.md".freeze]
  s.files = ["CHANGELOG.md".freeze, "MIT-LICENSE.txt".freeze, "README.md".freeze, "data/display_width.marshal.gz".freeze, "lib/unicode/display_width.rb".freeze, "lib/unicode/display_width/constants.rb".freeze, "lib/unicode/display_width/index.rb".freeze, "lib/unicode/display_width/no_string_ext.rb".freeze, "lib/unicode/display_width/string_ext.rb".freeze]
  s.homepage = "https://github.com/janlelis/unicode-display_width".freeze
  s.licenses = ["MIT".freeze]
  s.required_ruby_version = Gem::Requirement.new(">= 1.9.3".freeze)
  s.rubygems_version = "2.5.2.1".freeze
  s.summary = "Determines the monospace display width of a string in Ruby.".freeze

  if s.respond_to? :specification_version then
    s.specification_version = 4

    if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
      s.add_development_dependency(%q<rake>.freeze, ["~> 10.4"])
      s.add_development_dependency(%q<rspec>.freeze, ["~> 3.4"])
    else
      s.add_dependency(%q<rake>.freeze, ["~> 10.4"])
      s.add_dependency(%q<rspec>.freeze, ["~> 3.4"])
    end
  else
    s.add_dependency(%q<rake>.freeze, ["~> 10.4"])
    s.add_dependency(%q<rspec>.freeze, ["~> 3.4"])
  end
end
unicode-display_width-1.6.1/lib/ 0000755 0000041 0000041 00000000000 13626455771 016625 5 ustar www-data www-data unicode-display_width-1.6.1/lib/unicode/ 0000755 0000041 0000041 00000000000 13626455771 020253 5 ustar www-data www-data unicode-display_width-1.6.1/lib/unicode/display_width.rb 0000644 0000041 0000041 00000003415 13626455771 023447 0 ustar www-data www-data require_relative 'display_width/constants'
require_relative 'display_width/index'
module Unicode
  module DisplayWidth
    # Divisors used to walk the nested INDEX, one level per entry
    DEPTHS = [0x10000, 0x1000, 0x100, 0x10].freeze

    # Returns the display width of string. Ambiguous characters count as
    # `ambiguous`, and `overwrite` (anything responding to #[]) overrides
    # the width of single code points. Pass `emoji: true` in options to
    # adjust for emoji modifier and ZWJ sequences.
    def self.of(string, ambiguous = 1, overwrite = {}, options = {})
      res = string.codepoints.inject(0){ |total_width, codepoint|
        index_or_value = INDEX
        codepoint_depth_offset = codepoint
        DEPTHS.each{ |depth|
          index_or_value         = index_or_value[codepoint_depth_offset / depth]
          codepoint_depth_offset = codepoint_depth_offset % depth
          # A non-array entry (a width or nil) applies to the whole remaining sub-range
          break unless index_or_value.is_a? Array
        }
        width = index_or_value.is_a?(Array) ? index_or_value[codepoint_depth_offset] : index_or_value
        width = ambiguous if width == :A
        total_width + (overwrite[codepoint] || width || 1)
      }

      res -= emoji_extra_width_of(string, ambiguous, overwrite) if options[:emoji]
      # The total width is never negative (e.g. after leading backspaces)
      res < 0 ? 0 : res
    end

    # Returns the width to subtract for emoji sequences: 2 per modifier,
    # plus the width of each character that directly follows a ZWJ.
    def self.emoji_extra_width_of(string, ambiguous = 1, overwrite = {}, _ = {})
      require "unicode/emoji"

      extra_width = 0
      modifier_regex = /[#{ Unicode::Emoji::EMOJI_MODIFIERS.pack("U*") }]/
      zwj_regex = /(?<=#{ [Unicode::Emoji::ZWJ].pack("U") })./

      string.scan(Unicode::Emoji::REGEX){ |emoji|
        extra_width += 2 * emoji.scan(modifier_regex).size

        emoji.scan(zwj_regex){ |zwj_succ|
          extra_width += self.of(zwj_succ, ambiguous, overwrite)
        }
      }

      extra_width
    end
  end
end

# Allows you to opt-out of the default string extension. Will eventually be removed,
# so you must opt-in for the core extension by requiring 'display_width/string_ext'
unless defined?(Unicode::DisplayWidth::NO_STRING_EXT) && Unicode::DisplayWidth::NO_STRING_EXT
  require_relative 'display_width/string_ext'
end
unicode-display_width-1.6.1/lib/unicode/display_width/ 0000755 0000041 0000041 00000000000 13626455771 023117 5 ustar www-data www-data unicode-display_width-1.6.1/lib/unicode/display_width/string_ext.rb 0000644 0000041 0000041 00000001043 13626455771 025630 0 ustar www-data www-data require_relative '../display_width' unless defined? Unicode::DisplayWidth
class String
  def display_width(ambiguous = 1, overwrite = {}, options = {})
    Unicode::DisplayWidth.of(self, ambiguous, overwrite, options)
  end

  def display_size(*args)
    warn "Deprecation warning: Please use `String#display_width` instead of `String#display_size`"
    display_width(*args)
  end

  def display_length(*args)
    warn "Deprecation warning: Please use `String#display_width` instead of `String#display_length`"
    display_width(*args)
  end
end
unicode-display_width-1.6.1/lib/unicode/display_width/index.rb 0000644 0000041 0000041 00000000454 13626455771 024556 0 ustar www-data www-data require 'zlib'
require_relative 'constants'
module Unicode
  module DisplayWidth
    File.open(INDEX_FILENAME, "rb") do |file|
      serialized_data = Zlib::GzipReader.new(file).read
      serialized_data.force_encoding Encoding::BINARY
      INDEX = Marshal.load(serialized_data)
    end
  end
end
unicode-display_width-1.6.1/lib/unicode/display_width/constants.rb 0000644 0000041 0000041 00000000417 13626455771 025462 0 ustar www-data www-data module Unicode
  module DisplayWidth
    VERSION = '1.6.1'
    UNICODE_VERSION = "12.1.0".freeze
    DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + '/../../../data/').freeze
    INDEX_FILENAME = (DATA_DIRECTORY + '/display_width.marshal.gz').freeze
  end
end
unicode-display_width-1.6.1/lib/unicode/display_width/no_string_ext.rb 0000644 0000041 0000041 00000000155 13626455771 026327 0 ustar www-data www-data module Unicode
  module DisplayWidth
    NO_STRING_EXT = true
  end
end
require_relative '../display_width'
unicode-display_width-1.6.1/MIT-LICENSE.txt 0000644 0000041 0000041 00000002071 13626455771 020331 0 ustar www-data www-data The MIT LICENSE
Copyright (c) 2011, 2015-2020 Jan Lelis
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.