wcwidth-0.1.7/LICENSE.txt

The MIT License (MIT)

Copyright (c) 2014 Jeff Quast

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

wcwidth-0.1.7/MANIFEST.in

include LICENSE.txt *.rst

wcwidth-0.1.7/PKG-INFO

Metadata-Version: 1.1
Name: wcwidth
Version: 0.1.7
Summary: Measures number of Terminal column cells of wide-character codes
Home-page: https://github.com/jquast/wcwidth
Author: Jeff Quast
Author-email: contact@jeffquast.com
License: MIT
Keywords: terminal,emulator,wcwidth,wcswidth,cjk,combining,xterm,console
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Localization
Classifier: Topic :: Software Development :: Internationalization
Classifier: Topic :: Terminals

wcwidth-0.1.7/README.rst

.. image:: https://img.shields.io/travis/jquast/wcwidth.svg
    :target: https://travis-ci.org/jquast/wcwidth
    :alt: Travis Continuous Integration

.. image:: https://img.shields.io/coveralls/jquast/wcwidth.svg
    :target: https://coveralls.io/r/jquast/wcwidth
    :alt: Coveralls Code Coverage

.. image:: https://img.shields.io/pypi/v/wcwidth.svg
    :target: https://pypi.python.org/pypi/wcwidth/
    :alt: Latest Version

.. image:: https://img.shields.io/github/license/jquast/wcwidth.svg
    :target: https://pypi.python.org/pypi/wcwidth/
    :alt: License

.. image:: https://img.shields.io/pypi/wheel/wcwidth.svg
    :alt: Wheel Status

.. image:: https://img.shields.io/pypi/dm/wcwidth.svg
    :target: https://pypi.python.org/pypi/wcwidth/
    :alt: Downloads

============
Introduction
============

This library is mainly for those implementing a terminal emulator, or programs
that carefully produce output to be interpreted by one.

**Problem Statement**: When printed to the screen, the length of the string is
usually equal to the number of cells it occupies.
However, there are categories of characters that occupy 2 cells (full-width),
and others that occupy 0.

**Solution**: POSIX.1-2001 and POSIX.1-2008 conforming systems provide
`wcwidth(3)`_ and `wcswidth(3)`_ C functions, whose behavior this Python
module's functions precisely copy. *These functions return the number of
cells a unicode string is expected to occupy.*

This library aims to be forward-looking, portable, and most correct.
The most current release of this API is based on
the Unicode Standard release files:

``DerivedGeneralCategory-9.0.0.txt``
  *Date: 2016-06-01, 10:34:26 GMT*
  © 2016 Unicode®, Inc.

``EastAsianWidth-9.0.0.txt``
  *Date: 2016-05-27, 17:00:00 GMT [KW, LI]*
  © 2016 Unicode®, Inc.

Installation
------------

The stable version of this package is maintained on PyPI; install it using
pip::

    pip install wcwidth

Example
-------

To display ``u'コンニチハ'`` right-adjusted on a screen of 80 columns::

    >>> from wcwidth import wcswidth
    >>> text = u'コンニチハ'
    >>> text_len = wcswidth(text)
    >>> print(u' ' * (80 - text_len) + text)

wcwidth, wcswidth
-----------------

Use function ``wcwidth()`` to determine the length of a *single unicode
character*, and ``wcswidth()`` to determine the length of several, or a
*string of unicode characters*.

Briefly, return values of function ``wcwidth()`` are:

``-1``
  Indeterminate (not printable).

``0``
  Does not advance the cursor, such as NULL or Combining.

``2``
  Characters of category East Asian Wide (W) or East Asian
  Full-width (F) which are displayed using two terminal cells.

``1``
  All others.

Function ``wcswidth()`` simply returns the sum of all values for each
character along a string, or ``-1`` if any character in the string has a
width of ``-1``.

More documentation is available using pydoc::

    $ pydoc wcwidth

=======
Caveats
=======

This library attempts to determine the printable width of text as it would be
displayed by an unknown target terminal emulator. It does not provide any
ability to discern what the target emulator software, version, or level of
support is. Results may vary!
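
One concrete source of such variation is the East Asian Ambiguous (A) width
class: characters such as ``±`` or ``α`` are counted as a single cell here
(only the W and F classes are treated as wide), while terminals configured for
CJK locales commonly draw them using two cells. The following minimal sketch
flags those characters using only the standard library's
``unicodedata.east_asian_width()``; the helper name ``ambiguous_width_chars``
is hypothetical and not part of this package.

```python
import unicodedata


def ambiguous_width_chars(text):
    """Return the characters of ``text`` in the East Asian Ambiguous class.

    Hypothetical helper: wcwidth() counts these characters as 1 cell, but a
    terminal configured for a CJK locale may display them using 2 cells,
    so any width computed for a string containing them is only an estimate.
    """
    return [char for char in text
            if unicodedata.east_asian_width(char) == 'A']


print(ambiguous_width_chars(u'\u00b1 5 \u03b1 abc'))  # ['±', 'α'] on Python 3
```

A caller could use such a check to decide whether column alignment computed
with ``wcswidth()`` is trustworthy for a given string.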
A `crude method `_ of determining the level of unicode support by the target
emulator may be performed using the VT100 Query Cursor Position sequence.

The libc version of `wcwidth(3)`_ is often several unicode releases behind,
and therefore several levels of support lower than this Python library. You
may determine an exact list of these discrepancies using the project file
`wcwidth-libc-comparator.py `_.

==========
Developing
==========

Install wcwidth in editable mode::

   pip install -e .

Install developer requirements::

   pip install -r requirements-develop.txt

Execute unit tests using tox::

   tox

Updating Tables
---------------

The command ``python setup.py update`` will fetch the following resources:

- http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt
- http://www.unicode.org/Public/UNIDATA/extracted/DerivedGeneralCategory.txt

It then generates the table files:

- `wcwidth/table_wide.py `_
- `wcwidth/table_zero.py `_

Uses
----

This library is used in:

- `jquast/blessed`_, a simplified wrapper around curses.

- `jonathanslenders/python-prompt-toolkit`_, a library for building
  powerful interactive command lines in Python.

Additional tools for displaying and testing wcwidth are found in the
`bin/ `_ folder of this project's source code. They are not distributed.

=======
History
=======

0.1.7 *2016-07-01*
  * **Updated** tables to Unicode Specification 9.0.0. (`PR #18`_).

0.1.6 *2016-01-08 Production/Stable*
  * ``LICENSE`` file now included with distribution.

0.1.5 *2015-09-13 Alpha*
  * **Bugfix**:
    Resolution of "combining_ character width" issue, most especially:
    characters that previously returned -1 now often (correctly) return 0.
    Resolved by `Philip Craig`_ via `PR #11`_.
  * **Deprecated**:
    The module path ``wcwidth.table_comb`` is no longer available;
    it has been superseded by module path ``wcwidth.table_zero``.

0.1.4 *2014-11-20 Pre-Alpha*
  * **Feature**: ``wcswidth()`` now determines printable length
    for (most) combining_ characters.
    The developer's tool `bin/wcwidth-browser.py`_ is improved to display
    combining_ characters when provided the ``--combining`` option
    (`Thomas Ballinger`_ and `Leta Montopoli`_, `PR #5`_).
  * **Feature**: added static analysis (prospector_) to testing
    framework.

0.1.3 *2014-10-29 Pre-Alpha*
  * **Bugfix**: 2nd parameter of wcswidth was not honored.
    (`Thomas Ballinger`_, `PR #4`_).

0.1.2 *2014-10-28 Pre-Alpha*
  * **Updated** tables to Unicode Specification 7.0.0.
    (`Thomas Ballinger`_, `PR #3`_).

0.1.1 *2014-05-14 Pre-Alpha*
  * Initial release to PyPI, based on Unicode Specification 6.3.0.

This code was originally derived directly from C code of the same name,
whose latest version is available at
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::

   * Markus Kuhn -- 2007-05-26 (Unicode 5.0)
   *
   * Permission to use, copy, modify, and distribute this software
   * for any purpose and without fee is hereby granted. The author
   * disclaims all warranties with regard to this software.

.. _`prospector`: https://github.com/landscapeio/prospector
.. _`combining`: https://en.wikipedia.org/wiki/Combining_character
.. _`bin/wcwidth-browser.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
.. _`Thomas Ballinger`: https://github.com/thomasballinger
.. _`Leta Montopoli`: https://github.com/lmontopo
.. _`Philip Craig`: https://github.com/philipc
.. _`PR #3`: https://github.com/jquast/wcwidth/pull/3
.. _`PR #4`: https://github.com/jquast/wcwidth/pull/4
.. _`PR #5`: https://github.com/jquast/wcwidth/pull/5
.. _`PR #11`: https://github.com/jquast/wcwidth/pull/11
.. _`PR #18`: https://github.com/jquast/wcwidth/pull/18
.. _`jquast/blessed`: https://github.com/jquast/blessed
.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
.. _`wcwidth(3)`: http://man7.org/linux/man-pages/man3/wcwidth.3.html
.. _`wcswidth(3)`: http://man7.org/linux/man-pages/man3/wcswidth.3.html

wcwidth-0.1.7/setup.cfg

[bdist_wheel]
universal = 1

[egg_info]
tag_build = 
tag_date = 0
tag_svn_revision = 0

wcwidth-0.1.7/setup.py

#!/usr/bin/env python
"""
Setup module for wcwidth.

https://github.com/jquast/wcwidth

You may execute setup.py with special arguments:

- ``update``: Updates unicode reference files of the project to latest.
- ``test``: Executes test runner (tox)
"""
from __future__ import print_function

import os

import setuptools
import setuptools.command.test

try:
    # py2
    from urllib2 import urlopen
except ImportError:
    # py3
    from urllib.request import urlopen

HERE = os.path.dirname(__file__)

# use chr() for py3.x,
# unichr() for py2.x
try:
    _ = unichr(0)
except NameError as err:
    if err.args[0] == "name 'unichr' is not defined":
        # pylint: disable=C0103,W0622
        #         Invalid constant name "unichr" (col 8)
        #         Redefining built-in 'unichr' (col 8)
        unichr = chr
    else:
        raise


class SetupUpdate(setuptools.Command):

    """
    'setup.py update' fetches and updates local unicode code tables.
    """

    # pylint: disable=R0904
    #         Too many public methods (43/20)
    description = "Fetch and update unicode code tables"
    user_options = []

    EAW_URL = ('http://www.unicode.org/Public/UNIDATA/'
               'EastAsianWidth.txt')
    UCD_URL = ('http://www.unicode.org/Public/UNIDATA/extracted/'
               'DerivedGeneralCategory.txt')

    EAW_IN = os.path.join(HERE, 'data', 'EastAsianWidth.txt')
    UCD_IN = os.path.join(HERE, 'data', 'DerivedGeneralCategory.txt')

    EAW_OUT = os.path.join(HERE, 'wcwidth', 'table_wide.py')
    ZERO_OUT = os.path.join(HERE, 'wcwidth', 'table_zero.py')

    # the file is README.rst; 'README.RST' is a different file on
    # case-sensitive filesystems.
    README_RST = os.path.join(HERE, 'README.rst')
    README_PATCH_FROM = "the Unicode Standard release files:"
    README_PATCH_TO = "Installation"

    def initialize_options(self):
        """Override builtin method: no options are available."""
        pass

    def finalize_options(self):
        """Override builtin method: no options are available."""
        pass

    def run(self):
        """Update east-asian, combining and zero width tables."""
        self._do_east_asian()
        self._do_zero_width()
        self._do_readme_update()

    def _do_readme_update(self):
        """Patch README.rst to reflect the data files used in release."""
        import codecs
        import glob

        # read in,
        data_in = codecs.open(self.README_RST, 'r', 'utf8').read()

        # search for beginning and end positions,
        pos_begin = data_in.find(self.README_PATCH_FROM)
        assert pos_begin != -1, (pos_begin, self.README_PATCH_FROM)
        pos_begin += len(self.README_PATCH_FROM)

        pos_end = data_in.find(self.README_PATCH_TO)
        assert pos_end != -1, (pos_end, self.README_PATCH_TO)

        glob_pattern = os.path.join(HERE, 'data', '*.txt')
        file_descriptions = [
            self._describe_file_header(fpath)
            for fpath in glob.glob(glob_pattern)]

        # patch,
        data_out = (
            data_in[:pos_begin] +
            '\n\n' +
            '\n'.join(file_descriptions) +
            '\n\n' +
            data_in[pos_end:]
        )

        # write.
        print("patching {} ..".format(self.README_RST))
        codecs.open(
            self.README_RST, 'w', 'utf8').write(data_out)

    def _do_east_asian(self):
        """Fetch and update east-asian tables."""
        self._do_retrieve(self.EAW_URL, self.EAW_IN)
        (version, date, values) = self._parse_east_asian(
            fname=self.EAW_IN,
            properties=(u'W', u'F',)
        )
        table = self._make_table(values)
        self._do_write(self.EAW_OUT, 'WIDE_EASTASIAN', version, date, table)

    def _do_zero_width(self):
        """Fetch and update zero width tables."""
        self._do_retrieve(self.UCD_URL, self.UCD_IN)
        (version, date, values) = self._parse_category(
            fname=self.UCD_IN,
            categories=('Me', 'Mn',)
        )
        table = self._make_table(values)
        self._do_write(self.ZERO_OUT, 'ZERO_WIDTH', version, date, table)

    @staticmethod
    def _make_table(values):
        """Return a tuple of inclusive (start, end) ranges for sorted values."""
        import collections
        table = collections.deque()
        start, end = values[0], values[0]
        for num, value in enumerate(values):
            if num == 0:
                table.append((value, value,))
                continue
            start, end = table.pop()
            if end == value - 1:
                # value is contiguous: extend the current range.
                table.append((start, value,))
            else:
                # gap found: keep the current range and begin a new one.
                table.append((start, end,))
                table.append((value, value,))
        return tuple(table)

    @staticmethod
    def _do_retrieve(url, fname):
        """Retrieve given url to target filepath fname."""
        folder = os.path.dirname(fname)
        if not os.path.exists(folder):
            os.makedirs(folder)
            print("{}/ created.".format(folder))
        if not os.path.exists(fname):
            with open(fname, 'wb') as fout:
                print("retrieving {}.".format(url))
                resp = urlopen(url)
                fout.write(resp.read())
            print("{} saved.".format(fname))
        else:
            print("re-using artifact {}".format(fname))
        return fname

    @staticmethod
    def _describe_file_header(fpath):
        """Return an RST description built from the first 3 lines of fpath."""
        import codecs
        header_3 = [line.lstrip('# ').rstrip() for line in
                    codecs.open(fpath, 'r', 'utf8').readlines()[:3]]
        return ('``{0}``\n'     # ``EastAsianWidth-8.0.0.txt``
                '  *{1}*\n'     # *2015-02-10, 21:00:00 GMT [KW, LI]*
                '  {2}\n'       # (c) 2016 Unicode(R), Inc.
                .format(*header_3))

    @staticmethod
    def _parse_east_asian(fname, properties=(u'W', u'F',)):
        """Parse unicode east-asian width tables."""
        version, date, values = None, None, []
        print("parsing {} ..".format(fname))
        for line in open(fname, 'rb'):
            uline = line.decode('utf-8')
            if version is None:
                version = uline.split(None, 1)[1].rstrip()
                continue
            elif date is None:
                date = uline.split(':', 1)[1].rstrip()
                continue
            if uline.startswith('#') or not uline.lstrip():
                continue
            addrs, details = uline.split(';', 1)
            if any(details.startswith(property)
                   for property in properties):
                start, stop = addrs, addrs
                if '..' in addrs:
                    start, stop = addrs.split('..')
                values.extend(range(int(start, 16), int(stop, 16) + 1))
        return version, date, sorted(values)

    @staticmethod
    def _parse_category(fname, categories):
        """Parse unicode category tables."""
        version, date, values = None, None, []
        print("parsing {} ..".format(fname))
        for line in open(fname, 'rb'):
            uline = line.decode('utf-8')
            if version is None:
                version = uline.split(None, 1)[1].rstrip()
                continue
            elif date is None:
                date = uline.split(':', 1)[1].rstrip()
                continue
            if uline.startswith('#') or not uline.lstrip():
                continue
            addrs, details = uline.split(';', 1)
            addrs, details = addrs.rstrip(), details.lstrip()
            if any(details.startswith('{} #'.format(value))
                   for value in categories):
                start, stop = addrs, addrs
                if '..' in addrs:
                    start, stop = addrs.split('..')
                values.extend(range(int(start, 16), int(stop, 16) + 1))
        return version, date, sorted(values)

    @staticmethod
    def _do_write(fname, variable, version, date, table):
        """Write a lookup table (wide or zero-width) to disk as python code."""
        # pylint: disable=R0914
        #         Too many local variables (19/15) (col 4)
        print("writing {} ..".format(fname))
        import unicodedata
        import datetime
        import string
        utc_now = datetime.datetime.utcnow()
        indent = 4
        with open(fname, 'w') as fout:
            fout.write(
                '"""{variable_proper} table. Created by setup.py."""\n'
                "# Generated: {iso_utc}\n"
                "# Source: {version}\n"
                "# Date: {date}\n"
                "{variable} = (".format(iso_utc=utc_now.isoformat(),
                                        version=version,
                                        date=date,
                                        variable=variable,
                                        variable_proper=variable.title()))
            for start, end in table:
                ucs_start, ucs_end = unichr(start), unichr(end)
                hex_start, hex_end = ('0x{0:04x}'.format(start),
                                      '0x{0:04x}'.format(end))
                try:
                    name_start = string.capwords(unicodedata.name(ucs_start))
                except ValueError:
                    name_start = u''
                try:
                    name_end = string.capwords(unicodedata.name(ucs_end))
                except ValueError:
                    name_end = u''
                fout.write('\n' + (' ' * indent))
                fout.write('({0}, {1},),'.format(hex_start, hex_end))
                fout.write('  # {0:24s}..{1}'.format(
                    name_start[:24].rstrip() or '(nil)',
                    name_end[:24].rstrip()))
            fout.write('\n)\n')
        print("complete.")


def main():
    """Setup.py entry point."""
    import codecs
    setuptools.setup(
        name='wcwidth',
        version='0.1.7',
        description=("Measures number of Terminal column cells "
                     "of wide-character codes"),
        long_description=codecs.open(
            os.path.join(HERE, 'README.rst'), 'r', 'utf8').read(),
        author='Jeff Quast',
        author_email='contact@jeffquast.com',
        license='MIT',
        packages=['wcwidth', 'wcwidth.tests'],
        url='https://github.com/jquast/wcwidth',
        include_package_data=True,
        test_suite='wcwidth.tests',
        zip_safe=True,
        classifiers=[
            'Intended Audience :: Developers',
            'Natural Language :: English',
            'Development Status :: 3 - Alpha',
            'Environment :: Console',
            'License :: OSI Approved :: MIT License',
            'Operating System :: POSIX',
            'Programming Language :: Python :: 2.7',
            'Programming Language :: Python :: 3.4',
            'Programming Language :: Python :: 3.5',
            'Topic :: Software Development :: Libraries',
            'Topic :: Software Development :: Localization',
            'Topic :: Software Development :: Internationalization',
            'Topic :: Terminals'
        ],
        keywords=['terminal', 'emulator', 'wcwidth', 'wcswidth', 'cjk',
                  'combining', 'xterm', 'console', ],
        cmdclass={'update': SetupUpdate},
    )

if __name__ == '__main__':
    main()
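
The tables written by ``setup.py update`` hold sorted, inclusive
``(start, end)`` ranges: ``_make_table`` merges runs of consecutive code
points, and ``_bisearch`` in ``wcwidth/wcwidth.py`` binary-searches them. The
following standalone sketch reproduces that idea under the assumption that the
input code points are sorted and unique; the names ``make_ranges`` and
``in_ranges`` are hypothetical, and ``in_ranges`` uses the standard ``bisect``
module instead of the hand-written loop in ``_bisearch``.

```python
import bisect


def make_ranges(values):
    """Collapse sorted, unique code points into inclusive (start, end) ranges."""
    ranges = []
    for value in values:
        if ranges and ranges[-1][1] == value - 1:
            # value extends the current run of consecutive code points
            ranges[-1] = (ranges[-1][0], value)
        else:
            # gap found (or first value): begin a new range
            ranges.append((value, value))
    return tuple(ranges)


def in_ranges(ucs, ranges):
    """Return 1 if ucs falls inside any range, else 0 (like _bisearch)."""
    starts = [start for start, _ in ranges]
    # index of the last range whose start is <= ucs, or -1 if none
    idx = bisect.bisect_right(starts, ucs) - 1
    if idx >= 0 and ucs <= ranges[idx][1]:
        return 1
    return 0


table = make_ranges([0x300, 0x301, 0x302, 0x483, 0x484, 0x488])
print(table)                    # ((768, 770), (1155, 1156), (1160, 1160))
print(in_ranges(0x301, table))  # 1
print(in_ranges(0x303, table))  # 0
```

Storing ranges rather than individual code points keeps the generated tables
small, and each lookup costs O(log n) comparisons; a production version would
build the ``starts`` list once instead of on every call.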
wcwidth-0.1.7/wcwidth/__init__.py

"""wcwidth module, https://github.com/jquast/wcwidth."""
from .wcwidth import wcwidth, wcswidth  # noqa

__all__ = ('wcwidth', 'wcswidth',)

wcwidth-0.1.7/wcwidth/tests/__init__.py

"""This file intentionally left blank."""

wcwidth-0.1.7/wcwidth/tests/test_core.py

# coding: utf-8
"""Core tests module for wcwidth."""
import wcwidth


def test_hello_jp():
    u"""
    Width of Japanese phrase: コンニチハ, セカイ!

    Given a phrase of 5 and 3 Katakana characters, joined with 3 ASCII
    characters (comma, space and exclamation mark), totaling 11 characters,
    this phrase consumes 19 cells of a terminal emulator.
    """
    # given,
    phrase = u'コンニチハ, セカイ!'
    expect_length_each = (2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1)
    expect_length_phrase = sum(expect_length_each)

    # exercise,
    length_each = tuple(map(wcwidth.wcwidth, phrase))
    length_phrase = wcwidth.wcswidth(phrase)

    # verify,
    assert length_each == expect_length_each
    assert length_phrase == expect_length_phrase


def test_wcswidth_substr():
    """
    Test wcswidth() optional 2nd parameter, ``n``.

    ``n`` determines at which position of the string
    to stop counting length.
    """
    # given,
    phrase = u'コンニチハ, セカイ!'
    end = 7
    expect_length_each = (2, 2, 2, 2, 2, 1, 1,)
    expect_length_phrase = sum(expect_length_each)

    # exercise,
    length_phrase = wcwidth.wcswidth(phrase, end)

    # verify,
    assert length_phrase == expect_length_phrase


def test_null_width_0():
    """NULL (0) reports width 0."""
    # given,
    phrase = u'abc\x00def'
    expect_length_each = (1, 1, 1, 0, 1, 1, 1)
    expect_length_phrase = sum(expect_length_each)

    # exercise,
    length_each = tuple(map(wcwidth.wcwidth, phrase))
    length_phrase = wcwidth.wcswidth(phrase, len(phrase))

    # verify,
    assert length_each == expect_length_each
    assert length_phrase == expect_length_phrase


def test_control_c0_width_negative_1():
    """The ESC beginning a CSI (Control Sequence Introducer) reports width -1."""
    # given,
    phrase = u'\x1b[0m'
    expect_length_each = (-1, 1, 1, 1)
    expect_length_phrase = -1

    # exercise,
    length_each = tuple(map(wcwidth.wcwidth, phrase))
    length_phrase = wcwidth.wcswidth(phrase, len(phrase))

    # verify,
    assert length_each == expect_length_each
    assert length_phrase == expect_length_phrase


def test_combining_width_negative_1():
    """A combining character between ASCII characters: total width of 4."""
    # given,
    phrase = u'--\u05bf--'
    expect_length_each = (1, 1, 0, 1, 1)
    expect_length_phrase = 4

    # exercise,
    length_each = tuple(map(wcwidth.wcwidth, phrase))
    length_phrase = wcwidth.wcswidth(phrase, len(phrase))

    # verify,
    assert length_each == expect_length_each
    assert length_phrase == expect_length_phrase


def test_combining_cafe():
    u"""Phrase cafe + COMBINING ACUTE ACCENT is café of length 4."""
    phrase = u"cafe\u0301"
    expect_length_each = (1, 1, 1, 1, 0)
    expect_length_phrase = 4

    # exercise,
    length_each = tuple(map(wcwidth.wcwidth, phrase))
    length_phrase = wcwidth.wcswidth(phrase, len(phrase))

    # verify,
    assert length_each == expect_length_each
    assert length_phrase == expect_length_phrase


def test_combining_enclosing():
    u"""CYRILLIC CAPITAL LETTER A + COMBINING CYRILLIC HUNDRED THOUSANDS SIGN is А҈ of length 1."""
    phrase = u"\u0410\u0488"
    expect_length_each = (1, 0)
    expect_length_phrase = 1

    # exercise,
    length_each = tuple(map(wcwidth.wcwidth, phrase))
    length_phrase = wcwidth.wcswidth(phrase, len(phrase))

    # verify,
    assert length_each == expect_length_each
    assert length_phrase == expect_length_phrase


def test_combining_spacing():
    u"""Balinese kapal (ship) is ᬓᬨᬮ᭄ of length 4."""
    phrase = u"\u1B13\u1B28\u1B2E\u1B44"
    expect_length_each = (1, 1, 1, 1)
    expect_length_phrase = 4

    # exercise,
    length_each = tuple(map(wcwidth.wcwidth, phrase))
    length_phrase = wcwidth.wcswidth(phrase, len(phrase))

    # verify,
    assert length_each == expect_length_each
    assert length_phrase == expect_length_phrase

wcwidth-0.1.7/wcwidth/wcwidth.py

"""
This is an implementation of wcwidth() and wcswidth().

Defined in IEEE Std 1003.1-2001.

https://github.com/jquast/wcwidth

from Markus Kuhn's C code at:

    http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

This is an implementation of wcwidth() and wcswidth() (defined in
IEEE Std 1003.1-2001) for Unicode.

http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html

In fixed-width output devices, Latin characters all occupy a single
"cell" position of equal width, whereas ideographic CJK characters
occupy two such cells. Interoperability between terminal-line
applications and (teletype-style) character terminals using the
UTF-8 encoding requires agreement on which character should advance
the cursor by how many cell positions. No established formal
standards exist at present on which Unicode character shall occupy
how many cell positions on character terminals. These routines are
a first attempt of defining such behavior based on simple rules
applied to data provided by the Unicode Consortium.

For some graphical characters, the Unicode standard explicitly
defines a character-cell width via the definition of the East Asian
FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes.
In all these cases, there is no ambiguity about which width a
terminal shall use. For characters in the East Asian Ambiguous (A)
class, the width choice depends purely on a preference of backward
compatibility with either historic CJK or Western practice.
Choosing single-width for these characters is easy to justify as
the appropriate long-term solution, as the CJK practice of
displaying these characters as double-width comes from historic
implementation simplicity (8-bit encoded characters were displayed
single-width and 16-bit ones double-width, even for Greek,
Cyrillic, etc.) and not any typographic considerations.

Much less clear is the choice of width for the Not East Asian
(Neutral) class. Existing practice does not dictate a width for any
of these characters. It would nevertheless make sense
typographically to allocate two character cells to characters such
as for instance EM SPACE or VOLUME INTEGRAL, which cannot be
represented adequately with a single-width glyph. The following
routines at present merely assign a single-cell width to all
neutral characters, in the interest of simplicity. This is not
entirely satisfactory and should be reconsidered before
establishing a formal standard in this area. At the moment, the
decision which Not East Asian (Neutral) characters should be
represented by double-width glyphs cannot yet be answered by
applying a simple rule from the Unicode database content. Setting
up a proper standard for the behavior of UTF-8 character terminals
will require a careful analysis not only of each Unicode character,
but also of each presentation form, something the author of these
routines has avoided to do so far.

http://www.unicode.org/unicode/reports/tr11/

Markus Kuhn -- 2007-05-26 (Unicode 5.0)

Permission to use, copy, modify, and distribute this software
for any purpose and without fee is hereby granted. The author
disclaims all warranties with regard to this software.
Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
"""

from __future__ import division
from .table_wide import WIDE_EASTASIAN
from .table_zero import ZERO_WIDTH


def _bisearch(ucs, table):
    """
    Auxiliary function for binary search in interval table.

    :arg int ucs: Ordinal value of unicode character.
    :arg list table: List of starting and ending ranges of ordinal values,
        in form of ``[(start, end), ...]``.
    :rtype: int
    :returns: 1 if ordinal value ucs is found within lookup table, else 0.
    """
    lbound = 0
    ubound = len(table) - 1

    if ucs < table[0][0] or ucs > table[ubound][1]:
        return 0
    while ubound >= lbound:
        mid = (lbound + ubound) // 2
        if ucs > table[mid][1]:
            lbound = mid + 1
        elif ucs < table[mid][0]:
            ubound = mid - 1
        else:
            return 1

    return 0


def wcwidth(wc):
    r"""
    Given one unicode character, return its printable length on a terminal.

    The wcwidth() function returns 0 if the wc argument has no printable effect
    on a terminal (such as NUL '\0'), -1 if wc is not printable, or has an
    indeterminate effect on the terminal, such as a control character.
    Otherwise, the number of column positions the character occupies on a
    graphic terminal (1 or 2) is returned.

    The following have a column width of -1:

        - C0 control characters (U+0001 through U+001F).

        - C1 control characters and DEL (U+007F through U+009F).

    The following have a column width of 0:

        - Non-spacing and enclosing combining characters (general
          category code Mn or Me in the Unicode database).

        - NULL (U+0000, 0).

        - COMBINING GRAPHEME JOINER (U+034F).

        - ZERO WIDTH SPACE (U+200B) through
          RIGHT-TO-LEFT MARK (U+200F).

        - LINE SEPARATOR (U+2028) and
          PARAGRAPH SEPARATOR (U+2029).

        - LEFT-TO-RIGHT EMBEDDING (U+202A) through
          RIGHT-TO-LEFT OVERRIDE (U+202E).

        - WORD JOINER (U+2060) through
          INVISIBLE SEPARATOR (U+2063).

    The following have a column width of 1:

        - SOFT HYPHEN (U+00AD) has a column width of 1.

        - All remaining characters (including all printable
          ISO 8859-1 and WGL4 characters, Unicode control characters,
          etc.)
          have a column width of 1.

    The following have a column width of 2:

        - Spacing characters in the East Asian Wide (W) or East Asian
          Full-width (F) category as defined in Unicode Technical
          Report #11 have a column width of 2.
    """
    # pylint: disable=C0103
    #         Invalid argument name "wc"
    ucs = ord(wc)

    # NOTE: created by hand, there isn't anything identifiable other than
    # general Cf category code to identify these, and some characters in Cf
    # category code are of non-zero width.

    # pylint: disable=too-many-boolean-expressions
    #         Too many boolean expressions in if statement (7/5)
    if (ucs == 0 or
            ucs == 0x034F or
            0x200B <= ucs <= 0x200F or
            ucs == 0x2028 or
            ucs == 0x2029 or
            0x202A <= ucs <= 0x202E or
            0x2060 <= ucs <= 0x2063):
        return 0

    # C0/C1 control characters
    if ucs < 32 or 0x07F <= ucs < 0x0A0:
        return -1

    # combining characters with zero width
    if _bisearch(ucs, ZERO_WIDTH):
        return 0

    return 1 + _bisearch(ucs, WIDE_EASTASIAN)


def wcswidth(pwcs, n=None):
    """
    Given a unicode string, return its printable length on a terminal.

    Return the width, in cells, necessary to display the first ``n``
    characters of the unicode string ``pwcs``.  When ``n`` is None (default),
    return the length of the entire string.

    Returns ``-1`` if a non-printable character is encountered.
    """
    # pylint: disable=C0103
    #         Invalid argument name "n"

    end = len(pwcs) if n is None else n
    idx = slice(0, end)
    width = 0
    for char in pwcs[idx]:
        wcw = wcwidth(char)
        if wcw < 0:
            return -1
        else:
            width += wcw
    return width
wcwidth-0.1.7/wcwidth.egg-info/
wcwidth-0.1.7/wcwidth.egg-info/dependency_links.txt

wcwidth-0.1.7/wcwidth.egg-info/PKG-INFO
Metadata-Version: 1.1
Name: wcwidth
Version: 0.1.7
Summary: Measures number of Terminal column cells of wide-character codes
Home-page: https://github.com/jquast/wcwidth
Author: Jeff Quast
Author-email: contact@jeffquast.com
License: MIT
Description: .. image:: https://img.shields.io/travis/jquast/wcwidth.svg
    :target: https://travis-ci.org/jquast/wcwidth
    :alt: Travis Continuous Integration

.. image:: https://img.shields.io/coveralls/jquast/wcwidth.svg
    :target: https://coveralls.io/r/jquast/wcwidth
    :alt: Coveralls Code Coverage

.. image:: https://img.shields.io/pypi/v/wcwidth.svg
    :target: https://pypi.python.org/pypi/wcwidth/
    :alt: Latest Version

.. image:: https://img.shields.io/github/license/jquast/wcwidth.svg
    :target: https://pypi.python.org/pypi/wcwidth/
    :alt: License

.. image:: https://img.shields.io/pypi/wheel/wcwidth.svg
    :alt: Wheel Status

.. image:: https://img.shields.io/pypi/dm/wcwidth.svg
    :target: https://pypi.python.org/pypi/wcwidth/
    :alt: Downloads

============
Introduction
============

This library is mainly for those implementing a terminal emulator, or programs
that carefully produce output to be interpreted by one.

**Problem Statement**: When printed to the screen, the length of the string is
usually equal to the number of cells it occupies.  However, there are
categories of characters that occupy 2 cells (full-width), and others that
occupy 0.
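To make the problem concrete, the following rough sketch estimates cell widths using only Python's standard ``unicodedata`` module. It is an approximation for illustration, not this library's implementation: the helper name ``approx_cell_width`` is hypothetical, it relies on the Unicode database bundled with the running Python rather than on wcwidth's generated tables, and it does not return ``-1`` for control characters.

```python
import unicodedata


def approx_cell_width(text):
    """Estimate the terminal cells needed to display ``text``.

    Hypothetical helper, not part of wcwidth: non-spacing (Mn) and enclosing
    (Me) marks count as 0 cells, East Asian Wide (W) and Fullwidth (F)
    characters count as 2, and everything else (including control
    characters, which wcwidth would reject with -1) counts as 1.
    """
    width = 0
    for char in text:
        if unicodedata.category(char) in ('Mn', 'Me'):
            continue  # combining mark: drawn over the previous cell
        if unicodedata.east_asian_width(char) in ('W', 'F'):
            width += 2  # full-width: occupies two cells
        else:
            width += 1
    return width


for sample in (u'abc', u'\u30b3\u30f3\u30cb\u30c1\u30cf', u'e\u0301'):
    # len() counts code points; the estimate counts terminal cells.
    print(len(sample), approx_cell_width(sample))
```

Here ``len()`` and the cell count agree for ``abc`` (3 and 3), but differ for the katakana ``コンニチハ`` (5 code points, 10 cells) and for ``e`` followed by COMBINING ACUTE ACCENT (2 code points, 1 cell).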
**Solution**: POSIX.1-2001 and POSIX.1-2008 conforming systems provide
`wcwidth(3)`_ and `wcswidth(3)`_ C functions, which this python module's
functions precisely copy.  *These functions return the number of cells a
unicode string is expected to occupy.*

This library aims to be forward-looking, portable, and most correct.  The most
current release of this API is based on the Unicode Standard release files:

``DerivedGeneralCategory-9.0.0.txt``
  *Date: 2016-06-01, 10:34:26 GMT*
  © 2016 Unicode®, Inc.

``EastAsianWidth-9.0.0.txt``
  *Date: 2016-05-27, 17:00:00 GMT [KW, LI]*
  © 2016 Unicode®, Inc.

Installation
------------

The stable version of this package is maintained on pypi, install using pip::

    pip install wcwidth

Example
-------

To display ``u'コンニチハ'`` right-adjusted on a screen of 80 columns::

    >>> from wcwidth import wcswidth
    >>> text = u'コンニチハ'
    >>> text_len = wcswidth(text)
    >>> print(u' ' * (80 - text_len) + text)

wcwidth, wcswidth
-----------------

Use function ``wcwidth()`` to determine the length of a *single unicode
character*, and ``wcswidth()`` to determine the length of several, or a
*string of unicode characters*.

Briefly, return values of function ``wcwidth()`` are:

``-1``
  Indeterminate (not printable).

``0``
  Does not advance the cursor, such as NULL or Combining.

``2``
  Characters of category East Asian Wide (W) or East Asian
  Full-width (F) which are displayed using two terminal cells.

``1``
  All others.

Function ``wcswidth()`` simply returns the sum of all values for each character
along a string, or ``-1`` when a non-printable character occurs anywhere along
the string.

More documentation is available using pydoc::

    $ pydoc wcwidth

=======
Caveats
=======

This library attempts to determine the printable width of a string as
displayed by an unknown target terminal emulator.  It does not provide any
ability to discern what the target emulator software, version, or level of
support is.  Results may vary!
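Because the result depends on the terminal actually in use, it can help to measure that terminal directly. A minimal sketch, assuming a VT100-compatible terminal: writing the Device Status Report request ``ESC [ 6 n`` asks the terminal to answer with a Cursor Position Report ``ESC [ row ; column R``, so requesting the position before and after printing a string shows how many cells the terminal really advanced. The helper names below are hypothetical and not part of wcwidth; only the reply parsing is shown.

```python
import re

# Device Status Report request; a VT100-compatible terminal answers with a
# Cursor Position Report of the form ESC [ row ; column R (1-based).
QUERY_CURSOR_POSITION = '\x1b[6n'
_CPR_PATTERN = re.compile(r'\x1b\[(\d+);(\d+)R')


def parse_cursor_position(reply):
    """Return (row, column) from a Cursor Position Report, or None."""
    match = _CPR_PATTERN.search(reply)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def measured_width(reply_before, reply_after):
    """Cells the terminal advanced between two reports on the same row."""
    before = parse_cursor_position(reply_before)
    after = parse_cursor_position(reply_after)
    if before is None or after is None or before[0] != after[0]:
        # Unparseable reply, or the output wrapped onto another row.
        return None
    return after[1] - before[1]


# Example replies, as a terminal might send them before and after printing
# a 10-cell string that starts in column 1 of row 5.
print(measured_width('\x1b[5;1R', '\x1b[5;11R'))  # 10
```

Reading the replies requires switching the terminal into raw mode (for example with the standard ``tty`` and ``termios`` modules) and reading from standard input; that part is omitted here. A measured width that differs from ``wcswidth(text)`` points to a discrepancy between the terminal and this library's tables.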
A `crude method `_ of determining the level of unicode support by the target
emulator may be performed using the VT100 Query Cursor Position sequence.

The libc version of `wcwidth(3)`_ is often several unicode releases behind,
and therefore several levels of support lower than this python library.  You
may determine an exact list of these discrepancies using the project
file `wcwidth-libc-comparator.py `_.

==========
Developing
==========

Install wcwidth in editable mode::

   pip install -e .

Install developer requirements::

   pip install -r requirements-develop.txt

Execute unit tests using tox::

   tox

Updating Tables
---------------

The command ``python setup.py update`` will fetch the following resources:

- http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt
- http://www.unicode.org/Public/UNIDATA/extracted/DerivedGeneralCategory.txt

And generates the table files:

- `wcwidth/table_wide.py `_
- `wcwidth/table_zero.py `_

Uses
----

This library is used in:

- `jquast/blessed`_, a simplified wrapper around curses.

- `jonathanslenders/python-prompt-toolkit`_, a library for building
  powerful interactive command lines in Python.

Additional tools for displaying and testing wcwidth are found in the
`bin/ `_ folder of this project's source code.  They are not distributed.

=======
History
=======

0.1.7 *2016-07-01*
  * **Updated** tables to Unicode Specification 9.0.0. (`PR #18`_).

0.1.6 *2016-01-08 Production/Stable*
  * ``LICENSE`` file now included with distribution.

0.1.5 *2015-09-13 Alpha*
  * **Bugfix**:
    Resolution of "combining_ character width" issue, most especially
    those that previously returned -1 now often (correctly) return 0.
    Resolved by `Philip Craig`_ via `PR #11`_.
  * **Deprecated**:
    The module path ``wcwidth.table_comb`` is no longer available,
    it has been superseded by module path ``wcwidth.table_zero``.

0.1.4 *2014-11-20 Pre-Alpha*
  * **Feature**: ``wcswidth()`` now determines printable length
    for (most) combining_ characters.
    The developer's tool `bin/wcwidth-browser.py`_ is improved to display
    combining_ characters when provided the ``--combining`` option
    (`Thomas Ballinger`_ and `Leta Montopoli`_ `PR #5`_).
  * **Feature**: added static analysis (prospector_) to testing
    framework.

0.1.3 *2014-10-29 Pre-Alpha*
  * **Bugfix**: 2nd parameter of wcswidth was not honored.
    (`Thomas Ballinger`_, `PR #4`_).

0.1.2 *2014-10-28 Pre-Alpha*
  * **Updated** tables to Unicode Specification 7.0.0.
    (`Thomas Ballinger`_, `PR #3`_).

0.1.1 *2014-05-14 Pre-Alpha*
  * Initial release to pypi, based on Unicode Specification 6.3.0.

This code was originally derived directly from C code of the same name,
whose latest version is available at
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::

 * Markus Kuhn -- 2007-05-26 (Unicode 5.0)
 *
 * Permission to use, copy, modify, and distribute this software
 * for any purpose and without fee is hereby granted. The author
 * disclaims all warranties with regard to this software.

.. _`prospector`: https://github.com/landscapeio/prospector
.. _`combining`: https://en.wikipedia.org/wiki/Combining_character
.. _`bin/wcwidth-browser.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
.. _`Thomas Ballinger`: https://github.com/thomasballinger
.. _`Leta Montopoli`: https://github.com/lmontopo
.. _`Philip Craig`: https://github.com/philipc
.. _`PR #3`: https://github.com/jquast/wcwidth/pull/3
.. _`PR #4`: https://github.com/jquast/wcwidth/pull/4
.. _`PR #5`: https://github.com/jquast/wcwidth/pull/5
.. _`PR #11`: https://github.com/jquast/wcwidth/pull/11
.. _`PR #18`: https://github.com/jquast/wcwidth/pull/18
.. _`jquast/blessed`: https://github.com/jquast/blessed
.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
.. _`wcwidth(3)`: http://man7.org/linux/man-pages/man3/wcwidth.3.html
.. _`wcswidth(3)`: http://man7.org/linux/man-pages/man3/wcswidth.3.html
Keywords: terminal,emulator,wcwidth,wcswidth,cjk,combining,xterm,console
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Localization
Classifier: Topic :: Software Development :: Internationalization
Classifier: Topic :: Terminals
wcwidth-0.1.7/wcwidth.egg-info/SOURCES.txt
LICENSE.txt
MANIFEST.in
README.rst
setup.cfg
setup.py
wcwidth/__init__.py
wcwidth/table_wide.py
wcwidth/table_zero.py
wcwidth/wcwidth.py
wcwidth.egg-info/PKG-INFO
wcwidth.egg-info/SOURCES.txt
wcwidth.egg-info/dependency_links.txt
wcwidth.egg-info/top_level.txt
wcwidth.egg-info/zip-safe
wcwidth/tests/__init__.py
wcwidth/tests/test_core.py
wcwidth-0.1.7/wcwidth.egg-info/top_level.txt
wcwidth
wcwidth-0.1.7/wcwidth.egg-info/zip-safe
