zc.catalog-1.6/README.txt

zc.catalog is an extension to the Zope 3 catalog, Zope 3's indexing and search
facility. zc.catalog contains a number of extensions to the Zope 3 catalog,
such as some new indexes, improved globbing and stemming support, and an
alternative catalog implementation.

zc.catalog-1.6/src/zc.catalog.egg-info/namespace_packages.txt

zc

zc.catalog-1.6/src/zc.catalog.egg-info/requires.txt

ZODB3
pytz
setuptools
zope.catalog
zope.component
zope.container
zope.i18nmessageid
zope.index>=3.5.1
zope.interface
zope.intid
zope.publisher >= 3.12
zope.schema
zope.security

[test]
zope.keyreference
zope.testing

[test_browser]
zope.login
zope.password
zope.securitypolicy
zope.testbrowser
zope.app.appsetup
zope.app.catalog
zope.app.testing
zope.app.zcmlfiles

[browser]
zope.app.form
zope.browsermenu

zc.catalog-1.6/src/zc.catalog.egg-info/SOURCES.txt

CHANGES.txt
COPYRIGHT.txt
LICENSE.txt
MANIFEST.in
README.txt
bootstrap.py
buildout.cfg
setup.py
src/zc/__init__.py
src/zc.catalog.egg-info/PKG-INFO
src/zc.catalog.egg-info/SOURCES.txt
src/zc.catalog.egg-info/dependency_links.txt
src/zc.catalog.egg-info/namespace_packages.txt
src/zc.catalog.egg-info/not-zip-safe
src/zc.catalog.egg-info/requires.txt
src/zc.catalog.egg-info/top_level.txt
src/zc/catalog/__init__.py
src/zc/catalog/callablewrapper.txt
src/zc/catalog/catalogindex.py
src/zc/catalog/configure.zcml
src/zc/catalog/extentcatalog.py
src/zc/catalog/extentcatalog.txt
src/zc/catalog/globber.py
src/zc/catalog/globber.txt
src/zc/catalog/i18n.py
src/zc/catalog/index.py
src/zc/catalog/interfaces.py
src/zc/catalog/legacy.txt
src/zc/catalog/normalizedindex.txt
src/zc/catalog/setindex.txt
src/zc/catalog/stemmer.py
src/zc/catalog/stemmer.txt
src/zc/catalog/tests.py
src/zc/catalog/valueindex.txt
src/zc/catalog/browser/README.txt
src/zc/catalog/browser/__init__.py
src/zc/catalog/browser/configure.zcml
src/zc/catalog/browser/ftesting.zcml
src/zc/catalog/browser/tests.py

zc.catalog-1.6/src/zc.catalog.egg-info/PKG-INFO

Metadata-Version: 1.1
Name: zc.catalog
Version: 1.6
Summary: Extensions to the Zope 3 Catalog
Home-page: http://pypi.python.org/pypi/zc.catalog
Author: Zope Corporation and Contributors
Author-email: zope-dev@zope.org
License: ZPL 2.1
Description:

zc.catalog is an extension to the Zope 3 catalog, Zope 3's indexing and search
facility. zc.catalog contains a number of extensions to the Zope 3 catalog,
such as some new indexes, improved globbing and stemming support, and an
alternative catalog implementation.

.. contents::

=======
CHANGES
=======

1.6 (2013-07-04)
----------------

- Using Python's ``doctest`` module instead of deprecated
  ``zope.testing.doctest``.

- Move ``zope.intid`` to dependencies.

1.5.1 (2012-01-20)
------------------

- Fix the extent catalog's `searchResults` method to work when using a
  local uid source.

- Replaced a testing dependency on ``zope.app.authentication`` with
  ``zope.password``.

- Removed ``zope.app.server`` test dependency.

1.5 (2010-10-19)
----------------

- The package's ``configure.zcml`` does not include the browser subpackage's
  ``configure.zcml`` anymore.

  This, together with ``browser`` and ``test_browser`` ``extras_require``,
  decouples the browser view registrations from the main code. As a result
  projects that do not need the ZMI views to be registered are not pulling in
  the zope.app.* dependencies anymore.

  To enable the ZMI views for your project, you will have to do two things:

  * list ``zc.catalog [browser]`` as a ``install_requires``.

  * have your project's ``configure.zcml`` include the ``zc.catalog.browser``
    subpackage.

- Only include the browser tests whenever the dependencies for the browser
  tests are available.

- Python2.7 test fix.

1.4.5 (2010-10-05)
------------------

- Remove implicit test dependency on zope.app.dublincore, that was not needed
  in the first place.

1.4.4 (2010-07-06)
------------------

* Fixed test-failure happening with more recent ``mechanize`` (>=2.0).

1.4.3 (2010-03-09)
------------------

* Try to import the stemmer from the zopyx.txng3.ext package first, which
  as of 3.3.2 contains stability and memory leak fixes.

1.4.2 (2010-01-20)
------------------

* Fix missing testing dependencies when using ZTK by adding zope.login.

1.4.1 (2009-02-27)
------------------

* Add FieldIndex-like sorting support for the ValueIndex.

* Add sorting indexes support for the NormalizationWrapper.

1.4.0 (2009-02-07)
------------------

Bugs fixed
~~~~~~~~~~

* Fixed a typo in ValueIndex addform and addMenuItem

* Use ``zope.container`` instead of ``zope.app.container``.

* Use ``zope.keyreference`` instead of ``zope.app.keyreference``.

* Use ``zope.intid`` instead of ``zope.app.intid``.

* Use ``zope.catalog`` instead of ``zope.app.catalog``.

1.3.0 (2008-09-10)
------------------

Features added
~~~~~~~~~~~~~~

* Added hook point to allow extent catalog to be used with local UID sources.

1.2.0 (2007-11-03)
------------------

Features added
~~~~~~~~~~~~~~

* Updated package meta-data.

* zc.catalog now can use 64-bit BTrees ("L") as provided by ZODB 3.8.

* Albertas Agejavas (alga@pov.lt) included the new CallableWrapper, for
  when the typical Zope 3 index-by-adapter story
  (zope.app.catalog.attribute) is unnecessary trouble, and you just want
  to use a callable. See callablewrapper.txt. This can also be used for
  other indexes based on the zope.index interfaces.

* Extents now have a __len__. The current implementation defers to the
  standard BTree len implementation, and shares its performance
  characteristics: it needs to wake up all of the buckets, but if all of the
  buckets are awake it is a fairly quick operation.

* A simple ISelfPopulatingExtent was added to the extentcatalog module for
  which populating is a no-op. This is directly useful for catalogs that
  are used as implementation details of a component, in which objects are
  indexed explicitly by your own calls rather than by the usual subscribers.
  It is also potentially slightly useful as a base for other self-populating
  extents.

1.1.1 (2007-3-17)
-----------------

Bugs fixed
~~~~~~~~~~

'all_of' would return all results when one of the values had no results.
Reported, with test and fix provided, by Nando Quintana.

1.1 (2007-01-06)
----------------

Features removed
~~~~~~~~~~~~~~~~

The queueing of events in the extent catalog has been entirely removed.
Subtransactions caused significant problems to the code introduced in 1.0.
Other solutions also have significant problems, and the win of this kind
of queueing is questionable. Here is a run down of the approaches rejected
for getting the queueing to work:

* _p_invalidate (used in 1.0). Not really designed for use within a
  transaction, and reverts to last savepoint, rather than the beginning of
  the transaction. Could monkeypatch savepoints to iterate over
  precommit transaction hooks but that just smells too bad.

* _p_resolveConflict.
  Requires application software to exist in ZEO and
  even ZRS installations, which is counter to our software deployment goals.
  Also causes useless repeated writes of empty queue to database, but that's
  not the showstopper.

* vague hand-wavy ideas for separate storages or transaction managers for the
  queue. Never panned out in discussion.

1.0 (2007-01-05)
----------------

Bugs fixed
~~~~~~~~~~

* adjusted extentcatalog tests to trigger (and discuss and test) the queueing
  behavior.

* fixed problem with excessive conflict errors due to queueing code.

* updated stemming to work with newest version of TextIndexNG's extensions.

* omitted stemming test when TextIndexNG's extensions are unavailable, so
  tests pass without it. Since TextIndexNG's extensions are optional, this
  seems reasonable.

* removed use of zapi in extentcatalog.

0.2 (2006-11-22)
----------------

Features added
~~~~~~~~~~~~~~

* First release on Cheeseshop.

===========
Value Index
===========

The valueindex is an index similar to, but more flexible than a standard Zope
field index. The index allows searches for documents that contain any of a
set of values; between a set of values; any (non-None) values; and any empty
values.

Additionally, the index supports an interface that allows examination of the
indexed values.

It is as policy-free as possible, and is intended to be the engine for indexes
with more policy, as well as being useful itself.

On creation, the index has no wordCount, no documentCount, and is, as
expected, fairly empty.

    >>> from zc.catalog.index import ValueIndex
    >>> index = ValueIndex()
    >>> index.documentCount()
    0
    >>> index.wordCount()
    0
    >>> index.maxValue() # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError:...
    >>> index.minValue() # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError:...
    >>> list(index.values())
    []
    >>> len(index.apply({'any_of': (5,)}))
    0

The index supports indexing any value.
All values within a given index must
sort consistently across Python versions.

    >>> data = {1: 'a',
    ...         2: 'b',
    ...         3: 'a',
    ...         4: 'c',
    ...         5: 'd',
    ...         6: 'c',
    ...         7: 'c',
    ...         8: 'b',
    ...         9: 'c',
    ... }
    >>> for k, v in data.items():
    ...     index.index_doc(k, v)
    ...

After indexing, the statistics and values match the newly entered content.

    >>> list(index.values())
    ['a', 'b', 'c', 'd']
    >>> index.documentCount()
    9
    >>> index.wordCount()
    4
    >>> index.maxValue()
    'd'
    >>> index.minValue()
    'a'
    >>> list(index.ids())
    [1, 2, 3, 4, 5, 6, 7, 8, 9]

The index supports four types of query. The first is 'any_of'. It
takes an iterable of values, and returns an iterable of document ids that
contain any of the values. The results are not weighted.

    >>> list(index.apply({'any_of':('b', 'c')}))
    [2, 4, 6, 7, 8, 9]
    >>> list(index.apply({'any_of': ('b',)}))
    [2, 8]
    >>> list(index.apply({'any_of': ('d',)}))
    [5]
    >>> list(index.apply({'any_of':(42,)}))
    []

Another query is 'any'. If the key is None, all indexed document ids with any
values are returned. If the key is an extent, the intersection of the extent
and all document ids with any values is returned.

    >>> list(index.apply({'any': None}))
    [1, 2, 3, 4, 5, 6, 7, 8, 9]

    >>> from zc.catalog.extentcatalog import FilterExtent
    >>> extent = FilterExtent(lambda extent, uid, obj: True)
    >>> for i in range(15):
    ...     extent.add(i, i)
    ...
    >>> list(index.apply({'any': extent}))
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> limited_extent = FilterExtent(lambda extent, uid, obj: True)
    >>> for i in range(5):
    ...     limited_extent.add(i, i)
    ...
    >>> list(index.apply({'any': limited_extent}))
    [1, 2, 3, 4]

The 'between' argument takes from one to four values.
The first is the
minimum, and defaults to None, indicating no minimum; the second is the
maximum, and defaults to None, indicating no maximum; the next is a boolean for
whether the minimum value should be excluded, and defaults to False; and the
last is a boolean for whether the maximum value should be excluded, and also
defaults to False. The results are not weighted.

    >>> list(index.apply({'between': ('b', 'd')}))
    [2, 4, 5, 6, 7, 8, 9]
    >>> list(index.apply({'between': ('c', None)}))
    [4, 5, 6, 7, 9]
    >>> list(index.apply({'between': ('c',)}))
    [4, 5, 6, 7, 9]
    >>> list(index.apply({'between': ('b', 'd', True, True)}))
    [4, 6, 7, 9]

The 'none' argument takes an extent and returns the ids in the extent
that are not indexed; it is intended to be used to return docids that have
no (or empty) values.

    >>> list(index.apply({'none': extent}))
    [0, 10, 11, 12, 13, 14]

Trying to use more than one of these at a time generates an error.

    >>> index.apply({'between': (5,), 'any_of': (3,)})
    ... # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError:...

Using none of them simply returns None.

    >>> index.apply({}) # returns None

Invalid query names cause ValueErrors.

    >>> index.apply({'foo':()})
    ... # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError:...

When you unindex a document, the searches and statistics should be updated.

    >>> index.unindex_doc(5)
    >>> len(index.apply({'any_of': ('d',)}))
    0
    >>> index.documentCount()
    8
    >>> index.wordCount()
    3
    >>> list(index.values())
    ['a', 'b', 'c']
    >>> list(index.ids())
    [1, 2, 3, 4, 6, 7, 8, 9]

Reindexing a document that has a changed value also is reflected in
subsequent searches and statistic checks.

    >>> list(index.apply({'any_of': ('b',)}))
    [2, 8]
    >>> data[8] = 'e'
    >>> index.index_doc(8, data[8])
    >>> index.documentCount()
    8
    >>> index.wordCount()
    4
    >>> list(index.apply({'any_of': ('e',)}))
    [8]
    >>> list(index.apply({'any_of': ('b',)}))
    [2]
    >>> data[2] = 'e'
    >>> index.index_doc(2, data[2])
    >>> index.documentCount()
    8
    >>> index.wordCount()
    3
    >>> list(index.apply({'any_of': ('e',)}))
    [2, 8]
    >>> list(index.apply({'any_of': ('b',)}))
    []

Reindexing a document for which the value is now None causes it to be removed
from the statistics.

    >>> data[3] = None
    >>> index.index_doc(3, data[3])
    >>> index.documentCount()
    7
    >>> index.wordCount()
    3
    >>> list(index.ids())
    [1, 2, 4, 6, 7, 8, 9]

This affects both ways of determining the ids that are and are not in the index
(that do and do not have values).

    >>> list(index.apply({'any': None}))
    [1, 2, 4, 6, 7, 8, 9]
    >>> list(index.apply({'any': extent}))
    [1, 2, 4, 6, 7, 8, 9]
    >>> list(index.apply({'none': extent}))
    [0, 3, 5, 10, 11, 12, 13, 14]

The values method can be used to examine the indexed values for a given
document id. For a valueindex, the "values" for a given doc_id will always
have a length of 0 or 1.

    >>> index.values(doc_id=8)
    ('e',)

And the containsValue method provides a way of determining membership in the
values.

    >>> index.containsValue('a')
    True
    >>> index.containsValue('q')
    False

Sorting
-------

Value indexes support sorting, just like zope.index.field.FieldIndex.

    >>> index.clear()

    >>> index.index_doc(1, 9)
    >>> index.index_doc(2, 8)
    >>> index.index_doc(3, 7)
    >>> index.index_doc(4, 6)
    >>> index.index_doc(5, 5)
    >>> index.index_doc(6, 4)
    >>> index.index_doc(7, 3)
    >>> index.index_doc(8, 2)
    >>> index.index_doc(9, 1)

    >>> list(index.sort([4, 2, 9, 7, 3, 1, 5]))
    [9, 7, 5, 4, 3, 2, 1]

We can also specify the ``reverse`` argument to reverse results:

    >>> list(index.sort([4, 2, 9, 7, 3, 1, 5], reverse=True))
    [1, 2, 3, 4, 5, 7, 9]

And as per IIndexSort, we can limit results by specifying the ``limit``
argument:

    >>> list(index.sort([4, 2, 9, 7, 3, 1, 5], limit=3))
    [9, 7, 5]

If we pass an id that is not indexed by this index, it won't be included
in the result.

    >>> list(index.sort([2, 10]))
    [2]

=========
Set Index
=========

The setindex is an index similar to, but more general than a traditional
keyword index. The values indexed are expected to be iterables; the index
allows searches for documents that contain any of a set of values; all of a
set of values; or between a set of values.

Additionally, the index supports an interface that allows examination of the
indexed values.

It is as policy-free as possible, and is intended to be the engine for indexes
with more policy, as well as being useful itself.

On creation, the index has no wordCount, no documentCount, and is, as
expected, fairly empty.

    >>> from zc.catalog.index import SetIndex
    >>> index = SetIndex()
    >>> index.documentCount()
    0
    >>> index.wordCount()
    0
    >>> index.maxValue() # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError:...
    >>> index.minValue() # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError:...
    >>> list(index.values())
    []
    >>> len(index.apply({'any_of': (5,)}))
    0

The index supports indexing any value. All values within a given index must
sort consistently across Python versions. In our example, we hope that
strings and integers will sort consistently; this may not be a reasonable
hope.

    >>> data = {1: ['a', 1],
    ...         2: ['b', 'a', 3, 4, 7],
    ...         3: [1],
    ...         4: [1, 4, 'c'],
    ...         5: [7],
    ...         6: [5, 6, 7],
    ...         7: ['c'],
    ...         8: [1, 6],
    ...         9: ['a', 'c', 2, 3, 4, 6,],
    ... }
    >>> for k, v in data.items():
    ...     index.index_doc(k, v)
    ...

After indexing, the statistics and values match the newly entered content.

    >>> list(index.values())
    [1, 2, 3, 4, 5, 6, 7, 'a', 'b', 'c']
    >>> index.documentCount()
    9
    >>> index.wordCount()
    10
    >>> index.maxValue()
    'c'
    >>> index.minValue()
    1
    >>> list(index.ids())
    [1, 2, 3, 4, 5, 6, 7, 8, 9]

The index supports five types of query. The first is 'any_of'. It
takes an iterable of values, and returns an iterable of document ids that
contain any of the values. The results are weighted.

    >>> list(index.apply({'any_of':('b', 1, 5)}))
    [1, 2, 3, 4, 6, 8]
    >>> list(index.apply({'any_of': ('b', 1, 5)}))
    [1, 2, 3, 4, 6, 8]
    >>> list(index.apply({'any_of':(42,)}))
    []
    >>> index.apply({'any_of': ('a', 3, 7)}) # doctest: +ELLIPSIS
    BTrees...FBucket([(1, 1.0), (2, 3.0), (5, 1.0), (6, 1.0), (9, 2.0)])

Another query is 'any'. If the key is None, all indexed document ids with any
values are returned. If the key is an extent, the intersection of the extent
and all document ids with any values is returned.

    >>> list(index.apply({'any': None}))
    [1, 2, 3, 4, 5, 6, 7, 8, 9]

    >>> from zc.catalog.extentcatalog import FilterExtent
    >>> extent = FilterExtent(lambda extent, uid, obj: True)
    >>> for i in range(15):
    ...     extent.add(i, i)
    ...
    >>> list(index.apply({'any': extent}))
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> limited_extent = FilterExtent(lambda extent, uid, obj: True)
    >>> for i in range(5):
    ...     limited_extent.add(i, i)
    ...
    >>> list(index.apply({'any': limited_extent}))
    [1, 2, 3, 4]

The 'all_of' argument also takes an iterable of values, but returns an
iterable of document ids that contains all of the values. The results are not
weighted.
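
As a rough illustration of the 'all_of' semantics, the query amounts to
intersecting the sets of document ids stored for each value, so a value with
no matches empties the whole result. The sketch below is plain Python over
the same sample data; the ``postings`` dict and the ``all_of`` helper are
hypothetical names for illustration and are not how zc.catalog stores or
queries its data.

```python
# Sample data from the text above: doc id -> iterable of values.
data = {
    1: ['a', 1],
    2: ['b', 'a', 3, 4, 7],
    3: [1],
    4: [1, 4, 'c'],
    5: [7],
    6: [5, 6, 7],
    7: ['c'],
    8: [1, 6],
    9: ['a', 'c', 2, 3, 4, 6],
}

# Hypothetical inverted mapping: value -> set of doc ids containing it.
postings = {}
for doc_id, values in data.items():
    for value in values:
        postings.setdefault(value, set()).add(doc_id)


def all_of(values):
    """Return the sorted doc ids that contain every one of the given values."""
    result = None
    for value in values:
        ids = postings.get(value, set())
        result = set(ids) if result is None else result & ids
        if not result:
            # A value with no matches empties the whole result.
            return []
    return sorted(result or ())


print(all_of(('a',)))
print(all_of((3, 4)))
print(all_of(('z', 3, 4)))
```

An unknown value such as ``'z'`` yields an empty result regardless of its
position in the query.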

    >>> list(index.apply({'all_of': ('a',)}))
    [1, 2, 9]
    >>> list(index.apply({'all_of': (3, 4)}))
    [2, 9]

These tests illustrate two related reported errors that have been fixed.

    >>> list(index.apply({'all_of': ('z', 3, 4)}))
    []
    >>> list(index.apply({'all_of': (3, 4, 'z')}))
    []

The 'between' argument takes from one to four values. The first is the
minimum, and defaults to None, indicating no minimum; the second is the
maximum, and defaults to None, indicating no maximum; the next is a boolean for
whether the minimum value should be excluded, and defaults to False; and the
last is a boolean for whether the maximum value should be excluded, and also
defaults to False. The results are weighted.

    >>> list(index.apply({'between': (1, 7)}))
    [1, 2, 3, 4, 5, 6, 8, 9]
    >>> list(index.apply({'between': ('b', None)}))
    [2, 4, 7, 9]
    >>> list(index.apply({'between': ('b',)}))
    [2, 4, 7, 9]
    >>> list(index.apply({'between': (1, 7, True, True)}))
    [2, 4, 6, 8, 9]
    >>> index.apply({'between': (2, 6)}) # doctest: +ELLIPSIS
    BTrees...FBucket([(2, 2.0), (4, 1.0), (6, 2.0), (8, 1.0), (9, 4.0)])

The 'none' argument takes an extent and returns the ids in the extent
that are not indexed; it is intended to be used to return docids that have
no (or empty) values.

    >>> list(index.apply({'none': extent}))
    [0, 10, 11, 12, 13, 14]

Trying to use more than one of these at a time generates an error.

    >>> index.apply({'all_of': (5,), 'any_of': (3,)})
    ... # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError:...

Using none of them simply returns None.

    >>> index.apply({}) # returns None

Invalid query names cause ValueErrors.

    >>> index.apply({'foo':()})
    ... # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError:...

When you unindex a document, the searches and statistics should be updated.

    >>> index.unindex_doc(6)
    >>> len(index.apply({'any_of': (5,)}))
    0
    >>> index.documentCount()
    8
    >>> index.wordCount()
    9
    >>> list(index.values())
    [1, 2, 3, 4, 6, 7, 'a', 'b', 'c']
    >>> list(index.ids())
    [1, 2, 3, 4, 5, 7, 8, 9]

Reindexing a document that has new additional values also is reflected in
subsequent searches and statistic checks.

    >>> data[8].extend([5, 'c'])
    >>> index.index_doc(8, data[8])
    >>> index.documentCount()
    8
    >>> index.wordCount()
    10
    >>> list(index.apply({'any_of': (5,)}))
    [8]
    >>> list(index.apply({'any_of': ('c',)}))
    [4, 7, 8, 9]

The same is true for reindexing a document with both additions and removals.

    >>> 2 in set(index.apply({'any_of': (7,)}))
    True
    >>> 2 in set(index.apply({'any_of': (2,)}))
    False
    >>> data[2].pop()
    7
    >>> data[2].append(2)
    >>> index.index_doc(2, data[2])
    >>> 2 in set(index.apply({'any_of': (7,)}))
    False
    >>> 2 in set(index.apply({'any_of': (2,)}))
    True

Reindexing a document that no longer has any values causes it to be removed
from the statistics.

    >>> del data[2][:]
    >>> index.index_doc(2, data[2])
    >>> index.documentCount()
    7
    >>> index.wordCount()
    9
    >>> list(index.ids())
    [1, 3, 4, 5, 7, 8, 9]

This affects both ways of determining the ids that are and are not in the index
(that do and do not have values).

    >>> list(index.apply({'any': None}))
    [1, 3, 4, 5, 7, 8, 9]
    >>> list(index.apply({'none': extent}))
    [0, 2, 6, 10, 11, 12, 13, 14]

The values method can be used to examine the indexed values for a given
document id.

    >>> set(index.values(doc_id=8)) == set([1, 5, 6, 'c'])
    True

And the containsValue method provides a way of determining membership in the
values.

    >>> index.containsValue(5)
    True
    >>> index.containsValue(20)
    False

================
Normalized Index
================

The index module provides a normalizing wrapper, a DateTime normalizer, and
a set index and a value index normalized with the DateTime normalizer.
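
As a rough picture of what a normalizing wrapper does, the sketch below
wraps a toy index so that every value passes through a normalizer before it is
stored. All three classes (``TruncatingNormalizer``, ``ToyValueIndex``,
``NormalizingWrapper``) are hypothetical stand-ins written for this
illustration; they are not the zc.catalog classes, and the truncation to whole
minutes is only an assumed example of a normalization.

```python
import datetime


class TruncatingNormalizer(object):
    """Hypothetical normalizer: truncates datetimes to whole minutes."""

    def value(self, value):
        # Normalize an input value before it is indexed.
        return value.replace(second=0, microsecond=0)


class ToyValueIndex(object):
    """Hypothetical stand-in for an index: maps each doc id to one value."""

    def __init__(self):
        self._values = {}

    def index_doc(self, doc_id, value):
        self._values[doc_id] = value

    def any_of(self, values):
        wanted = set(values)
        return sorted(d for d, v in self._values.items() if v in wanted)


class NormalizingWrapper(object):
    """Hypothetical wrapper: normalizes values, then delegates to the index."""

    def __init__(self, index, normalizer):
        self.index = index
        self.normalizer = normalizer

    def index_doc(self, doc_id, value):
        # The wrapped index only ever sees the normalized value.
        self.index.index_doc(doc_id, self.normalizer.value(value))

    def any_of(self, values):
        return self.index.any_of(values)


wrapped = NormalizingWrapper(ToyValueIndex(), TruncatingNormalizer())
wrapped.index_doc(1, datetime.datetime(2005, 7, 15, 11, 21, 32, 104))
wrapped.index_doc(2, datetime.datetime(2005, 7, 15, 11, 21, 5))
wrapped.index_doc(3, datetime.datetime(2005, 7, 15, 11, 22, 0))

# Documents 1 and 2 were both stored as 11:21, so both match.
hits = wrapped.any_of([datetime.datetime(2005, 7, 15, 11, 21)])
print(hits)
```

Keeping the normalizer separate from the index means the same index code can
serve different value policies.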
The normalizing wrapper implements a full complement of index interfaces
(zope.index.interfaces.IInjection, zope.index.interfaces.IIndexSearch,
zope.index.interfaces.IStatistics, and zc.catalog.interfaces.IIndexValues)
and delegates all of the behavior to the wrapped index, normalizing values
using the normalizer before the index sees them.

The normalizing wrapper currently only supports queries offered by
zc.catalog.interfaces.ISetIndex and zc.catalog.interfaces.IValueIndex.

The normalizer interface requires the following methods, as defined in the
interface:

    def value(value):
        """normalize or check constraints for an input value; raise an error
        or return the value to be indexed."""

    def any(value, index):
        """normalize a query value for a "any_of" search; return a sequence of
        values."""

    def all(value, index):
        """Normalize a query value for an "all_of" search; return the value
        for query"""

    def minimum(value, index):
        """normalize a query value for minimum of a range; return the value for
        query"""

    def maximum(value, index):
        """normalize a query value for maximum of a range; return the value for
        query"""

The DateTime normalizer performs the following normalizations and validations.
Whenever a timezone is needed, it tries to get a request from the current
interaction and adapt it to zope.interface.common.idatetime.ITZInfo; failing
that (no request or no adapter) it uses the system local timezone.

- Input values must be datetimes with a timezone. They are normalized to the
  resolution specified when the normalizer is created: a resolution of 0
  normalizes values to days; a resolution of 1 to hours; 2 to minutes; 3 to
  seconds; and 4 to microseconds.

- 'any' values may be timezone-aware datetimes, timezone-naive datetimes,
  or dates. Dates are converted to any value from the start to the end of the
  given date in the found timezone, as described above. Timezone-naive
  datetimes get the found timezone.

- 'all' values may be timezone-aware datetimes or timezone-naive datetimes.
  Timezone-naive datetimes get the found timezone.

- 'minimum' values may be timezone-aware datetimes, timezone-naive datetimes,
  or dates. Dates are converted to the start of the given date in the found
  timezone, as described above. Timezone-naive datetimes get the found
  timezone.

- 'maximum' values may be timezone-aware datetimes, timezone-naive datetimes,
  or dates. Dates are converted to the end of the given date in the found
  timezone, as described above. Timezone-naive datetimes get the found
  timezone.

Let's look at the DateTime normalizer first, and then an integration of it
with the normalizing wrapper and the value and set indexes.

The indexed values are parsed with 'value'.

    >>> from zc.catalog.index import DateTimeNormalizer
    >>> n = DateTimeNormalizer() # defaults to minutes
    >>> import datetime
    >>> import pytz
    >>> naive_datetime = datetime.datetime(2005, 7, 15, 11, 21, 32, 104)
    >>> date = naive_datetime.date()
    >>> aware_datetime = naive_datetime.replace(
    ...     tzinfo=pytz.timezone('US/Eastern'))
    >>> n.value(naive_datetime)
    Traceback (most recent call last):
    ...
    ValueError: This index only indexes timezone-aware datetimes.
    >>> n.value(date)
    Traceback (most recent call last):
    ...
    ValueError: This index only indexes timezone-aware datetimes.
    >>> n.value(aware_datetime) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, tzinfo=<...Eastern...>)

If we specify a different resolution, the results are different.

    >>> another = DateTimeNormalizer(1) # hours
    >>> another.value(aware_datetime) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 0, tzinfo=<...Eastern...>)

Note that changing the resolution of an indexed value may create surprising
results, because queries do not change their resolution. Therefore, if you
index something with a datetime with a finer resolution than the normalizer's,
then searching for that datetime will not find the doc_id.

Values in an 'any_of' query are parsed with 'any'. 'any' should return a
sequence of values.
It requires an index, which we will mock up here.

    >>> class DummyIndex(object):
    ...     def values(self, start, stop, exclude_start, exclude_stop):
    ...         assert not exclude_start and exclude_stop
    ...         six_hours = datetime.timedelta(hours=6)
    ...         res = []
    ...         dt = start
    ...         while dt < stop:
    ...             res.append(dt)
    ...             dt += six_hours
    ...         return res
    ...
    >>> index = DummyIndex()
    >>> tuple(n.any(naive_datetime, index)) # doctest: +ELLIPSIS
    (datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>),)
    >>> tuple(n.any(aware_datetime, index)) # doctest: +ELLIPSIS
    (datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>),)
    >>> tuple(n.any(date, index)) # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
    (datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>),
     datetime.datetime(2005, 7, 15, 6, 0, tzinfo=<...Local...>),
     datetime.datetime(2005, 7, 15, 12, 0, tzinfo=<...Local...>),
     datetime.datetime(2005, 7, 15, 18, 0, tzinfo=<...Local...>))

Values in an 'all_of' query are parsed with 'all'.

    >>> n.all(naive_datetime, index) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)
    >>> n.all(aware_datetime, index) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)
    >>> n.all(date, index) # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError: ...

Minimum values in a 'between' query as well as those in other methods are
parsed with 'minimum'. They also take an optional exclude boolean, which
indicates whether the minimum is to be excluded. For datetimes, it only
makes a difference if you pass in a date.
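
The effect of the exclude flag on a plain date can be sketched in a few lines:
a date stands for the whole day, so an included bound uses the near edge of
the day and an excluded bound uses the far edge. The helpers below are
hypothetical and ignore timezones entirely; they only mirror the shape of the
documented results and are not the DateTimeNormalizer code.

```python
import datetime


def day_bounds(day):
    """Return the first and last representable moments of a date."""
    start = datetime.datetime.combine(day, datetime.time.min)
    end = datetime.datetime.combine(day, datetime.time.max)
    return start, end


def minimum(value, exclude=False):
    # Check datetime first: datetime is a subclass of date.
    if isinstance(value, datetime.datetime):
        return value
    start, end = day_bounds(value)
    # Excluding a date as minimum skips the whole day.
    return end if exclude else start


def maximum(value, exclude=False):
    if isinstance(value, datetime.datetime):
        return value
    start, end = day_bounds(value)
    # Excluding a date as maximum stops before the day begins.
    return start if exclude else end


day = datetime.date(2005, 7, 15)
print(minimum(day), minimum(day, exclude=True))
print(maximum(day), maximum(day, exclude=True))
```

A full datetime is returned unchanged in both helpers, which matches the
statement that the flag only matters for dates.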

    >>> n.minimum(naive_datetime, index) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)
    >>> n.minimum(naive_datetime, index, exclude=True) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)

    >>> n.minimum(aware_datetime, index) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)
    >>> n.minimum(aware_datetime, index, True) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)

    >>> n.minimum(date, index) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>)
    >>> n.minimum(date, index, True) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 23, 59, 59, 999999, tzinfo=<...Local...>)

Maximum values in a 'between' query as well as those in other methods are
parsed with 'maximum'. They also take an optional exclude boolean, which
indicates whether the maximum is to be excluded. For datetimes, it only
makes a difference if you pass in a date.

    >>> n.maximum(naive_datetime, index) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)
    >>> n.maximum(naive_datetime, index, exclude=True) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>)

    >>> n.maximum(aware_datetime, index) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)
    >>> n.maximum(aware_datetime, index, True) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>)

    >>> n.maximum(date, index) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 23, 59, 59, 999999, tzinfo=<...Local...>)
    >>> n.maximum(date, index, True) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>)

Now let's examine these normalizers in the context of a real index.

    >>> from zc.catalog.index import DateTimeValueIndex, DateTimeSetIndex
    >>> setindex = DateTimeSetIndex() # minutes resolution
    >>> data = [] # generate some data
    >>> def date_gen(
    ...     start=aware_datetime,
    ...     count=12,
    ...     period=datetime.timedelta(hours=10)):
    ...     dt = start
    ...     ix = 0
    ...     while ix < count:
    ...         yield dt
    ...         dt += period
    ...         ix += 1
    ...
    >>> gen = date_gen()
    >>> count = 0
    >>> while True:
    ...     try:
    ...         next = [gen.next() for i in range(6)]
    ...     except StopIteration:
    ...         break
    ...     data.append((count, next[0:1]))
    ...     count += 1
    ...     data.append((count, next[1:3]))
    ...     count += 1
    ...     data.append((count, next[3:6]))
    ...     count += 1
    ...
    >>> print data # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
    [(0,
      [datetime.datetime(2005, 7, 15, 11, 21, 32, 104, ...<...Eastern...>)]),
     (1,
      [datetime.datetime(2005, 7, 15, 21, 21, 32, 104, ...<...Eastern...>),
       datetime.datetime(2005, 7, 16, 7, 21, 32, 104, ...<...Eastern...>)]),
     (2,
      [datetime.datetime(2005, 7, 16, 17, 21, 32, 104, ...<...Eastern...>),
       datetime.datetime(2005, 7, 17, 3, 21, 32, 104, ...<...Eastern...>),
       datetime.datetime(2005, 7, 17, 13, 21, 32, 104, ...<...Eastern...>)]),
     (3,
      [datetime.datetime(2005, 7, 17, 23, 21, 32, 104, ...<...Eastern...>)]),
     (4,
      [datetime.datetime(2005, 7, 18, 9, 21, 32, 104, ...<...Eastern...>),
       datetime.datetime(2005, 7, 18, 19, 21, 32, 104, ...<...Eastern...>)]),
     (5,
      [datetime.datetime(2005, 7, 19, 5, 21, 32, 104, ...<...Eastern...>),
       datetime.datetime(2005, 7, 19, 15, 21, 32, 104, ...<...Eastern...>),
       datetime.datetime(2005, 7, 20, 1, 21, 32, 104, ...<...Eastern...>)])]
    >>> data_dict = dict(data)
    >>> for doc_id, value in data:
    ...     setindex.index_doc(doc_id, value)
    ...
    >>> list(setindex.ids())
    [0, 1, 2, 3, 4, 5]
    >>> set(setindex.values()) == set(
    ...     setindex.normalizer.value(v) for v in date_gen())
    True

For the searches, we will actually use a request and interaction, with an
adapter that returns the Eastern timezone. This makes the examples less
dependent on the machine they run on.

    >>> import zope.security.management
    >>> import zope.publisher.browser
    >>> import zope.interface.common.idatetime
    >>> import zope.publisher.interfaces
    >>> request = zope.publisher.browser.TestRequest()
    >>> zope.security.management.newInteraction(request)
    >>> from zope import interface, component
    >>> @interface.implementer(zope.interface.common.idatetime.ITZInfo)
    ... @component.adapter(zope.publisher.interfaces.IRequest)
    ... def tzinfo(req):
    ...     return pytz.timezone('US/Eastern')
    ...
    >>> component.provideAdapter(tzinfo)
    >>> n.all(naive_datetime, index).tzinfo is pytz.timezone('US/Eastern')
    True

    >>> set(setindex.apply({'any_of': (datetime.date(2005, 7, 17),
    ...                                datetime.date(2005, 7, 20),
    ...                                datetime.date(2005, 12, 31))})) == set(
    ...     (2, 3, 5))
    True

Note that this search is using the normalized values.

    >>> set(setindex.apply({'all_of': (
    ...     datetime.datetime(
    ...         2005, 7, 16, 7, 21, tzinfo=pytz.timezone('US/Eastern')),
    ...     datetime.datetime(
    ...         2005, 7, 15, 21, 21, tzinfo=pytz.timezone('US/Eastern')),)})
    ...     ) == set((1,))
    True
    >>> list(setindex.apply({'any': None}))
    [0, 1, 2, 3, 4, 5]

    >>> set(setindex.apply({'between': (
    ...     datetime.datetime(2005, 4, 1, 12), datetime.datetime(2006, 5, 1))})
    ...     ) == set((0, 1, 2, 3, 4, 5))
    True
    >>> set(setindex.apply({'between': (
    ...     datetime.datetime(2005, 4, 1, 12), datetime.datetime(2006, 5, 1),
    ...     True, True)})
    ...     ) == set((0, 1, 2, 3, 4, 5))
    True

'between' searches should deal with dates well.

    >>> set(setindex.apply({'between': (
    ...     datetime.date(2005, 7, 16), datetime.date(2005, 7, 17))})
    ...     ) == set((1, 2, 3))
    True
    >>> len(setindex.apply({'between': (
    ...     datetime.date(2005, 7, 16), datetime.date(2005, 7, 17))})
    ...     ) == len(setindex.apply({'between': (
    ...     datetime.date(2005, 7, 15), datetime.date(2005, 7, 18),
    ...     True, True)})
    ...     )
    True

Removing docs works as usual.

    >>> setindex.unindex_doc(1)
    >>> list(setindex.ids())
    [0, 2, 3, 4, 5]

The ``values``, ``minValue`` and ``maxValue`` methods can take timezone-less
datetimes and dates.
    >>> setindex.minValue() # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, ...<...Eastern...>)
    >>> setindex.minValue(datetime.date(2005, 7, 17)) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 17, 3, 21, ...<...Eastern...>)
    >>> setindex.maxValue() # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 20, 1, 21, ...<...Eastern...>)
    >>> setindex.maxValue(datetime.date(2005, 7, 17)) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 17, 23, 21, ...<...Eastern...>)
    >>> list(setindex.values(
    ...     datetime.date(2005, 7, 17), datetime.date(2005, 7, 17)))
    ... # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
    [datetime.datetime(2005, 7, 17, 3, 21, ...<...Eastern...>),
     datetime.datetime(2005, 7, 17, 13, 21, ...<...Eastern...>),
     datetime.datetime(2005, 7, 17, 23, 21, ...<...Eastern...>)]
    >>> zope.security.management.endInteraction() # TODO put in tests tearDown

Sorting
-------

The normalization wrapper provides the zope.index.interfaces.IIndexSort
interface if its upstream index provides it. For example, the
DateTimeValueIndex will provide IIndexSort, because ValueIndex provides
sorting. It will also delegate the ``sort`` method to the value index.

    >>> from zc.catalog.index import DateTimeValueIndex
    >>> from zope.index.interfaces import IIndexSort
    >>> ix = DateTimeValueIndex()
    >>> IIndexSort.providedBy(ix.index)
    True
    >>> IIndexSort.providedBy(ix)
    True
    >>> ix.sort.im_self is ix.index
    True

But it won't work for indexes that don't do sorting, for example
DateTimeSetIndex.

    >>> ix = DateTimeSetIndex()
    >>> IIndexSort.providedBy(ix.index)
    False
    >>> IIndexSort.providedBy(ix)
    False
    >>> ix.sort
    Traceback (most recent call last):
    ...
    AttributeError: 'SetIndex' object has no attribute 'sort'

==============
Extent Catalog
==============

An extent catalog is very similar to a normal catalog except that it only
indexes items addable to its extent. The extent is both a filter and a set
that may be merged with other result sets.
The filtering is an additional feature we will discuss below; we'll begin with a simple "do nothing" extent that only supports the second use case. We create the state that the text needs here. >>> import zope.keyreference.persistent >>> import zope.component >>> import zope.intid >>> import zope.component >>> import zope.component.interfaces >>> import zope.component.persistentregistry >>> from ZODB.tests.util import DB >>> import transaction >>> zope.component.provideAdapter( ... zope.keyreference.persistent.KeyReferenceToPersistent, ... adapts=(zope.interface.Interface,)) >>> zope.component.provideAdapter( ... zope.keyreference.persistent.connectionOfPersistent, ... adapts=(zope.interface.Interface,)) >>> site_manager = None >>> def getSiteManager(context=None): ... if context is None: ... if site_manager is None: ... return zope.component.getGlobalSiteManager() ... else: ... return site_manager ... else: ... try: ... return zope.component.interfaces.IComponentLookup(context) ... except TypeError, error: ... raise zope.component.ComponentLookupError(*error.args) ... >>> def setSiteManager(sm): ... global site_manager ... site_manager = sm ... if sm is None: ... zope.component.getSiteManager.reset() ... else: ... zope.component.getSiteManager.sethook(getSiteManager) ... >>> def makeRoot(): ... db = DB() ... conn = db.open() ... root = conn.root() ... site_manager = root['components'] = ( ... zope.component.persistentregistry.PersistentComponents()) ... site_manager.__bases__ = (zope.component.getGlobalSiteManager(),) ... site_manager.registerUtility( ... zope.intid.IntIds(family=btrees_family), ... provided=zope.intid.interfaces.IIntIds) ... setSiteManager(site_manager) ... transaction.commit() ... return root ... >>> @zope.component.adapter(zope.interface.Interface) ... @zope.interface.implementer(zope.component.interfaces.IComponentLookup) ... def getComponentLookup(obj): ... return obj._p_jar.root()['components'] ... 
>>> zope.component.provideAdapter(getComponentLookup) To show the extent catalog at work, we need an intid utility, an index, some items to index. We'll do this within a real ZODB and a real intid utility. >>> import zc.catalog >>> import zc.catalog.interfaces >>> from zc.catalog import interfaces, extentcatalog >>> from zope import interface, component >>> from zope.interface import verify >>> import persistent >>> import BTrees.IFBTree >>> root = makeRoot() >>> intid = zope.component.getUtility( ... zope.intid.interfaces.IIntIds, context=root) >>> TreeSet = btrees_family.IF.TreeSet >>> from zope.container.interfaces import IContained >>> class DummyIndex(persistent.Persistent): ... interface.implements(IContained) ... __parent__ = __name__ = None ... def __init__(self): ... self.uids = TreeSet() ... def unindex_doc(self, uid): ... if uid in self.uids: ... self.uids.remove(uid) ... def index_doc(self, uid, obj): ... self.uids.insert(uid) ... def clear(self): ... self.uids.clear() ... def apply(self, query): ... return [uid for uid in self.uids if uid <= query] ... >>> class DummyContent(persistent.Persistent): ... def __init__(self, name, parent): ... self.id = name ... self.__parent__ = parent ... >>> extent = extentcatalog.Extent(family=btrees_family) >>> verify.verifyObject(interfaces.IExtent, extent) True >>> root['catalog'] = catalog = extentcatalog.Catalog(extent) >>> verify.verifyObject(interfaces.IExtentCatalog, catalog) True >>> index = DummyIndex() >>> catalog['index'] = index >>> transaction.commit() Now we have a catalog set up with an index and an extent. We can add some data to the extent: >>> matches = [] >>> for i in range(100): ... c = DummyContent(i, root) ... root[i] = c ... doc_id = intid.register(c) ... catalog.index_doc(doc_id, c) ... matches.append(doc_id) >>> matches.sort() >>> sorted(extent) == sorted(index.uids) == matches True We can get the size of the extent. 
>>> len(extent) 100 Unindexing an object that is in the catalog should simply remove it from the catalog and index as usual. >>> matches[0] in catalog.extent True >>> matches[0] in catalog['index'].uids True >>> catalog.unindex_doc(matches[0]) >>> matches[0] in catalog.extent False >>> matches[0] in catalog['index'].uids False >>> doc_id = matches.pop(0) >>> sorted(extent) == sorted(index.uids) == matches True Clearing the catalog clears both the extent and the contained indexes. >>> catalog.clear() >>> list(catalog.extent) == list(catalog['index'].uids) == [] True Updating all indexes and an individual index both also update the extent. >>> catalog.updateIndexes() >>> matches.insert(0, doc_id) >>> sorted(extent) == sorted(index.uids) == matches True >>> index2 = DummyIndex() >>> catalog['index2'] = index2 >>> index2.__parent__ == catalog True >>> index.uids.remove(matches[0]) # to confirm that only index 2 is touched >>> catalog.updateIndex(index2) >>> sorted(extent) == sorted(index2.uids) == matches True >>> matches[0] in index.uids False >>> matches[0] in index2.uids True >>> res = index.uids.insert(matches[0]) But so why have an extent in the first place? It allows indices to operate against a reliable collection of the full indexed data; therefore, it allows the indices in zc.catalog to perform NOT operations. The extent itself provides a number of merging features to allow its values to be merged with other BTrees.IFBTree data structures. These include intersection, union, difference, and reverse difference. Given an extent named 'extent' and another IFBTree data structure named 'data', intersections can be spelled "extent & data" or "data & extent"; unions can be spelled "extent | data" or "data | extent"; differences can be spelled "extent - data"; and reverse differences can be spelled "data - extent". Unions and intersections are weighted. >>> extent = extentcatalog.Extent(family=btrees_family) >>> for i in range(1, 100, 2): ... extent.add(i, None) ... 
>>> alt_set = TreeSet() >>> alt_set.update(range(0, 166, 33)) # return value is unimportant here 6 >>> sorted(alt_set) [0, 33, 66, 99, 132, 165] >>> sorted(extent & alt_set) [33, 99] >>> sorted(alt_set & extent) [33, 99] >>> sorted(extent.intersection(alt_set)) [33, 99] >>> original = set(extent) >>> union_matches = original.copy() >>> union_matches.update(alt_set) >>> union_matches = sorted(union_matches) >>> sorted(alt_set | extent) == union_matches True >>> sorted(extent | alt_set) == union_matches True >>> sorted(extent.union(alt_set)) == union_matches True >>> sorted(alt_set - extent) [0, 66, 132, 165] >>> sorted(extent.rdifference(alt_set)) [0, 66, 132, 165] >>> original.remove(33) >>> original.remove(99) >>> set(extent - alt_set) == original True >>> set(extent.difference(alt_set)) == original True We can pass our own instantiated UID utility to extentcatalog.Catalog. >>> extent = extentcatalog.Extent(family=btrees_family) >>> uidutil = zope.intid.IntIds() >>> cat = extentcatalog.Catalog(extent, uidutil) >>> cat["index"] = DummyIndex() >>> cat.UIDSource is uidutil True >>> cat._getUIDSource() is uidutil True The ResultSet instance returned by the catalog's `searchResults` method uses our UID utility. >>> obj = DummyContent(43, root) >>> uid = uidutil.register(obj) >>> cat.index_doc(uid, obj) >>> res = cat.searchResults(index=uid) >>> res.uidutil is uidutil True >>> list(res) == [obj] True `searchResults` may also return None. >>> cat.searchResults() is None True Calling `updateIndex` and `updateIndexes` when the catalog has its uid source set works as well. >>> cat.clear() >>> uid in cat.extent False All objects in the uid utility are indexed. 
>>> cat.updateIndexes() >>> uid in cat.extent True >>> len(cat.extent) 1 >>> obj2 = DummyContent(44, root) >>> uid2 = uidutil.register(obj2) >>> cat.updateIndexes() >>> len(cat.extent) 2 >>> uid2 in cat.extent True >>> uidutil.unregister(obj2) >>> cat.clear() >>> uid in cat.extent False >>> cat.updateIndex(cat["index"]) >>> uid in cat.extent True With a self-populating extent, calling `updateIndex` or `updateIndexes` means only the objects whose ids are in the extent are updated/reindexed; if present, the catalog will use its uid source to look up the objects by id. >>> extent = extentcatalog.NonPopulatingExtent(family=btrees_family) >>> cat = extentcatalog.Catalog(extent, uidutil) >>> cat["index"] = DummyIndex() >>> extent.add(uid, obj) >>> uid in cat["index"].uids False >>> cat.updateIndexes() >>> uid in cat["index"].uids True >>> cat.clear() >>> uid in cat["index"].uids False >>> uid in cat.extent False >>> cat.extent.add(uid, obj) >>> cat.updateIndex(cat["index"]) >>> uid in cat["index"].uids True Unregister the objects of the previous tests from intid utility: >>> intid = zope.component.getUtility( ... zope.intid.interfaces.IIntIds, context=root) >>> for doc_id in matches: ... intid.unregister(intid.queryObject(doc_id)) Catalog with a filter extent ---------------------------- As discussed at the beginning of this document, extents can not only help with index operations, but also act as a filter, so that a given catalog can answer questions about a subset of the objects contained in the intids. The filter extent only stores objects that match a given filter. >>> def filter(extent, uid, ob): ... assert interfaces.IFilterExtent.providedBy(extent) ... # This is an extent of objects with odd-numbered uids without a ... # True ignore attribute ... return uid % 2 and not getattr(ob, 'ignore', False) ... 
>>> extent = extentcatalog.FilterExtent(filter, family=btrees_family) >>> verify.verifyObject(interfaces.IFilterExtent, extent) True >>> root['catalog1'] = catalog = extentcatalog.Catalog(extent) >>> verify.verifyObject(interfaces.IExtentCatalog, catalog) True >>> index = DummyIndex() >>> catalog['index'] = index >>> transaction.commit() Now we have a catalog set up with an index and an extent. If we create some content and ask the catalog to index it, only the ones that match the filter will be in the extent and in the index. >>> matches = [] >>> fails = [] >>> i = 0 >>> while True: ... c = DummyContent(i, root) ... root[i] = c ... doc_id = intid.register(c) ... catalog.index_doc(doc_id, c) ... if filter(extent, doc_id, c): ... matches.append(doc_id) ... else: ... fails.append(doc_id) ... i += 1 ... if i > 99 and len(matches) > 4: ... break ... >>> matches.sort() >>> sorted(extent) == sorted(index.uids) == matches True If a content object is indexed that used to match the filter but no longer does, it should be removed from the extent and indexes. >>> matches[0] in catalog.extent True >>> obj = intid.getObject(matches[0]) >>> obj.ignore = True >>> filter(extent, matches[0], obj) False >>> catalog.index_doc(matches[0], obj) >>> doc_id = matches.pop(0) >>> doc_id in catalog.extent False >>> sorted(extent) == sorted(index.uids) == matches True Unindexing an object that is not in the catalog should be a no-op. >>> fails[0] in catalog.extent False >>> catalog.unindex_doc(fails[0]) >>> fails[0] in catalog.extent False >>> sorted(extent) == sorted(index.uids) == matches True Updating all indexes and an individual index both also update the extent. 
>>> index2 = DummyIndex() >>> catalog['index2'] = index2 >>> index2.__parent__ == catalog True >>> index.uids.remove(matches[0]) # to confirm that only index 2 is touched >>> catalog.updateIndex(index2) >>> sorted(extent) == sorted(index2.uids) True >>> matches[0] in index.uids False >>> matches[0] in index2.uids True >>> res = index.uids.insert(matches[0]) If you update a single index and an object is no longer a member of the extent, it is removed from all indexes. >>> matches[0] in catalog.extent True >>> matches[0] in index.uids True >>> matches[0] in index2.uids True >>> obj = intid.getObject(matches[0]) >>> obj.ignore = True >>> catalog.updateIndex(index2) >>> matches[0] in catalog.extent False >>> matches[0] in index.uids False >>> matches[0] in index2.uids False >>> doc_id = matches.pop(0) >>> (matches == sorted(catalog.extent) == sorted(index.uids) ... == sorted(index2.uids)) True Self-populating extents ----------------------- An extent may know how to populate itself; this is especially useful if the catalog can be initialized with fewer items than those available in the IIntIds utility that are also within the nearest Zope 3 site (the policy coded in the basic Zope 3 catalog). Such an extent must implement the `ISelfPopulatingExtent` interface, which requires two attributes. Let's use the `FilterExtent` class as a base for implementing such an extent, with a method that selects content item 0 (created and registered above):: >>> class PopulatingExtent( ... extentcatalog.FilterExtent, ... extentcatalog.NonPopulatingExtent): ... ... def populate(self): ... if self.populated: ... return ... self.add(intid.getId(root[0]), root[0]) ... super(PopulatingExtent, self).populate() Creating a catalog based on this extent ignores objects in the database already:: >>> def accept_any(extent, uid, ob): ... 
return True
    >>> extent = PopulatingExtent(accept_any, family=btrees_family)
    >>> catalog = extentcatalog.Catalog(extent)
    >>> index = DummyIndex()
    >>> catalog['index'] = index
    >>> root['catalog2'] = catalog
    >>> transaction.commit()

At this point, our extent remains unpopulated::

    >>> extent.populated
    False

Iterating over the extent does not cause it to be automatically populated::

    >>> list(extent)
    []

Causing our new index to be filled will cause the `populate()` method to be
called, setting the `populated` flag as a side-effect::

    >>> catalog.updateIndex(index)
    >>> extent.populated
    True
    >>> list(extent) == [intid.getId(root[0])]
    True

The index has been updated with the documents identified by the extent::

    >>> list(index.uids) == [intid.getId(root[0])]
    True

Updating the same index repeatedly will continue to use the extent as the
source of documents to include::

    >>> catalog.updateIndex(index)
    >>> list(extent) == [intid.getId(root[0])]
    True
    >>> list(index.uids) == [intid.getId(root[0])]
    True

The `updateIndexes()` method has a similar behavior.
If we add an additional index to the catalog, we see that it indexes only those objects from the extent:: >>> index2 = DummyIndex() >>> catalog['index2'] = index2 >>> catalog.updateIndexes() >>> list(extent) == [intid.getId(root[0])] True >>> list(index.uids) == [intid.getId(root[0])] True >>> list(index2.uids) == [intid.getId(root[0])] True When we have fresh catalog and extent (not yet populated), we see that `updateIndexes()` will cause the extent to be populated:: >>> extent = PopulatingExtent(accept_any, family=btrees_family) >>> root['catalog3'] = catalog = extentcatalog.Catalog(extent) >>> index1 = DummyIndex() >>> index2 = DummyIndex() >>> catalog['index1'] = index1 >>> catalog['index2'] = index2 >>> transaction.commit() >>> extent.populated False >>> catalog.updateIndexes() >>> extent.populated True >>> list(extent) == [intid.getId(root[0])] True >>> list(index1.uids) == [intid.getId(root[0])] True >>> list(index2.uids) == [intid.getId(root[0])] True We'll make sure everything can be safely committed. >>> transaction.commit() >>> setSiteManager(None) ======= Stemmer ======= The stemmer uses Andreas Jung's stemmer code, which is a Python wrapper of M. F. Porter's Snowball project (http://snowball.tartarus.org/index.php). It is designed to be used as part of a pipeline in a zope/index/text/ lexicon, after a splitter. This enables getting the relevance ranking of the zope/index/text code with the splitting functionality of TextIndexNG 3.x. It requires that the TextIndexNG extensions--specifically txngstemmer--have been compiled and installed in your Python installation. Inclusion of the textindexng package is not necessary. 
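To make the "pipeline after a splitter" idea concrete, here is a minimal, standard-library-only sketch. It assumes the convention that each pipeline element exposes a ``process`` method taking a list of strings and returning a list of strings. The classes ``Splitter``, ``CaseNormalizer`` and ``ToySuffixStemmer`` and the helper ``run_pipeline`` are hypothetical illustrations, and the suffix stripping is a deliberately crude stand-in, not the Snowball algorithm that the real stemmer wraps.

```python
import re


class Splitter(object):
    """Break each input string into word tokens."""

    def process(self, items):
        words = []
        for text in items:
            words.extend(re.findall(r"\w+", text))
        return words


class CaseNormalizer(object):
    """Lowercase every token."""

    def process(self, items):
        return [word.lower() for word in items]


class ToySuffixStemmer(object):
    """Hypothetical stand-in for a stemmer: strips a few English suffixes."""

    suffixes = ("ing", "ed", "es", "s")

    def process(self, items):
        return [self._stem(word) for word in items]

    def _stem(self, word):
        for suffix in self.suffixes:
            # Keep at least three characters so short words survive intact.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return word[:-len(suffix)]
        return word


def run_pipeline(text, elements):
    """Feed the text through each element in order, as a lexicon would."""
    items = [text]
    for element in elements:
        items = element.process(items)
    return items


pipeline = [Splitter(), CaseNormalizer(), ToySuffixStemmer()]
indexed = run_pipeline("Consoles", pipeline)
queried = run_pipeline("consoling", pipeline)
```

Because the stemmer runs last, the indexed word and the query word are reduced to the same term, which is what lets a query match documents that contain a different inflection of the word.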
As of this writing (Jan 3, 2007), installing the necessary extensions can be done with the following steps: - `svn co https://svn.sourceforge.net/svnroot/textindexng/extension_modules/trunk ext_mod` - `cd ext_mod` - (using the python you use for Zope) `python setup.py install` Another approach is to simply install TextIndexNG (see http://opensource.zopyx.com/software/textindexng3) The stemmer must be instantiated with the language for which stemming is desired. It defaults to 'english'. For what it is worth, other languages supported as of this writing, using the strings that the stemmer expects, include the following: 'danish', 'dutch', 'english', 'finnish', 'french', 'german', 'italian', 'norwegian', 'portuguese', 'russian', 'spanish', and 'swedish'. For instance, let's build an index with an english stemmer. >>> from zope.index.text import textindex, lexicon >>> import zc.catalog.stemmer >>> lex = lexicon.Lexicon( ... lexicon.Splitter(), lexicon.CaseNormalizer(), ... lexicon.StopWordRemover(), zc.catalog.stemmer.Stemmer('english')) >>> ix = textindex.TextIndex(lex) >>> data = [ ... (0, 'consigned consistency consoles the constables'), ... (1, 'knaves kneeled and knocked knees, knowing no knights')] >>> for doc_id, text in data: ... ix.index_doc(doc_id, text) ... >>> list(ix.apply('consoling a constable')) [0] >>> list(ix.apply('knightly kneel')) [1] Note that query terms with globbing characters are not stemmed. >>> list(ix.apply('constables*')) [] ======================= Support for legacy data ======================= Prior to the introduction of btree "families" and the ``BTrees.Interfaces.IBTreeFamily`` interface, the indexes defined by the ``zc.catalog.index`` module used the instance attributes ``btreemodule`` and ``IOBTree``, initialized in the constructor, and the ``BTreeAPI`` property. These are replaced by the ``family`` attribute in the current implementation. 
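As a rough illustration of the replacement described above, the following sketch derives a ``family`` value lazily from legacy instance state and drops the old attributes once it has done so. Everything here is hypothetical: ``LegacyAwareIndex``, the ``_family`` key and the ``FAMILY32``/``FAMILY64`` strings merely stand in for the real index classes and the ``BTrees.family32``/``BTrees.family64`` singletons, and the sketch ignores persistence entirely.

```python
# Hypothetical stand-ins for the BTrees.family32 / BTrees.family64 singletons.
FAMILY32 = "family32"
FAMILY64 = "family64"

# Legacy ``btreemodule`` values mapped to the family they imply.
LEGACY_MODULE_TO_FAMILY = {
    "BTrees.IFBTree": FAMILY32,
    "BTrees.LFBTree": FAMILY64,
}


class LegacyAwareIndex(object):
    """Toy index whose ``family`` is computed from legacy attributes."""

    @property
    def family(self):
        state = self.__dict__
        if "_family" not in state:
            # Assumption for this sketch: no legacy data means 32-bit.
            module_name = state.pop("btreemodule", "BTrees.IFBTree")
            state.pop("IOBTree", None)  # discard the other legacy attribute
            state["_family"] = LEGACY_MODULE_TO_FAMILY[module_name]
        return state["_family"]


# Simulate an instance restored from an old 64-bit pickle.
old = LegacyAwareIndex()
old.__dict__.update({"btreemodule": "BTrees.LFBTree", "IOBTree": object})
resolved = old.family
```

After the first access the legacy keys are gone from the instance dictionary, mirroring the behavior the tests below check on real unpickled indexes.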
This is a white-box test that verifies that the supported values in existing data structures (loaded from pickles) can be used effectively with the current implementation. There are two supported sets of values; one for 32-bit btrees:: >>> import BTrees.IOBTree >>> legacy32 = { ... "btreemodule": "BTrees.IFBTree", ... "IOBTree": BTrees.IOBTree.IOBTree, ... } and another for 64-bit btrees:: >>> import BTrees.LOBTree >>> legacy64 = { ... "btreemodule": "BTrees.LFBTree", ... "IOBTree": BTrees.LOBTree.LOBTree, ... } In each case, actual legacy structures will also include index structures that match the right integer size:: >>> import BTrees.OOBTree >>> import BTrees.Length >>> legacy32["values_to_documents"] = BTrees.OOBTree.OOBTree() >>> legacy32["documents_to_values"] = BTrees.IOBTree.IOBTree() >>> legacy32["documentCount"] = BTrees.Length.Length(0) >>> legacy32["wordCount"] = BTrees.Length.Length(0) >>> legacy64["values_to_documents"] = BTrees.OOBTree.OOBTree() >>> legacy64["documents_to_values"] = BTrees.LOBTree.LOBTree() >>> legacy64["documentCount"] = BTrees.Length.Length(0) >>> legacy64["wordCount"] = BTrees.Length.Length(0) What we want to do is verify that the ``family`` attribute is properly computed for instances loaded from legacy data, and ensure that the structure is updated cleanly without providing cause for a read-only transaction to become a write-transaction. We'll need to create instances that conform to the old data structures, pickle them, and show that unpickling them produces instances that use the correct families. Let's create new instances, and force the internal data to match the old structures:: >>> import pickle >>> import zc.catalog.index >>> vi32 = zc.catalog.index.ValueIndex() >>> vi32.__dict__ = legacy32.copy() >>> legacy32_pickle = pickle.dumps(vi32) >>> vi64 = zc.catalog.index.ValueIndex() >>> vi64.__dict__ = legacy64.copy() >>> legacy64_pickle = pickle.dumps(vi64) Now, let's unpickle these structures and verify the structures. 
We'll start with the 32-bit variety:: >>> vi32 = pickle.loads(legacy32_pickle) >>> vi32.__dict__["btreemodule"] 'BTrees.IFBTree' >>> vi32.__dict__["IOBTree"] >>> "family" in vi32.__dict__ False >>> vi32._p_changed False The ``family`` property returns the ``BTrees.family32`` singleton:: >>> vi32.family is BTrees.family32 True Once accessed, the legacy values have been cleaned out from the instance dictionary:: >>> "btreemodule" in vi32.__dict__ False >>> "IOBTree" in vi32.__dict__ False >>> "BTreeAPI" in vi32.__dict__ False Accessing these attributes as attributes provides the proper values anyway:: >>> vi32.btreemodule 'BTrees.IFBTree' >>> vi32.IOBTree >>> vi32.BTreeAPI Even though the instance dictionary has been cleaned up, the change flag hasn't been set. This is handled this way to avoid turning a read-only transaction into a write-transaction:: >>> vi32._p_changed False The 64-bit variation provides equivalent behavior:: >>> vi64 = pickle.loads(legacy64_pickle) >>> vi64.__dict__["btreemodule"] 'BTrees.LFBTree' >>> vi64.__dict__["IOBTree"] >>> "family" in vi64.__dict__ False >>> vi64._p_changed False >>> vi64.family is BTrees.family64 True >>> "btreemodule" in vi64.__dict__ False >>> "IOBTree" in vi64.__dict__ False >>> "BTreeAPI" in vi64.__dict__ False >>> vi64.btreemodule 'BTrees.LFBTree' >>> vi64.IOBTree >>> vi64.BTreeAPI >>> vi64._p_changed False Now, if we have a legacy structure and explicitly set the ``family`` attribute, the old data structures will be cleared and replaced with the new structure. If the object is associated with a data manager, the changed flag will be set as well:: >>> class DataManager(object): ... def register(self, ob): ... 
pass
    >>> vi64 = pickle.loads(legacy64_pickle)
    >>> vi64._p_jar = DataManager()
    >>> vi64.family = BTrees.family64
    >>> vi64._p_changed
    True
    >>> "btreemodule" in vi64.__dict__
    False
    >>> "IOBTree" in vi64.__dict__
    False
    >>> "BTreeAPI" in vi64.__dict__
    False
    >>> "family" in vi64.__dict__
    True
    >>> vi64.family is BTrees.family64
    True
    >>> vi64.btreemodule
    'BTrees.LFBTree'
    >>> vi64.IOBTree
    >>> vi64.BTreeAPI

=======
Globber
=======

The globber takes a query and makes any term that isn't already a glob into
something that ends in a star. It was originally envisioned as a *very*
low-rent stemming hack. The author now questions its value, and hopes that
the new stemming pipeline option can be used instead. Nonetheless, here is an
example of it at work.

    >>> from zope.index.text import textindex
    >>> index = textindex.TextIndex()
    >>> lex = index.lexicon
    >>> from zc.catalog import globber
    >>> globber.glob('foo bar and baz or (b?ng not boo)', lex)
    '(((foo* and bar*) and baz*) or (b?ng and not boo*))'

================
Callable Wrapper
================

If we want to index some value that is easily derivable from a document, we
have to define an interface with this value as an attribute, and create an
adapter that calculates this value and implements this interface. All this is
too much hassle if we want to store a single easily derivable value.
CallableWrapper solves this problem by converting the document to the indexed
value with a callable converter.

Here's a contrived example. Suppose we have cars that know their mileage
expressed in miles per gallon, but we want to index their economy in litres
per 100 km.

    >>> class Car(object):
    ...     def __init__(self, mpg):
    ...         self.mpg = mpg

    >>> def mpg2lp100(car):
    ...     return 100.0/(1.609344/3.7854118 * car.mpg)

Let's create an index that would index cars' l/100 km rating.

    >>> from zc.catalog import index, catalogindex
    >>> idx = catalogindex.CallableWrapper(index.ValueIndex(), mpg2lp100)

Let's add a couple of cars to the index!
    >>> hummer = Car(10.0)
    >>> beamer = Car(22.0)
    >>> civic = Car(45.0)
    >>> idx.index_doc(1, hummer)
    >>> idx.index_doc(2, beamer)
    >>> idx.index_doc(3, civic)

The indexed values should be the converted l/100 km ratings:

    >>> list(idx.values()) # doctest: +ELLIPSIS
    [5.22699076283393..., 10.691572014887601, 23.521458432752723]

We can query for cars that consume fuel in some range:

    >>> list(idx.apply({'between': (5.0, 7.0)}))
    [3]

==========================
zc.catalog Browser Support
==========================

The zc.catalog.browser package adds simple TTW (through-the-web)
addition/inspection for SetIndex and ValueIndex.

First, we need a browser so we can test the web UI.

    >>> from zope.testbrowser.testing import Browser
    >>> browser = Browser()
    >>> browser.addHeader('Authorization', 'Basic mgr:mgrpw')
    >>> browser.addHeader('Accept-Language', 'en-US')
    >>> browser.open('http://localhost')

Now we need to add the catalog that these indexes are going to reside within.

    >>> browser.open('/++etc++site/default/@@contents.html')
    >>> browser.getLink('Add').click()
    >>> browser.getControl('Catalog').click()
    >>> browser.getControl(name='id').value = 'catalog'
    >>> browser.getControl('Add').click()

SetIndex
--------

Add the SetIndex to the catalog.

    >>> browser.getLink('Add').click()
    >>> browser.getControl('Set Index').click()
    >>> browser.getControl(name='id').value = 'set_index'
    >>> browser.getControl('Add').click()

The add form needs values for what interface to adapt candidate objects to,
what field name to use, and whether or not that field is a callable. (We'll
use a simple interface for demonstration purposes; it's not really
significant.)

    >>> browser.getControl('Interface', index=0).displayValue = [
    ...
'zope.size.interfaces.ISized']
    >>> browser.getControl('Field Name').value = 'sizeForSorting'
    >>> browser.getControl('Field Callable').click()
    >>> browser.getControl(name='add_input_name').value = 'set_index'
    >>> browser.getControl('Add').click()

Now we can look at the index and see how it is configured.

    >>> browser.getLink('set_index').click()
    >>> print browser.contents
    <...
    ...Interface...zope.size.interfaces.ISized...
    ...Field Name...sizeForSorting...
    ...Field Callable...True...

We need to go back to the catalog so we can add a different index.

    >>> browser.open('/++etc++site/default/catalog/@@contents.html')

ValueIndex
----------

Add the ValueIndex to the catalog.

    >>> browser.getLink('Add').click()
    >>> browser.getControl('Value Index').click()
    >>> browser.getControl(name='id').value = 'value_index'
    >>> browser.getControl('Add').click()

The add form needs values for what interface to adapt candidate objects to,
what field name to use, and whether or not that field is a callable. (We'll
use a simple interface for demonstration purposes; it's not really
significant.)

    >>> browser.getControl('Interface', index=0).displayValue = [
    ...     'zope.size.interfaces.ISized']
    >>> browser.getControl('Field Name').value = 'sizeForSorting'
    >>> browser.getControl('Field Callable').click()
    >>> browser.getControl(name='add_input_name').value = 'value_index'
    >>> browser.getControl('Add').click()

Now we can look at the index and see how it is configured.

    >>> browser.getLink('value_index').click()
    >>> print browser.contents
    <...
    ...Interface...zope.size.interfaces.ISized...
    ...Field Name...sizeForSorting...
    ...Field Callable...True...
Keywords: zope3 i18n date time duration catalog index Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Web Environment Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: Zope Public License Classifier: Programming Language :: Python Classifier: Natural Language :: English Classifier: Operating System :: OS Independent Classifier: Topic :: Internet :: WWW/HTTP Classifier: Framework :: Zope3 zc.catalog-1.6/src/zc.catalog.egg-info/top_level.txt0000664000177100020040000000000312165325611023566 0ustar menesismenesis00000000000000zc zc.catalog-1.6/src/zc/0000775000177100020040000000000012165325611015740 5ustar menesismenesis00000000000000zc.catalog-1.6/src/zc/catalog/0000775000177100020040000000000012165325611017352 5ustar menesismenesis00000000000000zc.catalog-1.6/src/zc/catalog/legacy.txt0000664000177100020040000001171412165324663021371 0ustar menesismenesis00000000000000======================= Support for legacy data ======================= Prior to the introduction of btree "families" and the ``BTrees.Interfaces.IBTreeFamily`` interface, the indexes defined by the ``zc.catalog.index`` module used the instance attributes ``btreemodule`` and ``IOBTree``, initialized in the constructor, and the ``BTreeAPI`` property. These are replaced by the ``family`` attribute in the current implementation. This is a white-box test that verifies that the supported values in existing data structures (loaded from pickles) can be used effectively with the current implementation. There are two supported sets of values; one for 32-bit btrees:: >>> import BTrees.IOBTree >>> legacy32 = { ... "btreemodule": "BTrees.IFBTree", ... "IOBTree": BTrees.IOBTree.IOBTree, ... } and another for 64-bit btrees:: >>> import BTrees.LOBTree >>> legacy64 = { ... "btreemodule": "BTrees.LFBTree", ... "IOBTree": BTrees.LOBTree.LOBTree, ... 
} In each case, actual legacy structures will also include index structures that match the right integer size:: >>> import BTrees.OOBTree >>> import BTrees.Length >>> legacy32["values_to_documents"] = BTrees.OOBTree.OOBTree() >>> legacy32["documents_to_values"] = BTrees.IOBTree.IOBTree() >>> legacy32["documentCount"] = BTrees.Length.Length(0) >>> legacy32["wordCount"] = BTrees.Length.Length(0) >>> legacy64["values_to_documents"] = BTrees.OOBTree.OOBTree() >>> legacy64["documents_to_values"] = BTrees.LOBTree.LOBTree() >>> legacy64["documentCount"] = BTrees.Length.Length(0) >>> legacy64["wordCount"] = BTrees.Length.Length(0) What we want to do is verify that the ``family`` attribute is properly computed for instances loaded from legacy data, and ensure that the structure is updated cleanly without providing cause for a read-only transaction to become a write-transaction. We'll need to create instances that conform to the old data structures, pickle them, and show that unpickling them produces instances that use the correct families. Let's create new instances, and force the internal data to match the old structures:: >>> import pickle >>> import zc.catalog.index >>> vi32 = zc.catalog.index.ValueIndex() >>> vi32.__dict__ = legacy32.copy() >>> legacy32_pickle = pickle.dumps(vi32) >>> vi64 = zc.catalog.index.ValueIndex() >>> vi64.__dict__ = legacy64.copy() >>> legacy64_pickle = pickle.dumps(vi64) Now, let's unpickle these structures and verify the structures. 
We'll start with the 32-bit variety::

    >>> vi32 = pickle.loads(legacy32_pickle)

    >>> vi32.__dict__["btreemodule"]
    'BTrees.IFBTree'
    >>> vi32.__dict__["IOBTree"]
    <type 'BTrees.IOBTree.IOBTree'>

    >>> "family" in vi32.__dict__
    False

    >>> vi32._p_changed
    False

The ``family`` property returns the ``BTrees.family32`` singleton::

    >>> vi32.family is BTrees.family32
    True

Once accessed, the legacy values have been cleaned out from the instance
dictionary::

    >>> "btreemodule" in vi32.__dict__
    False
    >>> "IOBTree" in vi32.__dict__
    False
    >>> "BTreeAPI" in vi32.__dict__
    False

Accessing these names as attributes still provides the proper values::

    >>> vi32.btreemodule
    'BTrees.IFBTree'
    >>> vi32.IOBTree
    <type 'BTrees.IOBTree.IOBTree'>
    >>> vi32.BTreeAPI
    <module 'BTrees.IFBTree' from ...>

Even though the instance dictionary has been cleaned up, the change flag
hasn't been set.  The cleanup is done this way to avoid turning a read-only
transaction into a write-transaction::

    >>> vi32._p_changed
    False

The 64-bit variation provides equivalent behavior::

    >>> vi64 = pickle.loads(legacy64_pickle)

    >>> vi64.__dict__["btreemodule"]
    'BTrees.LFBTree'
    >>> vi64.__dict__["IOBTree"]
    <type 'BTrees.LOBTree.LOBTree'>

    >>> "family" in vi64.__dict__
    False

    >>> vi64._p_changed
    False

    >>> vi64.family is BTrees.family64
    True

    >>> "btreemodule" in vi64.__dict__
    False
    >>> "IOBTree" in vi64.__dict__
    False
    >>> "BTreeAPI" in vi64.__dict__
    False

    >>> vi64.btreemodule
    'BTrees.LFBTree'
    >>> vi64.IOBTree
    <type 'BTrees.LOBTree.LOBTree'>
    >>> vi64.BTreeAPI
    <module 'BTrees.LFBTree' from ...>

    >>> vi64._p_changed
    False

Now, if we have a legacy structure and explicitly set the ``family``
attribute, the old data structures will be cleared and replaced with the
new structure.  If the object is associated with a data manager, the
changed flag will be set as well::

    >>> class DataManager(object):
    ...     def register(self, ob):
    ...         pass

    >>> vi64 = pickle.loads(legacy64_pickle)
    >>> vi64._p_jar = DataManager()
    >>> vi64.family = BTrees.family64

    >>> vi64._p_changed
    True

    >>> "btreemodule" in vi64.__dict__
    False
    >>> "IOBTree" in vi64.__dict__
    False
    >>> "BTreeAPI" in vi64.__dict__
    False

    >>> "family" in vi64.__dict__
    True
    >>> vi64.family is BTrees.family64
    True

    >>> vi64.btreemodule
    'BTrees.LFBTree'
    >>> vi64.IOBTree
    <type 'BTrees.LOBTree.LOBTree'>
    >>> vi64.BTreeAPI
    <module 'BTrees.LFBTree' from ...>
zc.catalog-1.6/src/zc/catalog/globber.py0000664000177100020040000000307612165324663021354 0ustar menesismenesis00000000000000
#############################################################################
#
# Copyright (c) 2006 Zope Foundation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL).  A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
"""glob all terms at the end.
$Id: globber.py 2918 2005-07-19 22:12:38Z jim $ """ from zope.index.text import queryparser, parsetree reconstitute = {} reconstitute["NOT"] = lambda nd: "not %s" % ( reconstitute[nd.getValue().nodeType()](nd.getValue()),) reconstitute["AND"] = lambda nd: "(%s)" % (" and ".join(expand(nd)),) reconstitute["OR"] = lambda nd: "(%s)" % (" or ".join(expand(nd)),) reconstitute["ATOM"] = lambda nd: '%s*' % (nd.getValue()) reconstitute["PHRASE"] = lambda nd: '"%s"' % ( ' '.join((v + '*') for v in nd.getValue()),) reconstitute["GLOB"] = lambda nd: nd.getValue() expand = lambda nd: [reconstitute[n.nodeType()](n) for n in nd.getValue()] def glob(query, lexicon): # lexicon is index.lexicon try: tree = queryparser.QueryParser(lexicon).parseQuery(query) except parsetree.ParseError: return None if tree is not None: return reconstitute[tree.nodeType()](tree) else: return None zc.catalog-1.6/src/zc/catalog/tests.py0000664000177100020040000000633612165324663021104 0ustar menesismenesis00000000000000############################################################################# # # Copyright (c) 2006 Zope Foundation and Contributors. # All Rights Reserved. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. # THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. 
# ############################################################################## """catalog package test runner $Id: tests.py 2918 2005-07-19 22:12:38Z jim $ """ import unittest import doctest from zope.testing import module import zope.component.testing import zope.component.factory import zope.component.interfaces import zc.catalog import zc.catalog.interfaces import BTrees.Interfaces import BTrees.LOBTree import BTrees.OLBTree import BTrees.LFBTree def setUp32bit(test): zope.component.testing.setUp(test) test.globs["btrees_family"] = BTrees.family32 def modSetUp32bit(test): setUp32bit(test) module.setUp(test, 'zc.catalog.doctest_test') def setUp64bit(test): zope.component.testing.setUp(test) test.globs["btrees_family"] = BTrees.family64 def modSetUp64bit(test): setUp64bit(test) module.setUp(test, 'zc.catalog.doctest_test') def tearDown(test): zope.component.testing.tearDown(test) def modTearDown(test): module.tearDown(test) zope.component.testing.tearDown(test) def test_suite(): tests = unittest.TestSuite(( # 32 bits doctest.DocFileSuite( 'extentcatalog.txt', setUp=modSetUp32bit, tearDown=modTearDown), doctest.DocFileSuite( 'setindex.txt', setUp=setUp32bit, tearDown=tearDown), doctest.DocFileSuite( 'valueindex.txt', setUp=setUp32bit, tearDown=tearDown), doctest.DocFileSuite( 'normalizedindex.txt', setUp=setUp32bit, tearDown=tearDown), doctest.DocFileSuite( 'globber.txt', setUp=setUp32bit, tearDown=tearDown), doctest.DocFileSuite( 'callablewrapper.txt', setUp=setUp32bit, tearDown=tearDown), # 64 bits doctest.DocFileSuite( 'extentcatalog.txt', setUp=modSetUp64bit, tearDown=modTearDown), doctest.DocFileSuite('setindex.txt', setUp=setUp64bit, tearDown=tearDown), doctest.DocFileSuite('valueindex.txt', setUp=setUp64bit, tearDown=tearDown), doctest.DocFileSuite('normalizedindex.txt', setUp=setUp64bit, tearDown=tearDown), doctest.DocFileSuite('globber.txt', setUp=setUp64bit, tearDown=tearDown), doctest.DocFileSuite('callablewrapper.txt', setUp=setUp64bit, 
tearDown=tearDown), # legacy data support doctest.DocFileSuite('legacy.txt', optionflags=doctest.ELLIPSIS), )) import zc.catalog.stemmer if not zc.catalog.stemmer.broken: tests.addTest(doctest.DocFileSuite('stemmer.txt')) return tests if __name__ == '__main__': unittest.main(defaultTest='test_suite') zc.catalog-1.6/src/zc/catalog/setindex.txt0000664000177100020040000001622312165324663021750 0ustar menesismenesis00000000000000========= Set Index ========= The setindex is an index similar to, but more general than a traditional keyword index. The values indexed are expected to be iterables; the index allows searches for documents that contain any of a set of values; all of a set of values; or between a set of values. Additionally, the index supports an interface that allows examination of the indexed values. It is as policy-free as possible, and is intended to be the engine for indexes with more policy, as well as being useful itself. On creation, the index has no wordCount, no documentCount, and is, as expected, fairly empty. >>> from zc.catalog.index import SetIndex >>> index = SetIndex() >>> index.documentCount() 0 >>> index.wordCount() 0 >>> index.maxValue() # doctest: +ELLIPSIS Traceback (most recent call last): ... ValueError:... >>> index.minValue() # doctest: +ELLIPSIS Traceback (most recent call last): ... ValueError:... >>> list(index.values()) [] >>> len(index.apply({'any_of': (5,)})) 0 The index supports indexing any value. All values within a given index must sort consistently across Python versions. In our example, we hope that strings and integers will sort consistently; this may not be a reasonable hope. >>> data = {1: ['a', 1], ... 2: ['b', 'a', 3, 4, 7], ... 3: [1], ... 4: [1, 4, 'c'], ... 5: [7], ... 6: [5, 6, 7], ... 7: ['c'], ... 8: [1, 6], ... 9: ['a', 'c', 2, 3, 4, 6,], ... } >>> for k, v in data.items(): ... index.index_doc(k, v) ... After indexing, the statistics and values match the newly entered content. 
    >>> list(index.values())
    [1, 2, 3, 4, 5, 6, 7, 'a', 'b', 'c']
    >>> index.documentCount()
    9
    >>> index.wordCount()
    10
    >>> index.maxValue()
    'c'
    >>> index.minValue()
    1
    >>> list(index.ids())
    [1, 2, 3, 4, 5, 6, 7, 8, 9]

The index supports five types of query.  The first is 'any_of'.  It
takes an iterable of values, and returns an iterable of document ids that
contain any of the values.  The results are weighted.

    >>> list(index.apply({'any_of':('b', 1, 5)}))
    [1, 2, 3, 4, 6, 8]
    >>> list(index.apply({'any_of': ('b', 1, 5)}))
    [1, 2, 3, 4, 6, 8]
    >>> list(index.apply({'any_of':(42,)}))
    []
    >>> index.apply({'any_of': ('a', 3, 7)}) # doctest: +ELLIPSIS
    BTrees...FBucket([(1, 1.0), (2, 3.0), (5, 1.0), (6, 1.0), (9, 2.0)])

Another query is 'any'.  If the key is None, all indexed document ids with any
values are returned.  If the key is an extent, the intersection of the extent
and all document ids with any values is returned.

    >>> list(index.apply({'any': None}))
    [1, 2, 3, 4, 5, 6, 7, 8, 9]

    >>> from zc.catalog.extentcatalog import FilterExtent
    >>> extent = FilterExtent(lambda extent, uid, obj: True)
    >>> for i in range(15):
    ...     extent.add(i, i)
    ...
    >>> list(index.apply({'any': extent}))
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> limited_extent = FilterExtent(lambda extent, uid, obj: True)
    >>> for i in range(5):
    ...     limited_extent.add(i, i)
    ...
    >>> list(index.apply({'any': limited_extent}))
    [1, 2, 3, 4]

The 'all_of' argument also takes an iterable of values, but returns an
iterable of document ids that contain all of the values.  The results are not
weighted.

    >>> list(index.apply({'all_of': ('a',)}))
    [1, 2, 9]
    >>> list(index.apply({'all_of': (3, 4)}))
    [2, 9]

These tests illustrate two related reported errors that have been fixed.

    >>> list(index.apply({'all_of': ('z', 3, 4)}))
    []
    >>> list(index.apply({'all_of': (3, 4, 'z')}))
    []

The 'between' argument takes from one to four values.  The first is the
minimum, and defaults to None, indicating no minimum; the second is the
maximum, and defaults to None, indicating no maximum; the next is a boolean for
whether the minimum value should be excluded, and defaults to False; and the
last is a boolean for whether the maximum value should be excluded, and also
defaults to False.  The results are weighted.

    >>> list(index.apply({'between': (1, 7)}))
    [1, 2, 3, 4, 5, 6, 8, 9]
    >>> list(index.apply({'between': ('b', None)}))
    [2, 4, 7, 9]
    >>> list(index.apply({'between': ('b',)}))
    [2, 4, 7, 9]
    >>> list(index.apply({'between': (1, 7, True, True)}))
    [2, 4, 6, 8, 9]
    >>> index.apply({'between': (2, 6)}) # doctest: +ELLIPSIS
    BTrees...FBucket([(2, 2.0), (4, 1.0), (6, 2.0), (8, 1.0), (9, 4.0)])

The 'none' argument takes an extent and returns the ids in the extent that
are not indexed; it is intended to be used to return docids that have no
(or empty) values.

    >>> list(index.apply({'none': extent}))
    [0, 10, 11, 12, 13, 14]

Trying to use more than one of these at a time generates an error.

    >>> index.apply({'all_of': (5,), 'any_of': (3,)})
    ... # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError:...

Using none of them simply returns None.

    >>> index.apply({}) # returns None

Invalid query names cause ValueErrors.

    >>> index.apply({'foo':()})
    ... # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError:...

When you unindex a document, the searches and statistics should be updated.

    >>> index.unindex_doc(6)
    >>> len(index.apply({'any_of': (5,)}))
    0
    >>> index.documentCount()
    8
    >>> index.wordCount()
    9
    >>> list(index.values())
    [1, 2, 3, 4, 6, 7, 'a', 'b', 'c']
    >>> list(index.ids())
    [1, 2, 3, 4, 5, 7, 8, 9]

Reindexing a document that has additional new values is also reflected in
subsequent searches and statistics checks.
>>> data[8].extend([5, 'c']) >>> index.index_doc(8, data[8]) >>> index.documentCount() 8 >>> index.wordCount() 10 >>> list(index.apply({'any_of': (5,)})) [8] >>> list(index.apply({'any_of': ('c',)})) [4, 7, 8, 9] The same is true for reindexing a document with both additions and removals. >>> 2 in set(index.apply({'any_of': (7,)})) True >>> 2 in set(index.apply({'any_of': (2,)})) False >>> data[2].pop() 7 >>> data[2].append(2) >>> index.index_doc(2, data[2]) >>> 2 in set(index.apply({'any_of': (7,)})) False >>> 2 in set(index.apply({'any_of': (2,)})) True Reindexing a document that no longer has any values causes it to be removed from the statistics. >>> del data[2][:] >>> index.index_doc(2, data[2]) >>> index.documentCount() 7 >>> index.wordCount() 9 >>> list(index.ids()) [1, 3, 4, 5, 7, 8, 9] This affects both ways of determining the ids that are and are not in the index (that do and do not have values). >>> list(index.apply({'any': None})) [1, 3, 4, 5, 7, 8, 9] >>> list(index.apply({'none': extent})) [0, 2, 6, 10, 11, 12, 13, 14] The values method can be used to examine the indexed values for a given document id. >>> set(index.values(doc_id=8)) == set([1, 5, 6, 'c']) True And the containsValue method provides a way of determining membership in the values. >>> index.containsValue(5) True >>> index.containsValue(20) False zc.catalog-1.6/src/zc/catalog/browser/0000775000177100020040000000000012165325611021035 5ustar menesismenesis00000000000000zc.catalog-1.6/src/zc/catalog/browser/README.txt0000664000177100020040000000606312165324663022546 0ustar menesismenesis00000000000000========================== zc.catalog Browser Support ========================== The zc.catalog.browser package adds simple TTW addition/inspection for SetIndex and ValueIndex. First, we need a browser so we can test the web UI. 
    >>> from zope.testbrowser.testing import Browser
    >>> browser = Browser()
    >>> browser.addHeader('Authorization', 'Basic mgr:mgrpw')
    >>> browser.addHeader('Accept-Language', 'en-US')
    >>> browser.open('http://localhost')

Now we need to add the catalog that these indexes are going to reside within.

    >>> browser.open('/++etc++site/default/@@contents.html')
    >>> browser.getLink('Add').click()
    >>> browser.getControl('Catalog').click()
    >>> browser.getControl(name='id').value = 'catalog'
    >>> browser.getControl('Add').click()


SetIndex
--------

Add the SetIndex to the catalog.

    >>> browser.getLink('Add').click()
    >>> browser.getControl('Set Index').click()
    >>> browser.getControl(name='id').value = 'set_index'
    >>> browser.getControl('Add').click()

The add form needs values for the interface to adapt candidate objects to,
the field name to use, and whether or not that field is a callable.  (We'll
use a simple interface for demonstration purposes; the choice isn't
significant.)

    >>> browser.getControl('Interface', index=0).displayValue = [
    ...     'zope.size.interfaces.ISized']
    >>> browser.getControl('Field Name').value = 'sizeForSorting'
    >>> browser.getControl('Field Callable').click()
    >>> browser.getControl(name='add_input_name').value = 'set_index'
    >>> browser.getControl('Add').click()

Now we can look at the index and see how it is configured.

    >>> browser.getLink('set_index').click()
    >>> print browser.contents
    <...
    ...Interface...zope.size.interfaces.ISized...
    ...Field Name...sizeForSorting...
    ...Field Callable...True...

We need to go back to the catalog so we can add a different index.

    >>> browser.open('/++etc++site/default/catalog/@@contents.html')


ValueIndex
----------

Add the ValueIndex to the catalog.

    >>> browser.getLink('Add').click()
    >>> browser.getControl('Value Index').click()
    >>> browser.getControl(name='id').value = 'value_index'
    >>> browser.getControl('Add').click()

The add form needs values for the interface to adapt candidate objects to,
the field name to use, and whether or not that field is a callable.  (We'll
use a simple interface for demonstration purposes; the choice isn't
significant.)

    >>> browser.getControl('Interface', index=0).displayValue = [
    ...     'zope.size.interfaces.ISized']
    >>> browser.getControl('Field Name').value = 'sizeForSorting'
    >>> browser.getControl('Field Callable').click()
    >>> browser.getControl(name='add_input_name').value = 'value_index'
    >>> browser.getControl('Add').click()

Now we can look at the index and see how it is configured.

    >>> browser.getLink('value_index').click()
    >>> print browser.contents
    <...
    ...Interface...zope.size.interfaces.ISized...
    ...Field Name...sizeForSorting...
    ...Field Callable...True...
zc.catalog-1.6/src/zc/catalog/browser/tests.py0000664000177100020040000000413212165324663022557 0ustar menesismenesis00000000000000
##############################################################################
#
# Copyright (c) 2004 Zope Foundation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL).  A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
"""Functional tests for `zc.catalog.browser`.

$Id$
"""
import doctest
import os.path
import unittest

import transaction
import zope.intid
import zope.intid.interfaces

try:
    # Only include the functional tests whenever the required dependencies
    # are installed.
import zope.app.appsetup.bootstrap import zope.app.appsetup.interfaces import zope.app.testing.functional here = os.path.dirname(os.path.realpath(__file__)) ZcCatalogLayer = zope.app.testing.functional.ZCMLLayer( os.path.join(here, "ftesting.zcml"), __name__, "ZcCatalogLayer") @zope.component.adapter( zope.app.appsetup.interfaces.IDatabaseOpenedWithRootEvent) def initializeIntIds(event): db, connection, root, root_folder = ( zope.app.appsetup.bootstrap.getInformationFromEvent(event)) sm = root_folder.getSiteManager() intids = zope.intid.IntIds() sm["default"]["test-int-ids"] = intids sm.registerUtility( intids, zope.intid.interfaces.IIntIds) transaction.commit() connection.close() def test_suite(): suite = zope.app.testing.functional.FunctionalDocFileSuite( "README.txt", optionflags=doctest.ELLIPSIS|doctest.NORMALIZE_WHITESPACE) suite.layer = ZcCatalogLayer return suite except (ImportError,), e: def test_suite(): return unittest.TestSuite() if __name__ == "__main__": unittest.main(defaultTest="test_suite") zc.catalog-1.6/src/zc/catalog/browser/ftesting.zcml0000664000177100020040000000261712165324663023563 0ustar menesismenesis00000000000000 zc.catalog-1.6/src/zc/catalog/browser/__init__.py0000664000177100020040000000000212165324663023144 0ustar menesismenesis00000000000000# zc.catalog-1.6/src/zc/catalog/browser/configure.zcml0000664000177100020040000000363412165324663023721 0ustar menesismenesis00000000000000 zc.catalog-1.6/src/zc/catalog/catalogindex.py0000664000177100020040000000532512165324663022401 0ustar menesismenesis00000000000000############################################################################# # # Copyright (c) 2006 Zope Foundation and Contributors. # All Rights Reserved. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. 
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. # ############################################################################## """Indexes appropriate for zope.catalog $Id: catalogindex.py 2918 2005-07-19 22:12:38Z jim $ """ import zope.interface import zope.catalog.attribute import zope.container.contained import zope.index.interfaces import zc.catalog.index import zc.catalog.interfaces class ValueIndex(zope.catalog.attribute.AttributeIndex, zc.catalog.index.ValueIndex, zope.container.contained.Contained): zope.interface.implements(zc.catalog.interfaces.ICatalogValueIndex) class SetIndex(zope.catalog.attribute.AttributeIndex, zc.catalog.index.SetIndex, zope.container.contained.Contained): zope.interface.implements(zc.catalog.interfaces.ICatalogSetIndex) class NormalizationWrapper( zope.catalog.attribute.AttributeIndex, zc.catalog.index.NormalizationWrapper, zope.container.contained.Contained): pass class CallableWrapper(zc.catalog.index.CallableWrapper, zope.container.contained.Contained): zope.interface.implements(zc.catalog.interfaces.ICallableWrapper) @zope.interface.implementer( zope.interface.implementedBy(NormalizationWrapper), zc.catalog.interfaces.IValueIndex, zope.index.interfaces.IIndexSort) def DateTimeValueIndex( field_name=None, interface=None, field_callable=False, resolution=2): # hour; good for per-day searches ix = NormalizationWrapper( field_name, interface, field_callable, zc.catalog.index.ValueIndex(), zc.catalog.index.DateTimeNormalizer(resolution), False) zope.interface.alsoProvides(ix, zc.catalog.interfaces.IValueIndex) return ix @zope.interface.implementer( zope.interface.implementedBy(NormalizationWrapper), zc.catalog.interfaces.ISetIndex) def DateTimeSetIndex( field_name=None, interface=None, field_callable=False, resolution=2): # hour; good 
for per-day searches ix = NormalizationWrapper( field_name, interface, field_callable, zc.catalog.index.SetIndex(), zc.catalog.index.DateTimeNormalizer(resolution), True) zope.interface.alsoProvides(ix, zc.catalog.interfaces.ISetIndex) return ix zc.catalog-1.6/src/zc/catalog/i18n.py0000664000177100020040000000213312165324663020510 0ustar menesismenesis00000000000000############################################################################# # # Copyright (c) 2006 Zope Foundation and Contributors. # All Rights Reserved. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. # THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. # ############################################################################## """I18N support for tasks. This defines a `MessageFactory` for the I18N domain for the catalog package. This is normally used with this import:: from i18n import MessageFactory as _ The factory is then used normally. Two examples:: text = _('some internationalized text') text = _('helpful-descriptive-message-id', 'default text') """ __docformat__ = "reStructuredText" from zope import i18nmessageid MessageFactory = _ = i18nmessageid.MessageFactory("zc.catalog") zc.catalog-1.6/src/zc/catalog/__init__.py0000664000177100020040000000122412165324663021470 0ustar menesismenesis00000000000000############################################################################# # # Copyright (c) 2007 Zope Foundation and Contributors. # All Rights Reserved. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. 
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
"""zc.catalog package"""
zc.catalog-1.6/src/zc/catalog/callablewrapper.txt0000664000177100020040000000275012165324663023265 0ustar menesismenesis00000000000000
================
Callable Wrapper
================

If we want to index some value that is easily derivable from a document, we
have to define an interface with this value as an attribute, and create
an adapter that calculates this value and implements this interface.  All
this is too much hassle if we want to store a single easily derivable value.
CallableWrapper solves this problem by converting the document to the
indexed value with a callable converter.

Here's a contrived example.  Suppose we have cars that know their mileage
expressed in miles per gallon, but we want to index their economy in litres
per 100 km.

    >>> class Car(object):
    ...     def __init__(self, mpg):
    ...         self.mpg = mpg

    >>> def mpg2lp100(car):
    ...     return 100.0/(1.609344/3.7854118 * car.mpg)

Let's create an index that would index cars' l/100 km rating.

    >>> from zc.catalog import index, catalogindex
    >>> idx = catalogindex.CallableWrapper(index.ValueIndex(), mpg2lp100)

Let's add a couple of cars to the index!
>>> hummer = Car(10.0) >>> beamer = Car(22.0) >>> civic = Car(45.0) >>> idx.index_doc(1, hummer) >>> idx.index_doc(2, beamer) >>> idx.index_doc(3, civic) The indexed values should be the converted l/100 km ratings: >>> list(idx.values()) # doctest: +ELLIPSIS [5.22699076283393..., 10.691572014887601, 23.521458432752723] We can query for cars that consume fuel in some range: >>> list(idx.apply({'between': (5.0, 7.0)})) [3] zc.catalog-1.6/src/zc/catalog/configure.zcml0000664000177100020040000000411112165324663022225 0ustar menesismenesis00000000000000 zc.catalog-1.6/src/zc/catalog/index.py0000664000177100020040000004553512165324663021055 0ustar menesismenesis00000000000000############################################################################# # # Copyright (c) 2006 Zope Foundation and Contributors. # All Rights Reserved. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. # THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. 
# ############################################################################## """indexes, as might be found in zope.index $Id: index.py 2918 2005-07-19 22:12:38Z jim $ """ import sys import datetime import pytz.reference import BTrees import persistent from BTrees import Length from BTrees.Interfaces import IMerge from zope import component, interface import zope.component.interfaces import zope.interface.common.idatetime import zope.index.interfaces from zope.index.field.sorting import SortingIndexMixin import zope.security.management from zope.publisher.interfaces import IRequest import zc.catalog.interfaces from zc.catalog.i18n import _ class FamilyProperty(object): __name__ = "family" def __get__(self, instance, type=None): if instance is None: return self d = instance.__dict__ if "family" in d: return d["family"] if "btreemodule" in d: iftype = d["btreemodule"].split(".")[-1][:2] if iftype == "IF": d["family"] = BTrees.family32 elif iftype == "LF": d["family"] = BTrees.family64 else: raise ValueError("can't determine btree family based on" " btreemodule of %r" % (iftype,)) else: d["family"] = BTrees.family32 self._clear_old_cruft(instance) return d["family"] def __set__(self, instance, value): instance.__dict__["family"] = value self._clear_old_cruft(instance) def _clear_old_cruft(self, instance): d = instance.__dict__ if "btreemodule" in d: del d["btreemodule"] if "IOBTree" in d: del d["IOBTree"] if "BTreeAPI" in d: del d["BTreeAPI"] class AbstractIndex(persistent.Persistent): interface.implements(zope.index.interfaces.IInjection, zope.index.interfaces.IIndexSearch, zope.index.interfaces.IStatistics, zc.catalog.interfaces.IIndexValues, ) family = FamilyProperty() def __init__(self, family=None): if family is not None: self.family = family self.clear() # These three are deprecated (they were never interface), but can # all be computed from the family attribute: @property def btreemodule(self): return self.family.IF.__name__ @property def BTreeAPI(self): 
return self.family.IF @property def IOBTree(self): return self.family.IO.BTree def clear(self): self.values_to_documents = self.family.OO.BTree() self.documents_to_values = self.family.IO.BTree() self.documentCount = Length.Length(0) self.wordCount = Length.Length(0) def minValue(self, min=None): if min is None: return self.values_to_documents.minKey() else: return self.values_to_documents.minKey(min) def maxValue(self, max=None): if max is None: return self.values_to_documents.maxKey() else: return self.values_to_documents.maxKey(max) def values(self, min=None, max=None, excludemin=False, excludemax=False, doc_id=None): if doc_id is None: return iter(self.values_to_documents.keys( min, max, excludemin, excludemax)) else: values = self.documents_to_values.get(doc_id) if values is None: return () else: return iter(values.keys(min, max, excludemin, excludemax)) def containsValue(self, value): return bool(self.values_to_documents.has_key(value)) def ids(self): return self.documents_to_values.keys() def parseQuery(query): if isinstance(query, dict): if len(query) > 1: raise ValueError( 'may only pass one of key, value pair') elif not query: return None, None query_type, query = query.items()[0] query_type = query_type.lower() else: raise ValueError('may only pass a dict to apply') return query_type, query class ValueIndex(SortingIndexMixin, AbstractIndex): interface.implements(zc.catalog.interfaces.IValueIndex) # attributes used by sorting mixin _sorting_num_docs_attr = 'documentCount' # Length object _sorting_fwd_index_attr = 'values_to_documents' # forward BTree index _sorting_rev_index_attr = 'documents_to_values' # reverse BTree index def _add_value(self, doc_id, added): values_to_documents = self.values_to_documents docs = values_to_documents.get(added) if docs is None: values_to_documents[added] = self.family.IF.TreeSet((doc_id,)) self.wordCount.change(1) else: docs.insert(doc_id) def index_doc(self, doc_id, value): if value is None: self.unindex_doc(doc_id) 
else: values_to_documents = self.values_to_documents documents_to_values = self.documents_to_values old = documents_to_values.get(doc_id) documents_to_values[doc_id] = value if old is None: self.documentCount.change(1) elif old != value: docs = values_to_documents.get(old) docs.remove(doc_id) if not docs: del values_to_documents[old] self.wordCount.change(-1) self._add_value(doc_id, value) def unindex_doc(self, doc_id): documents_to_values = self.documents_to_values value = documents_to_values.get(doc_id) if value is not None: values_to_documents = self.values_to_documents self.documentCount.change(-1) del documents_to_values[doc_id] docs = values_to_documents.get(value) docs.remove(doc_id) if not docs: del values_to_documents[value] self.wordCount.change(-1) def apply(self, query): # any_of, any, between, none, values_to_documents = self.values_to_documents query_type, query = parseQuery(query) if query_type is None: res = None elif query_type == 'any_of': res = self.family.IF.multiunion( [s for s in (values_to_documents.get(v) for v in query) if s is not None]) elif query_type == 'any': if query is None: res = self.family.IF.Set(self.ids()) else: assert zc.catalog.interfaces.IExtent.providedBy(query) res = query & self.family.IF.Set(self.ids()) elif query_type == 'between': res = self.family.IF.multiunion( [s for s in (values_to_documents.get(v) for v in values_to_documents.keys(*query)) if s is not None]) elif query_type == 'none': assert zc.catalog.interfaces.IExtent.providedBy(query) res = query - self.family.IF.Set(self.ids()) else: raise ValueError( "unknown query type", query_type) return res def values(self, min=None, max=None, excludemin=False, excludemax=False, doc_id=None): if doc_id is None: return iter(self.values_to_documents.keys( min, max, excludemin, excludemax)) else: value = self.documents_to_values.get(doc_id) if (value is None or min is not None and ( value < min or excludemin and value == min) or max is not None and ( value > max or 
                    value > max or excludemax and value == max)):
                return ()
            else:
                return (value,)


class SetIndex(AbstractIndex):

    interface.implements(zc.catalog.interfaces.ISetIndex)

    def _add_values(self, doc_id, added):
        values_to_documents = self.values_to_documents
        for v in added:
            docs = values_to_documents.get(v)
            if docs is None:
                values_to_documents[v] = self.family.IF.TreeSet((doc_id,))
                self.wordCount.change(1)
            else:
                docs.insert(doc_id)

    def index_doc(self, doc_id, value):
        new = self.family.OO.TreeSet(v for v in value if v is not None)
        if not new:
            self.unindex_doc(doc_id)
        else:
            values_to_documents = self.values_to_documents
            documents_to_values = self.documents_to_values
            old = documents_to_values.get(doc_id)
            if old is None:
                documents_to_values[doc_id] = new
                self.documentCount.change(1)
                self._add_values(doc_id, new)
            else:
                removed = self.family.OO.difference(old, new)
                added = self.family.OO.difference(new, old)
                for v in removed:
                    old.remove(v)
                    docs = values_to_documents.get(v)
                    docs.remove(doc_id)
                    if not docs:
                        del values_to_documents[v]
                        self.wordCount.change(-1)
                old.update(added)
                self._add_values(doc_id, added)

    def unindex_doc(self, doc_id):
        documents_to_values = self.documents_to_values
        values = documents_to_values.get(doc_id)
        if values is not None:
            values_to_documents = self.values_to_documents
            self.documentCount.change(-1)
            del documents_to_values[doc_id]
            for v in values:
                docs = values_to_documents.get(v)
                docs.remove(doc_id)
                if not docs:
                    del values_to_documents[v]
                    self.wordCount.change(-1)

    def apply(self, query):  # any_of, any, between, none, all_of
        values_to_documents = self.values_to_documents
        query_type, query = parseQuery(query)
        if query_type is None:
            res = None
        elif query_type == 'any_of':
            res = self.family.IF.Bucket()
            for v in query:
                _, res = self.family.IF.weightedUnion(
                    res, values_to_documents.get(v))
        elif query_type == 'any':
            if query is None:
                res = self.family.IF.Set(self.ids())
            else:
                assert zc.catalog.interfaces.IExtent.providedBy(query)
                res = query & self.family.IF.Set(self.ids())
        elif query_type == 'all_of':
            res = None
            values = iter(query)
            empty = self.family.IF.TreeSet()
            try:
                res = values_to_documents.get(values.next(), empty)
            except StopIteration:
                res = empty
            while res:
                try:
                    v = values.next()
                except StopIteration:
                    break
                res = self.family.IF.intersection(
                    res, values_to_documents.get(v, empty))
        elif query_type == 'between':
            res = self.family.IF.Bucket()
            for v in values_to_documents.keys(*query):
                _, res = self.family.IF.weightedUnion(
                    res, values_to_documents.get(v))
        elif query_type == 'none':
            assert zc.catalog.interfaces.IExtent.providedBy(query)
            res = query - self.family.IF.Set(self.ids())
        else:
            raise ValueError(
                "unknown query type", query_type)
        return res


class NormalizationWrapper(persistent.Persistent):

    interface.implements(zc.catalog.interfaces.INormalizationWrapper)

    index = normalizer = None
    collection_index = False

    def documentCount(self):
        return self.index.documentCount()

    def wordCount(self):
        return self.index.wordCount()

    def clear(self):
        """see zope.index.interfaces.IInjection.clear"""
        return self.index.clear()

    def __init__(self, index, normalizer, collection_index=False):
        self.index = index
        if zope.index.interfaces.IIndexSort.providedBy(index):
            zope.interface.alsoProvides(self, zope.index.interfaces.IIndexSort)
        self.normalizer = normalizer
        self.collection_index = collection_index

    def index_doc(self, doc_id, value):
        if self.collection_index:
            self.index.index_doc(
                doc_id, (self.normalizer.value(v) for v in value))
        else:
            self.index.index_doc(doc_id, self.normalizer.value(value))

    def unindex_doc(self, doc_id):
        self.index.unindex_doc(doc_id)

    def apply(self, query):
        query_type, query = parseQuery(query)
        if query_type == 'any_of':
            res = set()
            for v in query:
                res.update(self.normalizer.any(v, self.index))
        elif query_type == 'all_of':
            res = [self.normalizer.all(v, self.index) for v in query]
        elif query_type == 'between':
            query = tuple(query)  # collect iterators
            len_query = len(query)
            max_exclude = len_query >= 4 and bool(query[3])
            min_exclude = len_query >= 3 and bool(query[2])
            max = len_query >= 2 and query[1] and self.normalizer.maximum(
                query[1], self.index, max_exclude) or None
            min = len_query >= 1 and query[0] and self.normalizer.minimum(
                query[0], self.index, min_exclude) or None
            res = (min, max, min_exclude, max_exclude)
        else:
            res = query
        return self.index.apply({query_type: res})

    def minValue(self, min=None):
        if min is not None:
            min = self.normalizer.minimum(min, self.index)
        return self.index.minValue(min)

    def maxValue(self, max=None):
        if max is not None:
            max = self.normalizer.maximum(max, self.index)
        return self.index.maxValue(max)

    def values(self, min=None, max=None, excludemin=False, excludemax=False,
               doc_id=None):
        if min is not None:
            min = self.normalizer.minimum(min, self.index)
        if max is not None:
            max = self.normalizer.maximum(max, self.index)
        return self.index.values(
            min, max, excludemin, excludemax, doc_id=doc_id)

    def containsValue(self, value):
        return self.index.containsValue(value)

    def ids(self):
        return self.index.ids()

    @property
    def sort(self):
        # delegate upstream or raise AttributeError
        return self.index.sort


class CallableWrapper(persistent.Persistent):

    interface.implements(zc.catalog.interfaces.ICallableWrapper)

    converter = None
    index = None

    def __init__(self, index, converter):
        self.index = index
        self.converter = converter

    def index_doc(self, docid, value):
        "See zope.index.interfaces.IInjection"
        self.index.index_doc(docid, self.converter(value))

    def __getattr__(self, name):
        return getattr(self.index, name)


def set_resolution(value, resolution):
    resolution += 2
    if resolution < 6:
        args = []
        args.extend(value.timetuple()[:resolution+1])
        args.extend([0]*(6-resolution))
        args.append(value.tzinfo)
        value = datetime.datetime(*args)
    return value


def get_request():
    i = zope.security.management.queryInteraction()
    if i is not None:
        for p in i.participations:
            if IRequest.providedBy(p):
                return p
    return None


def get_tz(default=pytz.reference.Local):
    request = get_request()
    if request is None:
        return default
    return zope.interface.common.idatetime.ITZInfo(request, default)


def add_tz(value):
    if type(value) is datetime.datetime:
        if value.tzinfo is None:
            value = value.replace(tzinfo=get_tz())
        return value
    else:
        raise ValueError(value)


def day_end(value):
    return (
        datetime.datetime.combine(
            value, datetime.time(tzinfo=get_tz())) +
        datetime.timedelta(days=1) -  # separate for daylight savings
        datetime.timedelta(microseconds=1))


def day_begin(value):
    return datetime.datetime.combine(
        value, datetime.time(tzinfo=get_tz()))


class DateTimeNormalizer(persistent.Persistent):

    interface.implements(zc.catalog.interfaces.IDateTimeNormalizer)

    def __init__(self, resolution=2):
        self.resolution = resolution
        # 0, 1, 2, 3, 4
        # day, hour, minute, second, microsecond

    def value(self, value):
        if not isinstance(value, datetime.datetime) or value.tzinfo is None:
            raise ValueError(
                _('This index only indexes timezone-aware datetimes.'))
        return set_resolution(value, self.resolution)

    def any(self, value, index):
        if type(value) is datetime.date:
            start = datetime.datetime.combine(
                value, datetime.time(tzinfo=get_tz()))
            stop = start + datetime.timedelta(days=1)
            return index.values(start, stop, False, True)
        return (add_tz(value),)

    def all(self, value, index):
        return add_tz(value)

    def minimum(self, value, index, exclude=False):
        if type(value) is datetime.date:
            if exclude:
                return day_end(value)
            else:
                return day_begin(value)
        return add_tz(value)

    def maximum(self, value, index, exclude=False):
        if type(value) is datetime.date:
            if exclude:
                return day_begin(value)
            else:
                return day_end(value)
        return add_tz(value)


@interface.implementer(
    zope.interface.implementedBy(NormalizationWrapper),
    zope.index.interfaces.IIndexSort,
    zc.catalog.interfaces.IValueIndex)
def DateTimeValueIndex(resolution=2):  # 2 == minute; note that hour is good
                                       # for timezone-aware per-day searches
    ix = NormalizationWrapper(ValueIndex(), DateTimeNormalizer(resolution))
    interface.alsoProvides(ix,
                           zc.catalog.interfaces.IValueIndex)
    return ix


@interface.implementer(
    zope.interface.implementedBy(NormalizationWrapper),
    zc.catalog.interfaces.ISetIndex)
def DateTimeSetIndex(resolution=2):  # 2 == minute; note that hour is good
                                     # for timezone-aware per-day searches
    ix = NormalizationWrapper(SetIndex(), DateTimeNormalizer(resolution), True)
    interface.alsoProvides(ix, zc.catalog.interfaces.ISetIndex)
    return ix
zc.catalog-1.6/src/zc/catalog/stemmer.py0000664000177100020040000000467612165324663021423 0ustar menesismenesis00000000000000
#############################################################################
#
# Copyright (c) 2006 Zope Foundation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
"""A stemmer based on the textindexng stemmer, itself based on snowball.

$Id: stemmer.py 2918 2005-07-19 22:12:38Z jim $
"""
import re

broken = None

try:
    from zopyx.txng3.ext import stemmer
except ImportError:
    try:
        from zopyx.txng3 import stemmer
    except ImportError:
        try:
            import txngstemmer as stemmer
        except ImportError:
            stemmer = None

            class Broken:
                def stem(self, l):
                    return l

            broken = Broken()

# as of this writing, trying to persist a txngstemmer.Stemmer makes the python
# process end, only printing a "Bus error" message before quitting. Don't do
# that. July 16 2005

# 2010-03-09 While Stemmer still isn't pickleable, zopyx.txng3.ext 3.3.2 fixes
# the crashes.
class Stemmer(object):

    def __init__(self, language='english'):
        self.language = language

    @property
    def stemmer(self):
        if stemmer is None:
            return broken
        return stemmer.Stemmer(self.language)

    rxGlob = re.compile(r"[*?]")  # See globToWordIds() in
                                  # zope/index/text/lexicon.py

    def process(self, lst):
        stemmer = self.stemmer
        result = []
        for s in lst:
            try:
                s = unicode(s)
            except UnicodeDecodeError:
                pass
            else:
                s = stemmer.stem((s,))[0]
            result.append(s)
        return result

    def processGlob(self, lst):
        stemmer = self.stemmer
        result = []
        rxGlob = self.rxGlob
        for s in lst:
            if not rxGlob.search(s):
                try:
                    s = unicode(s)
                except UnicodeDecodeError:
                    pass
                else:
                    s = stemmer.stem((s,))[0]
            result.append(s)
        return result
zc.catalog-1.6/src/zc/catalog/extentcatalog.txt0000664000177100020040000004172512165324663022774 0ustar menesismenesis00000000000000
==============
Extent Catalog
==============

An extent catalog is very similar to a normal catalog except that it
only indexes items addable to its extent. The extent is both a filter
and a set that may be merged with other result sets. The filtering is
an additional feature we will discuss below; we'll begin with a simple
"do nothing" extent that only supports the second use case.

We create the state that the text needs here.

    >>> import zope.keyreference.persistent
    >>> import zope.component
    >>> import zope.intid
    >>> import zope.component
    >>> import zope.component.interfaces
    >>> import zope.component.persistentregistry
    >>> from ZODB.tests.util import DB
    >>> import transaction

    >>> zope.component.provideAdapter(
    ...     zope.keyreference.persistent.KeyReferenceToPersistent,
    ...     adapts=(zope.interface.Interface,))
    >>> zope.component.provideAdapter(
    ...     zope.keyreference.persistent.connectionOfPersistent,
    ...     adapts=(zope.interface.Interface,))

    >>> site_manager = None
    >>> def getSiteManager(context=None):
    ...     if context is None:
    ...         if site_manager is None:
    ...             return zope.component.getGlobalSiteManager()
    ...         else:
    ...             return site_manager
    ...     else:
    ...         try:
    ...             return zope.component.interfaces.IComponentLookup(context)
    ...         except TypeError, error:
    ...             raise zope.component.ComponentLookupError(*error.args)
    ...
    >>> def setSiteManager(sm):
    ...     global site_manager
    ...     site_manager = sm
    ...     if sm is None:
    ...         zope.component.getSiteManager.reset()
    ...     else:
    ...         zope.component.getSiteManager.sethook(getSiteManager)
    ...
    >>> def makeRoot():
    ...     db = DB()
    ...     conn = db.open()
    ...     root = conn.root()
    ...     site_manager = root['components'] = (
    ...         zope.component.persistentregistry.PersistentComponents())
    ...     site_manager.__bases__ = (zope.component.getGlobalSiteManager(),)
    ...     site_manager.registerUtility(
    ...         zope.intid.IntIds(family=btrees_family),
    ...         provided=zope.intid.interfaces.IIntIds)
    ...     setSiteManager(site_manager)
    ...     transaction.commit()
    ...     return root
    ...
    >>> @zope.component.adapter(zope.interface.Interface)
    ... @zope.interface.implementer(zope.component.interfaces.IComponentLookup)
    ... def getComponentLookup(obj):
    ...     return obj._p_jar.root()['components']
    ...
    >>> zope.component.provideAdapter(getComponentLookup)

To show the extent catalog at work, we need an intid utility, an index,
and some items to index. We'll do this within a real ZODB and a real
intid utility.

    >>> import zc.catalog
    >>> import zc.catalog.interfaces
    >>> from zc.catalog import interfaces, extentcatalog
    >>> from zope import interface, component
    >>> from zope.interface import verify
    >>> import persistent
    >>> import BTrees.IFBTree

    >>> root = makeRoot()
    >>> intid = zope.component.getUtility(
    ...     zope.intid.interfaces.IIntIds, context=root)
    >>> TreeSet = btrees_family.IF.TreeSet
    >>> from zope.container.interfaces import IContained
    >>> class DummyIndex(persistent.Persistent):
    ...     interface.implements(IContained)
    ...     __parent__ = __name__ = None
    ...     def __init__(self):
    ...         self.uids = TreeSet()
    ...     def unindex_doc(self, uid):
    ...         if uid in self.uids:
    ...             self.uids.remove(uid)
    ...     def index_doc(self, uid, obj):
    ...         self.uids.insert(uid)
    ...     def clear(self):
    ...         self.uids.clear()
    ...     def apply(self, query):
    ...         return [uid for uid in self.uids if uid <= query]
    ...
    >>> class DummyContent(persistent.Persistent):
    ...     def __init__(self, name, parent):
    ...         self.id = name
    ...         self.__parent__ = parent
    ...
    >>> extent = extentcatalog.Extent(family=btrees_family)
    >>> verify.verifyObject(interfaces.IExtent, extent)
    True
    >>> root['catalog'] = catalog = extentcatalog.Catalog(extent)
    >>> verify.verifyObject(interfaces.IExtentCatalog, catalog)
    True
    >>> index = DummyIndex()
    >>> catalog['index'] = index
    >>> transaction.commit()

Now we have a catalog set up with an index and an extent. We can add some data
to the extent:

    >>> matches = []
    >>> for i in range(100):
    ...     c = DummyContent(i, root)
    ...     root[i] = c
    ...     doc_id = intid.register(c)
    ...     catalog.index_doc(doc_id, c)
    ...     matches.append(doc_id)
    >>> matches.sort()
    >>> sorted(extent) == sorted(index.uids) == matches
    True

We can get the size of the extent.

    >>> len(extent)
    100

Unindexing an object that is in the catalog should simply remove it from the
catalog and index as usual.

    >>> matches[0] in catalog.extent
    True
    >>> matches[0] in catalog['index'].uids
    True
    >>> catalog.unindex_doc(matches[0])
    >>> matches[0] in catalog.extent
    False
    >>> matches[0] in catalog['index'].uids
    False
    >>> doc_id = matches.pop(0)
    >>> sorted(extent) == sorted(index.uids) == matches
    True

Clearing the catalog clears both the extent and the contained indexes.

    >>> catalog.clear()
    >>> list(catalog.extent) == list(catalog['index'].uids) == []
    True

Updating all indexes and an individual index both also update the extent.

    >>> catalog.updateIndexes()
    >>> matches.insert(0, doc_id)
    >>> sorted(extent) == sorted(index.uids) == matches
    True

    >>> index2 = DummyIndex()
    >>> catalog['index2'] = index2
    >>> index2.__parent__ == catalog
    True
    >>> index.uids.remove(matches[0]) # to confirm that only index 2 is touched
    >>> catalog.updateIndex(index2)
    >>> sorted(extent) == sorted(index2.uids) == matches
    True
    >>> matches[0] in index.uids
    False
    >>> matches[0] in index2.uids
    True
    >>> res = index.uids.insert(matches[0])

So why have an extent in the first place? It allows indices to operate
against a reliable collection of the full indexed data; therefore, it allows
the indices in zc.catalog to perform NOT operations.

The extent itself provides a number of merging features to allow its values to
be merged with other BTrees.IFBTree data structures. These include
intersection, union, difference, and reverse difference. Given an extent
named 'extent' and another IFBTree data structure named 'data', intersections
can be spelled "extent & data" or "data & extent"; unions can be spelled
"extent | data" or "data | extent"; differences can be spelled "extent - data";
and reverse differences can be spelled "data - extent". Unions and
intersections are weighted.

    >>> extent = extentcatalog.Extent(family=btrees_family)
    >>> for i in range(1, 100, 2):
    ...     extent.add(i, None)
    ...
    >>> alt_set = TreeSet()
    >>> alt_set.update(range(0, 166, 33)) # return value is unimportant here
    6
    >>> sorted(alt_set)
    [0, 33, 66, 99, 132, 165]
    >>> sorted(extent & alt_set)
    [33, 99]
    >>> sorted(alt_set & extent)
    [33, 99]
    >>> sorted(extent.intersection(alt_set))
    [33, 99]
    >>> original = set(extent)
    >>> union_matches = original.copy()
    >>> union_matches.update(alt_set)
    >>> union_matches = sorted(union_matches)
    >>> sorted(alt_set | extent) == union_matches
    True
    >>> sorted(extent | alt_set) == union_matches
    True
    >>> sorted(extent.union(alt_set)) == union_matches
    True
    >>> sorted(alt_set - extent)
    [0, 66, 132, 165]
    >>> sorted(extent.rdifference(alt_set))
    [0, 66, 132, 165]
    >>> original.remove(33)
    >>> original.remove(99)
    >>> set(extent - alt_set) == original
    True
    >>> set(extent.difference(alt_set)) == original
    True

We can pass our own instantiated UID utility to extentcatalog.Catalog.

    >>> extent = extentcatalog.Extent(family=btrees_family)
    >>> uidutil = zope.intid.IntIds()
    >>> cat = extentcatalog.Catalog(extent, uidutil)
    >>> cat["index"] = DummyIndex()
    >>> cat.UIDSource is uidutil
    True
    >>> cat._getUIDSource() is uidutil
    True

The ResultSet instance returned by the catalog's `searchResults` method
uses our UID utility.

    >>> obj = DummyContent(43, root)
    >>> uid = uidutil.register(obj)
    >>> cat.index_doc(uid, obj)
    >>> res = cat.searchResults(index=uid)
    >>> res.uidutil is uidutil
    True
    >>> list(res) == [obj]
    True

`searchResults` may also return None.

    >>> cat.searchResults() is None
    True

Calling `updateIndex` and `updateIndexes` when the catalog has its uid source
set works as well.

    >>> cat.clear()
    >>> uid in cat.extent
    False

All objects in the uid utility are indexed.

    >>> cat.updateIndexes()
    >>> uid in cat.extent
    True
    >>> len(cat.extent)
    1

    >>> obj2 = DummyContent(44, root)
    >>> uid2 = uidutil.register(obj2)
    >>> cat.updateIndexes()
    >>> len(cat.extent)
    2
    >>> uid2 in cat.extent
    True

    >>> uidutil.unregister(obj2)

    >>> cat.clear()
    >>> uid in cat.extent
    False
    >>> cat.updateIndex(cat["index"])
    >>> uid in cat.extent
    True

With a self-populating extent, calling `updateIndex` or `updateIndexes` means
only the objects whose ids are in the extent are updated/reindexed; if present,
the catalog will use its uid source to look up the objects by id.

    >>> extent = extentcatalog.NonPopulatingExtent(family=btrees_family)
    >>> cat = extentcatalog.Catalog(extent, uidutil)
    >>> cat["index"] = DummyIndex()

    >>> extent.add(uid, obj)
    >>> uid in cat["index"].uids
    False

    >>> cat.updateIndexes()
    >>> uid in cat["index"].uids
    True

    >>> cat.clear()
    >>> uid in cat["index"].uids
    False
    >>> uid in cat.extent
    False

    >>> cat.extent.add(uid, obj)
    >>> cat.updateIndex(cat["index"])
    >>> uid in cat["index"].uids
    True

Unregister the objects of the previous tests from the intid utility:

    >>> intid = zope.component.getUtility(
    ...     zope.intid.interfaces.IIntIds, context=root)
    >>> for doc_id in matches:
    ...     intid.unregister(intid.queryObject(doc_id))

Catalog with a filter extent
----------------------------

As discussed at the beginning of this document, extents can not only help
with index operations, but also act as a filter, so that a given catalog
can answer questions about a subset of the objects contained in the intids.

The filter extent only stores objects that match a given filter.

    >>> def filter(extent, uid, ob):
    ...     assert interfaces.IFilterExtent.providedBy(extent)
    ...     # This is an extent of objects with odd-numbered uids without a
    ...     # True ignore attribute
    ...     return uid % 2 and not getattr(ob, 'ignore', False)
    ...
    >>> extent = extentcatalog.FilterExtent(filter, family=btrees_family)
    >>> verify.verifyObject(interfaces.IFilterExtent, extent)
    True
    >>> root['catalog1'] = catalog = extentcatalog.Catalog(extent)
    >>> verify.verifyObject(interfaces.IExtentCatalog, catalog)
    True
    >>> index = DummyIndex()
    >>> catalog['index'] = index
    >>> transaction.commit()

Now we have a catalog set up with an index and an extent. If we create
some content and ask the catalog to index it, only the ones that match
the filter will be in the extent and in the index.

    >>> matches = []
    >>> fails = []
    >>> i = 0
    >>> while True:
    ...     c = DummyContent(i, root)
    ...     root[i] = c
    ...     doc_id = intid.register(c)
    ...     catalog.index_doc(doc_id, c)
    ...     if filter(extent, doc_id, c):
    ...         matches.append(doc_id)
    ...     else:
    ...         fails.append(doc_id)
    ...     i += 1
    ...     if i > 99 and len(matches) > 4:
    ...         break
    ...
    >>> matches.sort()
    >>> sorted(extent) == sorted(index.uids) == matches
    True

If a content object is indexed that used to match the filter but no longer
does, it should be removed from the extent and indexes.

    >>> matches[0] in catalog.extent
    True
    >>> obj = intid.getObject(matches[0])
    >>> obj.ignore = True
    >>> filter(extent, matches[0], obj)
    False
    >>> catalog.index_doc(matches[0], obj)
    >>> doc_id = matches.pop(0)
    >>> doc_id in catalog.extent
    False
    >>> sorted(extent) == sorted(index.uids) == matches
    True

Unindexing an object that is not in the catalog should be a no-op.

    >>> fails[0] in catalog.extent
    False
    >>> catalog.unindex_doc(fails[0])
    >>> fails[0] in catalog.extent
    False
    >>> sorted(extent) == sorted(index.uids) == matches
    True

Updating all indexes and an individual index both also update the extent.

    >>> index2 = DummyIndex()
    >>> catalog['index2'] = index2
    >>> index2.__parent__ == catalog
    True
    >>> index.uids.remove(matches[0]) # to confirm that only index 2 is touched
    >>> catalog.updateIndex(index2)
    >>> sorted(extent) == sorted(index2.uids)
    True
    >>> matches[0] in index.uids
    False
    >>> matches[0] in index2.uids
    True
    >>> res = index.uids.insert(matches[0])

If you update a single index and an object is no longer a member of the extent,
it is removed from all indexes.

    >>> matches[0] in catalog.extent
    True
    >>> matches[0] in index.uids
    True
    >>> matches[0] in index2.uids
    True
    >>> obj = intid.getObject(matches[0])
    >>> obj.ignore = True
    >>> catalog.updateIndex(index2)
    >>> matches[0] in catalog.extent
    False
    >>> matches[0] in index.uids
    False
    >>> matches[0] in index2.uids
    False
    >>> doc_id = matches.pop(0)
    >>> (matches == sorted(catalog.extent) == sorted(index.uids)
    ...  == sorted(index2.uids))
    True


Self-populating extents
-----------------------

An extent may know how to populate itself; this is especially useful if
the catalog can be initialized with fewer items than those available in
the IIntIds utility that are also within the nearest Zope 3 site (the
policy coded in the basic Zope 3 catalog).

Such an extent must implement the `ISelfPopulatingExtent` interface,
which requires two attributes. Let's use the `FilterExtent` class as a
base for implementing such an extent, with a method that selects content item
0 (created and registered above)::

    >>> class PopulatingExtent(
    ...     extentcatalog.FilterExtent,
    ...     extentcatalog.NonPopulatingExtent):
    ...
    ...     def populate(self):
    ...         if self.populated:
    ...             return
    ...         self.add(intid.getId(root[0]), root[0])
    ...         super(PopulatingExtent, self).populate()

Creating a catalog based on this extent ignores objects already in the
database::

    >>> def accept_any(extent, uid, ob):
    ...     return True

    >>> extent = PopulatingExtent(accept_any, family=btrees_family)
    >>> catalog = extentcatalog.Catalog(extent)
    >>> index = DummyIndex()
    >>> catalog['index'] = index
    >>> root['catalog2'] = catalog
    >>> transaction.commit()

At this point, our extent remains unpopulated::

    >>> extent.populated
    False

Iterating over the extent does not cause it to be automatically
populated::

    >>> list(extent)
    []

Causing our new index to be filled will cause the `populate()` method
to be called, setting the `populated` flag as a side-effect::

    >>> catalog.updateIndex(index)
    >>> extent.populated
    True
    >>> list(extent) == [intid.getId(root[0])]
    True

The index has been updated with the documents identified by the
extent::

    >>> list(index.uids) == [intid.getId(root[0])]
    True

Updating the same index repeatedly will continue to use the extent as
the source of documents to include::

    >>> catalog.updateIndex(index)

    >>> list(extent) == [intid.getId(root[0])]
    True
    >>> list(index.uids) == [intid.getId(root[0])]
    True

The `updateIndexes()` method has a similar behavior.
If we add an additional index to the catalog, we see that it indexes only
those objects from the extent::

    >>> index2 = DummyIndex()
    >>> catalog['index2'] = index2

    >>> catalog.updateIndexes()

    >>> list(extent) == [intid.getId(root[0])]
    True
    >>> list(index.uids) == [intid.getId(root[0])]
    True
    >>> list(index2.uids) == [intid.getId(root[0])]
    True

When we have a fresh catalog and extent (not yet populated), we see that
`updateIndexes()` will cause the extent to be populated::

    >>> extent = PopulatingExtent(accept_any, family=btrees_family)
    >>> root['catalog3'] = catalog = extentcatalog.Catalog(extent)
    >>> index1 = DummyIndex()
    >>> index2 = DummyIndex()
    >>> catalog['index1'] = index1
    >>> catalog['index2'] = index2
    >>> transaction.commit()

    >>> extent.populated
    False

    >>> catalog.updateIndexes()

    >>> extent.populated
    True
    >>> list(extent) == [intid.getId(root[0])]
    True
    >>> list(index1.uids) == [intid.getId(root[0])]
    True
    >>> list(index2.uids) == [intid.getId(root[0])]
    True

We'll make sure everything can be safely committed.

    >>> transaction.commit()
    >>> setSiteManager(None)
zc.catalog-1.6/src/zc/catalog/extentcatalog.py0000664000177100020040000001500112165324663022571 0ustar menesismenesis00000000000000
#############################################################################
#
# Copyright (c) 2006 Zope Foundation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
"""extent catalog

$Id: extentcatalog.py 3296 2005-09-09 19:29:20Z benji $
"""
import sys
import BTrees
import persistent
from zope import interface, component
from zope.catalog import catalog
from zope.intid.interfaces import IIntIds
import zope.component
from zope.component.interfaces import IFactory
from BTrees.Interfaces import IMerge

import zc.catalog
from zc.catalog import interfaces


class Extent(persistent.Persistent):
    interface.implements(interfaces.IExtent)
    __parent__ = None
    family = BTrees.family32

    def __init__(self, family=None):
        if family is not None:
            self.family = family
        self.set = self.family.IF.TreeSet()

    # Deprecated.
    @property
    def BTreeAPI(self):
        return sys.modules[self.set.__class__.__module__]

    def __len__(self):
        return len(self.set)

    def add(self, uid, obj):
        self.set.insert(uid)

    def clear(self):
        self.set.clear()

    def __or__(self, other):
        "extent | set"
        return self.union(other)

    __ror__ = __or__

    def union(self, other, self_weight=1, other_weight=1):
        return self.family.IF.weightedUnion(
            self.set, other, self_weight, other_weight)[1]

    def __and__(self, other):
        "extent & set"
        return self.intersection(other)

    __rand__ = __and__

    def intersection(self, other, self_weight=1, other_weight=1):
        return self.family.IF.weightedIntersection(
            self.set, other, self_weight, other_weight)[1]

    def __sub__(self, other):
        "extent - set"
        return self.difference(other)

    def difference(self, other):
        return self.family.IF.difference(self.set, other)

    def __rsub__(self, other):
        "set - extent"
        return self.rdifference(other)

    def rdifference(self, other):
        return self.family.IF.difference(other, self.set)

    def __iter__(self):
        return iter(self.set)

    def __nonzero__(self):
        return bool(self.set)

    def __contains__(self, uid):
        return self.set.has_key(uid)

    def remove(self, uid):
        self.set.remove(uid)

    def discard(self, uid):
        try:
            self.set.remove(uid)
        except KeyError:
            pass


class FilterExtent(Extent):
    interface.implements(interfaces.IFilterExtent)

    def __init__(self, filter, family=None):
        super(FilterExtent, self).__init__(family=family)
        self.filter = filter

    def add(self, uid, obj):
        if not self.addable(uid, obj):
            raise ValueError
        else:
            self.set.insert(uid)

    def addable(self, uid, obj):
        return self.filter(self, uid, obj)


class NonPopulatingExtent(Extent):
    """Base class for populating extent.

    This simple, no-op implementation comes in handy surprisingly often
    for catalogs that handle a very contained domain within an application.
    """

    interface.implements(interfaces.ISelfPopulatingExtent)

    populated = False

    def populate(self):
        self.populated = True


class Catalog(catalog.Catalog):
    interface.implements(interfaces.IExtentCatalog)

    UIDSource = None

    def __init__(self, extent, UIDSource=None):
        """Construct a catalog based on an extent.

        Note that the `family` keyword parameter of the base class
        constructor is not supported here; the family of the extent is
        used.

        """
        self.UIDSource = UIDSource

        if extent.__parent__ is not None:
            raise ValueError("extent's __parent__ must be None")
        super(Catalog, self).__init__(family=extent.family)
        self.extent = extent
        extent.__parent__ = self  # inform extent of catalog

    def _getUIDSource(self):
        res = self.UIDSource
        if res is None:
            res = zope.component.getUtility(IIntIds)
        return res

    def clear(self):
        self.extent.clear()
        super(Catalog, self).clear()

    def index_doc(self, docid, texts):
        """Register the data in indexes of this catalog.
        """
        try:
            self.extent.add(docid, texts)
        except ValueError:
            self.unindex_doc(docid)
        else:
            super(Catalog, self).index_doc(docid, texts)

    def unindex_doc(self, docid):
        if docid in self.extent:
            super(Catalog, self).unindex_doc(docid)
            self.extent.remove(docid)

    def searchResults(self, **kwargs):
        res = super(Catalog, self).searchResults(**kwargs)
        if res is not None:
            res.uidutil = self._getUIDSource()
        return res

    def updateIndex(self, index):
        if index.__parent__ is not self:
            # not an index in us. Let the superclass handle it.
            super(Catalog, self).updateIndex(index)
        else:
            uidutil = self._getUIDSource()

            if interfaces.ISelfPopulatingExtent.providedBy(self.extent):
                if not self.extent.populated:
                    self.extent.populate()
                    assert self.extent.populated

                for uid in self.extent:
                    obj = uidutil.getObject(uid)
                    index.index_doc(uid, obj)

            else:
                for uid in uidutil:
                    obj = uidutil.getObject(uid)
                    try:
                        self.extent.add(uid, obj)
                    except ValueError:
                        self.unindex_doc(uid)
                    else:
                        index.index_doc(uid, obj)

    def updateIndexes(self):
        uidutil = self._getUIDSource()

        if interfaces.ISelfPopulatingExtent.providedBy(self.extent):
            if not self.extent.populated:
                self.extent.populate()
                assert self.extent.populated

            for uid in self.extent:
                self.index_doc(uid, uidutil.getObject(uid))

        else:
            for uid in uidutil:
                self.index_doc(uid, uidutil.getObject(uid))
zc.catalog-1.6/src/zc/catalog/interfaces.py0000664000177100020040000002703412165324663022063 0ustar menesismenesis00000000000000
#############################################################################
#
# Copyright (c) 2006 Zope Foundation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
"""interfaces for zc.catalog

$Id: interfaces.py 2918 2005-07-19 22:12:38Z jim $
"""
from zope import interface, schema
from zope.schema.vocabulary import SimpleVocabulary, SimpleTerm
import zope.index.interfaces
import zope.catalog.interfaces
from zc.catalog.i18n import _
import BTrees.Interfaces


class IExtent(interface.Interface):
    """An extent represents the full set of objects indexed by a catalog.
    It is useful for a variety of index operations and catalog queries.
    """

    __parent__ = interface.Attribute(
        """The catalog for which this is an extent; must be None before it is
        set to a catalog""")

    def add(uid, obj):
        """add uid to extent; raise ValueError if it is not addable.

        If uid is already a member of the extent, calling add is a no-op,
        except that if the uid and obj are no longer addable to the extent then
        ValueError is still raised (but without removing the uid)"""

    def remove(uid):
        """Remove uid from set. Raise KeyError if not a member"""

    def discard(uid):
        """Remove uid from set. Ignore if not a member"""

    def clear():
        """Remove all uids from set."""

    def __len__():
        """the number of items in the extent."""

    def __iter__():
        """return iterator of uids in set"""

    def __or__(other):
        "Given BTrees.IFBTree data structure, return weighted union"

    def __ror__(other):
        "Given BTrees.IFBTree data structure, return weighted union"

    def union(other, self_weight, other_weight):
        "Given BTrees.IFBTree data structure, return weighted union"

    def __and__(other):
        "Given BTrees.IFBTree data structure, return weighted intersection"

    def __rand__(other):
        "Given BTrees.IFBTree data structure, return weighted intersection"

    def intersection(other, self_weight, other_weight):
        "Given BTrees.IFBTree data structure, return weighted intersection"

    def __sub__(other):
        "extent - set: given BTrees.IFBTree data structure, return difference"

    def difference(other):
        "extent - set: given BTrees.IFBTree data structure, return difference"

    def __rsub__(other):
        "set - extent: given BTrees.IFBTree data structure, return difference"

    def rdifference(other):
        "set - extent: given BTrees.IFBTree data structure, return difference"

    def __nonzero__():
        "return boolean indicating if any uids are in set"

    def __contains__(uid):
        "return boolean indicating if uid is in set"


class IFilterExtent(IExtent):

    filter = interface.Attribute(
        """A (persistent) callable that is passed the extent, a docid, and the
        associated obj and
        should return a boolean True (is member of extent) or False (is not
        member of extent).""")

    def addable(uid, obj):
        """returns True or False, indicating whether the obj may be added to
        the extent"""


class ISelfPopulatingExtent(IExtent):
    """An extent that knows how to create its own initial population."""

    populated = schema.Bool(
        title=_("Populated"),
        description=_(
            "Flag indicating whether self-population has been performed."),
        readonly=True,
        )

    def populate():
        """Populate the extent based on the current content of the database.

        After a successful call, `populated` will be True.  Unsuccessful
        calls must raise exceptions.

        If `populated` is true when called, this is a no-op.  After the
        initial population, updates should be maintained via other
        mechanisms.
        """


class IExtentCatalog(interface.Interface):
    """A catalog of only items within an extent.

    Interface intended to be used with zope.catalog.interfaces.ICatalog"""

    extent = interface.Attribute(
        """An IExtent of the objects cataloged""")


class IIndexValues(interface.Interface):
    """An index that allows introspection of the indexed values"""

    def minValue(min=None):
        """return the minimum value in the index.

        If min is provided, return the minimum value equal to or greater
        than min.

        Raises ValueError if no min.
        """

    def maxValue(max=None):
        """return the maximum value in the index.

        If max is provided, return the maximum value equal to or less than
        max.

        Raises ValueError if no max.
        """

    def values(min=None, max=None, excludemin=False, excludemax=False,
               doc_id=None):
        """return an iterable of the values in the index.

        If doc_id is provided, returns the values only for that document id.

        If a min is specified, then output is constrained to values greater
        than or equal to the given min, and, if excludemin is specified and
        true, is further constrained to values strictly greater than min.  A
        min value of None is ignored.  If min is None or not specified, and
        excludemin is true, the smallest value is excluded.
If a max is specified, then output is constrained to values less than or equal to the given max, and, if excludemax is specified and true, is further constrained to values strictly less than max. A max value of None is ignored. If max is None or not specified, and excludemax is true, the largest value is excluded. """ def containsValue(value): """whether the value is used in any of the documents in the index""" def ids(): """return a BTrees.IFBTree data structure of the document ids in the index--the ones that have values to be indexed. All document ids should produce at least one value given a call of IIndexValues.values(doc_id=id). """ class ISetIndex(interface.Interface): def apply(query): """Return None or an IFBTree Set of the doc ids that match the query. query is a dict with one of the following keys: any_of, any, all_of, between, and none. Any one of the keys may be used; using more than one is not allowed. The any_of key should have a value of an iterable of values: the result will be the docids whose values contain any of the given values. The all_of key should have a value of an iterable of values: the result will be the docids whose values contain all of the given values. The between key should have a value of an iterable of one to four members. The first is the minimum value, or None; the second is the maximum value, or None; the third is boolean, defaulting to False, declaring if the min should be excluded; and the last is also boolean, defaulting to False, declaring if the max should be excluded. The any key should take None or an extent. If the key is None, the results will be all docids with any value. If the key is an extent, the results will be the intersection of the extent and all docids with any value. The none key should take an extent. It returns the docids in the extent that do not have any values in the index. """ class IValueIndex(interface.Interface): def apply(query): """Return None or an IFBTree Set of the doc ids that match the query. 
query is a dict with one of the following keys: any_of, any, between, and none. Any one of the keys may be used; using more than one is not allowed. The any_of key should have a value of an iterable of values: the result will be the docids whose values contain any of the given values. The between key should have a value of an iterable of one to four members. The first is the minimum value, or None; the second is the maximum value, or None; the third is boolean, defaulting to False, declaring if the min should be excluded; and the last is also boolean, defaulting to False, declaring if the max should be excluded. The any key should take None or an extent. If the key is None, the results will be all docids with any value. If the key is an extent, the results will be the intersection of the extent and all docids with any value. The none key should take an extent. It returns the docids in the extent that do not have any values in the index. """ class ICatalogValueIndex(zope.catalog.interfaces.IAttributeIndex, zope.catalog.interfaces.ICatalogIndex): """Interface-based catalog value index """ class ICatalogSetIndex(zope.catalog.interfaces.IAttributeIndex, zope.catalog.interfaces.ICatalogIndex): """Interface-based catalog set index """ class INormalizationWrapper(zope.index.interfaces.IInjection, zope.index.interfaces.IIndexSearch, zope.index.interfaces.IStatistics, IIndexValues): """A wrapper for an index that uses a normalizer to normalize injection and querying.""" index = interface.Attribute( """an index implementing IInjection, IIndexSearch, IStatistics, and IIndexValues""") normalizer = interface.Attribute("a normalizer, implementing INormalizer") collection_index = interface.Attribute( """boolean: whether indexed values should be treated as collections (each composite value normalized) or not (original value is normalized)""") class INormalizer(interface.Interface): def value(value): """normalize or check constraints for an input value; raise an error or return the 
value to be indexed.""" def any(value, index): """normalize a query value for a "any_of" search; return a sequence of values.""" def all(value, index): """Normalize a query value for an "all_of" search; return the value for query""" def minimum(value, index, exclude=False): """normalize a query value for minimum of a range; return the value for query""" def maximum(value, index, exclude=False): """normalize a query value for maximum of a range; return the value for query""" resolution_vocabulary = SimpleVocabulary([SimpleTerm(i, t, t) for i, t in enumerate( (_('day'), _('hour'), _('minute'), _('second'), _('microsecond')))]) # 0 1 2 3 4 class IDateTimeNormalizer(INormalizer): resolution = schema.Choice( vocabulary=resolution_vocabulary, title=_('Resolution'), default=2, required=True) class ICallableWrapper(zope.index.interfaces.IInjection, zope.index.interfaces.IIndexSearch, zope.index.interfaces.IStatistics, IIndexValues): """A wrapper for an index that uses a callable to convert injection.""" index = interface.Attribute( """An index implementing IInjection, IIndexSearch, IStatistics, and IIndexValues""") converter = interface.Attribute("A callable converter") zc.catalog-1.6/src/zc/catalog/normalizedindex.txt0000664000177100020040000003471312165324663023325 0ustar menesismenesis00000000000000================ Normalized Index ================ The index module provides a normalizing wrapper, a DateTime normalizer, and a set index and a value index normalized with the DateTime normalizer. The normalizing wrapper implements a full complement of index interfaces-- zope.index.interfaces.IInjection, zope.index.interfaces.IIndexSearch, zope.index.interfaces.IStatistics, and zc.catalog.interfaces.IIndexValues-- and delegates all of the behavior to the wrapped index, normalizing values using the normalizer before the index sees them. The normalizing wrapper currently only supports queries offered by zc.catalog.interfaces.ISetIndex and zc.catalog.interfaces.IValueIndex. 
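The delegation described above can be sketched in plain Python. This is a minimal illustration only, not zc.catalog's implementation: `LowercaseNormalizer`, `DictIndex`, and `NormalizingWrapper` are hypothetical stand-ins, only the ``any_of`` query is handled, and the code is written for Python 3 rather than the Python 2 used by the doctests in this file.

```python
class LowercaseNormalizer:
    """Hypothetical normalizer: case-folds strings before indexing."""

    def value(self, value):
        # Check constraints, then return the value to be indexed.
        if not isinstance(value, str):
            raise ValueError("this sketch only indexes strings")
        return value.lower()

    def any(self, value, index):
        # An 'any_of' query value normalizes to a sequence of values.
        return (value.lower(),)


class DictIndex:
    """Hypothetical wrapped index: maps each value to a set of doc ids."""

    def __init__(self):
        self.by_value = {}

    def index_doc(self, doc_id, value):
        self.by_value.setdefault(value, set()).add(doc_id)

    def apply(self, query):
        result = set()
        for value in query["any_of"]:
            result |= self.by_value.get(value, set())
        return result


class NormalizingWrapper:
    """Normalizes values and queries, then delegates to the wrapped index."""

    def __init__(self, index, normalizer):
        self.index = index
        self.normalizer = normalizer

    def index_doc(self, doc_id, value):
        # The wrapped index only ever sees normalized values.
        self.index.index_doc(doc_id, self.normalizer.value(value))

    def apply(self, query):
        values = []
        for value in query["any_of"]:
            values.extend(self.normalizer.any(value, self.index))
        return self.index.apply({"any_of": values})


wrapped = NormalizingWrapper(DictIndex(), LowercaseNormalizer())
wrapped.index_doc(1, "Alpha")
wrapped.index_doc(2, "ALPHA")
wrapped.index_doc(3, "beta")
matches = sorted(wrapped.apply({"any_of": ["alpha"]}))
```

Because both the injected values and the query values pass through the same normalizer, documents 1 and 2 match the query even though their original spellings differ.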
The normalizer interface requires the following methods, as defined in the interface: def value(value): """normalize or check constraints for an input value; raise an error or return the value to be indexed.""" def any(value, index): """normalize a query value for a "any_of" search; return a sequence of values.""" def all(value, index): """Normalize a query value for an "all_of" search; return the value for query""" def minimum(value, index): """normalize a query value for minimum of a range; return the value for query""" def maximum(value, index): """normalize a query value for maximum of a range; return the value for query""" The DateTime normalizer performs the following normalizations and validations. Whenever a timezone is needed, it tries to get a request from the current interaction and adapt it to zope.interface.common.idatetime.ITZInfo; failing that (no request or no adapter) it uses the system local timezone. - input values must be datetimes with a timezone. They are normalized to the resolution specified when the normalizer is created: a resolution of 0 normalizes values to days; a resolution of 1 to hours; 2 to minutes; 3 to seconds; and 4 to microseconds. - 'any' values may be timezone-aware datetimes, timezone-naive datetimes, or dates. dates are converted to any value from the start to the end of the given date in the found timezone, as described above. timezone-naive datetimes get the found timezone. - 'all' values may be timezone-aware datetimes or timezone-naive datetimes. timezone-naive datetimes get the found timezone. - 'minimum' values may be timezone-aware datetimes, timezone-naive datetimes, or dates. dates are converted to the start of the given date in the found timezone, as described above. timezone-naive datetimes get the found timezone. - 'maximum' values may be timezone-aware datetimes, timezone-naive datetimes, or dates. dates are converted to the end of the given date in the found timezone, as described above. 
  timezone-naive datetimes get the found timezone.

Let's look at the DateTime normalizer first, and then an integration of it
with the normalizing wrapper and the value and set indexes.

The indexed values are parsed with 'value'.

    >>> from zc.catalog.index import DateTimeNormalizer
    >>> n = DateTimeNormalizer() # defaults to minutes
    >>> import datetime
    >>> import pytz
    >>> naive_datetime = datetime.datetime(2005, 7, 15, 11, 21, 32, 104)
    >>> date = naive_datetime.date()
    >>> aware_datetime = naive_datetime.replace(
    ...     tzinfo=pytz.timezone('US/Eastern'))
    >>> n.value(naive_datetime)
    Traceback (most recent call last):
    ...
    ValueError: This index only indexes timezone-aware datetimes.
    >>> n.value(date)
    Traceback (most recent call last):
    ...
    ValueError: This index only indexes timezone-aware datetimes.
    >>> n.value(aware_datetime) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, tzinfo=<...Eastern...>)

If we specify a different resolution, the results are different.

    >>> another = DateTimeNormalizer(1) # hours
    >>> another.value(aware_datetime) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 0, tzinfo=<...Eastern...>)

Note that changing the resolution of an indexed value may create surprising
results, because queries do not change their resolution.  Therefore, if you
index something with a datetime with a finer resolution than the normalizer's,
then searching for that datetime will not find the doc_id.

Values in an 'any_of' query are parsed with 'any'.  'any' should return a
sequence of values.  It requires an index, which we will mock up here.

    >>> class DummyIndex(object):
    ...     def values(self, start, stop, exclude_start, exclude_stop):
    ...         assert not exclude_start and exclude_stop
    ...         six_hours = datetime.timedelta(hours=6)
    ...         res = []
    ...         dt = start
    ...         while dt < stop:
    ...             res.append(dt)
    ...             dt += six_hours
    ...         return res
    ...
>>> index = DummyIndex() >>> tuple(n.any(naive_datetime, index)) # doctest: +ELLIPSIS (datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>),) >>> tuple(n.any(aware_datetime, index)) # doctest: +ELLIPSIS (datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>),) >>> tuple(n.any(date, index)) # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS (datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>), datetime.datetime(2005, 7, 15, 6, 0, tzinfo=<...Local...>), datetime.datetime(2005, 7, 15, 12, 0, tzinfo=<...Local...>), datetime.datetime(2005, 7, 15, 18, 0, tzinfo=<...Local...>)) Values in an 'all_of' query are parsed with 'all'. >>> n.all(naive_datetime, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>) >>> n.all(aware_datetime, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>) >>> n.all(date, index) # doctest: +ELLIPSIS Traceback (most recent call last): ... ValueError: ... Minimum values in a 'between' query as well as those in other methods are parsed with 'minimum'. They also take an optional exclude boolean, which indicates whether the minimum is to be excluded. For datetimes, it only makes a difference if you pass in a date. 
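The effect of the exclude flag on plain dates can be sketched as follows. This is a simplified illustration under stated assumptions, not the DateTimeNormalizer code: the function name `minimum_sketch` is hypothetical, the "found" timezone is replaced by a fixed UTC default, and Python 3's `datetime.timezone` is used instead of pytz.

```python
import datetime


def minimum_sketch(value, tz=datetime.timezone.utc, exclude=False):
    """Normalize the lower bound of a range query (hypothetical helper)."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime.datetime):
        # exclude makes no difference here; naive values get the timezone.
        return value if value.tzinfo is not None else value.replace(tzinfo=tz)
    # A plain date covers a whole day: an included minimum starts at the
    # beginning of the day, an excluded minimum starts at its very end.
    if exclude:
        time_of_day = datetime.time(23, 59, 59, 999999)
    else:
        time_of_day = datetime.time(0, 0)
    return datetime.datetime.combine(value, time_of_day).replace(tzinfo=tz)


day = datetime.date(2005, 7, 15)
included = minimum_sketch(day)
excluded = minimum_sketch(day, exclude=True)
```

An excluded date pushes the lower bound to the last microsecond of that day, so nothing indexed on the date itself can match. The maximum bound would mirror this, swapping the two times of day.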
>>> n.minimum(naive_datetime, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>) >>> n.minimum(naive_datetime, index, exclude=True) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>) >>> n.minimum(aware_datetime, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>) >>> n.minimum(aware_datetime, index, True) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>) >>> n.minimum(date, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>) >>> n.minimum(date, index, True) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 23, 59, 59, 999999, tzinfo=<...Local...>) Maximum values in a 'between' query as well as those in other methods are parsed with 'maximum'. They also take an optional exclude boolean, which indicates whether the maximum is to be excluded. For datetimes, it only makes a difference if you pass in a date. >>> n.maximum(naive_datetime, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>) >>> n.maximum(naive_datetime, index, exclude=True) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>) >>> n.maximum(aware_datetime, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>) >>> n.maximum(aware_datetime, index, True) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>) >>> n.maximum(date, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 23, 59, 59, 999999, tzinfo=<...Local...>) >>> n.maximum(date, index, True) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>) Now let's examine these normalizers in the context of a real index. 
>>> from zc.catalog.index import DateTimeValueIndex, DateTimeSetIndex >>> setindex = DateTimeSetIndex() # minutes resolution >>> data = [] # generate some data >>> def date_gen( ... start=aware_datetime, ... count=12, ... period=datetime.timedelta(hours=10)): ... dt = start ... ix = 0 ... while ix < count: ... yield dt ... dt += period ... ix += 1 ... >>> gen = date_gen() >>> count = 0 >>> while True: ... try: ... next = [gen.next() for i in range(6)] ... except StopIteration: ... break ... data.append((count, next[0:1])) ... count += 1 ... data.append((count, next[1:3])) ... count += 1 ... data.append((count, next[3:6])) ... count += 1 ... >>> print data # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE [(0, [datetime.datetime(2005, 7, 15, 11, 21, 32, 104, ...<...Eastern...>)]), (1, [datetime.datetime(2005, 7, 15, 21, 21, 32, 104, ...<...Eastern...>), datetime.datetime(2005, 7, 16, 7, 21, 32, 104, ...<...Eastern...>)]), (2, [datetime.datetime(2005, 7, 16, 17, 21, 32, 104, ...<...Eastern...>), datetime.datetime(2005, 7, 17, 3, 21, 32, 104, ...<...Eastern...>), datetime.datetime(2005, 7, 17, 13, 21, 32, 104, ...<...Eastern...>)]), (3, [datetime.datetime(2005, 7, 17, 23, 21, 32, 104, ...<...Eastern...>)]), (4, [datetime.datetime(2005, 7, 18, 9, 21, 32, 104, ...<...Eastern...>), datetime.datetime(2005, 7, 18, 19, 21, 32, 104, ...<...Eastern...>)]), (5, [datetime.datetime(2005, 7, 19, 5, 21, 32, 104, ...<...Eastern...>), datetime.datetime(2005, 7, 19, 15, 21, 32, 104, ...<...Eastern...>), datetime.datetime(2005, 7, 20, 1, 21, 32, 104, ...<...Eastern...>)])] >>> data_dict = dict(data) >>> for doc_id, value in data: ... setindex.index_doc(doc_id, value) ... >>> list(setindex.ids()) [0, 1, 2, 3, 4, 5] >>> set(setindex.values()) == set( ... setindex.normalizer.value(v) for v in date_gen()) True For the searches, we will actually use a request and interaction, with an adapter that returns the Eastern timezone. This makes the examples less dependent on the machine that they use. 
>>> import zope.security.management >>> import zope.publisher.browser >>> import zope.interface.common.idatetime >>> import zope.publisher.interfaces >>> request = zope.publisher.browser.TestRequest() >>> zope.security.management.newInteraction(request) >>> from zope import interface, component >>> @interface.implementer(zope.interface.common.idatetime.ITZInfo) ... @component.adapter(zope.publisher.interfaces.IRequest) ... def tzinfo(req): ... return pytz.timezone('US/Eastern') ... >>> component.provideAdapter(tzinfo) >>> n.all(naive_datetime, index).tzinfo is pytz.timezone('US/Eastern') True >>> set(setindex.apply({'any_of': (datetime.date(2005, 7, 17), ... datetime.date(2005, 7, 20), ... datetime.date(2005, 12, 31))})) == set( ... (2, 3, 5)) True Note that this search is using the normalized values. >>> set(setindex.apply({'all_of': ( ... datetime.datetime( ... 2005, 7, 16, 7, 21, tzinfo=pytz.timezone('US/Eastern')), ... datetime.datetime( ... 2005, 7, 15, 21, 21, tzinfo=pytz.timezone('US/Eastern')),)}) ... ) == set((1,)) True >>> list(setindex.apply({'any': None})) [0, 1, 2, 3, 4, 5] >>> set(setindex.apply({'between': ( ... datetime.datetime(2005, 4, 1, 12), datetime.datetime(2006, 5, 1))}) ... ) == set((0, 1, 2, 3, 4, 5)) True >>> set(setindex.apply({'between': ( ... datetime.datetime(2005, 4, 1, 12), datetime.datetime(2006, 5, 1), ... True, True)}) ... ) == set((0, 1, 2, 3, 4, 5)) True 'between' searches should deal with dates well. >>> set(setindex.apply({'between': ( ... datetime.date(2005, 7, 16), datetime.date(2005, 7, 17))}) ... ) == set((1, 2, 3)) True >>> len(setindex.apply({'between': ( ... datetime.date(2005, 7, 16), datetime.date(2005, 7, 17))}) ... ) == len(setindex.apply({'between': ( ... datetime.date(2005, 7, 15), datetime.date(2005, 7, 18), ... True, True)}) ... ) True Removing docs works as usual. >>> setindex.unindex_doc(1) >>> list(setindex.ids()) [0, 2, 3, 4, 5] Value, Minvalue and Maxvalue can take timezone-less datetimes and dates. 
    >>> setindex.minValue() # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, ...<...Eastern...>)
    >>> setindex.minValue(datetime.date(2005, 7, 17)) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 17, 3, 21, ...<...Eastern...>)

    >>> setindex.maxValue() # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 20, 1, 21, ...<...Eastern...>)
    >>> setindex.maxValue(datetime.date(2005, 7, 17)) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 17, 23, 21, ...<...Eastern...>)

    >>> list(setindex.values(
    ...     datetime.date(2005, 7, 17), datetime.date(2005, 7, 17)))
    ... # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
    [datetime.datetime(2005, 7, 17, 3, 21, ...<...Eastern...>),
     datetime.datetime(2005, 7, 17, 13, 21, ...<...Eastern...>),
     datetime.datetime(2005, 7, 17, 23, 21, ...<...Eastern...>)]

    >>> zope.security.management.endInteraction() # TODO put in tests tearDown

Sorting
-------

The normalization wrapper provides the zope.index.interfaces.IIndexSort
interface if its upstream index provides it.  For example, the
DateTimeValueIndex will provide IIndexSort, because ValueIndex provides
sorting.  It will also delegate the ``sort`` method to the value index.

    >>> from zc.catalog.index import DateTimeValueIndex
    >>> from zope.index.interfaces import IIndexSort

    >>> ix = DateTimeValueIndex()
    >>> IIndexSort.providedBy(ix.index)
    True
    >>> IIndexSort.providedBy(ix)
    True
    >>> ix.sort.im_self is ix.index
    True

But this won't work for indexes that don't support sorting, such as
DateTimeSetIndex.

    >>> ix = DateTimeSetIndex()
    >>> IIndexSort.providedBy(ix.index)
    False
    >>> IIndexSort.providedBy(ix)
    False
    >>> ix.sort
    Traceback (most recent call last):
    ...
    AttributeError: 'SetIndex' object has no attribute 'sort'

zc.catalog-1.6/src/zc/catalog/globber.txt

=======
Globber
=======

The globber takes a query and makes any term that isn't already a glob into
something that ends in a star.
It was originally envisioned as a *very* low-rent stemming hack.  The author
now questions its value, and hopes that the new stemming pipeline option can
be used instead.  Nonetheless, here is an example of it at work.

    >>> from zope.index.text import textindex
    >>> index = textindex.TextIndex()
    >>> lex = index.lexicon
    >>> from zc.catalog import globber
    >>> globber.glob('foo bar and baz or (b?ng not boo)', lex)
    '(((foo* and bar*) and baz*) or (b?ng and not boo*))'

zc.catalog-1.6/src/zc/catalog/valueindex.txt

===========
Value Index
===========

The valueindex is an index similar to, but more flexible than, a standard Zope
field index.  The index allows searches for documents that contain any of a
set of values; between a set of values; any (non-None) values; and any empty
values.

Additionally, the index supports an interface that allows examination of the
indexed values.

It is as policy-free as possible, and is intended to be the engine for indexes
with more policy, as well as being useful itself.

On creation, the index has no wordCount, no documentCount, and is, as
expected, fairly empty.

    >>> from zc.catalog.index import ValueIndex
    >>> index = ValueIndex()
    >>> index.documentCount()
    0
    >>> index.wordCount()
    0
    >>> index.maxValue() # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError:...
    >>> index.minValue() # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError:...
    >>> list(index.values())
    []
    >>> len(index.apply({'any_of': (5,)}))
    0

The index supports indexing any value.  All values within a given index must
sort consistently across Python versions.

    >>> data = {1: 'a',
    ...         2: 'b',
    ...         3: 'a',
    ...         4: 'c',
    ...         5: 'd',
    ...         6: 'c',
    ...         7: 'c',
    ...         8: 'b',
    ...         9: 'c',
    ... }
    >>> for k, v in data.items():
    ...     index.index_doc(k, v)
    ...

After indexing, the statistics and values match the newly entered content.
    >>> list(index.values())
    ['a', 'b', 'c', 'd']
    >>> index.documentCount()
    9
    >>> index.wordCount()
    4
    >>> index.maxValue()
    'd'
    >>> index.minValue()
    'a'
    >>> list(index.ids())
    [1, 2, 3, 4, 5, 6, 7, 8, 9]

The index supports four types of query.  The first is 'any_of'.  It takes an
iterable of values, and returns an iterable of document ids that contain any
of the values.  The results are not weighted.

    >>> list(index.apply({'any_of':('b', 'c')}))
    [2, 4, 6, 7, 8, 9]
    >>> list(index.apply({'any_of': ('b',)}))
    [2, 8]
    >>> list(index.apply({'any_of': ('d',)}))
    [5]
    >>> list(index.apply({'any_of':(42,)}))
    []

Another query is 'any'.  If the value is None, all indexed document ids with
any values are returned.  If the value is an extent, the intersection of the
extent and all document ids with any values is returned.

    >>> list(index.apply({'any': None}))
    [1, 2, 3, 4, 5, 6, 7, 8, 9]

    >>> from zc.catalog.extentcatalog import FilterExtent
    >>> extent = FilterExtent(lambda extent, uid, obj: True)
    >>> for i in range(15):
    ...     extent.add(i, i)
    ...
    >>> list(index.apply({'any': extent}))
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> limited_extent = FilterExtent(lambda extent, uid, obj: True)
    >>> for i in range(5):
    ...     limited_extent.add(i, i)
    ...
    >>> list(index.apply({'any': limited_extent}))
    [1, 2, 3, 4]

The 'between' argument takes from one to four values.  The first is the
minimum, and defaults to None, indicating no minimum; the second is the
maximum, and defaults to None, indicating no maximum; the next is a boolean for
whether the minimum value should be excluded, and defaults to False; and the
last is a boolean for whether the maximum value should be excluded, and also
defaults to False.  The results are not weighted.
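The 'between' semantics can be sketched over a plain dictionary.  This is an illustrative stand-in and not the ValueIndex implementation: the `between` function below is hypothetical, scans every document instead of using BTrees, and is written for Python 3.

```python
def between(values_by_doc, minimum=None, maximum=None,
            exclude_min=False, exclude_max=False):
    """Return the sorted doc ids whose value lies in the requested range."""
    matches = []
    for doc_id, value in values_by_doc.items():
        if minimum is not None:
            # Below the minimum, or equal to an excluded minimum.
            if value < minimum or (exclude_min and value == minimum):
                continue
        if maximum is not None:
            # Above the maximum, or equal to an excluded maximum.
            if value > maximum or (exclude_max and value == maximum):
                continue
        matches.append(doc_id)
    return sorted(matches)


data = {1: 'a', 2: 'b', 3: 'a', 4: 'c', 5: 'd',
        6: 'c', 7: 'c', 8: 'b', 9: 'c'}
inclusive = between(data, 'b', 'd')
open_ended = between(data, 'c')
exclusive = between(data, 'b', 'd', True, True)
```

A bound of None is simply skipped, which is why a one-element 'between' value behaves as "everything from the minimum upward".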
>>> list(index.apply({'between': ('b', 'd')})) [2, 4, 5, 6, 7, 8, 9] >>> list(index.apply({'between': ('c', None)})) [4, 5, 6, 7, 9] >>> list(index.apply({'between': ('c',)})) [4, 5, 6, 7, 9] >>> list(index.apply({'between': ('b', 'd', True, True)})) [4, 6, 7, 9] The 'none' argument takes an extent and returns the ids in the extent that are not indexed; it is intended to be used to return docids that have no (or empty) values. >>> list(index.apply({'none': extent})) [0, 10, 11, 12, 13, 14] Trying to use more than one of these at a time generates an error. >>> index.apply({'between': (5,), 'any_of': (3,)}) ... # doctest: +ELLIPSIS Traceback (most recent call last): ... ValueError:... Using none of them simply returns None. >>> index.apply({}) # returns None Invalid query names cause ValueErrors. >>> index.apply({'foo':()}) ... # doctest: +ELLIPSIS Traceback (most recent call last): ... ValueError:... When you unindex a document, the searches and statistics should be updated. >>> index.unindex_doc(5) >>> len(index.apply({'any_of': ('d',)})) 0 >>> index.documentCount() 8 >>> index.wordCount() 3 >>> list(index.values()) ['a', 'b', 'c'] >>> list(index.ids()) [1, 2, 3, 4, 6, 7, 8, 9] Reindexing a document that has a changed value also is reflected in subsequent searches and statistic checks. >>> list(index.apply({'any_of': ('b',)})) [2, 8] >>> data[8] = 'e' >>> index.index_doc(8, data[8]) >>> index.documentCount() 8 >>> index.wordCount() 4 >>> list(index.apply({'any_of': ('e',)})) [8] >>> list(index.apply({'any_of': ('b',)})) [2] >>> data[2] = 'e' >>> index.index_doc(2, data[2]) >>> index.documentCount() 8 >>> index.wordCount() 3 >>> list(index.apply({'any_of': ('e',)})) [2, 8] >>> list(index.apply({'any_of': ('b',)})) [] Reindexing a document for which the value is now None causes it to be removed from the statistics. 
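The bookkeeping behind these statistics can be sketched with two dictionaries.  This is a hypothetical `TinyValueIndex` written for illustration in Python 3; it borrows the method names used in this file but makes no claim about how ValueIndex stores its data.

```python
class TinyValueIndex:
    """Hypothetical value index: one value per document, None means no value."""

    def __init__(self):
        self.docs_by_value = {}   # value -> set of doc ids
        self.value_by_doc = {}    # doc id -> value

    def index_doc(self, doc_id, value):
        if value is None:
            # A None value removes the document from the index entirely.
            self.unindex_doc(doc_id)
            return
        if self.value_by_doc.get(doc_id) == value:
            return  # unchanged, nothing to do
        self.unindex_doc(doc_id)  # drop any stale value first
        self.value_by_doc[doc_id] = value
        self.docs_by_value.setdefault(value, set()).add(doc_id)

    def unindex_doc(self, doc_id):
        if doc_id not in self.value_by_doc:
            return
        value = self.value_by_doc.pop(doc_id)
        docs = self.docs_by_value[value]
        docs.discard(doc_id)
        if not docs:
            # The last document with this value is gone, so the value is too.
            del self.docs_by_value[value]

    def documentCount(self):
        return len(self.value_by_doc)

    def wordCount(self):
        return len(self.docs_by_value)


index = TinyValueIndex()
for doc_id, value in {1: 'a', 2: 'b', 3: 'a'}.items():
    index.index_doc(doc_id, value)
index.index_doc(2, None)
counts = (index.documentCount(), index.wordCount())
```

Reindexing document 2 with None drops both the document and the now-unused value 'b', so both counts shrink.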
    >>> data[3] = None
    >>> index.index_doc(3, data[3])
    >>> index.documentCount()
    7
    >>> index.wordCount()
    3
    >>> list(index.ids())
    [1, 2, 4, 6, 7, 8, 9]

This affects both ways of determining the ids that are and are not in the
index (that do and do not have values).

    >>> list(index.apply({'any': None}))
    [1, 2, 4, 6, 7, 8, 9]
    >>> list(index.apply({'any': extent}))
    [1, 2, 4, 6, 7, 8, 9]
    >>> list(index.apply({'none': extent}))
    [0, 3, 5, 10, 11, 12, 13, 14]

The values method can be used to examine the indexed values for a given
document id.  For a valueindex, the "values" for a given doc_id will always
have a length of 0 or 1.

    >>> index.values(doc_id=8)
    ('e',)

And the containsValue method provides a way of determining membership in the
values.

    >>> index.containsValue('a')
    True
    >>> index.containsValue('q')
    False

Sorting
-------

Value indexes support sorting, just like zope.index.field.FieldIndex.

    >>> index.clear()

    >>> index.index_doc(1, 9)
    >>> index.index_doc(2, 8)
    >>> index.index_doc(3, 7)
    >>> index.index_doc(4, 6)
    >>> index.index_doc(5, 5)
    >>> index.index_doc(6, 4)
    >>> index.index_doc(7, 3)
    >>> index.index_doc(8, 2)
    >>> index.index_doc(9, 1)

    >>> list(index.sort([4, 2, 9, 7, 3, 1, 5]))
    [9, 7, 5, 4, 3, 2, 1]

We can also specify the ``reverse`` argument to reverse results:

    >>> list(index.sort([4, 2, 9, 7, 3, 1, 5], reverse=True))
    [1, 2, 3, 4, 5, 7, 9]

And as per IIndexSort, we can limit results by specifying the ``limit``
argument:

    >>> list(index.sort([4, 2, 9, 7, 3, 1, 5], limit=3))
    [9, 7, 5]

If we pass an id that is not indexed by this index, it won't be included in
the result.

    >>> list(index.sort([2, 10]))
    [2]

zc.catalog-1.6/src/zc/catalog/stemmer.txt

=======
Stemmer
=======

The stemmer uses Andreas Jung's stemmer code, which is a Python wrapper of
M. F. Porter's Snowball project (http://snowball.tartarus.org/index.php).
It is designed to be used as part of a pipeline in a zope/index/text/
lexicon, after a splitter.  This enables getting the relevance ranking of the
zope/index/text code with the splitting functionality of TextIndexNG 3.x.

It requires that the TextIndexNG extensions--specifically txngstemmer--have
been compiled and installed in your Python installation.  Inclusion of the
textindexng package is not necessary.

As of this writing (Jan 3, 2007), installing the necessary extensions can be
done with the following steps:

- `svn co https://svn.sourceforge.net/svnroot/textindexng/extension_modules/trunk ext_mod`
- `cd ext_mod`
- (using the python you use for Zope) `python setup.py install`

Another approach is to simply install TextIndexNG (see
http://opensource.zopyx.com/software/textindexng3)

The stemmer must be instantiated with the language for which stemming is
desired.  It defaults to 'english'.  For what it is worth, other languages
supported as of this writing, using the strings that the stemmer expects,
include the following: 'danish', 'dutch', 'english', 'finnish', 'french',
'german', 'italian', 'norwegian', 'portuguese', 'russian', 'spanish', and
'swedish'.

For instance, let's build an index with an English stemmer.

    >>> from zope.index.text import textindex, lexicon
    >>> import zc.catalog.stemmer
    >>> lex = lexicon.Lexicon(
    ...     lexicon.Splitter(), lexicon.CaseNormalizer(),
    ...     lexicon.StopWordRemover(), zc.catalog.stemmer.Stemmer('english'))
    >>> ix = textindex.TextIndex(lex)
    >>> data = [
    ...     (0, 'consigned consistency consoles the constables'),
    ...     (1, 'knaves kneeled and knocked knees, knowing no knights')]
    >>> for doc_id, text in data:
    ...     ix.index_doc(doc_id, text)
    ...
    >>> list(ix.apply('consoling a constable'))
    [0]
    >>> list(ix.apply('knightly kneel'))
    [1]

Note that query terms with globbing characters are not stemmed.
>>> list(ix.apply('constables*')) [] zc.catalog-1.6/src/zc/__init__.py0000664000177100020040000000031012165324663020051 0ustar menesismenesis00000000000000# this is a namespace package try: import pkg_resources pkg_resources.declare_namespace(__name__) except ImportError: import pkgutil __path__ = pkgutil.extend_path(__path__, __name__) zc.catalog-1.6/buildout.cfg0000664000177100020040000000045012165324663017052 0ustar menesismenesis00000000000000[buildout] parts = test test_no_browser develop = . [test] recipe = zc.recipe.testrunner eggs = zc.catalog [test, browser, test_browser] defaults = "--exit-with-status".split() [test_no_browser] recipe = zc.recipe.testrunner eggs = zc.catalog [test] defaults = "--exit-with-status".split() zc.catalog-1.6/COPYRIGHT.txt0000664000177100020040000000004012165324663016646 0ustar menesismenesis00000000000000Zope Foundation and Contributorszc.catalog-1.6/setup.py0000664000177100020040000000676712165324663016275 0ustar menesismenesis00000000000000############################################################################## # # Copyright (c) 2007 Zope Foundation and Contributors. # All Rights Reserved. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. # THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. 
# ############################################################################## """Setup for zc.catalog package $Id: setup.py 81038 2007-10-24 14:34:17Z srichter $ """ import os from setuptools import setup, find_packages def read(*rnames): return open(os.path.join(os.path.dirname('.'), *rnames)).read() setup(name='zc.catalog', version='1.6', author='Zope Corporation and Contributors', author_email='zope-dev@zope.org', description="Extensions to the Zope 3 Catalog", long_description=( read('README.txt') + '\n\n.. contents::\n\n' + read('CHANGES.txt') + '\n\n' + read('src', 'zc', 'catalog', 'valueindex.txt') + '\n\n' + read('src', 'zc', 'catalog', 'setindex.txt') + '\n\n' + read('src', 'zc', 'catalog', 'normalizedindex.txt') + '\n\n' + read('src', 'zc', 'catalog', 'extentcatalog.txt') + '\n\n' + read('src', 'zc', 'catalog', 'stemmer.txt') + '\n\n' + read('src', 'zc', 'catalog', 'legacy.txt') + '\n\n' + read('src', 'zc', 'catalog', 'globber.txt') + '\n\n' + read('src', 'zc', 'catalog', 'callablewrapper.txt') + '\n\n' + read('src', 'zc', 'catalog', 'browser', 'README.txt') ), keywords = "zope3 i18n date time duration catalog index", classifiers = [ 'Development Status :: 5 - Production/Stable', 'Environment :: Web Environment', 'Intended Audience :: Developers', 'License :: OSI Approved :: Zope Public License', 'Programming Language :: Python', 'Natural Language :: English', 'Operating System :: OS Independent', 'Topic :: Internet :: WWW/HTTP', 'Framework :: Zope3'], url='http://pypi.python.org/pypi/zc.catalog', license='ZPL 2.1', packages=find_packages('src'), package_dir = {'': 'src'}, namespace_packages=['zc'], extras_require=dict( test=[ 'zope.keyreference', 'zope.testing', ], browser=[ 'zope.app.form', 'zope.browsermenu', ], test_browser=[ 'zope.login', 'zope.password', 'zope.securitypolicy', 'zope.testbrowser', 'zope.app.appsetup', 'zope.app.catalog', 'zope.app.testing', 'zope.app.zcmlfiles', ]), install_requires=[ 'ZODB3', 'pytz', 'setuptools', 'zope.catalog', 
'zope.component', 'zope.container', 'zope.i18nmessageid', 'zope.index>=3.5.1', 'zope.interface', 'zope.intid', 'zope.publisher >= 3.12', 'zope.schema', 'zope.security', ], include_package_data = True, zip_safe = False, ) zc.catalog-1.6/PKG-INFO0000664000177100020040000024016412165325611015641 0ustar menesismenesis00000000000000Metadata-Version: 1.1 Name: zc.catalog Version: 1.6 Summary: Extensions to the Zope 3 Catalog Home-page: http://pypi.python.org/pypi/zc.catalog Author: Zope Corporation and Contributors Author-email: zope-dev@zope.org License: ZPL 2.1 Description: zc.catalog is an extension to the Zope 3 catalog, Zope 3's indexing and search facility. zc.catalog contains a number of extensions to the Zope 3 catalog, such as some new indexes, improved globbing and stemming support, and an alternative catalog implementation. .. contents:: ======= CHANGES ======= 1.6 (2013-07-04) ---------------- - Using Python's ``doctest`` module instead of deprecated ``zope.testing.doctest``. - Move ``zope.intid`` to dependencies. 1.5.1 (2012-01-20) ------------------ - Fix the extent catalog's `searchResults` method to work when using a local uid source. - Replaced a testing dependency on ``zope.app.authentication`` with ``zope.password``. - Removed ``zope.app.server`` test dependency. 1.5 (2010-10-19) ---------------- - The package's ``configure.zcml`` does not include the browser subpackage's ``configure.zcml`` anymore. This, together with ``browser`` and ``test_browser`` ``extras_require``, decouples the browser view registrations from the main code. As a result projects that do not need the ZMI views to be registered are not pulling in the zope.app.* dependencies anymore. To enable the ZMI views for your project, you will have to do two things: * list ``zc.catalog [browser]`` as a ``install_requires``. * have your project's ``configure.zcml`` include the ``zc.catalog.browser`` subpackage. 
- Only include the browser tests whenever the dependencies for the browser tests are available. - Python2.7 test fix. 1.4.5 (2010-10-05) ------------------ - Remove implicit test dependency on zope.app.dublincore, that was not needed in the first place. 1.4.4 (2010-07-06) ------------------ * Fixed test-failure happening with more recent ``mechanize`` (>=2.0). 1.4.3 (2010-03-09) ------------------ * Try to import the stemmer from the zopyx.txng3.ext package first, which as of 3.3.2 contains stability and memory leak fixes. 1.4.2 (2010-01-20) ------------------ * Fix missing testing dependencies when using ZTK by adding zope.login. 1.4.1 (2009-02-27) ------------------ * Add FieldIndex-like sorting support for the ValueIndex. * Add sorting indexes support for the NormalizationWrapper. 1.4.0 (2009-02-07) ------------------ Bugs fixed ~~~~~~~~~~ * Fixed a typo in ValueIndex addform and addMenuItem * Use ``zope.container`` instead of ``zope.app.container``. * Use ``zope.keyreference`` instead of ``zope.app.keyreference``. * Use ``zope.intid`` instead of ``zope.app.intid``. * Use ``zope.catalog`` instead of ``zope.app.catalog``. 1.3.0 (2008-09-10) ------------------ Features added ~~~~~~~~~~~~~~ * Added hook point to allow extent catalog to be used with local UID sources. 1.2.0 (2007-11-03) ------------------ Features added ~~~~~~~~~~~~~~ * Updated package meta-data. * zc.catalog now can use 64-bit BTrees ("L") as provided by ZODB 3.8. * Albertas Agejavas (alga@pov.lt) included the new CallableWrapper, for when the typical Zope 3 index-by-adapter story (zope.app.catalog.attribute) is unnecessary trouble, and you just want to use a callable. See callablewrapper.txt. This can also be used for other indexes based on the zope.index interfaces. * Extents now have a __len__. 
  The current implementation defers to the standard BTree len
  implementation, and shares its performance characteristics: it needs to
  wake up all of the buckets, but if all of the buckets are awake it is a
  fairly quick operation.

* A simple ISelfPopulatingExtent was added to the extentcatalog module for
  which populating is a no-op. This is directly useful for catalogs that are
  used as implementation details of a component, in which objects are indexed
  explicitly by your own calls rather than by the usual subscribers. It is
  also potentially slightly useful as a base for other self-populating
  extents.

1.1.1 (2007-3-17)
-----------------

Bugs fixed
~~~~~~~~~~

'all_of' would return all results when one of the values had no results.
Reported, with test and fix provided, by Nando Quintana.

1.1 (2007-01-06)
----------------

Features removed
~~~~~~~~~~~~~~~~

The queueing of events in the extent catalog has been entirely removed.
Subtransactions caused significant problems to the code introduced in 1.0.
Other solutions also have significant problems, and the win of this kind of
queueing is questionable. Here is a rundown of the approaches rejected for
getting the queueing to work:

* _p_invalidate (used in 1.0). Not really designed for use within a
  transaction, and reverts to last savepoint, rather than the beginning of
  the transaction. Could monkeypatch savepoints to iterate over precommit
  transaction hooks but that just smells too bad.

* _p_resolveConflict. Requires application software to exist in ZEO and
  even ZRS installations, which is counter to our software deployment goals.
  Also causes useless repeated writes of empty queue to database, but that's
  not the showstopper.

* vague hand-wavy ideas for separate storages or transaction managers for
  the queue. Never panned out in discussion.

1.0 (2007-01-05)
----------------

Bugs fixed
~~~~~~~~~~

* adjusted extentcatalog tests to trigger (and discuss and test) the
  queueing behavior.
* fixed problem with excessive conflict errors due to queueing code. * updated stemming to work with newest version of TextIndexNG's extensions. * omitted stemming test when TextIndexNG's extensions are unavailable, so tests pass without it. Since TextIndexNG's extensions are optional, this seems reasonable. * removed use of zapi in extentcatalog. 0.2 (2006-11-22) ---------------- Features added ~~~~~~~~~~~~~~ * First release on Cheeseshop. =========== Value Index =========== The valueindex is an index similar to, but more flexible than a standard Zope field index. The index allows searches for documents that contain any of a set of values; between a set of values; any (non-None) values; and any empty values. Additionally, the index supports an interface that allows examination of the indexed values. It is as policy-free as possible, and is intended to be the engine for indexes with more policy, as well as being useful itself. On creation, the index has no wordCount, no documentCount, and is, as expected, fairly empty. >>> from zc.catalog.index import ValueIndex >>> index = ValueIndex() >>> index.documentCount() 0 >>> index.wordCount() 0 >>> index.maxValue() # doctest: +ELLIPSIS Traceback (most recent call last): ... ValueError:... >>> index.minValue() # doctest: +ELLIPSIS Traceback (most recent call last): ... ValueError:... >>> list(index.values()) [] >>> len(index.apply({'any_of': (5,)})) 0 The index supports indexing any value. All values within a given index must sort consistently across Python versions. >>> data = {1: 'a', ... 2: 'b', ... 3: 'a', ... 4: 'c', ... 5: 'd', ... 6: 'c', ... 7: 'c', ... 8: 'b', ... 9: 'c', ... } >>> for k, v in data.items(): ... index.index_doc(k, v) ... After indexing, the statistics and values match the newly entered content. 
    >>> list(index.values())
    ['a', 'b', 'c', 'd']
    >>> index.documentCount()
    9
    >>> index.wordCount()
    4
    >>> index.maxValue()
    'd'
    >>> index.minValue()
    'a'
    >>> list(index.ids())
    [1, 2, 3, 4, 5, 6, 7, 8, 9]

The index supports four types of query. The first is 'any_of'. It takes an
iterable of values, and returns an iterable of document ids that contain any
of the values. The results are not weighted.

    >>> list(index.apply({'any_of':('b', 'c')}))
    [2, 4, 6, 7, 8, 9]
    >>> list(index.apply({'any_of': ('b',)}))
    [2, 8]
    >>> list(index.apply({'any_of': ('d',)}))
    [5]
    >>> list(index.apply({'any_of':(42,)}))
    []

Another query is 'any'. If the key is None, all indexed document ids with any
values are returned. If the key is an extent, the intersection of the extent
and all document ids with any values is returned.

    >>> list(index.apply({'any': None}))
    [1, 2, 3, 4, 5, 6, 7, 8, 9]

    >>> from zc.catalog.extentcatalog import FilterExtent
    >>> extent = FilterExtent(lambda extent, uid, obj: True)
    >>> for i in range(15):
    ...     extent.add(i, i)
    ...
    >>> list(index.apply({'any': extent}))
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> limited_extent = FilterExtent(lambda extent, uid, obj: True)
    >>> for i in range(5):
    ...     limited_extent.add(i, i)
    ...
    >>> list(index.apply({'any': limited_extent}))
    [1, 2, 3, 4]

The 'between' argument takes from one to four values. The first is the
minimum, and defaults to None, indicating no minimum; the second is the
maximum, and defaults to None, indicating no maximum; the next is a boolean
for whether the minimum value should be excluded, and defaults to False; and
the last is a boolean for whether the maximum value should be excluded, and
also defaults to False. The results are not weighted.
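As a rough illustration of how the four 'between' values combine, here is a plain-Python sketch over a dictionary of document ids to values. It does not use zc.catalog: the function name ``between`` and its keyword names are hypothetical, and the real index works on BTrees rather than a dict.

```python
def between(data, minimum=None, maximum=None,
            exclude_min=False, exclude_max=False):
    """Return the sorted doc ids whose value lies in the requested range.

    Hypothetical helper for illustration only; not part of zc.catalog.
    """
    result = []
    for doc_id, value in data.items():
        if value is None:
            continue  # documents without a value never match
        if minimum is not None:
            if value < minimum or (exclude_min and value == minimum):
                continue
        if maximum is not None:
            if value > maximum or (exclude_max and value == maximum):
                continue
        result.append(doc_id)
    return sorted(result)


# Same sample data as the value index examples in this section.
data = {1: 'a', 2: 'b', 3: 'a', 4: 'c', 5: 'd',
        6: 'c', 7: 'c', 8: 'b', 9: 'c'}

print(between(data, 'b', 'd'))              # both bounds included by default
print(between(data, 'c'))                   # minimum only, no maximum
print(between(data, 'b', 'd', True, True))  # both bounds excluded
```

Passing a shorter tuple to the real query corresponds to leaving the trailing arguments at their defaults here.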
>>> list(index.apply({'between': ('b', 'd')})) [2, 4, 5, 6, 7, 8, 9] >>> list(index.apply({'between': ('c', None)})) [4, 5, 6, 7, 9] >>> list(index.apply({'between': ('c',)})) [4, 5, 6, 7, 9] >>> list(index.apply({'between': ('b', 'd', True, True)})) [4, 6, 7, 9] The 'none' argument takes an extent and returns the ids in the extent that are not indexed; it is intended to be used to return docids that have no (or empty) values. >>> list(index.apply({'none': extent})) [0, 10, 11, 12, 13, 14] Trying to use more than one of these at a time generates an error. >>> index.apply({'between': (5,), 'any_of': (3,)}) ... # doctest: +ELLIPSIS Traceback (most recent call last): ... ValueError:... Using none of them simply returns None. >>> index.apply({}) # returns None Invalid query names cause ValueErrors. >>> index.apply({'foo':()}) ... # doctest: +ELLIPSIS Traceback (most recent call last): ... ValueError:... When you unindex a document, the searches and statistics should be updated. >>> index.unindex_doc(5) >>> len(index.apply({'any_of': ('d',)})) 0 >>> index.documentCount() 8 >>> index.wordCount() 3 >>> list(index.values()) ['a', 'b', 'c'] >>> list(index.ids()) [1, 2, 3, 4, 6, 7, 8, 9] Reindexing a document that has a changed value also is reflected in subsequent searches and statistic checks. >>> list(index.apply({'any_of': ('b',)})) [2, 8] >>> data[8] = 'e' >>> index.index_doc(8, data[8]) >>> index.documentCount() 8 >>> index.wordCount() 4 >>> list(index.apply({'any_of': ('e',)})) [8] >>> list(index.apply({'any_of': ('b',)})) [2] >>> data[2] = 'e' >>> index.index_doc(2, data[2]) >>> index.documentCount() 8 >>> index.wordCount() 3 >>> list(index.apply({'any_of': ('e',)})) [2, 8] >>> list(index.apply({'any_of': ('b',)})) [] Reindexing a document for which the value is now None causes it to be removed from the statistics. 
    >>> data[3] = None
    >>> index.index_doc(3, data[3])
    >>> index.documentCount()
    7
    >>> index.wordCount()
    3
    >>> list(index.ids())
    [1, 2, 4, 6, 7, 8, 9]

This affects both ways of determining the ids that are and are not in the
index (that do and do not have values).

    >>> list(index.apply({'any': None}))
    [1, 2, 4, 6, 7, 8, 9]
    >>> list(index.apply({'any': extent}))
    [1, 2, 4, 6, 7, 8, 9]
    >>> list(index.apply({'none': extent}))
    [0, 3, 5, 10, 11, 12, 13, 14]

The values method can be used to examine the indexed values for a given
document id. For a valueindex, the "values" for a given doc_id will always
have a length of 0 or 1.

    >>> index.values(doc_id=8)
    ('e',)

And the containsValue method provides a way of determining membership in the
values.

    >>> index.containsValue('a')
    True
    >>> index.containsValue('q')
    False

Sorting
-------

Value indexes support sorting, just like zope.index.field.FieldIndex.

    >>> index.clear()

    >>> index.index_doc(1, 9)
    >>> index.index_doc(2, 8)
    >>> index.index_doc(3, 7)
    >>> index.index_doc(4, 6)
    >>> index.index_doc(5, 5)
    >>> index.index_doc(6, 4)
    >>> index.index_doc(7, 3)
    >>> index.index_doc(8, 2)
    >>> index.index_doc(9, 1)

    >>> list(index.sort([4, 2, 9, 7, 3, 1, 5]))
    [9, 7, 5, 4, 3, 2, 1]

We can also specify the ``reverse`` argument to reverse results:

    >>> list(index.sort([4, 2, 9, 7, 3, 1, 5], reverse=True))
    [1, 2, 3, 4, 5, 7, 9]

And as per IIndexSort, we can limit results by specifying the ``limit``
argument:

    >>> list(index.sort([4, 2, 9, 7, 3, 1, 5], limit=3))
    [9, 7, 5]

If we pass an id that is not indexed by this index, it won't be included in
the result.

    >>> list(index.sort([2, 10]))
    [2]

=========
Set Index
=========

The setindex is an index similar to, but more general than a traditional
keyword index. The values indexed are expected to be iterables; the index
allows searches for documents that contain any of a set of values; all of a
set of values; or between a set of values.
Additionally, the index supports an interface that allows examination of the indexed values. It is as policy-free as possible, and is intended to be the engine for indexes with more policy, as well as being useful itself. On creation, the index has no wordCount, no documentCount, and is, as expected, fairly empty. >>> from zc.catalog.index import SetIndex >>> index = SetIndex() >>> index.documentCount() 0 >>> index.wordCount() 0 >>> index.maxValue() # doctest: +ELLIPSIS Traceback (most recent call last): ... ValueError:... >>> index.minValue() # doctest: +ELLIPSIS Traceback (most recent call last): ... ValueError:... >>> list(index.values()) [] >>> len(index.apply({'any_of': (5,)})) 0 The index supports indexing any value. All values within a given index must sort consistently across Python versions. In our example, we hope that strings and integers will sort consistently; this may not be a reasonable hope. >>> data = {1: ['a', 1], ... 2: ['b', 'a', 3, 4, 7], ... 3: [1], ... 4: [1, 4, 'c'], ... 5: [7], ... 6: [5, 6, 7], ... 7: ['c'], ... 8: [1, 6], ... 9: ['a', 'c', 2, 3, 4, 6,], ... } >>> for k, v in data.items(): ... index.index_doc(k, v) ... After indexing, the statistics and values match the newly entered content. >>> list(index.values()) [1, 2, 3, 4, 5, 6, 7, 'a', 'b', 'c'] >>> index.documentCount() 9 >>> index.wordCount() 10 >>> index.maxValue() 'c' >>> index.minValue() 1 >>> list(index.ids()) [1, 2, 3, 4, 5, 6, 7, 8, 9] The index supports five types of query. The first is 'any_of'. It takes an iterable of values, and returns an iterable of document ids that contain any of the values. The results are weighted. >>> list(index.apply({'any_of':('b', 1, 5)})) [1, 2, 3, 4, 6, 8] >>> list(index.apply({'any_of': ('b', 1, 5)})) [1, 2, 3, 4, 6, 8] >>> list(index.apply({'any_of':(42,)})) [] >>> index.apply({'any_of': ('a', 3, 7)}) # doctest: +ELLIPSIS BTrees...FBucket([(1, 1.0), (2, 3.0), (5, 1.0), (6, 1.0), (9, 2.0)]) Another query is 'any'. 
If the key is None, all indexed document ids with any values are returned. If the key is an extent, the intersection of the extent and all document ids with any values is returned. >>> list(index.apply({'any': None})) [1, 2, 3, 4, 5, 6, 7, 8, 9] >>> from zc.catalog.extentcatalog import FilterExtent >>> extent = FilterExtent(lambda extent, uid, obj: True) >>> for i in range(15): ... extent.add(i, i) ... >>> list(index.apply({'any': extent})) [1, 2, 3, 4, 5, 6, 7, 8, 9] >>> limited_extent = FilterExtent(lambda extent, uid, obj: True) >>> for i in range(5): ... limited_extent.add(i, i) ... >>> list(index.apply({'any': limited_extent})) [1, 2, 3, 4] The 'all_of' argument also takes an iterable of values, but returns an iterable of document ids that contains all of the values. The results are not weighted. >>> list(index.apply({'all_of': ('a',)})) [1, 2, 9] >>> list(index.apply({'all_of': (3, 4)})) [2, 9] These tests illustrate two related reported errors that have been fixed. >>> list(index.apply({'all_of': ('z', 3, 4)})) [] >>> list(index.apply({'all_of': (3, 4, 'z')})) [] The 'between' argument takes from 1 to four values. The first is the minimum, and defaults to None, indicating no minimum; the second is the maximum, and defaults to None, indicating no maximum; the next is a boolean for whether the minimum value should be excluded, and defaults to False; and the last is a boolean for whether the maximum value should be excluded, and also defaults to False. The results are weighted. 
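The weighted results shown for the set index are consistent with a weight that counts how many of a document's values fall inside the range. The sketch below reproduces that reading in plain Python. This is an interpretation inferred from the doctest output in this section, not a description of the actual implementation, and ``between_weighted`` is a hypothetical name. The string values of the sample data are left out because Python 3 cannot compare strings with integers; they would lie outside the integer range anyway.

```python
from collections import Counter


def between_weighted(data, minimum, maximum):
    """Map each matching doc id to the number of its values in range.

    Hypothetical helper for illustration only; not part of zc.catalog.
    """
    weights = Counter()
    for doc_id, values in data.items():
        for value in set(values):
            if minimum <= value <= maximum:
                weights[doc_id] += 1.0
    return sorted(weights.items())


# Integer values only, taken from the set index sample data.
data = {1: [1], 2: [3, 4, 7], 3: [1], 4: [1, 4], 5: [7],
        6: [5, 6, 7], 8: [1, 6], 9: [2, 3, 4, 6]}

print(between_weighted(data, 2, 6))
```

Documents with no value in the range never receive a weight, so they are absent from the result rather than listed with a weight of zero.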
>>> list(index.apply({'between': (1, 7)})) [1, 2, 3, 4, 5, 6, 8, 9] >>> list(index.apply({'between': ('b', None)})) [2, 4, 7, 9] >>> list(index.apply({'between': ('b',)})) [2, 4, 7, 9] >>> list(index.apply({'between': (1, 7, True, True)})) [2, 4, 6, 8, 9] >>> index.apply({'between': (2, 6)}) # doctest: +ELLIPSIS BTrees...FBucket([(2, 2.0), (4, 1.0), (6, 2.0), (8, 1.0), (9, 4.0)]) The 'none' argument takes an extent and returns the ids in the extent that are not indexed; it is intended to be used to return docids that have no (or empty) values. >>> list(index.apply({'none': extent})) [0, 10, 11, 12, 13, 14] Trying to use more than one of these at a time generates an error. >>> index.apply({'all_of': (5,), 'any_of': (3,)}) ... # doctest: +ELLIPSIS Traceback (most recent call last): ... ValueError:... Using none of them simply returns None. >>> index.apply({}) # returns None Invalid query names cause ValueErrors. >>> index.apply({'foo':()}) ... # doctest: +ELLIPSIS Traceback (most recent call last): ... ValueError:... When you unindex a document, the searches and statistics should be updated. >>> index.unindex_doc(6) >>> len(index.apply({'any_of': (5,)})) 0 >>> index.documentCount() 8 >>> index.wordCount() 9 >>> list(index.values()) [1, 2, 3, 4, 6, 7, 'a', 'b', 'c'] >>> list(index.ids()) [1, 2, 3, 4, 5, 7, 8, 9] Reindexing a document that has new additional values also is reflected in subsequent searches and statistic checks. >>> data[8].extend([5, 'c']) >>> index.index_doc(8, data[8]) >>> index.documentCount() 8 >>> index.wordCount() 10 >>> list(index.apply({'any_of': (5,)})) [8] >>> list(index.apply({'any_of': ('c',)})) [4, 7, 8, 9] The same is true for reindexing a document with both additions and removals. 
>>> 2 in set(index.apply({'any_of': (7,)})) True >>> 2 in set(index.apply({'any_of': (2,)})) False >>> data[2].pop() 7 >>> data[2].append(2) >>> index.index_doc(2, data[2]) >>> 2 in set(index.apply({'any_of': (7,)})) False >>> 2 in set(index.apply({'any_of': (2,)})) True Reindexing a document that no longer has any values causes it to be removed from the statistics. >>> del data[2][:] >>> index.index_doc(2, data[2]) >>> index.documentCount() 7 >>> index.wordCount() 9 >>> list(index.ids()) [1, 3, 4, 5, 7, 8, 9] This affects both ways of determining the ids that are and are not in the index (that do and do not have values). >>> list(index.apply({'any': None})) [1, 3, 4, 5, 7, 8, 9] >>> list(index.apply({'none': extent})) [0, 2, 6, 10, 11, 12, 13, 14] The values method can be used to examine the indexed values for a given document id. >>> set(index.values(doc_id=8)) == set([1, 5, 6, 'c']) True And the containsValue method provides a way of determining membership in the values. >>> index.containsValue(5) True >>> index.containsValue(20) False ================ Normalized Index ================ The index module provides a normalizing wrapper, a DateTime normalizer, and a set index and a value index normalized with the DateTime normalizer. The normalizing wrapper implements a full complement of index interfaces-- zope.index.interfaces.IInjection, zope.index.interfaces.IIndexSearch, zope.index.interfaces.IStatistics, and zc.catalog.interfaces.IIndexValues-- and delegates all of the behavior to the wrapped index, normalizing values using the normalizer before the index sees them. The normalizing wrapper currently only supports queries offered by zc.catalog.interfaces.ISetIndex and zc.catalog.interfaces.IValueIndex. 
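The delegation described above can be sketched without any Zope dependencies. Everything in this block is hypothetical and simplified for illustration: ``SketchWrapper``, ``TinyIndex`` and ``LowercaseNormalizer`` are not zc.catalog classes, only an 'any_of' query is handled, and the normalizer is reduced to two methods.

```python
class LowercaseNormalizer(object):
    """Toy normalizer: indexed and queried strings are lowercased."""

    def value(self, value):
        # Normalize a value before it is indexed.
        return value.lower()

    def any(self, value, index):
        # Normalize one 'any_of' query value; returns a sequence of values.
        return (value.lower(),)


class TinyIndex(object):
    """Toy index mapping doc ids to a single value."""

    def __init__(self):
        self.docs = {}

    def index_doc(self, doc_id, value):
        self.docs[doc_id] = value

    def apply(self, query):
        wanted = set(query['any_of'])
        return sorted(d for d, v in self.docs.items() if v in wanted)


class SketchWrapper(object):
    """Normalize values, then delegate everything to the wrapped index."""

    def __init__(self, index, normalizer):
        self.index = index
        self.normalizer = normalizer

    def index_doc(self, doc_id, value):
        self.index.index_doc(doc_id, self.normalizer.value(value))

    def apply(self, query):
        values = []
        for value in query['any_of']:
            values.extend(self.normalizer.any(value, self.index))
        return self.index.apply({'any_of': tuple(values)})


wrapper = SketchWrapper(TinyIndex(), LowercaseNormalizer())
wrapper.index_doc(1, 'Apple')
wrapper.index_doc(2, 'BANANA')
print(wrapper.apply({'any_of': ('APPLE',)}))
```

The wrapped index only ever sees normalized values, which is why the query has to pass through the same normalizer before it is delegated.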
The normalizer interface requires the following methods, as defined in the interface: def value(value): """normalize or check constraints for an input value; raise an error or return the value to be indexed.""" def any(value, index): """normalize a query value for a "any_of" search; return a sequence of values.""" def all(value, index): """Normalize a query value for an "all_of" search; return the value for query""" def minimum(value, index): """normalize a query value for minimum of a range; return the value for query""" def maximum(value, index): """normalize a query value for maximum of a range; return the value for query""" The DateTime normalizer performs the following normalizations and validations. Whenever a timezone is needed, it tries to get a request from the current interaction and adapt it to zope.interface.common.idatetime.ITZInfo; failing that (no request or no adapter) it uses the system local timezone. - input values must be datetimes with a timezone. They are normalized to the resolution specified when the normalizer is created: a resolution of 0 normalizes values to days; a resolution of 1 to hours; 2 to minutes; 3 to seconds; and 4 to microseconds. - 'any' values may be timezone-aware datetimes, timezone-naive datetimes, or dates. dates are converted to any value from the start to the end of the given date in the found timezone, as described above. timezone-naive datetimes get the found timezone. - 'all' values may be timezone-aware datetimes or timezone-naive datetimes. timezone-naive datetimes get the found timezone. - 'minimum' values may be timezone-aware datetimes, timezone-naive datetimes, or dates. dates are converted to the start of the given date in the found timezone, as described above. timezone-naive datetimes get the found timezone. - 'maximum' values may be timezone-aware datetimes, timezone-naive datetimes, or dates. dates are converted to the end of the given date in the found timezone, as described above. 
  timezone-naive datetimes get the found timezone.

Let's look at the DateTime normalizer first, and then an integration of it
with the normalizing wrapper and the value and set indexes.

The indexed values are parsed with 'value'.

    >>> from zc.catalog.index import DateTimeNormalizer
    >>> n = DateTimeNormalizer() # defaults to minutes
    >>> import datetime
    >>> import pytz
    >>> naive_datetime = datetime.datetime(2005, 7, 15, 11, 21, 32, 104)
    >>> date = naive_datetime.date()
    >>> aware_datetime = naive_datetime.replace(
    ...     tzinfo=pytz.timezone('US/Eastern'))
    >>> n.value(naive_datetime)
    Traceback (most recent call last):
    ...
    ValueError: This index only indexes timezone-aware datetimes.
    >>> n.value(date)
    Traceback (most recent call last):
    ...
    ValueError: This index only indexes timezone-aware datetimes.
    >>> n.value(aware_datetime) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, tzinfo=<...Eastern...>)

If we specify a different resolution, the results are different.

    >>> another = DateTimeNormalizer(1) # hours
    >>> another.value(aware_datetime) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 0, tzinfo=<...Eastern...>)

Note that changing the resolution of an indexed value may create surprising
results, because queries do not change their resolution. Therefore, if you
index something with a datetime with a finer resolution than the normalizer's,
then searching for that datetime will not find the doc_id.

Values in an 'any_of' query are parsed with 'any'. 'any' should return a
sequence of values. It requires an index, which we will mock up here.

    >>> class DummyIndex(object):
    ...     def values(self, start, stop, exclude_start, exclude_stop):
    ...         assert not exclude_start and exclude_stop
    ...         six_hours = datetime.timedelta(hours=6)
    ...         res = []
    ...         dt = start
    ...         while dt < stop:
    ...             res.append(dt)
    ...             dt += six_hours
    ...         return res
    ...
>>> index = DummyIndex() >>> tuple(n.any(naive_datetime, index)) # doctest: +ELLIPSIS (datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>),) >>> tuple(n.any(aware_datetime, index)) # doctest: +ELLIPSIS (datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>),) >>> tuple(n.any(date, index)) # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS (datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>), datetime.datetime(2005, 7, 15, 6, 0, tzinfo=<...Local...>), datetime.datetime(2005, 7, 15, 12, 0, tzinfo=<...Local...>), datetime.datetime(2005, 7, 15, 18, 0, tzinfo=<...Local...>)) Values in an 'all_of' query are parsed with 'all'. >>> n.all(naive_datetime, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>) >>> n.all(aware_datetime, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>) >>> n.all(date, index) # doctest: +ELLIPSIS Traceback (most recent call last): ... ValueError: ... Minimum values in a 'between' query as well as those in other methods are parsed with 'minimum'. They also take an optional exclude boolean, which indicates whether the minimum is to be excluded. For datetimes, it only makes a difference if you pass in a date. 
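The effect of the exclude flag on a plain date can be sketched in a few lines. This mirrors the behaviour shown by the doctests in this section rather than the actual implementation: the ``minimum`` function below is a hypothetical standalone helper, it takes the timezone as an explicit argument instead of looking it up from the current interaction, and UTC is used as a stand-in timezone.

```python
import datetime


def minimum(value, tz, exclude=False):
    """Normalize a lower bound for a range query (illustrative sketch).

    Datetimes are returned unchanged, apart from naive ones receiving
    the given timezone. A date becomes the first instant of that day,
    or the last instant when the date itself is to be excluded.
    """
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime.datetime):
        if value.tzinfo is None:
            return value.replace(tzinfo=tz)
        return value
    start = datetime.datetime.combine(
        value, datetime.time(0, 0, tzinfo=tz))
    end = datetime.datetime.combine(
        value, datetime.time(23, 59, 59, 999999, tzinfo=tz))
    return end if exclude else start


utc = datetime.timezone.utc
day = datetime.date(2005, 7, 15)
naive = datetime.datetime(2005, 7, 15, 11, 21, 32, 104)

print(minimum(day, utc))                 # start of the day
print(minimum(day, utc, exclude=True))   # end of the day
print(minimum(naive, utc, exclude=True)) # exclude has no effect here
```

Excluding a date as a minimum means the whole day is skipped, which is why the bound moves to the day's last instant instead of its first.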
>>> n.minimum(naive_datetime, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>) >>> n.minimum(naive_datetime, index, exclude=True) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>) >>> n.minimum(aware_datetime, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>) >>> n.minimum(aware_datetime, index, True) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>) >>> n.minimum(date, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>) >>> n.minimum(date, index, True) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 23, 59, 59, 999999, tzinfo=<...Local...>) Maximum values in a 'between' query as well as those in other methods are parsed with 'maximum'. They also take an optional exclude boolean, which indicates whether the maximum is to be excluded. For datetimes, it only makes a difference if you pass in a date. >>> n.maximum(naive_datetime, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>) >>> n.maximum(naive_datetime, index, exclude=True) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Local...>) >>> n.maximum(aware_datetime, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>) >>> n.maximum(aware_datetime, index, True) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 11, 21, 32, 104, tzinfo=<...Eastern...>) >>> n.maximum(date, index) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 23, 59, 59, 999999, tzinfo=<...Local...>) >>> n.maximum(date, index, True) # doctest: +ELLIPSIS datetime.datetime(2005, 7, 15, 0, 0, tzinfo=<...Local...>) Now let's examine these normalizers in the context of a real index. 
>>> from zc.catalog.index import DateTimeValueIndex, DateTimeSetIndex >>> setindex = DateTimeSetIndex() # minutes resolution >>> data = [] # generate some data >>> def date_gen( ... start=aware_datetime, ... count=12, ... period=datetime.timedelta(hours=10)): ... dt = start ... ix = 0 ... while ix < count: ... yield dt ... dt += period ... ix += 1 ... >>> gen = date_gen() >>> count = 0 >>> while True: ... try: ... next = [gen.next() for i in range(6)] ... except StopIteration: ... break ... data.append((count, next[0:1])) ... count += 1 ... data.append((count, next[1:3])) ... count += 1 ... data.append((count, next[3:6])) ... count += 1 ... >>> print data # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE [(0, [datetime.datetime(2005, 7, 15, 11, 21, 32, 104, ...<...Eastern...>)]), (1, [datetime.datetime(2005, 7, 15, 21, 21, 32, 104, ...<...Eastern...>), datetime.datetime(2005, 7, 16, 7, 21, 32, 104, ...<...Eastern...>)]), (2, [datetime.datetime(2005, 7, 16, 17, 21, 32, 104, ...<...Eastern...>), datetime.datetime(2005, 7, 17, 3, 21, 32, 104, ...<...Eastern...>), datetime.datetime(2005, 7, 17, 13, 21, 32, 104, ...<...Eastern...>)]), (3, [datetime.datetime(2005, 7, 17, 23, 21, 32, 104, ...<...Eastern...>)]), (4, [datetime.datetime(2005, 7, 18, 9, 21, 32, 104, ...<...Eastern...>), datetime.datetime(2005, 7, 18, 19, 21, 32, 104, ...<...Eastern...>)]), (5, [datetime.datetime(2005, 7, 19, 5, 21, 32, 104, ...<...Eastern...>), datetime.datetime(2005, 7, 19, 15, 21, 32, 104, ...<...Eastern...>), datetime.datetime(2005, 7, 20, 1, 21, 32, 104, ...<...Eastern...>)])] >>> data_dict = dict(data) >>> for doc_id, value in data: ... setindex.index_doc(doc_id, value) ... >>> list(setindex.ids()) [0, 1, 2, 3, 4, 5] >>> set(setindex.values()) == set( ... setindex.normalizer.value(v) for v in date_gen()) True For the searches, we will actually use a request and interaction, with an adapter that returns the Eastern timezone. This makes the examples less dependent on the machine that they use. 
>>> import zope.security.management >>> import zope.publisher.browser >>> import zope.interface.common.idatetime >>> import zope.publisher.interfaces >>> request = zope.publisher.browser.TestRequest() >>> zope.security.management.newInteraction(request) >>> from zope import interface, component >>> @interface.implementer(zope.interface.common.idatetime.ITZInfo) ... @component.adapter(zope.publisher.interfaces.IRequest) ... def tzinfo(req): ... return pytz.timezone('US/Eastern') ... >>> component.provideAdapter(tzinfo) >>> n.all(naive_datetime, index).tzinfo is pytz.timezone('US/Eastern') True >>> set(setindex.apply({'any_of': (datetime.date(2005, 7, 17), ... datetime.date(2005, 7, 20), ... datetime.date(2005, 12, 31))})) == set( ... (2, 3, 5)) True Note that this search is using the normalized values. >>> set(setindex.apply({'all_of': ( ... datetime.datetime( ... 2005, 7, 16, 7, 21, tzinfo=pytz.timezone('US/Eastern')), ... datetime.datetime( ... 2005, 7, 15, 21, 21, tzinfo=pytz.timezone('US/Eastern')),)}) ... ) == set((1,)) True >>> list(setindex.apply({'any': None})) [0, 1, 2, 3, 4, 5] >>> set(setindex.apply({'between': ( ... datetime.datetime(2005, 4, 1, 12), datetime.datetime(2006, 5, 1))}) ... ) == set((0, 1, 2, 3, 4, 5)) True >>> set(setindex.apply({'between': ( ... datetime.datetime(2005, 4, 1, 12), datetime.datetime(2006, 5, 1), ... True, True)}) ... ) == set((0, 1, 2, 3, 4, 5)) True 'between' searches should deal with dates well. >>> set(setindex.apply({'between': ( ... datetime.date(2005, 7, 16), datetime.date(2005, 7, 17))}) ... ) == set((1, 2, 3)) True >>> len(setindex.apply({'between': ( ... datetime.date(2005, 7, 16), datetime.date(2005, 7, 17))}) ... ) == len(setindex.apply({'between': ( ... datetime.date(2005, 7, 15), datetime.date(2005, 7, 18), ... True, True)}) ... ) True Removing docs works as usual. >>> setindex.unindex_doc(1) >>> list(setindex.ids()) [0, 2, 3, 4, 5] Value, Minvalue and Maxvalue can take timezone-less datetimes and dates. 
    >>> setindex.minValue() # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 15, 11, 21, ...<...Eastern...>)
    >>> setindex.minValue(datetime.date(2005, 7, 17)) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 17, 3, 21, ...<...Eastern...>)

    >>> setindex.maxValue() # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 20, 1, 21, ...<...Eastern...>)
    >>> setindex.maxValue(datetime.date(2005, 7, 17)) # doctest: +ELLIPSIS
    datetime.datetime(2005, 7, 17, 23, 21, ...<...Eastern...>)

    >>> list(setindex.values(
    ...     datetime.date(2005, 7, 17), datetime.date(2005, 7, 17)))
    ... # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
    [datetime.datetime(2005, 7, 17, 3, 21, ...<...Eastern...>),
     datetime.datetime(2005, 7, 17, 13, 21, ...<...Eastern...>),
     datetime.datetime(2005, 7, 17, 23, 21, ...<...Eastern...>)]

    >>> zope.security.management.endInteraction() # TODO put in tests tearDown

Sorting
-------

The normalization wrapper provides the zope.index.interfaces.IIndexSort
interface if its upstream index provides it. For example, the
DateTimeValueIndex will provide IIndexSort, because ValueIndex provides
sorting. It will also delegate the ``sort`` method to the value index.

    >>> from zc.catalog.index import DateTimeValueIndex
    >>> from zope.index.interfaces import IIndexSort

    >>> ix = DateTimeValueIndex()
    >>> IIndexSort.providedBy(ix.index)
    True
    >>> IIndexSort.providedBy(ix)
    True
    >>> ix.sort.im_self is ix.index
    True

But it won't work for indexes that don't do sorting, for example
DateTimeSetIndex.

    >>> ix = DateTimeSetIndex()
    >>> IIndexSort.providedBy(ix.index)
    False
    >>> IIndexSort.providedBy(ix)
    False
    >>> ix.sort
    Traceback (most recent call last):
    ...
    AttributeError: 'SetIndex' object has no attribute 'sort'

==============
Extent Catalog
==============

An extent catalog is very similar to a normal catalog except that it only
indexes items addable to its extent. The extent is both a filter and a set
that may be merged with other result sets.
The filtering is an additional feature we will discuss below; we'll begin with a simple "do nothing" extent that only supports the second use case. We create the state that the text needs here. >>> import zope.keyreference.persistent >>> import zope.component >>> import zope.intid >>> import zope.component >>> import zope.component.interfaces >>> import zope.component.persistentregistry >>> from ZODB.tests.util import DB >>> import transaction >>> zope.component.provideAdapter( ... zope.keyreference.persistent.KeyReferenceToPersistent, ... adapts=(zope.interface.Interface,)) >>> zope.component.provideAdapter( ... zope.keyreference.persistent.connectionOfPersistent, ... adapts=(zope.interface.Interface,)) >>> site_manager = None >>> def getSiteManager(context=None): ... if context is None: ... if site_manager is None: ... return zope.component.getGlobalSiteManager() ... else: ... return site_manager ... else: ... try: ... return zope.component.interfaces.IComponentLookup(context) ... except TypeError, error: ... raise zope.component.ComponentLookupError(*error.args) ... >>> def setSiteManager(sm): ... global site_manager ... site_manager = sm ... if sm is None: ... zope.component.getSiteManager.reset() ... else: ... zope.component.getSiteManager.sethook(getSiteManager) ... >>> def makeRoot(): ... db = DB() ... conn = db.open() ... root = conn.root() ... site_manager = root['components'] = ( ... zope.component.persistentregistry.PersistentComponents()) ... site_manager.__bases__ = (zope.component.getGlobalSiteManager(),) ... site_manager.registerUtility( ... zope.intid.IntIds(family=btrees_family), ... provided=zope.intid.interfaces.IIntIds) ... setSiteManager(site_manager) ... transaction.commit() ... return root ... >>> @zope.component.adapter(zope.interface.Interface) ... @zope.interface.implementer(zope.component.interfaces.IComponentLookup) ... def getComponentLookup(obj): ... return obj._p_jar.root()['components'] ... 
>>> zope.component.provideAdapter(getComponentLookup) To show the extent catalog at work, we need an intid utility, an index, some items to index. We'll do this within a real ZODB and a real intid utility. >>> import zc.catalog >>> import zc.catalog.interfaces >>> from zc.catalog import interfaces, extentcatalog >>> from zope import interface, component >>> from zope.interface import verify >>> import persistent >>> import BTrees.IFBTree >>> root = makeRoot() >>> intid = zope.component.getUtility( ... zope.intid.interfaces.IIntIds, context=root) >>> TreeSet = btrees_family.IF.TreeSet >>> from zope.container.interfaces import IContained >>> class DummyIndex(persistent.Persistent): ... interface.implements(IContained) ... __parent__ = __name__ = None ... def __init__(self): ... self.uids = TreeSet() ... def unindex_doc(self, uid): ... if uid in self.uids: ... self.uids.remove(uid) ... def index_doc(self, uid, obj): ... self.uids.insert(uid) ... def clear(self): ... self.uids.clear() ... def apply(self, query): ... return [uid for uid in self.uids if uid <= query] ... >>> class DummyContent(persistent.Persistent): ... def __init__(self, name, parent): ... self.id = name ... self.__parent__ = parent ... >>> extent = extentcatalog.Extent(family=btrees_family) >>> verify.verifyObject(interfaces.IExtent, extent) True >>> root['catalog'] = catalog = extentcatalog.Catalog(extent) >>> verify.verifyObject(interfaces.IExtentCatalog, catalog) True >>> index = DummyIndex() >>> catalog['index'] = index >>> transaction.commit() Now we have a catalog set up with an index and an extent. We can add some data to the extent: >>> matches = [] >>> for i in range(100): ... c = DummyContent(i, root) ... root[i] = c ... doc_id = intid.register(c) ... catalog.index_doc(doc_id, c) ... matches.append(doc_id) >>> matches.sort() >>> sorted(extent) == sorted(index.uids) == matches True We can get the size of the extent. 
>>> len(extent) 100 Unindexing an object that is in the catalog should simply remove it from the catalog and index as usual. >>> matches[0] in catalog.extent True >>> matches[0] in catalog['index'].uids True >>> catalog.unindex_doc(matches[0]) >>> matches[0] in catalog.extent False >>> matches[0] in catalog['index'].uids False >>> doc_id = matches.pop(0) >>> sorted(extent) == sorted(index.uids) == matches True Clearing the catalog clears both the extent and the contained indexes. >>> catalog.clear() >>> list(catalog.extent) == list(catalog['index'].uids) == [] True Updating all indexes and an individual index both also update the extent. >>> catalog.updateIndexes() >>> matches.insert(0, doc_id) >>> sorted(extent) == sorted(index.uids) == matches True >>> index2 = DummyIndex() >>> catalog['index2'] = index2 >>> index2.__parent__ == catalog True >>> index.uids.remove(matches[0]) # to confirm that only index 2 is touched >>> catalog.updateIndex(index2) >>> sorted(extent) == sorted(index2.uids) == matches True >>> matches[0] in index.uids False >>> matches[0] in index2.uids True >>> res = index.uids.insert(matches[0]) But so why have an extent in the first place? It allows indices to operate against a reliable collection of the full indexed data; therefore, it allows the indices in zc.catalog to perform NOT operations. The extent itself provides a number of merging features to allow its values to be merged with other BTrees.IFBTree data structures. These include intersection, union, difference, and reverse difference. Given an extent named 'extent' and another IFBTree data structure named 'data', intersections can be spelled "extent & data" or "data & extent"; unions can be spelled "extent | data" or "data | extent"; differences can be spelled "extent - data"; and reverse differences can be spelled "data - extent". Unions and intersections are weighted. >>> extent = extentcatalog.Extent(family=btrees_family) >>> for i in range(1, 100, 2): ... extent.add(i, None) ... 
>>> alt_set = TreeSet() >>> alt_set.update(range(0, 166, 33)) # return value is unimportant here 6 >>> sorted(alt_set) [0, 33, 66, 99, 132, 165] >>> sorted(extent & alt_set) [33, 99] >>> sorted(alt_set & extent) [33, 99] >>> sorted(extent.intersection(alt_set)) [33, 99] >>> original = set(extent) >>> union_matches = original.copy() >>> union_matches.update(alt_set) >>> union_matches = sorted(union_matches) >>> sorted(alt_set | extent) == union_matches True >>> sorted(extent | alt_set) == union_matches True >>> sorted(extent.union(alt_set)) == union_matches True >>> sorted(alt_set - extent) [0, 66, 132, 165] >>> sorted(extent.rdifference(alt_set)) [0, 66, 132, 165] >>> original.remove(33) >>> original.remove(99) >>> set(extent - alt_set) == original True >>> set(extent.difference(alt_set)) == original True We can pass our own instantiated UID utility to extentcatalog.Catalog. >>> extent = extentcatalog.Extent(family=btrees_family) >>> uidutil = zope.intid.IntIds() >>> cat = extentcatalog.Catalog(extent, uidutil) >>> cat["index"] = DummyIndex() >>> cat.UIDSource is uidutil True >>> cat._getUIDSource() is uidutil True The ResultSet instance returned by the catalog's `searchResults` method uses our UID utility. >>> obj = DummyContent(43, root) >>> uid = uidutil.register(obj) >>> cat.index_doc(uid, obj) >>> res = cat.searchResults(index=uid) >>> res.uidutil is uidutil True >>> list(res) == [obj] True `searchResults` may also return None. >>> cat.searchResults() is None True Calling `updateIndex` and `updateIndexes` when the catalog has its uid source set works as well. >>> cat.clear() >>> uid in cat.extent False All objects in the uid utility are indexed. 
>>> cat.updateIndexes() >>> uid in cat.extent True >>> len(cat.extent) 1 >>> obj2 = DummyContent(44, root) >>> uid2 = uidutil.register(obj2) >>> cat.updateIndexes() >>> len(cat.extent) 2 >>> uid2 in cat.extent True >>> uidutil.unregister(obj2) >>> cat.clear() >>> uid in cat.extent False >>> cat.updateIndex(cat["index"]) >>> uid in cat.extent True With a self-populating extent, calling `updateIndex` or `updateIndexes` means only the objects whose ids are in the extent are updated/reindexed; if present, the catalog will use its uid source to look up the objects by id. >>> extent = extentcatalog.NonPopulatingExtent(family=btrees_family) >>> cat = extentcatalog.Catalog(extent, uidutil) >>> cat["index"] = DummyIndex() >>> extent.add(uid, obj) >>> uid in cat["index"].uids False >>> cat.updateIndexes() >>> uid in cat["index"].uids True >>> cat.clear() >>> uid in cat["index"].uids False >>> uid in cat.extent False >>> cat.extent.add(uid, obj) >>> cat.updateIndex(cat["index"]) >>> uid in cat["index"].uids True Unregister the objects of the previous tests from intid utility: >>> intid = zope.component.getUtility( ... zope.intid.interfaces.IIntIds, context=root) >>> for doc_id in matches: ... intid.unregister(intid.queryObject(doc_id)) Catalog with a filter extent ---------------------------- As discussed at the beginning of this document, extents can not only help with index operations, but also act as a filter, so that a given catalog can answer questions about a subset of the objects contained in the intids. The filter extent only stores objects that match a given filter. >>> def filter(extent, uid, ob): ... assert interfaces.IFilterExtent.providedBy(extent) ... # This is an extent of objects with odd-numbered uids without a ... # True ignore attribute ... return uid % 2 and not getattr(ob, 'ignore', False) ... 
>>> extent = extentcatalog.FilterExtent(filter, family=btrees_family) >>> verify.verifyObject(interfaces.IFilterExtent, extent) True >>> root['catalog1'] = catalog = extentcatalog.Catalog(extent) >>> verify.verifyObject(interfaces.IExtentCatalog, catalog) True >>> index = DummyIndex() >>> catalog['index'] = index >>> transaction.commit() Now we have a catalog set up with an index and an extent. If we create some content and ask the catalog to index it, only the ones that match the filter will be in the extent and in the index. >>> matches = [] >>> fails = [] >>> i = 0 >>> while True: ... c = DummyContent(i, root) ... root[i] = c ... doc_id = intid.register(c) ... catalog.index_doc(doc_id, c) ... if filter(extent, doc_id, c): ... matches.append(doc_id) ... else: ... fails.append(doc_id) ... i += 1 ... if i > 99 and len(matches) > 4: ... break ... >>> matches.sort() >>> sorted(extent) == sorted(index.uids) == matches True If a content object is indexed that used to match the filter but no longer does, it should be removed from the extent and indexes. >>> matches[0] in catalog.extent True >>> obj = intid.getObject(matches[0]) >>> obj.ignore = True >>> filter(extent, matches[0], obj) False >>> catalog.index_doc(matches[0], obj) >>> doc_id = matches.pop(0) >>> doc_id in catalog.extent False >>> sorted(extent) == sorted(index.uids) == matches True Unindexing an object that is not in the catalog should be a no-op. >>> fails[0] in catalog.extent False >>> catalog.unindex_doc(fails[0]) >>> fails[0] in catalog.extent False >>> sorted(extent) == sorted(index.uids) == matches True Updating all indexes and an individual index both also update the extent. 
>>> index2 = DummyIndex() >>> catalog['index2'] = index2 >>> index2.__parent__ == catalog True >>> index.uids.remove(matches[0]) # to confirm that only index 2 is touched >>> catalog.updateIndex(index2) >>> sorted(extent) == sorted(index2.uids) True >>> matches[0] in index.uids False >>> matches[0] in index2.uids True >>> res = index.uids.insert(matches[0]) If you update a single index and an object is no longer a member of the extent, it is removed from all indexes. >>> matches[0] in catalog.extent True >>> matches[0] in index.uids True >>> matches[0] in index2.uids True >>> obj = intid.getObject(matches[0]) >>> obj.ignore = True >>> catalog.updateIndex(index2) >>> matches[0] in catalog.extent False >>> matches[0] in index.uids False >>> matches[0] in index2.uids False >>> doc_id = matches.pop(0) >>> (matches == sorted(catalog.extent) == sorted(index.uids) ... == sorted(index2.uids)) True Self-populating extents ----------------------- An extent may know how to populate itself; this is especially useful if the catalog can be initialized with fewer items than those available in the IIntIds utility that are also within the nearest Zope 3 site (the policy coded in the basic Zope 3 catalog). Such an extent must implement the `ISelfPopulatingExtent` interface, which requires two attributes. Let's use the `FilterExtent` class as a base for implementing such an extent, with a method that selects content item 0 (created and registered above):: >>> class PopulatingExtent( ... extentcatalog.FilterExtent, ... extentcatalog.NonPopulatingExtent): ... ... def populate(self): ... if self.populated: ... return ... self.add(intid.getId(root[0]), root[0]) ... super(PopulatingExtent, self).populate() Creating a catalog based on this extent ignores objects in the database already:: >>> def accept_any(extent, uid, ob): ... 
return True
    >>> extent = PopulatingExtent(accept_any, family=btrees_family)
    >>> catalog = extentcatalog.Catalog(extent)
    >>> index = DummyIndex()
    >>> catalog['index'] = index
    >>> root['catalog2'] = catalog
    >>> transaction.commit()

At this point, our extent remains unpopulated::

    >>> extent.populated
    False

Iterating over the extent does not cause it to be automatically populated::

    >>> list(extent)
    []

Causing our new index to be filled will cause the `populate()` method to be
called, setting the `populated` flag as a side-effect::

    >>> catalog.updateIndex(index)
    >>> extent.populated
    True
    >>> list(extent) == [intid.getId(root[0])]
    True

The index has been updated with the documents identified by the extent::

    >>> list(index.uids) == [intid.getId(root[0])]
    True

Updating the same index repeatedly will continue to use the extent as the
source of documents to include::

    >>> catalog.updateIndex(index)
    >>> list(extent) == [intid.getId(root[0])]
    True
    >>> list(index.uids) == [intid.getId(root[0])]
    True

The `updateIndexes()` method has a similar behavior.
If we add an additional index to the catalog, we see that it indexes only
those objects from the extent::

    >>> index2 = DummyIndex()
    >>> catalog['index2'] = index2
    >>> catalog.updateIndexes()
    >>> list(extent) == [intid.getId(root[0])]
    True
    >>> list(index.uids) == [intid.getId(root[0])]
    True
    >>> list(index2.uids) == [intid.getId(root[0])]
    True

When we have a fresh catalog and extent (not yet populated), we see that
`updateIndexes()` will cause the extent to be populated::

    >>> extent = PopulatingExtent(accept_any, family=btrees_family)
    >>> root['catalog3'] = catalog = extentcatalog.Catalog(extent)
    >>> index1 = DummyIndex()
    >>> index2 = DummyIndex()
    >>> catalog['index1'] = index1
    >>> catalog['index2'] = index2
    >>> transaction.commit()
    >>> extent.populated
    False
    >>> catalog.updateIndexes()
    >>> extent.populated
    True
    >>> list(extent) == [intid.getId(root[0])]
    True
    >>> list(index1.uids) == [intid.getId(root[0])]
    True
    >>> list(index2.uids) == [intid.getId(root[0])]
    True

We'll make sure everything can be safely committed.

    >>> transaction.commit()
    >>> setSiteManager(None)

=======
Stemmer
=======

The stemmer uses Andreas Jung's stemmer code, which is a Python wrapper of
M. F. Porter's Snowball project (http://snowball.tartarus.org/index.php).
It is designed to be used as part of a pipeline in a zope/index/text/
lexicon, after a splitter. This enables getting the relevance ranking of
the zope/index/text code with the splitting functionality of TextIndexNG 3.x.

It requires that the TextIndexNG extensions--specifically txngstemmer--have
been compiled and installed in your Python installation. Inclusion of the
textindexng package is not necessary.
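
The pipeline arrangement described above can be illustrated with a small,
dependency-free sketch. It assumes only that each pipeline element exposes a
``process(terms)`` method that takes and returns a list of strings, which is
the convention zope.index.text lexicon elements follow. ``ToySplitter``,
``ToyCaseNormalizer``, ``ToySuffixStemmer`` and ``run_pipeline`` are
hypothetical names used for illustration only, and the suffix stripping is a
crude stand-in, not the Snowball algorithm used by
``zc.catalog.stemmer.Stemmer``.

```python
import re


class ToySplitter(object):
    """Hypothetical splitter: break each input string into words."""

    def process(self, texts):
        words = []
        for text in texts:
            words.extend(re.findall(r"\w+", text))
        return words


class ToyCaseNormalizer(object):
    """Hypothetical normalizer: lower-case every term."""

    def process(self, terms):
        return [term.lower() for term in terms]


class ToySuffixStemmer(object):
    """Hypothetical stemmer: strips a few English suffixes (NOT Snowball)."""

    suffixes = ("ing", "ed", "es", "s")

    def process(self, terms):
        return [self._stem(term) for term in terms]

    def _stem(self, term):
        for suffix in self.suffixes:
            # Only strip when at least three characters of stem remain.
            if term.endswith(suffix) and len(term) - len(suffix) >= 3:
                return term[:-len(suffix)]
        return term


def run_pipeline(text, elements):
    """Feed the text through each element in order, as a lexicon would."""
    terms = [text]
    for element in elements:
        terms = element.process(terms)
    return terms


# The stemmer comes last, after splitting and case normalization.
pipeline = [ToySplitter(), ToyCaseNormalizer(), ToySuffixStemmer()]
indexed = run_pipeline("Kneeled CONSOLES", pipeline)
queried = run_pipeline("kneeling consoling", pipeline)
```

Because indexed text and query text pass through the same elements, different
inflections of a word reduce to the same term, which is what lets a query
match a document that uses another form of the word.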
As of this writing (Jan 3, 2007), installing the necessary extensions can be
done with the following steps:

- `svn co https://svn.sourceforge.net/svnroot/textindexng/extension_modules/trunk ext_mod`
- `cd ext_mod`
- (using the python you use for Zope) `python setup.py install`

Another approach is to simply install TextIndexNG (see
http://opensource.zopyx.com/software/textindexng3).

The stemmer must be instantiated with the language for which stemming is
desired. It defaults to 'english'. For what it is worth, the languages
supported as of this writing, using the strings that the stemmer expects,
include the following: 'danish', 'dutch', 'english', 'finnish', 'french',
'german', 'italian', 'norwegian', 'portuguese', 'russian', 'spanish', and
'swedish'.

For instance, let's build an index with an English stemmer.

    >>> from zope.index.text import textindex, lexicon
    >>> import zc.catalog.stemmer
    >>> lex = lexicon.Lexicon(
    ...     lexicon.Splitter(), lexicon.CaseNormalizer(),
    ...     lexicon.StopWordRemover(), zc.catalog.stemmer.Stemmer('english'))
    >>> ix = textindex.TextIndex(lex)
    >>> data = [
    ...     (0, 'consigned consistency consoles the constables'),
    ...     (1, 'knaves kneeled and knocked knees, knowing no knights')]
    >>> for doc_id, text in data:
    ...     ix.index_doc(doc_id, text)
    ...
    >>> list(ix.apply('consoling a constable'))
    [0]
    >>> list(ix.apply('knightly kneel'))
    [1]

Note that query terms with globbing characters are not stemmed.

    >>> list(ix.apply('constables*'))
    []

=======================
Support for legacy data
=======================

Prior to the introduction of btree "families" and the
``BTrees.Interfaces.IBTreeFamily`` interface, the indexes defined by the
``zc.catalog.index`` module used the instance attributes ``btreemodule`` and
``IOBTree``, initialized in the constructor, and the ``BTreeAPI`` property.
These are replaced by the ``family`` attribute in the current implementation.
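
The replacement of the legacy attributes by ``family`` can be pictured with
the following sketch. It is not the ``zc.catalog.index`` implementation: it
only shows one plausible way for a property to derive the family from a
legacy ``btreemodule`` value on first access and to remove the old entries
from the instance dictionary. ``LegacyAwareIndex``, ``FAMILY32`` and
``FAMILY64`` are hypothetical names, the two constants are plain-string
placeholders for ``BTrees.family32`` and ``BTrees.family64``, and ZODB
persistence (including the ``_p_changed`` flag) is left out entirely.

```python
FAMILY32 = "family32"  # placeholder standing in for BTrees.family32
FAMILY64 = "family64"  # placeholder standing in for BTrees.family64


class LegacyAwareIndex(object):
    """Hypothetical index that upgrades legacy state on first access."""

    _legacy_modules = {
        "BTrees.IFBTree": FAMILY32,
        "BTrees.LFBTree": FAMILY64,
    }

    @property
    def family(self):
        state = self.__dict__
        if "family" not in state:
            # Derive the family from the legacy module name, then drop
            # the legacy attributes from the instance dictionary.
            module_name = state.pop("btreemodule", "BTrees.IFBTree")
            state.pop("IOBTree", None)
            state["family"] = self._legacy_modules[module_name]
        return state["family"]

    @family.setter
    def family(self, value):
        # Setting the family explicitly also discards legacy attributes.
        state = self.__dict__
        state.pop("btreemodule", None)
        state.pop("IOBTree", None)
        state["family"] = value


# Simulate an instance whose state was loaded from an old pickle.
old = LegacyAwareIndex()
old.__dict__.update({"btreemodule": "BTrees.LFBTree", "IOBTree": dict})
resolved = old.family
```

Writing through ``__dict__`` rather than ordinary attribute assignment is a
deliberate choice in this sketch: it mirrors the idea of cleaning up state
quietly, without going through the machinery that would mark an object as
modified.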
This is a white-box test that verifies that the supported values in existing data structures (loaded from pickles) can be used effectively with the current implementation. There are two supported sets of values; one for 32-bit btrees:: >>> import BTrees.IOBTree >>> legacy32 = { ... "btreemodule": "BTrees.IFBTree", ... "IOBTree": BTrees.IOBTree.IOBTree, ... } and another for 64-bit btrees:: >>> import BTrees.LOBTree >>> legacy64 = { ... "btreemodule": "BTrees.LFBTree", ... "IOBTree": BTrees.LOBTree.LOBTree, ... } In each case, actual legacy structures will also include index structures that match the right integer size:: >>> import BTrees.OOBTree >>> import BTrees.Length >>> legacy32["values_to_documents"] = BTrees.OOBTree.OOBTree() >>> legacy32["documents_to_values"] = BTrees.IOBTree.IOBTree() >>> legacy32["documentCount"] = BTrees.Length.Length(0) >>> legacy32["wordCount"] = BTrees.Length.Length(0) >>> legacy64["values_to_documents"] = BTrees.OOBTree.OOBTree() >>> legacy64["documents_to_values"] = BTrees.LOBTree.LOBTree() >>> legacy64["documentCount"] = BTrees.Length.Length(0) >>> legacy64["wordCount"] = BTrees.Length.Length(0) What we want to do is verify that the ``family`` attribute is properly computed for instances loaded from legacy data, and ensure that the structure is updated cleanly without providing cause for a read-only transaction to become a write-transaction. We'll need to create instances that conform to the old data structures, pickle them, and show that unpickling them produces instances that use the correct families. Let's create new instances, and force the internal data to match the old structures:: >>> import pickle >>> import zc.catalog.index >>> vi32 = zc.catalog.index.ValueIndex() >>> vi32.__dict__ = legacy32.copy() >>> legacy32_pickle = pickle.dumps(vi32) >>> vi64 = zc.catalog.index.ValueIndex() >>> vi64.__dict__ = legacy64.copy() >>> legacy64_pickle = pickle.dumps(vi64) Now, let's unpickle these structures and verify the structures. 
We'll start with the 32-bit variety:: >>> vi32 = pickle.loads(legacy32_pickle) >>> vi32.__dict__["btreemodule"] 'BTrees.IFBTree' >>> vi32.__dict__["IOBTree"] >>> "family" in vi32.__dict__ False >>> vi32._p_changed False The ``family`` property returns the ``BTrees.family32`` singleton:: >>> vi32.family is BTrees.family32 True Once accessed, the legacy values have been cleaned out from the instance dictionary:: >>> "btreemodule" in vi32.__dict__ False >>> "IOBTree" in vi32.__dict__ False >>> "BTreeAPI" in vi32.__dict__ False Accessing these attributes as attributes provides the proper values anyway:: >>> vi32.btreemodule 'BTrees.IFBTree' >>> vi32.IOBTree >>> vi32.BTreeAPI Even though the instance dictionary has been cleaned up, the change flag hasn't been set. This is handled this way to avoid turning a read-only transaction into a write-transaction:: >>> vi32._p_changed False The 64-bit variation provides equivalent behavior:: >>> vi64 = pickle.loads(legacy64_pickle) >>> vi64.__dict__["btreemodule"] 'BTrees.LFBTree' >>> vi64.__dict__["IOBTree"] >>> "family" in vi64.__dict__ False >>> vi64._p_changed False >>> vi64.family is BTrees.family64 True >>> "btreemodule" in vi64.__dict__ False >>> "IOBTree" in vi64.__dict__ False >>> "BTreeAPI" in vi64.__dict__ False >>> vi64.btreemodule 'BTrees.LFBTree' >>> vi64.IOBTree >>> vi64.BTreeAPI >>> vi64._p_changed False Now, if we have a legacy structure and explicitly set the ``family`` attribute, the old data structures will be cleared and replaced with the new structure. If the object is associated with a data manager, the changed flag will be set as well:: >>> class DataManager(object): ... def register(self, ob): ... 
pass

    >>> vi64 = pickle.loads(legacy64_pickle)
    >>> vi64._p_jar = DataManager()
    >>> vi64.family = BTrees.family64

    >>> vi64._p_changed
    True

    >>> "btreemodule" in vi64.__dict__
    False
    >>> "IOBTree" in vi64.__dict__
    False
    >>> "BTreeAPI" in vi64.__dict__
    False

    >>> "family" in vi64.__dict__
    True
    >>> vi64.family is BTrees.family64
    True

    >>> vi64.btreemodule
    'BTrees.LFBTree'
    >>> vi64.IOBTree
    >>> vi64.BTreeAPI

=======
Globber
=======

The globber takes a query and makes any term that isn't already a glob into
something that ends in a star. It was originally envisioned as a *very*
low-rent stemming hack. The author now questions its value, and hopes that
the new stemming pipeline option can be used instead. Nonetheless, here is
an example of it at work.

    >>> from zope.index.text import textindex
    >>> index = textindex.TextIndex()
    >>> lex = index.lexicon
    >>> from zc.catalog import globber
    >>> globber.glob('foo bar and baz or (b?ng not boo)', lex)
    '(((foo* and bar*) and baz*) or (b?ng and not boo*))'

================
Callable Wrapper
================

If we want to index some value that is easily derivable from a document, we
have to define an interface with this value as an attribute, and create an
adapter that calculates this value and implements this interface. All this
is too much hassle if we want to store a single easily derivable value.
CallableWrapper solves this problem by converting the document to the
indexed value with a callable converter.

Here's a contrived example. Suppose we have cars that know their mileage
expressed in miles per gallon, but we want to index their economy in litres
per 100 km.

    >>> class Car(object):
    ...     def __init__(self, mpg):
    ...         self.mpg = mpg

    >>> def mpg2lp100(car):
    ...     return 100.0/(1.609344/3.7854118 * car.mpg)

Let's create an index that would index cars' l/100 km rating.

    >>> from zc.catalog import index, catalogindex
    >>> idx = catalogindex.CallableWrapper(index.ValueIndex(), mpg2lp100)

Let's add a couple of cars to the index!

    >>> hummer = Car(10.0)
    >>> beamer = Car(22.0)
    >>> civic = Car(45.0)

    >>> idx.index_doc(1, hummer)
    >>> idx.index_doc(2, beamer)
    >>> idx.index_doc(3, civic)

The indexed values should be the converted l/100 km ratings:

    >>> list(idx.values()) # doctest: +ELLIPSIS
    [5.22699076283393..., 10.691572014887601, 23.521458432752723]

We can query for cars that consume fuel in some range:

    >>> list(idx.apply({'between': (5.0, 7.0)}))
    [3]

==========================
zc.catalog Browser Support
==========================

The zc.catalog.browser package adds simple through-the-web (TTW) addition
and inspection for SetIndex and ValueIndex.

First, we need a browser so we can test the web UI.

    >>> from zope.testbrowser.testing import Browser
    >>> browser = Browser()
    >>> browser.addHeader('Authorization', 'Basic mgr:mgrpw')
    >>> browser.addHeader('Accept-Language', 'en-US')
    >>> browser.open('http://localhost')

Now we need to add the catalog that these indexes are going to reside within.

    >>> browser.open('/++etc++site/default/@@contents.html')
    >>> browser.getLink('Add').click()
    >>> browser.getControl('Catalog').click()
    >>> browser.getControl(name='id').value = 'catalog'
    >>> browser.getControl('Add').click()

SetIndex
--------

Add the SetIndex to the catalog.

    >>> browser.getLink('Add').click()
    >>> browser.getControl('Set Index').click()
    >>> browser.getControl(name='id').value = 'set_index'
    >>> browser.getControl('Add').click()

The add form needs values for the interface to adapt candidate objects to,
the field name to use, and whether or not that field is a callable. (We'll
use a simple interface for demonstration purposes; which one is not really
significant.)

    >>> browser.getControl('Interface', index=0).displayValue = [
    ...     'zope.size.interfaces.ISized']
    >>> browser.getControl('Field Name').value = 'sizeForSorting'
    >>> browser.getControl('Field Callable').click()
    >>> browser.getControl(name='add_input_name').value = 'set_index'
    >>> browser.getControl('Add').click()

Now we can look at the index and see how it is configured.

    >>> browser.getLink('set_index').click()
    >>> print browser.contents
    <...
    ...Interface...zope.size.interfaces.ISized...
    ...Field Name...sizeForSorting...
    ...Field Callable...True...

We need to go back to the catalog so we can add a different index.

    >>> browser.open('/++etc++site/default/catalog/@@contents.html')

ValueIndex
----------

Add the ValueIndex to the catalog.

    >>> browser.getLink('Add').click()
    >>> browser.getControl('Value Index').click()
    >>> browser.getControl(name='id').value = 'value_index'
    >>> browser.getControl('Add').click()

The add form needs values for the interface to adapt candidate objects to,
the field name to use, and whether or not that field is a callable. (We'll
use a simple interface for demonstration purposes; which one is not really
significant.)

    >>> browser.getControl('Interface', index=0).displayValue = [
    ...     'zope.size.interfaces.ISized']
    >>> browser.getControl('Field Name').value = 'sizeForSorting'
    >>> browser.getControl('Field Callable').click()
    >>> browser.getControl(name='add_input_name').value = 'value_index'
    >>> browser.getControl('Add').click()

Now we can look at the index and see how it is configured.

    >>> browser.getLink('value_index').click()
    >>> print browser.contents
    <...
    ...Interface...zope.size.interfaces.ISized...
    ...Field Name...sizeForSorting...
    ...Field Callable...True...
Keywords: zope3 i18n date time duration catalog index Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Web Environment Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: Zope Public License Classifier: Programming Language :: Python Classifier: Natural Language :: English Classifier: Operating System :: OS Independent Classifier: Topic :: Internet :: WWW/HTTP Classifier: Framework :: Zope3 zc.catalog-1.6/bootstrap.py0000664000177100020040000002443512165324663017142 0ustar menesismenesis00000000000000############################################################################## # # Copyright (c) 2006 Zope Foundation and Contributors. # All Rights Reserved. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. # THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. # ############################################################################## """Bootstrap a buildout-based project Simply run this script in a directory containing a buildout.cfg. The script accepts buildout command-line options, so you can use the -c option to specify an alternate configuration file. """ import os, shutil, sys, tempfile, urllib, urllib2, subprocess from optparse import OptionParser if sys.platform == 'win32': def quote(c): if ' ' in c: return '"%s"' % c # work around spawn lamosity on windows else: return c else: quote = str # See zc.buildout.easy_install._has_broken_dash_S for motivation and comments. 
stdout, stderr = subprocess.Popen( [sys.executable, '-Sc', 'try:\n' ' import ConfigParser\n' 'except ImportError:\n' ' print 1\n' 'else:\n' ' print 0\n'], stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate() has_broken_dash_S = bool(int(stdout.strip())) # In order to be more robust in the face of system Pythons, we want to # run without site-packages loaded. This is somewhat tricky, in # particular because Python 2.6's distutils imports site, so starting # with the -S flag is not sufficient. However, we'll start with that: if not has_broken_dash_S and 'site' in sys.modules: # We will restart with python -S. args = sys.argv[:] args[0:0] = [sys.executable, '-S'] args = map(quote, args) os.execv(sys.executable, args) # Now we are running with -S. We'll get the clean sys.path, import site # because distutils will do it later, and then reset the path and clean # out any namespace packages from site-packages that might have been # loaded by .pth files. clean_path = sys.path[:] import site # imported because of its side effects sys.path[:] = clean_path for k, v in sys.modules.items(): if k in ('setuptools', 'pkg_resources') or ( hasattr(v, '__path__') and len(v.__path__) == 1 and not os.path.exists(os.path.join(v.__path__[0], '__init__.py'))): # This is a namespace package. Remove it. sys.modules.pop(k) is_jython = sys.platform.startswith('java') setuptools_source = 'http://peak.telecommunity.com/dist/ez_setup.py' distribute_source = 'http://python-distribute.org/distribute_setup.py' # parsing arguments def normalize_to_url(option, opt_str, value, parser): if value: if '://' not in value: # It doesn't smell like a URL. value = 'file://%s' % ( urllib.pathname2url( os.path.abspath(os.path.expanduser(value))),) if opt_str == '--download-base' and not value.endswith('/'): # Download base needs a trailing slash to make the world happy. 
value += '/' else: value = None name = opt_str[2:].replace('-', '_') setattr(parser.values, name, value) usage = '''\ [DESIRED PYTHON FOR BUILDOUT] bootstrap.py [options] Bootstraps a buildout-based project. Simply run this script in a directory containing a buildout.cfg, using the Python that you want bin/buildout to use. Note that by using --setup-source and --download-base to point to local resources, you can keep this script from going over the network. ''' parser = OptionParser(usage=usage) parser.add_option("-v", "--version", dest="version", help="use a specific zc.buildout version") parser.add_option("-d", "--distribute", action="store_true", dest="use_distribute", default=False, help="Use Distribute rather than Setuptools.") parser.add_option("--setup-source", action="callback", dest="setup_source", callback=normalize_to_url, nargs=1, type="string", help=("Specify a URL or file location for the setup file. " "If you use Setuptools, this will default to " + setuptools_source + "; if you use Distribute, this " "will default to " + distribute_source + ".")) parser.add_option("--download-base", action="callback", dest="download_base", callback=normalize_to_url, nargs=1, type="string", help=("Specify a URL or directory for downloading " "zc.buildout and either Setuptools or Distribute. " "Defaults to PyPI.")) parser.add_option("--eggs", help=("Specify a directory for storing eggs. Defaults to " "a temporary directory that is deleted when the " "bootstrap script completes.")) parser.add_option("-t", "--accept-buildout-test-releases", dest='accept_buildout_test_releases', action="store_true", default=False, help=("Normally, if you do not specify a --version, the " "bootstrap script and buildout gets the newest " "*final* versions of zc.buildout and its recipes and " "extensions for you. 
If you use this flag, " "bootstrap and buildout will get the newest releases " "even if they are alphas or betas.")) parser.add_option("-c", None, action="store", dest="config_file", help=("Specify the path to the buildout configuration " "file to be used.")) options, args = parser.parse_args() if options.eggs: eggs_dir = os.path.abspath(os.path.expanduser(options.eggs)) else: eggs_dir = tempfile.mkdtemp() if options.setup_source is None: if options.use_distribute: options.setup_source = distribute_source else: options.setup_source = setuptools_source if options.accept_buildout_test_releases: args.insert(0, 'buildout:accept-buildout-test-releases=true') try: import pkg_resources import setuptools # A flag. Sometimes pkg_resources is installed alone. if not hasattr(pkg_resources, '_distribute'): raise ImportError except ImportError: ez_code = urllib2.urlopen( options.setup_source).read().replace('\r\n', '\n') ez = {} exec ez_code in ez setup_args = dict(to_dir=eggs_dir, download_delay=0) if options.download_base: setup_args['download_base'] = options.download_base if options.use_distribute: setup_args['no_fake'] = True if sys.version_info[:2] == (2, 4): setup_args['version'] = '0.6.32' ez['use_setuptools'](**setup_args) if 'pkg_resources' in sys.modules: reload(sys.modules['pkg_resources']) import pkg_resources # This does not (always?) update the default working set. We will # do it. 
for path in sys.path: if path not in pkg_resources.working_set.entries: pkg_resources.working_set.add_entry(path) cmd = [quote(sys.executable), '-c', quote('from setuptools.command.easy_install import main; main()'), '-mqNxd', quote(eggs_dir)] if not has_broken_dash_S: cmd.insert(1, '-S') find_links = options.download_base if not find_links: find_links = os.environ.get('bootstrap-testing-find-links') if not find_links and options.accept_buildout_test_releases: find_links = 'http://downloads.buildout.org/' if find_links: cmd.extend(['-f', quote(find_links)]) if options.use_distribute: setup_requirement = 'distribute' else: setup_requirement = 'setuptools' ws = pkg_resources.working_set setup_requirement_path = ws.find( pkg_resources.Requirement.parse(setup_requirement)).location env = dict( os.environ, PYTHONPATH=setup_requirement_path) requirement = 'zc.buildout' version = options.version if version is None and not options.accept_buildout_test_releases: # Figure out the most recent final version of zc.buildout. 
import setuptools.package_index _final_parts = '*final-', '*final' def _final_version(parsed_version): for part in parsed_version: if (part[:1] == '*') and (part not in _final_parts): return False return True index = setuptools.package_index.PackageIndex( search_path=[setup_requirement_path]) if find_links: index.add_find_links((find_links,)) req = pkg_resources.Requirement.parse(requirement) if index.obtain(req) is not None: best = [] bestv = None for dist in index[req.project_name]: distv = dist.parsed_version if distv >= pkg_resources.parse_version('2dev'): continue if _final_version(distv): if bestv is None or distv > bestv: best = [dist] bestv = distv elif distv == bestv: best.append(dist) if best: best.sort() version = best[-1].version if version: requirement += '=='+version else: requirement += '<2dev' cmd.append(requirement) if is_jython: import subprocess exitcode = subprocess.Popen(cmd, env=env).wait() else: # Windows prefers this, apparently; otherwise we would prefer subprocess exitcode = os.spawnle(*([os.P_WAIT, sys.executable] + cmd + [env])) if exitcode != 0: sys.stdout.flush() sys.stderr.flush() print ("An error occurred when trying to install zc.buildout. 
" "Look above this message for any errors that " "were output by easy_install.") sys.exit(exitcode) ws.add_entry(eggs_dir) ws.require(requirement) import zc.buildout.buildout # If there isn't already a command in the args, add bootstrap if not [a for a in args if '=' not in a]: args.append('bootstrap') # if -c was provided, we push it back into args for buildout's main function if options.config_file is not None: args[0:0] = ['-c', options.config_file] zc.buildout.buildout.main(args) if not options.eggs: # clean up temporary egg directory shutil.rmtree(eggs_dir) zc.catalog-1.6/setup.cfg0000664000177100020040000000007312165325611016356 0ustar menesismenesis00000000000000[egg_info] tag_build = tag_date = 0 tag_svn_revision = 0 zc.catalog-1.6/CHANGES.txt0000664000177100020040000001244712165324663016364 0ustar menesismenesis00000000000000======= CHANGES ======= 1.6 (2013-07-04) ---------------- - Using Python's ``doctest`` module instead of deprecated ``zope.testing.doctest``. - Move ``zope.intid`` to dependencies. 1.5.1 (2012-01-20) ------------------ - Fix the extent catalog's `searchResults` method to work when using a local uid source. - Replaced a testing dependency on ``zope.app.authentication`` with ``zope.password``. - Removed ``zope.app.server`` test dependency. 1.5 (2010-10-19) ---------------- - The package's ``configure.zcml`` does not include the browser subpackage's ``configure.zcml`` anymore. This, together with ``browser`` and ``test_browser`` ``extras_require``, decouples the browser view registrations from the main code. As a result projects that do not need the ZMI views to be registered are not pulling in the zope.app.* dependencies anymore. To enable the ZMI views for your project, you will have to do two things: * list ``zc.catalog [browser]`` as a ``install_requires``. * have your project's ``configure.zcml`` include the ``zc.catalog.browser`` subpackage. 
- Only include the browser tests whenever the dependencies for the browser tests are available. - Python 2.7 test fix. 1.4.5 (2010-10-05) ------------------ - Remove implicit test dependency on zope.app.dublincore, which was not needed in the first place. 1.4.4 (2010-07-06) ------------------ * Fixed test failure happening with more recent ``mechanize`` (>=2.0). 1.4.3 (2010-03-09) ------------------ * Try to import the stemmer from the zopyx.txng3.ext package first, which as of 3.3.2 contains stability and memory leak fixes. 1.4.2 (2010-01-20) ------------------ * Fix missing testing dependencies when using ZTK by adding zope.login. 1.4.1 (2009-02-27) ------------------ * Add FieldIndex-like sorting support for the ValueIndex. * Add sorting indexes support for the NormalizationWrapper. 1.4.0 (2009-02-07) ------------------ Bugs fixed ~~~~~~~~~~ * Fixed a typo in ValueIndex addform and addMenuItem. * Use ``zope.container`` instead of ``zope.app.container``. * Use ``zope.keyreference`` instead of ``zope.app.keyreference``. * Use ``zope.intid`` instead of ``zope.app.intid``. * Use ``zope.catalog`` instead of ``zope.app.catalog``. 1.3.0 (2008-09-10) ------------------ Features added ~~~~~~~~~~~~~~ * Added hook point to allow extent catalog to be used with local UID sources. 1.2.0 (2007-11-03) ------------------ Features added ~~~~~~~~~~~~~~ * Updated package meta-data. * zc.catalog now can use 64-bit BTrees ("L") as provided by ZODB 3.8. * Albertas Agejavas (alga@pov.lt) included the new CallableWrapper, for when the typical Zope 3 index-by-adapter story (zope.app.catalog.attribute) is unnecessary trouble, and you just want to use a callable. See callablewrapper.txt. This can also be used for other indexes based on the zope.index interfaces. * Extents now have a __len__. 
The current implementation defers to the standard BTree len implementation, and shares its performance characteristics: it needs to wake up all of the buckets, but if all of the buckets are awake it is a fairly quick operation. * A simple ISelfPopulatingExtent was added to the extentcatalog module for which populating is a no-op. This is directly useful for catalogs that are used as implementation details of a component, in which objects are indexed explicitly by your own calls rather than by the usual subscribers. It is also potentially slightly useful as a base for other self-populating extents. 1.1.1 (2007-3-17) ----------------- Bugs fixed ~~~~~~~~~~ 'all_of' would return all results when one of the values had no results. Reported, with test and fix provided, by Nando Quintana. 1.1 (2007-01-06) ---------------- Features removed ~~~~~~~~~~~~~~~~ The queueing of events in the extent catalog has been entirely removed. Subtransactions caused significant problems for the code introduced in 1.0. Other solutions also have significant problems, and the win of this kind of queueing is questionable. Here is a rundown of the approaches rejected for getting the queueing to work: * _p_invalidate (used in 1.0). Not really designed for use within a transaction, and reverts to last savepoint, rather than the beginning of the transaction. Could monkeypatch savepoints to iterate over precommit transaction hooks but that just smells too bad. * _p_resolveConflict. Requires application software to exist in ZEO and even ZRS installations, which is counter to our software deployment goals. Also causes useless repeated writes of empty queue to database, but that's not the showstopper. * vague hand-wavy ideas for separate storages or transaction managers for the queue. Never panned out in discussion. 1.0 (2007-01-05) ---------------- Bugs fixed ~~~~~~~~~~ * adjusted extentcatalog tests to trigger (and discuss and test) the queueing behavior. 
* fixed problem with excessive conflict errors due to queueing code. * updated stemming to work with newest version of TextIndexNG's extensions. * omitted stemming test when TextIndexNG's extensions are unavailable, so tests pass without it. Since TextIndexNG's extensions are optional, this seems reasonable. * removed use of zapi in extentcatalog. 0.2 (2006-11-22) ---------------- Features added ~~~~~~~~~~~~~~ * First release on Cheeseshop. zc.catalog-1.6/MANIFEST.in0000664000177100020040000000016312165324663016301 0ustar menesismenesis00000000000000 include *.py include *.txt include buildout.cfg recursive-include src *.txt recursive-include src *.zcml zc.catalog-1.6/LICENSE.txt0000664000177100020040000000402612165324663016370 0ustar menesismenesis00000000000000Zope Public License (ZPL) Version 2.1 A copyright notice accompanies this license document that identifies the copyright holders. This license has been certified as open source. It has also been designated as GPL compatible by the Free Software Foundation (FSF). Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions in source code must retain the accompanying copyright notice, this list of conditions, and the following disclaimer. 2. Redistributions in binary form must reproduce the accompanying copyright notice, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Names of the copyright holders must not be used to endorse or promote products derived from this software without prior written permission from the copyright holders. 4. The right to distribute this software or to use it for any purpose does not give you the right to use Servicemarks (sm) or Trademarks (tm) of the copyright holders. Use of them is covered by separate agreement with the copyright holders. 5. 
If any files are modified, you must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. Disclaimer THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.