zope.mimetype-1.3.1/
zope.mimetype-1.3.1/CHANGES.txt
======= CHANGES ======= 1.3.1 (2010-11-10) ------------------ - No longer depending on `zope.app.form` in `configure.zcml` by using `zope.formlib` instead, where the needed interfaces are living now. 1.3.0 (2010-06-26) ------------------ - Added testing dependency on ``zope.component [test]``. - Use zope.formlib instead of zope.app.form.browser for select widget. - Conform to repository policy. 1.2.0 (2009-12-26) ------------------ - Converted functional tests to unit tests and get rid of all extra test dependencies as a result. - Use the ITerms interface from zope.browser. - Declared missing dependencies, resolved direct dependency on zope.app.publisher. - Import content-type parser from zope.contenttype, adding a dependency on that package. 1.1.2 (2009-05-22) ------------------ - No longer depends on ``zope.app.component``. 1.1.1 (2009-04-03) ------------------ - Fixed wrong package version (version ``1.1.0`` was released as ``0.4.0`` at `pypi` but as ``1.1dev`` at `download.zope.org/distribution`) - Fixed author email and home page address. 1.1.0 (2007-11-01) ------------------ - Package data update. - First public release. 1.0.0 (2007-??-??) ------------------ - Initial release.
zope.mimetype-1.3.1/COPYRIGHT.txt
Zope Foundation and Contributors
zope.mimetype-1.3.1/LICENSE.txt
Zope Public License (ZPL) Version 2.1 A copyright notice accompanies this license document that identifies the copyright holders. This license has been certified as open source. It has also been designated as GPL compatible by the Free Software Foundation (FSF). 
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions in source code must retain the accompanying copyright notice, this list of conditions, and the following disclaimer. 2. Redistributions in binary form must reproduce the accompanying copyright notice, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Names of the copyright holders must not be used to endorse or promote products derived from this software without prior written permission from the copyright holders. 4. The right to distribute this software or to use it for any purpose does not give you the right to use Servicemarks (sm) or Trademarks (tm) of the copyright holders. Use of them is covered by separate agreement with the copyright holders. 5. If any files are modified, you must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. Disclaimer THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
zope.mimetype-1.3.1/PKG-INFO
Metadata-Version: 1.0 Name: zope.mimetype Version: 1.3.1 Summary: A simple package for working with MIME content types Home-page: http://pypi.python.org/pypi/zope.mimetype Author: Zope Foundation and Contributors Author-email: zope-dev@zope.org License: ZPL 2.1 Description: This package provides a way to work with MIME content types. There are several interfaces defined here, many of which are used primarily to look things up based on different bits of information. .. contents:: ============================ The Zope MIME Infrastructure ============================ This package provides a way to work with MIME content types. There are several interfaces defined here, many of which are used primarily to look things up based on different bits of information. The basic idea behind this is that content objects should provide an interface based on the actual content type they implement. For example, objects that represent text/xml or application/xml documents should be marked with the `IContentTypeXml` interface. This can allow additional views to be registered based on the content type, or subscribers may be registered to perform other actions based on the content type. One aspect of the content type that's important for all documents is that the content type interface determines whether the object data is interpreted as an encoded text document. Encoded text documents, in particular, can be decoded to obtain a single Unicode string. The content type interfaces for encoded text must derive from `IContentTypeEncoded`. (All content type interfaces derive from `IContentType` and directly provide `IContentTypeInterface`.) The default configuration provides direct support for a variety of common document types found in office environments. 
Supported lookups ----------------- Several different queries are supported by this package: - Given a MIME type expressed as a string, the associated interface, if any, can be retrieved using:: # `mimeType` is the MIME type as a string interface = queryUtility(IContentTypeInterface, mimeType) - Given a charset name, the associated `ICodec` instance can be retrieved using:: # `charsetName` is the charset name as a string codec = queryUtility(ICharsetCodec, charsetName) - Given a codec, the preferred charset name can be retrieved using:: # `codec` is an `ICodec` instance: charsetName = getUtility(ICodecPreferredCharset, codec.name).name - Given any combination of a suggested file name, file data, and content type header, a guess at a reasonable MIME type can be made using:: # `filename` is a suggested file name, or None # `data` is uploaded data, or None # `content_type` is a Content-Type header value, or None # mimeType = getUtility(IMimeTypeGetter)( name=filename, data=data, content_type=content_type) - Given any combination of a suggested file name, file data, and content type header, a guess at a reasonable charset name can be made using:: # `filename` is a suggested file name, or None # `data` is uploaded data, or None # `content_type` is a Content-Type header value, or None # charsetName = getUtility(ICharsetGetter)( name=filename, data=data, content_type=content_type) =================================== Retrieving Content Type Information =================================== MIME Types ---------- We'll start by initializing the interfaces and registrations for the content type interfaces. This is normally done via ZCML. >>> from zope.mimetype import types >>> types.setup() A utility is used to retrieve MIME types. >>> from zope import component >>> from zope.mimetype import typegetter >>> from zope.mimetype.interfaces import IMimeTypeGetter >>> component.provideUtility(typegetter.smartMimeTypeGuesser, ... 
provides=IMimeTypeGetter) >>> mime_getter = component.getUtility(IMimeTypeGetter) The getter maps a particular file name, file contents, and content type to a MIME type. >>> mime_getter(name='file.txt', data='A text file.', ... content_type='text/plain') 'text/plain' In the default implementation, if not enough information is given to discern a MIME type, None is returned. >>> mime_getter() is None True Character Sets -------------- A utility is also used to retrieve character sets (charsets). >>> from zope.mimetype.interfaces import ICharsetGetter >>> component.provideUtility(typegetter.charsetGetter, ... provides=ICharsetGetter) >>> charset_getter = component.getUtility(ICharsetGetter) The getter maps a particular file name, file contents, and content type to a charset. >>> charset_getter(name='file.txt', data='This is a text file.', ... content_type='text/plain;charset=ascii') 'ascii' In the default implementation, if not enough information is given to discern a charset, None is returned. >>> charset_getter() is None True Finding Interfaces ------------------ Given a MIME type, we need to be able to find the appropriate interface. >>> from zope.mimetype.interfaces import IContentTypeInterface >>> component.getUtility(IContentTypeInterface, name=u'text/plain') It is also possible to enumerate all content type interfaces. >>> utilities = list(component.getUtilitiesFor(IContentTypeInterface)) If you want to find an interface from a MIME string, you can use the utilities. >>> component.getUtility(IContentTypeInterface, name='text/plain') ============== Codec handling ============== We can create codecs programmatically. Codecs are registered as utilities for ICodec with the name of their Python codec. 
>>> from zope import component >>> from zope.mimetype.interfaces import ICodec >>> from zope.mimetype.codec import addCodec >>> sorted(component.getUtilitiesFor(ICodec)) [] >>> addCodec('iso8859-1', 'Western (ISO-8859-1)') >>> codec = component.getUtility(ICodec, name='iso8859-1') >>> codec >>> codec.name 'iso8859-1' >>> addCodec('utf-8', 'Unicode (UTF-8)') >>> codec2 = component.getUtility(ICodec, name='utf-8') We can programmatically add charsets to a given codec. This registers each charset as a named utility for ICharset. It also registers the codec as a utility for ICharsetCodec with the name of the charset. >>> from zope.mimetype.codec import addCharset >>> from zope.mimetype.interfaces import ICharset, ICharsetCodec >>> sorted(component.getUtilitiesFor(ICharset)) [] >>> sorted(component.getUtilitiesFor(ICharsetCodec)) [] >>> addCharset(codec.name, 'latin1') >>> charset = component.getUtility(ICharset, name='latin1') >>> charset >>> charset.name 'latin1' >>> component.getUtility(ICharsetCodec, name='latin1') is codec True When adding a charset we can state that we want that charset to be the preferred charset for its codec. >>> addCharset(codec.name, 'iso8859-1', preferred=True) >>> addCharset(codec2.name, 'utf-8', preferred=True) A codec can have at most one preferred charset. >>> addCharset(codec.name, 'test', preferred=True) Traceback (most recent call last): ... ValueError: Codec already has a preferred charset. Preferred charsets are registered as utilities for ICodecPreferredCharset under the name of the python codec. 
>>> from zope.mimetype.interfaces import ICodecPreferredCharset >>> preferred = component.getUtility(ICodecPreferredCharset, name='iso8859-1') >>> preferred >>> preferred.name 'iso8859-1' >>> sorted(component.getUtilitiesFor(ICodecPreferredCharset)) [(u'iso8859-1', ), (u'utf-8', )] We can look up a codec by the name of its charset: >>> component.getUtility(ICharsetCodec, name='latin1') is codec True >>> component.getUtility(ICharsetCodec, name='utf-8') is codec2 True Or we can look up all codecs: >>> sorted(component.getUtilitiesFor(ICharsetCodec)) [(u'iso8859-1', ), (u'latin1', ), (u'test', ), (u'utf-8', )] =================================== Constraint Functions for Interfaces =================================== The `zope.mimetype.interfaces` module defines interfaces that use some helper functions to define constraints on the accepted data. These helpers are used to determine whether values conform to what's allowed for parts of a MIME type specification and other parts of a Content-Type header as specified in RFC 2045. Single Token ------------ The first is the simplest: the `tokenConstraint()` function returns `True` if the ASCII string it is passed conforms to the `token` production in section 5.1 of the RFC. Let's import the function:: >>> from zope.mimetype.interfaces import tokenConstraint Typical tokens are the major and minor parts of the MIME type and the parameter names for the Content-Type header. 
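As a rough illustration of the `token` production, the check can be approximated with a regular expression. This is a minimal sketch and not the package's implementation: the name `looks_like_token` and the pattern are assumptions made for this example, and the lower-case-only rule reflects the normalization this document expects from input handlers rather than RFC 2045 itself.

```python
import re

# Hypothetical helper for illustration; it is not part of zope.mimetype.
# RFC 2045 section 5.1: a token is one or more ASCII characters other than
# SPACE, control characters, and the tspecials: ( ) < > @ , ; : \ " / [ ] ? =
# Upper-case letters are left out as well, assuming that input has already
# been normalized to lower case.
_TOKEN_RE = re.compile(r"[!#$%&'*+\-.0-9^_`a-z{|}~]+")


def looks_like_token(value):
    """Return True if `value` is a non-empty, lower-case RFC 2045 token."""
    return _TOKEN_RE.fullmatch(value) is not None


print(looks_like_token("vnd.zope.special-data"))  # True
print(looks_like_token("text/plain"))             # False: "/" is a tspecial
```

Using `fullmatch()` rather than `match()` means leading or trailing whitespace, including a trailing newline, is rejected. The doctests that follow exercise the package's own `tokenConstraint()`.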
The function should return `True` for these values:: >>> tokenConstraint("text") True >>> tokenConstraint("plain") True >>> tokenConstraint("charset") True The function should also return `True` for unusual but otherwise normal tokens that may be used in some situations:: >>> tokenConstraint("not-your-fathers-token") True It must also allow extension tokens and vendor-specific tokens:: >>> tokenConstraint("x-magic") True >>> tokenConstraint("vnd.zope.special-data") True Since we expect input handlers to normalize values to lower case, upper case text is not allowed:: >>> tokenConstraint("Text") False Non-ASCII text is also not allowed:: >>> tokenConstraint("\x80") False >>> tokenConstraint("\xC8") False >>> tokenConstraint("\xFF") False Note that lots of characters are allowed in tokens, and there are no constraints that the token "look like" something a person would want to read:: >>> tokenConstraint(".-.-.-.") True Other characters are disallowed, however, including all forms of whitespace:: >>> tokenConstraint("foo bar") False >>> tokenConstraint("foo\tbar") False >>> tokenConstraint("foo\nbar") False >>> tokenConstraint("foo\rbar") False >>> tokenConstraint("foo\x7Fbar") False Whitespace before or after the token is not accepted either:: >>> tokenConstraint(" text") False >>> tokenConstraint("plain ") False Other disallowed characters are defined in the `tspecials` production from the RFC (also in section 5.1):: >>> tokenConstraint("(") False >>> tokenConstraint(")") False >>> tokenConstraint("<") False >>> tokenConstraint(">") False >>> tokenConstraint("@") False >>> tokenConstraint(",") False >>> tokenConstraint(";") False >>> tokenConstraint(":") False >>> tokenConstraint("\\") False >>> tokenConstraint('"') False >>> tokenConstraint("/") False >>> tokenConstraint("[") False >>> tokenConstraint("]") False >>> tokenConstraint("?") False >>> tokenConstraint("=") False A token must contain at least one character, so `tokenConstraint()` returns false for an empty 
string:: >>> tokenConstraint("") False MIME Type --------- A MIME type is specified using two tokens separated by a slash; whitespace between the tokens and the slash must be normalized away in the input handler. The `mimeTypeConstraint()` function is available to test a normalized MIME type value; let's import that function now:: >>> from zope.mimetype.interfaces import mimeTypeConstraint Let's test some common MIME types to make sure the function isn't obviously insane:: >>> mimeTypeConstraint("text/plain") True >>> mimeTypeConstraint("application/xml") True >>> mimeTypeConstraint("image/svg+xml") True If parts of the MIME type are missing, it isn't accepted:: >>> mimeTypeConstraint("text") False >>> mimeTypeConstraint("text/") False >>> mimeTypeConstraint("/plain") False As for individual tokens, whitespace is not allowed:: >>> mimeTypeConstraint("foo bar/plain") False >>> mimeTypeConstraint("text/foo bar") False Whitespace is not accepted around the slash either:: >>> mimeTypeConstraint("text /plain") False >>> mimeTypeConstraint("text/ plain") False Surrounding whitespace is also not accepted:: >>> mimeTypeConstraint(" text/plain") False >>> mimeTypeConstraint("text/plain ") False =================================== Minimal IContentInfo Implementation =================================== The `zope.mimetype.contentinfo` module provides a minimal `IContentInfo` implementation that adds no information to what's provided by a content object. This represents the most conservative content-type policy that might be useful. Let's take a look at how this operates by creating a couple of concrete content-type interfaces:: >>> from zope.mimetype import interfaces >>> class ITextPlain(interfaces.IContentTypeEncoded): ... """text/plain""" >>> class IApplicationOctetStream(interfaces.IContentType): ... """application/octet-stream""" Now, we'll create a minimal content object that provides the necessary information:: >>> import zope.interface >>> class Content(object): ... 
zope.interface.implements(interfaces.IContentTypeAware) ... ... def __init__(self, mimeType, charset=None): ... self.mimeType = mimeType ... self.parameters = {} ... if charset: ... self.parameters["charset"] = charset We can now create examples of both encoded and non-encoded content:: >>> encoded = Content("text/plain", "utf-8") >>> zope.interface.alsoProvides(encoded, ITextPlain) >>> unencoded = Content("application/octet-stream") >>> zope.interface.alsoProvides(unencoded, IApplicationOctetStream) The minimal IContentInfo implementation only exposes the information available to it from the base content object. Let's take a look at the unencoded content first:: >>> from zope.mimetype import contentinfo >>> ci = contentinfo.ContentInfo(unencoded) >>> ci.effectiveMimeType 'application/octet-stream' >>> ci.effectiveParameters {} >>> ci.contentType 'application/octet-stream' For unencoded content, there is never a codec:: >>> print ci.getCodec() None It is also disallowed to try decoding such content:: >>> ci.decode("foo") Traceback (most recent call last): ... ValueError: no matching codec found Attempting to decode data using an unencoded object causes an exception to be raised:: >>> print ci.decode("data") Traceback (most recent call last): ... ValueError: no matching codec found If we try this with encoded data, we get somewhat different behavior:: >>> ci = contentinfo.ContentInfo(encoded) >>> ci.effectiveMimeType 'text/plain' >>> ci.effectiveParameters {'charset': 'utf-8'} >>> ci.contentType 'text/plain;charset=utf-8' The `getCodec()` and `decode()` methods can be used to handle encoded data using the encoding indicated by the ``charset`` parameter. Let's store some UTF-8 data in a variable:: >>> utf8_data = unicode("\xAB\xBB", "iso-8859-1").encode("utf-8") >>> utf8_data '\xc2\xab\xc2\xbb' We want to be able to decode the data using the `IContentInfo` object. 
Let's try getting the corresponding `ICodec` object using `getCodec()`:: >>> codec = ci.getCodec() Traceback (most recent call last): ... ValueError: unsupported charset: 'utf-8' So, we can't proceed without some further preparation. What we need is to register an `ICharset` for UTF-8. The `ICharset` will need a reference (by name) to an `ICodec` for UTF-8. So let's create those objects and register them:: >>> import codecs >>> from zope.mimetype.i18n import _ >>> class Utf8Codec(object): ... zope.interface.implements(interfaces.ICodec) ... ... name = "utf-8" ... title = _("UTF-8") ... ... def __init__(self): ... ( self.encode, ... self.decode, ... self.reader, ... self.writer ... ) = codecs.lookup(self.name) >>> utf8_codec = Utf8Codec() >>> class Utf8Charset(object): ... zope.interface.implements(interfaces.ICharset) ... ... name = utf8_codec.name ... encoding = name >>> utf8_charset = Utf8Charset() >>> import zope.component >>> zope.component.provideUtility( ... utf8_codec, interfaces.ICodec, utf8_codec.name) >>> zope.component.provideUtility( ... utf8_charset, interfaces.ICharset, utf8_charset.name) Now that that's been initialized, let's try getting the codec again:: >>> codec = ci.getCodec() >>> codec.name 'utf-8' >>> codec.decode(utf8_data) (u'\xab\xbb', 4) We can now check that the `decode()` method of the `IContentInfo` will decode the entire data, returning the Unicode representation of the text:: >>> ci.decode(utf8_data) u'\xab\xbb' Another possibility, of course, is that you have content that you know is encoded text of some sort, but you don't actually know what encoding it's in:: >>> encoded2 = Content("text/plain") >>> zope.interface.alsoProvides(encoded2, ITextPlain) >>> ci = contentinfo.ContentInfo(encoded2) >>> ci.effectiveMimeType 'text/plain' >>> ci.effectiveParameters {} >>> ci.contentType 'text/plain' >>> ci.getCodec() Traceback (most recent call last): ... 
ValueError: charset not known It's also possible that the initial content type information for an object is incorrect for some reason. If the browser provides a content type of "text/plain; charset=utf-8", the content will be seen as encoded. A user correcting this content type using UI elements can cause the content to be considered un-encoded. At this point, there should no longer be a charset parameter to the content type, and the content info object should reflect this, though the previous encoding information will be retained in case the content type should be changed to an encoded type in the future. Let's see how this behavior will be exhibited in this API. We'll start by creating some encoded content:: >>> content = Content("text/plain", "utf-8") >>> zope.interface.alsoProvides(content, ITextPlain) We can see that the encoding information is included in the effective MIME type information provided by the content-info object:: >>> ci = contentinfo.ContentInfo(content) >>> ci.effectiveMimeType 'text/plain' >>> ci.effectiveParameters {'charset': 'utf-8'} We now change the content type information for the object:: >>> ifaces = zope.interface.directlyProvidedBy(content) >>> ifaces -= ITextPlain >>> ifaces += IApplicationOctetStream >>> zope.interface.directlyProvides(content, *ifaces) >>> content.mimeType = 'application/octet-stream' At this point, a content type object would provide different information:: >>> ci = contentinfo.ContentInfo(content) >>> ci.effectiveMimeType 'application/octet-stream' >>> ci.effectiveParameters {} The underlying content type parameters still contain the original encoding information, however:: >>> content.parameters {'charset': 'utf-8'} =============================== Events and content-type changes =============================== The `IContentTypeChangedEvent` is fired whenever an object's `IContentTypeInterface` is changed. 
This includes the cases when a content type interface is applied to an object that doesn't have one, and when the content type interface is removed from an object. Let's start the demonstration by defining a subscriber for the event that simply prints out the information from the event object:: >>> def handler(event): ... print "changed content type interface:" ... print " from:", event.oldContentType ... print " to:", event.newContentType We'll also define a simple content object:: >>> import zope.interface >>> class IContent(zope.interface.Interface): ... pass >>> class Content(object): ... ... zope.interface.implements(IContent) ... ... def __str__(self): ... return "" >>> obj = Content() We'll also need a couple of content type interfaces:: >>> from zope.mimetype import interfaces >>> class ITextPlain(interfaces.IContentTypeEncoded): ... """text/plain""" >>> ITextPlain.setTaggedValue("mimeTypes", ["text/plain"]) >>> ITextPlain.setTaggedValue("extensions", [".txt"]) >>> zope.interface.directlyProvides( ... ITextPlain, interfaces.IContentTypeInterface) >>> class IOctetStream(interfaces.IContentType): ... """application/octet-stream""" >>> IOctetStream.setTaggedValue("mimeTypes", ["application/octet-stream"]) >>> IOctetStream.setTaggedValue("extensions", [".bin"]) >>> zope.interface.directlyProvides( ... IOctetStream, interfaces.IContentTypeInterface) Let's register our subscriber:: >>> import zope.component >>> import zope.component.interfaces >>> zope.component.provideHandler( ... handler, ... (zope.component.interfaces.IObjectEvent,)) Changing the content type interface on an object is handled by the `zope.mimetype.event.changeContentType()` function. 
Let's import that module and demonstrate that the expected event is fired appropriately:: >>> from zope.mimetype import event Since the object currently has no content type interface, "removing" the interface does not affect the object and the event is not fired:: >>> event.changeContentType(obj, None) Setting a content type interface on an object that doesn't have one will cause the event to be fired, with the `.oldContentType` attribute on the event set to `None`:: >>> event.changeContentType(obj, ITextPlain) changed content type interface: from: None to: Calling the `changeContentType()` function again with the same "new" content type interface causes no change, so the event is not fired again:: >>> event.changeContentType(obj, ITextPlain) Providing a new interface does cause the event to be fired again:: >>> event.changeContentType(obj, IOctetStream) changed content type interface: from: to: Similarly, removing the content type interface triggers the event as well:: >>> event.changeContentType(obj, None) changed content type interface: from: to: None ====================================== MIME type and character set extraction ====================================== The `zope.mimetype.typegetter` module provides a selection of MIME type extractors and charset extractors. These may be used to determine what the MIME type and character set for uploaded data should be. These two interfaces represent the site policy regarding interpreting upload data in the face of missing or inaccurate input. Let's go ahead and import the module:: >>> from zope.mimetype import typegetter MIME types ---------- There are a number of interesting MIME-type extractors: `mimeTypeGetter()` A minimal extractor that never attempts to guess. `mimeTypeGuesser()` An extractor that tries to guess the content type based on the name and data if the input contains no content type information. 
`smartMimeTypeGuesser()` An extractor that checks the content for a variety of constructs to try and refine the results of the `mimeTypeGuesser()`. This is able to do things like check for XHTML that's labelled as HTML in upload data. `mimeTypeGetter()` ~~~~~~~~~~~~~~~~~~ We'll start with the simplest, which does no content-based guessing at all, but uses the information provided by the browser directly. If the browser did not provide any content-type information, or if it cannot be parsed, the extractor simply asserts a "safe" MIME type of application/octet-stream. (The rationale for selecting this type is that since there's really nothing productive that can be done with it other than download it, it's impossible to mis-interpret the data.) When there's no information at all about the content, the extractor returns None:: >>> print typegetter.mimeTypeGetter() None Providing only the upload filename or data, or both, still produces None, since no guessing is being done:: >>> print typegetter.mimeTypeGetter(name="file.html") None >>> print typegetter.mimeTypeGetter(data="...") None >>> print typegetter.mimeTypeGetter( ... name="file.html", data="...") None If a content type header is available for the input, that is used since that represents explicit input from outside the application server. The major and minor parts of the content type are extracted and returned as a single string:: >>> typegetter.mimeTypeGetter(content_type="text/plain") 'text/plain' >>> typegetter.mimeTypeGetter(content_type="text/plain; charset=utf-8") 'text/plain' If the content-type information is provided but malformed (not in conformance with RFC 2822), it is ignored, since the intent cannot be reliably guessed:: >>> print typegetter.mimeTypeGetter(content_type="foo bar") None This combines with ignoring the other values that may be provided as expected:: >>> print typegetter.mimeTypeGetter( ... 
name="file.html", data="...", content_type="foo bar") None `mimeTypeGuesser()` ~~~~~~~~~~~~~~~~~~~ A more elaborate extractor that tries to work around completely missing information can be found as the `mimeTypeGuesser()` function. This function will only guess if there is no usable content type information in the input. This extractor can be thought of as having the following pseudo-code:: def mimeTypeGuesser(name=None, data=None, content_type=None): type = mimeTypeGetter(name=name, data=data, content_type=content_type) if type is None: type = guess the content type return type Let's see how this affects the results we saw earlier. When there's no input to use, we still get None:: >>> print typegetter.mimeTypeGuesser() None Providing only the upload filename or data, or both, now produces a non-None guess for common content types:: >>> typegetter.mimeTypeGuesser(name="file.html") 'text/html' >>> typegetter.mimeTypeGuesser(data="...") 'text/html' >>> typegetter.mimeTypeGuesser(name="file.html", data="...") 'text/html' Note that if the filename and data provided separately produce different MIME types, the result of providing both will be one of those types, but which is unspecified:: >>> mt_1 = typegetter.mimeTypeGuesser(name="file.html") >>> mt_1 'text/html' >>> mt_2 = typegetter.mimeTypeGuesser(data="...") >>> mt_2 'text/xml' >>> mt = typegetter.mimeTypeGuesser( ... 
data="...", name="file.html") >>> mt in (mt_1, mt_2) True If a content type header is available for the input, that is used in the same way as for the `mimeTypeGetter()` function:: >>> typegetter.mimeTypeGuesser(content_type="text/plain") 'text/plain' >>> typegetter.mimeTypeGuesser(content_type="text/plain; charset=utf-8") 'text/plain' If the content-type information is provided but malformed, it is ignored:: >>> print typegetter.mimeTypeGuesser(content_type="foo bar") None When combined with values for the filename or content data, those are still used to provide reasonable guesses for the content type:: >>> typegetter.mimeTypeGuesser(name="file.html", content_type="foo bar") 'text/html' >>> typegetter.mimeTypeGuesser( ... data="...", content_type="foo bar") 'text/html' Information from a parsable content-type is still used even if a guess from the data or filename would provide a different or more-refined result:: >>> typegetter.mimeTypeGuesser( ... data="GIF89a...", content_type="application/octet-stream") 'application/octet-stream' `smartMimeTypeGuesser()` ~~~~~~~~~~~~~~~~~~~~~~~~ The `smartMimeTypeGuesser()` function applies more knowledge to the process of determining the MIME-type to use. Essentially, it takes the result of the `mimeTypeGuesser()` function and attempts to refine the content-type based on various heuristics. We still see the basic behavior that no input produces None:: >>> print typegetter.smartMimeTypeGuesser() None An unparsable content-type is still ignored:: >>> print typegetter.smartMimeTypeGuesser(content_type="foo bar") None The interpretation of uploaded data will be different in at least some interesting cases. For instance, the `mimeTypeGuesser()` function provides these results for some XHTML input data:: >>> typegetter.mimeTypeGuesser( ... data="...", ... name="file.html") 'text/html' The smart extractor is able to refine this into more usable data:: >>> typegetter.smartMimeTypeGuesser( ... data="...", ... 
name="file.html") 'application/xhtml+xml' In this case, the smart extractor has refined the information determined from the filename using information from the uploaded data. The specific approach taken by the extractor is not part of the interface, however. `charsetGetter()` ~~~~~~~~~~~~~~~~~ If you're interested in the character set of textual data, you can use the `charsetGetter` function (which can also be registered as the `ICharsetGetter` utility): The simplest case is when the character set is already specified in the content type. >>> typegetter.charsetGetter(content_type='text/plain; charset=mambo-42') 'mambo-42' Note that the charset name is lowercased, because all the default ICharset and ICharsetCodec utilities are registered for lowercase names. >>> typegetter.charsetGetter(content_type='text/plain; charset=UTF-8') 'utf-8' If it isn't, `charsetGetter` can try to guess by looking at actual data >>> typegetter.charsetGetter(content_type='text/plain', data='just text') 'ascii' >>> typegetter.charsetGetter(content_type='text/plain', data='\xe2\x98\xba') 'utf-8' >>> import codecs >>> typegetter.charsetGetter(data=codecs.BOM_UTF16_BE + '\x12\x34') 'utf-16be' >>> typegetter.charsetGetter(data=codecs.BOM_UTF16_LE + '\x12\x34') 'utf-16le' If the character set cannot be determined, `charsetGetter` returns None. >>> typegetter.charsetGetter(content_type='text/plain', data='\xff') >>> typegetter.charsetGetter() =============================== Source for MIME type interfaces =============================== Some sample interfaces have been created in the zope.mimetype.tests module for use in this test. Let's import them:: >>> from zope.mimetype.tests import ( ... ISampleContentTypeOne, ISampleContentTypeTwo) The source should only include `IContentTypeInterface` interfaces that have been registered. 
Let's register one of these two interfaces so we can test this:: >>> import zope.component >>> from zope.mimetype.interfaces import IContentTypeInterface >>> zope.component.provideUtility( ... ISampleContentTypeOne, IContentTypeInterface, name="type/one") >>> zope.component.provideUtility( ... ISampleContentTypeOne, IContentTypeInterface, name="type/two") We should see that these interfaces are included in the source:: >>> from zope.mimetype import source >>> s = source.ContentTypeSource() >>> ISampleContentTypeOne in s True >>> ISampleContentTypeTwo in s False Interfaces that do not implement the `IContentTypeInterface` are not included in the source:: >>> import zope.interface >>> class ISomethingElse(zope.interface.Interface): ... """This isn't a content type interface.""" >>> ISomethingElse in s False The source is iterable, so we can get a list of the values:: >>> values = list(s) >>> len(values) 1 >>> values[0] is ISampleContentTypeOne True We can get terms for the allowed values:: >>> terms = source.ContentTypeTerms(s, None) >>> t = terms.getTerm(ISampleContentTypeOne) >>> terms.getValue(t.token) is ISampleContentTypeOne True Interfaces that are not in the source cause an error when a term is requested:: >>> terms.getTerm(ISomethingElse) Traceback (most recent call last): ... LookupError: value is not an element in the source The term provides a token based on the module name of the interface:: >>> t.token 'zope.mimetype.tests.ISampleContentTypeOne' The term also provides the title based on the "title" tagged value from the interface:: >>> t.title u'Type One' Each interface provides a list of MIME types with which the interface is associated. 
The term object provides access to this list:: >>> t.mimeTypes ['type/one', 'type/foo'] A list of common extensions for files of this type is also available, though it may be empty:: >>> t.extensions [] The term's value, of course, is the interface passed in:: >>> t.value is ISampleContentTypeOne True This extended term API is defined by the `IContentTypeTerm` interface:: >>> from zope.mimetype.interfaces import IContentTypeTerm >>> IContentTypeTerm.providedBy(t) True The value can also be retrieved using the `getValue()` method:: >>> iface = terms.getValue('zope.mimetype.tests.ISampleContentTypeOne') >>> iface is ISampleContentTypeOne True Attempting to retrieve an interface that isn't in the source using the terms object generates a LookupError:: >>> terms.getValue('zope.mimetype.tests.ISampleContentTypeTwo') Traceback (most recent call last): ... LookupError: token does not represent an element in the source Attempting to look up a junk token also generates an error:: >>> terms.getValue('just.some.dotted.name.that.does.not.exist') Traceback (most recent call last): ... LookupError: could not import module for token ============================== TranslatableSourceSelectWidget ============================== TranslatableSourceSelectWidget is a SourceSelectWidget that translates and sorts the choices. We will borrow the boring set up code from the SourceSelectWidget test (source.txt in zope.formlib). >>> import zope.interface >>> import zope.component >>> import zope.schema >>> import zope.schema.interfaces >>> class SourceList(list): ... zope.interface.implements(zope.schema.interfaces.IIterableSource) >>> import zope.publisher.interfaces.browser >>> from zope.browser.interfaces import ITerms >>> from zope.schema.vocabulary import SimpleTerm >>> class ListTerms: ... ... zope.interface.implements(ITerms) ... ... def __init__(self, source, request): ... pass # We don't actually need the source or the request :) ... ... def getTerm(self, value): ... 
title = unicode(value) ... try: ... token = title.encode('base64').strip() ... except binascii.Error: ... raise LookupError(token) ... return SimpleTerm(value, token=token, title=title) ... ... def getValue(self, token): ... return token.decode('base64') >>> zope.component.provideAdapter( ... ListTerms, ... (SourceList, zope.publisher.interfaces.browser.IBrowserRequest)) >>> dog = zope.schema.Choice( ... __name__ = 'dog', ... title=u"Dogs", ... source=SourceList(['spot', 'bowser', 'prince', 'duchess', 'lassie']), ... ) >>> dog = dog.bind(object()) Now that we have a field and a working source, we can construct and render a widget. >>> from zope.mimetype.widget import TranslatableSourceSelectWidget >>> from zope.publisher.browser import TestRequest >>> request = TestRequest() >>> widget = TranslatableSourceSelectWidget( ... dog, dog.source, request) >>> print widget()
Note that the options are ordered alphabetically. If the field is not required, we will also see a special choice labeled "(nothing selected)" at the top of the list >>> dog.required = False >>> print widget()
The utils module contains various helpers for working with data governed by MIME content type information, as found in the HTTP Content-Type header: mime types and character sets. The decode function takes a string and an IANA character set name and returns a unicode object decoded from the string, using the codec associated with the character set name. Errors will generally arise from the unicode conversion rather than the mapping of character set to codec, and will be LookupErrors (the character set did not cleanly convert to a codec that Python knows about) or UnicodeDecodeErrors (the string included characters that were not in the range of the codec associated with the character set). >>> original = 'This is an o with a slash through it: \xb8.' >>> charset = 'Latin-7' # Baltic Rim or iso-8859-13 >>> from zope.mimetype import utils >>> utils.decode(original, charset) u'This is an o with a slash through it: \xf8.' >>> utils.decode(original, 'foo bar baz') Traceback (most recent call last): ... LookupError: unknown encoding: foo bar baz >>> utils.decode(original, 'iso-ir-6') # alias for ASCII ... # doctest: +ELLIPSIS Traceback (most recent call last): ... UnicodeDecodeError: 'ascii' codec can't decode... ======= CHANGES ======= 1.3.1 (2010-11-10) ------------------ - No longer depending on `zope.app.form` in `configure.zcml` by using `zope.formlib` instead, where the needed interfaces are living now. 1.3.0 (2010-06-26) ------------------ - Added testing dependency on ``zope.component [test]``. - Use zope.formlib instead of zope.app.form.browser for select widget. - Conform to repository policy. 1.2.0 (2009-12-26) ------------------ - Converted functional tests to unit tests and get rid of all extra test dependencies as a result. - Use the ITerms interface from zope.browser. - Declared missing dependencies, resolved direct dependency on zope.app.publisher. - Import content-type parser from zope.contenttype, adding a dependency on that package. 
1.1.2 (2009-05-22) ------------------ - No longer depends on ``zope.app.component``. 1.1.1 (2009-04-03) ------------------ - Fixed wrong package version (version ``1.1.0`` was released as ``0.4.0`` at `pypi` but as ``1.1dev`` at `download.zope.org/distribution`) - Fixed author email and home page address. 1.1.0 (2007-11-01) ------------------ - Package data update. - First public release. 1.0.0 (2007-??-??) ------------------ - Initial release. Keywords: file content mimetype Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Web Environment Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: Zope Public License Classifier: Programming Language :: Python Classifier: Natural Language :: English Classifier: Operating System :: OS Independent Classifier: Topic :: Internet :: WWW/HTTP Classifier: Framework :: Zope3 zope.mimetype-1.3.1/README.txt000644 000766 000024 00000000304 11466575473 015742 0ustar00macstaff000000 000000 This package provides a way to work with MIME content types. There are several interfaces defined here, many of which are used primarily to look things up based on different bits of information. zope.mimetype-1.3.1/ZopePublicLicense.txt000644 000766 000024 00000004203 11466575473 020366 0ustar00macstaff000000 000000 Zope Public License (ZPL) Version 2.1 ------------------------------------- A copyright notice accompanies this license document that identifies the copyright holders. This license has been certified as open source. It has also been designated as GPL compatible by the Free Software Foundation (FSF). Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions in source code must retain the accompanying copyright notice, this list of conditions, and the following disclaimer. 2. 
Redistributions in binary form must reproduce the accompanying copyright notice, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Names of the copyright holders must not be used to endorse or promote products derived from this software without prior written permission from the copyright holders. 4. The right to distribute this software or to use it for any purpose does not give you the right to use Servicemarks (sm) or Trademarks (tm) of the copyright holders. Use of them is covered by separate agreement with the copyright holders. 5. If any files are modified, you must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. Disclaimer THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. zope.mimetype-1.3.1/bootstrap.py000644 000766 000024 00000007350 11466575473 016643 0ustar00macstaff000000 000000 ############################################################################## # # Copyright (c) 2006 Zope Foundation and Contributors. # All Rights Reserved. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. 
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. # ############################################################################## """Bootstrap a buildout-based project Simply run this script in a directory containing a buildout.cfg. The script accepts buildout command-line options, so you can use the -c option to specify an alternate configuration file. $Id: bootstrap.py 113850 2010-06-26 08:24:32Z hannosch $ """ import os, shutil, sys, tempfile, urllib2 from optparse import OptionParser tmpeggs = tempfile.mkdtemp() is_jython = sys.platform.startswith('java') # parsing arguments parser = OptionParser() parser.add_option("-v", "--version", dest="version", help="use a specific zc.buildout version") parser.add_option("-d", "--distribute", action="store_true", dest="distribute", default=False, help="Use Disribute rather than Setuptools.") parser.add_option("-c", None, action="store", dest="config_file", help=("Specify the path to the buildout configuration " "file to be used.")) options, args = parser.parse_args() # if -c was provided, we push it back into args for buildout' main function if options.config_file is not None: args += ['-c', options.config_file] if options.version is not None: VERSION = '==%s' % options.version else: VERSION = '' USE_DISTRIBUTE = options.distribute args = args + ['bootstrap'] try: import pkg_resources import setuptools if not hasattr(pkg_resources, '_distribute'): raise ImportError except ImportError: ez = {} if USE_DISTRIBUTE: exec urllib2.urlopen('http://python-distribute.org/distribute_setup.py' ).read() in ez ez['use_setuptools'](to_dir=tmpeggs, download_delay=0, no_fake=True) else: exec urllib2.urlopen('http://peak.telecommunity.com/dist/ez_setup.py' ).read() in ez ez['use_setuptools'](to_dir=tmpeggs, download_delay=0) 
reload(sys.modules['pkg_resources']) import pkg_resources if sys.platform == 'win32': def quote(c): if ' ' in c: return '"%s"' % c # work around spawn lamosity on windows else: return c else: def quote (c): return c cmd = 'from setuptools.command.easy_install import main; main()' ws = pkg_resources.working_set if USE_DISTRIBUTE: requirement = 'distribute' else: requirement = 'setuptools' if is_jython: import subprocess assert subprocess.Popen([sys.executable] + ['-c', quote(cmd), '-mqNxd', quote(tmpeggs), 'zc.buildout' + VERSION], env=dict(os.environ, PYTHONPATH= ws.find(pkg_resources.Requirement.parse(requirement)).location ), ).wait() == 0 else: assert os.spawnle( os.P_WAIT, sys.executable, quote (sys.executable), '-c', quote (cmd), '-mqNxd', quote (tmpeggs), 'zc.buildout' + VERSION, dict(os.environ, PYTHONPATH= ws.find(pkg_resources.Requirement.parse(requirement)).location ), ) == 0 ws.add_entry(tmpeggs) ws.require('zc.buildout' + VERSION) import zc.buildout.buildout zc.buildout.buildout.main(args) shutil.rmtree(tmpeggs) zope.mimetype-1.3.1/buildout.cfg000644 000766 000024 00000000600 11466575473 016553 0ustar00macstaff000000 000000 [buildout] develop = . parts = test coverage-test coverage-report [test] recipe = zc.recipe.testrunner eggs = zope.mimetype [test] [coverage-test] recipe = zc.recipe.testrunner eggs = ${test:eggs} defaults = ['--coverage', '../../coverage'] [coverage-report] recipe = zc.recipe.egg eggs = z3c.coverage scripts = coverage=coverage-report arguments = ('coverage', 'coverage/report') zope.mimetype-1.3.1/setup.cfg000644 000766 000024 00000000073 11466575503 016062 0ustar00macstaff000000 000000 [egg_info] tag_build = tag_date = 0 tag_svn_revision = 0 zope.mimetype-1.3.1/setup.py000644 000766 000024 00000007343 11466575473 015770 0ustar00macstaff000000 000000 ############################################################################## # # Copyright (c) 2006 Zope Foundation and Contributors. # All Rights Reserved. 
# # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. # THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. # ############################################################################## # This package is developed by the Zope Toolkit project, documented here: # http://docs.zope.org/zopetoolkit # When developing and releasing this package, please follow the documented # Zope Toolkit policies as described by this documentation. ############################################################################## """Setup for zope.mimetype package $Id: setup.py 118320 2010-11-10 20:04:40Z icemac $ """ import os from setuptools import setup, find_packages def read(*rnames): return open(os.path.join(os.path.dirname(__file__), *rnames)).read() version = '1.3.1' setup(name='zope.mimetype', version=version, author='Zope Foundation and Contributors', author_email='zope-dev@zope.org', description = "A simple package for working with MIME content types", long_description=( read('README.txt') + '\n\n' + '.. 
contents::' + '\n\n' + read('src', 'zope', 'mimetype', 'README.txt') + '\n\n' + read('src', 'zope', 'mimetype', 'retrieving_mime_types.txt') + '\n\n' + read('src', 'zope', 'mimetype', 'codec.txt') + '\n\n' + read('src', 'zope', 'mimetype', 'constraints.txt') + '\n\n' + read('src', 'zope', 'mimetype', 'contentinfo.txt') + '\n\n' + read('src', 'zope', 'mimetype', 'event.txt') + '\n\n' + read('src', 'zope', 'mimetype', 'typegetter.txt') + '\n\n' + read('src', 'zope', 'mimetype', 'source.txt') + '\n\n' + read('src', 'zope', 'mimetype', 'widget.txt') + '\n\n' + read('src', 'zope', 'mimetype', 'utils.txt') + '\n\n' + read('CHANGES.txt') ), keywords = "file content mimetype", classifiers = [ 'Development Status :: 5 - Production/Stable', 'Environment :: Web Environment', 'Intended Audience :: Developers', 'License :: OSI Approved :: Zope Public License', 'Programming Language :: Python', 'Natural Language :: English', 'Operating System :: OS Independent', 'Topic :: Internet :: WWW/HTTP', 'Framework :: Zope3'], url='http://pypi.python.org/pypi/zope.mimetype', license='ZPL 2.1', packages=find_packages('src'), package_dir = {'': 'src'}, namespace_packages=['zope'], extras_require = dict(test=['zope.component [test]']), install_requires=['setuptools', 'zope.browser', 'zope.browserresource', 'zope.component', 'zope.configuration', 'zope.contenttype>=3.5.0dev', 'zope.event', 'zope.formlib>=4.0', 'zope.i18n', 'zope.i18nmessageid', 'zope.interface', 'zope.publisher', 'zope.schema', 'zope.security', ], include_package_data = True, zip_safe = False, ) zope.mimetype-1.3.1/src/000755 000766 000024 00000000000 11466575503 015030 5ustar00macstaff000000 000000 zope.mimetype-1.3.1/src/zope/000755 000766 000024 00000000000 11466575503 016005 5ustar00macstaff000000 000000 zope.mimetype-1.3.1/src/zope.mimetype.egg-info/000755 000766 000024 00000000000 11466575503 021327 5ustar00macstaff000000 000000 zope.mimetype-1.3.1/src/zope.mimetype.egg-info/PKG-INFO000644 000766 000024 00000140375 
11466575500 022433 0ustar00macstaff000000 000000 Metadata-Version: 1.0 Name: zope.mimetype Version: 1.3.1 Summary: A simple package for working with MIME content types Home-page: http://pypi.python.org/pypi/zope.mimetype Author: Zope Foundation and Contributors Author-email: zope-dev@zope.org License: ZPL 2.1 Description: This package provides a way to work with MIME content types. There are several interfaces defined here, many of which are used primarily to look things up based on different bits of information. .. contents:: ============================ The Zope MIME Infrastructure ============================ This package provides a way to work with MIME content types. There are several interfaces defined here, many of which are used primarily to look things up based on different bits of information. The basic idea behind this is that content objects should provide an interface based on the actual content type they implement. For example, objects that represent text/xml or application/xml documents should be marked with the `IContentTypeXml` interface. This can allow additional views to be registered based on the content type, or subscribers may be registered to perform other actions based on the content type. One aspect of the content type that's important for all documents is that the content type interface determines whether the object data is interpreted as an encoded text document. Encoded text documents, in particular, can be decoded to obtain a single Unicode string. The content type interfaces for encoded text must derive from `IContentTypeEncoded`. (All content type interfaces derive from `IContentType` and directly provide `IContentTypeInterface`.) The default configuration provides direct support for a variety of common document types found in office environments. 
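To make the "encoded text" idea above concrete, here is a minimal sketch that uses only the Python standard library (written for Python 3). It does not use the zope.mimetype API; the helper names `split_content_type` and `decode_payload` are hypothetical and chosen only for this illustration. It shows how a Content-Type header value can yield a MIME type plus an optional charset, and how only content with a known charset can be decoded to a single Unicode string.

```python
from email.message import Message


def split_content_type(header):
    """Return (mime_type, charset) parsed from a Content-Type header value.

    Hypothetical helper for illustration; not part of zope.mimetype.
    """
    msg = Message()
    msg["Content-Type"] = header
    # get_content_charset() returns the lower-cased charset parameter,
    # or None when the header carries no charset.
    return msg.get_content_type(), msg.get_content_charset()


def decode_payload(data, header):
    """Decode bytes to text using the charset named in the header.

    Raises ValueError when no charset is given, mirroring the idea that
    content without encoding information cannot be decoded.
    """
    mime_type, charset = split_content_type(header)
    if charset is None:
        raise ValueError("charset not known")
    # bytes.decode() raises LookupError for an unknown charset name and
    # UnicodeDecodeError for bytes that are invalid in that charset.
    return data.decode(charset)
```

Under these assumptions, `decode_payload(b"\xc2\xab\xc2\xbb", "text/plain;charset=utf-8")` yields a two-character string, while passing `"application/octet-stream"` raises `ValueError` because no charset is available.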
Supported lookups ----------------- Several different queries are supported by this package: - Given a MIME type expressed as a string, the associated interface, if any, can be retrieved using:: # `mimeType` is the MIME type as a string interface = queryUtility(IContentTypeInterface, mimeType) - Given a charset name, the associated `ICodec` instance can be retrieved using:: # `charsetName` is the charset name as a string codec = queryUtility(ICharsetCodec, charsetName) - Given a codec, the preferred charset name can be retrieved using:: # `codec` is an `ICodec` instance: charsetName = getUtility(ICodecPreferredCharset, codec.name).name - Given any combination of a suggested file name, file data, and content type header, a guess at a reasonable MIME type can be made using:: # `filename` is a suggested file name, or None # `data` is uploaded data, or None # `content_type` is a Content-Type header value, or None # mimeType = getUtility(IMimeTypeGetter)( name=filename, data=data, content_type=content_type) - Given any combination of a suggested file name, file data, and content type header, a guess at a reasonable charset name can be made using:: # `filename` is a suggested file name, or None # `data` is uploaded data, or None # `content_type` is a Content-Type header value, or None # charsetName = getUtility(ICharsetGetter)( name=filename, data=data, content_type=content_type) =================================== Retrieving Content Type Information =================================== MIME Types ---------- We'll start by initializing the interfaces and registrations for the content type interfaces. This is normally done via ZCML. >>> from zope.mimetype import types >>> types.setup() A utility is used to retrieve MIME types. >>> from zope import component >>> from zope.mimetype import typegetter >>> from zope.mimetype.interfaces import IMimeTypeGetter >>> component.provideUtility(typegetter.smartMimeTypeGuesser, ... 
provides=IMimeTypeGetter) >>> mime_getter = component.getUtility(IMimeTypeGetter) To map a particular file name, file contents, and content type to a MIME type. >>> mime_getter(name='file.txt', data='A text file.', ... content_type='text/plain') 'text/plain' In the default implementation if not enough information is given to discern a MIME type, None is returned. >>> mime_getter() is None True Character Sets -------------- A utility is also used to retrieve character sets (charsets). >>> from zope.mimetype.interfaces import ICharsetGetter >>> component.provideUtility(typegetter.charsetGetter, ... provides=ICharsetGetter) >>> charset_getter = component.getUtility(ICharsetGetter) To map a particular file name, file contents, and content type to a charset. >>> charset_getter(name='file.txt', data='This is a text file.', ... content_type='text/plain;charset=ascii') 'ascii' In the default implementation if not enough information is given to discern a charset, None is returned. >>> charset_getter() is None True Finding Interfaces ------------------ Given a MIME type we need to be able to find the appropriate interface. >>> from zope.mimetype.interfaces import IContentTypeInterface >>> component.getUtility(IContentTypeInterface, name=u'text/plain') It is also possible to enumerate all content type interfaces. >>> utilities = list(component.getUtilitiesFor(IContentTypeInterface)) If you want to find an interface from a MIME string, you can use the utilities. >>> component.getUtility(IContentTypeInterface, name='text/plain') ============== Codec handling ============== We can create codecs programmatically. Codecs are registered as utilities for ICodec with the name of their python codec. 
>>> from zope import component >>> from zope.mimetype.interfaces import ICodec >>> from zope.mimetype.codec import addCodec >>> sorted(component.getUtilitiesFor(ICodec)) [] >>> addCodec('iso8859-1', 'Western (ISO-8859-1)') >>> codec = component.getUtility(ICodec, name='iso8859-1') >>> codec >>> codec.name 'iso8859-1' >>> addCodec('utf-8', 'Unicode (UTF-8)') >>> codec2 = component.getUtility(ICodec, name='utf-8') We can programmatically add charsets to a given codec. This registers each charset as a named utility for ICharset. It also registers the codec as a utility for ICharsetCodec with the name of the charset. >>> from zope.mimetype.codec import addCharset >>> from zope.mimetype.interfaces import ICharset, ICharsetCodec >>> sorted(component.getUtilitiesFor(ICharset)) [] >>> sorted(component.getUtilitiesFor(ICharsetCodec)) [] >>> addCharset(codec.name, 'latin1') >>> charset = component.getUtility(ICharset, name='latin1') >>> charset >>> charset.name 'latin1' >>> component.getUtility(ICharsetCodec, name='latin1') is codec True When adding a charset we can state that we want that charset to be the preferred charset for its codec. >>> addCharset(codec.name, 'iso8859-1', preferred=True) >>> addCharset(codec2.name, 'utf-8', preferred=True) A codec can have at most one preferred charset. >>> addCharset(codec.name, 'test', preferred=True) Traceback (most recent call last): ... ValueError: Codec already has a preferred charset. Preferred charsets are registered as utilities for ICodecPreferredCharset under the name of the python codec. 
>>> from zope.mimetype.interfaces import ICodecPreferredCharset >>> preferred = component.getUtility(ICodecPreferredCharset, name='iso8859-1') >>> preferred >>> preferred.name 'iso8859-1' >>> sorted(component.getUtilitiesFor(ICodecPreferredCharset)) [(u'iso8859-1', ), (u'utf-8', )] We can look up a codec by the name of its charset: >>> component.getUtility(ICharsetCodec, name='latin1') is codec True >>> component.getUtility(ICharsetCodec, name='utf-8') is codec2 True Or we can look up all codecs: >>> sorted(component.getUtilitiesFor(ICharsetCodec)) [(u'iso8859-1', ), (u'latin1', ), (u'test', ), (u'utf-8', )] =================================== Constraint Functions for Interfaces =================================== The `zope.mimetype.interfaces` module defines interfaces that use some helper functions to define constraints on the accepted data. These helpers are used to determine whether values conform to what's allowed for parts of a MIME type specification and other parts of a Content-Type header as specified in RFC 2045. Single Token ------------ The first is the simplest: the `tokenConstraint()` function returns `True` if the ASCII string it is passed conforms to the `token` production in section 5.1 of the RFC. Let's import the function:: >>> from zope.mimetype.interfaces import tokenConstraint Typical tokens are the major and minor parts of the MIME type and the parameter names for the Content-Type header. 
The function should return `True` for these values:: >>> tokenConstraint("text") True >>> tokenConstraint("plain") True >>> tokenConstraint("charset") True The function should also return `True` for unusual but otherwise normal tokens that may be used in some situations:: >>> tokenConstraint("not-your-fathers-token") True It must also allow extension tokens and vendor-specific tokens:: >>> tokenConstraint("x-magic") True >>> tokenConstraint("vnd.zope.special-data") True Since we expect input handlers to normalize values to lower case, upper case text is not allowed:: >>> tokenConstraint("Text") False Non-ASCII text is also not allowed:: >>> tokenConstraint("\x80") False >>> tokenConstraint("\xC8") False >>> tokenConstraint("\xFF") False Note that lots of characters are allowed in tokens, and there are no constraints that the token "look like" something a person would want to read:: >>> tokenConstraint(".-.-.-.") True Other characters are disallowed, however, including all forms of whitespace:: >>> tokenConstraint("foo bar") False >>> tokenConstraint("foo\tbar") False >>> tokenConstraint("foo\nbar") False >>> tokenConstraint("foo\rbar") False >>> tokenConstraint("foo\x7Fbar") False Whitespace before or after the token is not accepted either:: >>> tokenConstraint(" text") False >>> tokenConstraint("plain ") False Other disallowed characters are defined in the `tspecials` production from the RFC (also in section 5.1):: >>> tokenConstraint("(") False >>> tokenConstraint(")") False >>> tokenConstraint("<") False >>> tokenConstraint(">") False >>> tokenConstraint("@") False >>> tokenConstraint(",") False >>> tokenConstraint(";") False >>> tokenConstraint(":") False >>> tokenConstraint("\\") False >>> tokenConstraint('"') False >>> tokenConstraint("/") False >>> tokenConstraint("[") False >>> tokenConstraint("]") False >>> tokenConstraint("?") False >>> tokenConstraint("=") False A token must contain at least one character, so `tokenConstraint()` returns false for an empty 
string:: >>> tokenConstraint("") False MIME Type --------- A MIME type is specified using two tokens separated by a slash; whitespace between the tokens and the slash must be normalized away in the input handler. The `mimeTypeConstraint()` function is available to test a normalized MIME type value; let's import that function now:: >>> from zope.mimetype.interfaces import mimeTypeConstraint Let's test some common MIME types to make sure the function isn't obviously insane:: >>> mimeTypeConstraint("text/plain") True >>> mimeTypeConstraint("application/xml") True >>> mimeTypeConstraint("image/svg+xml") True If parts of the MIME type are missing, it isn't accepted:: >>> mimeTypeConstraint("text") False >>> mimeTypeConstraint("text/") False >>> mimeTypeConstraint("/plain") False As for individual tokens, whitespace is not allowed:: >>> mimeTypeConstraint("foo bar/plain") False >>> mimeTypeConstraint("text/foo bar") False Whitespace is not accepted around the slash either:: >>> mimeTypeConstraint("text /plain") False >>> mimeTypeConstraint("text/ plain") False Surrounding whitespace is also not accepted:: >>> mimeTypeConstraint(" text/plain") False >>> mimeTypeConstraint("text/plain ") False =================================== Minimal IContentInfo Implementation =================================== The `zope.mimetype.contentinfo` module provides a minimal `IContentInfo` implementation that adds no information to what's provided by a content object. This represents the most conservative content-type policy that might be useful. Let's take a look at how this operates by creating a couple of concrete content-type interfaces:: >>> from zope.mimetype import interfaces >>> class ITextPlain(interfaces.IContentTypeEncoded): ... """text/plain""" >>> class IApplicationOctetStream(interfaces.IContentType): ... """application/octet-stream""" Now, we'll create a minimal content object that provides the necessary information:: >>> import zope.interface >>> class Content(object): ... 
zope.interface.implements(interfaces.IContentTypeAware) ... ... def __init__(self, mimeType, charset=None): ... self.mimeType = mimeType ... self.parameters = {} ... if charset: ... self.parameters["charset"] = charset We can now create examples of both encoded and non-encoded content:: >>> encoded = Content("text/plain", "utf-8") >>> zope.interface.alsoProvides(encoded, ITextPlain) >>> unencoded = Content("application/octet-stream") >>> zope.interface.alsoProvides(unencoded, IApplicationOctetStream) The minimal IContentInfo implementation only exposes the information available to it from the base content object. Let's take a look at the unencoded content first:: >>> from zope.mimetype import contentinfo >>> ci = contentinfo.ContentInfo(unencoded) >>> ci.effectiveMimeType 'application/octet-stream' >>> ci.effectiveParameters {} >>> ci.contentType 'application/octet-stream' For unencoded content, there is never a codec:: >>> print ci.getCodec() None It is also disallowed to try decoding such content:: >>> ci.decode("foo") Traceback (most recent call last): ... ValueError: no matching codec found Attempting to decode data using an unencoded object causes an exception to be raised:: >>> print ci.decode("data") Traceback (most recent call last): ... ValueError: no matching codec found If we try this with encoded data, we get somewhat different behavior:: >>> ci = contentinfo.ContentInfo(encoded) >>> ci.effectiveMimeType 'text/plain' >>> ci.effectiveParameters {'charset': 'utf-8'} >>> ci.contentType 'text/plain;charset=utf-8' The `getCodec()` and `decode()` methods can be used to handle encoded data using the encoding indicated by the ``charset`` parameter. Let's store some UTF-8 data in a variable:: >>> utf8_data = unicode("\xAB\xBB", "iso-8859-1").encode("utf-8") >>> utf8_data '\xc2\xab\xc2\xbb' We want to be able to decode the data using the `IContentInfo` object. 
Let's try getting the corresponding `ICodec` object using `getCodec()`::

  >>> codec = ci.getCodec()
  Traceback (most recent call last):
  ...
  ValueError: unsupported charset: 'utf-8'

So, we can't proceed without some further preparation.  What we need
is to register an `ICharset` for UTF-8.  The `ICharset` will need a
reference (by name) to an `ICodec` for UTF-8.  So let's create those
objects and register them::

  >>> import codecs
  >>> from zope.mimetype.i18n import _

  >>> class Utf8Codec(object):
  ...     zope.interface.implements(interfaces.ICodec)
  ...
  ...     name = "utf-8"
  ...     title = _("UTF-8")
  ...
  ...     def __init__(self):
  ...         ( self.encode,
  ...           self.decode,
  ...           self.reader,
  ...           self.writer
  ...           ) = codecs.lookup(self.name)

  >>> utf8_codec = Utf8Codec()

  >>> class Utf8Charset(object):
  ...     zope.interface.implements(interfaces.ICharset)
  ...
  ...     name = utf8_codec.name
  ...     encoding = name

  >>> utf8_charset = Utf8Charset()

  >>> import zope.component

  >>> zope.component.provideUtility(
  ...     utf8_codec, interfaces.ICodec, utf8_codec.name)
  >>> zope.component.provideUtility(
  ...     utf8_charset, interfaces.ICharset, utf8_charset.name)

Now that that's been initialized, let's try getting the codec again::

  >>> codec = ci.getCodec()
  >>> codec.name
  'utf-8'

  >>> codec.decode(utf8_data)
  (u'\xab\xbb', 4)

We can now check that the `decode()` method of the `IContentInfo` will
decode the entire data, returning the Unicode representation of the
text::

  >>> ci.decode(utf8_data)
  u'\xab\xbb'

Another possibility, of course, is that you have content that you know
is encoded text of some sort, but you don't actually know what
encoding it's in::

  >>> encoded2 = Content("text/plain")
  >>> zope.interface.alsoProvides(encoded2, ITextPlain)

  >>> ci = contentinfo.ContentInfo(encoded2)
  >>> ci.effectiveMimeType
  'text/plain'
  >>> ci.effectiveParameters
  {}
  >>> ci.contentType
  'text/plain'

  >>> ci.getCodec()
  Traceback (most recent call last):
  ...
ValueError: charset not known It's also possible that the initial content type information for an object is incorrect for some reason. If the browser provides a content type of "text/plain; charset=utf-8", the content will be seen as encoded. A user correcting this content type using UI elements can cause the content to be considered un-encoded. At this point, there should no longer be a charset parameter to the content type, and the content info object should reflect this, though the previous encoding information will be retained in case the content type should be changed to an encoded type in the future. Let's see how this behavior will be exhibited in this API. We'll start by creating some encoded content:: >>> content = Content("text/plain", "utf-8") >>> zope.interface.alsoProvides(content, ITextPlain) We can see that the encoding information is included in the effective MIME type information provided by the content-info object:: >>> ci = contentinfo.ContentInfo(content) >>> ci.effectiveMimeType 'text/plain' >>> ci.effectiveParameters {'charset': 'utf-8'} We now change the content type information for the object:: >>> ifaces = zope.interface.directlyProvidedBy(content) >>> ifaces -= ITextPlain >>> ifaces += IApplicationOctetStream >>> zope.interface.directlyProvides(content, *ifaces) >>> content.mimeType = 'application/octet-stream' At this point, a content type object would provide different information:: >>> ci = contentinfo.ContentInfo(content) >>> ci.effectiveMimeType 'application/octet-stream' >>> ci.effectiveParameters {} The underlying content type parameters still contain the original encoding information, however:: >>> content.parameters {'charset': 'utf-8'} =============================== Events and content-type changes =============================== The `IContentTypeChangedEvent` is fired whenever an object's `IContentTypeInterface` is changed. 
This includes the cases when a content type interface is applied to an object that doesn't have one, and when the content type interface is removed from an object. Let's start the demonstration by defining a subscriber for the event that simply prints out the information from the event object:: >>> def handler(event): ... print "changed content type interface:" ... print " from:", event.oldContentType ... print " to:", event.newContentType We'll also define a simple content object:: >>> import zope.interface >>> class IContent(zope.interface.Interface): ... pass >>> class Content(object): ... ... zope.interface.implements(IContent) ... ... def __str__(self): ... return "" >>> obj = Content() We'll also need a couple of content type interfaces:: >>> from zope.mimetype import interfaces >>> class ITextPlain(interfaces.IContentTypeEncoded): ... """text/plain""" >>> ITextPlain.setTaggedValue("mimeTypes", ["text/plain"]) >>> ITextPlain.setTaggedValue("extensions", [".txt"]) >>> zope.interface.directlyProvides( ... ITextPlain, interfaces.IContentTypeInterface) >>> class IOctetStream(interfaces.IContentType): ... """application/octet-stream""" >>> IOctetStream.setTaggedValue("mimeTypes", ["application/octet-stream"]) >>> IOctetStream.setTaggedValue("extensions", [".bin"]) >>> zope.interface.directlyProvides( ... IOctetStream, interfaces.IContentTypeInterface) Let's register our subscriber:: >>> import zope.component >>> import zope.component.interfaces >>> zope.component.provideHandler( ... handler, ... (zope.component.interfaces.IObjectEvent,)) Changing the content type interface on an object is handled by the `zope.mimetype.event.changeContentType()` function. 
Let's import that module and demonstrate that the expected event is fired appropriately:: >>> from zope.mimetype import event Since the object currently has no content type interface, "removing" the interface does not affect the object and the event is not fired:: >>> event.changeContentType(obj, None) Setting a content type interface on an object that doesn't have one will cause the event to be fired, with the `.oldContentType` attribute on the event set to `None`:: >>> event.changeContentType(obj, ITextPlain) changed content type interface: from: None to: Calling the `changeContentType()` function again with the same "new" content type interface causes no change, so the event is not fired again:: >>> event.changeContentType(obj, ITextPlain) Providing a new interface does cause the event to be fired again:: >>> event.changeContentType(obj, IOctetStream) changed content type interface: from: to: Similarly, removing the content type interface triggers the event as well:: >>> event.changeContentType(obj, None) changed content type interface: from: to: None ====================================== MIME type and character set extraction ====================================== The `zope.mimetype.typegetter` module provides a selection of MIME type extractors and charset extractors. These may be used to determine what the MIME type and character set for uploaded data should be. These two interfaces represent the site policy regarding interpreting upload data in the face of missing or inaccurate input. Let's go ahead and import the module:: >>> from zope.mimetype import typegetter MIME types ---------- There are a number of interesting MIME-type extractors: `mimeTypeGetter()` A minimal extractor that never attempts to guess. `mimeTypeGuesser()` An extractor that tries to guess the content type based on the name and data if the input contains no content type information. 
`smartMimeTypeGuesser()` An extractor that checks the content for a variety of constructs to try and refine the results of the `mimeTypeGuesser()`. This is able to do things like check for XHTML that's labelled as HTML in upload data. `mimeTypeGetter()` ~~~~~~~~~~~~~~~~~~ We'll start with the simplest, which does no content-based guessing at all, but uses the information provided by the browser directly. If the browser did not provide any content-type information, or if it cannot be parsed, the extractor simply asserts a "safe" MIME type of application/octet-stream. (The rationale for selecting this type is that since there's really nothing productive that can be done with it other than download it, it's impossible to mis-interpret the data.) When there's no information at all about the content, the extractor returns None:: >>> print typegetter.mimeTypeGetter() None Providing only the upload filename or data, or both, still produces None, since no guessing is being done:: >>> print typegetter.mimeTypeGetter(name="file.html") None >>> print typegetter.mimeTypeGetter(data="...") None >>> print typegetter.mimeTypeGetter( ... name="file.html", data="...") None If a content type header is available for the input, that is used since that represents explicit input from outside the application server. The major and minor parts of the content type are extracted and returned as a single string:: >>> typegetter.mimeTypeGetter(content_type="text/plain") 'text/plain' >>> typegetter.mimeTypeGetter(content_type="text/plain; charset=utf-8") 'text/plain' If the content-type information is provided but malformed (not in conformance with RFC 2822), it is ignored, since the intent cannot be reliably guessed:: >>> print typegetter.mimeTypeGetter(content_type="foo bar") None This combines with ignoring the other values that may be provided as expected:: >>> print typegetter.mimeTypeGetter( ... 
name="file.html", data="...", content_type="foo bar") None `mimeTypeGuesser()` ~~~~~~~~~~~~~~~~~~~ A more elaborate extractor that tries to work around completely missing information can be found as the `mimeTypeGuesser()` function. This function will only guess if there is no usable content type information in the input. This extractor can be thought of as having the following pseudo-code:: def mimeTypeGuesser(name=None, data=None, content_type=None): type = mimeTypeGetter(name=name, data=data, content_type=content_type) if type is None: type = guess the content type return type Let's see how this affects the results we saw earlier. When there's no input to use, we still get None:: >>> print typegetter.mimeTypeGuesser() None Providing only the upload filename or data, or both, now produces a non-None guess for common content types:: >>> typegetter.mimeTypeGuesser(name="file.html") 'text/html' >>> typegetter.mimeTypeGuesser(data="...") 'text/html' >>> typegetter.mimeTypeGuesser(name="file.html", data="...") 'text/html' Note that if the filename and data provided separately produce different MIME types, the result of providing both will be one of those types, but which is unspecified:: >>> mt_1 = typegetter.mimeTypeGuesser(name="file.html") >>> mt_1 'text/html' >>> mt_2 = typegetter.mimeTypeGuesser(data="...") >>> mt_2 'text/xml' >>> mt = typegetter.mimeTypeGuesser( ... 
data="...", name="file.html") >>> mt in (mt_1, mt_2) True If a content type header is available for the input, that is used in the same way as for the `mimeTypeGetter()` function:: >>> typegetter.mimeTypeGuesser(content_type="text/plain") 'text/plain' >>> typegetter.mimeTypeGuesser(content_type="text/plain; charset=utf-8") 'text/plain' If the content-type information is provided but malformed, it is ignored:: >>> print typegetter.mimeTypeGetter(content_type="foo bar") None When combined with values for the filename or content data, those are still used to provide reasonable guesses for the content type:: >>> typegetter.mimeTypeGuesser(name="file.html", content_type="foo bar") 'text/html' >>> typegetter.mimeTypeGuesser( ... data="...", content_type="foo bar") 'text/html' Information from a parsable content-type is still used even if a guess from the data or filename would provide a different or more-refined result:: >>> typegetter.mimeTypeGuesser( ... data="GIF89a...", content_type="application/octet-stream") 'application/octet-stream' `smartMimeTypeGuesser()` ~~~~~~~~~~~~~~~~~~~~~~~~ The `smartMimeTypeGuesser()` function applies more knowledge to the process of determining the MIME-type to use. Essentially, it takes the result of the `mimeTypeGuesser()` function and attempts to refine the content-type based on various heuristics. We still see the basic behavior that no input produces None:: >>> print typegetter.smartMimeTypeGuesser() None An unparsable content-type is still ignored:: >>> print typegetter.smartMimeTypeGuesser(content_type="foo bar") None The interpretation of uploaded data will be different in at least some interesting cases. For instance, the `mimeTypeGuesser()` function provides these results for some XHTML input data:: >>> typegetter.mimeTypeGuesser( ... data="...", ... name="file.html") 'text/html' The smart extractor is able to refine this into more usable data:: >>> typegetter.smartMimeTypeGuesser( ... data="...", ... 
name="file.html") 'application/xhtml+xml' In this case, the smart extractor has refined the information determined from the filename using information from the uploaded data. The specific approach taken by the extractor is not part of the interface, however. `charsetGetter()` ~~~~~~~~~~~~~~~~~ If you're interested in the character set of textual data, you can use the `charsetGetter` function (which can also be registered as the `ICharsetGetter` utility): The simplest case is when the character set is already specified in the content type. >>> typegetter.charsetGetter(content_type='text/plain; charset=mambo-42') 'mambo-42' Note that the charset name is lowercased, because all the default ICharset and ICharsetCodec utilities are registered for lowercase names. >>> typegetter.charsetGetter(content_type='text/plain; charset=UTF-8') 'utf-8' If it isn't, `charsetGetter` can try to guess by looking at actual data >>> typegetter.charsetGetter(content_type='text/plain', data='just text') 'ascii' >>> typegetter.charsetGetter(content_type='text/plain', data='\xe2\x98\xba') 'utf-8' >>> import codecs >>> typegetter.charsetGetter(data=codecs.BOM_UTF16_BE + '\x12\x34') 'utf-16be' >>> typegetter.charsetGetter(data=codecs.BOM_UTF16_LE + '\x12\x34') 'utf-16le' If the character set cannot be determined, `charsetGetter` returns None. >>> typegetter.charsetGetter(content_type='text/plain', data='\xff') >>> typegetter.charsetGetter() =============================== Source for MIME type interfaces =============================== Some sample interfaces have been created in the zope.mimetype.tests module for use in this test. Let's import them:: >>> from zope.mimetype.tests import ( ... ISampleContentTypeOne, ISampleContentTypeTwo) The source should only include `IContentTypeInterface` interfaces that have been registered. 
Let's register one of these two interfaces so we can test this:: >>> import zope.component >>> from zope.mimetype.interfaces import IContentTypeInterface >>> zope.component.provideUtility( ... ISampleContentTypeOne, IContentTypeInterface, name="type/one") >>> zope.component.provideUtility( ... ISampleContentTypeOne, IContentTypeInterface, name="type/two") We should see that these interfaces are included in the source:: >>> from zope.mimetype import source >>> s = source.ContentTypeSource() >>> ISampleContentTypeOne in s True >>> ISampleContentTypeTwo in s False Interfaces that do not implement the `IContentTypeInterface` are not included in the source:: >>> import zope.interface >>> class ISomethingElse(zope.interface.Interface): ... """This isn't a content type interface.""" >>> ISomethingElse in s False The source is iterable, so we can get a list of the values:: >>> values = list(s) >>> len(values) 1 >>> values[0] is ISampleContentTypeOne True We can get terms for the allowed values:: >>> terms = source.ContentTypeTerms(s, None) >>> t = terms.getTerm(ISampleContentTypeOne) >>> terms.getValue(t.token) is ISampleContentTypeOne True Interfaces that are not in the source cause an error when a term is requested:: >>> terms.getTerm(ISomethingElse) Traceback (most recent call last): ... LookupError: value is not an element in the source The term provides a token based on the module name of the interface:: >>> t.token 'zope.mimetype.tests.ISampleContentTypeOne' The term also provides the title based on the "title" tagged value from the interface:: >>> t.title u'Type One' Each interface provides a list of MIME types with which the interface is associated. 
The term object provides access to this list:: >>> t.mimeTypes ['type/one', 'type/foo'] A list of common extensions for files of this type is also available, though it may be empty:: >>> t.extensions [] The term's value, of course, is the interface passed in:: >>> t.value is ISampleContentTypeOne True This extended term API is defined by the `IContentTypeTerm` interface:: >>> from zope.mimetype.interfaces import IContentTypeTerm >>> IContentTypeTerm.providedBy(t) True The value can also be retrieved using the `getValue()` method:: >>> iface = terms.getValue('zope.mimetype.tests.ISampleContentTypeOne') >>> iface is ISampleContentTypeOne True Attempting to retrieve an interface that isn't in the source using the terms object generates a LookupError:: >>> terms.getValue('zope.mimetype.tests.ISampleContentTypeTwo') Traceback (most recent call last): ... LookupError: token does not represent an element in the source Attempting to look up a junk token also generates an error:: >>> terms.getValue('just.some.dotted.name.that.does.not.exist') Traceback (most recent call last): ... LookupError: could not import module for token ============================== TranslatableSourceSelectWidget ============================== TranslatableSourceSelectWidget is a SourceSelectWidget that translates and sorts the choices. We will borrow the boring set up code from the SourceSelectWidget test (source.txt in zope.formlib). >>> import zope.interface >>> import zope.component >>> import zope.schema >>> import zope.schema.interfaces >>> class SourceList(list): ... zope.interface.implements(zope.schema.interfaces.IIterableSource) >>> import zope.publisher.interfaces.browser >>> from zope.browser.interfaces import ITerms >>> from zope.schema.vocabulary import SimpleTerm >>> class ListTerms: ... ... zope.interface.implements(ITerms) ... ... def __init__(self, source, request): ... pass # We don't actually need the source or the request :) ... ... def getTerm(self, value): ... 
title = unicode(value) ... try: ... token = title.encode('base64').strip() ... except binascii.Error: ... raise LookupError(token) ... return SimpleTerm(value, token=token, title=title) ... ... def getValue(self, token): ... return token.decode('base64') >>> zope.component.provideAdapter( ... ListTerms, ... (SourceList, zope.publisher.interfaces.browser.IBrowserRequest)) >>> dog = zope.schema.Choice( ... __name__ = 'dog', ... title=u"Dogs", ... source=SourceList(['spot', 'bowser', 'prince', 'duchess', 'lassie']), ... ) >>> dog = dog.bind(object()) Now that we have a field and a working source, we can construct and render a widget. >>> from zope.mimetype.widget import TranslatableSourceSelectWidget >>> from zope.publisher.browser import TestRequest >>> request = TestRequest() >>> widget = TranslatableSourceSelectWidget( ... dog, dog.source, request) >>> print widget()
Note that the options are ordered alphabetically. If the field is not required, we will also see a special choice labeled "(nothing selected)" at the top of the list >>> dog.required = False >>> print widget()
The utils module contains various helpers for working with data
governed by MIME content type information, as found in the HTTP
Content-Type header: MIME types and character sets.

The decode function takes a string and an IANA character set name and
returns a unicode object decoded from the string, using the codec
associated with the character set name.  Errors will generally arise
from the unicode conversion rather than the mapping of character set to
codec, and will be LookupErrors (the character set did not cleanly
convert to a codec that Python knows about) or UnicodeDecodeErrors (the
string included characters that were not in the range of the codec
associated with the character set).

  >>> original = 'This is an o with a slash through it: \xb8.'
  >>> charset = 'Latin-7' # Baltic Rim or iso-8859-13
  >>> from zope.mimetype import utils
  >>> utils.decode(original, charset)
  u'This is an o with a slash through it: \xf8.'
  >>> utils.decode(original, 'foo bar baz')
  Traceback (most recent call last):
  ...
  LookupError: unknown encoding: foo bar baz
  >>> utils.decode(original, 'iso-ir-6') # alias for ASCII
  ... # doctest: +ELLIPSIS
  Traceback (most recent call last):
  ...
  UnicodeDecodeError: 'ascii' codec can't decode...


=======
CHANGES
=======

1.3.1 (2010-11-10)
------------------

- No longer depending on `zope.app.form` in `configure.zcml` by using
  `zope.formlib` instead, where the needed interfaces are living now.

1.3.0 (2010-06-26)
------------------

- Added testing dependency on ``zope.component [test]``.

- Use zope.formlib instead of zope.app.form.browser for select widget.

- Conform to repository policy.

1.2.0 (2009-12-26)
------------------

- Converted functional tests to unit tests and get rid of all extra test
  dependencies as a result.

- Use the ITerms interface from zope.browser.

- Declared missing dependencies, resolved direct dependency on
  zope.app.publisher.

- Import content-type parser from zope.contenttype, adding a dependency on
  that package.

1.1.2 (2009-05-22) ------------------ - No longer depends on ``zope.app.component``. 1.1.1 (2009-04-03) ------------------ - Fixed wrong package version (version ``1.1.0`` was released as ``0.4.0`` at `pypi` but as ``1.1dev`` at `download.zope.org/distribution`) - Fixed author email and home page address. 1.1.0 (2007-11-01) ------------------ - Package data update. - First public release. 1.0.0 (2007-??-??) ------------------ - Initial release. Keywords: file content mimetype Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Web Environment Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: Zope Public License Classifier: Programming Language :: Python Classifier: Natural Language :: English Classifier: Operating System :: OS Independent Classifier: Topic :: Internet :: WWW/HTTP Classifier: Framework :: Zope3 zope.mimetype-1.3.1/src/zope.mimetype.egg-info/SOURCES.txt000644 000766 000024 00000004004 11466575500 023206 0ustar00macstaff000000 000000 CHANGES.txt COPYRIGHT.txt LICENSE.txt README.txt ZopePublicLicense.txt bootstrap.py buildout.cfg setup.py src/zope/__init__.py src/zope.mimetype.egg-info/PKG-INFO src/zope.mimetype.egg-info/SOURCES.txt src/zope.mimetype.egg-info/dependency_links.txt src/zope.mimetype.egg-info/namespace_packages.txt src/zope.mimetype.egg-info/not-zip-safe src/zope.mimetype.egg-info/requires.txt src/zope.mimetype.egg-info/top_level.txt src/zope/mimetype/README.txt src/zope/mimetype/TODO.txt src/zope/mimetype/__init__.py src/zope/mimetype/character-sets.txt src/zope/mimetype/codec.py src/zope/mimetype/codec.txt src/zope/mimetype/configure.zcml src/zope/mimetype/constraints.txt src/zope/mimetype/contentinfo.py src/zope/mimetype/contentinfo.txt src/zope/mimetype/event.py src/zope/mimetype/event.txt src/zope/mimetype/i18n.py src/zope/mimetype/interfaces.py src/zope/mimetype/meta.zcml src/zope/mimetype/retrieving_mime_types.txt src/zope/mimetype/source.py 
src/zope/mimetype/source.txt src/zope/mimetype/tests.py src/zope/mimetype/typegetter.py src/zope/mimetype/typegetter.txt src/zope/mimetype/types.csv src/zope/mimetype/types.py src/zope/mimetype/utils.py src/zope/mimetype/utils.txt src/zope/mimetype/widget.py src/zope/mimetype/widget.txt src/zope/mimetype/zcml.py src/zope/mimetype/icons/archive.png src/zope/mimetype/icons/audio.gif src/zope/mimetype/icons/binary.gif src/zope/mimetype/icons/css.gif src/zope/mimetype/icons/document.gif src/zope/mimetype/icons/html.gif src/zope/mimetype/icons/image.gif src/zope/mimetype/icons/javascript.gif src/zope/mimetype/icons/ms-excel.gif src/zope/mimetype/icons/ms-powerpoint.gif src/zope/mimetype/icons/ms-project.gif src/zope/mimetype/icons/ms-word.gif src/zope/mimetype/icons/octet-stream.gif src/zope/mimetype/icons/oo-calc.png src/zope/mimetype/icons/oo-impress.png src/zope/mimetype/icons/oo-writer.png src/zope/mimetype/icons/pdf.gif src/zope/mimetype/icons/python.gif src/zope/mimetype/icons/video.png src/zope/mimetype/icons/wordperfect.gif src/zope/mimetype/icons/xml.gif src/zope/mimetype/icons/xsl.gifzope.mimetype-1.3.1/src/zope.mimetype.egg-info/dependency_links.txt000644 000766 000024 00000000001 11466575500 025372 0ustar00macstaff000000 000000 zope.mimetype-1.3.1/src/zope.mimetype.egg-info/namespace_packages.txt000644 000766 000024 00000000005 11466575500 025652 0ustar00macstaff000000 000000 zope zope.mimetype-1.3.1/src/zope.mimetype.egg-info/not-zip-safe000644 000766 000024 00000000001 11466575474 023564 0ustar00macstaff000000 000000 zope.mimetype-1.3.1/src/zope.mimetype.egg-info/requires.txt000644 000766 000024 00000000371 11466575500 023725 0ustar00macstaff000000 000000 setuptools zope.browser zope.browserresource zope.component zope.configuration zope.contenttype>=3.5.0dev zope.event zope.formlib>=4.0 zope.i18n zope.i18nmessageid zope.interface zope.publisher zope.schema zope.security [test] zope.component 
[test]

zope.mimetype-1.3.1/src/zope.mimetype.egg-info/top_level.txt
zope

zope.mimetype-1.3.1/src/zope/__init__.py
__import__('pkg_resources').declare_namespace(__name__)

zope.mimetype-1.3.1/src/zope/mimetype/

zope.mimetype-1.3.1/src/zope/mimetype/README.txt
============================
The Zope MIME Infrastructure
============================

This package provides a way to work with MIME content types.  There
are several interfaces defined here, many of which are used primarily
to look things up based on different bits of information.

The basic idea behind this is that content objects should provide an
interface based on the actual content type they implement.  For
example, objects that represent text/xml or application/xml documents
should be marked with the `IContentTypeXml` interface.  This can
allow additional views to be registered based on the content type, or
subscribers may be registered to perform other actions based on the
content type.

One aspect of the content type that's important for all documents is
that the content type interface determines whether the object data is
interpreted as an encoded text document.  Encoded text documents, in
particular, can be decoded to obtain a single Unicode string.  The
content type interfaces for encoded text must derive from
`IContentTypeEncoded`.  (All content type interfaces derive from
`IContentType` and directly provide `IContentTypeInterface`.)

The default configuration provides direct support for a variety of
common document types found in office environments.

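As a rough illustration of the idea that an encoded text document can
be decoded into a single Unicode string once its charset is known, the
sketch below uses only the Python 3 standard library.  It does not use
the zope.mimetype API; the helper names `split_content_type` and
`decode_text` are hypothetical and exist only for this example.

```python
import codecs
from email.message import Message


def split_content_type(header):
    """Return (mime_type, charset) parsed from a Content-Type header value.

    Hypothetical helper for illustration only; ``charset`` is None when
    the header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = header
    return msg.get_content_type(), msg.get_content_charset()


def decode_text(data, header):
    """Decode ``data`` (bytes) using the charset named in ``header``.

    Raises ValueError when no charset is given, loosely mirroring the
    "charset not known" case described for encoded content; an unknown
    charset name raises LookupError from ``codecs.lookup``.
    """
    _mime_type, charset = split_content_type(header)
    if charset is None:
        raise ValueError("charset not known")
    codec = codecs.lookup(charset)
    text, _consumed = codec.decode(data)
    return text
```

Content without a charset parameter (for example
``application/octet-stream``) is deliberately not decoded here, which
matches the distinction the README draws between encoded and
non-encoded content types.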
Supported lookups ----------------- Several different queries are supported by this package: - Given a MIME type expressed as a string, the associated interface, if any, can be retrieved using:: # `mimeType` is the MIME type as a string interface = queryUtility(IContentTypeInterface, mimeType) - Given a charset name, the associated `ICodec` instance can be retrieved using:: # `charsetName` is the charset name as a string codec = queryUtility(ICharsetCodec, charsetName) - Given a codec, the preferred charset name can be retrieved using:: # `codec` is an `ICodec` instance: charsetName = getUtility(ICodecPreferredCharset, codec.name).name - Given any combination of a suggested file name, file data, and content type header, a guess at a reasonable MIME type can be made using:: # `filename` is a suggested file name, or None # `data` is uploaded data, or None # `content_type` is a Content-Type header value, or None # mimeType = getUtility(IMimeTypeGetter)( name=filename, data=data, content_type=content_type) - Given any combination of a suggested file name, file data, and content type header, a guess at a reasonable charset name can be made using:: # `filename` is a suggested file name, or None # `data` is uploaded data, or None # `content_type` is a Content-Type header value, or None # charsetName = getUtility(ICharsetGetter)( name=filename, data=data, content_type=content_type) zope.mimetype-1.3.1/src/zope/mimetype/TODO.txt000644 000766 000024 00000000234 11466575473 021151 0ustar00macstaff000000 000000 - make tests into unit tests (setting up component framework and utilities/adapters) - make the location of the CSV file configurable/extendable via ZCML zope.mimetype-1.3.1/src/zope/mimetype/__init__.py000644 000766 000024 00000000015 11466575473 021751 0ustar00macstaff000000 000000 import types zope.mimetype-1.3.1/src/zope/mimetype/character-sets.txt000644 000766 000024 00000146270 11466575473 023327 0ustar00macstaff000000 000000 
=================================================================== CHARACTER SETS (last updated 28 January 2005) These are the official names for character sets that may be used in the Internet and may be referred to in Internet documentation. These names are expressed in ANSI_X3.4-1968 which is commonly called US-ASCII or simply ASCII. The character set most commonly use in the Internet and used especially in protocol standards is US-ASCII, this is strongly encouraged. The use of the name US-ASCII is also encouraged. The character set names may be up to 40 characters taken from the printable characters of US-ASCII. However, no distinction is made between use of upper and lower case letters. The MIBenum value is a unique value for use in MIBs to identify coded character sets. The value space for MIBenum values has been divided into three regions. The first region (3-999) consists of coded character sets that have been standardized by some standard setting organization. This region is intended for standards that do not have subset implementations. The second region (1000-1999) is for the Unicode and ISO/IEC 10646 coded character sets together with a specification of a (set of) sub-repertoires that may occur. The third region (>1999) is intended for vendor specific coded character sets. Assigned MIB enum Numbers ------------------------- 0-2 Reserved 3-999 Set By Standards Organizations 1000-1999 Unicode / 10646 2000-2999 Vendor The aliases that start with "cs" have been added for use with the IANA-CHARSET-MIB as originally defined in RFC3808, and as currently maintained by IANA at http://www/iana.org/assignments/ianacharset-mib. Note that the ianacharset-mib needs to be kept in sync with this registry. These aliases that start with "cs" contain the standard numbers along with suggestive names in order to facilitate applications that want to display the names in user interfaces. 
The "cs" stands for character set and is provided for applications that need a lower case first letter but want to use mixed case thereafter that cannot contain any special characters, such as underbar ("_") and dash ("-"). If the character set is from an ISO standard, its cs alias is the ISO standard number or name. If the character set is not from an ISO standard, but is registered with ISO (IPSJ/ITSCJ is the current ISO Registration Authority), the ISO Registry number is specified as ISOnnn followed by letters suggestive of the name or standards number of the code set. When a national or international standard is revised, the year of revision is added to the cs alias of the new character set entry in the IANA Registry in order to distinguish the revised character set from the original character set. Character Set Reference ------------- --------- Name: ANSI_X3.4-1968 [RFC1345,KXS2] MIBenum: 3 Source: ECMA registry Alias: iso-ir-6 Alias: ANSI_X3.4-1986 Alias: ISO_646.irv:1991 Alias: ASCII Alias: ISO646-US Alias: US-ASCII (preferred MIME name) Alias: us Alias: IBM367 Alias: cp367 Alias: csASCII Name: ISO-10646-UTF-1 MIBenum: 27 Source: Universal Transfer Format (1), this is the multibyte encoding, that subsets ASCII-7. It does not have byte ordering issues. 
Alias: csISO10646UTF1 Name: ISO_646.basic:1983 [RFC1345,KXS2] MIBenum: 28 Source: ECMA registry Alias: ref Alias: csISO646basic1983 Name: INVARIANT [RFC1345,KXS2] MIBenum: 29 Alias: csINVARIANT Name: ISO_646.irv:1983 [RFC1345,KXS2] MIBenum: 30 Source: ECMA registry Alias: iso-ir-2 Alias: irv Alias: csISO2IntlRefVersion Name: BS_4730 [RFC1345,KXS2] MIBenum: 20 Source: ECMA registry Alias: iso-ir-4 Alias: ISO646-GB Alias: gb Alias: uk Alias: csISO4UnitedKingdom Name: NATS-SEFI [RFC1345,KXS2] MIBenum: 31 Source: ECMA registry Alias: iso-ir-8-1 Alias: csNATSSEFI Name: NATS-SEFI-ADD [RFC1345,KXS2] MIBenum: 32 Source: ECMA registry Alias: iso-ir-8-2 Alias: csNATSSEFIADD Name: NATS-DANO [RFC1345,KXS2] MIBenum: 33 Source: ECMA registry Alias: iso-ir-9-1 Alias: csNATSDANO Name: NATS-DANO-ADD [RFC1345,KXS2] MIBenum: 34 Source: ECMA registry Alias: iso-ir-9-2 Alias: csNATSDANOADD Name: SEN_850200_B [RFC1345,KXS2] MIBenum: 35 Source: ECMA registry Alias: iso-ir-10 Alias: FI Alias: ISO646-FI Alias: ISO646-SE Alias: se Alias: csISO10Swedish Name: SEN_850200_C [RFC1345,KXS2] MIBenum: 21 Source: ECMA registry Alias: iso-ir-11 Alias: ISO646-SE2 Alias: se2 Alias: csISO11SwedishForNames Name: KS_C_5601-1987 [RFC1345,KXS2] MIBenum: 36 Source: ECMA registry Alias: iso-ir-149 Alias: KS_C_5601-1989 Alias: KSC_5601 Alias: korean Alias: csKSC56011987 Name: ISO-2022-KR (preferred MIME name) [RFC1557,Choi] MIBenum: 37 Source: RFC-1557 (see also KS_C_5601-1987) Alias: csISO2022KR Name: EUC-KR (preferred MIME name) [RFC1557,Choi] MIBenum: 38 Source: RFC-1557 (see also KS_C_5861-1992) Alias: csEUCKR Name: ISO-2022-JP (preferred MIME name) [RFC1468,Murai] MIBenum: 39 Source: RFC-1468 (see also RFC-2237) Alias: csISO2022JP Name: ISO-2022-JP-2 (preferred MIME name) [RFC1554,Ohta] MIBenum: 40 Source: RFC-1554 Alias: csISO2022JP2 Name: ISO-2022-CN [RFC1922] MIBenum: 104 Source: RFC-1922 Name: ISO-2022-CN-EXT [RFC1922] MIBenum: 105 Source: RFC-1922 Name: JIS_C6220-1969-jp [RFC1345,KXS2] MIBenum: 41 
Source: ECMA registry Alias: JIS_C6220-1969 Alias: iso-ir-13 Alias: katakana Alias: x0201-7 Alias: csISO13JISC6220jp Name: JIS_C6220-1969-ro [RFC1345,KXS2] MIBenum: 42 Source: ECMA registry Alias: iso-ir-14 Alias: jp Alias: ISO646-JP Alias: csISO14JISC6220ro Name: IT [RFC1345,KXS2] MIBenum: 22 Source: ECMA registry Alias: iso-ir-15 Alias: ISO646-IT Alias: csISO15Italian Name: PT [RFC1345,KXS2] MIBenum: 43 Source: ECMA registry Alias: iso-ir-16 Alias: ISO646-PT Alias: csISO16Portuguese Name: ES [RFC1345,KXS2] MIBenum: 23 Source: ECMA registry Alias: iso-ir-17 Alias: ISO646-ES Alias: csISO17Spanish Name: greek7-old [RFC1345,KXS2] MIBenum: 44 Source: ECMA registry Alias: iso-ir-18 Alias: csISO18Greek7Old Name: latin-greek [RFC1345,KXS2] MIBenum: 45 Source: ECMA registry Alias: iso-ir-19 Alias: csISO19LatinGreek Name: DIN_66003 [RFC1345,KXS2] MIBenum: 24 Source: ECMA registry Alias: iso-ir-21 Alias: de Alias: ISO646-DE Alias: csISO21German Name: NF_Z_62-010_(1973) [RFC1345,KXS2] MIBenum: 46 Source: ECMA registry Alias: iso-ir-25 Alias: ISO646-FR1 Alias: csISO25French Name: Latin-greek-1 [RFC1345,KXS2] MIBenum: 47 Source: ECMA registry Alias: iso-ir-27 Alias: csISO27LatinGreek1 Name: ISO_5427 [RFC1345,KXS2] MIBenum: 48 Source: ECMA registry Alias: iso-ir-37 Alias: csISO5427Cyrillic Name: JIS_C6226-1978 [RFC1345,KXS2] MIBenum: 49 Source: ECMA registry Alias: iso-ir-42 Alias: csISO42JISC62261978 Name: BS_viewdata [RFC1345,KXS2] MIBenum: 50 Source: ECMA registry Alias: iso-ir-47 Alias: csISO47BSViewdata Name: INIS [RFC1345,KXS2] MIBenum: 51 Source: ECMA registry Alias: iso-ir-49 Alias: csISO49INIS Name: INIS-8 [RFC1345,KXS2] MIBenum: 52 Source: ECMA registry Alias: iso-ir-50 Alias: csISO50INIS8 Name: INIS-cyrillic [RFC1345,KXS2] MIBenum: 53 Source: ECMA registry Alias: iso-ir-51 Alias: csISO51INISCyrillic Name: ISO_5427:1981 [RFC1345,KXS2] MIBenum: 54 Source: ECMA registry Alias: iso-ir-54 Alias: ISO5427Cyrillic1981 Name: ISO_5428:1980 [RFC1345,KXS2] MIBenum: 55 Source: 
ECMA registry Alias: iso-ir-55 Alias: csISO5428Greek Name: GB_1988-80 [RFC1345,KXS2] MIBenum: 56 Source: ECMA registry Alias: iso-ir-57 Alias: cn Alias: ISO646-CN Alias: csISO57GB1988 Name: GB_2312-80 [RFC1345,KXS2] MIBenum: 57 Source: ECMA registry Alias: iso-ir-58 Alias: chinese Alias: csISO58GB231280 Name: NS_4551-1 [RFC1345,KXS2] MIBenum: 25 Source: ECMA registry Alias: iso-ir-60 Alias: ISO646-NO Alias: no Alias: csISO60DanishNorwegian Alias: csISO60Norwegian1 Name: NS_4551-2 [RFC1345,KXS2] MIBenum: 58 Source: ECMA registry Alias: ISO646-NO2 Alias: iso-ir-61 Alias: no2 Alias: csISO61Norwegian2 Name: NF_Z_62-010 [RFC1345,KXS2] MIBenum: 26 Source: ECMA registry Alias: iso-ir-69 Alias: ISO646-FR Alias: fr Alias: csISO69French Name: videotex-suppl [RFC1345,KXS2] MIBenum: 59 Source: ECMA registry Alias: iso-ir-70 Alias: csISO70VideotexSupp1 Name: PT2 [RFC1345,KXS2] MIBenum: 60 Source: ECMA registry Alias: iso-ir-84 Alias: ISO646-PT2 Alias: csISO84Portuguese2 Name: ES2 [RFC1345,KXS2] MIBenum: 61 Source: ECMA registry Alias: iso-ir-85 Alias: ISO646-ES2 Alias: csISO85Spanish2 Name: MSZ_7795.3 [RFC1345,KXS2] MIBenum: 62 Source: ECMA registry Alias: iso-ir-86 Alias: ISO646-HU Alias: hu Alias: csISO86Hungarian Name: JIS_C6226-1983 [RFC1345,KXS2] MIBenum: 63 Source: ECMA registry Alias: iso-ir-87 Alias: x0208 Alias: JIS_X0208-1983 Alias: csISO87JISX0208 Name: greek7 [RFC1345,KXS2] MIBenum: 64 Source: ECMA registry Alias: iso-ir-88 Alias: csISO88Greek7 Name: ASMO_449 [RFC1345,KXS2] MIBenum: 65 Source: ECMA registry Alias: ISO_9036 Alias: arabic7 Alias: iso-ir-89 Alias: csISO89ASMO449 Name: iso-ir-90 [RFC1345,KXS2] MIBenum: 66 Source: ECMA registry Alias: csISO90 Name: JIS_C6229-1984-a [RFC1345,KXS2] MIBenum: 67 Source: ECMA registry Alias: iso-ir-91 Alias: jp-ocr-a Alias: csISO91JISC62291984a Name: JIS_C6229-1984-b [RFC1345,KXS2] MIBenum: 68 Source: ECMA registry Alias: iso-ir-92 Alias: ISO646-JP-OCR-B Alias: jp-ocr-b Alias: csISO92JISC62991984b Name: JIS_C6229-1984-b-add 
[RFC1345,KXS2] MIBenum: 69 Source: ECMA registry Alias: iso-ir-93 Alias: jp-ocr-b-add Alias: csISO93JIS62291984badd Name: JIS_C6229-1984-hand [RFC1345,KXS2] MIBenum: 70 Source: ECMA registry Alias: iso-ir-94 Alias: jp-ocr-hand Alias: csISO94JIS62291984hand Name: JIS_C6229-1984-hand-add [RFC1345,KXS2] MIBenum: 71 Source: ECMA registry Alias: iso-ir-95 Alias: jp-ocr-hand-add Alias: csISO95JIS62291984handadd Name: JIS_C6229-1984-kana [RFC1345,KXS2] MIBenum: 72 Source: ECMA registry Alias: iso-ir-96 Alias: csISO96JISC62291984kana Name: ISO_2033-1983 [RFC1345,KXS2] MIBenum: 73 Source: ECMA registry Alias: iso-ir-98 Alias: e13b Alias: csISO2033 Name: ANSI_X3.110-1983 [RFC1345,KXS2] MIBenum: 74 Source: ECMA registry Alias: iso-ir-99 Alias: CSA_T500-1983 Alias: NAPLPS Alias: csISO99NAPLPS Name: ISO_8859-1:1987 [RFC1345,KXS2] MIBenum: 4 Source: ECMA registry Alias: iso-ir-100 Alias: ISO_8859-1 Alias: ISO-8859-1 (preferred MIME name) Alias: latin1 Alias: l1 Alias: IBM819 Alias: CP819 Alias: csISOLatin1 Name: ISO_8859-2:1987 [RFC1345,KXS2] MIBenum: 5 Source: ECMA registry Alias: iso-ir-101 Alias: ISO_8859-2 Alias: ISO-8859-2 (preferred MIME name) Alias: latin2 Alias: l2 Alias: csISOLatin2 Name: T.61-7bit [RFC1345,KXS2] MIBenum: 75 Source: ECMA registry Alias: iso-ir-102 Alias: csISO102T617bit Name: T.61-8bit [RFC1345,KXS2] MIBenum: 76 Alias: T.61 Source: ECMA registry Alias: iso-ir-103 Alias: csISO103T618bit Name: ISO_8859-3:1988 [RFC1345,KXS2] MIBenum: 6 Source: ECMA registry Alias: iso-ir-109 Alias: ISO_8859-3 Alias: ISO-8859-3 (preferred MIME name) Alias: latin3 Alias: l3 Alias: csISOLatin3 Name: ISO_8859-4:1988 [RFC1345,KXS2] MIBenum: 7 Source: ECMA registry Alias: iso-ir-110 Alias: ISO_8859-4 Alias: ISO-8859-4 (preferred MIME name) Alias: latin4 Alias: l4 Alias: csISOLatin4 Name: ECMA-cyrillic MIBenum: 77 Source: ISO registry (formerly ECMA registry) http://www.itscj.ipsj.jp/ISO-IR/111.pdf Alias: iso-ir-111 Alias: KOI8-E Alias: csISO111ECMACyrillic Name: 
CSA_Z243.4-1985-1 [RFC1345,KXS2] MIBenum: 78 Source: ECMA registry Alias: iso-ir-121 Alias: ISO646-CA Alias: csa7-1 Alias: ca Alias: csISO121Canadian1 Name: CSA_Z243.4-1985-2 [RFC1345,KXS2] MIBenum: 79 Source: ECMA registry Alias: iso-ir-122 Alias: ISO646-CA2 Alias: csa7-2 Alias: csISO122Canadian2 Name: CSA_Z243.4-1985-gr [RFC1345,KXS2] MIBenum: 80 Source: ECMA registry Alias: iso-ir-123 Alias: csISO123CSAZ24341985gr Name: ISO_8859-6:1987 [RFC1345,KXS2] MIBenum: 9 Source: ECMA registry Alias: iso-ir-127 Alias: ISO_8859-6 Alias: ISO-8859-6 (preferred MIME name) Alias: ECMA-114 Alias: ASMO-708 Alias: arabic Alias: csISOLatinArabic Name: ISO_8859-6-E [RFC1556,IANA] MIBenum: 81 Source: RFC1556 Alias: csISO88596E Alias: ISO-8859-6-E (preferred MIME name) Name: ISO_8859-6-I [RFC1556,IANA] MIBenum: 82 Source: RFC1556 Alias: csISO88596I Alias: ISO-8859-6-I (preferred MIME name) Name: ISO_8859-7:1987 [RFC1947,RFC1345,KXS2] MIBenum: 10 Source: ECMA registry Alias: iso-ir-126 Alias: ISO_8859-7 Alias: ISO-8859-7 (preferred MIME name) Alias: ELOT_928 Alias: ECMA-118 Alias: greek Alias: greek8 Alias: csISOLatinGreek Name: T.101-G2 [RFC1345,KXS2] MIBenum: 83 Source: ECMA registry Alias: iso-ir-128 Alias: csISO128T101G2 Name: ISO_8859-8:1988 [RFC1345,KXS2] MIBenum: 11 Source: ECMA registry Alias: iso-ir-138 Alias: ISO_8859-8 Alias: ISO-8859-8 (preferred MIME name) Alias: hebrew Alias: csISOLatinHebrew Name: ISO_8859-8-E [RFC1556,Nussbacher] MIBenum: 84 Source: RFC1556 Alias: csISO88598E Alias: ISO-8859-8-E (preferred MIME name) Name: ISO_8859-8-I [RFC1556,Nussbacher] MIBenum: 85 Source: RFC1556 Alias: csISO88598I Alias: ISO-8859-8-I (preferred MIME name) Name: CSN_369103 [RFC1345,KXS2] MIBenum: 86 Source: ECMA registry Alias: iso-ir-139 Alias: csISO139CSN369103 Name: JUS_I.B1.002 [RFC1345,KXS2] MIBenum: 87 Source: ECMA registry Alias: iso-ir-141 Alias: ISO646-YU Alias: js Alias: yu Alias: csISO141JUSIB1002 Name: ISO_6937-2-add [RFC1345,KXS2] MIBenum: 14 Source: ECMA registry and 
ISO 6937-2:1983 Alias: iso-ir-142 Alias: csISOTextComm Name: IEC_P27-1 [RFC1345,KXS2] MIBenum: 88 Source: ECMA registry Alias: iso-ir-143 Alias: csISO143IECP271 Name: ISO_8859-5:1988 [RFC1345,KXS2] MIBenum: 8 Source: ECMA registry Alias: iso-ir-144 Alias: ISO_8859-5 Alias: ISO-8859-5 (preferred MIME name) Alias: cyrillic Alias: csISOLatinCyrillic Name: JUS_I.B1.003-serb [RFC1345,KXS2] MIBenum: 89 Source: ECMA registry Alias: iso-ir-146 Alias: serbian Alias: csISO146Serbian Name: JUS_I.B1.003-mac [RFC1345,KXS2] MIBenum: 90 Source: ECMA registry Alias: macedonian Alias: iso-ir-147 Alias: csISO147Macedonian Name: ISO_8859-9:1989 [RFC1345,KXS2] MIBenum: 12 Source: ECMA registry Alias: iso-ir-148 Alias: ISO_8859-9 Alias: ISO-8859-9 (preferred MIME name) Alias: latin5 Alias: l5 Alias: csISOLatin5 Name: greek-ccitt [RFC1345,KXS2] MIBenum: 91 Source: ECMA registry Alias: iso-ir-150 Alias: csISO150 Alias: csISO150GreekCCITT Name: NC_NC00-10:81 [RFC1345,KXS2] MIBenum: 92 Source: ECMA registry Alias: cuba Alias: iso-ir-151 Alias: ISO646-CU Alias: csISO151Cuba Name: ISO_6937-2-25 [RFC1345,KXS2] MIBenum: 93 Source: ECMA registry Alias: iso-ir-152 Alias: csISO6937Add Name: GOST_19768-74 [RFC1345,KXS2] MIBenum: 94 Source: ECMA registry Alias: ST_SEV_358-88 Alias: iso-ir-153 Alias: csISO153GOST1976874 Name: ISO_8859-supp [RFC1345,KXS2] MIBenum: 95 Source: ECMA registry Alias: iso-ir-154 Alias: latin1-2-5 Alias: csISO8859Supp Name: ISO_10367-box [RFC1345,KXS2] MIBenum: 96 Source: ECMA registry Alias: iso-ir-155 Alias: csISO10367Box Name: ISO-8859-10 (preferred MIME name) [RFC1345,KXS2] MIBenum: 13 Source: ECMA registry Alias: iso-ir-157 Alias: l6 Alias: ISO_8859-10:1992 Alias: csISOLatin6 Alias: latin6 Name: latin-lap [RFC1345,KXS2] MIBenum: 97 Source: ECMA registry Alias: lap Alias: iso-ir-158 Alias: csISO158Lap Name: JIS_X0212-1990 [RFC1345,KXS2] MIBenum: 98 Source: ECMA registry Alias: x0212 Alias: iso-ir-159 Alias: csISO159JISX02121990 Name: DS_2089 [RFC1345,KXS2] MIBenum: 99 
Source: Danish Standard, DS 2089, February 1974
Alias: DS2089
Alias: ISO646-DK
Alias: dk
Alias: csISO646Danish

Name: us-dk                                            [RFC1345,KXS2]
MIBenum: 100
Alias: csUSDK

Name: dk-us                                            [RFC1345,KXS2]
MIBenum: 101
Alias: csDKUS

Name: JIS_X0201                                        [RFC1345,KXS2]
MIBenum: 15
Source: JIS X 0201-1976.  One byte only, this is equivalent to
        JIS/Roman (similar to ASCII) plus eight-bit half-width
        Katakana
Alias: X0201
Alias: csHalfWidthKatakana

Name: KSC5636                                          [RFC1345,KXS2]
MIBenum: 102
Alias: ISO646-KR
Alias: csKSC5636

Name: ISO-10646-UCS-2
MIBenum: 1000
Source: the 2-octet Basic Multilingual Plane, aka Unicode
        this needs to specify network byte order: the standard
        does not specify (it is a 16-bit integer space)
Alias: csUnicode

Name: ISO-10646-UCS-4
MIBenum: 1001
Source: the full code space. (same comment about byte order,
        these are 31-bit numbers.)
Alias: csUCS4

Name: DEC-MCS                                          [RFC1345,KXS2]
MIBenum: 2008
Source: VAX/VMS User's Manual,
        Order Number: AI-Y517A-TE, April 1986.
Alias: dec
Alias: csDECMCS

Name: hp-roman8                                [HP-PCL5,RFC1345,KXS2]
MIBenum: 2004
Source: LaserJet IIP Printer User's Manual,
        HP part no 33471-90901, Hewlett-Packard, June 1989.
Alias: roman8 Alias: r8 Alias: csHPRoman8 Name: macintosh [RFC1345,KXS2] MIBenum: 2027 Source: The Unicode Standard ver1.0, ISBN 0-201-56788-1, Oct 1991 Alias: mac Alias: csMacintosh Name: IBM037 [RFC1345,KXS2] MIBenum: 2028 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp037 Alias: ebcdic-cp-us Alias: ebcdic-cp-ca Alias: ebcdic-cp-wt Alias: ebcdic-cp-nl Alias: csIBM037 Name: IBM038 [RFC1345,KXS2] MIBenum: 2029 Source: IBM 3174 Character Set Ref, GA27-3831-02, March 1990 Alias: EBCDIC-INT Alias: cp038 Alias: csIBM038 Name: IBM273 [RFC1345,KXS2] MIBenum: 2030 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP273 Alias: csIBM273 Name: IBM274 [RFC1345,KXS2] MIBenum: 2031 Source: IBM 3174 Character Set Ref, GA27-3831-02, March 1990 Alias: EBCDIC-BE Alias: CP274 Alias: csIBM274 Name: IBM275 [RFC1345,KXS2] MIBenum: 2032 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: EBCDIC-BR Alias: cp275 Alias: csIBM275 Name: IBM277 [RFC1345,KXS2] MIBenum: 2033 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: EBCDIC-CP-DK Alias: EBCDIC-CP-NO Alias: csIBM277 Name: IBM278 [RFC1345,KXS2] MIBenum: 2034 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP278 Alias: ebcdic-cp-fi Alias: ebcdic-cp-se Alias: csIBM278 Name: IBM280 [RFC1345,KXS2] MIBenum: 2035 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP280 Alias: ebcdic-cp-it Alias: csIBM280 Name: IBM281 [RFC1345,KXS2] MIBenum: 2036 Source: IBM 3174 Character Set Ref, GA27-3831-02, March 1990 Alias: EBCDIC-JP-E Alias: cp281 Alias: csIBM281 Name: IBM284 [RFC1345,KXS2] MIBenum: 2037 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP284 Alias: ebcdic-cp-es Alias: csIBM284 Name: IBM285 [RFC1345,KXS2] MIBenum: 2038 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP285 Alias: ebcdic-cp-gb Alias: csIBM285 Name: IBM290 [RFC1345,KXS2] MIBenum: 2039 Source: IBM 3174 Character Set Ref, GA27-3831-02, March 1990 Alias: cp290 Alias: EBCDIC-JP-kana Alias: csIBM290 Name: IBM297 [RFC1345,KXS2] 
MIBenum: 2040 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp297 Alias: ebcdic-cp-fr Alias: csIBM297 Name: IBM420 [RFC1345,KXS2] MIBenum: 2041 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990, IBM NLS RM p 11-11 Alias: cp420 Alias: ebcdic-cp-ar1 Alias: csIBM420 Name: IBM423 [RFC1345,KXS2] MIBenum: 2042 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp423 Alias: ebcdic-cp-gr Alias: csIBM423 Name: IBM424 [RFC1345,KXS2] MIBenum: 2043 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp424 Alias: ebcdic-cp-he Alias: csIBM424 Name: IBM437 [RFC1345,KXS2] MIBenum: 2011 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp437 Alias: 437 Alias: csPC8CodePage437 Name: IBM500 [RFC1345,KXS2] MIBenum: 2044 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP500 Alias: ebcdic-cp-be Alias: ebcdic-cp-ch Alias: csIBM500 Name: IBM775 [HP-PCL5] MIBenum: 2087 Source: HP PCL 5 Comparison Guide (P/N 5021-0329) pp B-13, 1996 Alias: cp775 Alias: csPC775Baltic Name: IBM850 [RFC1345,KXS2] MIBenum: 2009 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp850 Alias: 850 Alias: csPC850Multilingual Name: IBM851 [RFC1345,KXS2] MIBenum: 2045 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp851 Alias: 851 Alias: csIBM851 Name: IBM852 [RFC1345,KXS2] MIBenum: 2010 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp852 Alias: 852 Alias: csPCp852 Name: IBM855 [RFC1345,KXS2] MIBenum: 2046 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp855 Alias: 855 Alias: csIBM855 Name: IBM857 [RFC1345,KXS2] MIBenum: 2047 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp857 Alias: 857 Alias: csIBM857 Name: IBM860 [RFC1345,KXS2] MIBenum: 2048 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp860 Alias: 860 Alias: csIBM860 Name: IBM861 [RFC1345,KXS2] MIBenum: 2049 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp861 Alias: 861 Alias: cp-is Alias: csIBM861 Name: IBM862 [RFC1345,KXS2] MIBenum: 2013 Source: IBM NLS RM Vol2 
SE09-8002-01, March 1990 Alias: cp862 Alias: 862 Alias: csPC862LatinHebrew Name: IBM863 [RFC1345,KXS2] MIBenum: 2050 Source: IBM Keyboard layouts and code pages, PN 07G4586 June 1991 Alias: cp863 Alias: 863 Alias: csIBM863 Name: IBM864 [RFC1345,KXS2] MIBenum: 2051 Source: IBM Keyboard layouts and code pages, PN 07G4586 June 1991 Alias: cp864 Alias: csIBM864 Name: IBM865 [RFC1345,KXS2] MIBenum: 2052 Source: IBM DOS 3.3 Ref (Abridged), 94X9575 (Feb 1987) Alias: cp865 Alias: 865 Alias: csIBM865 Name: IBM866 [Pond] MIBenum: 2086 Source: IBM NLDG Volume 2 (SE09-8002-03) August 1994 Alias: cp866 Alias: 866 Alias: csIBM866 Name: IBM868 [RFC1345,KXS2] MIBenum: 2053 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP868 Alias: cp-ar Alias: csIBM868 Name: IBM869 [RFC1345,KXS2] MIBenum: 2054 Source: IBM Keyboard layouts and code pages, PN 07G4586 June 1991 Alias: cp869 Alias: 869 Alias: cp-gr Alias: csIBM869 Name: IBM870 [RFC1345,KXS2] MIBenum: 2055 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP870 Alias: ebcdic-cp-roece Alias: ebcdic-cp-yu Alias: csIBM870 Name: IBM871 [RFC1345,KXS2] MIBenum: 2056 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP871 Alias: ebcdic-cp-is Alias: csIBM871 Name: IBM880 [RFC1345,KXS2] MIBenum: 2057 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp880 Alias: EBCDIC-Cyrillic Alias: csIBM880 Name: IBM891 [RFC1345,KXS2] MIBenum: 2058 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp891 Alias: csIBM891 Name: IBM903 [RFC1345,KXS2] MIBenum: 2059 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp903 Alias: csIBM903 Name: IBM904 [RFC1345,KXS2] MIBenum: 2060 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp904 Alias: 904 Alias: csIBBM904 Name: IBM905 [RFC1345,KXS2] MIBenum: 2061 Source: IBM 3174 Character Set Ref, GA27-3831-02, March 1990 Alias: CP905 Alias: ebcdic-cp-tr Alias: csIBM905 Name: IBM918 [RFC1345,KXS2] MIBenum: 2062 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP918 Alias: 
ebcdic-cp-ar2 Alias: csIBM918 Name: IBM1026 [RFC1345,KXS2] MIBenum: 2063 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP1026 Alias: csIBM1026 Name: EBCDIC-AT-DE [RFC1345,KXS2] MIBenum: 2064 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csIBMEBCDICATDE Name: EBCDIC-AT-DE-A [RFC1345,KXS2] MIBenum: 2065 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICATDEA Name: EBCDIC-CA-FR [RFC1345,KXS2] MIBenum: 2066 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICCAFR Name: EBCDIC-DK-NO [RFC1345,KXS2] MIBenum: 2067 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICDKNO Name: EBCDIC-DK-NO-A [RFC1345,KXS2] MIBenum: 2068 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICDKNOA Name: EBCDIC-FI-SE [RFC1345,KXS2] MIBenum: 2069 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICFISE Name: EBCDIC-FI-SE-A [RFC1345,KXS2] MIBenum: 2070 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICFISEA Name: EBCDIC-FR [RFC1345,KXS2] MIBenum: 2071 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICFR Name: EBCDIC-IT [RFC1345,KXS2] MIBenum: 2072 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICIT Name: EBCDIC-PT [RFC1345,KXS2] MIBenum: 2073 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICPT Name: EBCDIC-ES [RFC1345,KXS2] MIBenum: 2074 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICES Name: EBCDIC-ES-A [RFC1345,KXS2] MIBenum: 2075 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICESA Name: EBCDIC-ES-S [RFC1345,KXS2] MIBenum: 2076 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICESS Name: EBCDIC-UK [RFC1345,KXS2] MIBenum: 2077 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICUK Name: EBCDIC-US [RFC1345,KXS2] MIBenum: 2078 Source: IBM 
3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICUS Name: UNKNOWN-8BIT [RFC1428] MIBenum: 2079 Alias: csUnknown8BiT Name: MNEMONIC [RFC1345,KXS2] MIBenum: 2080 Source: RFC 1345, also known as "mnemonic+ascii+38" Alias: csMnemonic Name: MNEM [RFC1345,KXS2] MIBenum: 2081 Source: RFC 1345, also known as "mnemonic+ascii+8200" Alias: csMnem Name: VISCII [RFC1456] MIBenum: 2082 Source: RFC 1456 Alias: csVISCII Name: VIQR [RFC1456] MIBenum: 2083 Source: RFC 1456 Alias: csVIQR Name: KOI8-R (preferred MIME name) [RFC1489] MIBenum: 2084 Source: RFC 1489, based on GOST-19768-74, ISO-6937/8, INIS-Cyrillic, ISO-5427. Alias: csKOI8R Name: KOI8-U [RFC2319] MIBenum: 2088 Source: RFC 2319 Name: IBM00858 MIBenum: 2089 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM00858) [Mahdi] Alias: CCSID00858 Alias: CP00858 Alias: PC-Multilingual-850+euro Name: IBM00924 MIBenum: 2090 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM00924) [Mahdi] Alias: CCSID00924 Alias: CP00924 Alias: ebcdic-Latin9--euro Name: IBM01140 MIBenum: 2091 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01140) [Mahdi] Alias: CCSID01140 Alias: CP01140 Alias: ebcdic-us-37+euro Name: IBM01141 MIBenum: 2092 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01141) [Mahdi] Alias: CCSID01141 Alias: CP01141 Alias: ebcdic-de-273+euro Name: IBM01142 MIBenum: 2093 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01142) [Mahdi] Alias: CCSID01142 Alias: CP01142 Alias: ebcdic-dk-277+euro Alias: ebcdic-no-277+euro Name: IBM01143 MIBenum: 2094 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01143) [Mahdi] Alias: CCSID01143 Alias: CP01143 Alias: ebcdic-fi-278+euro Alias: ebcdic-se-278+euro Name: IBM01144 MIBenum: 2095 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01144) [Mahdi] Alias: CCSID01144 Alias: CP01144 Alias: ebcdic-it-280+euro Name: IBM01145 MIBenum: 2096 Source: IBM See 
(http://www.iana.org/assignments/charset-reg/IBM01145) [Mahdi] Alias: CCSID01145 Alias: CP01145 Alias: ebcdic-es-284+euro Name: IBM01146 MIBenum: 2097 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01146) [Mahdi] Alias: CCSID01146 Alias: CP01146 Alias: ebcdic-gb-285+euro Name: IBM01147 MIBenum: 2098 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01147) [Mahdi] Alias: CCSID01147 Alias: CP01147 Alias: ebcdic-fr-297+euro Name: IBM01148 MIBenum: 2099 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01148) [Mahdi] Alias: CCSID01148 Alias: CP01148 Alias: ebcdic-international-500+euro Name: IBM01149 MIBenum: 2100 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01149) [Mahdi] Alias: CCSID01149 Alias: CP01149 Alias: ebcdic-is-871+euro Name: Big5-HKSCS [Yick] MIBenum: 2101 Source: See (http://www.iana.org/assignments/charset-reg/Big5-HKSCS) Alias: None Name: IBM1047 [Robrigado] MIBenum: 2102 Source: IBM1047 (EBCDIC Latin 1/Open Systems) http://www-1.ibm.com/servers/eserver/iseries/software/globalization/pdf/cp01047z.pdf Alias: IBM-1047 Name: PTCP154 [Uskov] MIBenum: 2103 Source: See (http://www.iana.org/assignments/charset-reg/PTCP154) Alias: csPTCP154 Alias: PT154 Alias: CP154 Alias: Cyrillic-Asian Name: Amiga-1251 MIBenum: 2104 Source: See (http://www.amiga.ultranet.ru/Amiga-1251.html) Alias: Ami1251 Alias: Amiga1251 Alias: Ami-1251 (Aliases are provided for historical reasons and should not be used) [Malyshev] Name: KOI7-switched MIBenum: 2105 Source: See Aliases: None Name: UNICODE-1-1 [RFC1641] MIBenum: 1010 Source: RFC 1641 Alias: csUnicode11 Name: SCSU MIBenum: 1011 Source: SCSU See (http://www.iana.org/assignments/charset-reg/SCSU) [Scherer] Alias: None Name: UTF-7 [RFC2152] MIBenum: 1012 Source: RFC 2152 Alias: None Name: UTF-16BE [RFC2781] MIBenum: 1013 Source: RFC 2781 Alias: None Name: UTF-16LE [RFC2781] MIBenum: 1014 Source: RFC 2781 Alias: None Name: UTF-16 [RFC2781] MIBenum: 1015 Source: RFC 2781 
Alias: None Name: CESU-8 [Phipps] MIBenum: 1016 Source: Alias: csCESU-8 Name: UTF-32 [Davis] MIBenum: 1017 Source: Alias: None Name: UTF-32BE [Davis] MIBenum: 1018 Source: Alias: None Name: UTF-32LE [Davis] MIBenum: 1019 Source: Alias: None Name: BOCU-1 [Scherer] MIBenum: 1020 Source: http://www.unicode.org/notes/tn6/ Alias: csBOCU-1 Name: UNICODE-1-1-UTF-7 [RFC1642] MIBenum: 103 Source: RFC 1642 Alias: csUnicode11UTF7 Name: UTF-8 [RFC3629] MIBenum: 106 Source: RFC 3629 Alias: None Name: ISO-8859-13 MIBenum: 109 Source: ISO See (http://www.iana.org/assignments/charset-reg/iso-8859-13)[Tumasonis] Alias: None Name: ISO-8859-14 MIBenum: 110 Source: ISO See (http://www.iana.org/assignments/charset-reg/iso-8859-14) [Simonsen] Alias: iso-ir-199 Alias: ISO_8859-14:1998 Alias: ISO_8859-14 Alias: latin8 Alias: iso-celtic Alias: l8 Name: ISO-8859-15 MIBenum: 111 Source: ISO Please see: Alias: ISO_8859-15 Alias: Latin-9 Name: ISO-8859-16 MIBenum: 112 Source: ISO Alias: iso-ir-226 Alias: ISO_8859-16:2001 Alias: ISO_8859-16 Alias: latin10 Alias: l10 Name: GBK MIBenum: 113 Source: Chinese IT Standardization Technical Committee Please see: Alias: CP936 Alias: MS936 Alias: windows-936 Name: GB18030 MIBenum: 114 Source: Chinese IT Standardization Technical Committee Please see: Alias: None Name: OSD_EBCDIC_DF04_15 MIBenum: 115 Source: Fujitsu-Siemens standard mainframe EBCDIC encoding Please see: Alias: None Name: OSD_EBCDIC_DF03_IRV MIBenum: 116 Source: Fujitsu-Siemens standard mainframe EBCDIC encoding Please see: Alias: None Name: OSD_EBCDIC_DF04_1 MIBenum: 117 Source: Fujitsu-Siemens standard mainframe EBCDIC encoding Please see: Alias: None Name: JIS_Encoding MIBenum: 16 Source: JIS X 0202-1991. Uses ISO 2022 escape sequences to shift code sets as documented in JIS X 0202-1991. Alias: csJISEncoding Name: Shift_JIS (preferred MIME name) MIBenum: 17 Source: This charset is an extension of csHalfWidthKatakana by adding graphic characters in JIS X 0208. 
        The CCS's are JIS X0201:1997 and JIS X0208:1997.  The
        complete definition is shown in Appendix 1 of JIS
        X0208:1997.
        This charset can be used for the top-level media type "text".
Alias: MS_Kanji
Alias: csShiftJIS

Name: Extended_UNIX_Code_Packed_Format_for_Japanese
MIBenum: 18
Source: Standardized by OSF, UNIX International, and UNIX Systems
        Laboratories Pacific.  Uses ISO 2022 rules to select
               code set 0: US-ASCII (a single 7-bit byte set)
               code set 1: JIS X0208-1990 (a double 8-bit byte set)
                           restricted to A0-FF in both bytes
               code set 2: Half Width Katakana (a single 7-bit byte set)
                           requiring SS2 as the character prefix
               code set 3: JIS X0212-1990 (a double 7-bit byte set)
                           restricted to A0-FF in both bytes
                           requiring SS3 as the character prefix
Alias: csEUCPkdFmtJapanese
Alias: EUC-JP  (preferred MIME name)

Name: Extended_UNIX_Code_Fixed_Width_for_Japanese
MIBenum: 19
Source: Used in Japan.  Each character is 2 octets.
                code set 0: US-ASCII (a single 7-bit byte set)
                              1st byte = 00
                              2nd byte = 20-7E
                code set 1: JIS X0208-1990 (a double 7-bit byte set)
                            restricted to A0-FF in both bytes
                code set 2: Half Width Katakana (a single 7-bit byte set)
                              1st byte = 00
                              2nd byte = A0-FF
                code set 3: JIS X0212-1990 (a double 7-bit byte set)
                            restricted to A0-FF in
                            the first byte
                            and 21-7E in the second byte
Alias: csEUCFixWidJapanese

Name: ISO-10646-UCS-Basic
MIBenum: 1002
Source: ASCII subset of Unicode.  Basic Latin = collection 1
        See ISO 10646, Appendix A
Alias: csUnicodeASCII

Name: ISO-10646-Unicode-Latin1
MIBenum: 1003
Source: ISO Latin-1 subset of Unicode.  Basic Latin and Latin-1
        Supplement = collections 1 and 2.  See ISO 10646,
        Appendix A.  See RFC 1815.
Alias: csUnicodeLatin1
Alias: ISO-10646

Name: ISO-10646-J-1
Source: ISO 10646 Japanese, see RFC 1815.
Name: ISO-Unicode-IBM-1261 MIBenum: 1005 Source: IBM Latin-2, -3, -5, Extended Presentation Set, GCSGID: 1261 Alias: csUnicodeIBM1261 Name: ISO-Unicode-IBM-1268 MIBenum: 1006 Source: IBM Latin-4 Extended Presentation Set, GCSGID: 1268 Alias: csUnicodeIBM1268 Name: ISO-Unicode-IBM-1276 MIBenum: 1007 Source: IBM Cyrillic Greek Extended Presentation Set, GCSGID: 1276 Alias: csUnicodeIBM1276 Name: ISO-Unicode-IBM-1264 MIBenum: 1008 Source: IBM Arabic Presentation Set, GCSGID: 1264 Alias: csUnicodeIBM1264 Name: ISO-Unicode-IBM-1265 MIBenum: 1009 Source: IBM Hebrew Presentation Set, GCSGID: 1265 Alias: csUnicodeIBM1265 Name: ISO-8859-1-Windows-3.0-Latin-1 [HP-PCL5] MIBenum: 2000 Source: Extended ISO 8859-1 Latin-1 for Windows 3.0. PCL Symbol Set id: 9U Alias: csWindows30Latin1 Name: ISO-8859-1-Windows-3.1-Latin-1 [HP-PCL5] MIBenum: 2001 Source: Extended ISO 8859-1 Latin-1 for Windows 3.1. PCL Symbol Set id: 19U Alias: csWindows31Latin1 Name: ISO-8859-2-Windows-Latin-2 [HP-PCL5] MIBenum: 2002 Source: Extended ISO 8859-2. Latin-2 for Windows 3.1. PCL Symbol Set id: 9E Alias: csWindows31Latin2 Name: ISO-8859-9-Windows-Latin-5 [HP-PCL5] MIBenum: 2003 Source: Extended ISO 8859-9. Latin-5 for Windows 3.1 PCL Symbol Set id: 5T Alias: csWindows31Latin5 Name: Adobe-Standard-Encoding [Adobe] MIBenum: 2005 Source: PostScript Language Reference Manual PCL Symbol Set id: 10J Alias: csAdobeStandardEncoding Name: Ventura-US [HP-PCL5] MIBenum: 2006 Source: Ventura US. ASCII plus characters typically used in publishing, like pilcrow, copyright, registered, trade mark, section, dagger, and double dagger in the range A0 (hex) to FF (hex). PCL Symbol Set id: 14J Alias: csVenturaUS Name: Ventura-International [HP-PCL5] MIBenum: 2007 Source: Ventura International. ASCII plus coded characters similar to Roman8. 
PCL Symbol Set id: 13J Alias: csVenturaInternational Name: PC8-Danish-Norwegian [HP-PCL5] MIBenum: 2012 Source: PC Danish Norwegian 8-bit PC set for Danish Norwegian PCL Symbol Set id: 11U Alias: csPC8DanishNorwegian Name: PC8-Turkish [HP-PCL5] MIBenum: 2014 Source: PC Latin Turkish. PCL Symbol Set id: 9T Alias: csPC8Turkish Name: IBM-Symbols [IBM-CIDT] MIBenum: 2015 Source: Presentation Set, CPGID: 259 Alias: csIBMSymbols Name: IBM-Thai [IBM-CIDT] MIBenum: 2016 Source: Presentation Set, CPGID: 838 Alias: csIBMThai Name: HP-Legal [HP-PCL5] MIBenum: 2017 Source: PCL 5 Comparison Guide, Hewlett-Packard, HP part number 5961-0510, October 1992 PCL Symbol Set id: 1U Alias: csHPLegal Name: HP-Pi-font [HP-PCL5] MIBenum: 2018 Source: PCL 5 Comparison Guide, Hewlett-Packard, HP part number 5961-0510, October 1992 PCL Symbol Set id: 15U Alias: csHPPiFont Name: HP-Math8 [HP-PCL5] MIBenum: 2019 Source: PCL 5 Comparison Guide, Hewlett-Packard, HP part number 5961-0510, October 1992 PCL Symbol Set id: 8M Alias: csHPMath8 Name: Adobe-Symbol-Encoding [Adobe] MIBenum: 2020 Source: PostScript Language Reference Manual PCL Symbol Set id: 5M Alias: csHPPSMath Name: HP-DeskTop [HP-PCL5] MIBenum: 2021 Source: PCL 5 Comparison Guide, Hewlett-Packard, HP part number 5961-0510, October 1992 PCL Symbol Set id: 7J Alias: csHPDesktop Name: Ventura-Math [HP-PCL5] MIBenum: 2022 Source: PCL 5 Comparison Guide, Hewlett-Packard, HP part number 5961-0510, October 1992 PCL Symbol Set id: 6M Alias: csVenturaMath Name: Microsoft-Publishing [HP-PCL5] MIBenum: 2023 Source: PCL 5 Comparison Guide, Hewlett-Packard, HP part number 5961-0510, October 1992 PCL Symbol Set id: 6J Alias: csMicrosoftPublishing Name: Windows-31J MIBenum: 2024 Source: Windows Japanese. A further extension of Shift_JIS to include NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119). The CCS's are JIS X0201:1997, JIS X0208:1997, and these extensions. 
This charset can be used for the top-level media type "text", but it is of limited or specialized use (see RFC2278). PCL Symbol Set id: 19K Alias: csWindows31J Name: GB2312 (preferred MIME name) MIBenum: 2025 Source: Chinese for People's Republic of China (PRC) mixed one byte, two byte set: 20-7E = one byte ASCII A1-FE = two byte PRC Kanji See GB 2312-80 PCL Symbol Set Id: 18C Alias: csGB2312 Name: Big5 (preferred MIME name) MIBenum: 2026 Source: Chinese for Taiwan Multi-byte set. PCL Symbol Set Id: 18T Alias: csBig5 Name: windows-1250 MIBenum: 2250 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1250) [Lazhintseva] Alias: None Name: windows-1251 MIBenum: 2251 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1251) [Lazhintseva] Alias: None Name: windows-1252 MIBenum: 2252 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1252) [Wendt] Alias: None Name: windows-1253 MIBenum: 2253 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1253) [Lazhintseva] Alias: None Name: windows-1254 MIBenum: 2254 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1254) [Lazhintseva] Alias: None Name: windows-1255 MIBenum: 2255 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1255) [Lazhintseva] Alias: None Name: windows-1256 MIBenum: 2256 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1256) [Lazhintseva] Alias: None Name: windows-1257 MIBenum: 2257 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1257) [Lazhintseva] Alias: None Name: windows-1258 MIBenum: 2258 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1258) [Lazhintseva] Alias: None Name: TIS-620 MIBenum: 2259 Source: Thai Industrial Standards Institute (TISI) [Tantsetthi] Name: HZ-GB-2312 MIBenum: 2085 Source: RFC 1842, RFC 1843 [RFC1842, RFC1843] REFERENCES ---------- [RFC1345] Simonsen, K., "Character Mnemonics & Character Sets", RFC 
1345, Rationel Almen Planlaegning, Rationel Almen Planlaegning, June 1992. [RFC1428] Vaudreuil, G., "Transition of Internet Mail from Just-Send-8 to 8bit-SMTP/MIME", RFC1428, CNRI, February 1993. [RFC1456] Vietnamese Standardization Working Group, "Conventions for Encoding the Vietnamese Language VISCII: VIetnamese Standard Code for Information Interchange VIQR: VIetnamese Quoted-Readable Specification Revision 1.1", RFC 1456, May 1993. [RFC1468] Murai, J., Crispin, M., and E. van der Poel, "Japanese Character Encoding for Internet Messages", RFC 1468, Keio University, Panda Programming, June 1993. [RFC1489] Chernov, A., "Registration of a Cyrillic Character Set", RFC1489, RELCOM Development Team, July 1993. [RFC1554] Ohta, M., and K. Handa, "ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP", RFC1554, Tokyo Institute of Technology, ETL, December 1993. [RFC1556] Nussbacher, H., "Handling of Bi-directional Texts in MIME", RFC1556, Israeli Inter-University, December 1993. [RFC1557] Choi, U., Chon, K., and H. Park, "Korean Character Encoding for Internet Messages", KAIST, Solvit Chosun Media, December 1993. [RFC1641] Goldsmith, D., and M. Davis, "Using Unicode with MIME", RFC1641, Taligent, Inc., July 1994. [RFC1642] Goldsmith, D., and M. Davis, "UTF-7", RFC1642, Taligent, Inc., July 1994. [RFC1815] Ohta, M., "Character Sets ISO-10646 and ISO-10646-J-1", RFC 1815, Tokyo Institute of Technology, July 1995. [Adobe] Adobe Systems Incorporated, PostScript Language Reference Manual, second edition, Addison-Wesley Publishing Company, Inc., 1990. [ECMA Registry] ISO-IR: International Register of Escape Sequences http://www.itscj.ipsj.or.jp/ISO-IE/ Note: The current registration authority is IPSJ/ITSCJ, Japan. [HP-PCL5] Hewlett-Packard Company, "HP PCL 5 Comparison Guide", (P/N 5021-0329) pp B-13, 1996. [IBM-CIDT] IBM Corporation, "ABOUT TYPE: IBM's Technical Reference for Core Interchange Digitized Type", Publication number S544-3708-01 [RFC1842] Wei, Y., J. Li, and Y. 
Jiang, "ASCII Printable Characters-Based Chinese Character Encoding for Internet Messages", RFC 1842, Harvard University, Rice University, University of Maryland, August 1995. [RFC1843] Lee, F., "HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and ASCII Characters", RFC 1843, Stanford University, August 1995. [RFC2152] Goldsmith, D., M. Davis, "UTF-7: A Mail-Safe Transformation Format of Unicode", RFC 2152, Apple Computer, Inc., Taligent Inc., May 1997. [RFC2279] Yergeau, F., "UTF-8, A Transformation Format of ISO 10646", RFC 2279, Alis Technologies, January, 1998. [RFC2781] Hoffman, P., Yergeau, F., "UTF-16, an encoding of ISO 10646", RFC 2781, February 2000. [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC3629, November 2003. PEOPLE ------ [KXS2] Keld Simonsen [Choi] Woohyong Choi [Davis] Mark Davis, , April 2002. [Lazhintseva] Katya Lazhintseva, , May 1996. [Mahdi] Tamer Mahdi, , August 2000. [Malyshev] Michael Malyshev, , January 2004 [Murai] Jun Murai [Nussbacher] Hank Nussbacher, [Ohta] Masataka Ohta, , July 1995. [Phipps] Toby Phipps, , March 2002. [Pond] Rick Pond, , March 1997. [Robrigado] Reuel Robrigado, , September 2002. [Scherer] Markus Scherer, , August 2000, September 2002. [Simonsen] Keld Simonsen, , August 2000. [Tantsetthi] Trin Tantsetthi, , September 1998. [Tumasonis] Vladas Tumasonis, , August 2000. [Uskov] Alexander Uskov, , September 2002. [Wendt] Chris Wendt, , December 1999. [Yick] Nicky Yick, , October 2000. [] zope.mimetype-1.3.1/src/zope/mimetype/codec.py000644 000766 000024 00000011366 11466575473 021302 0ustar00macstaff000000 000000 ############################################################################## # # Copyright (c) 2005 Zope Foundation and Contributors. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. 
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. # ############################################################################## import codecs import os import re from zope import interface, component from interfaces import ICodec, ICharsetCodec from interfaces import ICharset, ICodecPreferredCharset class Codec: interface.implements(ICodec) def __init__(self, name, title): self.name = name self.title = title ( self.encode, self.decode, self.reader, self.writer ) = codecs.lookup(name) def addCodec(name, title=None): codec = Codec(name, title) component.provideUtility(codec, provides=ICodec, name=name) class Charset: interface.implements(ICharset) def __init__(self, name, encoding): self.name = name self.encoding = encoding def addCharset(encoding, name, preferred=False): codec = component.getUtility(ICodec, name=encoding) charset = Charset(name, codec.name) component.provideUtility(charset, provides=ICharset, name=name) interface.alsoProvides(codec, ICharsetCodec) component.provideUtility(codec, provides=ICharsetCodec, name=name) if preferred: utility = component.queryUtility( ICodecPreferredCharset, name=codec.name) if utility is not None: raise ValueError("Codec already has a preferred charset.") interface.alsoProvides(charset, ICodecPreferredCharset) component.provideUtility( charset, provides=ICodecPreferredCharset, name=codec.name) FILENAME = "character-sets.txt" DATA_RE = re.compile( r'(Name|Alias|MIBenum):\s*(\S+)\s*(\(preferred MIME name\))?' 
) def initialize(_context): # if any ICodec has been registered, we're done: for unused in component.getUtilitiesFor(ICodec): return _names = [] _codecs = {} _aliases = {} # alias -> codec name here = os.path.dirname(os.path.abspath(__file__)) fn = os.path.join(here, FILENAME) f = open(fn, "r") class Codec(object): preferred_alias = None def __init__(self, name): self.name = name self.aliases = [name.lower()] def findPyCodecs(self): self.pyCodecs = {} for alias in self.aliases: try: codec = codecs.lookup(alias) except LookupError: pass else: self.pyCodecs[alias] = codec for line in f: if not line.strip(): lastname = None continue m = DATA_RE.match(line) if m is None: continue type, name, preferred = m.groups() if type == "Name": if name in _codecs: raise ValueError("codec %s already exists" % name) _names.append(name) lastname = name _codecs[name] = Codec(name) if preferred: _codecs[name].preferred_alias = name.lower() elif type == "Alias" and name != "None": if not lastname: raise ValueError("Parsing failed. Alias found without a name.") name = name.lower() if name in _aliases: raise ValueError("Alias %s already exists." 
% name) codec = _codecs[lastname] codec.aliases.append(name) _aliases[name] = lastname if preferred: codec.preferred_alias = name f.close() for name in _names: codec = _codecs[name] codec.findPyCodecs() if codec.pyCodecs.get(codec.preferred_alias): pyName = codec.preferred_alias else: for pyName in codec.aliases: if pyName in codec.pyCodecs: break else: continue # not found under any name _context.action( discriminator=None, callable=addCodec, args=(pyName, codec.name), ) if not codec.preferred_alias: codec.preferred_alias = codec.aliases[0] for alias in codec.aliases: _context.action( discriminator=(pyName, alias), callable=addCharset, args=(pyName, alias, alias == codec.preferred_alias) ) zope.mimetype-1.3.1/src/zope/mimetype/codec.txt000644 000766 000024 00000005534 11466575473 021471 0ustar00macstaff000000 000000 ============== Codec handling ============== We can create codecs programmatically. Codecs are registered as utilities for ICodec with the name of their Python codec. >>> from zope import component >>> from zope.mimetype.interfaces import ICodec >>> from zope.mimetype.codec import addCodec >>> sorted(component.getUtilitiesFor(ICodec)) [] >>> addCodec('iso8859-1', 'Western (ISO-8859-1)') >>> codec = component.getUtility(ICodec, name='iso8859-1') >>> codec >>> codec.name 'iso8859-1' >>> addCodec('utf-8', 'Unicode (UTF-8)') >>> codec2 = component.getUtility(ICodec, name='utf-8') We can programmatically add charsets to a given codec. This registers each charset as a named utility for ICharset. It also registers the codec as a utility for ICharsetCodec with the name of the charset.
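The relationships between these registrations can be modelled without the component architecture. The following is only a rough sketch using plain dictionaries and the standard `codecs` module. The names `codecs_by_name`, `codec_by_charset`, `preferred_charset`, `add_codec` and `add_charset` are hypothetical stand-ins invented for this illustration; they are not part of `zope.mimetype`, which registers named utilities instead.

```python
import codecs

# Hypothetical plain-dict stand-ins for the utility registrations;
# the real package uses zope.component named utilities instead.
codecs_by_name = {}      # Python codec name -> codecs.CodecInfo
codec_by_charset = {}    # charset name -> Python codec name
preferred_charset = {}   # Python codec name -> preferred charset name


def add_codec(name):
    # codecs.lookup() raises LookupError for an unknown codec name.
    codecs_by_name[name] = codecs.lookup(name)


def add_charset(encoding, name, preferred=False):
    if encoding not in codecs_by_name:
        raise LookupError("codec %s is not registered" % encoding)
    # The charset is recorded first; the preferred check comes afterwards.
    codec_by_charset[name] = encoding
    if preferred:
        if encoding in preferred_charset:
            raise ValueError("Codec already has a preferred charset.")
        preferred_charset[encoding] = name


add_codec("iso8859-1")
add_charset("iso8859-1", "latin1")
add_charset("iso8859-1", "iso8859-1", preferred=True)
```

As in `addCharset()` itself, the charset is recorded before the preferred-charset check runs, so a charset whose `preferred=True` request is rejected can still be looked up afterwards.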
>>> from zope.mimetype.codec import addCharset >>> from zope.mimetype.interfaces import ICharset, ICharsetCodec >>> sorted(component.getUtilitiesFor(ICharset)) [] >>> sorted(component.getUtilitiesFor(ICharsetCodec)) [] >>> addCharset(codec.name, 'latin1') >>> charset = component.getUtility(ICharset, name='latin1') >>> charset >>> charset.name 'latin1' >>> component.getUtility(ICharsetCodec, name='latin1') is codec True When adding a charset we can state that we want that charset to be the preferred charset for its codec. >>> addCharset(codec.name, 'iso8859-1', preferred=True) >>> addCharset(codec2.name, 'utf-8', preferred=True) A codec can have at most one preferred charset. >>> addCharset(codec.name, 'test', preferred=True) Traceback (most recent call last): ... ValueError: Codec already has a preferred charset. Preferred charsets are registered as utilities for ICodecPreferredCharset under the name of the python codec. >>> from zope.mimetype.interfaces import ICodecPreferredCharset >>> preferred = component.getUtility(ICodecPreferredCharset, name='iso8859-1') >>> preferred >>> preferred.name 'iso8859-1' >>> sorted(component.getUtilitiesFor(ICodecPreferredCharset)) [(u'iso8859-1', ), (u'utf-8', )] We can look up a codec by the name of its charset: >>> component.getUtility(ICharsetCodec, name='latin1') is codec True >>> component.getUtility(ICharsetCodec, name='utf-8') is codec2 True Or we can look up all codecs: >>> sorted(component.getUtilitiesFor(ICharsetCodec)) [(u'iso8859-1', ), (u'latin1', ), (u'test', ), (u'utf-8', )] zope.mimetype-1.3.1/src/zope/mimetype/configure.zcml000644 000766 000024 00000002223 11466575473 022513 0ustar00macstaff000000 000000 zope.mimetype-1.3.1/src/zope/mimetype/constraints.txt000644 000766 000024 00000010243 11466575473 022754 0ustar00macstaff000000 000000 =================================== Constraint Functions for Interfaces =================================== The `zope.mimetype.interfaces` module defines interfaces that use some 
helper functions to define constraints on the accepted data. These helpers are used to determine whether values conform to what's allowed for parts of a MIME type specification and other parts of a Content-Type header as specified in RFC 2045. Single Token ------------ The first is the simplest: the `tokenConstraint()` function returns `True` if the ASCII string it is passed conforms to the `token` production in section 5.1 of the RFC. Let's import the function:: >>> from zope.mimetype.interfaces import tokenConstraint Typical tokens are the major and minor parts of the MIME type and the parameter names for the Content-Type header. The function should return `True` for these values:: >>> tokenConstraint("text") True >>> tokenConstraint("plain") True >>> tokenConstraint("charset") True The function should also return `True` for unusual but otherwise normal tokens that may be used in some situations:: >>> tokenConstraint("not-your-fathers-token") True It must also allow extension tokens and vendor-specific tokens:: >>> tokenConstraint("x-magic") True >>> tokenConstraint("vnd.zope.special-data") True Since we expect input handlers to normalize values to lower case, upper case text is not allowed:: >>> tokenConstraint("Text") False Non-ASCII text is also not allowed:: >>> tokenConstraint("\x80") False >>> tokenConstraint("\xC8") False >>> tokenConstraint("\xFF") False Note that lots of characters are allowed in tokens, and there are no constraints that the token "look like" something a person would want to read:: >>> tokenConstraint(".-.-.-.") True Other characters are disallowed, however, including all forms of whitespace:: >>> tokenConstraint("foo bar") False >>> tokenConstraint("foo\tbar") False >>> tokenConstraint("foo\nbar") False >>> tokenConstraint("foo\rbar") False >>> tokenConstraint("foo\x7Fbar") False Whitespace before or after the token is not accepted either:: >>> tokenConstraint(" text") False >>> tokenConstraint("plain ") False Other disallowed
characters are defined in the `tspecials` production from the RFC (also in section 5.1):: >>> tokenConstraint("(") False >>> tokenConstraint(")") False >>> tokenConstraint("<") False >>> tokenConstraint(">") False >>> tokenConstraint("@") False >>> tokenConstraint(",") False >>> tokenConstraint(";") False >>> tokenConstraint(":") False >>> tokenConstraint("\\") False >>> tokenConstraint('"') False >>> tokenConstraint("/") False >>> tokenConstraint("[") False >>> tokenConstraint("]") False >>> tokenConstraint("?") False >>> tokenConstraint("=") False A token must contain at least one character, so `tokenConstraint()` returns false for an empty string:: >>> tokenConstraint("") False MIME Type --------- A MIME type is specified using two tokens separated by a slash; whitespace between the tokens and the slash must be normalized away in the input handler. The `mimeTypeConstraint()` function is available to test a normalized MIME type value; let's import that function now:: >>> from zope.mimetype.interfaces import mimeTypeConstraint Let's test some common MIME types to make sure the function isn't obviously insane:: >>> mimeTypeConstraint("text/plain") True >>> mimeTypeConstraint("application/xml") True >>> mimeTypeConstraint("image/svg+xml") True If parts of the MIME type are missing, it isn't accepted:: >>> mimeTypeConstraint("text") False >>> mimeTypeConstraint("text/") False >>> mimeTypeConstraint("/plain") False As for individual tokens, whitespace is not allowed:: >>> mimeTypeConstraint("foo bar/plain") False >>> mimeTypeConstraint("text/foo bar") False Whitespace is not accepted around the slash either:: >>> mimeTypeConstraint("text /plain") False >>> mimeTypeConstraint("text/ plain") False Surrounding whitespace is also not accepted:: >>> mimeTypeConstraint(" text/plain") False >>> mimeTypeConstraint("text/plain ") False zope.mimetype-1.3.1/src/zope/mimetype/contentinfo.py000644 000766 000024 00000005634 11466575473 022554 0ustar00macstaff000000 000000 
############################################################################## # # Copyright (c) 2005 Zope Foundation and Contributors. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. # THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. # ############################################################################## """Default IContentInfo implementation. """ __docformat__ = "reStructuredText" import zope.component import zope.contenttype.parse import zope.interface import zope.mimetype.interfaces class ContentInfo(object): """Basic IContentInfo that provides information from an IContentTypeAware. """ zope.interface.implements( zope.mimetype.interfaces.IContentInfo) zope.component.adapts( zope.interface.Interface) def __init__(self, context): self.context = context aware = zope.mimetype.interfaces.IContentTypeAware(context) self.effectiveMimeType = aware.mimeType self.effectiveParameters = dict(aware.parameters) if self.effectiveParameters: encoded = zope.mimetype.interfaces.IContentTypeEncoded.providedBy( context) if "charset" in self.effectiveParameters and not encoded: del self.effectiveParameters["charset"] major, minor = self.effectiveMimeType.split("/") self.contentType = zope.contenttype.parse.join( (major, minor, self.effectiveParameters)) else: self.contentType = self.effectiveMimeType def getCodec(self): if "_codec" in self.__dict__: return self._codec isencoded = zope.mimetype.interfaces.IContentTypeEncoded.providedBy( self.context) if isencoded: charset = self.effectiveParameters.get("charset") if charset: utility = zope.component.queryUtility( zope.mimetype.interfaces.ICharset, charset) if utility is None: raise ValueError("unsupported charset: %r" % charset) 
codec = zope.component.getUtility( zope.mimetype.interfaces.ICodec, utility.encoding) self._codec = codec else: raise ValueError("charset not known") else: self._codec = None return self._codec def decode(self, s): codec = self.getCodec() if codec is not None: text, consumed = codec.decode(s) if consumed != len(s): raise ValueError("data not completely consumed") return text else: raise ValueError("no matching codec found") zope.mimetype-1.3.1/src/zope/mimetype/contentinfo.txt000644 000766 000024 00000015102 11466575473 022732 0ustar00macstaff000000 000000 =================================== Minimal IContentInfo Implementation =================================== The `zope.mimetype.contentinfo` module provides a minimal `IContentInfo` implementation that adds no information to what's provided by a content object. This represents the most conservative content-type policy that might be useful. Let's take a look at how this operates by creating a couple of concrete content-type interfaces:: >>> from zope.mimetype import interfaces >>> class ITextPlain(interfaces.IContentTypeEncoded): ... """text/plain""" >>> class IApplicationOctetStream(interfaces.IContentType): ... """application/octet-stream""" Now, we'll create a minimal content object that provides the necessary information:: >>> import zope.interface >>> class Content(object): ... zope.interface.implements(interfaces.IContentTypeAware) ... ... def __init__(self, mimeType, charset=None): ... self.mimeType = mimeType ... self.parameters = {} ... if charset: ... self.parameters["charset"] = charset We can now create examples of both encoded and non-encoded content:: >>> encoded = Content("text/plain", "utf-8") >>> zope.interface.alsoProvides(encoded, ITextPlain) >>> unencoded = Content("application/octet-stream") >>> zope.interface.alsoProvides(unencoded, IApplicationOctetStream) The minimal IContentInfo implementation only exposes the information available to it from the base content object.
Let's take a look at the unencoded content first:: >>> from zope.mimetype import contentinfo >>> ci = contentinfo.ContentInfo(unencoded) >>> ci.effectiveMimeType 'application/octet-stream' >>> ci.effectiveParameters {} >>> ci.contentType 'application/octet-stream' For unencoded content, there is never a codec:: >>> print ci.getCodec() None Decoding such content is also disallowed:: >>> ci.decode("foo") Traceback (most recent call last): ... ValueError: no matching codec found Attempting to decode data using an unencoded object causes an exception to be raised:: >>> print ci.decode("data") Traceback (most recent call last): ... ValueError: no matching codec found If we try this with encoded data, we get somewhat different behavior:: >>> ci = contentinfo.ContentInfo(encoded) >>> ci.effectiveMimeType 'text/plain' >>> ci.effectiveParameters {'charset': 'utf-8'} >>> ci.contentType 'text/plain;charset=utf-8' The `getCodec()` and `decode()` methods can be used to handle encoded data using the encoding indicated by the ``charset`` parameter. Let's store some UTF-8 data in a variable:: >>> utf8_data = unicode("\xAB\xBB", "iso-8859-1").encode("utf-8") >>> utf8_data '\xc2\xab\xc2\xbb' We want to be able to decode the data using the `IContentInfo` object. Let's try getting the corresponding `ICodec` object using `getCodec()`:: >>> codec = ci.getCodec() Traceback (most recent call last): ... ValueError: unsupported charset: 'utf-8' So, we can't proceed without some further preparation. What we need is to register an `ICharset` for UTF-8. The `ICharset` will need a reference (by name) to an `ICodec` for UTF-8. So let's create those objects and register them:: >>> import codecs >>> from zope.mimetype.i18n import _ >>> class Utf8Codec(object): ... zope.interface.implements(interfaces.ICodec) ... ... name = "utf-8" ... title = _("UTF-8") ... ... def __init__(self): ... ( self.encode, ... self.decode, ... self.reader, ... self.writer ...
) = codecs.lookup(self.name) >>> utf8_codec = Utf8Codec() >>> class Utf8Charset(object): ... zope.interface.implements(interfaces.ICharset) ... ... name = utf8_codec.name ... encoding = name >>> utf8_charset = Utf8Charset() >>> import zope.component >>> zope.component.provideUtility( ... utf8_codec, interfaces.ICodec, utf8_codec.name) >>> zope.component.provideUtility( ... utf8_charset, interfaces.ICharset, utf8_charset.name) Now that that's been initialized, let's try getting the codec again:: >>> codec = ci.getCodec() >>> codec.name 'utf-8' >>> codec.decode(utf8_data) (u'\xab\xbb', 4) We can now check that the `decode()` method of the `IContentInfo` will decode the entire data, returning the Unicode representation of the text:: >>> ci.decode(utf8_data) u'\xab\xbb' Another possibility, of course, is that you have content that you know is encoded text of some sort, but you don't actually know what encoding it's in:: >>> encoded2 = Content("text/plain") >>> zope.interface.alsoProvides(encoded2, ITextPlain) >>> ci = contentinfo.ContentInfo(encoded2) >>> ci.effectiveMimeType 'text/plain' >>> ci.effectiveParameters {} >>> ci.contentType 'text/plain' >>> ci.getCodec() Traceback (most recent call last): ... ValueError: charset not known It's also possible that the initial content type information for an object is incorrect for some reason. If the browser provides a content type of "text/plain; charset=utf-8", the content will be seen as encoded. A user correcting this content type using UI elements can cause the content to be considered unencoded. At this point, there should no longer be a charset parameter to the content type, and the content info object should reflect this, though the previous encoding information will be retained in case the content type should be changed to an encoded type in the future. Let's see how this behavior will be exhibited in this API.
We'll start by creating some encoded content:: >>> content = Content("text/plain", "utf-8") >>> zope.interface.alsoProvides(content, ITextPlain) We can see that the encoding information is included in the effective MIME type information provided by the content-info object:: >>> ci = contentinfo.ContentInfo(content) >>> ci.effectiveMimeType 'text/plain' >>> ci.effectiveParameters {'charset': 'utf-8'} We now change the content type information for the object:: >>> ifaces = zope.interface.directlyProvidedBy(content) >>> ifaces -= ITextPlain >>> ifaces += IApplicationOctetStream >>> zope.interface.directlyProvides(content, *ifaces) >>> content.mimeType = 'application/octet-stream' At this point, a new content-info object provides different information:: >>> ci = contentinfo.ContentInfo(content) >>> ci.effectiveMimeType 'application/octet-stream' >>> ci.effectiveParameters {} The underlying content type parameters still contain the original encoding information, however:: >>> content.parameters {'charset': 'utf-8'} zope.mimetype-1.3.1/src/zope/mimetype/event.py000644 000766 000024 00000004122 11466575473 021336 0ustar00macstaff000000 000000 ############################################################################## # # Copyright (c) 2005 Zope Foundation and Contributors. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. # THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. # ############################################################################## """Implementation of and support for the `IContentTypeChangedEvent`.
""" __docformat__ = "reStructuredText" import zope.event import zope.component.interfaces import zope.interface import zope.mimetype.interfaces import zope.security.proxy class ContentTypeChangedEvent(zope.component.interfaces.ObjectEvent): zope.interface.implements( zope.mimetype.interfaces.IContentTypeChangedEvent) def __init__(self, object, oldContentType, newContentType): super(ContentTypeChangedEvent, self).__init__(object) self.newContentType = newContentType self.oldContentType = oldContentType def changeContentType(object, newContentType): """Set the content type interface for the object. If this represents a change, an `IContentTypeChangedEvent` will be fired. """ ifaces = zope.interface.directlyProvidedBy(object) oldContentType = None for iface in ifaces: if zope.mimetype.interfaces.IContentTypeInterface.providedBy(iface): oldContentType = iface ifaces -= iface break if newContentType is not oldContentType: # update the interfaces for the object: if newContentType is not None: ifaces += newContentType zope.interface.directlyProvides(object, *ifaces) # fire the event: event = ContentTypeChangedEvent( zope.security.proxy.ProxyFactory(object), oldContentType, newContentType) zope.event.notify(event) zope.mimetype-1.3.1/src/zope/mimetype/event.txt000644 000766 000024 00000006373 11466575473 021537 0ustar00macstaff000000 000000 =============================== Events and content-type changes =============================== The `IContentTypeChangedEvent` is fired whenever an object's `IContentTypeInterface` is changed. This includes the cases when a content type interface is applied to an object that doesn't have one, and when the content type interface is removed from an object. Let's start the demonstration by defining a subscriber for the event that simply prints out the information from the event object:: >>> def handler(event): ... print "changed content type interface:" ... print " from:", event.oldContentType ... 
print " to:", event.newContentType We'll also define a simple content object:: >>> import zope.interface >>> class IContent(zope.interface.Interface): ... pass >>> class Content(object): ... ... zope.interface.implements(IContent) ... ... def __str__(self): ... return "" >>> obj = Content() We'll also need a couple of content type interfaces:: >>> from zope.mimetype import interfaces >>> class ITextPlain(interfaces.IContentTypeEncoded): ... """text/plain""" >>> ITextPlain.setTaggedValue("mimeTypes", ["text/plain"]) >>> ITextPlain.setTaggedValue("extensions", [".txt"]) >>> zope.interface.directlyProvides( ... ITextPlain, interfaces.IContentTypeInterface) >>> class IOctetStream(interfaces.IContentType): ... """application/octet-stream""" >>> IOctetStream.setTaggedValue("mimeTypes", ["application/octet-stream"]) >>> IOctetStream.setTaggedValue("extensions", [".bin"]) >>> zope.interface.directlyProvides( ... IOctetStream, interfaces.IContentTypeInterface) Let's register our subscriber:: >>> import zope.component >>> import zope.component.interfaces >>> zope.component.provideHandler( ... handler, ... (zope.component.interfaces.IObjectEvent,)) Changing the content type interface on an object is handled by the `zope.mimetype.event.changeContentType()` function. 
Let's import that module and demonstrate that the expected event is fired appropriately:: >>> from zope.mimetype import event Since the object currently has no content type interface, "removing" the interface does not affect the object and the event is not fired:: >>> event.changeContentType(obj, None) Setting a content type interface on an object that doesn't have one will cause the event to be fired, with the `.oldContentType` attribute on the event set to `None`:: >>> event.changeContentType(obj, ITextPlain) changed content type interface: from: None to: Calling the `changeContentType()` function again with the same "new" content type interface causes no change, so the event is not fired again:: >>> event.changeContentType(obj, ITextPlain) Providing a new interface does cause the event to be fired again:: >>> event.changeContentType(obj, IOctetStream) changed content type interface: from: to: Similarly, removing the content type interface triggers the event as well:: >>> event.changeContentType(obj, None) changed content type interface: from: to: None zope.mimetype-1.3.1/src/zope/mimetype/i18n.py000644 000766 000024 00000002136 11466575473 020777 0ustar00macstaff000000 000000 ############################################################################## # # Copyright (c) 2005 Zope Foundation and Contributors. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. # THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. # ############################################################################## """I18N support for the zope.mimetype package. This defines a `MessageFactory` for the I18N domain for the zope.mimetype package.
This is normally used with this import:: from i18n import MessageFactory as _ The factory is then used normally. Two examples:: text = _('some internationalized text') text = _('helpful-descriptive-message-id', 'default text') """ __docformat__ = "reStructuredText" from zope import i18nmessageid MessageFactory = _ = i18nmessageid.MessageFactory("zope.mimetype") zope.mimetype-1.3.1/src/zope/mimetype/icons/000755 000766 000024 00000000000 11466575503 020751 5ustar00macstaff000000 000000 zope.mimetype-1.3.1/src/zope/mimetype/interfaces.py000644 000766 000024 00000024654 11466575473 022354 0ustar00macstaff000000 000000 ############################################################################## # # Copyright (c) 2005 Zope Foundation and Contributors. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. # THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. # ############################################################################## """interfaces for mimetype package $Id: interfaces.py 112083 2010-05-06 13:23:33Z tseaver $ """ import re import zope.component.interfaces import zope.schema.interfaces from zope import interface, schema from zope.configuration.fields import MessageID from i18n import _ # Note that MIME types and content type parameter names are considered # case-insensitive. For our purposes, they should always be # lower-cased on input. This should be handled by the input machinery # (widgets) rather than in the application or MIME-handling code. # Constraints defined here specify lower case values. # # The MIME type is defined to be two tokens separated by a slash; for # our purposes, any whitespace between the tokens must be normalized # by removing it. 
This too should be handled by input mechanisms. # This RE really assumes you know the ASCII codes; note that upper # case letters are not accepted; tokens must be normalized. # http://czyborra.com/charsets/iso646.html # http://www.faqs.org/rfcs/rfc2045.html _token_re = r"[!#$%&'*+\-.\d^_`a-z{|}~]+" _token_rx = re.compile("%s$" % _token_re) _mime_type_rx = re.compile("%s/%s$" % (_token_re, _token_re)) # These helpers are used to define constraints for specific schema # fields. Documentation and tests for these are located in # constraints.txt. def mimeTypeConstraint(value): """Return `True` iff `value` is a syntactically legal MIME type.""" return _mime_type_rx.match(value) is not None def tokenConstraint(value): """Return `True` iff `value` is a syntactically legal RFC 2045 token.""" return _token_rx.match(value) is not None class IContentTypeAware(interface.Interface): """Interface for MIME content type information. Objects that can provide content type information about the data they contain, such as file objects, should be adaptable to this interface. """ parameters = schema.Dict( title=_('Mime Type Parameters'), description=_("The MIME type parameters (such as charset)."), required=True, key_type=schema.ASCIILine(constraint=tokenConstraint) ) mimeType = schema.ASCIILine( title=_('Mime Type'), description=_("The mime type explicitly specified for the object" " that this MIME information describes, if any. " " May be None, or an ASCII MIME type string of the" " form major/minor."), constraint=mimeTypeConstraint, required=False, ) class IContentTypeInterface(interface.Interface): """Interface that describes a logical mime type. Interfaces that provide this interfaces are content-type interfaces. Most MIME types are described by the IANA MIME-type registry (http://www.iana.org/assignments/media-types/). """ class IContentType(interface.Interface): """Marker interface for objects that represent content with a MIME type. 
""" interface.directlyProvides(IContentType, IContentTypeInterface) class IContentTypeEncoded(IContentType): """Marker interface for content types that care about encoding. This does not imply that encoding information is known for a specific object. Content types that derive from `IContentTypeEncoded` support a content type parameter named 'charset', and that parameter is used to control encoding and decoding of the text. For example, interfaces for text/* content types all derive from this base interface. """ interface.directlyProvides(IContentTypeEncoded, IContentTypeInterface) class IContentTypeChangedEvent(zope.component.interfaces.IObjectEvent): """The content type for an object has changed. All changes of the `IContentTypeInterface` for an object are reported by this event, including the setting of an initial content type and the removal of the content type interface. This event should only be used if the content type actually changes. """ newContentType = interface.Attribute( """Content type interface after the change, if any, or `None`. """) oldContentType = interface.Attribute( """Content type interface before the change, if any, or `None`. 
""") class IContentTypeTerm(zope.schema.interfaces.ITitledTokenizedTerm): """Extended term that describes a content type interface.""" mimeTypes = schema.List( title=_("MIME types"), description=_("List of MIME types represented by this interface;" " the first should be considered the preferred" " MIME type."), required=True, min_length=1, value_type=schema.ASCIILine(constraint=mimeTypeConstraint), readonly=True, ) extensions = schema.List( title=_("Extensions"), description=_("Filename extensions commonly associated with this" " type of file."), required=True, min_length=0, readonly=True, ) class IContentTypeSource(zope.schema.interfaces.ISource, zope.schema.interfaces.IIterableSource): """Source for content types.""" class IContentInfo(interface.Interface): # XXX """Interface describing effective MIME type information. When using MIME data from an object, an application should adapt the object to this interface to determine how it should be interpreted. This may be different from the information """ effectiveMimeType = schema.ASCIILine( title=_("Effective MIME type"), description=_("MIME type that should be reported when" " downloading the document this `IContentInfo`" " object is for."), required=False, constraint=mimeTypeConstraint, ) effectiveParameters = schema.Dict( title=_("Effective parameters"), description=_("Content-Type parameters that should be reported " " when downloading the document this `IContentInfo`" " object is for."), required=True, key_type=schema.ASCIILine(constraint=tokenConstraint), value_type=schema.ASCII(), ) contentType = schema.ASCIILine( title=_("Content type"), description=_("The value of the Content-Type header," " including both the MIME type and any parameters."), required=False, ) def getCodec(): """Return an `ICodec` that should be used to decode/encode data. This should return `None` if the object's `IContentType` interface does not derive from `IContentTypeEncoded`. 
        If the content type is encoded and no encoding information is
        available in the `effectiveParameters`, this method may
        return None, or may provide a codec based on application policy.

        If `effectiveParameters` indicates a specific charset, and no
        codec is registered to support that charset, `ValueError` will
        be raised.

        """

    def decode(s):
        """Return the decoding of `s` based on the effective encoding.

        The effective encoding is determined by the return from the
        `getCodec()` method.

        `ValueError` is raised if no codec can be found for the
        effective charset.

        """


class IMimeTypeGetter(interface.Interface):
    """A utility that looks up a MIME type string."""

    def __call__(name=None, data=None, content_type=None):
        """Look up a MIME type.

        If a MIME type cannot be determined based on the input,
        this returns `None`.

        """


class ICharsetGetter(interface.Interface):
    """A utility that looks up a character set (charset)."""

    def __call__(name=None, data=None, content_type=None):
        """Look up a charset.

        If a charset cannot be determined based on the input,
        this returns `None`.

        """


class ICodec(interface.Interface):
    """Information about a codec."""

    name = schema.ASCIILine(
        title=_('Name'),
        description=_('The name of the Python codec.'),
        required=True,
        )

    title = MessageID(
        title=_('Title'),
        description=_('The human-readable name of this codec.'),
        required=True,
        )

    def encode(input, errors='strict'):
        """Encodes the input and returns a tuple (output, length consumed).
        """

    def decode(input, errors='strict'):
        """Decodes the input and returns a tuple (output, length consumed).
        """

    def reader(stream, errors='strict'):
        """Construct a StreamReader object for this codec."""

    def writer(stream, errors='strict'):
        """Construct a StreamWriter object for this codec."""


class ICharset(interface.Interface):
    """Information about a charset"""

    name = schema.ASCIILine(
        title=_('Name'),
        description=_("The charset name.
This is what is used for the " "'charset' parameter in content-type headers."), required=True, ) encoding = schema.ASCIILine( # This *must* match the `name` of the ICodec that's used to # handle this charset. title=_('Encoding'), description=_("The id of the encoding used for this charset."), required=True, ) class ICodecPreferredCharset(interface.Interface): """Marker interface for locating the preferred charset for a Codec.""" class ICharsetCodec(interface.Interface): """Marker interface for locating the codec for a given charset.""" class ICodecTerm(zope.schema.interfaces.ITitledTokenizedTerm): """Extended term that describes a content type interface.""" preferredCharset = schema.ASCIILine( title=_("Preferred Charset"), description=_("Charset that should be used to represent the codec"), required=False, readonly=True, ) class ICodecSource(zope.schema.interfaces.IIterableSource): """Source for codecs.""" zope.mimetype-1.3.1/src/zope/mimetype/meta.zcml000644 000766 000024 00000001144 11466575473 021461 0ustar00macstaff000000 000000 zope.mimetype-1.3.1/src/zope/mimetype/retrieving_mime_types.txt000644 000766 000024 00000004565 11466575473 025030 0ustar00macstaff000000 000000 =================================== Retrieving Content Type Information =================================== MIME Types ---------- We'll start by initializing the interfaces and registrations for the content type interfaces. This is normally done via ZCML. >>> from zope.mimetype import types >>> types.setup() A utility is used to retrieve MIME types. >>> from zope import component >>> from zope.mimetype import typegetter >>> from zope.mimetype.interfaces import IMimeTypeGetter >>> component.provideUtility(typegetter.smartMimeTypeGuesser, ... provides=IMimeTypeGetter) >>> mime_getter = component.getUtility(IMimeTypeGetter) To map a particular file name, file contents, and content type to a MIME type. >>> mime_getter(name='file.txt', data='A text file.', ... 
                content_type='text/plain')
  'text/plain'

In the default implementation, if not enough information is given to
discern a MIME type, None is returned.

  >>> mime_getter() is None
  True

Character Sets
--------------

A utility is also used to retrieve character sets (charsets).

  >>> from zope.mimetype.interfaces import ICharsetGetter
  >>> component.provideUtility(typegetter.charsetGetter,
  ...                          provides=ICharsetGetter)
  >>> charset_getter = component.getUtility(ICharsetGetter)

The utility maps a particular file name, file contents, and content
type to a charset.

  >>> charset_getter(name='file.txt', data='This is a text file.',
  ...                content_type='text/plain;charset=ascii')
  'ascii'

In the default implementation, if not enough information is given to
discern a charset, None is returned.

  >>> charset_getter() is None
  True

Finding Interfaces
------------------

Given a MIME type, we need to be able to find the appropriate interface.

  >>> from zope.mimetype.interfaces import IContentTypeInterface
  >>> component.getUtility(IContentTypeInterface, name=u'text/plain')

It is also possible to enumerate all content type interfaces.

  >>> utilities = list(component.getUtilitiesFor(IContentTypeInterface))

If you want to find an interface from a MIME string, you can use the
utilities.

  >>> component.getUtility(IContentTypeInterface, name='text/plain')
zope.mimetype-1.3.1/src/zope/mimetype/source.py000644 000766 000024 00000013417 11466575473 021524 0ustar00macstaff000000 000000 
##############################################################################
#
# Copyright (c) 2005 Zope Foundation and Contributors.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
"""Sources for IContentTypeInterface providers and codecs.
"""
__docformat__ = "reStructuredText"

import sys

from zope.browser.interfaces import ITerms
import zope.component
import zope.interface
import zope.mimetype.interfaces
import zope.publisher.interfaces.browser


# Base classes

class UtilitySource(object):
    """Source of utilities providing a specific interface."""

    def __init__(self):
        self._length = None

    def __contains__(self, value):
        ok = self._interface.providedBy(value)
        if ok:
            for name, interface in zope.component.getUtilitiesFor(
                self._interface):
                if interface is value:
                    return True
        return False

    def __iter__(self):
        length = 0
        seen = set()

        # haven't been iterated over all the way yet, go ahead and
        # build the cached results list
        for name, interface in zope.component.getUtilitiesFor(
            self._interface):
            if interface not in seen:
                seen.add(interface)
                # count each distinct utility so the cached length is correct
                length += 1
                yield interface
        self._length = length

    def __len__(self):
        if self._length is None:
            self._length = len(list(iter(self)))
        return self._length


class Terms(object):
    """Utility to provide terms for content type interfaces."""

    zope.interface.implements(ITerms)

    def __init__(self, source, request):
        self.context = source
        self.request = request

    def getTerm(self, value):
        if value in self.context:
            return self._createTerm(value)
        raise LookupError("value is not an element in the source")


# Source & vocabulary for `IContentTypeInterface` providers

class ContentTypeSource(UtilitySource):
    """Source of IContentTypeInterface providers."""

    zope.interface.implements(
        zope.mimetype.interfaces.IContentTypeSource)

    _interface = zope.mimetype.interfaces.IContentTypeInterface


class ContentTypeTerms(Terms):
    """Utility to provide terms for content type interfaces."""

    zope.component.adapts(
        zope.mimetype.interfaces.IContentTypeSource,
        zope.publisher.interfaces.browser.IBrowserRequest)

    def getValue(self, token):
        module, name = token.rsplit(".", 1)
        if module not in sys.modules:
            try:
                __import__(module)
            except ImportError:
                raise LookupError("could not import module for token")
        interface = getattr(sys.modules[module], name)
        if interface in self.context:
            return interface
        raise LookupError("token does not represent an element in the source")

    def _createTerm(self, value):
        return ContentTypeTerm(value)


class ContentTypeTerm(object):

    zope.interface.implements(
        zope.mimetype.interfaces.IContentTypeTerm)

    def __init__(self, interface):
        self.value = interface

    @property
    def token(self):
        return "%s.%s" % (self.value.__module__, self.value.__name__)

    @property
    def title(self):
        return self.value.getTaggedValue("title")

    @property
    def mimeTypes(self):
        return self.value.getTaggedValue("mimeTypes")

    @property
    def extensions(self):
        return self.value.getTaggedValue("extensions")


contentTypeSource = ContentTypeSource()


# Source & vocabulary for `ICodec` providers

class CodecSource(UtilitySource):
    """Source of ICodec providers."""

    zope.interface.implements(
        zope.mimetype.interfaces.ICodecSource)

    _interface = zope.mimetype.interfaces.ICodec


class CodecTerms(Terms):
    """Utility to provide terms for codecs."""

    zope.component.adapts(
        zope.mimetype.interfaces.ICodecSource,
        zope.publisher.interfaces.browser.IBrowserRequest)

    def getValue(self, token):
        codec = zope.component.queryUtility(
            zope.mimetype.interfaces.ICodec, token)
        if codec is None:
            raise LookupError("no matching codec: %r" % token)
        if codec not in self.context:
            raise LookupError("codec not in source: %r" % token)
        return codec

    def _createTerm(self, value):
        return CodecTerm(value)


class CodecTerm(object):

    zope.interface.implements(
        zope.mimetype.interfaces.ICodecTerm)

    def __init__(self, codec):
        self.value = codec

    @property
    def token(self):
        return self.value.name

    @property
    def title(self):
        return self.value.title

    @property
    def preferredCharset(self):
        charset = zope.component.queryUtility(
            zope.mimetype.interfaces.ICodecPreferredCharset,
            name=self.value.name)
        if charset is None:
            available = [(name,
                          charset)
                         # getUtilitiesFor() yields (name, utility) pairs
                         for (name, charset)
                         in zope.component.getUtilitiesFor(
                             zope.mimetype.interfaces.ICharset)
                         if charset.encoding == self.value.name]
            if not available:
                # no charsets are available; should not happen in practice
                return None
            # no charset marked preferred; pick one
            available.sort()
            charset = available[0][1]
        return charset.name


codecSource = CodecSource()
zope.mimetype-1.3.1/src/zope/mimetype/source.txt000644 000766 000024 00000006460 11466575473 021713 0ustar00macstaff000000 000000 
===============================
Source for MIME type interfaces
===============================

Some sample interfaces have been created in the zope.mimetype.tests
module for use in this test. Let's import them::

  >>> from zope.mimetype.tests import (
  ...     ISampleContentTypeOne, ISampleContentTypeTwo)

The source should only include `IContentTypeInterface` interfaces that
have been registered. Let's register one of these two interfaces so
we can test this::

  >>> import zope.component
  >>> from zope.mimetype.interfaces import IContentTypeInterface
  >>> zope.component.provideUtility(
  ...     ISampleContentTypeOne, IContentTypeInterface, name="type/one")

  >>> zope.component.provideUtility(
  ...     ISampleContentTypeOne, IContentTypeInterface, name="type/two")

We should see that only the registered interface is included in the
source::

  >>> from zope.mimetype import source
  >>> s = source.ContentTypeSource()

  >>> ISampleContentTypeOne in s
  True
  >>> ISampleContentTypeTwo in s
  False

Interfaces that do not provide `IContentTypeInterface` are not
included in the source::

  >>> import zope.interface
  >>> class ISomethingElse(zope.interface.Interface):
  ...
"""This isn't a content type interface.""" >>> ISomethingElse in s False The source is iterable, so we can get a list of the values:: >>> values = list(s) >>> len(values) 1 >>> values[0] is ISampleContentTypeOne True We can get terms for the allowed values:: >>> terms = source.ContentTypeTerms(s, None) >>> t = terms.getTerm(ISampleContentTypeOne) >>> terms.getValue(t.token) is ISampleContentTypeOne True Interfaces that are not in the source cause an error when a term is requested:: >>> terms.getTerm(ISomethingElse) Traceback (most recent call last): ... LookupError: value is not an element in the source The term provides a token based on the module name of the interface:: >>> t.token 'zope.mimetype.tests.ISampleContentTypeOne' The term also provides the title based on the "title" tagged value from the interface:: >>> t.title u'Type One' Each interface provides a list of MIME types with which the interface is associated. The term object provides access to this list:: >>> t.mimeTypes ['type/one', 'type/foo'] A list of common extensions for files of this type is also available, though it may be empty:: >>> t.extensions [] The term's value, of course, is the interface passed in:: >>> t.value is ISampleContentTypeOne True This extended term API is defined by the `IContentTypeTerm` interface:: >>> from zope.mimetype.interfaces import IContentTypeTerm >>> IContentTypeTerm.providedBy(t) True The value can also be retrieved using the `getValue()` method:: >>> iface = terms.getValue('zope.mimetype.tests.ISampleContentTypeOne') >>> iface is ISampleContentTypeOne True Attempting to retrieve an interface that isn't in the source using the terms object generates a LookupError:: >>> terms.getValue('zope.mimetype.tests.ISampleContentTypeTwo') Traceback (most recent call last): ... 
LookupError: token does not represent an element in the source Attempting to look up a junk token also generates an error:: >>> terms.getValue('just.some.dotted.name.that.does.not.exist') Traceback (most recent call last): ... LookupError: could not import module for token zope.mimetype-1.3.1/src/zope/mimetype/tests.py000644 000766 000024 00000005423 11466575473 021364 0ustar00macstaff000000 000000 ############################################################################## # # Copyright (c) 2005 Zope Foundation and Contributors. # # This software is subject to the provisions of the Zope Public License, # Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. # THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED # WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED # WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS # FOR A PARTICULAR PURPOSE. # ############################################################################## """Test harness for `zope.mimetype`. 
$Id: tests.py 112083 2010-05-06 13:23:33Z tseaver $
"""
import unittest
import doctest

from zope.component import testing
import zope.interface
import zope.mimetype.interfaces


class ISampleContentTypeOne(zope.interface.Interface):
    """This is a sample content type interface."""
ISampleContentTypeOne.setTaggedValue("title", u"Type One")
ISampleContentTypeOne.setTaggedValue("extensions", [])
ISampleContentTypeOne.setTaggedValue("mimeTypes", ["type/one", "type/foo"])

zope.interface.directlyProvides(
    ISampleContentTypeOne,
    zope.mimetype.interfaces.IContentTypeInterface)

class ISampleContentTypeTwo(zope.interface.Interface):
    """This is a sample content type interface."""
ISampleContentTypeTwo.setTaggedValue("title", u"Type Two")
# the filename extensions belong under "extensions", not "mimeTypes"
ISampleContentTypeTwo.setTaggedValue("extensions", [".ct2", ".ct3"])
ISampleContentTypeTwo.setTaggedValue("mimeTypes", ["type/two"])

zope.interface.directlyProvides(
    ISampleContentTypeTwo,
    zope.mimetype.interfaces.IContentTypeInterface)

def test_suite():
    return unittest.TestSuite((
        doctest.DocFileSuite('retrieving_mime_types.txt',
                             setUp=testing.setUp, tearDown=testing.tearDown),
        doctest.DocFileSuite('event.txt',
                             setUp=testing.setUp, tearDown=testing.tearDown),
        doctest.DocFileSuite('source.txt',
                             setUp=testing.setUp, tearDown=testing.tearDown),
        doctest.DocFileSuite('constraints.txt'),
        doctest.DocFileSuite('contentinfo.txt',
                             setUp=testing.setUp, tearDown=testing.tearDown),
        doctest.DocFileSuite('typegetter.txt'),
        doctest.DocFileSuite('utils.txt'),
        doctest.DocFileSuite('widget.txt',
                             setUp=testing.setUp, tearDown=testing.tearDown),
        doctest.DocFileSuite(
            'codec.txt',
            optionflags=doctest.NORMALIZE_WHITESPACE|doctest.ELLIPSIS,
            ),
        ))

if __name__ == '__main__':
    unittest.main(defaultTest='test_suite')
zope.mimetype-1.3.1/src/zope/mimetype/typegetter.py000644 000766 000024 00000011476 11466575473 022423 0ustar00macstaff000000 000000 
##############################################################################
#
# Copyright (c) 2005 Zope Foundation and
Contributors.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################

# There's a zope.contenttype module that exports a similar API,
# but that's pretty heuristic. Some of this should perhaps be folded
# back into that, or this package could provide a replacement.
#
import mimetypes
import codecs

from zope import interface
from zope.mimetype import interfaces
import zope.contenttype.parse


def mimeTypeGetter(name=None, data=None, content_type=None):
    if name is None and data is None and content_type is None:
        return None
    if content_type:
        try:
            major, minor, params = zope.contenttype.parse.parseOrdered(
                content_type)
        except ValueError:
            pass
        else:
            return "%s/%s" % (major, minor)
    return None

interface.directlyProvides(mimeTypeGetter, interfaces.IMimeTypeGetter)


def mimeTypeGuesser(name=None, data=None, content_type=None):
    if name is None and data is None and content_type is None:
        return None

    mimeType = mimeTypeGetter(name=name, data=data, content_type=content_type)

    if name and not mimeType:
        mimeType, encoding = mimetypes.guess_type(name, strict=True)
        if not mimeType:
            mimeType, encoding = mimetypes.guess_type(name, strict=False)
        #
        # XXX If `encoding` is not None, we should re-consider the
        # guess, since the encoding here is Content-Encoding, not
        # charset. In particular, things like .tar.gz map to
        # ('application/x-tar', 'gzip'), which may require different
        # handling, or at least a separate content-type.
if data and not mimeType: # no idea, really, but let's sniff a few common things: for prefix, type, charset in _prefix_table: if data.startswith(prefix): mimeType = type break return mimeType interface.directlyProvides(mimeTypeGuesser, interfaces.IMimeTypeGetter) def smartMimeTypeGuesser(name=None, data=None, content_type=None): mimeType = mimeTypeGuesser(name=name, data=data, content_type=content_type) if data and mimeType == "text/html": for prefix, type, charset in _xml_prefix_table: if data.startswith(prefix): # don't use text/xml from the table, but take # advantage of the text/html hint (from the upload # or mimetypes.guess_type()) mimeType = "application/xhtml+xml" return mimeType interface.directlyProvides(smartMimeTypeGuesser, interfaces.IMimeTypeGetter) # Very simple magic numbers table for a few things we want to be good # at identifying even if we get no help from the input: # _xml_prefix_table = ( # prefix, mimeType, charset (">> from zope.mimetype import typegetter MIME types ---------- There are a number of interesting MIME-type extractors: `mimeTypeGetter()` A minimal extractor that never attempts to guess. `mimeTypeGuesser()` An extractor that tries to guess the content type based on the name and data if the input contains no content type information. `smartMimeTypeGuesser()` An extractor that checks the content for a variety of constructs to try and refine the results of the `mimeTypeGuesser()`. This is able to do things like check for XHTML that's labelled as HTML in upload data. `mimeTypeGetter()` ~~~~~~~~~~~~~~~~~~ We'll start with the simplest, which does no content-based guessing at all, but uses the information provided by the browser directly. If the browser did not provide any content-type information, or if it cannot be parsed, the extractor simply asserts a "safe" MIME type of application/octet-stream. 
(The rationale for selecting this type is that since there's really nothing productive that can be done with it other than download it, it's impossible to mis-interpret the data.) When there's no information at all about the content, the extractor returns None:: >>> print typegetter.mimeTypeGetter() None Providing only the upload filename or data, or both, still produces None, since no guessing is being done:: >>> print typegetter.mimeTypeGetter(name="file.html") None >>> print typegetter.mimeTypeGetter(data="...") None >>> print typegetter.mimeTypeGetter( ... name="file.html", data="...") None If a content type header is available for the input, that is used since that represents explicit input from outside the application server. The major and minor parts of the content type are extracted and returned as a single string:: >>> typegetter.mimeTypeGetter(content_type="text/plain") 'text/plain' >>> typegetter.mimeTypeGetter(content_type="text/plain; charset=utf-8") 'text/plain' If the content-type information is provided but malformed (not in conformance with RFC 2822), it is ignored, since the intent cannot be reliably guessed:: >>> print typegetter.mimeTypeGetter(content_type="foo bar") None This combines with ignoring the other values that may be provided as expected:: >>> print typegetter.mimeTypeGetter( ... name="file.html", data="...", content_type="foo bar") None `mimeTypeGuesser()` ~~~~~~~~~~~~~~~~~~~ A more elaborate extractor that tries to work around completely missing information can be found as the `mimeTypeGuesser()` function. This function will only guess if there is no usable content type information in the input. This extractor can be thought of as having the following pseudo-code:: def mimeTypeGuesser(name=None, data=None, content_type=None): type = mimeTypeGetter(name=name, data=data, content_type=content_type) if type is None: type = guess the content type return type Let's see how this affects the results we saw earlier. 
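The fallback order in the pseudo-code above can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the package's implementation: `parse_declared_type()` and `guess_type_with_fallback()` are hypothetical names, the parser is much less strict than the one the package relies on, and no sniffing of the data is attempted.

```python
import mimetypes


def parse_declared_type(content_type):
    """Return 'major/minor' from a Content-Type value, or None if unusable.

    Hypothetical stand-in for `mimeTypeGetter()`; deliberately simplistic.
    """
    if not content_type:
        return None
    # Drop any parameters such as "; charset=utf-8".
    base = content_type.split(";", 1)[0].strip().lower()
    major, sep, minor = base.partition("/")
    if not sep or not major or not minor or " " in base:
        return None
    return base


def guess_type_with_fallback(name=None, content_type=None):
    """Prefer the declared type; guess from the file name only as a fallback."""
    declared = parse_declared_type(content_type)
    if declared is not None:
        return declared
    if name:
        # Try the strict (registered) types first, then the common extras.
        guessed, _encoding = mimetypes.guess_type(name, strict=True)
        if guessed is None:
            guessed, _encoding = mimetypes.guess_type(name, strict=False)
        return guessed
    return None
```

A usable declared type always wins in this sketch, so `guess_type_with_fallback(name="file.html", content_type="application/octet-stream")` keeps the declared value, while a malformed header such as `"foo bar"` falls through to the file name.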
When there's no input to use, we still get None:: >>> print typegetter.mimeTypeGuesser() None Providing only the upload filename or data, or both, now produces a non-None guess for common content types:: >>> typegetter.mimeTypeGuesser(name="file.html") 'text/html' >>> typegetter.mimeTypeGuesser(data="...") 'text/html' >>> typegetter.mimeTypeGuesser(name="file.html", data="...") 'text/html' Note that if the filename and data provided separately produce different MIME types, the result of providing both will be one of those types, but which is unspecified:: >>> mt_1 = typegetter.mimeTypeGuesser(name="file.html") >>> mt_1 'text/html' >>> mt_2 = typegetter.mimeTypeGuesser(data="...") >>> mt_2 'text/xml' >>> mt = typegetter.mimeTypeGuesser( ... data="...", name="file.html") >>> mt in (mt_1, mt_2) True If a content type header is available for the input, that is used in the same way as for the `mimeTypeGetter()` function:: >>> typegetter.mimeTypeGuesser(content_type="text/plain") 'text/plain' >>> typegetter.mimeTypeGuesser(content_type="text/plain; charset=utf-8") 'text/plain' If the content-type information is provided but malformed, it is ignored:: >>> print typegetter.mimeTypeGetter(content_type="foo bar") None When combined with values for the filename or content data, those are still used to provide reasonable guesses for the content type:: >>> typegetter.mimeTypeGuesser(name="file.html", content_type="foo bar") 'text/html' >>> typegetter.mimeTypeGuesser( ... data="...", content_type="foo bar") 'text/html' Information from a parsable content-type is still used even if a guess from the data or filename would provide a different or more-refined result:: >>> typegetter.mimeTypeGuesser( ... data="GIF89a...", content_type="application/octet-stream") 'application/octet-stream' `smartMimeTypeGuesser()` ~~~~~~~~~~~~~~~~~~~~~~~~ The `smartMimeTypeGuesser()` function applies more knowledge to the process of determining the MIME-type to use. 
Essentially, it takes the result of the `mimeTypeGuesser()` function and attempts to refine the content-type based on various heuristics. We still see the basic behavior that no input produces None:: >>> print typegetter.smartMimeTypeGuesser() None An unparsable content-type is still ignored:: >>> print typegetter.smartMimeTypeGuesser(content_type="foo bar") None The interpretation of uploaded data will be different in at least some interesting cases. For instance, the `mimeTypeGuesser()` function provides these results for some XHTML input data:: >>> typegetter.mimeTypeGuesser( ... data="...", ... name="file.html") 'text/html' The smart extractor is able to refine this into more usable data:: >>> typegetter.smartMimeTypeGuesser( ... data="...", ... name="file.html") 'application/xhtml+xml' In this case, the smart extractor has refined the information determined from the filename using information from the uploaded data. The specific approach taken by the extractor is not part of the interface, however. `charsetGetter()` ~~~~~~~~~~~~~~~~~ If you're interested in the character set of textual data, you can use the `charsetGetter` function (which can also be registered as the `ICharsetGetter` utility): The simplest case is when the character set is already specified in the content type. >>> typegetter.charsetGetter(content_type='text/plain; charset=mambo-42') 'mambo-42' Note that the charset name is lowercased, because all the default ICharset and ICharsetCodec utilities are registered for lowercase names. 
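The lowercasing of a declared charset can be reproduced with the standard library, which may help when comparing values outside this package. The sketch below is an assumption-laden illustration rather than how `charsetGetter` is implemented: `declared_charset()` is a hypothetical helper, and it only reads the `charset` parameter without looking at any data.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value.

    Hypothetical helper built on the standard library; returns None when
    no charset parameter is present.
    """
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() coerces the parameter value to lower case.
    return msg.get_content_charset()
```

The package's own doctest for the same kind of input follows.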

  >>> typegetter.charsetGetter(content_type='text/plain; charset=UTF-8')
  'utf-8'

If it isn't, `charsetGetter` can try to guess by looking at actual data

  >>> typegetter.charsetGetter(content_type='text/plain', data='just text')
  'ascii'

  >>> typegetter.charsetGetter(content_type='text/plain', data='\xe2\x98\xba')
  'utf-8'

  >>> import codecs
  >>> typegetter.charsetGetter(data=codecs.BOM_UTF16_BE + '\x12\x34')
  'utf-16be'

  >>> typegetter.charsetGetter(data=codecs.BOM_UTF16_LE + '\x12\x34')
  'utf-16le'

If the character set cannot be determined, `charsetGetter` returns None.

  >>> typegetter.charsetGetter(content_type='text/plain', data='\xff')
  >>> typegetter.charsetGetter()

zope.mimetype-1.3.1/src/zope/mimetype/types.csv
name, title, extensions, mime_types, icon_name, encoded
IContentTypeMicrosoftWord, MS Word, .doc .dot, application/msword application/vnd.ms-word, icons/ms-word.gif, no
IContentTypeMicrosoftExcel, MS Excel, .xls .xlt, application/vnd.ms-excel, icons/ms-excel.gif, no
IContentTypeCSV, Comma Separated Values, .csv, text/comma-separated-values, icons/document.gif, yes
IContentTypeTabDelimited, Tab Delimited, .tab .tsv, text/tab-separated-values, icons/document.gif, yes
IContentTypeMicrosoftPowerPoint, MS Powerpoint, .ppt, application/vnd.ms-powerpoint, icons/ms-powerpoint.gif, no
IContentTypeMicrosoftProject, MS Project, .mpp, application/vnd.ms-project, icons/ms-project.gif, no
IContentTypeMicrosoftWorks, MS Works, .wps .wdb, application/vnd.ms-works, icons/document.gif, no
IContentTypeMicrosoftVisio, MS Visio, .vsd .vst .vsw, application/x-visio application/vnd.ms-visio, icons/document.gif, no
IContentTypeWordPerfect, WordPerfect, .wpd .wp5 .wp6, application/wordperfect, icons/wordperfect.gif, no
IContentTypeRichText, Rich Text Format, .rtf, application/rtf, icons/document.gif, no
IContentTypeTextPlain, Plain Text, .txt .text, text/plain, icons/document.gif, yes
IContentTypeTextHtml, HTML Document, .html .htm, text/html, icons/html.gif, yes
IContentTypeHtmlApplication, HTML Application, .hta, application/hta, icons/binary.gif, yes
IContentTypeEmail, Email message (MIME), .mime, message/rfc822, icons/document.gif, yes
IContentTypeXml, XML Document, .xml .xsl .xslt, text/xml application/xml, icons/xml.gif, yes
IContentTypeXhtml, XHTML Document, .html .xhtml, application/xhtml+xml, icons/html.gif, yes
IContentTypeXmlDtd, XML Document Type Definition, .dtd, application/xml-dtd, icons/xml.gif, yes
IContentTypeXmlExternalEntity, XML External Parsed Entity, .xml, application/xml-external-parsed-entity text/xml-external-parsed-entity, icons/xml.gif, yes
IContentTypeSgml, SGML Document, .sgm .sgml, text/sgml, icons/document.gif, yes
IContentTypeCss, Cascading Style Sheet, .css, text/css, icons/css.gif, yes
IContentTypeJavaScript, Javascript Program, .js .ls, text/javascript application/x-javascript, icons/javascript.gif, yes
IContentTypeImageGif, GIF Image, .gif, image/gif, icons/image.gif, no
IContentTypeImageJpeg, JPEG Image, .jpg .jpeg .jpe, image/jpeg image/pjpeg, icons/image.gif, no
IContentTypeImagePng, PNG Image, .png, image/png image/x-png, icons/image.gif, no
IContentTypeImageTiff, TIFF Image, .tif .tiff, image/tiff, icons/image.gif, no
IContentTypeImageBitmap, MS Bitmap Image, .bmp, image/bmp, icons/image.gif, no
IContentTypeImageMacintoshPicture, Macintosh Picture, .pict, image/x-pict, icons/image.gif, no
IContentTypeImageMicrosoftIcon, MS Icon, .ico .icl, image/x-icon, icons/image.gif, no
IContentTypeImagePhotoCd, Kodak Photo CD, .pcd .jpeg, image/x-folder-cd, icons/image.gif, no
IContentTypeImageAutoCad, AutoCAD Drawing, .dwg, image/vnd.dwg, icons/image.gif, no
IContentTypeAudioWave, Wave Audio, .wav, audio/x-wav, icons/sound.gif, no
IContentTypeAudioMpeg3, MPEG Audio (MP3), .mpa .abs .mpega .mp3, audio/x-mpeg, icons/sound.gif, no
IContentTypeAudioMpeg2, MPEG2 Audio, .mp2a .mpa2, audio/x-mpeg-2, icons/sound.gif, no
IContentTypeAudioRealAudio, RealAudio, .ra .ram, application/x-pn-realaudio, icons/sound.gif, no
IContentTypeRealMedia, RealMedia, .rm, application/vnd.rn-realmedia, icons/video.png, no
IContentTypeAudioAiff, Audio Interchange Format, .aif .aiff .aifc, audio/x-aiff, icons/sound.gif, no
IContentTypeAudioMidi, Midi Audio, .mid .midi, audio/x-midi, icons/sound.gif, no
IContentTypeVideoMpeg, MPEG Video, .mpeg .mpg .mpe, video/mpeg, icons/video.png, no
IContentTypeVideoMpeg2, MPEG2 Video, .mpv2 .mp2v, video/mpeg-2, icons/video.png, no
IContentTypeVideoQuickTime, Quicktime Video, .mov .moov, video/quicktime, icons/video.png, no
IContentTypeVideoAvi, MS AVI Video, .avi, video/x-msvideo, icons/video.png, no
IContentTypePdf, Adobe Acrobat PDF, .pdf, application/pdf, icons/pdf.gif, no
IContentTypePostscript, Postscript, .ps .ai .eps, application/postscript, icons/document.gif, yes
IContentTypeZip, Zip Archive, .zip, application/zip, icons/archive.png, no
IContentTypeTarGzip, Tar/Gzip Archive, .tar.gz .tgz, application/x-compressed, icons/archive.png, no
IContentTypeGzip, Gzip Archive, .gz .gzip, application/x-gzip, icons/archive.png, no
IContentTypeBzip2, Bzip Archive, .bz2, application/x-bzip2, icons/archive.png, no
IContentTypeStuffit, Stuffit Archive, .sit .sea, application/x-stuffit, icons/archive.png, no
IContentTypeBinHex, BinHex Archive, .hqx, application/mac-binhex40, icons/archive.png, no
IContentTypeBinary, Binary/Executable, .bin .uu .exe, application/octet-stream, icons/binary.gif, no
IContentTypeOpenOfficeWriter, OpenOffice Writer, .sxw .sxg, application/vnd.sun.xml.writer, icons/oo-writer.png, no
IContentTypeOpenOfficeWriterTemplate, OpenOffice Writer Template, .sxw, application/vnd.sun.xml.writer.template, icons/oo-writer.png, no
IContentTypeOpenOfficeCalc, OpenOffice Calc, .sxc, application/vnd.sun.xml.calc, icons/oo-calc.png, no
IContentTypeOpenOfficeCalcTemplate, OpenOffice Calc, .stc, application/vnd.sun.xml.calc.template, icons/oo-calc.png, no
IContentTypeOpenOfficeDraw, OpenOffice Draw, .sxd .std, application/vnd.sun.xml.draw, icons/oo-draw.png, no
IContentTypeOpenOfficeImpress, OpenOffice Impress, .sxi .sti, application/vnd.sun.xml.impress, icons/oo-impress.png, no
IContentTypeStarOfficeWriter, StarOffice Writer, .sdw .sgl, application/vnd.stardivision.writer, icons/document.gif, no
IContentTypeStarOfficeCalc, StarOffice Calc, .sdc, application/vnd.stardivision.calc, icons/document.gif, no

zope.mimetype-1.3.1/src/zope/mimetype/types.py
##############################################################################
#
# Copyright (c) 2005 Zope Foundation and Contributors.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL).  A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
from zope.component import provideUtility
from zope.interface.interface import InterfaceClass
from zope.mimetype.interfaces import IContentType, IContentTypeEncoded
from zope.mimetype.interfaces import IContentTypeInterface
from zope.mimetype.i18n import _
import os
import csv
import zope.interface


def read(file_name):
    # Map each interface name to
    # (title, extensions, mime_types, icon_name, encoded).
    # The file is closed explicitly rather than left to garbage collection.
    f = open(file_name)
    try:
        result = {}
        for name, title, extensions, mime_types, icon_name, encoded in \
                csv.reader(f):
            extensions = extensions.split()
            mime_types = mime_types.split()
            encoded = (encoded.strip().lower() == 'yes')
            result[name] = (title.strip(), extensions, mime_types,
                            icon_name.strip(), encoded)
    finally:
        f.close()
    return result


def getInterfaces(data, module=None):
    results = {}
    if module is None:
        module = __name__
    globs = globals()
    for name, info in data.iteritems():
        # Reuse an interface already defined in this module, if any.
        interface = globs.get(name)
        if interface is None:
            interface = makeInterface(name, info, module)
            globs[name] = interface
        results[name] = interface
    return results


def makeInterface(name, info, module):
    title, extensions, mime_types, icon_name, encoded = info
    if encoded:
        base = IContentTypeEncoded
    else:
        base = IContentType
    interface = InterfaceClass(name, bases=(base,), __module__=module)
    zope.interface.directlyProvides(interface, IContentTypeInterface)
    interface.setTaggedValue('extensions', extensions)
    interface.setTaggedValue('mimeTypes', mime_types)
    interface.setTaggedValue('title', _(title, default=title))
    return interface


def registerUtilities(interfaces, data):
    for name, interface in interfaces.iteritems():
        # data[name][2] is the list of MIME types for this interface.
        for mime_type in data[name][2]:
            provideUtility(interface, provides=IContentTypeInterface,
                           name=mime_type)


here = os.path.dirname(os.path.abspath(__file__))
types_data = os.path.join(here, "types.csv")


def setup():
    data = read(types_data)
    interfaces = getInterfaces(data)
    registerUtilities(interfaces, data)

zope.mimetype-1.3.1/src/zope/mimetype/utils.py
##############################################################################
#
# Copyright (c) 2005 Zope Foundation and Contributors.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL).  A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
"""Utility helpers

$Id: utils.py 112083 2010-05-06 13:23:33Z tseaver $
"""
from email.Charset import Charset


def decode(s, charset_name):
    "given a string and an IANA character set name, decode string to unicode"
    codec = Charset(charset_name).input_codec
    if codec is None:
        return unicode(s)
    else:
        return unicode(s, codec)

zope.mimetype-1.3.1/src/zope/mimetype/utils.txt
The utils module contains various helpers for working with data governed
by MIME content type information, as found in the HTTP Content-Type header:
mime types and character sets.

The decode function takes a string and an IANA character set name and
returns a unicode object decoded from the string, using the codec associated
with the character set name.  Errors will generally arise from the unicode
conversion rather than the mapping of character set to codec, and will be
LookupErrors (the character set did not cleanly convert to a codec that
Python knows about) or UnicodeDecodeErrors (the string included characters
that were not in the range of the codec associated with the character set).

  >>> original = 'This is an o with a slash through it: \xb8.'
  >>> charset = 'Latin-7' # Baltic Rim or iso-8859-13
  >>> from zope.mimetype import utils
  >>> utils.decode(original, charset)
  u'This is an o with a slash through it: \xf8.'
  >>> utils.decode(original, 'foo bar baz')
  Traceback (most recent call last):
  ...
  LookupError: unknown encoding: foo bar baz
  >>> utils.decode(original, 'iso-ir-6') # alias for ASCII
  ... # doctest: +ELLIPSIS
  Traceback (most recent call last):
  ...
  UnicodeDecodeError: 'ascii' codec can't decode...

zope.mimetype-1.3.1/src/zope/mimetype/widget.py
"""Widget that provides translation and sorting for an IIterableSource.

This widget translates the term titles and presents those in sorted
order.

Properly, this should call on a language-specific collation routine,
but we don't currently have those.  Also, it would need to deal with a
partially-translated list of titles when translations are only
available for some of the titles.

The implementation ignores these issues for now.

"""
__docformat__ = "reStructuredText"

import zope.formlib.source
import zope.i18n


class TranslatableSourceSelectWidget(
    zope.formlib.source.SourceSelectWidget):

    def __init__(self, field, source, request):
        super(TranslatableSourceSelectWidget, self).__init__(
            field, source, request)
        self.displays = {}   # value --> (display, token)
        self.order = []      # values in sorted order

        # XXX need a better way to sort in an internationalized context
        sortable = []
        for value in source:
            t = self.vocabulary.terms.getTerm(value)
            title = zope.i18n.translate(t.title, context=request)
            self.displays[value] = title, t.token
            lower = title.lower()
            sortable.append((lower, value))
        sortable.sort()
        self.order = [value for (lower, value) in sortable]

    def renderItemsWithValues(self, values):
        """Render the list of possible values, with those found in
        `values` being marked as selected."""

        cssClass = self.cssClass

        # multiple items with the same value are not allowed from a
        # vocabulary, so that need not be considered here
        rendered_items = []
        count = 0

        # Handle case of missing value
        missing = self._toFormValue(self.context.missing_value)

        if self._displayItemForMissingValue and not self.context.required:
            if missing in values:
                render = self.renderSelectedItem
            else:
                render = self.renderItem
            missing_item = render(count,
                self.translate(self._messageNoValue),
                missing,
                self.name,
                cssClass)
            rendered_items.append(missing_item)
            count += 1

        # Render normal values
        for value in self.order:
            item_text, token = self.displays[value]

            if value in values:
                render = self.renderSelectedItem
            else:
                render = self.renderItem

            rendered_item = render(
                count,
                item_text,
                token,
                self.name,
                cssClass)

            rendered_items.append(rendered_item)
            count += 1

        return rendered_items

    def textForValue(self, term):
        return self.displays[term.value]


class TranslatableSourceDropdownWidget(TranslatableSourceSelectWidget):

    size = 1

zope.mimetype-1.3.1/src/zope/mimetype/widget.txt
==============================
TranslatableSourceSelectWidget
==============================

TranslatableSourceSelectWidget is a SourceSelectWidget that translates
and sorts the choices.

We will borrow the boring setup code from the SourceSelectWidget test
(source.txt in zope.formlib).

  >>> import binascii
  >>> import zope.interface
  >>> import zope.component
  >>> import zope.schema
  >>> import zope.schema.interfaces

  >>> class SourceList(list):
  ...     zope.interface.implements(zope.schema.interfaces.IIterableSource)

  >>> import zope.publisher.interfaces.browser
  >>> from zope.browser.interfaces import ITerms
  >>> from zope.schema.vocabulary import SimpleTerm
  >>> class ListTerms:
  ...
  ...     zope.interface.implements(ITerms)
  ...
  ...     def __init__(self, source, request):
  ...         pass # We don't actually need the source or the request :)
  ...
  ...     def getTerm(self, value):
  ...         title = unicode(value)
  ...         try:
  ...             token = title.encode('base64').strip()
  ...         except binascii.Error:
  ...             raise LookupError(value)
  ...         return SimpleTerm(value, token=token, title=title)
  ...
  ...     def getValue(self, token):
  ...         return token.decode('base64')

  >>> zope.component.provideAdapter(
  ...     ListTerms,
  ...     (SourceList, zope.publisher.interfaces.browser.IBrowserRequest))

  >>> dog = zope.schema.Choice(
  ...     __name__ = 'dog',
  ...     title=u"Dogs",
  ...     source=SourceList(['spot', 'bowser', 'prince', 'duchess', 'lassie']),
  ...     )
  >>> dog = dog.bind(object())

Now that we have a field and a working source, we can construct and render
a widget.

  >>> from zope.mimetype.widget import TranslatableSourceSelectWidget
  >>> from zope.publisher.browser import TestRequest
  >>> request = TestRequest()
  >>> widget = TranslatableSourceSelectWidget(
  ...     dog, dog.source, request)
  >>> print widget()

Note that the options are ordered alphabetically.

If the field is not required, we will also see a special choice labeled
"(nothing selected)" at the top of the list:

  >>> dog.required = False
  >>> print widget()
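
The ordering described above comes from sorting the values by their
lowercased, translated titles.  The following is only a minimal sketch of
that idea in plain Python, not the widget's actual code: the `translate`
helper and the `catalog` dictionary are hypothetical stand-ins for
`zope.i18n.translate` and a real message catalog, and the tokens are left
out.

```python
# Hypothetical stand-in for zope.i18n.translate: look the title up in a
# plain dict and fall back to the untranslated title when none is found.
def translate(title, catalog):
    return catalog.get(title, title)


def sorted_displays(values, catalog):
    """Return (order, displays) for the given source values.

    order    -- the values, sorted by lowercased translated title
    displays -- mapping of value -> translated title
    """
    displays = {}
    sortable = []
    for value in values:
        title = translate(str(value), catalog)
        displays[value] = title
        sortable.append((title.lower(), value))
    sortable.sort()
    order = [value for (_lower, value) in sortable]
    return order, displays


dogs = ['spot', 'bowser', 'prince', 'duchess', 'lassie']
# Assumed catalog for illustration: only 'spot' has a translation.
order, displays = sorted_displays(dogs, {'spot': 'Fleck'})
print(order)
```

Because the sort key is the translated title, 'spot' (shown as 'Fleck' in
this made-up catalog) lands between 'duchess' and 'lassie' rather than at
the end.  As the widget module's docstring notes, a plain lowercase sort is
not a language-specific collation.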

zope.mimetype-1.3.1/src/zope/mimetype/zcml.py
##############################################################################
#
# Copyright (c) 2005 Zope Foundation and Contributors.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL).  A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
##############################################################################
import os.path

from zope import interface
from zope import schema
from zope.configuration import fields

# Absolute import, matching the one used in types.py.
from zope.mimetype.i18n import _
from zope.mimetype import codec
from zope.mimetype import interfaces
from zope.mimetype import types
import zope.browserresource.metaconfigure
import zope.component.zcml
import zope.component.interface


class IMimeTypesDirective(interface.Interface):
    """Request loading of a MIME type definition table.

    Example:

    """

    file = fields.Path(
        title=_("File"),
        description=_("Path of the CSV file to load registrations from."),
        required=True,
        )

    module = fields.GlobalObject(
        title=_("Module"),
        description=_("Module which contains the interfaces"
                      " referenced from the CSV file."),
        required=True,
        )


def mimeTypesDirective(_context, file, module):
    codec.initialize(_context)
    directory = os.path.dirname(file)
    data = types.read(file)
    provides = interfaces.IContentTypeInterface
    for name, info in data.iteritems():
        iface = getattr(module, name, None)
        if iface is None:
            # create missing interface
            iface = types.makeInterface(
                name, info, getattr(module, "__name__", None))
            setattr(module, name, iface)
        # Register the interface as a utility:
        _context.action(
            discriminator = None,
            callable = zope.component.interface.provideInterface,
            args = (iface.__module__ + '.' + iface.getName(), iface)
            )
        for mime_type in info[2]:
            # Register the interface as the IContentTypeInterface
            # utility for each appropriate MIME type:
            _context.action(
                discriminator = ('utility', provides, mime_type),
                callable = zope.component.zcml.handler,
                args = ('registerUtility', iface, provides, mime_type),
                )
        icon = os.path.join(directory, info[3])
        if icon and os.path.isfile(icon):
            zope.browserresource.metaconfigure.icon(
                _context, "zmi_icon", iface, icon)


class ICodecDirective(interface.Interface):
    """Defines a codec.

    Example:

    ...

    """

    name = schema.ASCIILine(
        title=_('Name'),
        description=_('The name of the Python codec.'),
        required=True,
        )

    title = fields.MessageID(
        title=_('Title'),
        description=_('The human-readable name for this codec.'),
        required=False,
        )


class ICharsetDirective(interface.Interface):
    """Defines a charset in a codec.

    Example:

    """

    name = schema.ASCIILine(
        title=_('Name'),
        description=_('The name of the Python codec.'),
        required=True,
        )

    preferred = schema.Bool(
        title=_('Preferred'),
        description=_('Is this the preferred charset for the encoding.'),
        required=False,
        )


class CodecDirective(object):

    def __init__(self, _context, name, title):
        self.name = name
        self.title = title
        _context.action(
            discriminator = None,
            callable = codec.addCodec,
            args = (name, title),
            )

    def charset(self, _context, name, preferred=False):
        _context.action(
            discriminator = (self.name, name),
            callable = codec.addCharset,
            args = (self.name, name, preferred),
            )