=========================================
NLTK Python 2.x - 3.x Compatibility Layer
=========================================

NLTK comes with a Python 2.x/3.x compatibility layer, nltk.compat
(which is loosely based on `six <http://packages.python.org/six/>`_)::

    >>> from nltk import compat
    >>> compat.PY3
    False
    >>> compat.integer_types
    (<type 'int'>, <type 'long'>)
    >>> compat.string_types
    (<type 'basestring'>,)
    >>> # and so on

@python_2_unicode_compatible
----------------------------

Under Python 2.x ``__str__`` and ``__repr__`` methods must
return bytestrings.

``@python_2_unicode_compatible`` decorator allows writing these methods
in a way compatible with Python 3.x:

1) wrap a class with this decorator,
2) define ``__str__`` and ``__repr__`` methods returning unicode text
   (that's what they must return under Python 3.x),

and they would be fixed under Python 2.x to return byte strings::

    >>> from nltk.compat import python_2_unicode_compatible

    >>> @python_2_unicode_compatible
    ... class Foo(object):
    ...     def __str__(self):
    ...         return u'__str__ is called'
    ...     def __repr__(self):
    ...         return u'__repr__ is called'

    >>> foo = Foo()
    >>> foo.__str__().__class__
    <type 'str'>
    >>> foo.__repr__().__class__
    <type 'str'>
    >>> print(foo)
    __str__ is called
    >>> foo
    __repr__ is called

Original versions of ``__str__`` and ``__repr__`` are available as
``__unicode__`` and ``unicode_repr``::

    >>> foo.__unicode__().__class__
    <type 'unicode'>
    >>> foo.unicode_repr().__class__
    <type 'unicode'>
    >>> unicode(foo)
    u'__str__ is called'
    >>> foo.unicode_repr()
    u'__repr__ is called'

There is no need to wrap a subclass with ``@python_2_unicode_compatible``
if it doesn't override ``__str__`` and ``__repr__``::

    >>> class Bar(Foo):
    ...     pass
    >>> bar = Bar()
    >>> bar.__str__().__class__
    <type 'str'>

However, if a subclass overrides ``__str__`` or ``__repr__``,
wrap it again::

    >>> class BadBaz(Foo):
    ...     def __str__(self):
    ...         return u'Baz.__str__'
    >>> baz = BadBaz()
    >>> baz.__str__().__class__  # this is incorrect!
    <type 'unicode'>

    >>> @python_2_unicode_compatible
    ... class GoodBaz(Foo):
    ...     def __str__(self):
    ...         return u'Baz.__str__'
    >>> baz = GoodBaz()
    >>> baz.__str__().__class__
    <type 'str'>
    >>> baz.__unicode__().__class__
    <type 'unicode'>

Applying ``@python_2_unicode_compatible`` to a subclass
shouldn't break methods that was not overridden::

    >>> baz.__repr__().__class__
    <type 'str'>
    >>> baz.unicode_repr().__class__
    <type 'unicode'>

unicode_repr
------------

Under Python 3.x ``repr(unicode_string)`` doesn't have a leading "u" letter.

``nltk.compat.unicode_repr`` function may be used instead of ``repr`` and
``"%r" % obj`` to make the output more consistent under Python 2.x and 3.x::

    >>> from nltk.compat import unicode_repr
    >>> print(repr(u"test"))
    u'test'
    >>> print(unicode_repr(u"test"))
    'test'

It may be also used to get an original unescaped repr (as unicode)
of objects which class was fixed by ``@python_2_unicode_compatible``
decorator::

    >>> @python_2_unicode_compatible
    ... class Foo(object):
    ...     def __repr__(self):
    ...         return u'<Foo: foo>'

    >>> foo = Foo()
    >>> repr(foo)
    '<Foo: foo>'
    >>> unicode_repr(foo)
    u'<Foo: foo>'

For other objects it returns the same value as ``repr``::

    >>> unicode_repr(5)
    '5'

It may be a good idea to use ``unicode_repr`` instead of ``%r``
string formatting specifier inside ``__repr__`` or ``__str__``
methods of classes fixed by ``@python_2_unicode_compatible``
to make the output consistent between Python 2.x and 3.x.