Handling bytes consistently and correctly has traditionally been one
of the most difficult tasks in writing a Py2/3 compatible codebase. This
is because the Python 2 :class:`bytes` object is simply an alias for
Python 2's :class:`str`, rather than a true implementation of the Python
3 :class:`bytes` object, which is substantially different.

:mod:`future` contains a backport of the :class:`bytes` object from Python 3
which passes most of the Python 3 tests for :class:`bytes`. (See
``future/tests/test_bytes.py`` in the source tree.) You can use it as
follows::

    >>> from future.builtins import bytes
    >>> b = bytes(b'ABCD')

On Py3, this is simply the builtin :class:`bytes` object. On Py2, this
object is a subclass of Python 2's :class:`str` that enforces the same
strict separation of unicode strings and byte strings as Python 3's
:class:`bytes` object::

    >>> b + u'EFGH'      # TypeError
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: argument can't be unicode string

    >>> bytes(b',').join([u'Fred', u'Bill'])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: sequence item 0: expected bytes, found unicode string
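
Because of this strict separation, text has to be encoded explicitly before it is combined with byte strings. The following is a minimal sketch of that pattern, not part of the library: ``join_names`` is a hypothetical helper, and UTF-8 is an assumed encoding. It imports ``bytes`` from ``builtins`` so that it runs on Py3 without ``future`` installed. On Py2 that module is supplied by ``future``.

```python
from builtins import bytes  # standard library on Py3; supplied by future on Py2


def join_names(names, sep=bytes(b','), encoding='utf-8'):
    """Encode each text name to bytes, then join with a bytes separator.

    Hypothetical helper for illustration; the utf-8 default is an assumption.
    """
    encoded = [bytes(name.encode(encoding)) for name in names]
    return sep.join(encoded)


joined = join_names([u'Fred', u'Bill'])
assert joined == b'Fred,Bill'
# Decoding reverses the explicit encoding step.
assert joined.decode('utf-8') == u'Fred,Bill'
```

Encoding at the boundary keeps every operand of ``join`` a byte string, so the call never reaches the ``TypeError`` path shown above.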

Various other comparisons between byte-strings and other types return a
result in Py2 but raise a TypeError in Py3. For example, this is
permissible on Py2::

    >>> b'u' > 10
    True

    >>> b'u' <= u'u'
    True

On Py3, these raise TypeErrors.
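
A small test helper can pin down the Py3 behaviour so that accidental mixed-type comparisons surface early. This is only a sketch: ``ordering_raises`` is a hypothetical name, and the assertions assume the code runs on Python 3 with native byte-string literals.

```python
def ordering_raises(left, right):
    """Return True if evaluating ``left > right`` raises TypeError."""
    try:
        left > right
    except TypeError:
        return True
    return False


# Assumes Python 3: bytes cannot be ordered against int or str.
assert ordering_raises(b'u', 10)
assert ordering_raises(b'u', u'u')
# Ordering two byte strings is always allowed.
assert not ordering_raises(b'u', b'a')
```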

In most other ways, these :class:`bytes` objects have identical
behaviours to Python 3's :class:`bytes`::

    b = bytes(b'ABCD')
    assert list(b) == [65, 66, 67, 68]
    assert repr(b) == "b'ABCD'"
    assert b.split(b'B') == [b'A', b'CD']
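
Indexing and construction from integers are two further places where Py3 semantics differ from a Py2 :class:`str`. The sketch below shows the Python 3 behaviour. It assumes the backport matches here, which the statement above suggests but does not spell out. ``bytes`` is imported from ``builtins`` (supplied by ``future`` on Py2) so the example runs on Py3 without ``future`` installed.

```python
from builtins import bytes  # standard library on Py3; supplied by future on Py2

b = bytes(b'ABCD')

# Indexing yields an integer, not a length-1 byte string.
assert b[0] == 65

# Slicing yields a byte string.
assert b[:1] == b'A'

# An iterable of integers in range(256) builds the corresponding bytes.
assert bytes([65, 66, 67, 68]) == b
```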

Currently the easiest way to ensure identical behaviour of byte-strings
in a Py2/3 codebase is to wrap all byte-string literals ``b'...'`` in a
:func:`~bytes` call as follows::

    from future.builtins import *

    # ...

    b = bytes(b'This is my bytestring')

    # ...

This is not perfect, but it is superior to manually debugging and fixing
code incompatibilities caused by the many differences between Py3 bytes
and Py2 strings.