bytes

Handling bytes consistently and correctly has traditionally been one of the most difficult tasks in writing a Py2/3 compatible codebase. This is because the Python 2 :class:`bytes` object is simply an alias for Python 2's :class:`str`, rather than a true implementation of the Python 3 :class:`bytes` object, which is substantially different.

``future`` contains a backport of the :class:`bytes` object from Python 3 which passes most of the Python 3 tests for :class:`bytes`. (See ``future/tests/test_bytes.py`` in the source tree.) You can use it as follows:

>>> from future.builtins import bytes

>>> b = bytes(b'ABCD')

On Py3, this is simply the builtin :class:`bytes` object. On Py2, this object is a subclass of Python 2's :class:`str` that enforces the same strict separation of unicode strings and byte strings as Python 3's :class:`bytes` object:

>>> b + u'EFGH'      # TypeError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument can't be unicode string

>>> bytes(b',').join([u'Fred', u'Bill'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected bytes, found unicode string
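Under this stricter model, one way to combine text with a byte-string is to encode the text explicitly first. The following is a minimal sketch, not a prescribed pattern: the choice of UTF-8 is an assumption (the text above does not name an encoding), and the ``try``/``except`` fallback is only there so the snippet also runs on Py3 where ``future`` is not installed:

```python
try:
    from future.builtins import bytes  # backport on Py2, builtin on Py3
except ImportError:
    pass  # assumption: Py3 without future installed; builtin bytes is used

b = bytes(b'ABCD')

# Encode the text explicitly before concatenating it with bytes.
combined = b + u'EFGH'.encode('utf-8')

# Encode each item before joining with a bytes separator.
# A list (rather than a generator) is used so the items can be
# iterated more than once if the implementation validates them first.
names = [u'Fred', u'Bill']
joined = bytes(b',').join([name.encode('utf-8') for name in names])
```

Making the encoding step explicit keeps the decision about which encoding to use visible at the call site instead of relying on Py2's implicit ASCII coercion.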

Comparisons between byte-strings and other types also differ: many return a result on Py2 but raise a ``TypeError`` on Py3. For example, the following are permissible on Py2 with native byte-strings:

>>> b'u' > 10
True

>>> b'u' <= u'u'
True

On Py3, these raise TypeErrors.
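The difference can be checked programmatically. The sketch below assumes that wrapped :class:`bytes` objects reject these cross-type comparisons in the same way as Py3's builtin; ``raises_type_error`` is a hypothetical helper introduced only for this illustration:

```python
try:
    from future.builtins import bytes  # backport on Py2, builtin on Py3
except ImportError:
    pass  # assumption: Py3 without future installed; builtin bytes is used


def raises_type_error(compare):
    """Return True if calling compare() raises TypeError (hypothetical helper)."""
    try:
        compare()
    except TypeError:
        return True
    return False


b = bytes(b'u')

# Ordering comparisons against an int or a unicode string are rejected.
int_comparison_rejected = raises_type_error(lambda: b > 10)
text_comparison_rejected = raises_type_error(lambda: b <= u'u')

# Comparisons between two bytes objects remain valid.
same_type_ok = bytes(b'a') < bytes(b'b')
```

With a native Py2 byte-string in place of the wrapped object, both ``*_rejected`` flags would be ``False``, which is the silent behaviour described above.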

In most other ways, these :class:`bytes` objects have identical behaviours to Python 3's :class:`bytes`:

b = bytes(b'ABCD')
assert list(b) == [65, 66, 67, 68]
assert repr(b) == "b'ABCD'"
assert b.split(b'B') == [b'A', b'CD']
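A few further Py3-style behaviours that differ from a native Py2 ``str`` are sketched below. These reflect Py3 :class:`bytes` semantics; that the backport matches each of them on Py2 is an assumption here, so treat this as illustrative rather than as a specification:

```python
try:
    from future.builtins import bytes  # backport on Py2, builtin on Py3
except ImportError:
    pass  # assumption: Py3 without future installed; builtin bytes is used

b = bytes(b'ABCD')

first = b[0]                 # indexing yields an int, not a length-1 string
head = b[:2]                 # slicing yields a bytes object
zeros = bytes(3)             # an int argument gives that many zero bytes
from_ints = bytes([65, 66])  # an iterable of ints gives the matching bytes
text = b.decode('ascii')     # decoding yields a unicode string
```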

Currently the easiest way to ensure identical behaviour of byte-strings in a Py2/3 codebase is to wrap all byte-string literals b'...' in a :func:`~bytes` call as follows:

from future.builtins import *

# ...

b = bytes(b'This is my bytestring')

# ...

This is not perfect, but it is superior to manually debugging and fixing code incompatibilities caused by the many differences between Py3 bytes and Py2 strings.