bytesΒΆ

Handling bytes consistently and correctly has traditionally been one of the most difficult tasks in writing a Py2/3 compatible codebase. This is because the Python 2 bytes object is simply an alias for Python 2’s str, rather than a true implementation of the Python 3 bytes object, which is substantially different.

future contains a backport of the bytes object from Python 3 which passes most of the Python 3 tests for bytes. (See tests/test_future/test_bytes.py in the source tree.) You can use it as follows:

>>> from builtins import bytes
>>> b = bytes(b'ABCD')

On Py3, this is simply the builtin bytes object. On Py2, this object is a subclass of Python 2’s str that enforces the same strict separation of unicode strings and byte strings as Python 3’s bytes object:

>>> b + u'EFGH'      # TypeError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument can't be unicode string

>>> bytes(b',').join([u'Fred', u'Bill'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected bytes, found unicode string

>>> b == u'ABCD'
False

>>> b < u'abc'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: bytes() and <type 'unicode'>

In most other ways, these bytes objects have identical behaviours to Python 3’s bytes:

b = bytes(b'ABCD')
assert list(b) == [65, 66, 67, 68]
assert repr(b) == "b'ABCD'"
assert b.split(b'B') == [b'A', b'CD']

Currently the easiest way to ensure identical behaviour of byte-strings in a Py2/3 codebase is to wrap all byte-string literals b'...' in a bytes() call as follows:

from builtins import bytes

# ...

b = bytes(b'This is my bytestring')

# ...

This is not perfect, but it is superior to manually debugging and fixing code incompatibilities caused by the many differences between Py3 bytes and Py2 strings.