Unicode decoder error python


A UnicodeDecodeError normally happens when decoding a str (byte string) from a certain encoding. Since an encoding maps only certain byte sequences to unicode characters, an illegal byte sequence will cause the encoding-specific decode() to fail.

Decoding from str to unicode.

```python
>>> "a".decode("utf-8")
u'a'
>>> "\x81".decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "encodings/utf_8.py", line 16, in decode
UnicodeDecodeError: 'utf8' codec can't decode byte 0x81 in position 0: unexpected code byte
>>> "a\x81b".decode("utf-8", "replace")
u'a\ufffdb'
```
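In Python 3, decode() lives on bytes rather than str, so the session above translates roughly as follows. This is a minimal sketch; the Python 3 error message differs from the Python 2 one (it reports an "invalid start byte"), so the code inspects the exception's attributes instead of its text. The function name decode_session is only illustrative.

```python
def decode_session():
    """Python 3 version of the decoding examples above."""
    results = {}

    # Plain ASCII bytes decode to the same text under UTF-8.
    results["plain"] = b"a".decode("utf-8")

    # 0x81 is a UTF-8 continuation byte, so it cannot start a sequence.
    try:
        b"\x81".decode("utf-8")
    except UnicodeDecodeError as exc:
        # The exception records the codec name and the offending byte range.
        results["error"] = (exc.encoding, exc.start, exc.end)

    # The "replace" handler substitutes U+FFFD for the undecodable byte.
    results["replaced"] = b"a\x81b".decode("utf-8", "replace")
    return results


print(decode_session())
```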

Paradoxically, a UnicodeDecodeError may happen when _encoding_. The cause seems to be that the encoding-specific encode() functions normally expect a parameter of type unicode. On seeing a str parameter, encode() appears to "up-convert" it to unicode before converting it to its own encoding. This up-conversion also appears to make no assumption about the str parameter's encoding and simply uses the default ascii decoder. Hence a decoding failure inside an encoder.

Unlike the similar case with UnicodeEncodeError, such a failure cannot always be avoided, because the str result of encode() must be a legal sequence in the target encoding. However, a more flexible treatment of an unexpected str argument might first validate it by decoding it, then return it unmodified if the validation succeeds. As of Python 2.5, this is not implemented.
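The validate-then-return idea can be sketched in Python 3 terms, where the wiki's str corresponds to bytes. The helper name encode_lenient is hypothetical; no standard codec behaves this way.

```python
def encode_lenient(value, encoding):
    """Hypothetical encode() that tolerates already-encoded input.

    Text (str) is encoded normally. Bytes are validated by decoding them
    with the target codec and returned unmodified if that succeeds; if
    they are not legal in that codec, the UnicodeDecodeError propagates.
    """
    if isinstance(value, str):
        return value.encode(encoding)
    if isinstance(value, (bytes, bytearray)):
        value.decode(encoding)  # validation only; the decoded text is discarded
        return bytes(value)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


print(encode_lenient("\u0411", "utf-8"))     # text is encoded
print(encode_lenient(b"\xd0\x91", "utf-8"))  # valid UTF-8 bytes pass through
```

Unlike Python 2's implicit up-conversion, the validation here uses the target codec, so valid UTF-8 bytes are not rejected merely because they are non-ASCII.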


Alternatively, encode() functions could always raise a TypeError on receiving a str argument. (This would require StreamWriter.write() to accept only unicode; the underlying stream's .write() would then receive only str objects.)

Encoding from unicode to str.

```python
>>> u"a".encode("utf-8")
'a'
>>> u"\u0411".encode("utf-8")
'\xd0\x91'
>>> "a".encode("utf-8")          # Unexpected argument type.
'a'
>>> "\xd0\x91".encode("utf-8")   # Unexpected argument type.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
```

Python 3000 (released as Python 3.0) prohibits encoding of bytes, as specified in PEP 3137: "encoding always takes a Unicode string and returns a bytes sequence, and decoding always takes a bytes sequence and returns a Unicode string".
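Python 3 behaves as PEP 3137 describes. The sketch below replays the encoding session above under Python 3; it relies only on str.encode() and the standard codecs.encode() function, and the wrapper name encoding_session is illustrative.

```python
import codecs


def encoding_session():
    """Python 3 version of the encoding examples above."""
    results = {}

    # Encoding text works as before and yields bytes.
    results["text"] = "\u0411".encode("utf-8")

    # bytes objects no longer have an encode() method at all.
    results["bytes_has_encode"] = hasattr(b"\xd0\x91", "encode")

    # Forcing bytes through a text codec raises TypeError instead of
    # silently decoding them as ASCII first.
    results["codecs_encode_bytes"] = None
    try:
        codecs.encode(b"\xd0\x91", "utf-8")
    except TypeError:
        results["codecs_encode_bytes"] = "TypeError"
    return results


print(encoding_session())
```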

UnicodeDecodeError (last edited 2008-11-15 13:59:56 by localhost)



Last updated: Feb 18, 2023


# UnicodeDecodeError: ‘ascii’ codec can’t decode byte

The Python "UnicodeDecodeError: ‘ascii’ codec can’t decode byte in position" error occurs when we use the ascii codec to decode bytes that were encoded with a different codec.

To solve the error, specify the correct encoding, e.g. utf-8.


Here is an example of how the error occurs.

I have a file called example.txt with the following contents.

```
𝘈Ḇ𝖢𝕯٤ḞԍНǏ
hello world
```

And here is the code that tries to decode the contents of example.txt using the ascii codec.

Copied!
```python
# ⛔️ UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
with open('example.txt', 'r', encoding='ascii') as f:
    lines = f.readlines()
    print(lines)
```


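The error is caused because example.txt is not ASCII-encoded. Its first character, 𝘈, is stored in UTF-8 as the four bytes 0xf0 0x9d 0x98 0x88, and 0xf0 lies outside the ASCII range of 0 to 127 (hence "ordinal not in range(128)").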
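To reproduce this without creating example.txt by hand, the sketch below writes the same contents as UTF-8 into a temporary directory, then reads them back with both codecs. The function name reproduce is illustrative, and writing the file as UTF-8 is an assumption about how the original file was saved.

```python
import os
import tempfile


def reproduce(text="𝘈Ḇ𝖢𝕯٤ḞԍНǏ\nhello world"):
    """Write text as UTF-8, then read it back as ASCII and as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)

        # Reading as ASCII fails on the first byte outside 0..127.
        try:
            with open(path, "r", encoding="ascii") as f:
                f.readlines()
            ascii_result = None
        except UnicodeDecodeError as exc:
            # Record the codec name and the rejected byte value.
            ascii_result = (exc.encoding, exc.object[exc.start])

        # Reading with the encoding the file was written in succeeds.
        with open(path, "r", encoding="utf-8") as f:
            utf8_lines = f.readlines()

    return ascii_result, utf8_lines


print(reproduce())
```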


If you know the encoding the file uses, make sure to specify it using the encoding keyword argument.

# Try setting the encoding to utf-8

Otherwise, the first thing you can try is setting the encoding to utf-8.

```python
# 👇️ set encoding to utf-8
with open('example.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()
    print(lines)  # 👉️ ['𝘈Ḇ𝖢𝕯٤ḞԍНǏ\n', 'hello world']
```
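If utf-8 also fails and you don't know the file's encoding, one pragmatic approach is to try a short list of candidate encodings in order. The helper decode_with_fallback and its default candidate list are assumptions for illustration, not a standard API. To use it on a file, open the file in binary mode ('rb') and pass the bytes.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: try each candidate encoding in order.

    Returns a (text, encoding_used) tuple, or re-raises the last
    UnicodeDecodeError if no candidate can decode the data.
    """
    if not encodings:
        raise ValueError("at least one candidate encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError as exc:
            last_error = exc
    raise last_error


print(decode_with_fallback(b"\xe9t\xe9"))  # not valid UTF-8, so cp1252 is used
```

Order matters: put utf-8 first, because cp1252 accepts many byte sequences whether or not they were meant as cp1252. latin-1 maps every byte to a character and never fails, so any candidate listed after it would be unreachable.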


You can view all of the standard encodings in the Standard Encodings table of the official codecs documentation.
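Encoding names are case-insensitive and have aliases. A quick way to check a name, and to see Python's canonical spelling of it, is codecs.lookup(), which raises LookupError for unknown codecs. The wrapper canonical_encoding is a hypothetical convenience around it.

```python
import codecs


def canonical_encoding(name):
    """Return Python's canonical codec name, or None if the name is unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


for name in ("UTF8", "latin-1", "US-ASCII", "not-a-codec"):
    print(name, "->", canonical_encoding(name))
```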

Encoding is the process of converting a string to a bytes object and decoding is the process of converting a bytes object to a string.

When decoding a bytes object, we have to use the same encoding that was used to encode the string to a bytes object.

The following example shows how encoding a string to bytes with one codec and decoding the bytes with a different codec causes the error.

```python
my_text = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'
my_binary_data = my_text.encode('utf-8')

# ⛔️ UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
my_text_again = my_binary_data.decode('ascii')
```


We can solve the error by using the utf-8 encoding to decode the bytes object.

```python
my_text = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'
my_binary_data = my_text.encode('utf-8')

# 👉️ b'\xf0\x9d\x98\x88\xe1\xb8\x86\xf0\x9d\x96\xa2\xf0\x9d\x95\xaf\xd9\xa4\xe1\xb8\x9e\xd4\x8d\xd0\x9d\xc7\x8f'
print(my_binary_data)

# ✅ specify correct encoding
my_text_again = my_binary_data.decode('utf-8')
print(my_text_again)  # 👉️ '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'
```


The code sample doesn’t cause an issue because the same encoding was used to encode the string into bytes and decode the bytes object into a string.
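Note that a mismatched codec does not always raise an error. Some codecs, such as latin-1, accept any byte sequence and silently produce garbled text (mojibake) instead. The hypothetical round_trip helper below makes the contrast visible.

```python
def round_trip(text, encode_with, decode_with):
    """Encode text with one codec, then decode the bytes with another."""
    return text.encode(encode_with).decode(decode_with)


print(round_trip("café", "utf-8", "utf-8"))    # same codec: café
print(round_trip("café", "utf-8", "latin-1"))  # mismatch, no error: cafÃ©

try:
    round_trip("café", "utf-8", "ascii")       # mismatch that does raise
except UnicodeDecodeError as exc:
    print("ascii failed:", exc.reason)
```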

# Set the errors keyword argument to ignore

If decoding the bytes with the utf-8 encoding still raises an error, you can set the errors keyword argument to ignore to skip the bytes that cannot be decoded.

```python
my_text = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'
my_binary_data = my_text.encode('utf-8')

# 👇️ set errors to ignore
my_text_again = my_binary_data.decode('utf-8', errors='ignore')
print(my_text_again)
```

Note that ignoring characters that cannot be decoded can lead to data loss.
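ignore is not the only error handler. The sketch below decodes the same invalid input (a lone latin-1 é byte inside otherwise UTF-8 data) with three standard handlers so you can compare how much information each keeps; compare_handlers is a hypothetical name.

```python
def compare_handlers(data, encoding="utf-8"):
    """Decode the same bytes with several standard error handlers."""
    handlers = ("ignore", "replace", "backslashreplace")
    return {handler: data.decode(encoding, errors=handler) for handler in handlers}


for handler, text in compare_handlers(b"caf\xe9 ok").items():
    print(f"{handler:>16}: {text!r}")
```

ignore drops the byte entirely, replace marks its position with U+FFFD, and backslashreplace keeps its value as visible escape text.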

Here is an example where errors is set to ignore when opening a file.

```python
# 👇️ set errors to ignore
with open('example.txt', 'r', encoding='utf-8', errors='ignore') as f:
    lines = f.readlines()

# ✅ ['𝘈Ḇ𝖢𝕯٤ḞԍНǏ\n', 'hello world']
print(lines)
```

Opening the file with an incorrect encoding with errors set to ignore won’t raise an error.

```python
with open('example.txt', 'r', encoding='ascii', errors='ignore') as f:
    lines = f.readlines()

# ✅ ['\n', 'hello world']
print(lines)
```

The example.txt file doesn’t use the ascii encoding; however, opening it with errors set to ignore doesn’t raise an error.


Instead, it ignores the data it cannot parse and returns the data it can parse.
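If you need to get past undecodable bytes without losing them, the standard surrogateescape error handler is an alternative to ignore: it maps each bad byte to a lone surrogate code point and restores the original byte when you encode with the same handler. open() accepts errors='surrogateescape' as well. The helper names lossless_decode and lossless_encode are hypothetical.

```python
def lossless_decode(data, encoding="utf-8"):
    """Decode bytes, smuggling undecodable bytes through as surrogates."""
    return data.decode(encoding, errors="surrogateescape")


def lossless_encode(text, encoding="utf-8"):
    """Encode text, turning those surrogates back into the original bytes."""
    return text.encode(encoding, errors="surrogateescape")


raw = b"caf\xe9"                 # not valid UTF-8
text = lossless_decode(raw)      # 0xe9 becomes the surrogate U+DCE9
print(repr(text))
print(lossless_encode(text) == raw)  # the original bytes survive the round trip
```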

# Make sure you aren’t mixing up encode() and decode()

Make sure you aren’t mixing up calls to the str.encode() and bytes.decode() methods.

Encoding is the process of converting a string to a bytes object and decoding is the process of converting a bytes object to a string.

If you have a str that you want to convert to bytes, use the encode() method.

```python
my_text = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'
my_binary_data = my_text.encode('utf-8')

# 👉️ b'\xf0\x9d\x98\x88\xe1\xb8\x86\xf0\x9d\x96\xa2\xf0\x9d\x95\xaf\xd9\xa4\xe1\xb8\x9e\xd4\x8d\xd0\x9d\xc7\x8f'
print(my_binary_data)

# ✅ specify correct encoding
my_text_again = my_binary_data.decode('utf-8')
print(my_text_again)  # 👉️ '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'
```

If you have a bytes object that you need to convert to a string, use the decode() method.

Make sure to specify the same encoding in the call to the str.encode() and bytes.decode() methods.
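One way to avoid mixing up the two calls is to normalize values at the boundary of your code with small helpers that accept either type. The names to_text and to_bytes are hypothetical, and utf-8 as their default is an assumption you should match to your data.

```python
def to_bytes(value, encoding="utf-8"):
    """Return bytes: encode str, pass bytes through unchanged."""
    if isinstance(value, str):
        return value.encode(encoding)
    if isinstance(value, bytes):
        return value
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


def to_text(value, encoding="utf-8"):
    """Return str: decode bytes, pass str through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


# Calling "abc".decode() directly would raise AttributeError in Python 3,
# because str has no decode() method; the helpers sidestep that mistake.
print(to_text(to_bytes("𝘈Ḇ𝖢")))
```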

# Discussion

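In Python 3, str.encode() and bytes.decode() default to utf-8. open(), however, defaults to the locale's preferred encoding, which is not always utf-8 (on Windows it is often cp1252), so it is safest to pass the encoding argument explicitly.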

Python 3 no longer has a separate unicode type like Python 2 did.

Instead, Python 3 distinguishes text (str, a sequence of Unicode code points) from binary data (bytes).
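To see which defaults apply on a particular machine, you can query them at runtime. This sketch assumes locale.getpreferredencoding(False) reflects the encoding open() falls back to when none is given (it returns utf-8 when Python's UTF-8 mode is on); encoding_defaults is an illustrative name.

```python
import locale
import sys


def encoding_defaults():
    """Collect the encoding defaults that apply to this interpreter."""
    return {
        # str.encode() and bytes.decode() default to UTF-8 in Python 3.
        "str.encode()": "é".encode(),
        # Locale encoding that open() falls back to without an encoding argument.
        "open() fallback": locale.getpreferredencoding(False),
        # True when Python's UTF-8 mode (-X utf8 or PYTHONUTF8=1) is enabled.
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


for key, value in encoding_defaults().items():
    print(f"{key}: {value!r}")
```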

Using the ascii encoding to decode a bytes object that was encoded in a different encoding causes the error.



Python UnicodeDecodeError: ASCII Codec Can’t Decode Byte in Position: Ordinal Not in Range


  1. Unicode Decode Error in Python
  2. How to Solve the Unicode Decode Error in Python

In this article, we will learn how to resolve the UnicodeDecodeError that occurs during the execution of the code. We will look at the different reasons that cause this error.

We will also find ways to resolve this error in Python. Let’s begin with what the UnicodeDecodeError is in Python.

Unicode Decode Error in Python

If you are facing a recurring UnicodeDecodeError and are unsure of why it is happening or how to resolve it, this is the article for you.

In this article, we look in depth at why this error comes up and at a simple approach to resolving it.

Causes of Unicode Decode Error in Python

In Python, the UnicodeDecodeError comes up when we try to decode bytes with a codec other than the one that was used to encode them. To make this concrete, consider a lock-and-key analogy.

Suppose we created a lock that can only be opened using a unique key made specifically for that lock.

What happens if you try to open this lock with a key that wasn’t made for it? It doesn’t fit. In the same way, bytes produced by one codec often cannot be decoded by another.

Let’s create the file example.txt with the following contents.

Let’s attempt to decode this file with the ascii codec:

```python
with open('example.txt', 'r', encoding='ascii') as f:
    lines = f.readlines()
    print(lines)
```

```
Traceback (most recent call last):
  File "/home/fatina/PycharmProjects/examples/main.py", line 2, in <module>
    lines = f.readlines()
  File "/usr/lib/python3.10/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
```

Let’s look at another, more straightforward example of what happens when you encode a string with one codec and decode it with a different one.
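A minimal sketch of that scenario, under the assumption that UTF-16 and UTF-8 are the two codecs involved: the string is encoded with utf-16 and decoded with utf-8. UTF-16 output starts with a byte-order mark (0xff 0xfe or 0xfe 0xff, depending on the platform's byte order), and neither byte can ever appear in valid UTF-8, so decoding fails at position 0. The function name decode_mismatch is illustrative.

```python
def decode_mismatch(text="hello"):
    """Encode with UTF-16, then decode with the wrong codec (UTF-8)."""
    data = text.encode("utf-16")  # begins with a byte-order mark
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Report where decoding failed and which byte was rejected.
        return exc.start, data[exc.start]
    return None


print(decode_mismatch())                          # the first BOM byte is rejected
print("hello".encode("utf-16").decode("utf-16"))  # matching codec works
```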

