python - "for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte

Tags: python, python-3.x, character-encoding

Top 5 answers for python - "for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte

Score: 91

As suggested by Mark Ransom, I found the right encoding for the file: it was ISO-8859-1. Replacing open("u.item", encoding="utf-8") with open("u.item", encoding="ISO-8859-1") solved the problem.
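A minimal sketch that reproduces the error and the fix. It assumes a hypothetical "u.item" whose text was written in ISO-8859-1, so "é" is stored as the single byte 0xE9; the file goes into a temporary directory so the snippet runs on its own.

```python
import os
import tempfile

# Write a hypothetical "u.item" whose text is ISO-8859-1 encoded, not UTF-8.
path = os.path.join(tempfile.mkdtemp(), "u.item")
with open(path, "wb") as f:
    f.write("1|Café\n".encode("ISO-8859-1"))  # é becomes the single byte 0xE9

# Reading it as UTF-8 fails: 0xE9 is a UTF-8 lead byte, but the byte after it
# is not a valid continuation byte.
try:
    with open(path, encoding="utf-8") as f:
        for line in f:
            pass
    utf8_failed = False
except UnicodeDecodeError as exc:
    utf8_failed = True
    print(exc)

# Reading it with the encoding it was actually written in succeeds.
with open(path, encoding="ISO-8859-1") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)  # ['1|Café']
```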

Score: 87

The following also worked for me. ISO-8859-1 can save a lot of trouble, especially when working with speech recognition APIs.

Example:

file = open('../Resources/' + filename, 'r', encoding="ISO-8859-1") 
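Why ISO-8859-1 makes the error go away: it maps each of the 256 possible byte values to exactly one code point (U+0000 to U+00FF), so decoding can never raise. The trade-off is that it may silently produce the wrong characters if the file is really in some other encoding. A short check of both points, using only built-in codecs:

```python
# ISO-8859-1 maps every byte 0..255 to the code point with the same number.
all_bytes = bytes(range(256))
text = all_bytes.decode("ISO-8859-1")
assert len(text) == 256
assert all(ord(ch) == b for ch, b in zip(text, all_bytes))

# UTF-8 rejects many of these bytes, which is what raises UnicodeDecodeError.
try:
    all_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)  # invalid start byte

# "No error" is not "right text": 0x80 is the euro sign in Windows-1252
# but the invisible control character U+0080 in ISO-8859-1.
euro = b"\x80".decode("cp1252")
control = b"\x80".decode("ISO-8859-1")
print(repr(euro), repr(control))  # '€' '\x80'
```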

Score: 74

Your file doesn't actually contain UTF-8 encoded data; it contains some other encoding. Figure out what that encoding is and use it in the open call.

In Windows-1252, for example, the byte 0xe9 is the character é.
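One rough way to figure out the encoding is to try a short list of candidates and keep the first one that decodes without error. This is a heuristic sketch, not a real encoding detector: the guess_decode helper and its candidate order are hypothetical. ISO-8859-1 is placed last because it accepts any byte and would otherwise always win.

```python
def guess_decode(data: bytes, candidates=("utf-8", "cp1252", "ISO-8859-1")):
    """Hypothetical helper: return (encoding, text) for the first candidate
    encoding that decodes `data` without raising UnicodeDecodeError."""
    for enc in candidates:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the data")

# A lone 0xE9 is not valid UTF-8, but Windows-1252 maps it to 'é'.
print(b"\xe9".decode("cp1252"))       # é
print(guess_decode(b"Caf\xe9"))       # ('cp1252', 'Café')
print(guess_decode("Café".encode()))  # ('utf-8', 'Café')
```

A clean decode only shows that the bytes are valid in that encoding, not that the text is correct, so it is worth eyeballing a few lines of the result.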

Score: 67

Try this to read the file with pandas:

pd.read_csv('u.item', sep='|', names=m_cols, encoding='latin-1') 
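If pandas is not available, the standard-library csv module can read the same kind of pipe-separated file; this is a swapped-in alternative, not what the answer above uses. The file name and its contents below are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical pipe-separated file written in Latin-1 (é is the byte 0xE9).
path = os.path.join(tempfile.mkdtemp(), "u.item")
with open(path, "wb") as f:
    f.write("1|Café|1995\n2|Crème|1996\n".encode("latin-1"))

# newline="" is what the csv docs recommend when opening a file for csv.reader.
with open(path, encoding="latin-1", newline="") as f:
    rows = list(csv.reader(f, delimiter="|"))

print(rows)  # [['1', 'Café', '1995'], ['2', 'Crème', '1996']]
```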

Score: 57

This works:

open('filename', encoding='latin-1') 

Or:

open('filename', encoding="ISO-8859-1") 
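The two calls are equivalent: in Python, "latin-1" and "ISO-8859-1" are aliases for the same codec. A quick way to confirm this is codecs.lookup, which resolves both spellings to the same codec entry:

```python
import codecs

# Both spellings resolve to the same registered codec.
name_a = codecs.lookup("latin-1").name
name_b = codecs.lookup("ISO-8859-1").name
print(name_a, name_b)

# And they decode the same byte identically.
print(b"\xe9".decode("latin-1") == b"\xe9".decode("ISO-8859-1"))  # True
```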
