Question :
I’m learning Python 3 and got stuck trying to read a simple CSV file that contains the character ‘à’.
I’ve tried the decode and encode approaches I found on the internet, but nothing seems to work: the character is always printed as ‘\xc3\xa0’.
Note that I use Sublime Text to edit and run the code.
import csv

with open('teste.csv', 'r') as ficheiro:
    reader = csv.reader(ficheiro, delimiter=';')
    for row in reader:
        print(row)
The teste.csv file:
batata;14;True
pàtato;19;False
papa;10;False
The error:
Traceback (most recent call last):
  File "/Users/Mine/Desktop/testando csv.py", line 5, in <module>
    for row in reader:
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)
[Finished in 0.1s with exit code 1]
I look forward to your help.
Answer :
It depends on which encoding the .csv file was saved in.
Note: in Python 2, the csv module does not support Unicode input directly.
UTF-8
If the .csv file is saved as UTF-8, you can do the following, according to the Python 3 documentation:
import csv

with open('teste.csv', encoding='utf-8') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        print(row)
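As a self-contained check of the approach above, the sketch below writes the sample data from the question to a temporary file as UTF-8 and reads it back. The temporary path and the `read_rows` helper are illustrative names, not part of the original answer, and `newline=''` follows the recommendation in the csv module documentation. It also shows where the ‘\xc3\xa0’ from the question comes from: those are the two bytes that encode ‘à’ in UTF-8.

```python
import csv
import os
import tempfile

# Sample data taken from the question
SAMPLE = "batata;14;True\npàtato;19;False\npapa;10;False\n"


def read_rows(path, encoding="utf-8"):
    """Hypothetical helper: read a semicolon-delimited CSV into a list of rows."""
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f, delimiter=";"))


# In UTF-8, 'à' is stored as the two bytes 0xC3 0xA0
a_grave_bytes = "à".encode("utf-8")
print(a_grave_bytes)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "teste.csv")
    # Write the file explicitly as UTF-8 so the read side knows what to expect
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(SAMPLE)
    rows = read_rows(path)

# rows now holds three lists of strings; rows[1][0] is the str 'pàtato'
print(len(rows))
```

Passing the encoding explicitly on both the write and the read side avoids depending on whatever default the editor or terminal happens to provide.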
If the .csv file is not in UTF-8, an error similar to this will occur:
C:\Users\guilherme\Desktop>python testcsv.py
Traceback (most recent call last):
  File "testcsv.py", line 5, in <module>
    for row in reader:
  File "C:\Python\Python36-32\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 0: invalid continuation byte
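To illustrate why this kind of error appears, here is a minimal sketch that does not need a file at all. It assumes a word starting with ‘á’ (the example word is my own choice), which Latin-1 stores as the single byte 0xE1; that byte followed by a plain ASCII letter is not a valid UTF-8 sequence.

```python
# 'á' is the single byte 0xE1 in Latin-1
raw = "água".encode("latin-1")

try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError as exc:
    utf8_failed = True
    # exc.start is the index of the first byte that could not be decoded
    failing_byte = raw[exc.start]
    print("UTF-8 decoding failed at byte", hex(failing_byte))

# Decoding with the encoding the bytes were actually written in works
as_latin1 = raw.decode("latin-1")
```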
If everything is correct it will look like this:
If it is a problem on a terminal in a Unix-like environment (e.g. Mac and Linux), apply this (I believe the document must also be saved in UTF-8 without BOM):
# -*- coding: utf-8 -*-
import csv

with open('teste.csv', encoding='utf-8') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        print(row)
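When the problem depends on the environment, it can help to see which encodings Python picked up. The sketch below is only a diagnostic, under the assumption that `open()` without an `encoding` argument falls back to `locale.getpreferredencoding(False)`, as the Python 3 documentation describes. If an editor's build system runs the script without a locale configured, this may report an ASCII-based encoding, which would likely explain the 'ascii' codec error in the question.

```python
import locale
import sys

# Encoding that open() uses when no encoding= argument is given
default_file_encoding = locale.getpreferredencoding(False)

# Encoding used when print() writes to the terminal or editor console
stdout_encoding = sys.stdout.encoding

print("open() default encoding:", default_file_encoding)
print("stdout encoding:", stdout_encoding)
```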
Latin1
If the file is saved in ANSI, whether latin1, windows-1252 or iso-8859-1 (they are "compatible"), you can use encoding='latin-1' (although in Python 3 it was not needed); it should look like this:
import csv

with open('teste.csv', encoding='latin-1') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        print(row)
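If you do not know in advance how the file was saved, one possible approach is to try UTF-8 first and fall back to Latin-1. This is only a sketch: `read_csv_rows` and its parameters are hypothetical names, and because Latin-1 can decode any byte sequence, the fallback never raises an error, so it may silently produce wrong characters if the file is really in some other encoding.

```python
import csv


def read_csv_rows(path, encodings=("utf-8", "latin-1"), delimiter=";"):
    """Hypothetical helper: return (rows, encoding_used) for the first encoding that decodes."""
    for encoding in encodings:
        try:
            with open(path, encoding=encoding, newline="") as f:
                # list() forces the whole file to be decoded inside the try block
                return list(csv.reader(f, delimiter=delimiter)), encoding
        except UnicodeDecodeError:
            # This encoding does not match the file's bytes; try the next one
            continue
    raise ValueError("none of the candidate encodings could decode " + repr(path))
```

A call such as `rows, used = read_csv_rows('teste.csv')` returns the parsed rows together with the encoding that worked, which you can print to confirm how the file was actually saved.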