Python 3 encoding/decoding problems between FreeBSD and Linux with BeautifulSoup
I have an application that modifies the contents of an XML file (via Beautiful Soup) and writes it back to disk. Easy enough; on my development machine (Linux), I have working code.
First off, let's load the file into the soup:
    from bs4 import BeautifulSoup

    # Load the document
    document = open(contentxml, encoding="utf-8")
    # Load the soup
    soup = BeautifulSoup(document, "lxml")
    # Do soupy stuff here
    with open(document.name, "w") as f:
        # The soup is now Beautiful Soup data
        f.write(soup.decode("utf-8"))
Now this works fine and dandy, but when I run the exact same code on the FreeBSD production system, I get this error:
    UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 8253: ordinal not in range(128)
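The 'ascii' codec in this message most likely comes from the process locale: when open() is called in text mode without an encoding argument, Python falls back to the locale's preferred encoding, and the write call above does not pass one. A minimal sketch for checking that default on each machine follows; the helper name describe_default_encoding is hypothetical, and the assumption that the FreeBSD process runs under a C/POSIX locale is a guess, not something stated in the question.

```python
import locale
import sys


def describe_default_encoding():
    """Return the encodings Python falls back to when none is given."""
    return {
        # Default for open() in text mode when no encoding= is passed.
        "preferred": locale.getpreferredencoding(False),
        # Used for encoding file names, not file contents.
        "filesystem": sys.getfilesystemencoding(),
    }


info = describe_default_encoding()
print(info)
```

If the two machines report different "preferred" values, that difference would explain why identical code behaves differently on each.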
So in that case, I thought I would try encoding the file and then writing it to disk:
    with open(document.name, "w") as f:
        # The soup is now Beautiful Soup data
        # String output, as we cannot write bytes
        soup_enc = str(soup.encode('utf8'))
        f.write(soup_enc)
Now this works without error, but it writes incorrect XML to the output file, as it outputs
b'<myxmlcontent>'
which in turn makes the end file useless. What is the best way around this, i.e. a clean solution that will work on both platforms?
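The b'...' wrapper appears because calling str() on a bytes object returns its printable representation rather than decoding it. A small sketch of the effect, using a made-up XML fragment in place of the real soup:

```python
text = "<price>£5</price>"  # hypothetical stand-in for the soup's markup

encoded = text.encode("utf8")        # bytes object
as_repr = str(encoded)               # repr of the bytes, NOT decoded text
round_trip = encoded.decode("utf8")  # the proper inverse of encode()

print(as_repr)     # starts with b' and shows the pound sign as escaped bytes
print(round_trip)  # the original markup
```

Only decode() reverses encode(); str() on bytes is meant for debugging output.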
Note:
Some reading online suggests not opening the original document with a specified encoding, e.g. doing:
    # Load the document
    document = open(contentxml)
    # Load the soup
    soup = BeautifulSoup(document, "lxml")
    # Do soupy stuff here
    with open(document.name, "w") as f:
        # The soup is now Beautiful Soup data
        f.write(str(soup))
This works fine on Linux, but on FreeBSD it throws the following error when performing the initial open(..):
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 7551: ordinal not in range(128)
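The byte 0xc2 is the first byte of the UTF-8 encoding of the pound sign (the same '\xa3' character from the earlier encode error), so this looks like a UTF-8 file being read with an ASCII default. The sketch below reproduces the decode failure in isolation; it assumes the file contains a pound sign, which the error messages suggest but do not prove.

```python
raw = "£".encode("utf-8")  # two bytes: 0xc2 0xa3

try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    # ASCII only covers bytes 0-127, so 0xc2 is rejected.
    ascii_ok = False

decoded = raw.decode("utf-8")
print(ascii_ok, decoded)
```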
In order to write directly to a binary file, I needed to open it with the correct mode ('wb') and write the encoded byte string:
    with open(document.name, 'wb') as f:
        f.write(soup.encode('utf8'))
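An alternative that should behave the same on both platforms is to stay in text mode but pass the encoding explicitly when writing, so the locale default is never consulted. This is a sketch under assumptions rather than the approach from the answer above: the markup string stands in for str(soup), and the temporary file path is made up for the example.

```python
import os
import tempfile

markup = "<price>£5</price>"  # hypothetical stand-in for str(soup)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "content.xml")

    # Text mode with an explicit encoding: independent of the locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write(markup)

    # Read the raw bytes back to confirm what actually reached the disk.
    with open(path, "rb") as f:
        written = f.read()

print(written)
```

Both variants put the same UTF-8 bytes on disk; the difference is only whether the encoding step happens in your code ('wb' plus encode) or inside open().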