python - Error when verifying SSL certificate
I got an error when I tried to download data from Wikipedia with pandas.
pd.read_html('http://simple.wikipedia.org/wiki/list_of_u.s._states')
The error message says:
SSLError                                  Traceback (most recent call last)
/users/soma/.pyenv/versions/3.5.0/lib/python3.5/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
   1239             try:
-> 1240                 h.request(req.get_method(), req.selector, req.data, headers)
   1241             except OSError as err: # timeout error

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/http/client.py in request(self, method, url, body, headers)
   1082         """Send a complete request to the server."""
-> 1083         self._send_request(method, url, body, headers)
   1084

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/http/client.py in _send_request(self, method, url, body, headers)
   1127             body = body.encode('iso-8859-1')
-> 1128         self.endheaders(body)
   1129

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/http/client.py in endheaders(self, message_body)
   1078             raise CannotSendHeader()
-> 1079         self._send_output(message_body)
   1080

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/http/client.py in _send_output(self, message_body)
    910
--> 911         self.send(msg)
    912         if message_body is not None:

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/http/client.py in send(self, data)
    853             if self.auto_open:
--> 854                 self.connect()
    855             else:

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/http/client.py in connect(self)
   1236             self.sock = self._context.wrap_socket(self.sock,
-> 1237                                                   server_hostname=server_hostname)
   1238             if not self._context.check_hostname and self._check_hostname:

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname)
    375                          server_hostname=server_hostname,
--> 376                          _context=self)
    377

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/ssl.py in __init__(self, sock, keyfile, certfile, server_side, cert_reqs, ssl_version, ca_certs, do_handshake_on_connect, family, type, proto, fileno, suppress_ragged_eofs, npn_protocols, ciphers, server_hostname, _context)
    746                         raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 747                     self.do_handshake()
    748

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/ssl.py in do_handshake(self, block)
    982                 self.settimeout(None)
--> 983             self._sslobj.do_handshake()
    984         finally:

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/ssl.py in do_handshake(self)
    627         """Start the SSL/TLS handshake."""
--> 628         self._sslobj.do_handshake()
    629         if self.context.check_hostname:

SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:646)

During handling of the above exception, another exception occurred:

URLError                                  Traceback (most recent call last)
<ipython-input-51-330bd889a78f> in <module>()
----> 1 fiddy_states = pd.read_html('http://simple.wikipedia.org/wiki/list_of_u.s._states')
      2 print(fiddy_states)

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/site-packages/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding)
    864     _validate_header_arg(header)
    865     return _parse(flavor, io, match, header, index_col, skiprows,
--> 866                   parse_dates, tupleize_cols, thousands, attrs, encoding)

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/site-packages/pandas/io/html.py in _parse(flavor, io, match, header, index_col, skiprows, parse_dates, tupleize_cols, thousands, attrs, encoding)
    726             break
    727     else:
--> 728         raise_with_traceback(retained)
    729
    730     ret = []

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/site-packages/pandas/compat/__init__.py in raise_with_traceback(exc, traceback)
    746         if traceback == Ellipsis:
    747             _, _, traceback = sys.exc_info()
--> 748         raise exc.with_traceback(traceback)
    749 else:
    750     # this version of raise is a syntax error in Python 3

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:646)>
I have no idea why this happens.
I had the same problem with an SSL website on Linux. Funnily enough, on Windows the same code parsed the tables from the website without trouble. After spending some time comparing and updating library versions on Linux with no result, I added some code to handle the SSL certificate before calling read_html:
> import certifi
> import pandas as pd
> import urllib3
>
> # Force the certificate check and use certifi's CA bundle to verify it.
> https = urllib3.PoolManager(
>     cert_reqs='CERT_REQUIRED',
>     ca_certs=certifi.where(),
> )
>
> url = https.urlopen('GET', 'https://yoursecureproblematicwebsite.com')
>
> # Then parse the HTML as usual.
> foo = pd.read_html(url.data)
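The same idea can probably be done with the standard library handling the request instead of urllib3. This is only a rough sketch: `make_verified_context` and `read_html_verified` are hypothetical helper names (not part of pandas), and falling back to the system trust store when certifi is not installed is an assumption of the sketch, not something from the code above.

```python
import io
import ssl
import urllib.request


def make_verified_context():
    """Build an SSL context that verifies certificates and hostnames.

    Uses certifi's CA bundle when certifi is installed; otherwise falls
    back to the system default trust store (an assumption of this sketch).
    """
    try:
        import certifi
        cafile = certifi.where()
    except ImportError:
        cafile = None
    return ssl.create_default_context(cafile=cafile)


def read_html_verified(url, **kwargs):
    """Download url with certificate verification, then parse its tables."""
    import pandas as pd  # imported lazily so the context helper works without pandas

    with urllib.request.urlopen(url, context=make_verified_context()) as resp:
        html = resp.read()
    # Wrap the downloaded bytes in a file-like object before handing them to pandas.
    return pd.read_html(io.BytesIO(html), **kwargs)
```

It would be called like `foo = read_html_verified('https://yoursecureproblematicwebsite.com')`. Some sites reject the default urllib User-Agent, so the urllib3 version above may still be the more practical choice.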
Also make sure you have the latest version of certifi:
> python -m pip install --upgrade certifi
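When comparing a machine where it works with one where it fails, it might help to print where each interpreter looks for CA certificates and which certifi bundle (if any) is installed. A small sketch, where `describe_ca_sources` is a hypothetical helper name and the certifi lookup is simply skipped if the package is missing:

```python
import ssl


def describe_ca_sources():
    """Return a dict describing where CA certificates would come from."""
    paths = ssl.get_default_verify_paths()
    info = {
        "openssl_version": ssl.OPENSSL_VERSION,
        "default_cafile": paths.cafile,  # None if the file does not exist
        "default_capath": paths.capath,  # None if the directory does not exist
    }
    try:
        import certifi
        info["certifi_version"] = certifi.__version__
        info["certifi_cafile"] = certifi.where()
    except ImportError:
        info["certifi_version"] = None
        info["certifi_cafile"] = None
    return info


if __name__ == "__main__":
    for key, value in describe_ca_sources().items():
        print(key, "=", value)
```

If `default_cafile` and `default_capath` both come back as None on the failing machine, the interpreter may have no usable system trust store, which would be consistent with the certificate error above.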
This is not the most efficient way, but I hope it helps.
fonzi