python - Error when verifying SSL certificate


I got an error when I tried to download data from Wikipedia with pandas.

pd.read_html('http://simple.wikipedia.org/wiki/list_of_u.s._states') 

The error message says:

SSLError                                  Traceback (most recent call last)
/users/soma/.pyenv/versions/3.5.0/lib/python3.5/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
   1239             try:
-> 1240                 h.request(req.get_method(), req.selector, req.data, headers)
   1241             except OSError as err: # timeout error

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/http/client.py in request(self, method, url, body, headers)
   1082         """Send a complete request to the server."""
-> 1083         self._send_request(method, url, body, headers)
   1084

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/http/client.py in _send_request(self, method, url, body, headers)
   1127             body = body.encode('iso-8859-1')
-> 1128         self.endheaders(body)
   1129

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/http/client.py in endheaders(self, message_body)
   1078             raise CannotSendHeader()
-> 1079         self._send_output(message_body)
   1080

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/http/client.py in _send_output(self, message_body)
    910
--> 911         self.send(msg)
    912         if message_body is not None:

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/http/client.py in send(self, data)
    853             if self.auto_open:
--> 854                 self.connect()
    855             else:

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/http/client.py in connect(self)
   1236             self.sock = self._context.wrap_socket(self.sock,
-> 1237                                                   server_hostname=server_hostname)
   1238             if not self._context.check_hostname and self._check_hostname:

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname)
    375                          server_hostname=server_hostname,
--> 376                          _context=self)
    377

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/ssl.py in __init__(self, sock, keyfile, certfile, server_side, cert_reqs, ssl_version, ca_certs, do_handshake_on_connect, family, type, proto, fileno, suppress_ragged_eofs, npn_protocols, ciphers, server_hostname, _context)
    746                         raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 747                     self.do_handshake()
    748

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/ssl.py in do_handshake(self, block)
    982                 self.settimeout(None)
--> 983             self._sslobj.do_handshake()
    984         finally:

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/ssl.py in do_handshake(self)
    627         """Start the SSL/TLS handshake."""
--> 628         self._sslobj.do_handshake()
    629         if self.context.check_hostname:

SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:646)

During handling of the above exception, another exception occurred:

URLError                                  Traceback (most recent call last)
<ipython-input-51-330bd889a78f> in <module>()
----> 1 fiddy_states = pd.read_html('http://simple.wikipedia.org/wiki/list_of_u.s._states')
      2 print(fiddy_states)

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/site-packages/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding)
    864     _validate_header_arg(header)
    865     return _parse(flavor, io, match, header, index_col, skiprows,
--> 866                   parse_dates, tupleize_cols, thousands, attrs, encoding)

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/site-packages/pandas/io/html.py in _parse(flavor, io, match, header, index_col, skiprows, parse_dates, tupleize_cols, thousands, attrs, encoding)
    726             break
    727     else:
--> 728         raise_with_traceback(retained)
    729
    730     ret = []

/users/soma/.pyenv/versions/3.5.0/lib/python3.5/site-packages/pandas/compat/__init__.py in raise_with_traceback(exc, traceback)
    746         if traceback == Ellipsis:
    747             _, _, traceback = sys.exc_info()
--> 748         raise exc.with_traceback(traceback)
    749 else:
    750     # this version of raise is a syntax error in Python 3

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:646)>

I have no idea why this happens.

I had the same problem with an SSL website on Linux. Funnily enough, on Windows the same code parsed the tables from the website without trouble. After spending time comparing and updating library versions on Linux with no result, I added code to handle the SSL certificate before calling read_html:

> import certifi
> import pandas as pd
> import urllib3
>
> # Force certificate verification and use certifi's CA bundle for it.
> https = urllib3.PoolManager(
>     cert_reqs='CERT_REQUIRED',
>     ca_certs=certifi.where(),
> )
>
> url = https.urlopen('GET', 'https://yoursecureproblematicwebsite.com')
>
> # Then parse the HTML as usual.
> foo = pd.read_html(url.data)
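If you would rather not depend on urllib3, the same idea can probably be done with the standard library alone. This is only a rough sketch, not something I have run against your site: `build_verified_context` and `read_tables` are names I made up for illustration, certifi is treated as optional (falling back to the system CA store is my assumption), and the HTML is wrapped in `StringIO` so that read_html sees a file-like object.

```python
import io
import ssl
import urllib.request


def build_verified_context():
    """Return an SSL context that verifies certificates and hostnames."""
    try:
        import certifi  # optional third-party CA bundle
        cafile = certifi.where()
    except ImportError:
        cafile = None  # assumption: fall back to the system's default CA store
    # create_default_context enables CERT_REQUIRED and hostname checking.
    return ssl.create_default_context(cafile=cafile)


def read_tables(url, context=None):
    """Download url over verified HTTPS and parse its HTML tables."""
    import pandas as pd  # imported lazily; only needed when tables are parsed

    context = context or build_verified_context()
    with urllib.request.urlopen(url, context=context) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset)
    # Wrap the text in StringIO so read_html treats it as a file-like object.
    return pd.read_html(io.StringIO(html))
```

Calling `read_tables('https://yoursecureproblematicwebsite.com')` should then return the same list of DataFrames as the urllib3 version, provided the CA bundle can actually validate the site's certificate.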

Also make sure you have the latest version of certifi:

> python -m pip install --upgrade certifi
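To see why one machine verifies the certificate and another does not, it may help to print where Python looks for CA certificates on each of them. The snippet below is a small diagnostic sketch; `describe_ca_sources` is a hypothetical helper name, and it only reports paths and versions, it does not fix anything.

```python
import ssl


def describe_ca_sources():
    """Collect the CA certificate locations Python would consult."""
    paths = ssl.get_default_verify_paths()
    info = {
        # Locations compiled into the OpenSSL build that Python links against.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # cafile/capath are None when the file or directory does not exist.
        "cafile_found": paths.cafile is not None,
        "capath_found": paths.capath is not None,
    }
    try:
        import certifi
        info["certifi_bundle"] = certifi.where()
        info["certifi_version"] = getattr(certifi, "__version__", "unknown")
    except ImportError:
        # certifi is not installed in this environment.
        info["certifi_bundle"] = None
        info["certifi_version"] = None
    return info


for key, value in describe_ca_sources().items():
    print(key, "=", value)
```

If both `cafile_found` and `capath_found` come back False, the interpreter likely has no system CA store to verify against, which would fit the error above; pointing it at certifi's bundle, as in the code earlier, is one way around that.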

This is not the most efficient way, but I hope it helps.

fonzi

