python - Extracting website (URL) from Google local search when URL not found in source?
I'm looking to extract (with WebDriver, XPath, a CSS selector, class or ID) the URL that lives behind each of the website images in a Google local search results page such as this.
When I mouse over any of these, I can see the URL that would be reached if I clicked the image. Yet if I view the full page source and search for any of these URLs, they're not found. Here is the source around one of the images:
This suggests the URLs are perhaps read in dynamically, though that is where my knowledge of web design ends. Is it possible to construct an XPath or CSS selector, or indeed a plain-text search, for these URLs?
Clarification: when I say URL, I mean the ultimate URLs. Mouse over the website images and you'll see URLs such as bodinbalanceny.com, lamchiropractic.com, etc. These are the URLs I'm looking to extract.
You can use urlparse. Once you fetch the href attribute, prepend "https://www.google.com" to it and try the code below.
>>> import urlparse
>>> url = """https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0cbaqgu8wagovchmi6c6mhpvjyaivqyeuch0eiaai&url=http%3a%2f%2fwww.taihealthsolutions.com%2f&usg=afqjcnhhovnrx0zdxz1cu4p2xiueffczta&bvm=bv.105841590,d.dgo"""
>>> parsed = urlparse.urlparse(url)
>>> print urlparse.parse_qs(parsed.query)['url'][0]
http://www.taihealthsolutions.com/
Note: this is for Python 2.x. In Python 3 the code is different: urlparse and parse_qs live in the urllib.parse module.
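For Python 3, a minimal sketch of the same idea might look like the following. It assumes the href read from the page is a Google redirect link of the form /url?...&url=<target>, as in the example above. The helper name extract_target_url and the shortened sample href are made up for illustration and are not part of the original answer.

```python
from urllib.parse import parse_qs, urljoin, urlparse

# Assumption: hrefs scraped from the results page are relative to this host.
GOOGLE_BASE = "https://www.google.com"


def extract_target_url(href, base=GOOGLE_BASE):
    """Return the value of the 'url' query parameter of a redirect link.

    Returns None when the link carries no 'url' parameter.
    """
    # urljoin leaves an already-absolute href untouched and
    # prefixes a relative one such as "/url?..." with the base.
    absolute = urljoin(base, href)
    # parse_qs percent-decodes values and maps each key to a list.
    query = parse_qs(urlparse(absolute).query)
    targets = query.get("url")
    return targets[0] if targets else None


# Hypothetical, shortened href in the same shape as the example above.
sample_href = "/url?sa=t&rct=j&url=http%3A%2F%2Fwww.taihealthsolutions.com%2F&usg=abc"
print(extract_target_url(sample_href))
```

Using query.get("url") rather than indexing avoids a KeyError on links that are not redirect links, so the helper can be run over every href collected from the page.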