Python - Extract 2 arguments from web page
I want to extract two attributes (title and href) from each <a> tag on a Wikipedia page.
I want output like this, e.g. for https://en.wikipedia.org/wiki/riddley_walker:

    canterbury cathedral /wiki/canterbury_cathedral
The code:

    import os, re, lxml.html, urllib

    def extractplaces(hlink):
        connection = urllib.urlopen(hlink)
        places = {}
        dom = lxml.html.fromstring(connection.read())
        for name in dom.xpath('//a/@title'):  # selects only the title attribute of each link
            print name
In this case I only get the @title values.
You should select the <a> elements that have a title attribute (instead of selecting the title attribute directly), and then use .attrib on each element to read the attributes you need. Example -
    for name in dom.xpath('//a[@title]'):
        print('title :', name.attrib['title'])
        print('href :', name.attrib['href'])
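One caveat with the loop above: an <a> element that has a title but no href would raise a KeyError on name.attrib['href']. Here is a minimal, self-contained sketch of the same idea that avoids this by asking XPath for both attributes at once. The SAMPLE_HTML string and the extract_links helper are made up for illustration (they are not from the question), and the sketch parses an inline string rather than fetching the real page.

```python
import lxml.html

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <a href="/wiki/Canterbury_Cathedral" title="Canterbury Cathedral">Canterbury</a>
  <a href="/wiki/Kent" title="Kent">Kent</a>
  <a title="No link here">orphan title</a>
  <a href="/wiki/Untitled">no title</a>
</body></html>
"""


def extract_links(html_text):
    """Return (title, href) pairs for every <a> that has both attributes."""
    dom = lxml.html.fromstring(html_text)
    pairs = []
    # The predicate keeps only anchors that carry both attributes,
    # so .get() cannot return None for either of them here.
    for a in dom.xpath('//a[@title and @href]'):
        pairs.append((a.get('title'), a.get('href')))
    return pairs


# Print one "title href" line per matching link.
for title, href in extract_links(SAMPLE_HTML):
    print(title, href)
```

To run this against a live page on Python 3, the HTML could be read with urllib.request.urlopen(url).read() and passed to extract_links; on Python 2, urllib.urlopen as in the question plays the same role.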