python - Extract 2 arguments from web page -


I want to extract two attributes (title, href) from the <a> tags of a Wikipedia page.

I want output like this, e.g. for https://en.wikipedia.org/wiki/riddley_walker:

canterbury cathedral   /wiki/canterbury_cathedral   

The code:

    import os, re, lxml.html, urllib

    def extractplaces(hlink):
        connection = urllib.urlopen(hlink)
        places = {}
        dom = lxml.html.fromstring(connection.read())
        for name in dom.xpath('//a/@title'):  # selects only the title attribute of each link
            print name

In this case I only get @title.

You should select the <a> elements that have a title attribute (instead of selecting the title attribute directly), and then use .attrib on each element to read the attributes you need. Example -

    for name in dom.xpath('//a[@title]'):
        print('title :', name.attrib['title'])
        print('href :', name.attrib['href'])
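For reference, here is a minimal self-contained sketch of the same idea for Python 3. The function name `extract_links` and the sample markup are made up for illustration and are not part of the original question. It uses `.get()` rather than `.attrib[...]`, on the assumption that some `<a>` elements may have a title but no href.

```python
import lxml.html

# Hypothetical sample markup standing in for a downloaded Wikipedia page.
SAMPLE_HTML = """
<html><body>
  <a href="/wiki/canterbury_cathedral" title="canterbury cathedral">Canterbury</a>
  <a href="/wiki/no_title">link without a title</a>
  <a title="title without href">anchor only</a>
</body></html>
"""

def extract_links(html_text):
    """Return (title, href) pairs for every <a> element that has a title."""
    dom = lxml.html.fromstring(html_text)
    pairs = []
    # Select the elements themselves, not the attribute values,
    # so that both attributes can be read from each element.
    for a in dom.xpath('//a[@title]'):
        # .get() returns None instead of raising KeyError when href is missing.
        pairs.append((a.get('title'), a.get('href')))
    return pairs

for title, href in extract_links(SAMPLE_HTML):
    print(title, href)
```

To run this against a live page in Python 3, the bytes returned by `urllib.request.urlopen(url).read()` could be passed to `extract_links` in place of `SAMPLE_HTML`, assuming the page can be fetched.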
