python - Extract 2 arguments from web page -

- June 15, 2012

I want to extract two attributes (title and href) from each <a> tag of a Wikipedia page.

I want output like this, e.g. for https://en.wikipedia.org/wiki/riddley_walker:

canterbury cathedral   /wiki/canterbury_cathedral

The code:

import os, re, lxml.html, urllib

def extractplaces(hlink):
    connection = urllib.urlopen(hlink)
    places = {}
    dom = lxml.html.fromstring(connection.read())
    for name in dom.xpath('//a/@title'):  # selects only the title attribute of each <a> tag
        print name

In this case I only get the @title values.

You should select the <a> elements that have a title attribute (instead of selecting the title attribute directly), and then use .attrib on each element to read the attributes you need. Example -

for name in dom.xpath('//a[@title]'):
    print('title :', name.attrib['title'])
    print('href :', name.attrib['href'])
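One thing to watch for: an <a> element can have a title but no href, in which case name.attrib['href'] raises a KeyError. A minimal sketch of one way around that, assuming Python 3 and lxml, is to require both attributes in the XPath predicate. The SAMPLE_HTML string and the extract_links helper are made up for illustration; they stand in for the downloaded page and are not part of the original code.

```python
import lxml.html

# Hypothetical sample markup standing in for a downloaded Wikipedia page.
SAMPLE_HTML = """
<html><body>
  <a href="/wiki/Canterbury_Cathedral" title="Canterbury Cathedral">Canterbury</a>
  <a href="/wiki/Kent">Kent</a>
  <a title="No link target">placeholder</a>
</body></html>
"""

def extract_links(html_text):
    """Return (title, href) pairs for <a> elements that carry both attributes."""
    dom = lxml.html.fromstring(html_text)
    # The predicate keeps only anchors that have both attributes,
    # so neither lookup below can come back empty.
    return [(a.get('title'), a.get('href'))
            for a in dom.xpath('//a[@title and @href]')]

for title, href in extract_links(SAMPLE_HTML):
    print(title, href)
```

If you would rather keep anchors that lack an href, a.get('href') returns None for them instead of raising, so the '//a[@title]' selection works with .get() as well.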
