html - Python Scrapy, parsing multiple child objects into the same item?


For a non-profit college assignment I'm trying to scrape the website www.rateyourmusic.com. I'm able to scrape some things, but I have encountered a problem when trying to scrape multiple children of an HTML element.

Specifically, I'm trying to scrape the genre of an artist. Many artists have multiple genres, and I can't scrape all of them. Here is my parsing method:

    def parse_dir_contents(self, response):
        item = rateyourmusicartist()

        # get genres of artist
        for sel in response.xpath('//a[@class="genre"]'):
            item['genre'] = sel.xpath('text()').extract()

        yield item

There are multiple //a[@class="genre"] matches, each representing a genre, and I'd like to put them in one string separated by ', '.

Is there an easy way to do this? Here is a sample URL for the site I'm scraping: http://rateyourmusic.com/artist/kanye_west.

A simple str.join() trick:

    ", ".join(response.xpath('//a[@class="genre"]/text()').extract())

Demo (from the Scrapy shell):

    $ scrapy shell http://rateyourmusic.com/artist/kanye_west
    In [1]: ", ".join(response.xpath('//a[@class="genre"]/text()').extract())
    Out[1]: u'hip hop, pop rap, experimental hip hop, hardcore hip hop, electropop, synthpop'
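To see why the loop in the question keeps only one genre while the join collects all of them, here is a minimal offline sketch. It does not use Scrapy or the live site: the markup in `SAMPLE_HTML` is made up for illustration, and the standard library's ElementTree stands in for `response.xpath()`, so treat it as an approximation of the idea rather than the spider itself.

```python
import xml.etree.ElementTree as ET

# Made-up markup for illustration; the real page layout may differ.
SAMPLE_HTML = """
<div>
  <a class="genre">Hip Hop</a>
  <a class="genre">Pop Rap</a>
  <a class="artist">Not a genre</a>
  <a class="genre">Electropop</a>
</div>
"""

root = ET.fromstring(SAMPLE_HTML)

# ElementTree's limited XPath support is enough to select by attribute.
genre_links = root.findall('.//a[@class="genre"]')

# Loop as in the question: each pass overwrites the previous value,
# so only the last genre survives.
item_loop = {}
for link in genre_links:
    item_loop["genre"] = [link.text]

# Join approach: select every matching text node first, then build one string.
item_join = {"genre": ", ".join(link.text for link in genre_links)}

print(item_loop["genre"])
print(item_join["genre"])
```

The fix is not specific to XPath: any time a field is assigned inside a loop, collect the values first and assign once.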

Note that if you use Item Loaders, you can make this cleaner:

    from scrapy.loader.processors import Join

    # inside the parse callback
    loader = myitemloader(response=response)
    loader.add_xpath("genre", '//a[@class="genre"]/text()', Join(", "))

    yield loader.load_item()
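If the processor argument looks like magic, this rough sketch shows what a join-style processor amounts to: a callable that receives the list of values collected for a field and returns a single string. `JoinValues` is a hypothetical stand-in written for this example, not Scrapy's own class, and the `collected` list is sample data.

```python
class JoinValues:
    """Hypothetical stand-in for a join-style processor (not Scrapy's class)."""

    def __init__(self, separator=" "):
        self.separator = separator

    def __call__(self, values):
        # The processor is called with every value collected for one field.
        return self.separator.join(values)


# Sample values, as an XPath text() query might collect them for "genre".
collected = ["Hip Hop", "Pop Rap", "Electropop"]

join_genres = JoinValues(", ")
genre = join_genres(collected)
print(genre)
```

Keeping the joining logic in a processor means the callback only declares which XPath feeds which field.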
