python - How to use Scrapy to crawl the same URL by posting different data?
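I want to crawl a website by POSTing different page numbers to the same URL, but the spider finishes after the data of the first page. I think that, because I crawl the same URL every time, the later requests are filtered out by Scrapy.
Here is the code: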
    import json

    from scrapy import FormRequest, Spider

    # ZhejiangCrawlItem, header and cookies are defined elsewhere in the
    # project (not shown here).


    class ZhejiangCrawl(Spider):
        name = 'zhejiangcrawl'
        root_url = 'http://www.zjsfgkw.cn/execute/creditcompany'
        start_page = 1
        current_page = start_page
        end_page = 24974
        post_data = {'pageno': str(current_page), 'pagesize': '5',
                     'reallyname': '', 'credentialsnumber': '', 'ah': '',
                     'zxfy': '', 'startlarq': '', 'endlarq': ''}
        headers = header
        cookies = cookies

        def start_requests(self):
            return [FormRequest(self.root_url, headers=self.headers,
                                cookies=self.cookies, formdata=self.post_data,
                                dont_filter=True, callback=self.parse)]

        def parse(self, response):
            # Request the next page by POSTing to the same URL again.
            if self.current_page < self.end_page:
                self.current_page += 1
                self.post_data['pageno'] = str(self.current_page)
                yield [FormRequest(self.root_url, headers=self.headers,
                                   cookies=self.cookies, dont_filter=True,
                                   formdata=self.post_data, callback=self.parse)]
            # Turn each record of the JSON response into an item.
            jsonstr = json.loads(response.body)
            for item_dict in jsonstr['informationmodels']:
                item = ZhejiangCrawlItem()
                item['name'] = item_dict['reallyname']
                item['cardnum'] = item_dict['credentialsnumber']
                item['performance'] = item_dict['zxje']
                item['unperformance'] = item_dict['wzxje']
                item['gistunit'] = item_dict['zxfy']
                item['address'] = item_dict['address']
                item['gistid'] = item_dict['zxyj']
                item['casecode'] = item_dict['ah']
                item['regdate'] = item_dict['larq']
                item['exposuredate'] = item_dict['bgrq']
                item['gistreason'] = item_dict['zxay']
                yield item
How can I fix it?
If you think it's being filtered because of the dupefilter, add dont_filter=True to the FormRequests.
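Also note, there's no reason to make lists out of the yielded/returned content. In parse, yield the FormRequest itself rather than a one-element list: Scrapy does not treat a yielded list as a request, so the next page would never be scheduled.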