Querying Alibaba rankings in Python with the urllib module and pyquery

  • Date: 2020-02-13 11:08
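This is a basic application of the urllib module. The class below fetches the HTML document at a given URL, and the method that obtains the proxy can be overridden internally.

The code is as follows: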

    import cookielib
    import urllib2

    # ProxyRobot and PROXY_ENABLE are not shown in this listing;
    # they are expected to be defined elsewhere in the project.

    class ProxyScrapy(object):
        def __init__(self):
            self.proxy_robot = ProxyRobot()
            self.current_proxy = None
            self.cookie = cookielib.CookieJar()

        def __builder_proxy_cookie_opener(self):
            # Always keep cookies; add a proxy handler only when proxies are enabled.
            cookie_handler = urllib2.HTTPCookieProcessor(self.cookie)
            handlers = [cookie_handler]
            if PROXY_ENABLE:
                self.current_proxy = ip_port = self.proxy_robot.get_random_proxy()
                # ip_port[7:] skips the first 7 characters, presumably an "http://" prefix.
                proxy_handler = urllib2.ProxyHandler({'http': ip_port[7:]})
                handlers.append(proxy_handler)
            opener = urllib2.build_opener(*handlers)
            urllib2.install_opener(opener)
            return opener

        def get_html_body(self, url):
            opener = self.__builder_proxy_cookie_opener()
            request = urllib2.Request(url)
            #request.add_header("Accept-Encoding", "gzip,deflate,sdch")
            #request.add_header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
            #request.add_header("Cache-Control", "no-cache")
            #request.add_header("Connection", "keep-alive")
            try:
                response = opener.open(request, timeout=2)
                http_code = response.getcode()
                if http_code == 200:
                    if PROXY_ENABLE:
                        self.proxy_robot.handle_success_proxy(self.current_proxy)
                    html = response.read()
                    return html
                else:
                    if PROXY_ENABLE:
                        self.proxy_robot.handle_double_proxy(self.current_proxy)
                    # Retry with a freshly built opener. Note: there is no retry limit.
                    return self.get_html_body(url)
            except Exception as inst:
                print inst, self.current_proxy
                # Report the proxy only when one is actually in use.
                if PROXY_ENABLE:
                    self.proxy_robot.handle_double_proxy(self.current_proxy)
                return self.get_html_body(url)
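
The listing above targets Python 2 (urllib2, cookielib, the print statement). For readers on Python 3, the same idea can be sketched roughly as follows, using urllib.request and http.cookiejar. This is only an illustration under stated assumptions, not the original author's code: the names build_opener, fetch_html, opener_factory and max_retries are hypothetical, the proxy is assumed to be passed as a plain "host:port" string, and the unbounded recursive retry is replaced by a bounded loop.

```python
import http.cookiejar
import urllib.request


def build_opener(proxy=None, cookie_jar=None):
    """Return an opener that keeps cookies and optionally uses an HTTP proxy.

    `proxy` is assumed to be "host:port" without a scheme prefix.
    """
    if cookie_jar is None:
        cookie_jar = http.cookiejar.CookieJar()
    handlers = [urllib.request.HTTPCookieProcessor(cookie_jar)]
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy}))
    return urllib.request.build_opener(*handlers)


def fetch_html(url, opener_factory=build_opener, max_retries=3, timeout=2):
    """Try to fetch `url` up to `max_retries` times; return bytes or None.

    A new opener is requested from `opener_factory` on every attempt, which is
    where a proxy pool (hypothetical here) could hand out a different proxy.
    """
    for attempt in range(max_retries):
        opener = opener_factory()
        try:
            response = opener.open(url, timeout=timeout)
            if response.getcode() == 200:
                return response.read()
        except OSError as exc:
            # urllib.error.URLError and socket timeouts both derive from OSError.
            print("attempt %d failed: %s" % (attempt + 1, exc))
    return None
```

Passing the opener factory as a parameter keeps the retry loop independent of how proxies are chosen, so a proxy-rotating factory can be swapped in without touching fetch_html.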