Implementing a Simple Scraping Crawler in Python 3.4: A Detailed Introduction

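This article shows how to implement a simple scraping crawler in Python 3.4, covering how to fetch web pages and how to parse them with regular expressions. Readers who need this may find it a useful reference.
The example fetches several weibo.com user pages through a cookie-aware opener and extracts every "screen_name" value from each response. The details are as follows: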
import re
import time
import http.cookiejar
import urllib.request


def gethtml(url):
    # Build an opener that stores cookies and sends browser-like headers.
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    opener.addheaders = [
        ('User-Agent',
         'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
         '(KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36'),
        # Cookie value exactly as given in the original example.
        ('Cookie', '4564564564564564565646540'),
    ]
    # Make this opener the default used by urllib.request.urlopen.
    urllib.request.install_opener(opener)
    page = urllib.request.urlopen(url)
    html = page.read()  # raw bytes
    return html

# print(html)
# html = gethtml("http://weibo.com/")


def getimg(html):
    # Decode the raw bytes, then collect every "screen_name" value.
    html = html.decode('utf-8')
    reg = '"screen_name":"(.*?)"'
    imgre = re.compile(reg)
    src = re.findall(imgre, html)
    return src

# print("", getimg(html))

uid = ['2808675432', '3888405676', '2628551531', '2808587400']
for a in uid:
    print(getimg(gethtml("http://weibo.com/" + a)))
    time.sleep(1)  # pause one second between requests
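The parsing step in getimg can be tried without any network access. The following is a minimal sketch, assuming the response body embeds "screen_name":"..." pairs in the form the regular expression expects. The helper name extract_screen_names and the sample payload are hypothetical and made up for illustration; they are not part of the original example, and the payload is not real weibo.com output.

```python
import re

# Same non-greedy pattern as in getimg: captures the text between the
# quotes that follow "screen_name":
SCREEN_NAME_RE = re.compile(r'"screen_name":"(.*?)"')


def extract_screen_names(raw_bytes, encoding="utf-8"):
    """Decode raw response bytes and return every screen_name value found.

    Hypothetical helper for illustration; errors="replace" keeps decoding
    from raising on invalid byte sequences.
    """
    text = raw_bytes.decode(encoding, errors="replace")
    return SCREEN_NAME_RE.findall(text)


# Made-up sample payload (an assumption, not real site output).
sample = b'{"user":{"id":1,"screen_name":"alice"},"friend":{"screen_name":"bob"}}'
names = extract_screen_names(sample)
print(names)
```

Because the pattern is non-greedy, each match stops at the first closing quote, so several names in one response come back as separate list items rather than as one long match.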
