Fun, Interesting, and Powerful Python Libraries

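Python is well known for its rich ecosystem of third-party libraries. Today I'd like to introduce a few really nice ones: fun, interesting, and powerful!
Data collection
In today's internet era, data matters enormously, so let's start with a few excellent data collection projects.
akshare
akshare is a Python-based financial data interface library. It aims to provide a complete toolchain, from data collection and cleaning to storage, for fundamental data, real-time and historical market data, and derived data on financial products such as stocks, futures, options, funds, forex, bonds, indices, and cryptocurrencies. It is intended mainly for academic research.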
import akshare as ak

# Daily A-share history for stock 000001; adjust="" returns unadjusted prices
stock_zh_a_hist_df = ak.stock_zh_a_hist(symbol="000001", period="daily", start_date="20170301", end_date="20210907", adjust="")
print(stock_zh_a_hist_df)
output:
              日期     开盘     收盘     最高  ...    振幅    涨跌幅    涨跌额    换手率
0     2017-03-01   9.49   9.49   9.55  ...  0.84   0.11   0.01   0.21
1     2017-03-02   9.51   9.43   9.54  ...  1.26  -0.63  -0.06   0.24
2     2017-03-03   9.41   9.40   9.43  ...  0.74  -0.32  -0.03   0.20
3     2017-03-06   9.40   9.45   9.46  ...  0.74   0.53   0.05   0.24
4     2017-03-07   9.44   9.45   9.46  ...  0.63   0.00   0.00   0.17
...          ...    ...    ...    ...  ...   ...    ...    ...    ...
1100  2021-09-01  17.48  17.88  17.92  ...  5.11   0.45   0.08   1.19
1101  2021-09-02  18.00  18.40  18.78  ...  5.48   2.91   0.52   1.25
1102  2021-09-03  18.50  18.04  18.50  ...  4.35  -1.96  -0.36   0.72
1103  2021-09-06  17.93  18.45  18.60  ...  4.55   2.27   0.41   0.78
1104  2021-09-07  18.60  19.24  19.56  ...  6.56   4.28   0.79   0.84

[1105 rows x 11 columns]
https://github.com/akfamily/akshare
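The 涨跌额 (price change) and 涨跌幅 (percentage change) columns in the akshare output can be reproduced from consecutive closing prices (收盘). The short sketch below is plain Python, not akshare code. It assumes the usual definitions: the change is today's close minus the previous close, and the percentage change is that difference divided by the previous close, times 100. daily_changes is a hypothetical helper, and the input is the first five closes from the output above.

```python
def daily_changes(closes):
    """Return (change, pct_change) pairs for every day after the first.

    Assumes the conventional definitions:
      change     = close[t] - close[t-1]
      pct_change = change / close[t-1] * 100
    """
    pairs = []
    for prev, cur in zip(closes, closes[1:]):
        change = cur - prev
        pairs.append((round(change, 2), round(change / prev * 100, 2)))
    return pairs


# 收盘 (close) values for 2017-03-01 .. 2017-03-07 from the akshare output
closes = [9.49, 9.43, 9.40, 9.45, 9.45]
for change, pct in daily_changes(closes):
    print(f"涨跌额 {change:+.2f}  涨跌幅 {pct:+.2f}%")
```

Each printed pair matches the 涨跌额 and 涨跌幅 values shown for 2017-03-02 through 2017-03-07.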
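tushare
tushare is a tool that covers the whole pipeline for financial data such as stocks and futures, from collection through cleaning and processing to storage. It is aimed at quantitative analysts and people learning data analysis who need data. Its strengths are broad data coverage, simple API calls, and fast responses.
Note, however, that some of its features are paid, so choose accordingly.
import tushare as ts

ts.get_hist_data('600848')  # fetch all available history in one call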
output:
             open   high  close    low    volume  p_change    ma5
date
2012-01-11  6.880  7.380  7.060  6.880  14129.96      2.62  7.060
2012-01-12  7.050  7.100  6.980  6.900   7895.19     -1.13  7.020
2012-01-13  6.950  7.000  6.700  6.690   6611.87     -4.01  6.913
2012-01-16  6.680  6.750  6.510  6.480   2941.63     -2.84  6.813
2012-01-17  6.660  6.880  6.860  6.460   8642.57      5.38  6.822
2012-01-18  7.000  7.300  6.890  6.880  13075.40      0.44  6.788
2012-01-19  6.690  6.950  6.890  6.680   6117.32      0.00  6.770
2012-01-20  6.870  7.080  7.010  6.870   6813.09      1.74  6.832

             ma10   ma20     v_ma5    v_ma10    v_ma20  turnover
date
2012-01-11  7.060  7.060  14129.96  14129.96  14129.96      0.48
2012-01-12  7.020  7.020  11012.58  11012.58  11012.58      0.27
2012-01-13  6.913  6.913   9545.67   9545.67   9545.67      0.23
2012-01-16  6.813  6.813   7894.66   7894.66   7894.66      0.10
2012-01-17  6.822  6.822   8044.24   8044.24   8044.24      0.30
2012-01-18  6.833  6.833   7833.33   8882.77   8882.77      0.45
2012-01-19  6.841  6.841   7477.76   8487.71   8487.71      0.21
2012-01-20  6.863  6.863   7518.00   8278.38   8278.38      0.23
https://github.com/waditu/tushare
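In the tushare output, p_change is the day-over-day percentage change of the close, and ma5/ma10/ma20 are moving averages of the close. The plain-Python sketch below reproduces those columns from the eight closes shown. moving_average and pct_change are hypothetical helpers, not tushare functions. Judging by the first rows (ma5 on 2012-01-12 is the mean of just two closes), the averages cover however many days are available until the window fills, and the sketch assumes that behavior.

```python
def moving_average(values, window):
    """Trailing mean over the last `window` values.

    Until `window` values exist, the mean covers all values seen so far,
    which is what the first rows of the tushare output suggest.
    """
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


def pct_change(values):
    """Percentage change from each value to the next."""
    return [(cur - prev) / prev * 100 for prev, cur in zip(values, values[1:])]


# close column for 2012-01-11 .. 2012-01-20 from the output above
closes = [7.06, 6.98, 6.70, 6.51, 6.86, 6.89, 6.89, 7.01]
print([round(x, 3) for x in moving_average(closes, 5)])  # compare with ma5
print([round(x, 2) for x in pct_change(closes)])         # compare with p_change from 2012-01-12
```

Rounded, the moving averages agree with the ma5 column to within 0.001. The percentage changes agree with p_change from 2012-01-12 onward to within rounding.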
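gopup
All data collected by the gopup project comes from public sources and involves no personal or non-public data. As with tushare, though, some interfaces require you to register for a token.
import gopup as gp

# Weibo index for the keyword "疫情" (epidemic)
df = gp.weibo_index(word="疫情", time_type="1hour")
print(df)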
output:
                        疫情
index
2022-12-17 18:15:00  18544
2022-12-17 18:20:00  14927
2022-12-17 18:25:00  13004
2022-12-17 18:30:00  13145
2022-12-17 18:35:00  13485
2022-12-17 18:40:00  14091
2022-12-17 18:45:00  14265
2022-12-17 18:50:00  14115
2022-12-17 18:55:00  15313
2022-12-17 19:00:00  14346
2022-12-17 19:05:00  14457
2022-12-17 19:10:00  13495
2022-12-17 19:15:00  14133
https://github.com/justinzm/gopup
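generalnewsextractor
This project is based on the paper "基于文本及符号密度的网页正文提取方法" (A Web Page Body Extraction Method Based on Text and Symbol Density). It is a Python body-text extractor that pulls the main content, title, author, publish time, and images out of a page's HTML.
>>> from gne import GeneralNewsExtractor
>>> html = '''rendered web page HTML'''
>>> extractor = GeneralNewsExtractor()
>>> result = extractor.extract(html, noise_node_list=['//div[@class="comment-list"]'])  # XPaths of noise nodes to ignore, e.g. a comment list
>>> print(result)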
output:
{"title": "xxxx", "publish_time": "2019-09-10 11:12:13", "author": "yyy", "content": "zzzz", "images": ["/xxx.jpg", "/yyy.png"]}
(Figure: news page extraction example)
https://github.com/generalnewsextractor/generalnewsextractor
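The paper's core intuition is that the main body of a news page is dense in plain text and punctuation, while navigation bars and footers are dominated by link text and tags. The sketch below illustrates that intuition using the standard library's html.parser. Its scoring rule is deliberately simplified and hypothetical: non-link characters per tag, weighted by the log of the punctuation count. It is not GNE's actual formula, and helpers such as extract_main_text are made up for this example.

```python
import math
from html.parser import HTMLParser

# Simplified, hypothetical scoring inspired by the "text + symbol density" idea.
# NOT GNE's actual formula; it only illustrates the intuition.
PUNCTUATION = set("，。！？；：、,.!?;:")
CANDIDATE_TAGS = {"div", "article", "section", "main", "body"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}
SKIP_TAGS = {"script", "style"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.items = []  # text fragments and child Nodes, in document order
        self.in_link = tag == "a" or (parent is not None and parent.in_link)


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.items.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open tag so slightly broken HTML still parses.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip() and self.current.tag not in SKIP_TAGS:
            self.current.items.append(data.strip())


def measure(node):
    """Return (text_len, link_text_len, tag_count, punct_count) for a subtree."""
    text_len = link_len = tags = punct = 0
    for item in node.items:
        if isinstance(item, str):
            text_len += len(item)
            link_len += len(item) if node.in_link else 0
            punct += sum(ch in PUNCTUATION for ch in item)
        else:
            t, l, g, p = measure(item)
            text_len, link_len, tags, punct = text_len + t, link_len + l, tags + g + 1, punct + p
    return text_len, link_len, tags, punct


def score(node):
    text_len, link_len, tags, punct = measure(node)
    density = (text_len - link_len) / (tags + 1)  # non-link text per tag
    return density * math.log(punct + 2)          # favour punctuation-rich prose


def text_of(node):
    parts = [item if isinstance(item, str) else text_of(item) for item in node.items]
    return " ".join(p for p in parts if p)


def extract_main_text(html):
    builder = TreeBuilder()
    builder.feed(html)
    candidates, stack = [], [builder.root]
    while stack:
        node = stack.pop()
        if node.tag in CANDIDATE_TAGS:
            candidates.append(node)
        stack.extend(item for item in node.items if isinstance(item, Node))
    best = max(candidates, key=score) if candidates else builder.root
    return text_of(best)


sample = """
<html><body>
  <div id="nav"><a href="/">Home</a> <a href="/news">News</a> <a href="/about">About</a></div>
  <div id="article">
    <p>Python has a rich ecosystem of third-party libraries.</p>
    <p>This paragraph is the main body, with commas, full stops, and real sentences.</p>
  </div>
  <div id="footer"><a href="/contact">Contact us</a></div>
</body></html>
"""
print(extract_main_text(sample))
```

The navigation and footer score zero because all of their text sits inside links. The body scores lower than the article div because it contains more tags for the same amount of non-link text.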
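Web crawlers
Web crawling is another major application area for Python, and many people get started with the language through crawlers. Let's look at some excellent crawler projects.
playwright-python
An open-source browser automation tool from Microsoft that lets you drive browsers from Python. It supports Chromium, Firefox, and WebKit on Linux, macOS, and Windows.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Take the same screenshot with each of the three browser engines
    for browser_type in [p.chromium, p.firefox, p.webkit]:
        browser = browser_type.launch()
        page = browser.new_page()
        page.goto('http://whatsmyuseragent.org/')
        page.screenshot(path=f'example-{browser_type.name}.png')
        browser.close()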
https://github.com/microsoft/playwright-python
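awesome-python-login-model
This project collects the login methods of major websites, along with crawlers for some of them. The login implementations include Selenium-based login and direct simulated login based on captured network traffic. It's helpful for beginners studying and writing crawlers.
As everyone knows, though, crawlers need a lot of ongoing maintenance. This project hasn't been updated in a long time, so it's uncertain whether its login interfaces still work. Use it with caution, or adapt it yourself.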
https://github.com/kr1s77/awesome-python-login-model
decryptlogin
Unlike the previous project, this one is still actively updated. It also simulates logins to major websites and is well worth studying for beginners.
from DecryptLogin import login

# the instanced Login class object
lg = login.Login()
# use the provided API function to log in to the target website (e.g., Twitter)
infos_return, session = lg.twitter(username='Your Username', password='Your Password')
https://github.com/charlespikachu/decryptlogin
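scylla
Scylla is a high-quality free proxy IP pool tool. It currently supports only Python 3.6. Its statistics are available from a local HTTP endpoint: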
http://localhost:8899/api/v1/stats
output:
{"median": 181.2566407083, "valid_count": 1780, "total_count": 9528, "mean": 174.3290085201}
https://github.com/imWildCat/scylla
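Scylla's stats endpoint returns plain JSON, so you can poll it with the standard library alone. In the minimal sketch below, fetch_stats and summarize are hypothetical helper names. fetch_stats assumes a Scylla instance is reachable at the address shown above. summarize relies only on the valid_count and total_count fields from the sample output.

```python
import json
from urllib.request import urlopen

STATS_URL = "http://localhost:8899/api/v1/stats"  # address from the example above


def fetch_stats(url=STATS_URL, timeout=5):
    """Fetch and decode the JSON statistics from a running Scylla instance."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(stats):
    """Return the fraction of proxies counted as valid (0.0 for an empty pool)."""
    total = stats["total_count"]
    return stats["valid_count"] / total if total else 0.0


sample = {"median": 181.2566407083, "valid_count": 1780,
          "total_count": 9528, "mean": 174.3290085201}
print(f"{summarize(sample):.1%} of {sample['total_count']} proxies are valid")

# Against a live instance (requires Scylla to be running):
# print(f"{summarize(fetch_stats()):.1%} valid")
```

With the sample output above, summarize returns 1780 / 9528, so roughly 18.7% of the pool counted as valid at that moment.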
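proxypool
A crawler proxy IP pool project. Its main job is to periodically collect free proxies published online, validate them, and store them. It then periodically re-validates the stored proxies to keep them usable. It offers both an API and a CLI, and you can add proxy sources to increase the quality and quantity of IPs in the pool. The project has detailed design docs and a clear module structure, which also makes it a good way for crawler beginners to learn.
import requests

def get_proxy():
    return requests.get("http://127.0.0.1:5010/get/").json()

def delete_proxy(proxy):
    requests.get("http://127.0.0.1:5010/delete/?proxy={}".format(proxy))

# your spider code

def get_html():
    # ....
    retry_count = 5
    proxy = get_proxy().get("proxy")
    while retry_count > 0:
        try:
            # access the page through the proxy
            html = requests.get('http://www.example.com', proxies={"http": "http://{}".format(proxy)})
            return html
        except Exception:
            retry_count -= 1
    # all retries failed: remove the proxy from the pool
    delete_proxy(proxy)
    return None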
https://github.com/jhao104/proxy_pool
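The collect, validate, store, and re-validate cycle described above can be illustrated with a small in-memory pool. This is a hypothetical sketch, not proxy_pool's implementation, which stores proxies in a database and runs its checks on a schedule. ProxyPool and its methods are made-up names. The check function is injected, so in practice it could be a real HTTP probe and in tests a stub.

```python
import random


class ProxyPool:
    """Hypothetical in-memory proxy pool illustrating collect -> validate -> serve."""

    def __init__(self, check):
        self.check = check        # callable(proxy) -> bool, e.g. an HTTP probe
        self.proxies = set()

    def collect(self, candidates):
        """Validate freshly scraped proxies and keep only the working ones."""
        added = 0
        for proxy in candidates:
            if proxy not in self.proxies and self.check(proxy):
                self.proxies.add(proxy)
                added += 1
        return added

    def revalidate(self):
        """Re-check stored proxies and drop those that stopped working."""
        dead = {p for p in self.proxies if not self.check(p)}
        self.proxies -= dead
        return len(dead)

    def get(self):
        """Return a random usable proxy, or None if the pool is empty."""
        return random.choice(sorted(self.proxies)) if self.proxies else None

    def delete(self, proxy):
        self.proxies.discard(proxy)


if __name__ == "__main__":
    alive = {"10.0.0.1:8080", "10.0.0.2:3128"}  # stub: pretend these respond
    pool = ProxyPool(check=lambda p: p in alive)
    pool.collect(["10.0.0.1:8080", "10.0.0.2:3128", "10.0.0.3:80"])
    alive.discard("10.0.0.2:3128")                # a proxy goes offline
    pool.revalidate()
    print(pool.get())                             # 10.0.0.1:8080 is the only one left
```

Because the check is a plain callable, the scheduling (how often collect and revalidate run) stays outside the class.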
getproxy
getproxy is a program that scrapes proxy-listing websites to collect HTTP/HTTPS proxies, and it updates its data every 15 minutes.
(test2.7) ➜  ~ getproxy
info:getproxy.getproxy:[*] init
info:getproxy.getproxy:[*] current ip address: 1.1.1.1
info:getproxy.getproxy:[*] load input proxies
info:getproxy.getproxy:[*] validate input proxies
info:getproxy.getproxy:[*] load plugins
info:getproxy.getproxy:[*] grab proxies
info:getproxy.getproxy:[*] validate web proxies
info:getproxy.getproxy:[*] check 6666 proxies, got 666 valid proxies
...
https://github.com/fate0/getproxy
freeproxy
Another project for scraping free proxies. It supports a large number of proxy websites and is easy to use.
from freeproxy import freeproxy

proxy_sources = ['proxylistplus', 'kuaidaili']
fp_client = freeproxy.FreeProxy(proxy_sources=proxy_sources)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
}
# send the request through a proxy drawn from the configured sources
response = fp_client.get('https://space.bilibili.com/406756145', headers=headers)
print(response.text)
https://github.com/charlespikachu/freeproxy
fake-useragent
Disguises your browser identity; commonly used in crawlers. The project has very little code, so it's worth reading to see how ua.random returns a random browser identity.
from fake_useragent import UserAgent

ua = UserAgent()
ua.ie
# Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US);
ua.msie
# Mozilla/5.0 (compatible; MSIE 10.0; Macintosh; Intel Mac OS X 10_7_3; Trident/6.0)
ua['internet explorer']
# Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)
ua.opera
# Opera/9.80 (X11; Linux i686; U; ru) Presto/2.8.131 Version/11.11
ua.chrome
# Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2
ua.google
# Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1290.1 Safari/537.13
ua['google chrome']
# Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11
ua.firefox
# Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1
ua.ff
# Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0.1
ua.safari
# Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25

# and the best one, get a random browser user-agent string
ua.random
https://github.com/fake-useragent/fake-useragent
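Conceptually, ua.random boils down to picking a random entry from a table of real browser User-Agent strings. The toy class below illustrates that idea; it is hypothetical and is not the library's code. TinyUserAgent and its three-entry table are made up, while the real library loads a much larger dataset and supports more lookups. The sketch then attaches the result to a urllib request header.

```python
import random
from urllib.request import Request

# Hypothetical sample data; fake-useragent ships its own, much larger dataset.
USER_AGENTS = {
    "chrome": [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36",
    ],
    "firefox": [
        "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1",
    ],
    "safari": [
        "Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 "
        "(KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25",
    ],
}


class TinyUserAgent:
    """Toy stand-in for fake_useragent.UserAgent: per-browser attributes plus .random."""

    def __getattr__(self, name):
        if name == "random":
            browser = random.choice(list(USER_AGENTS))  # pick a browser first
        elif name in USER_AGENTS:
            browser = name
        else:
            raise AttributeError(name)
        return random.choice(USER_AGENTS[browser])     # then one of its strings


ua = TinyUserAgent()
# Attach the random identity to an outgoing request (the request is not sent here).
req = Request("http://www.example.com", headers={"User-Agent": ua.random})
print(req.get_header("User-agent"))
```

urllib normalizes header names with str.capitalize(), which is why the lookup uses "User-agent".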
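Web
Python has plenty of excellent, long-established web libraries, such as Django and Flask, but everyone already knows those. Instead, let's introduce a few lesser-known but handy ones.
streamlit
Streamlit is a Python framework for quickly turning data into visual, interactive pages. It turns your data into charts in minutes.
import streamlit as st

x = st.slider('Select a value')
st.write(x, 'squared is', x * x)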
https://github.com/streamlit/streamlit
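wagtail
A powerful open-source Django CMS (content management system). The project is actively updated and iterated, and all the features listed on its homepage are free, with no paywalled unlocks. It focuses on content management and doesn't constrain how you build the front end.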
https://github.com/wagtail/wagtail
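fastapi
A high-performance web framework for Python 3.6+. True to its name, writing APIs with FastAPI is fast and debugging is easy. It builds on recent improvements in Python, such as type hints, to make web development faster and more robust.
from typing import Union

from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def read_root():
    return {"Hello": "World"}


@app.get("/items/{item_id}")
def read_item(item_id: int, q: Union[str, None] = None):
    # item_id is parsed from the path as an int; q is an optional query parameter
    return {"item_id": item_id, "q": q}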
https://github.com/tiangolo/fastapi
django-blog-tutorial
A Django tutorial that walks you step by step through building a personal blog system from scratch, so you learn Django development techniques through practice.
https://github.com/jukanntenn/django-blog-tutorial
dash
Dash is a web framework geared toward machine learning and data apps. With it, you can quickly build a machine learning app.
https://github.com/plotly/dash
pywebio
Another excellent Python web framework. You can build an entire web page without writing any front-end code, which is really convenient.
https://github.com/pywebio/pywebio
Python tutorials
practical-python
A hugely popular Python learning resource: a tutorial written in Markdown that is very beginner-friendly.
https://github.com/dabeaz-course/practical-python
learn-python3
A Python 3 tutorial written as Jupyter notebooks, which makes it easy to run and read. It also includes exercises, making it beginner-friendly.
https://github.com/jerry-git/learn-python3
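python-guide
A Python guide by Kenneth Reitz, the author of the requests library. It goes beyond syntax to cover project structure, code style, advanced topics, tooling, and more. Come see how a master does it!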
https://github.com/realpython/python-guide
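Others
pytools
A toolkit-style project written by an expert developer, containing many interesting little tools.
What's shown here is only the tip of the iceberg; explore the full collection yourself.
import random
from pytools import pytools

tool_client = pytools.pytools()
all_supports = tool_client.getallsupported()
# run one randomly chosen tool from the supported set
tool_client.execute(random.choice(list(all_supports.values())))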
https://github.com/charlespikachu/pytools
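amazing-qr
Generates animated, colorful QR codes in all kinds of styles. A really fun library.
# 3: -n sets the output file name, -d the output directory
amzqr https://github.com -n github_qr.jpg -d .../paths/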
https://github.com/x-hw/amazing-qr
sh
sh is a mature replacement for subprocess. It lets you call any program as if it were a function.
$> ./run.sh functionaltests.test_unicode_arg
https://github.com/amoffat/sh
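To see what sh saves you, here is the same kind of call written with the standard library's subprocess module, which sh is meant to replace. This is a stdlib sketch, not sh code. run() is a hypothetical helper, and the current Python interpreter stands in for "any program" so the example is portable. With sh, you would instead import a command and call it like a function, e.g. `from sh import ifconfig` followed by `ifconfig("eth0")`.

```python
import subprocess
import sys


def run(*args):
    """Run a program, return its stdout as text, and raise CalledProcessError on failure."""
    completed = subprocess.run(list(args), capture_output=True, text=True, check=True)
    return completed.stdout


# The current Python interpreter stands in for "any program" so the example is portable.
output = run(sys.executable, "-c", "print('hello from a child process')")
print(output.strip())
```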
tqdm
A powerful, fast, and extensible progress bar library for Python.
from tqdm import tqdm

for i in tqdm(range(10000)):
    ...
https://github.com/tqdm/tqdm
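Under the hood, a progress bar is essentially a generator that wraps an iterable and redraws one line, using a carriage return, after each item. The toy version below sketches only that idea. render_bar and progress are hypothetical names. tqdm itself adds rate and time estimates, throttled refreshes, and much more.

```python
import sys


def render_bar(done, total, width=30):
    """Format one progress line, e.g. render_bar(5, 10, width=10) -> ' 50.0%|#####     | 5/10'."""
    filled = int(width * done / total) if total else width
    percent = 100 * done / total if total else 100.0
    return f"{percent:5.1f}%|{'#' * filled}{' ' * (width - filled)}| {done}/{total}"


def progress(iterable, total=None, stream=sys.stderr):
    """Yield items from `iterable`, redrawing a text bar after each one."""
    if total is None:
        total = len(iterable)  # requires a sized iterable unless total is given
    for done, item in enumerate(iterable, start=1):
        yield item
        stream.write("\r" + render_bar(done, total))
        stream.flush()
    stream.write("\n")  # only reached once the iterable is exhausted


for i in progress(range(200)):
    pass
```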
loguru
A library that makes logging in Python simple.
from loguru import logger

logger.debug("That's it, beautiful and simple logging!")
https://github.com/delgan/loguru
click
A third-party Python library for quickly building command-line interfaces. It supports decorator-based definitions, many parameter types, automatically generated help, and more.
import click

@click.command()
@click.option("--count", default=1, help="Number of greetings.")
@click.option("--name", prompt="Your name", help="The person to greet.")
def hello(count, name):
    """Simple program that greets NAME for a total of COUNT times."""
    for _ in range(count):
        click.echo(f"Hello, {name}!")

if __name__ == '__main__':
    hello()
output:
$ python hello.py --count=3
Your name: Click
Hello, Click!
Hello, Click!
Hello, Click!
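For comparison, here is roughly the same command written with the standard library's argparse, to show what click's decorators replace. It is a sketch, not a drop-in equivalent: argparse has no built-in interactive prompt, so --name is simply required here, whereas click prompts for it. build_parser, hello, and main are hypothetical names, and the greeting logic is split out so it can be called without the command line.

```python
import argparse


def build_parser():
    """argparse version of the click options above (no interactive prompt)."""
    parser = argparse.ArgumentParser(
        description="Simple program that greets NAME for a total of COUNT times.")
    parser.add_argument("--count", type=int, default=1, help="Number of greetings.")
    parser.add_argument("--name", required=True, help="The person to greet.")
    return parser


def hello(count, name):
    return [f"Hello, {name}!" for _ in range(count)]


def main(argv=None):
    args = build_parser().parse_args(argv)  # argv=None means sys.argv[1:]
    for line in hello(args.count, args.name):
        print(line)


main(["--count=3", "--name", "Click"])  # prints "Hello, Click!" three times
```

In a real hello.py you would call main() with no arguments under an `if __name__ == "__main__":` guard, so it reads the actual command line.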
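keymousego
A lightweight, portable key-macro tool written in Python, similar to the Chinese macro software 按键精灵. It records your mouse and keyboard actions and replays them automatically, with a configurable number of repetitions. For simple, monotonous, repetitive tasks it saves a lot of effort: record once and let KeymouseGo do the rest.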
https://github.com/taojy123/keymousego
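The record-once, replay-N-times idea boils down to storing each action together with the delay before it, then running that list in a loop. The sketch below is a hypothetical illustration of that loop only. It does not capture or inject real mouse or keyboard events, which would need an OS-specific library. MacroRecorder is a made-up name, and the actions are plain Python callables.

```python
import time


class MacroRecorder:
    """Hypothetical record/replay loop; events are plain callables, not real input."""

    def __init__(self):
        self.events = []  # list of (delay_seconds, action)
        self._last = None

    def record(self, action):
        """Store an action together with the time elapsed since the previous one."""
        now = time.monotonic()
        delay = 0.0 if self._last is None else now - self._last
        self._last = now
        self.events.append((delay, action))

    def replay(self, times=1, speed=1.0):
        """Run the recorded actions `times` times, keeping the original pacing / speed."""
        for _ in range(times):
            for delay, action in self.events:
                time.sleep(delay / speed)
                action()


if __name__ == "__main__":
    log = []
    rec = MacroRecorder()
    rec.record(lambda: log.append("click (100, 200)"))
    rec.record(lambda: log.append("type 'hello'"))
    rec.replay(times=3)
    print(log)
```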
That wraps up these fun, interesting, and powerful Python libraries.
