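Reading the users who starred a GitHub repository with Python
The fetched user records are stored in a Redis hash, with each user's unique id as the field name inside that hash.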
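To make the storage layout concrete, here is a minimal sketch of how one user record could map onto a hash entry. The helper names `to_hash_entry` and `from_hash_value` are my own and not part of PyGithub or redis, and the sample record is made up.

```python
import json


def to_hash_entry(user):
    """Return the (field, value) pair for one user: id as field, JSON as value."""
    return str(user['id']), json.dumps(user)


def from_hash_value(value):
    """Turn a stored JSON string back into a dict."""
    return json.loads(value)


# Made-up sample record, for illustration only.
field, value = to_hash_entry({'id': 42, 'login': 'octocat'})
# With a redis connection, this pair would be written with
# hset('users', field, value) and read back with hget('users', field).
restored = from_hash_value(value)
```

Storing the whole record as one JSON string keeps the write to a single `hset` call, at the cost of having to parse the value again on every read.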
Required dependencies
PyGithub
pip install PyGithub
Redis
pip install redis
Script
from github import Github
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import cpu_count
import redis
import json
# -----------------------------------------
# GitHub access token
ACCESS_KEY = 'dcaddbccc7c3217aac0eb***********'
# Repository whose stargazers should be fetched
REPOSITORY = 'spring-projects/spring-boot'
# Redis key of the hash that stores the users
USER_HASH_KEY = 'users'
# Redis connection settings
REDIS_CONFIG = {
    'host': 'localhost',
    'port': 6379,
    'db': 0,
    'decode_responses': True
}
# -----------------------------------------
threadPoolExecutor = ThreadPoolExecutor(max_workers=cpu_count() * 2)
g = Github(ACCESS_KEY)
repo = g.get_repo(REPOSITORY)
stargazers = repo.get_stargazers_with_dates()
connection = redis.Redis(**REDIS_CONFIG)

def save(future):
    # Runs when a lookup finishes: store the user's raw JSON under the user id
    user = future.result().raw_data
    connection.hset(USER_HASH_KEY, user['id'], json.dumps(user))
    print('{id} {login} {email}'.format(**user))

for people in stargazers:
    # Fetch each stargazer's full profile on a worker thread
    threadPoolExecutor.submit(g.get_user, people.user.login).add_done_callback(save)
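One thing worth knowing about the submit-then-callback pattern used above: if the lookup fails, `future.result()` re-raises the error inside the callback, where it is only logged and the user is silently skipped. Below is a small sketch of the same pattern with an explicit failure check. `fetch_user` is a hypothetical stand-in for the GitHub lookup, and the `saved` dict stands in for the Redis write, so it runs without a token or a Redis server.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_user(login):
    """Hypothetical stand-in for the GitHub lookup; fails for one login."""
    if login == 'ghost':
        raise LookupError('user not found: ' + login)
    return {'id': len(login), 'login': login}


saved = {}    # stand-in for the Redis hash
failed = []   # messages of lookups that raised


def save(future):
    # Check for an exception first; future.result() would re-raise it here.
    error = future.exception()
    if error is not None:
        failed.append(str(error))
        return
    user = future.result()
    saved[user['id']] = user


with ThreadPoolExecutor(max_workers=4) as executor:
    for login in ['alice', 'bob', 'ghost']:
        executor.submit(fetch_user, login).add_done_callback(save)
# Leaving the with-block waits until every submitted task has finished.
```

Using the executor as a context manager also makes the shutdown explicit, which the script above leaves to interpreter exit.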
End
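I originally wanted to put "scraping" in the title, but this hardly counts as scraping. I used to grind out several hundred lines of code for this kind of job; with PyGithub it takes only a few lines. In short, I am very grateful for projects that make life this convenient.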