Reading a GitHub Repository's Stargazers with Python

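The fetched user information is stored in a Redis hash, with each user's unique id as the field name within the hash and the JSON-encoded user record as the value.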
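The storage layout described above can be sketched roughly as follows. This is a minimal sketch, assuming a client object that exposes `hset` and `hget` the way `redis.Redis` does. The helpers `store_user` and `load_user` and the `FakeRedis` class are hypothetical names, introduced only so the example runs without a Redis server.

```python
import json


class FakeRedis:
    """Hypothetical in-memory stand-in mimicking the hset/hget calls of redis.Redis."""

    def __init__(self):
        self._hashes = {}

    def hset(self, name, key, value):
        # Redis stores field names as strings, so normalize the key here too
        self._hashes.setdefault(name, {})[str(key)] = value

    def hget(self, name, key):
        return self._hashes.get(name, {}).get(str(key))


def store_user(client, hash_key, user):
    """Serialize the user dict to JSON and store it under its unique id."""
    client.hset(hash_key, user['id'], json.dumps(user))


def load_user(client, hash_key, user_id):
    """Return the stored user dict, or None if the id is not in the hash."""
    raw = client.hget(hash_key, user_id)
    return json.loads(raw) if raw is not None else None


# Assumption: a real run would pass redis.Redis(host='localhost', port=6379,
# db=0, decode_responses=True) instead of the fake client.
client = FakeRedis()
store_user(client, 'users', {'id': 1, 'login': 'octocat', 'email': None})
print(load_user(client, 'users', 1)['login'])
```

Because the user id is the field name, storing the same user twice simply overwrites the earlier record, so re-running the script should not create duplicates.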

Required dependencies

PyGithub

pip install PyGithub

Redis

pip install redis

Script

from github import Github
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import cpu_count
import redis
import json

# -----------------------------------------
# Personal access token for the GitHub API
ACCESS_KEY = 'dcaddbccc7c3217aac0eb***********'

# Repository whose stargazers will be fetched
REPOSITORY = 'spring-projects/spring-boot'

# Key of the Redis hash that stores the users
USER_HASH_KEY = 'users'

# Redis connection settings
REDIS_CONFIG = {
    'host': 'localhost',
    'port': 6379,
    'db': 0,
    'decode_responses': True
}

# -----------------------------------------

# The lookups are network-bound, so use more threads than CPU cores
executor = ThreadPoolExecutor(max_workers=cpu_count() * 2)

g = Github(ACCESS_KEY)
repo = g.get_repo(REPOSITORY)
stargazers = repo.get_stargazers_with_dates()

connection = redis.Redis(**REDIS_CONFIG)

def save(future):
    # raw_data is the public accessor for the JSON dict returned by the API
    user = future.result().raw_data
    # Field name is the user's unique id, value is the JSON-encoded profile
    connection.hset(USER_HASH_KEY, user['id'], json.dumps(user))
    print('{id} {login} {email}'.format(**user))

for people in stargazers:
    # Fetch each stargazer's full profile in a worker thread
    executor.submit(g.get_user, people.user.login).add_done_callback(save)

# Wait for all pending lookups to finish
executor.shutdown(wait=True)
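One thing to watch in the script above: if a lookup fails, the exception is re-raised by `future.result()` inside the done callback, where `concurrent.futures` logs it and carries on, so failed users are easy to miss. A possible alternative is to collect the futures and handle errors explicitly. This is only a sketch under stated assumptions: `fetch_user` is a hypothetical stand-in for `g.get_user(login)` that fails for one login on purpose, and `fetch_all` is an illustrative helper, not part of PyGithub.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_user(login):
    """Hypothetical stand-in for g.get_user(login); fails for one login on purpose."""
    if login == 'ghost':
        raise LookupError('user not found: ' + login)
    return {'id': len(login), 'login': login, 'email': None}


def fetch_all(logins, max_workers=4):
    """Fetch every login concurrently; return (users, failed_logins)."""
    users, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its login so failures can be reported
        futures = {executor.submit(fetch_user, login): login for login in logins}
        for future in as_completed(futures):
            try:
                users.append(future.result())
            except Exception:
                failed.append(futures[future])
    return users, failed


users, failed = fetch_all(['alice', 'bob', 'ghost'])
print(sorted(u['login'] for u in users), failed)
```

The `with` block also waits for all submitted work before returning, and the list of failed logins can be retried later.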

End

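I originally wanted to put "scraping" in the title, but this really does not count as scraping. I used to grind out several hundred lines of code for this kind of task; with PyGithub it takes only a few lines. In short, I am grateful for projects that make life this convenient.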