Python Requests Library Study Notes

Introduction

Requests: HTTP for Humans

Library author: Kenneth Reitz

http://httpbin.org (an HTTP request and response testing service)

Learning with the GitHub API

https://developer.github.com

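Request methods

  • GET: retrieve a resource
  • POST: create a resource
  • PUT: modify a resource
  • DELETE: delete a resource
  • HEAD: retrieve only the response headers
  • OPTIONS: list the request methods available for a resource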
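
As a rough sketch of how these verbs map onto the library, the snippet below builds one prepared request per method without sending anything. The httpbin.org URL and the `prepare_all` helper are placeholders chosen for illustration, not part of the original notes.

```python
import requests

# Placeholder endpoint; nothing is sent because the requests are only prepared.
URL = "https://httpbin.org/anything"

METHODS = ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS"]


def prepare_all(url=URL):
    """Hypothetical helper: map each HTTP method to a PreparedRequest."""
    return {method: requests.Request(method, url).prepare() for method in METHODS}


prepared = prepare_all()
for method, req in prepared.items():
    print(method, req.url)

# Sending takes one call per verb: requests.get(URL), requests.post(URL),
# requests.put(URL), requests.delete(URL), requests.head(URL), requests.options(URL).
```

Preparing instead of sending keeps the example runnable offline; the commented calls are the ones that would actually reach the server.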

Requests with parameters

  1. URL parameters: sent in the query string, e.g. requests.get(url, params={'key1': 'value1'})

  2. Form parameters:

    • Content-Type: application/x-www-form-urlencoded
    • Body: key1=value1&key2=value2
    • requests.post(url, data={'key1': 'value1', 'key2': 'value2'})
  3. JSON parameters:

    • Content-Type: application/json
    • Body: '{"key1": "value1", "key2": "value2"}'
    • requests.post(url, json={'key1': 'value1', 'key2': 'value2'})
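
The three styles can be compared offline by preparing each request and inspecting what would go on the wire. This is a minimal sketch; the httpbin.org URLs are placeholders and are never contacted.

```python
import json

import requests

URL = "https://httpbin.org/post"  # placeholder endpoint, never contacted here
payload = {"key1": "value1", "key2": "value2"}

# 1. params= is appended to the URL as a query string.
query = requests.Request("GET", "https://httpbin.org/get", params=payload).prepare()

# 2. data= produces a form-encoded body.
form = requests.Request("POST", URL, data=payload).prepare()

# 3. json= serializes the dict and sets the JSON content type.
as_json = requests.Request("POST", URL, json=payload).prepare()

print(query.url)
print(form.headers["Content-Type"], form.body)
print(as_json.headers["Content-Type"], json.loads(as_json.body))
```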

Handling request exceptions

The exceptions defined by Requests all inherit from requests.exceptions.RequestException.

The names exposed by requests.exceptions include:
  • BaseHTTPError (urllib3's HTTPError re-exported under an alias, not a RequestException subclass)
  • ChunkedEncodingError
  • ConnectTimeout
  • ConnectionError
  • ContentDecodingError
  • HTTPError
  • InvalidSchema
  • InvalidURL
  • MissingSchema
  • ProxyError
  • ReadTimeout
  • RequestException
  • RetryError
  • SSLError
  • StreamConsumedError
  • Timeout
  • TooManyRedirects
  • URLRequired
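The most common ones:
ConnectionError: the connection could not be established because of a network problem.
HTTPError: raised by Response.raise_for_status() when the response status code is a 4xx or 5xx error (other codes, such as 201 or 301, do not raise).
Timeout: the request timed out.
TooManyRedirects: the request exceeded the configured maximum number of redirects.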
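
A small sketch of how the hierarchy can be used in practice: the `issubclass` checks run without any network access, and `fetch` is a hypothetical helper (its name, messages, and default timeout are assumptions) showing one way to order the `except` clauses from specific to general.

```python
import requests

exc = requests.exceptions

# Inspecting the hierarchy needs no network access.
for cls in (exc.ConnectionError, exc.HTTPError, exc.Timeout, exc.TooManyRedirects):
    print(cls.__name__, issubclass(cls, exc.RequestException))


def fetch(url, timeout=5):
    """Hypothetical helper: return the body text, or None on any Requests error."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except exc.Timeout:
        # ConnectTimeout is both a Timeout and a ConnectionError; it lands here.
        print("timed out")
    except exc.ConnectionError:
        print("connection failed")
    except exc.RequestException as error:
        # The base class catches everything else, including HTTPError.
        print("request failed:", error)
    else:
        return response.text
    return None
```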
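  • Handling request timeouts
requests.get(url, timeout=(3, 7))  # (connect, read): up to 3 s to establish the connection, then up to 7 s waiting for the server to send data
requests.get(url, timeout=10)  # a single value applies to both the connect and the read timeout, not to the request as a whole

A timeout raises requests.exceptions.Timeout.
  • HTTPError
Response.raise_for_status() raises requests.exceptions.HTTPError for a 4xx or 5xx response.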
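
To see which status codes make raise_for_status() raise, one option is to build a bare Response by hand and set its status code, which avoids any network traffic. This is only a sketch: the `check` helper and the example.invalid URL are made up for illustration, and real responses come from requests.get and friends.

```python
import requests


def check(status_code):
    """Build a Response by hand (no network) and run raise_for_status()."""
    response = requests.Response()
    response.status_code = status_code
    response.url = "https://example.invalid/"  # placeholder URL
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as error:
        return str(error)
    return None


for code in (200, 301, 404, 500):
    print(code, check(code))
```

Only the 4xx and 5xx codes produce an error message; 200 and 301 return None.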

Custom requests

These are built with Request and Session:

from requests import Request, Session

The following example pulls together what has been covered so far:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Examples that call a few GitHub API endpoints
import json

import requests
from requests import Request, Session

url = 'https://api.github.com'


def build_uri(endpoint):
    return '/'.join([url, endpoint])


def better_print(json_str):
    # Re-serialize the JSON text with indentation so it is easier to read
    return json.dumps(json.loads(json_str), indent=4)


def request_method():
    # GitHub username and password; GitHub's API no longer accepts account
    # passwords, so use a personal access token as the password
    response = requests.get(build_uri('user/emails'),
                            auth=('usrname', 'password'))
    print(better_print(response.text))


def params_request():
    response = requests.get(build_uri('users'), params={'since': 11})
    print(better_print(response.text))
    print(response.headers)
    print(response.url)


def json_request():
    response = requests.patch(
        build_uri('user'),
        auth=('usrname', 'password'),  # GitHub username and password (or token)
        json={'name': 'xxx',
              'email': '111@qq.com'})  # the new name and email address
    print(better_print(response.text))
    print(response.headers)
    print(response.request.body)  # the body lives on the request, not the response
    print(response.status_code)
    print(response.reason)


def timeout_request():
    try:
        response = requests.get(build_uri('user/emails'), timeout=0.1)
        response.raise_for_status()
    except requests.exceptions.Timeout as e:
        print(e)
    except requests.exceptions.HTTPError as e:
        print(e)
    else:
        print(response.text)
        print(response.status_code)


def hard_requests():
    s = Session()
    headers = {'User-Agent': 'fake1.3.4'}  # fake User-Agent
    req = Request(
        'GET',
        build_uri('user/emails'),
        auth=('usrname', 'password'),  # GitHub username and password (or token)
        headers=headers)
    prepped = req.prepare()
    print(prepped.body)
    print(prepped.headers)

    resp = s.send(prepped, timeout=5)
    print(resp.status_code)
    print(resp.request.headers)
    print(resp.text)


if __name__ == '__main__':
    hard_requests()
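
A related variant, sketched here as an assumption rather than something the course code uses, is Session.prepare_request(): unlike Request.prepare(), it merges session-level settings such as default headers into the prepared request. The Accept header below is only an illustrative value, and nothing is sent.

```python
from requests import Request, Session

session = Session()
session.headers.update({"User-Agent": "fake1.3.4"})  # session-wide fake User-Agent

# The request is prepared but not sent, so no credentials are needed here.
req = Request("GET", "https://api.github.com/user/emails",
              headers={"Accept": "application/json"})
prepped = session.prepare_request(req)  # merges session and request headers

print(prepped.method, prepped.url)
print(prepped.headers)

# To actually send it: response = session.send(prepped, timeout=5)
```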

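Basic response API

Baidu Baike: HTTP status codes

1XX: informational
2XX: success
3XX: redirection
4XX: client (request) error
5XX: server error
600: the origin server returned only an entity body, with no response headers (non-standard; listed by Baidu Baike)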
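
The grouping above depends only on the leading digit of the code, which can be sketched as a small lookup. The `status_class` helper and its labels are illustrative; `requests.codes` is the library's own table of named status codes.

```python
import requests

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Map a status code to its class by the leading digit."""
    return STATUS_CLASSES.get(code // 100, "non-standard")


for code in (requests.codes.ok, requests.codes.not_found, 301, 503, 600):
    print(code, status_class(code))
```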

>>> import requests
>>> response = requests.get('https://api.github.com')
>>> response.status_code
200
>>> response.reason
'OK'
>>> response.headers
{'X-GitHub-Media-Type': 'github.v3; format=json', 'X-RateLimit-Reset': '1488289215', 'Content-Type': 'application/json; charset=utf-8', 'ETag': 'W/"7dc470913f1fe9bb6c7355b50a0737bc"', 'Date': 'Tue, 28 Feb 2017 12:40:15 GMT', 'Content-Security-Policy': "default-src 'none'", 'Transfer-Encoding': 'chunked', 'Server': 'GitHub.com', 'Content-Encoding': 'gzip', 'Cache-Control': 'public, max-age=60, s-maxage=60', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Access-Control-Expose-Headers': 'ETag, Link, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval', 'X-GitHub-Request-Id': '6E6D:0DDE:61E3A4:7ACD2E:58B56FA3', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains; preload', 'X-Served-By': '07ff1c8a09e44b62e277fae50a1b1dc4', 'Access-Control-Allow-Origin': '*', 'X-Frame-Options': 'deny', 'Vary': 'Accept, Accept-Encoding', 'X-RateLimit-Remaining': '59', 'Status': '200 OK', 'X-RateLimit-Limit': '60'}
>>> response.url
'https://api.github.com/'
>>> response.history
[]
>>> response = requests.get('http://api.github.com')
>>> response.history
[<Response [301]>]
>>> response.elapsed
datetime.timedelta(0, 1, 664629)
>>> response.request
<PreparedRequest [GET]>
>>> response
<Response [200]>
>>> response.encoding
'utf-8'
>>> response.raw.read(10)
b''
>>> response.json()['team_url']
'https://api.github.com/teams'
>>> response.content
b'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'
>>> response.text
'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'

Advanced

Downloading images and files automatically

  1. Simulate a browser
  2. Build the request
  3. Read the streamed data
  4. Save the data
# -*- coding: utf-8 -*-
'''Demo: download an image.'''
from contextlib import closing

import requests


def download_image():
    # Send a browser User-Agent so the request looks like it comes from a browser
    headers = {
        'User-Agent':
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
    }
    url = 'https://timgsa.baidu.com/timg?image&quality=80&size=b9999_10000&sec=1488296531013&di=689048e17449c05f7d55698526a71ccc&imgtype=0&src=http%3A%2F%2Fpic.baike.soso.com%2Fp%2F20130515%2F20130515150325-379335333.jpg'
    # stream=True defers the body download; this version never closes the stream
    response = requests.get(url, headers=headers, stream=True)
    with open('demo.jpg', 'wb') as fd:
        for chunk in response.iter_content(128):
            fd.write(chunk)


def download_image_improved():
    # Same download, but contextlib.closing closes the stream when the block exits
    headers = {
        'User-Agent':
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
    }
    # URL of the image to download
    url = 'https://timgsa.baidu.com/timg?image&quality=80&size=b9999_10000&sec=1488296531013&di=689048e17449c05f7d55698526a71ccc&imgtype=0&src=http%3A%2F%2Fpic.baike.soso.com%2Fp%2F20130515%2F20130515150325-379335333.jpg'
    with closing(requests.get(url, headers=headers, stream=True)) as response:
        # Open the output file
        with open('demo1.jpg', 'wb') as fd:
            # Write the body in 128-byte chunks
            for chunk in response.iter_content(128):
                fd.write(chunk)


download_image_improved()
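
In recent versions of Requests a Response can itself be used in a `with` statement, which makes `contextlib.closing` unnecessary. The sketch below separates the file-writing step so it can be exercised without a network; the helper names, the 8192-byte chunk size, and the 10-second timeout are illustrative choices, not values from the course.

```python
import requests


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path; return the bytes written."""
    written = 0
    with open(path, "wb") as fd:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                written += fd.write(chunk)
    return written


def download(url, path, chunk_size=8192, timeout=10):
    """Stream url to path; the connection is released when the block exits."""
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        return save_chunks(response.iter_content(chunk_size), path)
```

Calling `download(url, 'demo.jpg')` with the image URL above would perform the same download as the course code, with an error raised for a 4xx or 5xx response.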

Event hooks

# -*- coding: utf-8 -*-
import requests


def get_key_info(response, *args, **kwargs):
    print(response.headers['Content-Type'])


def main():
    requests.get('https://www.baidu.com', hooks=dict(response=get_key_info))


main()
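
Hooks can also be registered once on a Session so that they run for every request it sends. The two hook functions below are hypothetical examples, and the final call is left commented out because it performs a real request.

```python
import requests


def log_response(response, *args, **kwargs):
    """Illustrative hook: print the status and URL of every response."""
    print(response.status_code, response.url)


def fail_on_error(response, *args, **kwargs):
    """Illustrative hook: raise HTTPError for 4xx and 5xx responses."""
    response.raise_for_status()


session = requests.Session()
# Hooks registered on the session run for every request it sends.
session.hooks["response"].extend([log_response, fail_on_error])

# Usage (performs a real request):
# session.get("https://www.baidu.com", timeout=5)
```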

HTTP authentication

![](https://ww2.sinaimg.cn/large/006tKfTcgy1fd6i610odyj31480k6gse.jpg)
HTTP Basic authentication

OAuth authentication

# -*- coding: utf-8 -*-
import requests
from requests.auth import AuthBase

url = 'https://api.github.com'


def construct_url(endpoint):
    return '/'.join([url, endpoint])


def basic_auth():
    response = requests.get(construct_url('user'),
                            auth=('usrname', 'password'))
    print(response.text)
    print(response.request.headers)


def basic_oauth():  # first approach: set the header by hand
    # the token is a GitHub personal access token
    headers = {'Authorization': 'token xxxxPersonal access tokens'}
    response = requests.get(construct_url('user/emails'), headers=headers)
    print(response.request.headers)
    print(response.text)
    print(response.status_code)


class GithubAuth(AuthBase):  # second approach: an object-oriented auth handler
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # add the Authorization header to the outgoing request
        r.headers['Authorization'] = ' '.join(['token', self.token])
        return r


def oauth_advanced():
    auth = GithubAuth('xxxxPersonal access tokens')
    response = requests.get(construct_url('user/emails'), auth=auth)
    print(response.request.headers)
    print(response.text)
    print(response.status_code)


basic_oauth()
oauth_advanced()
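
Both styles of authentication end up as an Authorization header, which can be checked offline by preparing the requests instead of sending them. This is a minimal sketch: `TokenAuth` mirrors the GithubAuth class above, and the credentials are dummy values.

```python
import base64

import requests
from requests.auth import AuthBase


class TokenAuth(AuthBase):
    """Attach 'Authorization: token <value>' to each request."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        r.headers["Authorization"] = "token " + self.token
        return r


URL = "https://api.github.com/user"  # prepared only, never sent

basic = requests.Request("GET", URL, auth=("usrname", "password")).prepare()
token = requests.Request("GET", URL, auth=TokenAuth("dummy-token")).prepare()

# Basic auth is only base64-encoded, not encrypted, so it decodes trivially.
scheme, encoded = basic.headers["Authorization"].split(" ", 1)
print(scheme, base64.b64decode(encoded))
print(token.headers["Authorization"])
```

Because the Basic credentials are recoverable by anyone who sees the header, they should only travel over HTTPS.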

Proxies

  1. Start the proxy service (Heroku)
  2. Start a SOCKS service on port 1080 of the local host
  3. Forward requests to port 1080
  4. Retrieve the requested resource
import requests

# SOCKS proxies need the optional extra: pip install requests[socks]
proxies = {
    'http': 'socks5://127.0.0.1:1080',
    'https': 'socks5://127.0.0.1:1080'
}
url = 'https://www.facebook.com'
response = requests.get(url, timeout=3)  # without the proxy this request does not get through
response = requests.get(url, proxies=proxies, timeout=3)  # succeeds through the proxy
print(response.status_code)  # 200
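
When every request should go through the proxy, one option is to set it once on a Session instead of passing `proxies=` on each call. A sketch under the same assumption as above (a SOCKS service listening on 127.0.0.1:1080); `make_proxy_session` is a hypothetical helper.

```python
import requests

# Local SOCKS endpoint from the steps above. With the socks5h scheme the
# proxy, rather than the client, would resolve host names.
PROXIES = {
    "http": "socks5://127.0.0.1:1080",
    "https": "socks5://127.0.0.1:1080",
}


def make_proxy_session(proxies=PROXIES):
    """Return a Session that routes every request through proxies."""
    session = requests.Session()
    session.proxies.update(proxies)
    return session


session = make_proxy_session()
print(session.proxies)

# Usage (needs a running proxy and the requests[socks] extra):
# session.get("https://www.facebook.com", timeout=3)
```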
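  • Session: data stored on the server
  • Cookie: data stored in the browser

A session keeps the user's state on the web server, so that it is available to pages visited at any time and from any device. Because the browser does not need to store any of this information, any browser can be used, including those on tablets and phones.

A cookie is a small piece of information that the web server sends to be stored in the browser, so that the server can read it back the next time the same visitor returns. This lets a site remember visitor-specific details such as the last page visited, the time spent, or user preferences (a style sheet, for example). A cookie is a text file kept in the browser's directory; while the browser is running it is held in RAM, and once you leave the site or server it may also be written to the computer's hard drive. When the visitor ends the browser session, session cookies (those with no expiry time) are discarded.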
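
On the client side, a requests.Session plays the role of the browser: it keeps a cookie jar and sends matching cookies back on later requests. The sketch below sets a hypothetical cookie by hand (normally it would arrive in a server's Set-Cookie header) and prepares a request, without sending it, to show the header the session would attach.

```python
import requests

session = requests.Session()
# Hypothetical cookie, set by hand instead of by a server's Set-Cookie header.
session.cookies.set("session_id", "abc123", domain="httpbin.org", path="/")

# Preparing (not sending) a request shows the cookie the session would attach.
req = requests.Request("GET", "https://httpbin.org/cookies")
prepped = session.prepare_request(req)

print(session.cookies.get("session_id"))
print(prepped.headers.get("Cookie"))
```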


Source code (Python 3.x) adapted from the iMOOC course
Python-走进Requests库
