Python third-party modules: the requests module

2022-05-05

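The requests library: HTTP for Humans. Its job is to send a request to a website and fetch the page data. Let's start with a simple example.

    import requests

    res = requests.get('https://www.baidu.com/')
    print(res)
    print(res.text)

Running it prints the response object (for example <Response [200]> when the request succeeds), followed by the HTML of the page. Now open Chrome, go to https://www.baidu.com, right-click an empty area of the page, and choose "View page source" from the context menu. What the code returned is the same page source.

Sometimes a crawler needs to send request headers that make it look like a browser, so that the site serves it the normal page. In Chrome, press F12 to open the developer tools, refresh the page, then find the User-Agent request header and copy its value.

Using request headers:

    import requests

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'}
    res = requests.get('https://www.baidu.com/', headers=headers)
    print(res.text)

Besides get(), the requests library also provides post() and other methods. post() is typically used to submit a form, for example to log in to a site that only shows its data to signed-in users; it is not covered in detail here. get() alone is enough to crawl most websites.

Overall demonstration:

    import requests

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'}
    res = requests.get('https://www.baidu.com/', headers=headers)
    print(res.text)                     # body decoded to a string
    print(type(res))                    # the response object's type
    print(res.status_code)              # HTTP status code
    print(res.headers)                  # response headers
    print(res.content)                  # body as raw bytes
    print(res.cookies)                  # cookies set by the server
    print(res.content.decode('utf-8'))  # body decoded explicitly as UTF-8

On many sites, printing res.text directly produces garbled characters. res.text decodes the body using res.encoding, which requests guesses from the HTTP headers, and that guess can be wrong. res.content returns the body as raw bytes instead, so calling decode('utf-8') on it yields readable text whenever the page is actually encoded in UTF-8.

Other request methods in requests:

    import requests

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'}
    res = requests.get('https://www.baidu.com/', headers=headers)    # fetch a page; corresponds to HTTP GET
    res1 = requests.post('https://www.baidu.com/', headers=headers)  # submit data to a page; corresponds to HTTP POST
    res2 = requests.put('https://www.baidu.com/')                    # submit a PUT request; corresponds to HTTP PUT
    res3 = requests.delete('https://www.baidu.com/')                 # submit a delete request; corresponds to HTTP DELETE
    res4 = requests.head('https://www.baidu.com/')                   # fetch only the headers; corresponds to HTTP HEAD
    res5 = requests.patch('https://www.baidu.com/')                  # submit a partial update; corresponds to HTTP PATCH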
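The post() method and the other helpers above can be explored without touching the network. The following is a minimal sketch, not the author's code. It uses requests.Request(...).prepare() to build each request so it can be inspected before anything is sent. The build helper, the shortened User-Agent string, the form field names (username, password) and the /s?wd= search URL are all assumptions made for illustration.

```python
import requests

# Shortened User-Agent for readability; any browser string works the same way.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}
URL = "https://www.baidu.com/"


def build(method, url, **kwargs):
    """Build a request without sending it, so it can be inspected offline."""
    return requests.Request(method, url, headers=HEADERS, **kwargs).prepare()


# One prepared request per helper function covered above.
prepared = {m: build(m, URL) for m in ("get", "post", "put", "delete", "head", "patch")}

# Hypothetical login form: the field names are made up for illustration.
form = build("post", URL, data={"username": "alice", "password": "secret"})

# Query-string parameters are appended to the URL by the library.
search = build("get", URL + "s", params={"wd": "python"})

if __name__ == "__main__":
    for name, req in prepared.items():
        print(name, "->", req.method, req.url)
    print(form.headers["Content-Type"], form.body)
    print(search.url)
```

To actually send one of these, the same keyword arguments can be passed straight to the matching helper, for example requests.post(URL, headers=HEADERS, data={...}).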

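Attributes of the response object:

res.status_code: the HTTP status code of the response (200 means success)
res.text: the response body as a string, i.e. the content of the page at the URL
res.encoding: the body encoding guessed from the HTTP headers
res.apparent_encoding: the body encoding inferred from the content itself
res.content: the response body as raw bytes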
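The difference between res.text and res.content explains the garbled output mentioned earlier. The sketch below uses only the standard library. A bytes value stands in for res.content, and decoding it with the wrong codec reproduces the garbling. The sample strings and the decode_body helper are hypothetical, and trying UTF-8 first is just one simple heuristic, not what requests does internally.

```python
# Stand-in for res.content: the raw bytes of a UTF-8 encoded page.
content = "<title>你好，世界</title>".encode("utf-8")

# Decoding with the wrong codec "works" but yields garbled characters,
# which is what happens when the guessed encoding is wrong.
garbled = content.decode("iso-8859-1")

# Decoding with the codec the page really uses gives readable text.
clean = content.decode("utf-8")


def decode_body(content, declared=None):
    """Decode response bytes: try strict UTF-8, then the declared encoding."""
    for encoding in ("utf-8", declared):
        if encoding is None:
            continue
        try:
            return content.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown codec: move on to the next candidate.
            continue
    # Last resort: keep the text, marking bytes that cannot be decoded.
    return content.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(garbled)
    print(clean)
    print(decode_body("中文".encode("gbk"), declared="gbk"))
```

With a real response, a similar effect can usually be had by setting res.encoding = 'utf-8' (or res.encoding = res.apparent_encoding) before reading res.text.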
