Python in Practice: Scraping Weather Data and Analyzing It Visually
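Requirements:
Fetch one year of historical weather data from the web (any site will do; I'll share the one I scraped in the comments, you know why). The data should include each day's high temperature, low temperature, weather conditions and wind direction. Then implement the following:
(1) Store the data in a CSV file named "<city name>.csv", where each row has the format "date,high,low,weather,wind direction";
(2) Add an "average temperature" column, where average temperature = (high + low) / 2, and plot both cities' average temperature over the year as line charts in the same figure;
(3) Count the days of each weather type in both cities and compare them with a bar chart. Assuming a city's travel-suitability index is 0.3 × the share of cloudy days + 0.4 × the share of sunny days + 0.3 × the share of overcast days, determine which of the two cities is better for travel;
(4) Compute each city's average temperature for every month, plot it as a line chart, and use the chart to judge which month is best for travel;
(5) Count the days in the year on which the average temperature is between 18 and 25 °C and the wind force is below level 5. Assuming that the more such days a city has, the more livable it is, determine which city is more livable.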
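The scripts below chart the weather counts for requirement (3) but never turn them into the travel index itself. A minimal sketch of that calculation, assuming the index is the weighted sum of each weather type's share of all recorded days: the labels 多雲, 晴 and 陰 are assumptions and must match the strings actually stored in the 天氣 column, and mixed entries such as "多雲轉晴" fall under no weight here. The name travel_index is hypothetical.

```python
from collections import Counter

# Hypothetical label -> weight mapping taken from requirement (3);
# the labels must match the scraped 天氣 strings exactly.
TRAVEL_WEIGHTS = {"多雲": 0.3, "晴": 0.4, "陰": 0.3}


def travel_index(daily_weather, weights=TRAVEL_WEIGHTS):
    """Return the sum of weight * (days of that weather / all days)."""
    counts = Counter(daily_weather)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no data, no index
    return sum(w * counts[label] / total for label, w in weights.items())
```

Calling travel_index on each city's 天氣 column (for example data["天氣"].tolist()) and comparing the two numbers answers requirement (3) under this definition: the larger index marks the better travel destination.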
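Crawler code. The main script imports three modules from the spider package (spider.html_downloader, spider.html_parser and spider.data_storage), which follow it: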
import random
import time

from spider.data_storage import DataStorage
from spider.html_downloader import HtmlDownloader
from spider.html_parser import HtmlParser


class SpiderMain:
    def __init__(self):
        self.html_downloader = HtmlDownloader()
        self.html_parser = HtmlParser()
        self.data_storage = DataStorage()

    def start(self):
        """
        Crawler entry point:
        download each month's page, parse the HTML, and store the data.
        :return:
        """
        for i in range(1, 13):  # crawl the twelve months one after another
            time.sleep(random.randint(0, 10))  # random 0-10 s pause so the IP does not get banned
            url = "XXXX"
            url = url + "2021" + str(i).zfill(2) + ".html"  # month URL, e.g. ...202101.html to ...202112.html
            html = self.html_downloader.download(url)
            resultWeather = self.html_parser.parser(html)
            if i == 1:
                # the header row is added only once, in front of January's data
                t = ["日期", "最高氣溫", "最低氣溫", "天氣", "風向"]
                resultWeather.insert(0, t)
            self.data_storage.storage(resultWeather)


if __name__ == "__main__":
    main = SpiderMain()
    main.start()

# spider/html_downloader.py
import requests
class HtmlDownloader:
    def download(self, url):
        """
        Download the page at the given URL.
        :param url:
        :return: the downloaded page as text
        """
        headers = {"User-Agent":
                   "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0"}
        result = requests.get(url, headers=headers, timeout=10)
        result.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
        return result.content.decode('utf-8')

Note: replace the User-Agent with the one your own browser sends when visiting this site. Finding it is simple: open the site, right-click the page and click Inspect; a panel like this appears:

Then click the Network tab and click any request, as shown below:

That brings up the view below; just copy the User-Agent value shown in it!
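With the User-Agent in place, the main loop requests one page per month: the hidden base address ("XXXX" in the article), then the year, the two-digit month and ".html", with a random 0 to 10 second pause between requests. A small sketch of just that URL scheme and pause, assuming the page naming the loop implies; month_page_urls and polite_pause are hypothetical helper names.

```python
import random
import time


def month_page_urls(base_url, year):
    """One URL per month: base_url + YYYY + zero-padded MM + ".html" (hypothetical helper)."""
    return [f"{base_url}{year}{month:02d}.html" for month in range(1, 13)]


def polite_pause(max_seconds=10):
    """Sleep a random whole number of seconds in [0, max_seconds]; return it (hypothetical helper)."""
    seconds = random.randint(0, max_seconds)
    time.sleep(seconds)
    return seconds
```

The format spec :02d yields the same 01 to 12 suffixes as building the string from "20210" or "2021" depending on whether the month is below 10.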

Next, the remaining two modules:
# spider/html_parser.py
from bs4 import BeautifulSoup


class HtmlParser:
    def parser(self, html):
        """
        Parse the given HTML.
        :param html:
        :return: a list of daily rows [date, high, low, weather, wind direction]
        """
        weather = []
        bs = BeautifulSoup(html, "html.parser")
        body = bs.body  # the <body> of the page
        div = body.find('div', class_='tian_three')  # the <div> whose class is tian_three
        ul = div.find('ul')  # the <ul> inside that div
        li = ul.find_all('li')  # every <li> in the ul, one per day
        for l in li:
            tempWeather = []
            div1 = l.find_all("div")  # every <div> in the current li, one per field
            for i in div1:
                # get_text() also works when a div has nested tags (where .string is None)
                tempStr = i.get_text(strip=True).replace("℃", "")  # drop the ℃ sign
                tempStr = tempStr.replace(" ", "")  # remove spaces
                tempWeather.append(tempStr)
            weather.append(tempWeather)
        return weather

# spider/data_storage.py
import pandas as pd
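class DataStorage:
    def storage(self, weather):
        """
        Store the data.
        :param weather: list of rows; on the first call weather[0] is the header row
        :return:
        """
        data = pd.DataFrame(data=weather)  # keep every row as data, header included
        # append without pandas' own header so each month does not add an extra header line
        data.to_csv("C:\\Users\\86183\\Desktop\\成都.csv", index=False, header=False, sep=",", mode="a")

Note: change the save path to your own, and name the file after the city being crawled (for example 瀘州.csv). Because the file is opened in append mode, delete it before running the crawler again.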
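Since the storage step runs once per month in append mode, the header has to be written exactly once. The same idea can be sketched with the standard-library csv module instead of pandas, which makes the cleaning and appending logic easy to check without network access. This is an illustration, not the article's code: append_month, clean_cell and HEADER are hypothetical names, and the cleaning mirrors the parser (drop ℃ and spaces).

```python
import csv
import os

HEADER = ["日期", "最高氣溫", "最低氣溫", "天氣", "風向"]


def clean_cell(text):
    """Drop the ℃ sign and all spaces, as the parser does."""
    return text.replace("℃", "").replace(" ", "").strip()


def append_month(path, rows):
    """Append cleaned rows; write HEADER first only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(HEADER)
        for row in rows:
            writer.writerow([clean_cell(cell) for cell in row])
```

Checking the file size rather than the month number means a crawl resumed halfway through the year still gets exactly one header; rerunning over an existing file, however, appends a second copy of the data.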
OK, that's it for the crawler code. Next come the charts, which look roughly like this:




The code is as follows:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import MultipleLocator

plt.rcParams["font.sans-serif"] = ["SimHei"]  # a font that can render the Chinese labels
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign "-" correctly with this font


def broken_line_chart(x, y1, y2):  # draw the daily average temperature line chart
    plt.figure(dpi=500, figsize=(10, 5))
    plt.title("瀘州-成都每日平均氣溫折線圖")
    plt.plot(x, y1, color='cyan', label='瀘州')
    plt.plot(x, y2, color='yellow', label='成都')
    # get the current axes
    coordinates = plt.gca()
    # place a major x tick every 30 days
    xLocator = MultipleLocator(30)
    coordinates.xaxis.set_major_locator(xLocator)
    # rotate the date labels by 30°
    plt.xticks(rotation=30)
    plt.xticks(fontsize=8)
    plt.ylabel("溫度(℃)")
    plt.xlabel("日期")
    plt.legend()
    plt.savefig("平均氣溫走勢折線圖.png")  # daily average temperature chart
    plt.show()
    plt.close()
data_luZhou = pd.read_csv('C:\\Users\\86183\\Desktop\\瀘州.csv')
data_chengdu = pd.read_csv('C:\\Users\\86183\\Desktop\\成都.csv')
# turn the column names into lists so a new column name can be appended
columS = data_luZhou.columns.tolist()
columY = data_chengdu.columns.tolist()
# convert the data to lists of rows
data_luZhou = np.array(data_luZhou).tolist()
data_chengdu = np.array(data_chengdu).tolist()
# put the column names back in as the first row
data_luZhou.insert(0, columS)
data_chengdu.insert(0, columY)
# add the average-temperature column
data_luZhou[0].append("平均氣溫")
data_chengdu[0].append("平均氣溫")
weather_dict_luZhou = {}
weather_dict_chengdu = {}
for i in range(1, len(data_luZhou)):  # assumes both files cover the same days
    # drop the weekday from the date, keeping "YYYY-MM-DD"
    data_luZhou[i][0] = data_luZhou[i][0][0:10]
    data_chengdu[i][0] = data_chengdu[i][0][0:10]
    # average temperature = (high + low) / 2
    average_luZhou = (int(data_luZhou[i][1]) + int(data_luZhou[i][2])) / 2
    average_chengdu = (int(data_chengdu[i][1]) + int(data_chengdu[i][2])) / 2
    # append the average to the row
    data_luZhou[i].append(average_luZhou)
    data_chengdu[i].append(average_chengdu)
# save the extended data to new CSV files
# (the later sections read 瀘州.csv / 成都.csv from the working directory, so keep the paths consistent)
new_data_luZhou = pd.DataFrame(columns=data_luZhou[0], data=data_luZhou[1:])
new_data_chengdu = pd.DataFrame(columns=data_chengdu[0], data=data_chengdu[1:])
new_data_luZhou.to_csv("D:/PythonProject/spider/瀘州.csv", index=False, sep=",")
new_data_chengdu.to_csv("D:/PythonProject/spider/成都.csv", index=False, sep=",")
# draw the daily average temperature chart
y1 = new_data_luZhou["平均氣溫"].tolist()
y2 = new_data_chengdu["平均氣溫"].tolist()
x = new_data_luZhou["日期"].tolist()
broken_line_chart(x, y1, y2)
# monthly average temperature
new_data_luZhou["日期"] = pd.to_datetime(new_data_luZhou["日期"])
new_data_chengdu["日期"] = pd.to_datetime(new_data_chengdu["日期"])
new_data_luZhou.set_index("日期", inplace=True)
new_data_chengdu.set_index("日期", inplace=True)
# average only the 平均氣溫 column per calendar month; calling resample('m').mean() on the
# whole frame raises a TypeError in pandas 2.x because the 天氣/風向 columns are text
month_average_l = new_data_luZhou["平均氣溫"].groupby(new_data_luZhou.index.month).mean().tolist()
month_average_c = new_data_chengdu["平均氣溫"].groupby(new_data_chengdu.index.month).mean().tolist()
month_list = [str(i) + "月" for i in range(1, 13)]
plt.figure(dpi=500, figsize=(10, 5))
plt.title("瀘州-成都每月平均折線氣溫圖")
plt.plot(month_list, month_average_l, color="cyan", label="瀘州", marker='o')
plt.plot(month_list, month_average_c, color="blue", label='成都', marker='v')
# label each point: Luzhou values above the line, Chengdu values below it
for a, b in zip(month_list, month_average_l):
    plt.text(a, b + 0.5, '%.2f' % b, horizontalalignment='center', verticalalignment='bottom', fontsize=6)
for a, b in zip(month_list, month_average_c):
    plt.text(a, b - 0.5, '%.2f' % b, horizontalalignment='center', verticalalignment='top', fontsize=6)
plt.legend()
plt.xlabel("月份")
plt.ylabel("溫度(℃)")
plt.savefig("月平均氣溫折線圖.png")  # monthly average temperature chart
plt.show()
plt.close()
# count the days with an average temperature of 18-25 ℃ and wind below level 5
# read only the two columns needed (they come back in file order: 風向, 平均氣溫)
data_l = pd.read_csv("瀘州.csv", usecols=['風向', '平均氣溫'])
data_c = pd.read_csv("成都.csv", usecols=['風向', '平均氣溫'])
data_l = np.array(data_l).tolist()
data_c = np.array(data_c).tolist()
day_c = 0
day_l = 0
# the wind text looks like "北風2級" or "東北風2級": the level digit is the character before "級"
for wind, avg in data_l:
    if int(wind[-2]) < 5 and 18 <= float(avg) <= 25:
        day_l += 1
for wind, avg in data_c:
    if int(wind[-2]) < 5 and 18 <= float(avg) <= 25:
        day_c += 1
plt.figure(dpi=500, figsize=(8, 4))
plt.title("瀘州-成都平均氣溫在18-25且風力<5級的天數")
list_name = ['瀘州', '成都']
list_days = [day_l, day_c]
plt.bar(list_name, list_days, width=0.5)
plt.text(0, day_l, '%.0f' % day_l, horizontalalignment='center', verticalalignment='bottom', fontsize=7)
plt.text(1, day_c, '%.0f' % day_c, horizontalalignment='center', verticalalignment='bottom', fontsize=7)
plt.xlabel("城市")
plt.ylabel("天數(d)")
plt.savefig("適宜居住柱形圖.png")
plt.show()
plt.close()
data_l = pd.read_csv("瀘州.csv")
data_c = pd.read_csv("成都.csv")
# convert the data to lists of rows
data_l = np.array(data_l).tolist()
data_c = np.array(data_c).tolist()
# count the days of each weather type in a dict {weather: days};
# row 0 is already the first day, because read_csv consumed the header
for row in data_l:
    weather_dict_luZhou[row[3]] = weather_dict_luZhou.get(row[3], 0) + 1
for row in data_c:
    weather_dict_chengdu[row[3]] = weather_dict_chengdu.get(row[3], 0) + 1
weather_list_luZhou = list(weather_dict_luZhou)
weather_list_chengdu = list(weather_dict_chengdu)
value_l = []
value_c = []
# every weather type that occurs in either city
weather_list = sorted(set(weather_list_luZhou + weather_list_chengdu))
# days per weather type for each city, with 0 where a city never had that weather,
# so that the two bar series line up
for i in weather_list:
    value_l.append(weather_dict_luZhou.get(i, 0))
    value_c.append(weather_dict_chengdu.get(i, 0))
# grouped bar chart for comparison
plt.figure(dpi=500, figsize=(10, 5))
plt.title("瀘州-成都各種天氣情況對比")
x1 = list(range(len(weather_list)))
x = [i + 0.4 for i in x1]
plt.bar(x1, value_l, width=0.4, color='red', label='瀘州')
plt.bar(x, value_c, width=0.4, color='orange', label='成都')
for a, b in zip(x1, value_l):
    plt.text(a, b + 0.4, '%.0f' % b, ha='center', va='bottom', fontsize=7)
for a, b in zip(x, value_c):
    plt.text(a, b + 0.4, '%.0f' % b, ha='center', va='bottom', fontsize=7)
plt.xticks([i + 0.2 for i in x1], weather_list, rotation=270)  # center each label under its pair of bars
plt.ylabel("天數")
plt.xlabel("天氣")
plt.legend()
plt.savefig("瀘州成都天氣情況對比.png")
plt.show()
plt.close()

That's all for this time. See you next time!!!
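The comfortable-day count above finds the wind level by character position, which only works for texts shaped like "北風2級". A more tolerant variant, sketched under the assumption that the wind column may also hold ranges such as "3-4級" or no number at all, reads every number with a regular expression and keeps the largest. Treating a text with no digits as level 0 is an assumption, and wind_level, comfortable_days and more_livable are hypothetical helper names.

```python
import re


def wind_level(wind):
    """Highest wind-force number mentioned in the text, or 0 if none (assumption)."""
    levels = [int(n) for n in re.findall(r"\d+", wind)]
    return max(levels) if levels else 0


def comfortable_days(rows, low=18, high=25, max_wind=5):
    """rows: iterable of (wind_text, average_temperature) pairs."""
    return sum(
        1
        for wind, avg in rows
        if low <= float(avg) <= high and wind_level(wind) < max_wind
    )


def more_livable(name_a, rows_a, name_b, rows_b):
    """Name of the city with more comfortable days, or None on a tie."""
    a, b = comfortable_days(rows_a), comfortable_days(rows_b)
    if a == b:
        return None
    return name_a if a > b else name_b
```

Taking the largest number in a range like "3-4級" is the conservative reading for requirement (5): a day only counts when even the stronger end of the forecast stays below level 5.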
That concludes this article on scraping weather data with Python and analyzing it visually. For more on scraping weather data with Python, please search the earlier articles on Script Home (腳本之家), and thank you for your continued support!

