快捷導(dǎo)航

python清洗疫情歷史數(shù)據(jù)的過(guò)程詳解

更新時(shí)間：2022年05月09日 14:23:11 作者：不愿意做魚的小鯨魚

這篇文章主要介紹了python清洗疫情歷史數(shù)據(jù),包括數(shù)據(jù)獲取方法及使用python讀取csv的詳細(xì)代碼，本文通過(guò)實(shí)例代碼給大家介紹的非常詳細(xì)，對(duì)大家的學(xué)習(xí)或工作具有一定的參考借鑒價(jià)值，需要的朋友可以參考下

在我2020年大三的一個(gè)實(shí)訓(xùn)的大作業(yè)中，我整了一個(gè)新冠肺炎疫情的數(shù)據(jù)采集和可視化分析系統(tǒng)，大致就是先找數(shù)據(jù)，然后將數(shù)據(jù)導(dǎo)入hive中，然后使用hive對(duì)數(shù)據(jù)進(jìn)行清洗，然后將清洗后的數(shù)據(jù)使用hql導(dǎo)入MySql，之后就是用ssm開發(fā)后臺(tái)數(shù)據(jù)接口，然后前端使用echarts和表格對(duì)數(shù)據(jù)進(jìn)行可視化。具體可以查看：http://www.dbjr.com.cn/article/179889.htm。由于那時(shí)候主要要求使用hive處理數(shù)據(jù)，但那時(shí)的數(shù)據(jù)是來(lái)自于某位大佬的數(shù)據(jù)接口中獲取的，最后用hive處理再導(dǎo)入數(shù)據(jù)庫(kù)的確是大材小用。因此只是在數(shù)據(jù)的處理上不太妥，其他對(duì)數(shù)據(jù)的處理和數(shù)據(jù)的可視化做的還是不錯(cuò)的。
這次是有位小伙伴也想做一個(gè)疫情的數(shù)據(jù)采集和可視化系統(tǒng)，想借鑒我之前做的，并且讓我指點(diǎn)。那么問(wèn)題就來(lái)了：之前的數(shù)據(jù)是比較少的，直接從網(wǎng)上提供的免費(fèi)接口就可以直接獲取，而現(xiàn)在疫情已經(jīng)過(guò)去了兩年多，如果要整理出歷史各省份、城市每一天的數(shù)據(jù)，那這個(gè)數(shù)據(jù)就相對(duì)龐大，再想找現(xiàn)成的符合功能的接口幾乎是沒(méi)有，因此我做了以下的工作獲取數(shù)據(jù)和處理數(shù)據(jù)：

1. 數(shù)據(jù)獲取

數(shù)據(jù)的來(lái)源是用了GitHub上這個(gè)我收藏了很久的項(xiàng)目：https://lab.isaaclin.cn/nCoV/
數(shù)據(jù)倉(cāng)庫(kù)鏈接：https://github.com/BlankerL/DXY-COVID-19-Data/releases

這個(gè)另外部署了一個(gè)數(shù)據(jù)倉(cāng)庫(kù)，每天0點(diǎn)，程序?qū)?zhǔn)時(shí)執(zhí)行，數(shù)據(jù)會(huì)被推送至Release中。
我們就可以從大佬的那個(gè)數(shù)據(jù)倉(cāng)庫(kù)直接下載現(xiàn)成爬蟲爬取的數(shù)據(jù)，數(shù)據(jù)直接下載csv格式的DXYArea.csv就好了，方便用于做處理。
下載后打開，會(huì)發(fā)現(xiàn)這個(gè)92MB的的文件里面有近100W條數(shù)據(jù)。直接讀取的話肯定會(huì)有點(diǎn)慢了。
因此這時(shí)候我就想到可以嘗試使用python的pandas分塊讀取數(shù)據(jù)，這個(gè)工具對(duì)數(shù)據(jù)處理很方便，對(duì)數(shù)據(jù)的讀取也賊快。

2. 使用python讀取csv

讀取csv選擇使用pandas模塊，使用原生讀取很對(duì)很慢
注：py腳本文件和csv文件放在同一目錄下

import pandas as pd
import numpy as np
# 讀取的文件
filePath = "DXYArea.csv"
# 獲取數(shù)據(jù)
def read_csv_feature(filePath):
    # 讀取文件
    f = open(filePath, encoding='utf-8')
    reader = pd.read_csv(f, sep=',', iterator=True)
    loop = True
    chunkSize = 1000000
    chunks = []
    while loop:
        try:
            chunk = reader.get_chunk(chunkSize)
            chunks.append(chunk)
        except StopIteration:
            loop = False
    df = pd.concat(chunks, axis=0, ignore_index=True)
    f.close()
    return df 
data = read_csv_feature(filePath)
print('數(shù)據(jù)讀取成功---------------')

csv數(shù)據(jù)讀取成功之后，就全部存在data里面了，而這個(gè)data是一個(gè)數(shù)據(jù)集。
可以使用numpy模塊工具對(duì)數(shù)據(jù)集進(jìn)行篩選、導(dǎo)出轉(zhuǎn)換成list,方便對(duì)數(shù)據(jù)進(jìn)行操作

countryName = np.array(data["countryName"])
countryEnglishName = np.array(data["countryEnglishName"])
provinceName = np.array(data["provinceName"])
province_confirmedCount = np.array(data["province_confirmedCount"])
province_curedCount = np.array(data["province_curedCount"])
province_deadCount = np.array(data["province_deadCount"])
updateTime = np.array(data["updateTime"])
cityName = np.array(data["cityName"])
city_confirmedCount = np.array(data["city_confirmedCount"])
city_curedCount = np.array(data["city_curedCount"])
city_deadCount = np.array(data["city_deadCount"])

這樣就把所有需要用到的數(shù)據(jù)篩選出來(lái)了。

3.使用pyhon進(jìn)行數(shù)據(jù)清洗

這里的清洗我還是使用了笨方法，很直接暴力的把數(shù)據(jù)裝進(jìn)對(duì)應(yīng)的list中：

# 全國(guó)歷史數(shù)據(jù)
historyed = list()
# 全國(guó)最新數(shù)據(jù)
totaled = list()
# province最新數(shù)據(jù)
provinceed = list()
# area最新數(shù)據(jù)
areaed = list()
for i in range(len(data)):
    if(countryName[i] == "中國(guó)"):
        updatetimeList = str(updateTime[i]).split(' ')
        updatetime = updatetimeList[0]
        # 處理historyed
        historyed_temp = list()
        if(provinceName[i] == "中國(guó)"):
            # 處理totaled
            if(len(totaled) == 0):
                totaled.append(str(updateTime[i]))
                totaled.append(int(province_confirmedCount[i]))
                totaled.append(int(province_curedCount[i]))
                totaled.append(int(province_deadCount[i]))
            
            if((len(historyed) > 0) and (str(updatetime) != historyed[len(historyed) - 1][0])):
                historyed_temp.append(str(updatetime))
                historyed_temp.append(int(province_confirmedCount[i]))
                historyed_temp.append(int(province_curedCount[i]))
                historyed_temp.append(int(province_deadCount[i]))
            if(len(historyed) == 0):
                historyed_temp.append(str(updatetime))
                historyed_temp.append(int(province_confirmedCount[i]))
                historyed_temp.append(int(province_curedCount[i]))
                historyed_temp.append(int(province_deadCount[i]))
        if(len(historyed_temp) > 0):
            historyed.append(historyed_temp)

        # 處理areaed
        areaed_temp = list()
        if(provinceName[i] != "中國(guó)"):
            if(provinceName[i] != "內(nèi)蒙古自治區(qū)" and provinceName[i] != "黑龍江省"):
                provinceName[i] = provinceName[i][0:2]
            else:
                provinceName[i] = provinceName[i][0:3]
            flag = 1
            for item in areaed:
                if(item[1] == str(cityName[i])):
                    flag = 0
            if(flag == 1):
                areaed_temp.append(str(provinceName[i]))
                areaed_temp.append(str(cityName[i]))
                areaed_temp.append(int(0 if np.isnan(city_confirmedCount[i]) else city_confirmedCount[i]))
                areaed_temp.append(int(0 if np.isnan(city_curedCount[i]) else city_curedCount[i]))
                areaed_temp.append(int(0 if np.isnan(city_deadCount[i]) else city_deadCount[i]))
                areaed.append(areaed_temp)
            flag = 1
            for item in areaed_tmp:
                if(item[0] == str(provinceName[i])):
                    flag = 0
            if(flag == 1):
                areaed_temp.append(str(provinceName[i]))
                areaed_temp.append(str(cityName[i]))
                areaed_temp.append(int(0 if np.isnan(city_confirmedCount[i]) else city_confirmedCount[i]))
                areaed_temp.append(int(0 if np.isnan(city_curedCount[i]) else city_curedCount[i]))
                areaed_temp.append(int(0 if np.isnan(city_deadCount[i]) else city_deadCount[i]))
                areaed_tmp.append(areaed_temp)
        
# 處理provinceed（需要根據(jù)areaed獲?。?
province_temp = list()
for temp in areaed_tmp:
    if(len(provinceed) == 0 and len(province_temp) == 0):
        province_temp.append(temp[0])
        province_temp.append(temp[2])
        province_temp.append(temp[3])
        province_temp.append(temp[4])
    else:
        if(temp[0] == province_temp[0]):
            province_temp[1] = province_temp[1] + temp[2]
            province_temp[1] = province_temp[2] + temp[3]
            province_temp[1] = province_temp[3] + temp[4]
        else:
            provinceed.append(province_temp)
            province_temp = list()
            province_temp.append(temp[0])
            province_temp.append(temp[2])
            province_temp.append(temp[3])
            province_temp.append(temp[4])
provinceed.append(province_temp)
print('數(shù)據(jù)清洗成功---------------')

這里沒(méi)有什么說(shuō)的，完全是體力活，將上面篩選出來(lái)的數(shù)據(jù)進(jìn)行清洗，需要注意的是要仔細(xì)的觀察讀取出來(lái)的數(shù)據(jù)的數(shù)據(jù)格式，有些數(shù)據(jù)格式不是很標(biāo)準(zhǔn)，需要手動(dòng)處理。

4. 將清洗的數(shù)據(jù)自動(dòng)導(dǎo)入MySql

將數(shù)據(jù)導(dǎo)入Mysql這里還是使用python，使用了python的pymysql模塊

import pymysql
"""
將數(shù)據(jù)導(dǎo)入數(shù)據(jù)庫(kù)
"""
# 打開數(shù)據(jù)庫(kù)連接
db=pymysql.connect(host="localhost",user="root",password="123456",database="yq")
# 使用 cursor() 方法創(chuàng)建一個(gè)游標(biāo)對(duì)象 cursor
cursor = db.cursor()
#創(chuàng)建yq數(shù)據(jù)庫(kù)
cursor.execute('CREATE DATABASE IF NOT EXISTS yq DEFAULT CHARSET utf8 COLLATE utf8_general_ci;')
print('創(chuàng)建yq數(shù)據(jù)庫(kù)成功')
#創(chuàng)建相關(guān)表表
cursor.execute('drop table if exists areaed')
sql="""
CREATE TABLE IF NOT EXISTS `areaed`  (
  `provinceName` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `cityName` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `confirmedCount` int(11) NULL DEFAULT NULL,
  `deadCount` int(11) NULL DEFAULT NULL,
  `curedCount` int(11) NULL DEFAULT NULL,
  `currentCount` int(11) NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;
"""
cursor.execute(sql)
cursor.execute('drop table if exists provinceed')
sql="""
CREATE TABLE `provinceed`  (
  `provinceName` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `confirmedNum` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `deathsNum` int(11) NULL DEFAULT NULL,
  `curesNum` int(11) NULL DEFAULT NULL,
  `currentNum` int(11) NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;
"""
cursor.execute(sql)
cursor.execute('drop table if exists totaled')
sql="""
CREATE TABLE `totaled`  (
  `date` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `diagnosed` int(11) NULL DEFAULT NULL,
  `death` int(11) NULL DEFAULT NULL,
  `cured` int(11) NULL DEFAULT NULL,
  `current` int(11) NULL DEFAULT NULL
) ENGINE = MyISAM CHARACTER SET = latin1 COLLATE = latin1_swedish_ci ROW_FORMAT = Dynamic;
"""
cursor.execute(sql)
cursor.execute('drop table if exists historyed')
sql="""
CREATE TABLE `historyed`  (
  `date` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `confirmedNum` int(11) NULL DEFAULT NULL,
  `deathsNum` int(11) NULL DEFAULT NULL,
  `curesNum` int(11) NULL DEFAULT NULL,
  `currentNum` int(11) NULL DEFAULT NULL
) ENGINE = MyISAM CHARACTER SET = latin1 COLLATE = latin1_swedish_ci ROW_FORMAT = Dynamic;
"""
cursor.execute(sql)
print('創(chuàng)建相關(guān)表成功')
# 導(dǎo)入historyed
for item in historyed:
    sql='INSERT INTO historyed VALUES(%s,"%s","%s","%s","%s")'
    try:
        cursor.execute(sql,(str(item[0]),item[1],item[3],item[2],item[1]-item[2]-item[3]))
        db.commit()
    except Exception as ex:
        print("error:")
        print("出現(xiàn)如下異常%s"%ex)
        db.rollback()
        break
print("導(dǎo)入historyed成功-------------")
# 導(dǎo)入areaed
for item in areaed:
    sql='INSERT INTO areaed VALUES(%s,"%s","%s","%s","%s","%s")'
    try:
        cursor.execute(sql,(item[0],item[1],item[2],item[4],item[3],item[2]-item[3]-item[4]))
        db.commit()
    except Exception as ex:
        print("error:")
        print("出現(xiàn)如下異常%s"%ex)
        db.rollback()
        break
print("導(dǎo)入areaed成功-------------")
# 導(dǎo)入provinceed
for item in provinceed:
    sql='INSERT INTO provinceed VALUES(%s,"%s","%s","%s","%s")'
    try:
        cursor.execute(sql,(str(item[0]),item[1],item[3],item[2],item[1]-item[2]-item[3]))
        db.commit()
    except Exception as ex:
        print("error:")
        print("出現(xiàn)如下異常%s"%ex)
        db.rollback()
        break
print("導(dǎo)入provinceed成功-------------")
# 導(dǎo)入totaled
sql='INSERT INTO totaled VALUES(%s,"%s","%s","%s","%s")' 
try:
    cursor.execute(sql,(str(totaled[0]),totaled[1],totaled[3],totaled[2],totaled[1]-totaled[2]-totaled[3]))
    db.commit()
except Exception as ex:
    print("error:")
    print("出現(xiàn)如下異常%s"%ex)
    db.rollback()
print("導(dǎo)入totaled成功-------------")
cursor.close()#先關(guān)閉游標(biāo)
db.close()#再關(guān)閉數(shù)據(jù)庫(kù)連接

這里為了腳本的使用方便，首先進(jìn)行了建庫(kù)、然后建表、最后將清洗的數(shù)據(jù)導(dǎo)入MySql

完整代碼

import pandas as pd
import numpy as np
import pymysql
"""
@ProjectName: cleanData
@FileName: cleanData.py
@Author: tao
@Date: 2022/05/03
"""
# 讀取的文件
filePath = "DXYArea.csv"
# 全國(guó)歷史數(shù)據(jù)
historyed = list()
# 全國(guó)最新數(shù)據(jù)
totaled = list()
# province最新數(shù)據(jù)
provinceed = list()
# area最新數(shù)據(jù)
areaed = list()
# 獲取數(shù)據(jù)
def read_csv_feature(filePath):
    # 讀取文件
    f = open(filePath, encoding='utf-8')
    reader = pd.read_csv(f, sep=',', iterator=True)
    loop = True
    chunkSize = 1000000
    chunks = []
    while loop:
        try:
            chunk = reader.get_chunk(chunkSize)
            chunks.append(chunk)
        except StopIteration:
            loop = False
    df = pd.concat(chunks, axis=0, ignore_index=True)
    f.close()
    return df 
data = read_csv_feature(filePath)
print('數(shù)據(jù)讀取成功---------------')
areaed_tmp = list()
countryName = np.array(data["countryName"])
countryEnglishName = np.array(data["countryEnglishName"])
provinceName = np.array(data["provinceName"])
province_confirmedCount = np.array(data["province_confirmedCount"])
province_curedCount = np.array(data["province_curedCount"])
province_deadCount = np.array(data["province_deadCount"])
updateTime = np.array(data["updateTime"])
cityName = np.array(data["cityName"])
city_confirmedCount = np.array(data["city_confirmedCount"])
city_curedCount = np.array(data["city_curedCount"])
city_deadCount = np.array(data["city_deadCount"])
for i in range(len(data)):
    if(countryName[i] == "中國(guó)"):
        updatetimeList = str(updateTime[i]).split(' ')
        updatetime = updatetimeList[0]
        # 處理historyed
        historyed_temp = list()
        if(provinceName[i] == "中國(guó)"):
            # 處理totaled
            if(len(totaled) == 0):
                totaled.append(str(updateTime[i]))
                totaled.append(int(province_confirmedCount[i]))
                totaled.append(int(province_curedCount[i]))
                totaled.append(int(province_deadCount[i]))
            
            if((len(historyed) > 0) and (str(updatetime) != historyed[len(historyed) - 1][0])):
                historyed_temp.append(str(updatetime))
                historyed_temp.append(int(province_confirmedCount[i]))
                historyed_temp.append(int(province_curedCount[i]))
                historyed_temp.append(int(province_deadCount[i]))
            if(len(historyed) == 0):
                historyed_temp.append(str(updatetime))
                historyed_temp.append(int(province_confirmedCount[i]))
                historyed_temp.append(int(province_curedCount[i]))
                historyed_temp.append(int(province_deadCount[i]))
        
        if(len(historyed_temp) > 0):
            historyed.append(historyed_temp)
        # 處理areaed
        areaed_temp = list()
        if(provinceName[i] != "中國(guó)"):
            if(provinceName[i] != "內(nèi)蒙古自治區(qū)" and provinceName[i] != "黑龍江省"):
                provinceName[i] = provinceName[i][0:2]
            else:
                provinceName[i] = provinceName[i][0:3]
            flag = 1
            for item in areaed:
                if(item[1] == str(cityName[i])):
                    flag = 0
            if(flag == 1):
                areaed_temp.append(str(provinceName[i]))
                areaed_temp.append(str(cityName[i]))
                areaed_temp.append(int(0 if np.isnan(city_confirmedCount[i]) else city_confirmedCount[i]))
                areaed_temp.append(int(0 if np.isnan(city_curedCount[i]) else city_curedCount[i]))
                areaed_temp.append(int(0 if np.isnan(city_deadCount[i]) else city_deadCount[i]))
                areaed.append(areaed_temp)
            flag = 1
            for item in areaed_tmp:
                if(item[0] == str(provinceName[i])):
                    flag = 0
            if(flag == 1):
                areaed_temp.append(str(provinceName[i]))
                areaed_temp.append(str(cityName[i]))
                areaed_temp.append(int(0 if np.isnan(city_confirmedCount[i]) else city_confirmedCount[i]))
                areaed_temp.append(int(0 if np.isnan(city_curedCount[i]) else city_curedCount[i]))
                areaed_temp.append(int(0 if np.isnan(city_deadCount[i]) else city_deadCount[i]))
                areaed_tmp.append(areaed_temp)
        
# 處理provinceed（需要根據(jù)areaed獲?。?
province_temp = list()
for temp in areaed_tmp:
    if(len(provinceed) == 0 and len(province_temp) == 0):
        province_temp.append(temp[0])
        province_temp.append(temp[2])
        province_temp.append(temp[3])
        province_temp.append(temp[4])
    else:
        if(temp[0] == province_temp[0]):
            province_temp[1] = province_temp[1] + temp[2]
            province_temp[1] = province_temp[2] + temp[3]
            province_temp[1] = province_temp[3] + temp[4]
        else:
            provinceed.append(province_temp)
            province_temp = list()
            province_temp.append(temp[0])
            province_temp.append(temp[2])
            province_temp.append(temp[3])
            province_temp.append(temp[4])
provinceed.append(province_temp)
print('數(shù)據(jù)清洗成功---------------')
# print(historyed)
# print(areaed)
print(totaled)
# print(provinceed)
"""
print(len(provinceed))
for item in provinceed:
    print(item[1]-item[2]-item[3])
"""
"""
將數(shù)據(jù)導(dǎo)入數(shù)據(jù)庫(kù)
"""
# 打開數(shù)據(jù)庫(kù)連接
db=pymysql.connect(host="localhost",user="root",password="123456",database="yq")
# 使用 cursor() 方法創(chuàng)建一個(gè)游標(biāo)對(duì)象 cursor
cursor = db.cursor()
#創(chuàng)建yq數(shù)據(jù)庫(kù)
cursor.execute('CREATE DATABASE IF NOT EXISTS yq DEFAULT CHARSET utf8 COLLATE utf8_general_ci;')
print('創(chuàng)建yq數(shù)據(jù)庫(kù)成功')
#創(chuàng)建相關(guān)表表
cursor.execute('drop table if exists areaed')
sql="""
CREATE TABLE IF NOT EXISTS `areaed`  (
  `provinceName` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `cityName` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `confirmedCount` int(11) NULL DEFAULT NULL,
  `deadCount` int(11) NULL DEFAULT NULL,
  `curedCount` int(11) NULL DEFAULT NULL,
  `currentCount` int(11) NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;
"""
cursor.execute(sql)
cursor.execute('drop table if exists provinceed')
sql="""
CREATE TABLE `provinceed`  (
  `provinceName` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `confirmedNum` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `deathsNum` int(11) NULL DEFAULT NULL,
  `curesNum` int(11) NULL DEFAULT NULL,
  `currentNum` int(11) NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;
"""
cursor.execute(sql)
cursor.execute('drop table if exists totaled')
sql="""
CREATE TABLE `totaled`  (
  `date` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `diagnosed` int(11) NULL DEFAULT NULL,
  `death` int(11) NULL DEFAULT NULL,
  `cured` int(11) NULL DEFAULT NULL,
  `current` int(11) NULL DEFAULT NULL
) ENGINE = MyISAM CHARACTER SET = latin1 COLLATE = latin1_swedish_ci ROW_FORMAT = Dynamic;
"""
cursor.execute(sql)
cursor.execute('drop table if exists historyed')
sql="""
CREATE TABLE `historyed`  (
  `date` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `confirmedNum` int(11) NULL DEFAULT NULL,
  `deathsNum` int(11) NULL DEFAULT NULL,
  `curesNum` int(11) NULL DEFAULT NULL,
  `currentNum` int(11) NULL DEFAULT NULL
) ENGINE = MyISAM CHARACTER SET = latin1 COLLATE = latin1_swedish_ci ROW_FORMAT = Dynamic;
"""
cursor.execute(sql)
print('創(chuàng)建相關(guān)表成功')
# 導(dǎo)入historyed
for item in historyed:
    sql='INSERT INTO historyed VALUES(%s,"%s","%s","%s","%s")'
    try:
        cursor.execute(sql,(str(item[0]),item[1],item[3],item[2],item[1]-item[2]-item[3]))
        db.commit()
    except Exception as ex:
        print("error:")
        print("出現(xiàn)如下異常%s"%ex)
        db.rollback()
        break
print("導(dǎo)入historyed成功-------------")
# 導(dǎo)入areaed
for item in areaed:
    sql='INSERT INTO areaed VALUES(%s,"%s","%s","%s","%s","%s")'
    try:
        cursor.execute(sql,(item[0],item[1],item[2],item[4],item[3],item[2]-item[3]-item[4]))
        db.commit()
    except Exception as ex:
        print("error:")
        print("出現(xiàn)如下異常%s"%ex)
        db.rollback()
        break
print("導(dǎo)入areaed成功-------------")
# 導(dǎo)入provinceed
for item in provinceed:
    sql='INSERT INTO provinceed VALUES(%s,"%s","%s","%s","%s")'
    try:
        cursor.execute(sql,(str(item[0]),item[1],item[3],item[2],item[1]-item[2]-item[3]))
        db.commit()
    except Exception as ex:
        print("error:")
        print("出現(xiàn)如下異常%s"%ex)
        db.rollback()
        break
print("導(dǎo)入provinceed成功-------------")
# 導(dǎo)入totaled
sql='INSERT INTO totaled VALUES(%s,"%s","%s","%s","%s")' 
try:
    cursor.execute(sql,(str(totaled[0]),totaled[1],totaled[3],totaled[2],totaled[1]-totaled[2]-totaled[3]))
    db.commit()
except Exception as ex:
    print("error:")
    print("出現(xiàn)如下異常%s"%ex)
    db.rollback()
print("導(dǎo)入totaled成功-------------")
cursor.close()#先關(guān)閉游標(biāo)
db.close()#再關(guān)閉數(shù)據(jù)庫(kù)連接

腳本運(yùn)行效果

數(shù)據(jù)庫(kù)可以看到以下表和數(shù)據(jù)

最后我們的數(shù)據(jù)就已經(jīng)有了，此時(shí)的數(shù)據(jù)處理的格式還是參照我之前整的新冠肺炎疫情的數(shù)據(jù)采集和可視化分析系統(tǒng)對(duì)接的，集體后臺(tái)和可視化的實(shí)現(xiàn)可以參考：https://qkongtao.cn/?p=514

到此這篇關(guān)于python清洗疫情歷史數(shù)據(jù)的文章就介紹到這了,更多相關(guān)python疫情歷史數(shù)據(jù)內(nèi)容請(qǐng)搜索腳本之家以前的文章或繼續(xù)瀏覽下面的相關(guān)文章希望大家以后多多支持腳本之家！

您可能感興趣的文章: