基于Python實(shí)現(xiàn)對(duì)比Exce的工具
目的:設(shè)計(jì)一個(gè)應(yīng)用GUI用于對(duì)比兩個(gè)Excel文件
思路
1.參數(shù)
- 同一個(gè)excel文件兩個(gè)sheet頁(yè)其中一個(gè)ODS(老數(shù)據(jù)),一個(gè)DWH(新數(shù)據(jù))
- 生成對(duì)比文件
- 設(shè)計(jì)兩個(gè)主鍵 輸入主鍵1 輸入主鍵2
(默認(rèn)新舊文件列名一致)
2.效果
- 生成的文件
- 數(shù)據(jù)量一樣、取每個(gè)字段不一致的數(shù)據(jù)前10
- 數(shù)據(jù)量不一樣、取兩邊不一樣的數(shù)據(jù)前10、排除不一樣的數(shù)據(jù)、每個(gè)字段不一致的數(shù)據(jù)前10
3.實(shí)現(xiàn)
- 循環(huán)對(duì)比組合列(主鍵+對(duì)比列)
- pandas處理差異數(shù)據(jù)、openpyxl 處理生成的sheet的數(shù)據(jù)格式. (先生成數(shù)據(jù),然后調(diào)整格式)
配置
import pandas as pd from openpyxl import load_workbook #選擇文件路徑 path=r"C:\Users\小管同學(xué)\Desktop\Migration_Data_Compari\對(duì)比文件.xls" #input("選擇文件路徑:") TargetPath=r"C:\Users\小管同學(xué)\Desktop\Migration_Data_Comparison_Tool\目標(biāo)文件\對(duì)比結(jié)果.xlsx" DATA_ODS=pd.read_excel(r"C:\Users\小管同學(xué)\Desktop\Migration_Data_Comparison_Tool\對(duì)比文件.xls",sheet_name="ODS") DATA_DWH=pd.read_excel(r"C:\Users\小管同學(xué)\Desktop\Migration_Data_Comparison_Tool\對(duì)比文件.xls",sheet_name="DWH") #選擇主鍵 Primarykey="員工編號(hào)"#input("選擇主鍵1:") Primarykey # 員工編號(hào)
一、數(shù)據(jù)量
輸出表格1–數(shù)據(jù)量
def write_to_excel_DataVolume(Data,TargetPath): # cor_df 為要保存的 dataframe writer = pd.ExcelWriter(TargetPath, engine='xlsxwriter') # 這里用 Data.to_excel(writer,sheet_name='Sheet1', encoding='utf8', header=False, startcol=0, startrow=2) # 把dataframe的數(shù)據(jù)從第2行開(kāi)始 workbook = writer.book format1 = workbook.add_format({ # 先把樣式打包,然后之后賦值即可 'bold': True, # 字體加粗 'text_wrap': True, # 是否自動(dòng)換行 'valign': 'bottom', #垂直對(duì)齊方式 'align': 'center', # 水平對(duì)齊方式 'fg_color': '#C5D9F1', # 單元格背景顏色 'border': 1,# 邊框 }) writer_sheet = writer.sheets['Sheet1'] # 設(shè)置寬度 writer_sheet.set_column("A:I", 16) writer_sheet.set_column('C:C',30) writer_sheet.merge_range(0,0,0,2,'對(duì)比結(jié)果',format1) writer_sheet.merge_range(4,2,4,0,'數(shù)據(jù)量差異',format1) writer_sheet.write(1,0,'',format1) writer_sheet.write(1,1,'ODS',format1) writer_sheet.write(1,2,'DWH',format1) writer.save() writer.close() DataFrame_DataVolume=pd.DataFrame([[DATA_ODS.shape[0]],[DATA_DWH.shape[0]]]).T DataFrame_DataVolume.columns =["ODS","DWH"] DataFrame_DataVolume.index=["數(shù)據(jù)量"] DataFrame_DataVolume #writeFileDataVolume(DataFrame_DataVolume,TargetPath) write_to_excel_DataVolume(DataFrame_DataVolume,TargetPath)
輸出表格2–數(shù)據(jù)量差異合同
if DATA_ODS.shape[0]==DATA_DWH.shape[0]: pass else: DATA_ODS_Primarykey=pd.DataFrame(DATA_ODS[Primarykey]) DATA_DWH_Primarykey=pd.DataFrame(DATA_DWH[Primarykey]) df_union = pd.concat([DATA_ODS_Primarykey,DATA_DWH_Primarykey]) # 實(shí)現(xiàn)1 df_diff_ODS = df_union.append(DATA_ODS_Primarykey).drop_duplicates(subset=df_union.columns.to_list(), keep=False) df_diff_DWH = df_union.append(DATA_DWH_Primarykey).drop_duplicates(subset=df_union.columns.to_list(), keep=False) #DWH多的合同 df_diff_ODS #DWH少的合同 df_diff_DWH df_diff_DWH_Data=[] df_diff_ODS_Data=[] for i in df_diff_ODS.head(10).values.tolist(): for n in i: df_diff_ODS_Data.append(n) for i in df_diff_DWH.head(10).values.tolist(): df_diff_DWH_Data.append(n) while True: if len(df_diff_DWH_Data)>len(df_diff_ODS_Data): df_diff_ODS_Data.append("-") elif len(df_diff_DWH_Data)< len(df_diff_ODS_Data): df_diff_DWH_Data.append("-") elif len(df_diff_DWH_Data)== len(df_diff_ODS_Data): break DataFrame_DataVolume_Count_result=pd.DataFrame(df_diff_DWH_Data,df_diff_ODS_Data).reset_index() DataFrame_DataVolume_Count_result.columns=['DWH多的合同','DWH少的的合同'] DataFrame_DataVolume_Count_result=DataFrame_DataVolume_Count_result.reset_index() DataFrame_DataVolume_Count_result.columns=['序號(hào)','DWH多的合同','DWH少的的合同'] DataFrame_DataVolume_Count_result from openpyxl import load_workbook def write_to_excel_Count_result(Data,TargetPath): df_Old = pd.DataFrame(pd.read_excel(TargetPath)) #讀取原數(shù)據(jù)文件和表 writer = pd.ExcelWriter(TargetPath,engine='openpyxl') book=load_workbook(TargetPath) writer.book = book writer.sheets = dict((ws.title, ws) for ws in book.worksheets) df_rows = df_Old.shape[0] #獲取原數(shù)據(jù)的行數(shù) Data.to_excel(writer,startrow=df_rows+1, index=False,startcol=0,header=True)#將數(shù)據(jù)寫(xiě)入excel中的aa表,從第一個(gè)空行開(kāi)始寫(xiě) writer.save()#保存 write_to_excel_Count_result(DataFrame_DataVolume_Count_result,TargetPath)
到此這篇關(guān)于基于Python實(shí)現(xiàn)對(duì)比Excel的小工具 【實(shí)現(xiàn)中】的文章就介紹到這了,更多相關(guān)Python對(duì)比Excel的小工具內(nèi)容請(qǐng)搜索腳本之家以前的文章或繼續(xù)瀏覽下面的相關(guān)文章希望大家以后多多支持腳本之家!
相關(guān)文章
python爬蟲(chóng)用request庫(kù)處理cookie的實(shí)例講解
在本篇內(nèi)容里小編給大家整理的是一篇關(guān)于python爬蟲(chóng)用request庫(kù)處理cookie的實(shí)例講解內(nèi)容,有需要的朋友們可以學(xué)習(xí)參考下。2021-02-02

Pyspark獲取并處理RDD數(shù)據(jù)代碼實(shí)例

Python入門(mén)教程(十六)Python的if邏輯判斷分支

2018年P(guān)ython值得關(guān)注的開(kāi)源庫(kù)、工具和開(kāi)發(fā)者(總結(jié)篇)

Python自定義一個(gè)類(lèi)實(shí)現(xiàn)字典dict功能的方法

Python中的yeild關(guān)鍵字提高代碼執(zhí)行效率場(chǎng)景實(shí)例探究

用python做一個(gè)搜索引擎(Pylucene)的實(shí)例代碼