使用Python實現(xiàn)將多表分批次從數(shù)據(jù)庫導(dǎo)出到Excel

更新時間：2020年05月15日 15:24:12 作者：幸福丶如此

這篇文章主要介紹了使用Python實現(xiàn)將多表分批次從數(shù)據(jù)庫導(dǎo)出到Excel，具有很好的參考價值，希望對大家有所幫助。一起跟隨小編過來看看吧

一、應(yīng)用場景

為了避免反復(fù)的手手工從后臺數(shù)據(jù)庫導(dǎo)出某些數(shù)據(jù)表到Excel文件、高效率到多份離線數(shù)據(jù)。

二、功能事項

支持一次性導(dǎo)出多個數(shù)據(jù)源表、自動獲取各表的字段名。

支持控制批次的寫入速率。例如：每5000行一個批次寫入到excel。

支持結(jié)構(gòu)相同的表導(dǎo)入到同一個Excel文件?？蛇m用于經(jīng)過水平切分后的分布式表。

三、主要實現(xiàn)

1、概覽

A[創(chuàng)建類] -->|方法1| B(創(chuàng)建數(shù)據(jù)庫連接)
A[創(chuàng)建類] -->|方法2| C(取查詢結(jié)果集)
A[創(chuàng)建類] -->|方法3| D(利用句柄寫入Excel)
A[創(chuàng)建類] -->|方法4| E(讀取多個源表)

B(創(chuàng)建數(shù)據(jù)庫連接) -->U(調(diào)用示例)
C(取查詢結(jié)果集) -->U(調(diào)用示例)
D(利用句柄寫入Excel) -->U(調(diào)用示例)
E(讀取多個源表) -->U(調(diào)用示例)

2、主要方法

首先需要安裝第三方庫pymssql實現(xiàn)對SQLServer的連接訪問，自定義方法__getConn()需要指定如下五個參數(shù)：服務(wù)器host、登錄用戶名user、登錄密碼pwd、指定的數(shù)據(jù)庫db、字符編碼charset。連接成功后，通過cursor()獲取游標(biāo)對象，它將用來執(zhí)行數(shù)據(jù)庫腳本，并得到返回結(jié)果集和數(shù)據(jù)總量。

創(chuàng)建數(shù)據(jù)庫連接和執(zhí)行SQL的源碼：

def __init__(self,host,user,pwd,db):
    self.host = host
    self.user = user
    self.pwd = pwd
    self.db = db

  def __getConn(self):
    if not self.db:
      raise(NameError,'沒有設(shè)置數(shù)據(jù)庫信息')
    self.conn = pymssql.connect(host=self.host, user=self.user, password=self.pwd, database=self.db, charset='utf8')
    cur = self.conn.cursor()
    if not cur:
      raise(NameError,'連接數(shù)據(jù)庫失敗')
    else:
      return cur

3、方法3中寫入Excel時，注意一定要用到Pandas中的公共句柄ExcelWriter對象writer。當(dāng)數(shù)據(jù)被分批多次寫入同一個文件時，如果直接使用to_excel()方法，則前面批次的結(jié)果集將會被后續(xù)結(jié)果覆蓋。增加了這個公共句柄限制后，后面的寫入會累加到前面寫入的數(shù)據(jù)尾部行，而不是全部覆蓋。

writer = pd.ExcelWriter(file)
df_fetch_data[rs_startrow:i*N].to_excel(writer, header=isHeader, index=False, startrow=startRow)

分批次寫入到目標(biāo)Excel時的另一個要注意的參數(shù)是寫入行startrow的設(shè)置。每次寫入完成后需要重新指下一批次數(shù)據(jù)的初始位置值。每個批次的數(shù)據(jù)會記錄各自的所屬批次信息。

利用關(guān)鍵字參數(shù)**args 指定多個數(shù)據(jù)源表和數(shù)據(jù)庫連接。

def exportToExcel(self, **args):
  for sourceTB in args['sourceTB']:    
    arc_dict = dict(
      sourceTB = sourceTB,
      path=args['path'],
      startRow=args['startRow'],
      isHeader=args['isHeader'],
      batch=args['batch']
    )
    print('\n當(dāng)前導(dǎo)出的數(shù)據(jù)表為：%s' %(sourceTB))
    self.writeToExcel(**arc_dict)
  return 'success'

四、先用類MSSQL創(chuàng)建對象，再定義關(guān)鍵字參數(shù)args，最終調(diào)用方法導(dǎo)出到文件即完成數(shù)據(jù)導(dǎo)出。

#!/usr/bin/env python
# coding: utf-8

# 主要功能：分批次導(dǎo)出大數(shù)據(jù)量、結(jié)構(gòu)相同的數(shù)據(jù)表到excel 
# 導(dǎo)出多個表的數(shù)據(jù)到各自的文件， 
# 目前問題：to_excel 雖然設(shè)置了分批寫入，但先前的數(shù)據(jù)會被下一次寫入覆蓋，
# 利用Pandas包中的ExcelWriter()方法增加一個公共句柄，在寫入新的數(shù)據(jù)之時保留原來寫入的數(shù)據(jù)，等到把所有的數(shù)據(jù)都寫進去之后關(guān)閉這個句柄
import pymssql 
import pandas as pd 
import datetime 
import math
 
class MSSQL(object):
  def __init__(self,host,user,pwd,db):
    self.host = host
    self.user = user
    self.pwd = pwd
    self.db = db
 
  def __getConn(self):
    if not self.db:
      raise(NameError,'沒有設(shè)置數(shù)據(jù)庫信息')
    self.conn = pymssql.connect(host=self.host, user=self.user, password=self.pwd, database=self.db, charset='utf8')
    cur = self.conn.cursor()
    if not cur:
      raise(NameError,'連接數(shù)據(jù)庫失敗')
    else:
      return cur
   
  def executeQuery(self,sql):
    cur = self.__getConn()
    cur.execute(sql)
    # 獲取所有數(shù)據(jù)集
    # fetchall()獲取結(jié)果集中的剩下的所有行
    # 如果數(shù)據(jù)量太大，是否需要分批插入 
    resList, rowcount = cur.fetchall(),cur.rowcount
    self.conn.close()
    return (resList, rowcount)
 
  # 導(dǎo)出單個數(shù)據(jù)表到excel 
  def writeToExcel(self,**args):
    sourceTB = args['sourceTB']
    columns = args.get('columns')
    path=args['path']
    fname=args.get('fname')
    startRow=args['startRow']
    isHeader=args['isHeader']
    N=args['batch']
     
    # 獲取指定源數(shù)據(jù)列
    if columns is None:
      columns_select = ' * '
    else:
      columns_select = ','.join(columns)
     
    if fname is None:
      fname=sourceTB+'_exportData.xlsx'
     
    file = path + fname
    # 增加一個公共句柄，寫入新數(shù)據(jù)時，保留原數(shù)據(jù) 
    writer = pd.ExcelWriter(file)
     
    sql_select = 'select '+ columns_select + ' from '+ sourceTB
    fetch_data, rowcount = self.executeQuery(sql_select)
    # print(rowcount)
     
    df_fetch_data = pd.DataFrame(fetch_data)
    # 一共有roucount行數(shù)據(jù)，每N行一個batch提交寫入到excel 
    times = math.floor(rowcount/N)
    i = 1
    rs_startrow = 0
    # 當(dāng)總數(shù)據(jù)量 > 每批插入的數(shù)據(jù)量時 
    print(i, times)
    is_while=0
    while i <= times:
      is_while = 1
      # 如果是首次，且指定輸入標(biāo)題，則有標(biāo)題
      if i==1:
        # isHeader = True
        startRow = 1
      else:
        # isHeader = False
        startRow+=N
      # 切片取指定的每個批次的數(shù)據(jù)行 ,前閉后開 
      # startrow: 寫入到目標(biāo)文件的起始行。0表示第1行，1表示第2行。。。
      df_fetch_data['batch'] = 'batch'+str(i)
      df_fetch_data[rs_startrow:i*N].to_excel(writer, header=isHeader, index=False, startrow=startRow)
      print('第',str(i),'次循環(huán)，取源數(shù)據(jù)第',rs_startrow,'行至',i*N,'行','寫入到第',startRow,'行')
      print('第',str(i),'次寫入數(shù)據(jù)為：',df_fetch_data[rs_startrow:i*N])
      # 重新指定源數(shù)據(jù)的讀取起始行
      rs_startrow =i * N
      i+=1
 
    # 寫入文件的開始行數(shù)
    # 當(dāng)沒有做任何循環(huán)時，仍然從第一行開始寫入
    if is_while == 0:
      startRow = startRow
    else:
      startRow+=N
    df_fetch_data['batch'] = 'batch'+str(i)
    print('第{0}次讀取數(shù)據(jù)，從第{1}行開始，寫入到第{2}行！'.format(str(i), str(rs_startrow), str(startRow)))
    print('第',str(i),'寫入數(shù)據(jù)為：',df_fetch_data[rs_startrow:i*N])
    df_fetch_data[rs_startrow:i*N].to_excel(writer, header=isHeader, index=False, startrow=startRow)
     
    # 注： 這里一定要saver()將數(shù)據(jù)從緩存寫入磁盤?。。。。。。。。。。。。。。。。。。。。?
    writer.save()
     
    start_time=datetime.datetime.now()
  # 導(dǎo)出結(jié)構(gòu)相同的多個表到同一樣excel
  def exportToExcel(self, **args):
    for sourceTB in args['sourceTB']:    
      arc_dict = dict(
        sourceTB = sourceTB,
        path=args['path'],
        startRow=args['startRow'],
        isHeader=args['isHeader'],
        batch=args['batch']
      )
      print('\n當(dāng)前導(dǎo)出的數(shù)據(jù)表為：%s' %(sourceTB))
      self.writeToExcel(**arc_dict)
       
    return 'success'
    start_time=datetime.datetime.now()
 
if __name__ == "__main__":
  ms = MSSQL(host="localhost",user="test",pwd="test",db="db_jun")
   
  args = dict(
   sourceTB = ['tb2', 'tb1'],# 待導(dǎo)出的表
   path='D:\\myPC\\Python\\',# 導(dǎo)出到指定路徑
   startRow=1,#設(shè)定寫入文件的首行，第2行為數(shù)據(jù)首行
   isHeader=False,# 是否包含源數(shù)據(jù)的標(biāo)題
   batch=5
  )
  # 導(dǎo)出多個文件
  ms.exportToExcel(**args)

以上這篇使用Python實現(xiàn)將多表分批次從數(shù)據(jù)庫導(dǎo)出到Excel就是小編分享給大家的全部內(nèi)容了，希望能給大家一個參考，也希望大家多多支持腳本之家。

您可能感興趣的文章: