A Detailed Guide to Batch-Generating Excel Sample Datasets with Python

Updated: 2024-12-04 08:45:08 Author: 大话数据分析

Data is at the core of data analysis, and generating and processing it efficiently is an essential skill for every analyst. This article looks at "making up data" as part of Python office automation: batch-generating sample datasets in Excel.

Here, "making up data" does not mean fabricating results. It means generating simulated data for testing so that real data stays secure. Among the available tools, the Faker library stands out for its rich feature set and ease of use.

Let's look at how Faker helps analysts generate such data.

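1. Generating ordinary random data

For ordinary numeric data, for example 31 integers from 100 up to (but not including) 1000, one line of code is enough: np.random.randint(100,1000,31). The code below uses such random numbers to draw a line chart of sale over date.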

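import datetime

import numpy as np
import pandas as pd

# 31 random integers in [100, 1000), one per day of December 2022
df = pd.DataFrame(
    data=np.random.randint(100, 1000, 31),
    index=pd.date_range(datetime.datetime(2022, 12, 1), periods=31),
    columns=['sale'],
)
# Plot separately so that df stays a DataFrame rather than the plot's Axes object
df.plot(figsize=(9, 6))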
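The random series above changes on every run. As a minimal sketch, assuming you want repeatable values, the same data can be produced from a seeded NumPy generator. The helper name make_daily_sales and its parameters are illustrative, not part of the original article.

```python
import numpy as np
import pandas as pd


def make_daily_sales(start: str, days: int, seed: int) -> pd.DataFrame:
    """Return `days` random integers in [100, 1000), indexed by consecutive dates."""
    rng = np.random.default_rng(seed)  # seeded generator, so values repeat across runs
    values = rng.integers(100, 1000, size=days)  # upper bound is exclusive
    index = pd.date_range(start, periods=days)
    return pd.DataFrame({"sale": values}, index=index)


december = make_daily_sales("2022-12-01", days=31, seed=0)
print(december.head())
```

Calling december.plot(figsize=(9, 6)) then draws the same kind of line chart as before, with identical values on every run.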

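2. Simulating data with Faker

Faker has to be installed before use. Run pip install Faker on the command line; when "Successfully installed" appears, the installation is complete. In a Jupyter notebook, prefix the command with ! as shown below; the -i option selects the Tsinghua PyPI mirror.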

!pip install Faker -i https://pypi.tuna.tsinghua.edu.cn/simple

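After importing Faker you can generate simulated data. Passing locale="zh_CN" makes it produce Chinese-locale values. The code below generates a name, mobile number, ID number, birth date, email, address, company and job title.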

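# Show the result of every expression in a cell, not only the last one (Jupyter)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

from faker import Faker
faker = Faker(locale="zh_CN")

# Generate simulated values
faker.name()          # name
faker.phone_number()  # mobile number
faker.ssn()           # ID number
faker.ssn()[6:14]     # birth date (YYYYMMDD), sliced from a newly generated ID number
faker.email()         # email
faker.address()       # address
faker.company()       # company
faker.job()           # job title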
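Note that faker.ssn()[6:14] calls ssn() a second time, so the birth date it returns does not belong to the ID number generated on the line before it. A minimal sketch of one way around this is to generate the ID once and parse the date out of that same string. The helper name birth_date_from_id and the sample ID are illustrative assumptions.

```python
from datetime import date, datetime


def birth_date_from_id(id_number: str) -> date:
    """Parse the YYYYMMDD birth date held in characters 7-14 of an 18-character ID."""
    if len(id_number) != 18:
        raise ValueError("expected an 18-character ID number")
    return datetime.strptime(id_number[6:14], "%Y%m%d").date()


# Hypothetical sample value; in the article's setting it would come from
# a single call such as `id_number = faker.ssn()`.
sample_id = "11010519491231002X"
print(birth_date_from_id(sample_id))  # 1949-12-31
```

Returning a date object rather than a string also lets Excel and pandas treat the column as a real date.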

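Beyond the fields above, Faker can generate several other common kinds of data: addresses, people, companies, credit cards, dates and times, files, internet data, jobs, lorem (filler text), phone numbers and ID numbers.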

# address
faker.country()         # country
faker.city()            # city
faker.city_suffix()     # city suffix; in Chinese: 市 or 县
faker.address()         # full address
faker.street_address()  # street address
faker.street_name()     # street name
faker.postcode()        # postcode
faker.latitude()        # latitude
faker.longitude()       # longitude

# person
faker.name()             # full name
faker.last_name()        # family name
faker.first_name()       # given name
faker.name_male()        # male full name
faker.last_name_male()   # male family name
faker.first_name_male()  # male given name
faker.name_female()      # female full name

# company
faker.company()         # company name
faker.company_suffix()  # company name suffix

# credit_card
faker.credit_card_number(card_type=None)  # card number

# date_time
faker.date_time(tzinfo=None)  # random date and time
faker.date_time_this_month(before_now=True, after_now=False, tzinfo=None)    # a date-time in the current month
faker.date_time_this_year(before_now=True, after_now=False, tzinfo=None)     # a date-time in the current year
faker.date_time_this_decade(before_now=True, after_now=False, tzinfo=None)   # a date-time in the current decade
faker.date_time_this_century(before_now=True, after_now=False, tzinfo=None)  # a date-time in the current century
faker.date_time_between(start_date="-30y", end_date="now", tzinfo=None)      # a random date-time between two points in time
faker.time(pattern="%H:%M:%S")  # time (format can be customized)
faker.date(pattern="%Y-%m-%d")  # random date (format can be customized)

# file
faker.file_name(category="image", extension="png")  # file name with a given category and extension
faker.file_name()                                   # file name of a random type
faker.file_extension(category=None)                 # file extension

# internet
faker.safe_email()     # safe email address
faker.free_email()     # free email address
faker.company_email()  # company email address
faker.email()          # email address

# job
faker.job()  # job title

# lorem (filler text)
faker.text(max_nb_chars=200)                                 # a random passage of text
faker.word()                                                 # a random word
faker.words(nb=10)                                           # several random words
faker.sentence(nb_words=6, variable_nb_words=True)           # a random sentence
faker.sentences(nb=3)                                        # several random sentences
faker.paragraph(nb_sentences=3, variable_nb_sentences=True)  # a random paragraph (string)
faker.paragraphs(nb=3)                                       # several random paragraphs (list)

# phone_number
faker.phone_number()        # mobile number
faker.phonenumber_prefix()  # carrier prefix, the first three digits of a mobile number

# ssn (ID number)
faker.ssn()  # random 18-digit ID number

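3. Simulating data and exporting it to Excel

Next, use Faker to simulate a dataset and export it to Excel, with name, mobile number, ID number, birth date, email, full address and other fields. First create an empty sheet with a header row, then generate each field with Faker and add the rows one at a time with append, and finally save the workbook.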

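from faker import Faker
from openpyxl import Workbook

wb = Workbook()    # create the workbook
sheet = wb.active  # use its default worksheet

# Header row of the Excel sheet
title_list = ["Name", "Mobile number", "ID number", "Birth date", "Email", "Address", "Company", "Job"]
sheet.append(title_list)

faker = Faker(locale="zh_CN")  # generate Chinese-locale data
for i in range(100):
    id_number = faker.ssn()  # generate the ID once so the birth date matches it
    sheet.append([
        faker.name(),          # name
        faker.phone_number(),  # mobile number
        id_number,             # ID number
        id_number[6:14],       # birth date (YYYYMMDD) taken from the same ID number
        faker.email(),         # email
        faker.address(),       # full address
        faker.company(),       # company name
        faker.job(),           # job
    ])

# Path from the original article; change it to a folder that exists on your machine
wb.save(r'D:\系統(tǒng)桌面(勿刪)\Desktop\模擬數(shù)據(jù).xlsx')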
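To confirm that an export like this worked, the file can be read back and compared with what was written. The sketch below uses only openpyxl and a temporary folder; the helper name write_rows and the two placeholder rows are assumptions made for illustration and stand in for the Faker output.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def write_rows(path, header, rows):
    """Write a header and data rows to the active sheet of a new workbook."""
    wb = Workbook()
    sheet = wb.active
    sheet.append(header)
    for row in rows:
        sheet.append(row)
    wb.save(path)


header = ["name", "phone"]
rows = [["Alice", "13800000000"], ["Bob", "13900000000"]]  # placeholder rows

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "check.xlsx")
    write_rows(path, header, rows)
    wb = load_workbook(path)
    # Read every row back as plain Python values
    loaded = [list(r) for r in wb.active.iter_rows(values_only=True)]
    wb.close()

print(len(loaded) - 1)  # number of data rows read back
```

Writing to a temporary folder keeps the check independent of any machine-specific path such as a desktop directory.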

4. Adding numeric data

To generate random data that can be used in calculations, use pandas and numpy. Taking the dataset above as the starting point, the code below adds several numeric columns.

import numpy as np
import pandas as pd

# Read the Excel file exported in the previous step
df = pd.read_excel(r'D:\系統(tǒng)桌面(勿刪)\Desktop\模擬數(shù)據(jù).xlsx')
df.head()

# Set the random seed so the results are reproducible
np.random.seed(0)

# Randomly generate the data for the new columns
df['Sales quantity'] = np.random.randint(1, 100, size=len(df))
df['Unit price'] = np.random.uniform(5, 50, size=len(df)).round(2)
df['Sales revenue'] = df['Sales quantity'] * df['Unit price']
df['Sales cost'] = np.random.uniform(5, 50, size=len(df)).round(2)
df['Sales profit'] = df['Sales revenue'] - df['Sales cost']
df['Profit margin'] = (df['Sales profit'] / df['Sales revenue']).round(4)

df.to_excel(r'D:\系統(tǒng)桌面(勿刪)\Desktop\Faker模擬數(shù)據(jù).xlsx')
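In the code above the cost is drawn independently of the revenue, so a row with a small quantity can end up with a cost larger than its revenue and a negative profit margin. If that is not wanted, one option is to derive the cost from the revenue. The sketch below is one such variant, not the article's method: the function name add_sales_columns, the column names and the 50-90% cost ratio are all assumptions.

```python
import numpy as np
import pandas as pd


def add_sales_columns(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Return a copy of df with random, internally consistent sales columns."""
    out = df.copy()
    rng = np.random.default_rng(seed)  # seeded so the result is reproducible
    n = len(out)
    out["quantity"] = rng.integers(1, 100, size=n)
    out["unit_price"] = rng.uniform(5, 50, size=n).round(2)
    out["revenue"] = (out["quantity"] * out["unit_price"]).round(2)
    # Assumption: cost is 50-90% of revenue, so profit is never negative
    out["cost"] = (out["revenue"] * rng.uniform(0.5, 0.9, size=n)).round(2)
    out["profit"] = (out["revenue"] - out["cost"]).round(2)
    out["margin"] = (out["profit"] / out["revenue"]).round(4)
    return out


people = pd.DataFrame({"name": ["A", "B", "C"]})  # stand-in for the Excel data
result = add_sales_columns(people)
print(result)
```

The function works on a copy, so the DataFrame read from Excel is left unchanged and the step can be rerun with a different seed.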

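As this walkthrough shows, Faker is a useful tool in data analysis work. It can generate many types of simulated data and lets us build test datasets quickly, which shortens the time spent preparing data for testing and analysis. It is also simple to use: a few lines of code are enough to produce the data we need.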
