A Detailed Guide to Grouping and Aggregation in Python Pandas

Updated: 2023-11-16 08:46:19  Author: 懸崖上的金魚


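Pandas is a key Python library for data analysis and offers a rich set of data-manipulation methods. Grouping data and aggregating each group is one of the most frequent tasks in analysis work. This article walks through the grouping methods in Pandas and the different aggregation operations you can apply, with a code example for each.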

[Figure: the complete Excel data]
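Because the spreadsheet itself is not reproduced here, the sketch below builds a small, entirely hypothetical DataFrame with the four column names the examples use (店鋪名稱 store name, 訂單號 order number, 銷售數量 sales quantity, 銷售金額 sales amount). The values and the store labels A, B and C are invented for illustration; the real file may contain other columns and types.

```python
import pandas as pd

# Hypothetical stand-in for 數據1.xlsx: invented values, same column names as the article
df1 = pd.DataFrame({
    '店鋪名稱': ['A', 'A', 'B', 'B', 'B', 'C'],              # store name
    '訂單號': [1001, 1002, 1003, 1003, 1004, 1005],          # order number (an ID, not a measure)
    '銷售數量': [2, 5, 1, 3, 4, 10],                          # sales quantity
    '銷售金額': [100.0, 250.0, 80.0, 120.0, 160.0, 50.0],     # sales amount
})
print(df1.dtypes)
```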

Reading the data and grouping by a single column

First, we read the Excel file with Pandas, group the rows by a single column, and apply an aggregation function:

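import pandas as pd
df1 = pd.read_excel('C:\\Users\\liuchunlin2\\Desktop\\數據1.xlsx')
# Sum the numeric columns per store; as_index=False keeps 店鋪名稱 as an ordinary column
# numeric_only=True avoids summing text/date columns (pandas 2.x includes them by default)
df = df1.groupby('店鋪名稱', as_index=False).sum(numeric_only=True)
print(df)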
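On the hypothetical sample data, a variant of the single-column example selects the measure columns explicitly before summing, so the order-number ID is not added up; it also shows how as_index changes where the group keys end up. The column selection is an illustrative choice, not the author's code.

```python
import pandas as pd

# Hypothetical sample data (invented values, same column names as the article)
df1 = pd.DataFrame({
    '店鋪名稱': ['A', 'A', 'B', 'B', 'B', 'C'],
    '訂單號': [1001, 1002, 1003, 1003, 1004, 1005],
    '銷售數量': [2, 5, 1, 3, 4, 10],
    '銷售金額': [100.0, 250.0, 80.0, 120.0, 160.0, 50.0],
})

# Select only the measure columns so the 訂單號 IDs are not summed
df = df1.groupby('店鋪名稱', as_index=False)[['銷售數量', '銷售金額']].sum()
print(df)  # 店鋪名稱 stays an ordinary column

# With the default as_index=True, the store names become the index instead
indexed = df1.groupby('店鋪名稱')[['銷售數量', '銷售金額']].sum()
print(indexed)
```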

Grouping by multiple columns and applying aggregation functions

Next, we group by several columns at once and apply an aggregation function:

# Group by store and order number; each (store, order) pair becomes one row
df2 = df1.groupby(['店鋪名稱', '訂單號'], as_index=False).sum(numeric_only=True)
print(df2)
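With two grouping keys, each distinct (store, order) combination becomes one row. The sketch below, again on hypothetical data, makes that visible: store B has two rows for order 1003, and they collapse into a single row.

```python
import pandas as pd

# Hypothetical sample data (invented values, same column names as the article)
df1 = pd.DataFrame({
    '店鋪名稱': ['A', 'A', 'B', 'B', 'B', 'C'],
    '訂單號': [1001, 1002, 1003, 1003, 1004, 1005],
    '銷售數量': [2, 5, 1, 3, 4, 10],
    '銷售金額': [100.0, 250.0, 80.0, 120.0, 160.0, 50.0],
})

# One output row per (店鋪名稱, 訂單號) combination
df2 = df1.groupby(['店鋪名稱', '訂單號'], as_index=False)[['銷售數量', '銷售金額']].sum()
print(df2)
```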

Applying a custom aggregation function

In this example we define a custom aggregation function, custom_agg, that returns the range (maximum minus minimum) of each group, and pass it to agg:

def custom_agg(x):
    return x.max() - x.min()

result = df1.groupby('店鋪名稱', as_index=False)['銷售數量'].agg(custom_agg)
print(result)
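A custom function can also be combined with built-in ones through named aggregation (keyword arguments of the form new_name=(column, function) passed to agg), which labels each output column. The output names qty_range and qty_total below are hypothetical, and the data is the invented sample.

```python
import pandas as pd

# Hypothetical sample data (invented values, same column names as the article)
df1 = pd.DataFrame({
    '店鋪名稱': ['A', 'A', 'B', 'B', 'B', 'C'],
    '訂單號': [1001, 1002, 1003, 1003, 1004, 1005],
    '銷售數量': [2, 5, 1, 3, 4, 10],
    '銷售金額': [100.0, 250.0, 80.0, 120.0, 160.0, 50.0],
})

def custom_agg(x):
    # Range of the values in one group: max minus min
    return x.max() - x.min()

# Named aggregation: output column name -> (input column, function)
result = df1.groupby('店鋪名稱').agg(
    qty_range=('銷售數量', custom_agg),
    qty_total=('銷售數量', 'sum'),
)
print(result)
```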

Applying multiple aggregation functions at once

We can also apply several aggregation functions in one call, choosing a different function for each column:

df3 = df1.groupby('店鋪名稱', as_index=False).agg({'銷售數量': 'sum', '銷售金額': 'mean'})
print(df3)
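The dictionary form also accepts a list of functions per column. pandas then returns MultiIndex columns such as ('銷售數量', 'sum'); the sketch flattens them into single strings, where the underscore naming scheme is just one hypothetical choice. Data is the invented sample.

```python
import pandas as pd

# Hypothetical sample data (invented values, same column names as the article)
df1 = pd.DataFrame({
    '店鋪名稱': ['A', 'A', 'B', 'B', 'B', 'C'],
    '訂單號': [1001, 1002, 1003, 1003, 1004, 1005],
    '銷售數量': [2, 5, 1, 3, 4, 10],
    '銷售金額': [100.0, 250.0, 80.0, 120.0, 160.0, 50.0],
})

# Several functions per column -> two-level (column, function) headers
out = df1.groupby('店鋪名稱').agg({'銷售數量': ['sum', 'max'], '銷售金額': ['sum', 'mean']})
print(out)

# Flatten the MultiIndex columns into names like '銷售數量_sum'
flat = out.copy()
flat.columns = ['_'.join(col) for col in flat.columns]
print(flat)
```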

Iterating over groups

Pandas lets you iterate over a GroupBy object; each iteration yields the group key and the rows belonging to that group:

for group, data in df1.groupby('店鋪名稱'):
    print(group)  # the group key (store name)
    print(data)   # all rows belonging to this group
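Iterating is convenient for inspection, but a GroupBy object can also be queried directly: get_group returns one group's rows and ngroups counts the groups. A short sketch on the hypothetical sample:

```python
import pandas as pd

# Hypothetical sample data (invented values, same column names as the article)
df1 = pd.DataFrame({
    '店鋪名稱': ['A', 'A', 'B', 'B', 'B', 'C'],
    '訂單號': [1001, 1002, 1003, 1003, 1004, 1005],
    '銷售數量': [2, 5, 1, 3, 4, 10],
    '銷售金額': [100.0, 250.0, 80.0, 120.0, 160.0, 50.0],
})

grouped = df1.groupby('店鋪名稱')

# Each iteration yields (key, sub-DataFrame); here we just count rows per store
sizes = {name: len(group) for name, group in grouped}
print(sizes)

# Fetch a single group without looping
b_rows = grouped.get_group('B')
print(b_rows)
print(grouped.ngroups)  # number of distinct store names
```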

Filtering groups by a condition

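The filter method keeps or drops whole groups based on a condition. Here we keep only the rows of stores whose total sales amount exceeds 300: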

df4 = df1.groupby('店鋪名稱').filter(lambda x: x['銷售金額'].sum() > 300)
print(df4)
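filter returns the original rows of every group that passes the test, not one row per group. If the per-store totals are what you need, aggregate first and then apply a boolean mask. The sketch below (hypothetical data, same threshold of 300) shows both and checks that they select the same stores.

```python
import pandas as pd

# Hypothetical sample data (invented values, same column names as the article)
df1 = pd.DataFrame({
    '店鋪名稱': ['A', 'A', 'B', 'B', 'B', 'C'],
    '訂單號': [1001, 1002, 1003, 1003, 1004, 1005],
    '銷售數量': [2, 5, 1, 3, 4, 10],
    '銷售金額': [100.0, 250.0, 80.0, 120.0, 160.0, 50.0],
})

# filter: keep all original rows of stores whose total amount exceeds 300
df4 = df1.groupby('店鋪名稱').filter(lambda x: x['銷售金額'].sum() > 300)
print(df4)

# Alternative: compute one total per store, then mask the totals
totals = df1.groupby('店鋪名稱')['銷售金額'].sum()
big = totals[totals > 300]
print(big)

# Selecting rows by the surviving store names reproduces the filter result
same_rows = df1[df1['店鋪名稱'].isin(big.index)]
print(same_rows.equals(df4))
```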

Group transforms and sorting groups

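Finally, we transform grouped data and sort the groups. Unlike agg, transform returns a result aligned with the original rows, so every row receives its group's value: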

df1['NewColumn'] = df1.groupby('店鋪名稱')['銷售數量'].transform(lambda x: x.sum())
print(df1)
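transform also accepts the name of a built-in aggregation such as 'sum', which is usually faster than a lambda, and because its result lines up with the original rows it can feed row-level arithmetic. The sketch computes each row's share of its store's sales amount; the column name amt_share is hypothetical and the data is the invented sample.

```python
import pandas as pd

# Hypothetical sample data (invented values, same column names as the article)
df1 = pd.DataFrame({
    '店鋪名稱': ['A', 'A', 'B', 'B', 'B', 'C'],
    '訂單號': [1001, 1002, 1003, 1003, 1004, 1005],
    '銷售數量': [2, 5, 1, 3, 4, 10],
    '銷售金額': [100.0, 250.0, 80.0, 120.0, 160.0, 50.0],
})

# Built-in 'sum' instead of a lambda: same values, one per original row
df1['NewColumn'] = df1.groupby('店鋪名稱')['銷售數量'].transform('sum')

# Each row's amount divided by its store's total amount
df1['amt_share'] = df1['銷售金額'] / df1.groupby('店鋪名稱')['銷售金額'].transform('sum')
print(df1)
```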

Sorting groups

# Aggregate per store, then sort by total quantity (ascending=False for descending)
df5 = df1.groupby('店鋪名稱').sum(numeric_only=True).sort_values('銷售數量', ascending=True)
print(df5)
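Two related patterns, sketched on the hypothetical data: sorting the aggregated totals in descending order, and sorting the rows first and then taking the first row of each group with groupby(...).head(1) to find each store's largest order line.

```python
import pandas as pd

# Hypothetical sample data (invented values, same column names as the article)
df1 = pd.DataFrame({
    '店鋪名稱': ['A', 'A', 'B', 'B', 'B', 'C'],
    '訂單號': [1001, 1002, 1003, 1003, 1004, 1005],
    '銷售數量': [2, 5, 1, 3, 4, 10],
    '銷售金額': [100.0, 250.0, 80.0, 120.0, 160.0, 50.0],
})

# Per-store totals, largest total amount first
totals = (
    df1.groupby('店鋪名稱')[['銷售數量', '銷售金額']]
    .sum()
    .sort_values('銷售金額', ascending=False)
)
print(totals)

# Largest single row per store: sort all rows, then keep the first row of each group
top = df1.sort_values('銷售金額', ascending=False).groupby('店鋪名稱').head(1)
print(top)
```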

That concludes this walkthrough of grouping and aggregation in Pandas. The examples above should give readers a clearer understanding of how these operations work.

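Summary: grouping and aggregating is a common and important operation in data analysis. Pandas provides extensive support for it, including grouping by one or several columns, custom aggregation functions, iterating over groups, exporting results, filtering groups by a condition, group transforms, and sorting groups, which together cover most everyday analysis needs.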

Complete code

import pandas as pd

# Read the Excel file (a single workbook)
df1 = pd.read_excel('C:\\Users\\liuchunlin2\\Desktop\\數據1.xlsx')

# Group by a single column and apply an aggregation function
df = df1.groupby('店鋪名稱', as_index=False).sum(numeric_only=True)
# df = df1.groupby('店鋪名稱', as_index=False).aggregate({'銷售數量': 'sum'})
print(df)

# Group by multiple columns and apply an aggregation function
df2 = df1.groupby(['店鋪名稱', '訂單號'], as_index=False).sum(numeric_only=True)
print(df2)

# Define a custom aggregation function: range (max minus min) within each group
def custom_agg(x):
    return x.max() - x.min()

# Aggregate 銷售數量 with the custom function
result = df1.groupby('店鋪名稱', as_index=False)['銷售數量'].agg(custom_agg)
print(result)

# Apply different aggregation functions to different columns
df3 = df1.groupby('店鋪名稱', as_index=False).agg({'銷售數量': 'sum', '銷售金額': 'mean'})
print(df3)

# Iterate over groups
for group, data in df1.groupby('店鋪名稱'):
    print(group)  # the group key (store name)
    print(data)   # all rows belonging to this group

# Export the aggregated result df3 to Excel
df3.to_excel('merged.xlsx', index=False)
print('This is a separator line')

# Filter groups: keep stores whose total 銷售金額 exceeds 300
df4 = df1.groupby('店鋪名稱').filter(lambda x: x['銷售金額'].sum() > 300)
print(df4)

# Transform within groups: each row receives its store's total 銷售數量
df1['NewColumn'] = df1.groupby('店鋪名稱')['銷售數量'].transform(lambda x: x.sum())
# df1['NewColumn'] = df1.groupby('店鋪名稱')['銷售數量'].transform('sum')  # equivalent built-in
print(df1)

# Sort groups: ascending=True for ascending order, ascending=False for descending
df5 = df1.groupby('店鋪名稱').sum(numeric_only=True).sort_values('銷售數量', ascending=True)
print(df5)
