A Detailed Guide to Combining Data with Pandas DataFrames

Updated: 2025-11-05 15:24:41 Author: Humbunklung

This article walks through several ways to combine data with Pandas DataFrames, with detailed example code for each method.

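1. concat(): Concatenating multiple DataFrames along an axis

Suited to datasets with a similar structure (the same columns or the same index). It supports vertical (row-wise) and horizontal (column-wise) concatenation.

Key parameters

axis=0 (default): vertical concatenation (adds rows); axis=1: horizontal concatenation (adds columns).
join='outer' (default): keep every label on the other axis (missing values are filled with NaN); join='inner': keep only the labels shared by all inputs.
ignore_index=True: discard the original index and renumber from 0, which avoids duplicate labels.
keys: add an outer level of a hierarchical index that marks which input each row came from.

Code example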

import pandas as pd

# Vertical concatenation (append rows)
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})
result_vertical = pd.concat([df1, df2], ignore_index=True)

Result:

	A	B
0	A0	B0
1	A1	B1
2	A2	B2
3	A3	B3
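The effect of ignore_index and keys is easier to see side by side. The following is a minimal sketch that reuses the two frames from the example above; the key labels 'first' and 'second' are illustrative names, not part of the original article.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Without ignore_index, each frame keeps its own labels, so 0 and 1 repeat.
result_dup = pd.concat([df1, df2])
print(result_dup.index.tolist())  # [0, 1, 0, 1]

# keys adds an outer index level that records the source of each row.
# 'first' and 'second' are hypothetical labels chosen for this sketch.
result_keyed = pd.concat([df1, df2], keys=['first', 'second'])

# Select only the rows that came from the second frame.
print(result_keyed.loc['second'])
```

Duplicate labels are legal in pandas, but label-based lookups such as result_dup.loc[0] then return several rows, which is why ignore_index=True or keys is usually preferable.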

# Horizontal concatenation (combine columns)
df3 = pd.DataFrame({'C': ['C0', 'C1'], 'D': ['D0', 'D1']}, index=[0,1])
result_horizontal = pd.concat([df1, df3], axis=1)

Result:

	A	B	C	D
0	A0	B0	C0	D0
1	A1	B1	C1	D1
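The horizontal example above uses frames whose indexes match exactly, so join makes no visible difference. As a rough sketch of what happens when the indexes only partly overlap, consider two hypothetical frames, wide_a and wide_b (names and data invented for illustration):

```python
import pandas as pd

# Two frames whose indexes overlap only on labels 1 and 2.
wide_a = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=[0, 1, 2])
wide_b = pd.DataFrame({'C': ['C1', 'C2', 'C3']}, index=[1, 2, 3])

# join='outer' (the default) keeps every index label and fills gaps with NaN.
outer = pd.concat([wide_a, wide_b], axis=1)
print(outer)

# join='inner' keeps only the labels present in both frames.
inner = pd.concat([wide_a, wide_b], axis=1, join='inner')
print(inner)
```

With axis=1, rows are aligned by index label rather than by position, so it is worth checking the indexes before concatenating horizontally.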

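2. merge(): Combining on key values (similar to SQL JOIN)

Suited to relating datasets with different structures by joining them on shared columns (keys).

Key parameters

how: the join type (inner, left, right, outer).
on: the name of the key column(s) to join on; left_on/right_on: use these when the key columns have different names in the left and right tables.
left_index/right_index=True: use the index as the join key.

Code example

# Inner join (keep only keys present in both tables)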
left = pd.DataFrame({'key': ['K0', 'K1'], 'A': ['A0', 'A1']})
right = pd.DataFrame({'key': ['K0', 'K2'], 'B': ['B0', 'B2']})
result_inner = pd.merge(left, right, on='key', how='inner')

Result:

	key	A	B
0	K0	A0	B0

# Outer join (keep all keys; missing values are filled with NaN)
result_outer = pd.merge(left, right, on='key', how='outer')

result_outer

Result:

	key	A	B
0	K0	A0	B0
1	K1	A1	NaN
2	K2	NaN	B2
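The examples above only use on. The sketch below illustrates left_on/right_on and right_index under assumed data: the orders and customers frames and their column names are hypothetical and do not come from the original article.

```python
import pandas as pd

# Hypothetical tables whose key columns have different names.
orders = pd.DataFrame({'order_id': [1, 2, 3], 'cust': ['K0', 'K1', 'K3']})
customers = pd.DataFrame({'customer_key': ['K0', 'K1', 'K2'],
                          'name': ['Ann', 'Bob', 'Cy']})

# left_on/right_on: join columns with different names; both key columns are kept.
by_columns = pd.merge(orders, customers,
                      left_on='cust', right_on='customer_key', how='left')
print(by_columns)

# right_index=True: match the left column against the right table's index.
customers_indexed = customers.set_index('customer_key')
by_index = pd.merge(orders, customers_indexed,
                    left_on='cust', right_index=True, how='left')
print(by_index)
```

In both cases the left join keeps all three orders, and the order with key K3 gets NaN for name because no customer matches it.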

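3. join(): Quick joins on the index

A simplified version of merge that joins on the index by default; suited to cases where the indexes are already aligned.

Key parameters

how: the join type (defaults to a left join).
lsuffix/rsuffix: suffixes appended to column names that collide between the left and right tables.

Code example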

left_df = pd.DataFrame({'value': [1, 2]}, index=['A', 'B'])
right_df = pd.DataFrame({'value': [7, 8]}, index=['A', 'C'])
joined = left_df.join(right_df, how='inner', lsuffix='_left', rsuffix='_right')

joined

Result:

	value_left	value_right
A	1	7
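The example above passes how='inner' explicitly. For comparison, here is a small sketch of the default left join on the same two frames, and of what happens when overlapping column names are given no suffix:

```python
import pandas as pd

left_df = pd.DataFrame({'value': [1, 2]}, index=['A', 'B'])
right_df = pd.DataFrame({'value': [7, 8]}, index=['A', 'C'])

# Default how='left': every row of left_df is kept; 'B' has no match, so it gets NaN.
joined_left = left_df.join(right_df, lsuffix='_left', rsuffix='_right')
print(joined_left)

# Both frames have a 'value' column, so omitting the suffixes raises ValueError.
try:
    left_df.join(right_df)
except ValueError as exc:
    print('join failed:', exc)
```

Because of the NaN, the value_right column becomes a float column in the left-join result, whereas the inner join above keeps integers.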

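4. combine_first(): Filling missing values

Fills the NaN values of the first DataFrame with non-null values from the second DataFrame. The result covers the union of both indexes, which makes it suitable for data completion.

Code example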

df1 = pd.DataFrame({'A': [1, None, 3], 'B': [4, 5, None]}, index=['X', 'Y', 'Z'])
df2 = pd.DataFrame({'A': [None, 10, 11], 'B': [7, 8, 9]}, index=['Y', 'Z', 'W'])
filled = df1.combine_first(df2)

filled

Result:

	A	B
W	11.0	9.0
X	1.0	4.0
Y	NaN	5.0
Z	3.0	8.0
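The calling frame always takes priority: the other frame is consulted only where the caller has NaN or lacks the label. A brief sketch that swaps the roles of the two frames above makes this visible (the names base and patch are introduced here only for illustration):

```python
import pandas as pd

base = pd.DataFrame({'A': [1, None, 3], 'B': [4, 5, None]}, index=['X', 'Y', 'Z'])
patch = pd.DataFrame({'A': [None, 10, 11], 'B': [7, 8, 9]}, index=['Y', 'Z', 'W'])

# base has priority: Z/A stays 3.0 even though patch offers 10.0.
base_first = base.combine_first(patch)
print(base_first.loc['Z', 'A'])

# patch has priority: Z/A becomes 10.0 and Y/B becomes 7.0.
patch_first = patch.combine_first(base)
print(patch_first.loc['Z', 'A'], patch_first.loc['Y', 'B'])

# Y/A is NaN in both frames, so it stays NaN either way.
print(base_first.loc['Y', 'A'], patch_first.loc['Y', 'A'])
```

The operation is therefore not symmetric, so the frame whose values should win belongs on the left of the call.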

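Performance and use-case comparison

Method	Use case	Performance
`concat()`	Batch concatenation of homogeneous data (adding rows or columns)	★★★★ (efficient batch processing)
`merge()`	Relating heterogeneous data (similar to SQL JOIN)	★★★ (flexible but somewhat slower)
`join()`	Fast combination of index-aligned data	★★★★ (index-optimized)
`combine_first()`	Filling missing values (not primarily a concatenation tool)	★★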

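Efficiency tips:

For batch vertical concatenation, prefer concat(ignore_index=True);
For relational lookups, use merge and specify the on key explicitly;
Avoid DataFrame.append(), which was deprecated in pandas 1.4 and removed in pandas 2.0; use concat instead.

Together, these methods cover most DataFrame combination needs while balancing efficiency and functionality.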
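As a minimal sketch of the first and third tips, the usual replacement for calling append() inside a loop is to collect the pieces in a Python list and call concat once at the end. The batch data below is made up for illustration.

```python
import pandas as pd

# Collect each piece in a plain list instead of growing a DataFrame row by row.
chunks = []
for i in range(3):
    # Hypothetical batch: in practice this might come from a file or a query.
    chunk = pd.DataFrame({'batch': [i, i], 'value': [i * 10, i * 10 + 1]})
    chunks.append(chunk)  # list.append, not DataFrame.append

# A single concat call copies the data once and renumbers the index.
combined = pd.concat(chunks, ignore_index=True)
print(combined)
```

Concatenating inside the loop would copy all accumulated rows on every iteration, so a single call at the end generally scales better as the number of pieces grows.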
