pandas Data Exploration: Merging Data, with Worked Examples

Updated: 2023-10-08 11:53:36 Author: 海貍大大

This article walks through merging data in pandas using a series of worked examples. It is intended as a practical reference for anyone who needs to combine DataFrames.

Exploring fictitious name data
Step 1: Import the necessary libraries
Step 2: Set up the raw data the DataFrames will be built from
Step 3: Build DataFrames named data1, data2, and data3 from the raw data
Step 4: Combine data1 and data2 along the row axis
Step 5: Combine data1 and data2 along the column axis
Step 6: Print data3
Step 7: Merge all_data and data3 on the values of subject_id
Step 8: Join data1 and data2 on subject_id
Step 9: Merge data1 and data2, keeping every record from both (outer join)

Summary

Exploring Your Data with pandas: Merging Data

Merging is a core task in data processing and analysis. pandas provides tools for taking data from different sources and combining it into a single, larger dataset. This article looks at the two most important functions for that job: pd.concat() and pd.merge().

We first work through a sequence of steps and examples that use these functions, then explain each function in detail, including its parameters and common use cases. Whether you are a data scientist, a data analyst, or simply interested in data processing, the goal is to give you practical skills for combining data.

Exploring fictitious name data

Step 1: Import the necessary libraries

# Run the following code
import numpy as np
import pandas as pd

Step 2: Set up the raw data the DataFrames will be built from

# Run the following code
raw_data_1 = {
        'subject_id': ['1', '2', '3', '4', '5'],
        'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 
        'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
raw_data_2 = {
        'subject_id': ['4', '5', '6', '7', '8'],
        'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 
        'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
raw_data_3 = {
        'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
        'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}

Step 3: Build DataFrames named data1, data2, and data3 from the raw data

# Run the following code
data1 = pd.DataFrame(raw_data_1, columns = ['subject_id', 'first_name', 'last_name'])
data2 = pd.DataFrame(raw_data_2, columns = ['subject_id', 'first_name', 'last_name'])
data3 = pd.DataFrame(raw_data_3, columns = ['subject_id','test_id'])

Step 4: Combine data1 and data2 along the row axis

Name the result all_data.

# Run the following code
all_data = pd.concat([data1, data2])
all_data

	subject_id	first_name	last_name
0	1	Alex	Anderson
1	2	Amy	Ackerman
2	3	Allen	Ali
3	4	Alice	Aoni
4	5	Ayoung	Atiches
0	4	Billy	Bonder
1	5	Brian	Black
2	6	Bran	Balwner
3	7	Bryce	Brice
4	8	Betty	Btisan
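Notice in the output above that the row labels 0 to 4 appear twice, because pd.concat keeps each input's original index by default. The sketch below, which rebuilds a trimmed-down data1 and data2 (only two of the three columns, an assumption made to keep it short), shows what that means for label-based lookups and how ignore_index=True avoids it.

```python
import pandas as pd

# Trimmed copies of data1 and data2 from the steps above (last_name omitted).
data1 = pd.DataFrame({"subject_id": ["1", "2", "3", "4", "5"],
                      "first_name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"]})
data2 = pd.DataFrame({"subject_id": ["4", "5", "6", "7", "8"],
                      "first_name": ["Billy", "Brian", "Bran", "Bryce", "Betty"]})

# Default behaviour: each frame keeps its own labels, so 0..4 occur twice.
stacked = pd.concat([data1, data2])
rows_at_zero = stacked.loc[0]   # two rows share the label 0
print(rows_at_zero)

# ignore_index=True throws the old labels away and numbers the rows 0..9.
renumbered = pd.concat([data1, data2], ignore_index=True)
print(renumbered.index.tolist())
```

If later steps rely on .loc or on the index being unique, the renumbered form is usually the safer one to carry forward.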

Step 5: Combine data1 and data2 along the column axis

Name the result all_data_col.

# Run the following code
all_data_col = pd.concat([data1, data2], axis = 1)
all_data_col

	subject_id	first_name	last_name	subject_id	first_name	last_name
0	1	Alex	Anderson	4	Billy	Bonder
1	2	Amy	Ackerman	5	Brian	Black
2	3	Allen	Ali	6	Bran	Balwner
3	4	Alice	Aoni	7	Bryce	Brice
4	5	Ayoung	Atiches	8	Betty	Btisan
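The rows line up neatly in this step only because data1 and data2 happen to share the same index labels 0 to 4. With axis=1, pd.concat aligns rows by index label, not by position. The following sketch uses two small hypothetical frames (left and right are illustrative names, not part of the exercise) whose labels only partly overlap.

```python
import pandas as pd

# Hypothetical frames with partially overlapping index labels.
left = pd.DataFrame({"first_name": ["Alex", "Amy", "Allen"]}, index=[0, 1, 2])
right = pd.DataFrame({"last_name": ["Bonder", "Black", "Balwner"]}, index=[1, 2, 3])

# Default join='outer': the result has the union of the labels,
# with NaN wherever one frame has no row for a label.
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side)

# join='inner': only labels present in both frames are kept.
overlap_only = pd.concat([left, right], axis=1, join="inner")
print(overlap_only)
```

Also note that the result in this step contains each column name twice, so selecting all_data_col['subject_id'] returns two columns rather than one.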

Step 6: Print data3

# Run the following code
data3

	subject_id	test_id
0	1	51
1	2	15
2	3	15
3	4	61
4	5	16
5	7	14
6	8	15
7	9	1
8	10	61
9	11	16

Step 7: Merge all_data and data3 on the values of subject_id

# Run the following code
pd.merge(all_data, data3, on='subject_id')

	subject_id	first_name	last_name	test_id
0	1	Alex	Anderson	51
1	2	Amy	Ackerman	15
2	3	Allen	Ali	15
3	4	Alice	Aoni	61
4	4	Billy	Bonder	61
5	5	Ayoung	Atiches	16
6	5	Brian	Black	16
7	7	Bryce	Brice	14
8	8	Betty	Btisan	15
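The result above has 9 rows even though all_data has 10: pd.merge defaults to an inner join, so subject 6 (no entry in data3) is dropped, as are subjects 9, 10, and 11 (no entry in all_data). As a rough sketch, with all_data rebuilt from trimmed inputs (first_name only, an assumption for brevity), a left join keeps every row of all_data instead.

```python
import pandas as pd

# Trimmed stand-ins for the frames built in the earlier steps.
data1 = pd.DataFrame({"subject_id": ["1", "2", "3", "4", "5"],
                      "first_name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"]})
data2 = pd.DataFrame({"subject_id": ["4", "5", "6", "7", "8"],
                      "first_name": ["Billy", "Brian", "Bran", "Bryce", "Betty"]})
all_data = pd.concat([data1, data2])
data3 = pd.DataFrame({"subject_id": ["1", "2", "3", "4", "5", "7", "8", "9", "10", "11"],
                      "test_id": [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]})

# Default (inner) merge: only subject_ids found in both frames survive.
inner = pd.merge(all_data, data3, on="subject_id")

# Left merge: every row of all_data survives; missing test_id becomes NaN.
left = pd.merge(all_data, data3, on="subject_id", how="left")
print(left[left["test_id"].isna()])
```

Because test_id now contains a missing value, pandas stores that column as floating point in the left-join result.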

Step 8: Join data1 and data2 on subject_id

# Run the following code
pd.merge(data1, data2, on='subject_id', how='inner')

	subject_id	first_name_x	last_name_x	first_name_y	last_name_y
0	4	Alice	Aoni	Billy	Bonder
1	5	Ayoung	Atiches	Brian	Black
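The _x and _y endings in the output are pandas' default suffixes for non-key columns that exist in both frames. A minimal sketch of replacing them with more descriptive ones follows; the suffix strings '_a' and '_b' are arbitrary choices made for illustration.

```python
import pandas as pd

data1 = pd.DataFrame({"subject_id": ["1", "2", "3", "4", "5"],
                      "first_name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"],
                      "last_name": ["Anderson", "Ackerman", "Ali", "Aoni", "Atiches"]})
data2 = pd.DataFrame({"subject_id": ["4", "5", "6", "7", "8"],
                      "first_name": ["Billy", "Brian", "Bran", "Bryce", "Betty"],
                      "last_name": ["Bonder", "Black", "Balwner", "Brice", "Btisan"]})

# Same inner join as in this step, but with explicit suffixes for the
# overlapping column names instead of the default ('_x', '_y').
inner = pd.merge(data1, data2, on="subject_id", how="inner", suffixes=("_a", "_b"))
print(inner.columns.tolist())
```

The key column subject_id is not suffixed, since it is shared by both sides of the join.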

Step 9: Merge data1 and data2, keeping every record from both (outer join)

# Run the following code
pd.merge(data1, data2, on='subject_id', how='outer')

	subject_id	first_name_x	last_name_x	first_name_y	last_name_y
0	1	Alex	Anderson	NaN	NaN
1	2	Amy	Ackerman	NaN	NaN
2	3	Allen	Ali	NaN	NaN
3	4	Alice	Aoni	Billy	Bonder
4	5	Ayoung	Atiches	Brian	Black
5	6	NaN	NaN	Bran	Balwner
6	7	NaN	NaN	Bryce	Brice
7	8	NaN	NaN	Betty	Btisan
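In an outer join it is not always obvious from the NaN values alone which side each row came from. One way to make that explicit, sketched here with trimmed copies of data1 and data2 (first_name only), is the indicator parameter of pd.merge, which adds a _merge column.

```python
import pandas as pd

data1 = pd.DataFrame({"subject_id": ["1", "2", "3", "4", "5"],
                      "first_name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"]})
data2 = pd.DataFrame({"subject_id": ["4", "5", "6", "7", "8"],
                      "first_name": ["Billy", "Brian", "Bran", "Bryce", "Betty"]})

# indicator=True adds a '_merge' column whose values are
# 'left_only', 'right_only', or 'both'.
outer = pd.merge(data1, data2, on="subject_id", how="outer", indicator=True)

# Count how many rows fall into each category.
n_left_only = int((outer["_merge"] == "left_only").sum())
n_both = int((outer["_merge"] == "both").sum())
n_right_only = int((outer["_merge"] == "right_only").sum())
print(n_left_only, n_both, n_right_only)
```

Filtering on _merge is a convenient way to list the records that failed to match on either side.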

Summary

In this exercise we used pandas to combine DataFrames. The main points were:

pd.concat combines two DataFrames along the row axis. For example, pd.concat([data1, data2]) stacks data1 and data2 into all_data.
The axis parameter of pd.concat switches to the column axis. For example, pd.concat([data1, data2], axis=1) places data1 and data2 side by side as all_data_col.
pd.merge combines two DataFrames on a specified column such as subject_id. For example, pd.merge(all_data, data3, on='subject_id') merges all_data and data3 on subject_id.
The how parameter of pd.merge selects the join type, including inner (keep only keys present in both DataFrames) and outer (keep keys present in either DataFrame), among others.
Merging lets us bring together information from different DataFrames based on shared column values, which makes more complex analysis and processing possible.

pd.concat() is one of the pandas functions for combining data. It is typically used to join several DataFrames along the row or column direction. Its signature and parameters are explained below:

pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)

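Parameters:

objs: the objects to combine, usually a list or tuple of DataFrames.
axis: the direction to combine along: 0 (default, stack rows) or 1 (place columns side by side).
join: how to handle labels on the other axis: 'outer' (default, take the union) or 'inner' (take the intersection).
ignore_index: if True, discard the existing labels along the concatenation axis and number the result 0 to n-1. The default, False, keeps the original labels.
keys: labels, one per input object, used to build the outermost level of a hierarchical index that identifies where each row came from.
levels: the specific values to use when constructing the hierarchical index. If omitted, they are inferred from keys.
names: names for the levels of the resulting hierarchical index.
verify_integrity: if True, check whether the new concatenated axis contains duplicate labels and raise an exception if it does. Defaults to False.
sort: if True, sort the non-concatenation axis when the inputs' labels on that axis are not already aligned. It does not sort the rows by value. Defaults to False.
copy: whether the underlying data is copied into the result; the signature above shows a default of True. The handling of this parameter has changed between pandas versions, so check the documentation for the version you use.

pd.concat() returns a new, combined object and does not modify the original DataFrames.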
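To make the join and verify_integrity parameters concrete, here is a small sketch. The frames names and scores are hypothetical and chosen only so that they share one column and have overlapping index labels.

```python
import pandas as pd

# Hypothetical frames: one shared column, one column unique to each.
names = pd.DataFrame({"subject_id": ["1", "2"], "first_name": ["Alex", "Amy"]})
scores = pd.DataFrame({"subject_id": ["3", "4"], "test_id": [15, 61]})

# join='outer' (default): keep every column, filling gaps with NaN.
union_cols = pd.concat([names, scores])
print(union_cols.columns.tolist())

# join='inner': keep only the columns present in every input.
shared_cols = pd.concat([names, scores], join="inner")
print(shared_cols.columns.tolist())

# verify_integrity=True raises ValueError here, because both frames
# carry the index labels 0 and 1.
try:
    pd.concat([names, scores], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True
print(overlap_detected)
```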

Usage examples:

Combine two DataFrames along rows (the default):

result = pd.concat([df1, df2])

Combine two DataFrames along columns:

result = pd.concat([df1, df2], axis=1)

Create a hierarchical index:

result = pd.concat([df1, df2], keys=['df1', 'df2'])

Reset the index:

result = pd.concat([df1, df2], ignore_index=True)

pd.concat() is the function to reach for when several DataFrames need to be stacked or placed side by side before further analysis.
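The keys example is easier to appreciate once you see how the resulting hierarchical index is used. In this sketch, df1 and df2 are small placeholder frames, and the level names 'source' and 'row' are illustrative assumptions.

```python
import pandas as pd

# Placeholder frames standing in for df1 and df2.
df1 = pd.DataFrame({"first_name": ["Alex", "Amy"]})
df2 = pd.DataFrame({"first_name": ["Billy", "Brian"]})

# keys labels each input; names labels the two index levels.
result = pd.concat([df1, df2], keys=["df1", "df2"], names=["source", "row"])
print(result)

# Selecting on the outer level recovers the rows of one input.
from_df2 = result.loc["df2"]
print(from_df2)
```

This keeps the original row labels while still making every (source, row) pair unique.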

pd.merge() is the other main pandas function for combining data. It joins two DataFrames on specified columns or on their indexes, much like a database join. Its signature and parameters are explained below:

pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

Parameters:

left: the left DataFrame.
right: the right DataFrame.
how: the join type: 'inner' (default), 'left', 'right', 'outer', or 'cross'.
on: the column name (or names) to join on. Use this when both DataFrames contain the column under the same name.
left_on: the join column(s) in the left DataFrame.
right_on: the join column(s) in the right DataFrame.
left_index: if True, use the left DataFrame's index as its join key.
right_index: if True, use the right DataFrame's index as its join key.
sort: if True, sort the result by the join keys. Defaults to False.
suffixes: suffixes appended to non-key column names that appear in both DataFrames so they can be told apart. Defaults to ('_x', '_y').
copy: whether the underlying data is copied into the result; the signature above shows a default of True. The handling of this parameter has changed between pandas versions, so check the documentation for the version you use.
indicator: if True, add a column named _merge to the result whose value for each row is 'left_only', 'right_only', or 'both'. Defaults to False.
validate: check that the join keys have the expected uniqueness and raise an error otherwise. Accepted values are 'one_to_one', 'one_to_many', 'many_to_one', and 'many_to_many'.

pd.merge() returns a new, merged DataFrame and does not modify the original DataFrames.
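The validate parameter is useful as a guard against accidentally duplicated keys. The sketch below uses two hypothetical frames, people and tests, where subject_id '4' appears twice on the left side.

```python
import pandas as pd

# Hypothetical frames: subject_id '4' is duplicated in `people`.
people = pd.DataFrame({"subject_id": ["4", "4", "5"],
                       "first_name": ["Alice", "Billy", "Brian"]})
tests = pd.DataFrame({"subject_id": ["4", "5"], "test_id": [61, 16]})

# many_to_one: keys may repeat on the left but must be unique on the right.
ok = pd.merge(people, tests, on="subject_id", validate="many_to_one")
print(ok)

# one_to_one: keys must be unique on both sides, so this raises MergeError.
try:
    pd.merge(people, tests, on="subject_id", validate="one_to_one")
    rejected = False
except pd.errors.MergeError:
    rejected = True
print(rejected)
```

Stating the expected relationship up front turns a silent row explosion into an immediate, explicit error.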

Usage examples:

Inner join two DataFrames on a column they share:

result = pd.merge(left_df, right_df, on='key_column', how='inner')

Left join two DataFrames, naming the join column on each side separately:

result = pd.merge(left_df, right_df, left_on='left_key', right_on='right_key', how='left')

Join using the left DataFrame's index:

result = pd.merge(left_df, right_df, left_index=True, right_on='key_column', how='inner')

Add suffixes to tell apart columns with the same name:

result = pd.merge(left_df, right_df, on='key_column', suffixes=('_left', '_right'))

pd.merge() is a flexible joining tool for combining data from different sources. Choose the join type and parameters according to which rows need to be kept.
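As a runnable version of the left_on / right_on example, the sketch below fills in small hypothetical contents for left_df and right_df. The column names left_key and right_key follow the snippet above; the data values are made up for illustration.

```python
import pandas as pd

# Hypothetical frames whose join columns have different names.
left_df = pd.DataFrame({"left_key": ["1", "2", "3"],
                        "first_name": ["Alex", "Amy", "Allen"]})
right_df = pd.DataFrame({"right_key": ["2", "3", "4"],
                         "test_id": [15, 15, 61]})

# Left join: every row of left_df is kept; unmatched rows get NaN
# in the columns that come from right_df.
result = pd.merge(left_df, right_df,
                  left_on="left_key", right_on="right_key", how="left")
print(result)
```

When the key columns have different names, both of them appear in the result, so one is often dropped afterwards.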
