Merging and Deduplicating pandas DataFrames in Python

Updated: 2022-08-09 15:19:22  Author: Coderusher

This article explains how to merge and deduplicate pandas DataFrames in Python, with a worked example for each operation.

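1. Merging

1.1 Row-wise merging

Combine two datasets that have the same structure (the same columns).

1.1.1 The concat function

Signature:

pd.concat([dataFrame1, dataFrame2, …], ignore_index=False)

Parameters: ignore_index=False (the default) keeps each frame's original index labels, so the combined index may contain gaps or repeats; ignore_index=True discards them and renumbers the result from 0.

Example: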

import pandas as pd
import numpy as np

# Create a 5-row, 2-column DataFrame of random integers in [0, 10)
df = pd.DataFrame(np.random.randint(0, 10, (5, 2)), columns=['A', 'B'])

# Split the data into two pieces (rows 0-1 and rows 3-4; row 2 is skipped) and store them in a list
data_list = [df[0:2], df[3:]]

# Keep the original index labels
df1 = pd.concat(data_list, ignore_index=False)

# Renumber the index from 0
df2 = pd.concat(data_list, ignore_index=True)

Output:

----------------df--------------------------
A B
0 7 8
1 7 3
2 5 9
3 4 0
4 1 8
----------------df1--------------------------
A B
0 7 8
1 7 3
3 4 0  # --------------> label 2 is missing here, so the index is not continuous
4 1 8
----------------df2--------------------------
A B
0 7 8
1 7 3
2 4 0
3 1 8
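Because the example above uses random numbers, its values change on every run. The following is a minimal sketch of the same split-and-concat steps with fixed, illustrative values (the names parts, kept, and renumbered are chosen here for illustration), so the effect of ignore_index on the index labels can be checked directly.

```python
import pandas as pd

# Fixed illustrative values so the result is reproducible.
df = pd.DataFrame({"A": [7, 7, 5, 4, 1], "B": [8, 3, 9, 0, 8]})

# Rows 0-1 and rows 3-4; row 2 is deliberately left out.
parts = [df[0:2], df[3:]]

# ignore_index=False: the original labels survive, leaving a gap at 2.
kept = pd.concat(parts, ignore_index=False)

# ignore_index=True: the labels are rebuilt as 0, 1, 2, 3.
renumbered = pd.concat(parts, ignore_index=True)

print(kept.index.tolist())        # [0, 1, 3, 4]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Only the labels differ between the two results; the row values are identical.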

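1.1.2 The append function

Signature:

df.append(df1, ignore_index=True)

Parameters: ignore_index=False keeps the original index labels of both frames; ignore_index=True renumbers the result from 0. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions use pd.concat instead.

Example: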

import pandas as pd
import numpy as np

# Create a 5-row, 2-column DataFrame of random integers
df = pd.DataFrame(np.random.randint(0, 10, (5, 2)), columns=['A', 'B'])

# Create the rows to append (3 rows, same columns)
narry = np.random.randint(0, 10, (3, 2))
data_list = pd.DataFrame(narry, columns=['A', 'B'])

# Append the rows and renumber the index (requires pandas < 2.0)
df1 = df.append(data_list, ignore_index=True)

Output:

----------------df--------------------------
A B
0 5 6
1 1 2
2 5 3
3 1 8
4 1 2
----------------df1--------------------------
A B
0 5 6
1 1 2
2 5 3
3 1 8
4 1 2
5 8 1
6 3 5
7 1 1
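DataFrame.append no longer exists in pandas 2.0 and later, so the call above raises AttributeError there. A minimal sketch of the equivalent operation with pd.concat follows; the values and the names extra and combined are illustrative and not taken from the original example.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 1], "B": [6, 2]})

# Rows to add; they must share the column names of df.
extra = pd.DataFrame({"A": [8, 3], "B": [1, 5]})

# Equivalent of the removed df.append(extra, ignore_index=True).
combined = pd.concat([df, extra], ignore_index=True)

print(combined)
```

Like append, pd.concat returns a new DataFrame and leaves df unchanged.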

1.2 Column-wise merging

Combine the columns of two tables that describe the same records, matching rows on one or more key columns.

Signature:

pd.merge( left, right, how="inner", on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=("_x", "_y"), copy=True, indicator=False, validate=None, )

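Parameters:

Parameter	Description
how	Join type: inner, left, right, or outer; default inner
on	Column name(s) to join on; must exist in both tables
left_on	Column name(s) in the left table to join on
right_on	Column name(s) in the right table to join on
left_index	Whether to use the left table's row index as the join key; default False
right_index	Whether to use the right table's row index as the join key; default False
sort	Default False; whether to sort the merged result by the join keys
copy	Default True, which always copies the data into the result; setting it to False can improve performance
suffixes	Suffixes appended to column names that appear in both tables; default ("_x", "_y")
indicator	Adds a column showing which table each row of the result came from

Example 1: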

import pandas as pd
 
df1 = pd.DataFrame({'key':['a','b','c'], 'data1':range(3)})
df2 = pd.DataFrame({'key':['a','b','c'], 'data2':range(3)})
df = pd.merge(df1, df2) # with no on argument, the columns shared by both tables (here key) are used as the join key

Result:

----------------df1--------------------------
key data1
0 a 0
1 b 1
2 c 2
----------------df2--------------------------
key data2
0 a 0
1 b 1
2 c 2
----------------df---------------------------
key data1 data2
0 a 0 0
1 b 1 1
2 c 2 2
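In Example 1 both tables contain exactly the same keys, so every join type gives the same result. The sketch below uses illustrative tables whose keys only partly overlap, to show how the how and indicator parameters from the table above change the output. The table contents and variable names are assumptions made for this illustration.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "data1": [0, 1, 2]})
right = pd.DataFrame({"key": ["b", "c", "d"], "data2": [10, 20, 30]})

# Default how="inner": only keys present in both tables survive (b and c).
inner = pd.merge(left, right, on="key")

# how="outer" keeps every key; indicator=True adds a "_merge" column whose
# value is "left_only", "right_only", or "both" for each row.
outer = pd.merge(left, right, on="key", how="outer", indicator=True)

print(inner)
print(outer)
```

Rows found in only one table get NaN in the columns that come from the other table.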

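Example 2:

# To join on several keys, pass the key columns as a list
import pandas as pd

right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'],
                      'key2': ['one', 'one', 'one', 'two'],
                      'lval': [4, 5, 6, 7]})

left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'],
                     'key2': ['one', 'two', 'one'],
                     'lval': [1, 2, 3]})

# lval exists in both tables, so it is split into lval_x (left) and lval_y (right)
df = pd.merge(left, right, on=['key1', 'key2'], how='outer')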

Result:

----------------right-------------------------
key1 key2 lval
0 foo one 4
1 foo one 5
2 bar one 6
3 bar two 7
----------------left--------------------------
key1 key2 lval
0 foo one 1
1 foo two 2
2 bar one 3
----------------df---------------------------
key1 key2 lval_x lval_y
0 foo one 1.0 4.0
1 foo one 1.0 5.0
2 foo two 2.0 NaN
3 bar one 3.0 6.0
4 bar two NaN 7.0
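The parameter table also lists left_on, right_on, and suffixes, which the two examples above do not exercise. A minimal sketch follows, assuming two hypothetical tables (orders and customers) whose key columns have different names and which share a non-key column called value.

```python
import pandas as pd

# Hypothetical tables: the key is "cust" on the left and "customer" on the right.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "cust": ["x", "y", "x"],
    "value": [10, 20, 30],
})
customers = pd.DataFrame({
    "customer": ["x", "y"],
    "value": [100, 200],
})

# left_on/right_on name the key column on each side; suffixes replaces the
# default "_x"/"_y" on the overlapping non-key column "value".
merged = pd.merge(
    orders,
    customers,
    left_on="cust",
    right_on="customer",
    how="left",
    suffixes=("_order", "_customer"),
)

print(merged)
```

Both key columns appear in the result because their names differ; drop one of them afterwards if it is not needed.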

2. Deduplication

Signature:

data.drop_duplicates(subset=['A','B'],keep='first',inplace=True)

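Parameters:

Parameter	Description
subset	Column name(s) to compare when looking for duplicates; optional, default None (compare all columns)
keep	One of {'first', 'last', False}; default 'first'
first	Keep the first occurrence of each duplicated row and drop the later ones
last	Keep the last occurrence of each duplicated row and drop the earlier ones
False	Drop every row that has a duplicate
inplace	Boolean, default False. With inplace=True the duplicates are removed from the original DataFrame directly; with the default False, a deduplicated copy is returned and the original is left unchanged.

Example:

Remove rows that are duplicated in every column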

data.drop_duplicates(inplace=True)

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

df.drop_duplicates()

Result:

---------------df before deduplication---------------------------
brand style rating
0 Yum Yum cup 4.0
1 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0
---------------df after deduplication---------------------------
brand style rating
0 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0

Use subset to remove rows that are duplicated in specific columns

data.drop_duplicates(subset=['A','B'],keep='first',inplace=True)

df.drop_duplicates(subset=['brand'])

Result:

brand style rating
0 Yum Yum cup 4.0
2 Indomie cup 3.5

Use keep to remove duplicates and retain the last occurrence

df.drop_duplicates(subset=['brand', 'style'], keep='last')

Result:

brand style rating
1 Yum Yum cup 4.0
2 Indomie cup 3.5
4 Indomie pack 5.0
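The examples above cover keep='first' and keep='last'. The sketch below, reusing the same brand/style/rating data, illustrates the remaining cases from the parameter table: keep=False and inplace=True. It also uses DataFrame.duplicated, which is not covered in this article, to preview which rows would be dropped.

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# keep=False drops every row whose (brand, style) pair occurs more than once.
unique_only = df.drop_duplicates(subset=["brand", "style"], keep=False)
print(unique_only.index.tolist())  # [2]

# duplicated() marks the rows that drop_duplicates would remove.
mask = df.duplicated(subset=["brand", "style"], keep="first")
print(mask.tolist())  # [False, True, False, False, True]

# inplace=True modifies df itself and returns None.
result = df.drop_duplicates(inplace=True)
print(result)              # None
print(df.index.tolist())   # [0, 2, 3, 4]
```

Because the inplace call returns None, assigning its result back to df would discard the data.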
