pandas: Data Modification and Basic Operations

Updated: 2024-02-19 09:56:49   Author: 螞蟻*漫步

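This article introduces how to modify data and perform basic operations in pandas. I hope it serves as a useful reference. If you find mistakes or anything not fully considered, corrections are welcome.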

1. Copying data

Direct assignment

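Direct assignment (df2 = df) copies nothing. It only binds a second name to the same DataFrame object, so the index, the columns and the values are all shared. Modifying an element through one name changes what the other name sees.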

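import pandas as pd
import numpy as np

df=pd.DataFrame(np.arange(12).reshape(4,3),index=list("abcd"),columns=['w','y','z'])
print(df)
df2=df                # direct assignment: df2 is just a second name for the same object
print(df.iloc[1,2])   # value before the assignment
df2.iloc[1,2]=20      # modify the element through df2
print(df.iloc[1,2])   # df sees the change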
 
 
 
out：
   w   y   z
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11
5-------->> before the assignment
20-------->> after the assignment
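The following is a minimal sketch that checks the sharing directly, using the same df as above. The name alias is hypothetical. The `is` operator confirms that both names point to one object, and a change made through one name shows up through the other.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(12).reshape(4, 3), index=list("abcd"), columns=['w', 'y', 'z'])
alias = df                 # direct assignment: no data is copied
print(alias is df)         # True: one object, two names

alias.loc['b', 'z'] = 20   # change a value through the alias
print(df.loc['b', 'z'])    # 20: the change is visible through df
```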

The copy() function

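copy() duplicates the metadata (row and column index) and also allocates new storage for the values (deep=True by default). As a result, modifying an element of one object does not affect the other.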

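import pandas as pd
import numpy as np

df=pd.DataFrame(np.arange(12).reshape(4,3),index=list("abcd"),columns=['w','y','z'])
print(df)
df1=df.copy()          # independent copy with its own storage
print(df1.iloc[1,2])   # 5
df1.iloc[1,2]=20       # modify only the copy
print(df.iloc[1,2])    # the original still holds 5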
 
 
 
out：
   w   y   z
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11
5
5
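The following is a minimal sketch, assuming the same df as above, that contrasts copy() with direct assignment. It also uses `DataFrame.equals()`, which does not appear in the original. Right after copying, the two objects are distinct but equal. After the copy is modified, the original is unchanged.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(12).reshape(4, 3), index=list("abcd"), columns=['w', 'y', 'z'])
df1 = df.copy()            # deep copy by default (deep=True)
print(df1 is df)           # False: a separate object
print(df1.equals(df))      # True: same labels and values right after copying

df1.loc['b', 'z'] = 20     # modify only the copy
print(df.loc['b', 'z'])    # 5: the original is untouched
print(df1.equals(df))      # False
```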

2. Adding rows and columns

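Adding several columns at once with the [] operator and a list of column names (the new columns are appended at the end): df[['new_column1','new_column2',...]] = ...

Adding a single column with loc and a column name (loc cannot add several columns at once; the new column is appended at the end): df.loc[:, 'new_column'] = ...

Inserting a column at a given position: df.insert(loc, column, value, allow_duplicates=False)

loc: position of the new column, 0 <= loc <= len(columns)
column: name of the new column. The column can be placed in the middle of the DataFrame, but only one column can be inserted per call. Inserting a name that already exists raises a ValueError unless allow_duplicates=True.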

import pandas as pd
 
 
import numpy as np
 
df=pd.DataFrame(np.arange(12).reshape(4,3),index=list("abcd"),columns=['w','y','z'])
print(df)
df['n']=[3,7,9,11]
df[['x','k']]=df[['w','z']]
df.loc[:,'r']=[12,13,15,16]
df.insert(1,'t',[31,56,78,5])   # insert 't' at position 1, i.e. as the second column
print(df)
 
 
 
out：
   w   y   z
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11
   w   t   y   z   n  x   k   r
a  0  31   1   2   3  0   2  12
b  3  56   4   5   7  3   5  13
c  6  78   7   8   9  6   8  15
d  9   5  10  11  11  9  11  16
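The following is a minimal sketch of the insert() parameters described above. The column names first and last are hypothetical. It shows the two extreme positions (0 and len(columns)), a scalar value being broadcast, and the ValueError that appears when a name already exists unless allow_duplicates=True.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(12).reshape(4, 3), index=list("abcd"), columns=['w', 'y', 'z'])

df.insert(0, 'first', [1, 2, 3, 4])      # loc=0: becomes the leftmost column
df.insert(len(df.columns), 'last', 0)    # loc=len(columns): appended at the end; the scalar 0 is broadcast
print(list(df.columns))                  # ['first', 'w', 'y', 'z', 'last']

try:
    df.insert(1, 'w', [0, 0, 0, 0])      # 'w' already exists
except ValueError as exc:
    print('ValueError:', exc)

df.insert(1, 'w', [0, 0, 0, 0], allow_duplicates=True)  # a duplicate name is now accepted
print(list(df.columns))                  # ['first', 'w', 'w', 'y', 'z', 'last']
```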

Adding rows

Assigning with loc and a new index label appends one row at the end. It cannot add several rows at once: df.loc['new_index'] = ...

import pandas as pd
 
 
import numpy as np
 
df=pd.DataFrame(np.arange(12).reshape(4,3),index=list("abcd"),columns=['w','y','z'])
print(df)
df.loc['r']=[12,13,15]
print(df)
 
 
 
out：
   w   y   z
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11
    w   y   z
a   0   1   2
b   3   4   5
c   6   7   8
d   9  10  11
r  12  13  15
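The following is a minimal sketch, assuming the same 4x3 df as above, of the two kinds of values that loc row assignment accepts. The row labels r and s are just examples. A plain list is matched to the columns by position and needs one value per column. A Series is aligned on its index labels, so the order of its keys does not matter. Each assignment still adds exactly one row.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(12).reshape(4, 3), index=list("abcd"), columns=['w', 'y', 'z'])

# A Series is aligned to the columns by label, so the key order does not matter
df.loc['r'] = pd.Series({'z': 15, 'w': 12, 'y': 13})
# A list is matched by position and must have one value per column
df.loc['s'] = [20, 21, 22]
print(df)
```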

3. Deleting rows and columns

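del deletes only one column at a time. Syntax: del df['column-name']

df.drop() can delete several rows or columns at once:

df.drop(labels, axis=1, inplace=False)

labels: list of row or column labels to delete
axis: 0 deletes rows (default), 1 deletes columns
inplace: False leaves the original DataFrame unchanged and returns a new one (default); True modifies the original DataFrame itself

df.pop() removes exactly one column from the DataFrame and returns it as a Series, so the removed column can be assigned to a new variable.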

import pandas as pd
 
 
import numpy as np
 
df=pd.DataFrame(np.arange(40).reshape(5,8),index=list("abcde"),columns=['w','y','z',
                                                                     'l','m','n','o','p'])
print(df)
del df['w']
df.drop(labels=['y','l'],axis=1,inplace=True)
print(df)
data=df.pop('n')
print(df)
print(data)
 
 
out：
    w   y   z   l   m   n   o   p
a   0   1   2   3   4   5   6   7
b   8   9  10  11  12  13  14  15
c  16  17  18  19  20  21  22  23
d  24  25  26  27  28  29  30  31
e  32  33  34  35  36  37  38  39
    z   m   n   o   p
a   2   4   5   6   7
b  10  12  13  14  15
c  18  20  21  22  23
d  26  28  29  30  31
e  34  36  37  38  39
    z   m   o   p
a   2   4   6   7
b  10  12  14  15
c  18  20  22  23
d  26  28  30  31
e  34  36  38  39
a     5
b    13
c    21
d    29
e    37
Name: n, dtype: int32
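The following is a minimal sketch, assuming the same 5x8 df as above, of the row-deleting side of drop() and of its index=/columns= keywords. These keywords are part of the pandas drop() signature but are not used in the original. The variable names rows_dropped, both_dropped and col are hypothetical. Because inplace defaults to False, the original df keeps its shape until pop() is called.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(40).reshape(5, 8), index=list("abcde"),
                  columns=['w', 'y', 'z', 'l', 'm', 'n', 'o', 'p'])

# axis=0 (the default) drops rows by their index labels; a new DataFrame is returned
rows_dropped = df.drop(labels=['a', 'c'])
# index= / columns= avoid having to remember the axis number
both_dropped = df.drop(index=['e'], columns=['w', 'p'])

print(rows_dropped.index.tolist())    # ['b', 'd', 'e']
print(both_dropped.shape)             # (4, 6)
print(df.shape)                       # (5, 8): the original is unchanged

col = df.pop('n')                     # removes 'n' from df and returns it as a Series
print(type(col).__name__, df.shape)   # Series (5, 7)
```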

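Removing duplicate values

Finding duplicates: duplicated(subset=None, keep='first'). subset restricts the comparison to certain columns; by default all columns are compared.

keep='first': scan from front to back; later occurrences of an identical row are marked as duplicates (True).
keep='last': scan from back to front; earlier occurrences are marked as duplicates.

Dropping duplicates:

drop_duplicates(subset=None, keep='first', inplace=False)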

import pandas as pd
import numpy as np
data=pd.DataFrame({'qu1':[1,3,4,3,4],
                   'qu2':[1,3,4,3,4],
                   'qu3':[1,3,2,3,3]})
print(data)
print(data.duplicated(keep='first'))
print(data.duplicated(keep='last'))
print(data.drop_duplicates())
 
 
 
out:
   qu1  qu2  qu3
0    1    1    1
1    3    3    3
2    4    4    2
3    3    3    3
4    4    4    3
0    False
1    False
2    False
3     True
4    False
dtype: bool
0    False
1     True
2    False
3    False
4    False
dtype: bool
   qu1  qu2  qu3
0    1    1    1
1    3    3    3
2    4    4    2
4    4    4    3
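The following is a minimal sketch, using the same data as above, of the subset parameter and of keep=False. keep=False is a pandas option not covered in the original; it flags every member of a duplicated group. With subset restricted to qu1 and qu2, rows 2 and 4 also count as duplicates, because their only difference is in qu3.

```python
import pandas as pd

data = pd.DataFrame({'qu1': [1, 3, 4, 3, 4],
                     'qu2': [1, 3, 4, 3, 4],
                     'qu3': [1, 3, 2, 3, 3]})

# subset: compare only qu1 and qu2
print(data.duplicated(subset=['qu1', 'qu2']).tolist())   # [False, False, False, True, True]
# keep=False: every row of a duplicated group is flagged
print(data.duplicated(keep=False).tolist())              # [False, True, False, True, False]
# keep='last' with a subset: keep the last row for each qu1 value
print(data.drop_duplicates(subset=['qu1'], keep='last').index.tolist())  # [0, 3, 4]
```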

4. Changing the index

Indexes are immutable, so a single value of an index cannot be modified in place.

Instead, replace all labels at once by assigning a new index: df.index = ..., df.columns = ...

import pandas as pd
 
 
import numpy as np
 
df=pd.DataFrame(np.arange(40).reshape(5,8),index=list("abcde"),columns=['w','y','z',
                                                                     'l','m','n','o','p'])
print(df.index)
print(df.columns)
df.index='new_'+df.index
df.columns='new'+df.columns
print(df.index)
print(df.columns)
 
 
out：
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
Index(['w', 'y', 'z', 'l', 'm', 'n', 'o', 'p'], dtype='object')
Index(['new_a', 'new_b', 'new_c', 'new_d', 'new_e'], dtype='object')
Index(['neww', 'newy', 'newz', 'newl', 'newm', 'newn', 'newo', 'newp'], dtype='object')
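The following is a minimal sketch of index immutability, assuming a small 4x3 df like the one in section 1. The variable name labels is hypothetical. Assigning to one position of an Index raises a TypeError. The usual workaround is to build a new list of labels and assign it as a whole. That list must have exactly one label per row (or column); otherwise pandas raises a ValueError.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(12).reshape(4, 3), index=list("abcd"), columns=['w', 'y', 'z'])

try:
    df.index[0] = 'x'             # Index objects cannot be changed element by element
except TypeError as exc:
    print('TypeError:', exc)

# Build a new list of labels and assign it as a whole instead
labels = df.index.tolist()
labels[0] = 'x'
df.index = labels
print(df.index.tolist())          # ['x', 'b', 'c', 'd']

try:
    df.columns = ['only', 'two']  # wrong length: df has three columns
except ValueError as exc:
    print('ValueError:', exc)
```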

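Renaming rows and columns at the same time

rename(index=None, columns=None, **kwargs)

index: new row labels, given as a dict such as {'oldname': 'newname', ...}
columns: new column labels, in the same dict form
inplace: boolean, default False (returns a new object)
copy: only relevant when inplace is False; controls whether the new object also gets new storage for the data, or only new metadata (row and column labels)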

 
import pandas as pd
 
 
import numpy as np
 
df=pd.DataFrame(np.arange(40).reshape(5,8),index=list("abcde"),columns=['w','y','z',
                                                                     'l','m','n','o','p'])
print(df.index)
print(df.columns)
df.rename(index={'a':'a1','b':'b1'},columns={'w':'w1','l':'l1'},inplace=True)
print(df.index)
print(df.columns)
 
 
 
out：
 
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
Index(['w', 'y', 'z', 'l', 'm', 'n', 'o', 'p'], dtype='object')
Index(['a1', 'b1', 'c', 'd', 'e'], dtype='object')
Index(['w1', 'y', 'z', 'l1', 'm', 'n', 'o', 'p'], dtype='object')
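The following is a minimal sketch, assuming a small 4x3 df, of two more rename() behaviors. First, without inplace=True a new DataFrame is returned and the original keeps its labels; keys that do not exist (the hypothetical 'missing') are ignored by default. Second, a function such as str.upper can be passed instead of a dict, in which case it is applied to every label.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(12).reshape(4, 3), index=list("abcd"), columns=['w', 'y', 'z'])

# Returns a new DataFrame; labels not present in df are silently ignored
renamed = df.rename(columns={'w': 'w1', 'missing': 'ignored'})
print(renamed.columns.tolist())   # ['w1', 'y', 'z']
print(df.columns.tolist())        # ['w', 'y', 'z']

# A function is applied to every label
upper = df.rename(index=str.upper, columns=str.upper)
print(upper.index.tolist(), upper.columns.tolist())  # ['A', 'B', 'C', 'D'] ['W', 'Y', 'Z']
```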

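Reindexing rows and/or columns

reindex(index=None, columns=None, **kwargs)

index: list of row labels after reindexing
columns: list of column labels after reindexing
fill_value: value used for labels that did not exist before (NaN by default)
method: how to fill new labels from existing ones: 'ffill'/'pad' fills forward, 'bfill'/'backfill' fills backward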

import pandas as pd
import numpy as np

# reindex a Series
s1 = pd.Series(np.arange(1, 5), index=list('ABCD'))
print(s1)
'''
A    1
B    2
C    3
D    4
dtype: int64
'''
 
 
# reindex with an extra label; labels that did not exist before can be filled with fill_value
print(s1.reindex(index=['A', 'B', 'C', 'D', 'E'], fill_value = 10))
'''
A     1
B     2
C     3
D     4
E    10
dtype: int64
'''
 
s2 = pd.Series(['A', 'B', 'C'], index = [1, 5, 10])
print(s2)
'''
1     A
5     B
10    C
dtype: object
'''
 
# change the index:
# extend s2's index to 15 labels (0 to 14)
# labels that did not exist before get NaN by default
print(s2.reindex(index=range(15)))
'''
0     NaN
1       A
2     NaN
3     NaN
4     NaN
5       B
6     NaN
7     NaN
8     NaN
9     NaN
10      C
11    NaN
12    NaN
13    NaN
14    NaN
dtype: object
'''
 
# ffill: forward fill
# a new label takes the value of the previous existing label; label 0 has no predecessor, so it stays NaN
print(s2.reindex(index=range(15), method='ffill'))
'''
0     NaN
1       A
2       A
3       A
4       A
5       B
6       B
7       B
8       B
9       B
10      C
11      C
12      C
13      C
14      C
dtype: object
'''
 
# reindex dataframe
df1 = pd.DataFrame(np.random.rand(25).reshape([5, 5]), index=['A', 'B', 'D', 'E', 'F'], columns=['c1', 'c2', 'c3', 'c4', 'c5'])  # random values: they differ on every run
print(df1)
'''
         c1        c2        c3        c4        c5
A  0.700437  0.844187  0.676514  0.727858  0.951458
B  0.012703  0.413588  0.048813  0.099929  0.508066
D  0.200248  0.744154  0.192892  0.700845  0.293228
E  0.774479  0.005109  0.112858  0.110954  0.247668
F  0.023236  0.727321  0.340035  0.197503  0.909180
'''
 
# add a new index label to the DataFrame
# the new row is filled with NaN automatically
print(df1.reindex(index=['A', 'B', 'C', 'D', 'E', 'F']))
''' the new row C is filled with NaN
         c1        c2        c3        c4        c5
A  0.700437  0.844187  0.676514  0.727858  0.951458
B  0.012703  0.413588  0.048813  0.099929  0.508066
C       NaN       NaN       NaN       NaN       NaN
D  0.200248  0.744154  0.192892  0.700845  0.293228
E  0.774479  0.005109  0.112858  0.110954  0.247668
F  0.023236  0.727321  0.340035  0.197503  0.909180
'''
 
# extending the columns works the same way
print(df1.reindex(columns=['c1', 'c2', 'c3', 'c4', 'c5', 'c6']))
'''
         c1        c2        c3        c4        c5  c6
A  0.700437  0.844187  0.676514  0.727858  0.951458 NaN
B  0.012703  0.413588  0.048813  0.099929  0.508066 NaN
D  0.200248  0.744154  0.192892  0.700845  0.293228 NaN
E  0.774479  0.005109  0.112858  0.110954  0.247668 NaN
F  0.023236  0.727321  0.340035  0.197503  0.909180 NaN
'''
 
# shrinking the index
print(s1.reindex(['A', 'B']))
''' equivalent to selecting a subset
A    1
B    2
dtype: int64
'''
 
print(df1.reindex(index=['A', 'B']))
''' also works like a slice
         c1        c2        c3        c4        c5
A  0.700437  0.844187  0.676514  0.727858  0.951458
B  0.012703  0.413588  0.048813  0.099929  0.508066
'''
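The following is a minimal sketch of the two reindex() options not exercised above. The first is method='bfill'. The second is reindexing rows and columns in one call with fill_value. The small df and the variable names filled and wide are hypothetical. Note that method only works when the existing index is sorted (monotonic). With bfill, labels after the last existing one (here 11) have nothing to fill from and stay NaN.

```python
import pandas as pd
import numpy as np

s2 = pd.Series(['A', 'B', 'C'], index=[1, 5, 10])

# bfill: a new label takes the value of the next existing label
filled = s2.reindex(index=range(12), method='bfill')
print(filled.tolist())

# Rows and columns can be reindexed in one call; fill_value applies to both
df = pd.DataFrame(np.arange(4).reshape(2, 2), index=['A', 'B'], columns=['c1', 'c2'])
wide = df.reindex(index=['A', 'B', 'C'], columns=['c1', 'c2', 'c3'], fill_value=0)
print(wide)
```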

5. Sorting data

Sorting by index

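df.sort_index(axis=0, ascending=True, inplace=False): axis=0 sorts the rows by their labels and axis=1 sorts the columns by their labels; ascending=False sorts in descending order; inplace=True modifies the DataFrame itself.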

import pandas as pd
 
 
import numpy as np
 
df=pd.DataFrame(np.arange(9).reshape(3,3),index=list("acb"),columns=['w','m','z',])
print(df)
df.sort_index(axis=0,ascending=True,inplace=True)
print(df)
 
 
out：
   w  m  z
a  0  1  2
c  3  4  5
b  6  7  8
   w  m  z
a  0  1  2
b  6  7  8
c  3  4  5
 
df.sort_index(axis=1,ascending=True,inplace=True)
print(df)
 
 
out:
   m  w  z
a  1  0  2
b  7  6  8
c  4  3  5

Sorting by column values

df.sort_values(by, ascending=True, inplace=False): by is the name of the column (or a list of column names) to sort by

import pandas as pd
import numpy as np
data=pd.DataFrame({'qu1':[1,7,41,3,4],
                   'qu2':[1,9,4,37,4],
                   'qu3':[1,12,25,3,37]})
print(data)
data.sort_values(by='qu1',ascending=True,inplace=True)
print(data)
 
 
out：
   qu1  qu2  qu3
0    1    1    1
1    7    9   12
2   41    4   25
3    3   37    3
4    4    4   37
   qu1  qu2  qu3
0    1    1    1
3    3   37    3
4    4    4   37
1    7    9   12
2   41    4   25
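The following is a minimal sketch, using the same data as above, of sorting by several columns. It passes lists to by and ascending, which the original does not show. The variable name result is hypothetical. Rows are ordered by qu2 ascending. Rows 2 and 4 tie on qu2 = 4, so the tie is broken by qu3 in descending order.

```python
import pandas as pd

data = pd.DataFrame({'qu1': [1, 7, 41, 3, 4],
                     'qu2': [1, 9, 4, 37, 4],
                     'qu3': [1, 12, 25, 3, 37]})

# Sort by qu2 ascending; ties in qu2 are broken by qu3 descending
result = data.sort_values(by=['qu2', 'qu3'], ascending=[True, False])
print(result.index.tolist())   # [0, 4, 2, 1, 3]
print(result)
```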