How pandas handles NaN

Updated: 2024-02-02 08:51:08  Author: 我是小螞蟻

This article walks through how pandas handles NaN values. Corrections, and notes on anything that has been overlooked, are welcome.

NaN handling in pandas

NaN is something of an oddity in pandas.

The name stands for "not a number", yet its type is in fact float.

NumPy exposes the same value as np.nan.
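
Because NaN is an ordinary float with unusual comparison rules, testing for it with == does not work. The short sketch below illustrates this and shows three predicates that can be used instead; the variable names are chosen here for illustration only.

```python
import math

import numpy as np
import pandas as pd

n = np.nan

# NaN never compares equal to anything, including itself.
is_equal = (n == n)
is_different = (n != n)
print(is_equal)      # False
print(is_different)  # True

# Use a dedicated predicate instead of ==.
checks = [math.isnan(n), bool(np.isnan(n)), bool(pd.isna(n))]
print(checks)        # [True, True, True]
```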

NaN handling in pandas comes down to a few methods:

Check for NaN: s1.isnull() and s1.notnull()
Drop the index entries that contain NaN: s1.dropna()
Replace NaN with another value: df2.fillna()
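
Before dropping or filling, it often helps to see how many values are missing. The sketch below combines isnull() with sum() and any() to do that; the frame and its column names "a" and "b" are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the listing in this article.
df = pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, 6]})

# Booleans sum as 0/1, so this counts the NaN values in each column.
nan_per_column = df.isnull().sum()

# True for every row that holds at least one NaN.
rows_with_nan = df.isnull().any(axis=1)

# Total number of NaN values in the whole frame.
total_nan = int(df.isnull().sum().sum())

print(nan_per_column)
print(rows_with_nan)
print(total_nan)  # 3
```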

import numpy as np
import pandas as pd
from pandas import Series, DataFrame

n = np.nan
print(type(n)) # <class 'float'>

m = 1
print(n + m) # nan: any arithmetic that involves nan yields nan

# nan in series
s1 = Series([1, 2, np.nan, 3, 4], index=['A', 'B', 'C', 'D', 'E'])
print(s1)
'''
A    1.0
B    2.0
C    NaN
D    3.0
E    4.0
dtype: float64
'''

print(s1.isnull()) # returns booleans: True where the value is NaN
'''
A    False
B    False
C     True
D    False
E    False
dtype: bool
'''

print(s1.notnull()) # True where the value is not NaN
'''
A     True
B     True
C    False
D     True
E     True
dtype: bool
'''

# drop the index entries that hold NaN
print(s1.dropna())
'''
A    1.0
B    2.0
D    3.0
E    4.0
dtype: float64
'''

# nan in dataframe
df = DataFrame([[1, 2, 3], [np.nan, 5, 6], [7, np.nan, 9], [np.nan, np.nan, np.nan]])
print(df)
'''
     0    1    2
0  1.0  2.0  3.0
1  NaN  5.0  6.0
2  7.0  NaN  9.0
3  NaN  NaN  NaN
'''

print(df.isnull()) # df.notnull() works the same way and returns the inverse
'''
       0      1      2
0  False  False  False
1   True  False  False
2  False   True  False
3   True   True   True
'''

# drop every row that contains a NaN; axis=0 means rows
df1 = df.dropna(axis=0)
print(df1)
'''
     0    1    2
0  1.0  2.0  3.0
'''

# axis=1 works along columns: every column here contains a NaN, so all of them are dropped
df1 = df.dropna(axis=1)
print(df1)
'''
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]
'''

# how='any' drops a row as soon as it contains one NaN; how='all' drops it only if every value is NaN
df1 = df.dropna(axis=0, how='any')
print(df1)
'''
     0    1    2
0  1.0  2.0  3.0
'''

# how='all': only row 3, which is NaN in every column, is dropped
df1 = df.dropna(axis=0, how='all')
print(df1)
'''
     0    1    2
0  1.0  2.0  3.0
1  NaN  5.0  6.0
2  7.0  NaN  9.0
'''

df2 = DataFrame([[1, 2, 3, np.nan], [2, np.nan, 5, 6], [np.nan, 7, np.nan, 9], [1, np.nan, np.nan, np.nan]])
print(df2)
'''
     0    1    2    3
0  1.0  2.0  3.0  NaN
1  2.0  NaN  5.0  6.0
2  NaN  7.0  NaN  9.0
3  1.0  NaN  NaN  NaN
'''

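# thresh omitted: with the default how='any', every row holds a NaN, so all rows are dropped
# (passing thresh=None explicitly may raise a TypeError on recent pandas versions)
print(df2.dropna())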
'''
Empty DataFrame
Columns: [0, 1, 2, 3]
Index: []
'''

print(df2.dropna(thresh=2)) # thresh is the minimum number of non-NaN values a row needs to be kept; row 3 has only one, so it is dropped
'''
     0    1    2    3
0  1.0  2.0  3.0  NaN
1  2.0  NaN  5.0  6.0
2  NaN  7.0  NaN  9.0
'''

# fill NaN with a value
print(df2.fillna(value=1))
'''
     0    1    2    3
0  1.0  2.0  3.0  1.0
1  2.0  1.0  5.0  6.0
2  1.0  7.0  1.0  9.0
3  1.0  1.0  1.0  1.0
'''

# a different fill value can be given for each column
print(df2.fillna(value={0: 0, 1: 1, 2: 2, 3: 3})) # maps column label -> fill value
'''
     0    1    2    3
0  1.0  2.0  3.0  3.0
1  2.0  1.0  5.0  6.0
2  0.0  7.0  2.0  9.0
3  1.0  1.0  2.0  3.0
'''


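# The next two prints show that dropna returns a new DataFrame and leaves the original unchanged
# (df1 still holds the result of df.dropna(axis=0, how='all') from above)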
print(df1.dropna())
'''
     0    1    2
0  1.0  2.0  3.0
'''
print(df1)
'''
     0    1    2
0  1.0  2.0  3.0
1  NaN  5.0  6.0
2  7.0  NaN  9.0
'''
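
Since dropna() returns a new object, the result has to be kept explicitly. The sketch below shows two common ways to do so, under the assumption that a small frame like the one above is used; the names cleaned and df_copy are illustrative and do not appear in the original listing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [np.nan, 5, 6], [7, np.nan, 9]])

# Option 1: bind the returned copy to a name; df itself keeps all rows.
cleaned = df.dropna()
print(len(df), len(cleaned))  # 3 1

# Option 2: modify the object itself with inplace=True.
# In that case the call returns None rather than a DataFrame.
df_copy = df.copy()
result = df_copy.dropna(inplace=True)
print(result, len(df_copy))   # None 1
```

Assigning the result is usually the easier form to read, because the call with inplace=True returns None and cannot be chained.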