
Iterating over pandas DataFrame data, explained

Updated: December 13, 2022, 15:31:31  Author: 大蝦飛哥哥

This article explains how to iterate over DataFrame data in pandas and is meant as a practical reference. If anything is wrong or incomplete, corrections are welcome.

Iterating over a pandas DataFrame

Read the CSV file, then inspect the type and contents of the result:

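import pandas as pd

# Raw string, so the backslash in the Windows path is not treated as an escape
data = pd.read_csv(r'save\LH8888.csv')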
print(type(data))
print(data)

The output shows that data is a pandas.core.frame.DataFrame, followed by the table itself.

Iterating by row: iterrows

iterrows() yields each row as an (index, Series) pair, so a field can be read by its column name:

for i, line in data.iterrows():
    print(i)
    print(line)
    print(line['date'])

In the output:

i: the index of the row, identifying which row this is
line: the data of that row
line['date']: dictionary-style access by column name reads a single field
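
To make the three points above concrete, here is a minimal sketch. The small DataFrame is a hypothetical stand-in for the CSV, which is not reproduced in the article; only the column names date and volume come from the text.

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents (values are made up).
data = pd.DataFrame({"date": ["2022-12-01", "2022-12-02"], "volume": [35, 27]})

rows = []
for i, line in data.iterrows():
    # i is the index label; line is a pandas Series keyed by column name.
    rows.append((i, type(line).__name__, line["date"], line["volume"]))

print(rows)
```

Because every row is packed into a single Series, columns of different dtypes are coerced to a common dtype, which is one reason iterrows tends to be slow.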

Iterating by row: itertuples

for line in data.itertuples():
    print(line)

Each row prints as a namedtuple called Pandas, with the row index in its first field, Index.

The date field can be accessed as follows:

for line in data.itertuples():
    print(line)
    print(getattr(line, 'date'))
    print(line[1])

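getattr(line, 'date') reads the field by name, while line[1] reads the first column by position (position 0 holds the index).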
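
The following sketch shows what itertuples returns and two of its optional arguments, index and name. The DataFrame is again a hypothetical stand-in for the CSV data.

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents (values are made up).
data = pd.DataFrame({"date": ["2022-12-01", "2022-12-02"], "volume": [35, 27]})

first = next(data.itertuples())
print(type(first).__name__, first._fields)

by_attr = first.date   # attribute access by column name
by_pos = first[1]      # positional access; position 0 is the index

# index=False drops the index field; name=None yields plain tuples.
plain = list(data.itertuples(index=False, name=None))
print(plain)
```

Attribute access only works for column names that are valid Python identifiers; for other names, positional access is the safer choice.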

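Iterating by column: items

for name, column in data.items():
    print(column)

Each iteration yields a (column name, Series) pair, so usage mirrors iterrows. Older tutorials use iteritems, which was deprecated in pandas 1.5 and removed in 2.0; items is the replacement.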
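
A small sketch of column-wise iteration, using DataFrame.items(), which exists in both older and current pandas releases. The data is a hypothetical stand-in for the CSV.

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents (values are made up).
data = pd.DataFrame({"date": ["2022-12-01", "2022-12-02"], "volume": [35, 27]})

columns = {}
for name, column in data.items():
    # name is the column label; column is the whole column as a Series.
    columns[name] = column.tolist()

print(columns)
```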

Reading and modifying a single value

For example, to read the value 27 at row index 1, column volume:

iloc: takes integer positions, counted from 0
loc: takes the row label and the column name

print(data.iloc[1, 5])
print(data.loc[1, 'volume'])
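
In the article's data the row labels happen to equal the row positions, so iloc and loc look interchangeable. The sketch below uses a hypothetical DataFrame with a non-default index to show where they differ, and looks up the column position with columns.get_loc instead of hard-coding it.

```python
import pandas as pd

# Hypothetical data with row labels 10, 11, 12 instead of 0, 1, 2.
data = pd.DataFrame(
    {"date": ["2022-12-01", "2022-12-02", "2022-12-05"], "volume": [35, 27, 41]},
    index=[10, 11, 12],
)

col_pos = data.columns.get_loc("volume")  # integer position of the column
by_position = data.iloc[1, col_pos]       # second row, by position
by_label = data.loc[11, "volume"]         # row labelled 11, by name

# Label 1 does not exist here, so loc raises KeyError.
try:
    data.loc[1, "volume"]
    label_1_found = True
except KeyError:
    label_1_found = False

print(by_position, by_label, label_1_found)
```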

For example, to change that value (row index 1, column volume) from 27 to 10:

data.iloc[1, 5] = 10
print(data.loc[1, 'volume'])
print(data)

The printed value is now 10, and the full DataFrame reflects the change.
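
The same assignment can be written with labels, which avoids depending on volume being at column position 5. This is a sketch on hypothetical data; at is the scalar-only counterpart of loc.

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents (values are made up).
data = pd.DataFrame({"date": ["2022-12-01", "2022-12-02"], "volume": [35, 27]})

data.loc[1, "volume"] = 10  # label-based assignment
data.at[0, "volume"] = 20   # label-based, single scalar only

print(data)
```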

Iterating over every value in a DataFrame

for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        print(data.iloc[i, j])

The values are printed one at a time, row by row.
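
As an alternative sketch, the nested loop can run over a NumPy array instead of calling iloc for every cell. This assumes the values fit in one array; with mixed column types, to_numpy() returns an object array. The DataFrame here is a made-up example.

```python
import numpy as np
import pandas as pd

# Made-up numeric data for illustration.
data = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

values = data.to_numpy()  # one 2-D array holding all cells
seen = []
for (i, j), value in np.ndenumerate(values):
    # (i, j) are the row and column positions, visited row by row.
    seen.append((i, j, int(value)))

print(seen)
```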

DataFrame iteration performance comparison

Building the data

import pandas as pd
import numpy as np

# Generate sample data: 3000 rows, four columns of uniform random floats
def gen_sample():
    aaa = np.random.uniform(1, 1000, 3000)
    bbb = np.random.uniform(1, 1000, 3000)
    ccc = np.random.uniform(1, 1000, 3000)
    ddd = np.random.uniform(1, 1000, 3000)
    return pd.DataFrame({'aaa': aaa, 'bbb': bbb, 'ccc': ccc, 'ddd': ddd})
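
For repeatable benchmarks it can help to fix the random seed. The variant below is a sketch rather than the author's code: the function name gen_sample_seeded and its parameters are hypothetical, and it uses NumPy's default_rng generator.

```python
import numpy as np
import pandas as pd

def gen_sample_seeded(n_rows=3000, seed=0):
    """Hypothetical seeded variant: same shape and value range as gen_sample."""
    rng = np.random.default_rng(seed)
    names = ["aaa", "bbb", "ccc", "ddd"]
    return pd.DataFrame({name: rng.uniform(1, 1000, n_rows) for name in names})

df = gen_sample_seeded()
print(df.shape)
```

Calling it twice with the same seed returns identical data, so every method is timed on the same input.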

9 iteration methods

# for loop + iloc indexing
def method0_sum(DF):
    for i in range(len(DF)):
        a = DF.iloc[i,0] + DF.iloc[i,1]

# for loop + iat indexing
def method1_sum(DF):
    for i in range(len(DF)):
        a = DF.iat[i,0] + DF.iat[i,1]

# pandas.DataFrame.iterrows() iterator
def method2_sum(DF):
    for index, rows in DF.iterrows():
        a = rows['aaa'] + rows['bbb']

# pandas.DataFrame.apply, row by row, on the whole frame
def method3_sum(DF):
    a = DF.apply(lambda x: x.aaa + x.bbb, axis=1)

# pandas.DataFrame.apply, row by row, on the two needed columns only
def method4_sum(DF):
    a = DF[['aaa','bbb']].apply(lambda x: x.aaa + x.bbb, axis=1)
    
# list comprehension over the zipped columns
def method5_sum(DF):
    a = [ a+b for a,b in zip(DF['aaa'],DF['bbb']) ]

# pandas: add the two columns (Series) directly
def method6_sum(DF):
    a = DF['aaa'] + DF['bbb']

# numpy: add the underlying arrays
def method7_sum(DF):
    a = DF['aaa'].values + DF['bbb'].values
    
# for + itertuples
def method8_sum(DF):
    for row in DF.itertuples():
        a = getattr(row, 'aaa') + getattr(row, 'bbb')

Performance comparison

# %timeit is an IPython/Jupyter magic, so run this in IPython or a notebook
df = gen_sample()
print('for + iloc indexing:')
%timeit method0_sum(df)

df = gen_sample()
print('for + iat indexing:')
%timeit method1_sum(df)

df = gen_sample()
print('apply:')
%timeit method3_sum(df)

df = gen_sample()
print('apply on two columns:')
%timeit method4_sum(df)

df = gen_sample()
print('list comprehension:')
%timeit method5_sum(df)

df = gen_sample()
print('pandas array operation:')
%timeit method6_sum(df)

df = gen_sample()
print('numpy array operation:')
%timeit method7_sum(df)

df = gen_sample()
print('for itertuples')
%timeit method8_sum(df)

df = gen_sample()
print('for iterrows:')
%timeit method2_sum(df)
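
Outside IPython, a similar measurement can be sketched with the standard-library timeit module. The helper best_time and the two functions below are hypothetical names written for this sketch. It reports the best per-call time over several batches, whereas %timeit reports mean ± standard deviation, so the numbers are not directly comparable with the results in this article.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "aaa": np.random.uniform(1, 1000, 3000),
    "bbb": np.random.uniform(1, 1000, 3000),
})

def pandas_sum(DF):
    return DF["aaa"] + DF["bbb"]

def numpy_sum(DF):
    return DF["aaa"].values + DF["bbb"].values

def best_time(func, frame, number=100, repeat=5):
    """Best per-call time in seconds over `repeat` batches of `number` calls."""
    timer = timeit.Timer(lambda: func(frame))
    return min(timer.repeat(repeat=repeat, number=number)) / number

timings = {f.__name__: best_time(f, df) for f in (pandas_sum, numpy_sum)}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} µs per call")
```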

Results:

for + iloc indexing:
225 ms ± 9.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
for + iat indexing:
201 ms ± 6.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
apply:
88.3 ms ± 2.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
apply on two columns:
91.2 ms ± 5.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
list comprehension:
1.12 ms ± 54.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
pandas array operation:
262 µs ± 9.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
numpy array operation:
14.4 µs ± 383 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
for itertuples
6.4 ms ± 265 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
for iterrows:
330 ms ± 22.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In this run the NumPy array operation was fastest (14.4 µs) and iterrows slowest (330 ms). Among the loops that go through DataFrame accessors (iloc, iat, iterrows, itertuples), itertuples was fastest at 6.4 ms.
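
To put the figures on one scale, the sketch below converts the mean timings reported above to microseconds and ranks them relative to the fastest method. The numbers are copied from the results; the English labels are shorthand chosen for this sketch.

```python
# Mean timings from the results above, in microseconds.
timings_us = {
    "for + iloc": 225e3,
    "for + iat": 201e3,
    "apply": 88.3e3,
    "apply on two columns": 91.2e3,
    "list comprehension": 1.12e3,
    "pandas array operation": 262.0,
    "numpy array operation": 14.4,
    "itertuples": 6.4e3,
    "iterrows": 330e3,
}

fastest = min(timings_us.values())
ranked = sorted(timings_us.items(), key=lambda item: item[1])

for name, us in ranked:
    # Ratio column: how many times slower than the fastest method.
    print(f"{name:24s} {us:>10.1f} µs {us / fastest:>8.0f}x")
```

These ratios hold only for this machine and this 3000-row sample; they are likely to shift with data size and pandas version.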