A Guide to Common Python Functions for Statistical Analysis and Data Processing

Updated: 2024-07-19 11:07:39 Author: DB_UP

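This article walks through common Python functions for statistical analysis and data processing. I hope it is a useful reference; if you spot mistakes or cases I have not considered, corrections are welcome.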

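I. Grouped statistics

1. Computing a different statistic per column after grouping

# Aggregate each column with its own statistic; string names avoid the
# deprecation warning that newer pandas versions emit for np.mean/np.max/np.min
df = data.groupby(['city', 'date']).agg({'tem': 'mean', 'tem_max': 'max', 'tem_min': 'min'})
# as_index=False keeps the group keys as ordinary columns instead of using them as the index
la = launch.groupby(['user_id', 'launch_day'], as_index=False).agg({'launch': 'sum'})
_funcs = ['mean', 'std', 'sum']
# Loop over each statistic
for _func in _funcs:
    # Compute the statistic for every sample (row) across the selected columns
    df[f'P2_C2-C5_{_func}'] = raw[['A10', 'A12', 'A15', 'A17']].agg(_func, axis=1)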
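The grouped aggregation can be tried on a small frame. This is a minimal sketch: the readings are made up, and only the column names are borrowed from the snippet above.

```python
import pandas as pd

# Hypothetical weather readings (values invented for illustration)
data = pd.DataFrame({
    'city': ['A', 'A', 'B', 'B'],
    'date': ['d1', 'd1', 'd1', 'd1'],
    'tem': [10.0, 20.0, 30.0, 50.0],
    'tem_max': [12.0, 25.0, 33.0, 55.0],
    'tem_min': [8.0, 15.0, 28.0, 45.0],
})

# One statistic per column, each named by a string
stats = data.groupby(['city', 'date']).agg({'tem': 'mean', 'tem_max': 'max', 'tem_min': 'min'})

# as_index=False keeps 'city' and 'date' as regular columns
flat = data.groupby(['city', 'date'], as_index=False).agg({'tem': 'mean'})
```

With the default as_index=True the result carries a (city, date) MultiIndex, which is why `flat` is often more convenient for later merges.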

2. Multiplying several columns by the same column

def calculate_profit(load_price):
    """Compute the profit for each season."""
    # Seasons: 春季 (spring), 夏季 (summer), 秋季 (autumn), 冬季 (winter)
    for season in ['春季', '夏季', '秋季', '冬季']:
        column_name = f'{season}充放电负荷'  # charge/discharge load for the season
        profit_column_name = f'{season}利润'  # profit for the season
        # Profit = seasonal load * electricity price (电价), computed row by row
        load_price[profit_column_name] = load_price.apply(lambda row: row[column_name] * row['电价'], axis=1)
    return load_price
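A row-wise apply works, but the same multiplication can usually be done in one vectorised call. The sketch below is an alternative, not the author's method; the English column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical loads and prices (names and values are illustrative only)
load_price = pd.DataFrame({
    'spring_load': [1.0, 2.0],
    'summer_load': [3.0, 4.0],
    'price': [0.5, 2.0],
})

load_cols = ['spring_load', 'summer_load']
# mul(..., axis=0) aligns the price Series with the rows and
# multiplies every selected column by it
profit = load_price[load_cols].mul(load_price['price'], axis=0)
profit.columns = [c.replace('_load', '_profit') for c in load_cols]
result = pd.concat([load_price, profit], axis=1)
```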

II. Key functions (apply, map, filter, query, replace, reduce)

1. Statistics across the columns of a DataFrame: apply + lambda

# axis=1 applies the lambda to each row, so this is the mean across columns per row
Weekday_Spring_Autumn['weekday_mean'] = Weekday_Spring_Autumn.apply(lambda x: x.mean(), axis=1)
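To see what axis=1 does, here is a tiny sketch on made-up data. It also shows the built-in DataFrame.mean(axis=1), which gives the same result without a lambda.

```python
import pandas as pd

# Hypothetical load values for two days
load = pd.DataFrame({'mon': [1.0, 4.0], 'tue': [3.0, 8.0]})

# apply with axis=1 passes each row to the lambda as a Series
by_apply = load.apply(lambda x: x.mean(), axis=1)

# Equivalent built-in reduction across columns
by_mean = load.mean(axis=1)
```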

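2. Type conversion (float to int): the map() function

The built-in map function takes two arguments: a function object (which can also be a lambda expression) and a sequence.

list(map(lambda x: x * 2, [1, 2, 3, 4, 5]))
# Output: [2, 4, 6, 8, 10]
# Each element of the sequence is passed to the lambda as its argument.
# In Python 3, map returns a lazy iterator, so wrap it in list() to see the values.

# Convert the first column to int; assigning by column label replaces the column,
# whereas normal.iloc[:, 0] = ... may keep the old float dtype in pandas 2.x
first_col = normal.columns[0]
normal[first_col] = normal[first_col].map(int)
# Take characters 1 to 19 of each string and parse them as a datetime
df['time_interval_begin'] = pd.to_datetime(df['time_interval'].map(lambda x: x[1:20]))
# Replace a malformed value with 70, then cast the column to int
df_trn['A25'] = df_trn['A25'].replace('1900/3/10 0:00', 70).astype(int)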
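The sketch below runs both flavours of map on made-up data. The "[start,end)" layout of the interval string is an assumption, chosen so that the x[1:20] slice picks out a timestamp.

```python
import pandas as pd

# Hypothetical interval string of the form "[start,end)"
df = pd.DataFrame({'time_interval': ['[2017-03-01 00:00:00,2017-03-01 00:02:00)']})

# Characters 1..19 hold the start timestamp (the leading '[' is skipped)
df['time_interval_begin'] = pd.to_datetime(df['time_interval'].map(lambda x: x[1:20]))

# Built-in map over a plain list; list() materialises the lazy iterator
doubled = list(map(lambda x: x * 2, [1, 2, 3, 4, 5]))
```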

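2.1 Removing surrounding whitespace: map + strip

# Use str.strip to remove whitespace on both sides
df2['Chinese'] = df2['Chinese'].map(str.strip)
# Remove whitespace on the left
df2['Chinese'] = df2['Chinese'].map(str.lstrip)
# Remove whitespace on the right
df2['Chinese'] = df2['Chinese'].map(str.rstrip)
# To remove a specific leading or trailing character such as '$', pass it to str.strip
df2['Chinese'] = df2['Chinese'].str.strip('$')
# Lowercase any letters in the ID card number
result_owner.card = result_owner.card.map(lambda x: x.lower())
# Keep only the rows whose ID card number is exactly 18 characters long
result_card = result_owner[result_owner['card'].str.len() == 18]
# Age (年龄) relative to 2021: characters 6 to 9 of the ID card number hold the birth year
result_card_new['年龄'] = [2021 - int(x[6:10]) for x in result_card_new['card']]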
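The same cleaning steps can be chained with the vectorised .str accessor. This is a sketch under stated assumptions: the card values are invented 18-character strings, and 2021 is kept as the reference year from the snippet above.

```python
import pandas as pd

# Hypothetical owner records; the card values are made up
owners = pd.DataFrame({'card': [' 11010119900307123X ', '123', '$440301198512120011$']})

# Vectorised equivalents of map(str.strip) and map(str.lower)
owners['card'] = owners['card'].str.strip().str.strip('$').str.lower()

# Keep 18-character values only; .copy() avoids a SettingWithCopyWarning
# on the assignment that follows
valid = owners[owners['card'].str.len() == 18].copy()

# Characters 6..9 hold the birth year
valid['age'] = 2021 - valid['card'].str[6:10].astype(int)
```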

3. Filtering data: the filter() function

The filter function is similar to map. It also takes two arguments: a function (here a lambda expression) and a sequence.

It iterates over every element of the sequence and passes it to the lambda expression. An element is kept when the expression returns True and discarded when it returns False.

# Collect the column names that end with 't'
cols_timer = list(filter(lambda x: x.endswith('t'), df_trn_tst.columns))
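A short sketch with hypothetical column names shows the filter call next to the equivalent list comprehension, which many Python programmers find easier to read.

```python
# Hypothetical column names
columns = ['A5_t', 'A6', 'start', 'B1']

# filter keeps the elements for which the lambda returns True
cols_timer = list(filter(lambda x: x.endswith('t'), columns))

# Equivalent list comprehension
same = [c for c in columns if c.endswith('t')]
```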

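4. np.where and query

# np.where(condition, x, y) returns x where the condition holds and y elsewhere.
# With only the condition, it returns the indices of the elements that are True (non-zero).
# Wrap negative durations (in minutes) past midnight by adding one day
duration = np.where(duration < 0, duration + 24*60, duration)
# Select the rows whose 收率 (yield) is greater than 0.8671
df_trn.query('收率 > 0.8671')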
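Both forms of np.where and a query call can be checked on a few values. The data and the English column name `yield_rate` are hypothetical; the `@threshold` syntax is how query refers to a Python variable.

```python
import numpy as np
import pandas as pd

# Durations in minutes; a negative value means the interval crossed midnight
duration = np.array([30, -1380, 90])
wrapped = np.where(duration < 0, duration + 24 * 60, duration)

# With the condition alone, np.where returns a tuple of index arrays
negative_idx = np.where(duration < 0)[0]

# query with a hypothetical column name and an external variable
df_trn = pd.DataFrame({'yield_rate': [0.85, 0.90, 0.87]})
threshold = 0.8671
high = df_trn.query('yield_rate > @threshold')
```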

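5. replace (two forms: replacing whole values, and replacing text inside values)

# Replace the substring 'G3i' with '小鹏G3' (XPeng G3) inside each value
result_dfa['vehicles'] = result_dfa['vehicles'].str.replace('G3i', '小鹏G3', regex=False)
# Replace whole values: map both regional FAW Toyota names to '一汽丰田' (FAW Toyota)
result_dfa['brand'] = result_dfa['brand'].replace(['天津一汽丰田', '四川一汽丰田（长春丰越）'], '一汽丰田')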
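The difference between the two forms is easy to miss, so here is a minimal sketch on made-up values: Series.replace matches whole values, while Series.str.replace substitutes inside each string.

```python
import pandas as pd

# Hypothetical vehicle names
s = pd.Series(['G3i', 'G3i 2020', 'P7'])

# Series.replace matches whole values only, so 'G3i 2020' is left untouched
whole = s.replace('G3i', 'G3')

# Series.str.replace substitutes inside every string
inside = s.str.replace('G3i', 'G3', regex=False)
```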

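6. The reduce function

reduce works along similar lines. It first applies the lambda to the first and second elements of the sequence, then to that result and the third element, then to the new result and the fourth element, and so on until no elements remain.

from functools import reduce  # in Python 3, reduce lives in functools

reduce(lambda x, y: x + y, [1, 2, 3, 4])
# Output: 10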
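One place reduce shows up in data work is folding a list of DataFrames into one by repeated merging. The article does not do this; the sketch below is an illustration on hypothetical frames that share an `id` column.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames that share the key column 'id'
frames = [
    pd.DataFrame({'id': [1, 2], 'a': [10, 20]}),
    pd.DataFrame({'id': [1, 2], 'b': [30, 40]}),
    pd.DataFrame({'id': [1, 2], 'c': [50, 60]}),
]

# Merge the first two frames, then merge that result with the third
merged = reduce(lambda left, right: left.merge(right, on='id'), frames)

# The plain numeric case: ((1 + 2) + 3) + 4
total = reduce(lambda x, y: x + y, [1, 2, 3, 4])
```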

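III. Row and column operations

1. Dropping columns and rows

# Drop two columns ('ind' and '年', year); axis=1 targets columns, axis=0 targets rows
year_elect_data = year_elect_data.drop(['ind', '年'], axis=1)
# Delete the 收率 (yield) column in place
del df_trn['收率']

# Drop the columns at positions 1 and 2
industry_data_new.drop(industry_data_new.columns[1:3], axis=1, inplace=True)

# Drop the 'Chinese' (language class score) column
df2 = df2.drop(columns=['Chinese'])
# Drop the row labelled 'ZhangFei'
df2 = df2.drop(index=['ZhangFei'])

# Drop rows with a duplicated vin, keeping the first occurrence
result_vin = result_df.drop_duplicates(subset='vin', keep='first')

# Drop the rows whose vehicles value is 'EC60'
result_dfa = result_dfa[~result_dfa['vehicles'].isin(['EC60'])]
result_dfa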
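The dropping variants can be compared side by side on a small frame. The scores and vin values below are made up; the row labels follow the article's 'ZhangFei' example.

```python
import pandas as pd

# Hypothetical score table
df2 = pd.DataFrame(
    {'Chinese': [66, 95, 93], 'English': [65, 85, 92], 'vin': ['v1', 'v1', 'v2']},
    index=['ZhangFei', 'GuanYu', 'ZhaoYun'],
)

no_col = df2.drop(columns=['Chinese'])                        # drop a column by name
no_row = df2.drop(index=['ZhangFei'])                         # drop a row by label
unique_vin = df2.drop_duplicates(subset='vin', keep='first')  # first row per vin
filtered = df2[~df2['vin'].isin(['v2'])]                      # drop rows by value
```

None of these calls modifies `df2`; each returns a new DataFrame.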

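2. Finding columns of a given type (strings)

# Select the object-dtype (string) columns
obj_columns = df.select_dtypes(['object'])
col = obj_columns.columns
# Drop the string columns
df.drop(columns=col, inplace=True)

Selecting the rows where a column is not null: notnull()

# Attribute access and bracket access are equivalent here
ind = df[df.data.notnull()]
ind = df[df['data'].notnull()]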

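3. Renaming columns

# Rename the 'index' column to '指标名称' (indicator name)
Year_Elect = Year_elect.rename(columns={"index": "指标名称"})
# Append '_t' to every column listed in cols_timer; rename returns a new DataFrame,
# so do not combine the assignment with inplace=True (that would assign None)
Year_elect = Year_elect.rename(columns={_col: _col + '_t' for _col in cols_timer})
# Rename all columns at once with a function
df = df.rename(columns=lambda x: x + '_1')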
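A common pitfall with rename is mixing assignment with inplace=True. The sketch below, on a hypothetical two-column frame, shows that an in-place call returns None.

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2]})

# Without inplace, rename returns a new DataFrame and leaves df unchanged
renamed = df.rename(columns={'a': 'a_t'})

# With inplace=True, df itself is modified and the return value is None
returned = df.rename(columns={'b': 'b_t'}, inplace=True)

# A function is applied to every column name
suffixed = df.rename(columns=lambda x: x + '_1')
```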

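4. Columns containing a particular string: contains()

# Boolean mask of the column names that contain '_x'
Industry_Data.columns.str.contains('_x')
# Extract the rows that contain the string 'hello'
# Method 1: exact match (the whole value equals 'hello')
df[df['ass'] == 'hello']
# Method 2: substring match; the mask is not named re, to avoid shadowing the re module
mask = df['ass'].str.contains('hello')
df[mask]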
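The two methods do not select the same rows: equality needs the whole value to match, while str.contains matches substrings. A minimal sketch on made-up data, with na=False added so that missing values count as non-matches:

```python
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({'ass': ['hello', 'hello world', None, 'bye'], 'val_x': [1, 2, 3, 4]})

# Exact match: only rows whose value is exactly 'hello'
exact = df[df['ass'] == 'hello']

# Substring match; na=False treats missing values as False
partial = df[df['ass'].str.contains('hello', na=False)]

# The same accessor works on column names
x_cols = df.loc[:, df.columns.str.contains('_x')]
```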

5. Selecting the industries you need: isin() and its negation

# Keep the rows whose 行业名称 (industry name) appears in the 指标名称 (indicator name) column of industry_division
ind_data = ind[ind.行业名称.isin(industry_division.指标名称)]
# Remove the statutory holidays
Spring_Autumn_weekend = Spring_Autumn_weekend[~Spring_Autumn_weekend.isin(Statutory_holidays)]
# Keep the rows that satisfy a condition: mobile numbers that are 11 characters long
result_vin_mobile = result_vin[result_vin['mobile'].str.len() == 11]

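6. Sorting by several columns: sort_values()

# Sort by 最高温度 (maximum temperature), then 行业 (industry). sort_values returns a
# new DataFrame, so inplace=True is dropped (with it, dff would be None)
dff = One_Weather_data.sort_values(['最高温度', '行业'])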

7. Counting how often each language appears in the gram column

df['gram'].value_counts()

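8. Filling missing values with the mean of the neighbouring values

# interpolate() fills each gap linearly; for a single missing value this is
# the mean of the values directly before and after it
df['score'] = df['score'].fillna(df['score'].interpolate())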
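For a single gap, linear interpolation equals the mean of the two neighbours. For a run of several missing values it spreads them evenly between the surrounding values, as this sketch on made-up scores shows.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one single gap and one double gap
score = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 9.0])

# Default interpolation is linear between the nearest known values
filled = score.fillna(score.interpolate())
```

Here the single gap becomes 2.0, and the double gap between 3.0 and 9.0 becomes 5.0 and 7.0.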

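9. Swapping the positions of two columns

# Method 1: remove the column, then insert it at position 0
temp = df['gram']
df.drop(labels=['gram'], axis=1, inplace=True)
df.insert(0, 'gram', temp)
# Method 2: reorder by position (this keeps only the first two columns, in swapped order)
cols = df.columns[[1, 0]]
df = df[cols]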

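10. Adding a column that splits the data into three groups by salary

bins = [0, 5000, 20000, 50000]
group_name = ['低', '中', '高']  # low, medium, high
# Intervals are right-closed by default: (0, 5000], (5000, 20000], (20000, 50000]
df['categories'] = pd.cut(df['salary'], bins, labels=group_name)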
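The edge behaviour of pd.cut is worth checking on a few values. In this sketch the salaries are made up and the labels are English stand-ins for 低/中/高.

```python
import pandas as pd

# Hypothetical salaries, including values on and beyond the bin edges
salary = pd.Series([3000, 5000, 12000, 50000, 60000])
bins = [0, 5000, 20000, 50000]
labels = ['low', 'mid', 'high']

# Right-closed bins: 5000 falls in 'low', 50000 in 'high';
# 60000 lies outside every bin and becomes NaN
groups = pd.cut(salary, bins, labels=labels)
```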

11. Combining two columns into one

df['test'] = df['edf'] + df['crt']  # both columns are strings
df['test1'] = df['salary'].map(str) + df['crt']  # salary is an int column, so convert it to str first

12. Splitting a column into a new DataFrame

# Split 行业 (industry) on '-' into separate columns
df1 = df['行业'].str.split('-', expand=True)
df1.columns = ['编号', '行业']  # code, industry
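Splitting and concatenating can be tried together. This sketch uses hypothetical English column names and passes n=1, an addition not in the article, so that values with more than one '-' still yield exactly two columns.

```python
import pandas as pd

# Hypothetical industry codes of the form "code-name"
df = pd.DataFrame({'industry': ['01-Agriculture', '02-Mining'], 'salary': [5000, 8000]})

# n=1 splits on the first '-' only; expand=True returns a DataFrame
df1 = df['industry'].str.split('-', n=1, expand=True)
df1.columns = ['code', 'name']

# Numbers must be converted to str before string concatenation
df['label'] = df['salary'].map(str) + '_' + df1['code']
```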

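13. Subtracting one column from several columns

# .values[:, None] reshapes the reference values into an (n, 1) column so that
# broadcasting subtracts them from every column of Spring_Air_Elec
Spring_Air = Spring_Air_Elec - Spring_Air_Reference_Elec.values[:, None]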
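The broadcasting trick can be compared with DataFrame.sub(..., axis=0), which does the same subtraction while aligning on the index. The frame and reference values below are hypothetical.

```python
import pandas as pd

# Hypothetical hourly loads and a per-row reference load
elec = pd.DataFrame({'h1': [10.0, 20.0], 'h2': [30.0, 40.0]})
reference = pd.Series([1.0, 2.0])

# (n, 1) array broadcast across all columns; positions, not labels, are matched
by_broadcast = elec - reference.values[:, None]

# Label-aligned equivalent
by_sub = elec.sub(reference, axis=0)
```

The .values version ignores index labels, so it is only safe when both objects are in the same row order.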

IV. DataFrames

1. Standardising a DataFrame within groups

# z-score within each group; np.std uses the population standard deviation (ddof=0)
df['data_standardized'] = df.groupby('group')['data'].transform(lambda x: (x - np.mean(x)) / np.std(x))
# Take the last value of each group
df['data_last'] = df.groupby('group')['data'].transform('last')
# Fill missing values with the mean of their group
df['data_filled'] = df.groupby('group')['data'].transform(lambda x: x.fillna(x.mean()))
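transform returns a result with the same length as the input, which is what makes these column assignments line up. A minimal sketch on made-up groups:

```python
import numpy as np
import pandas as pd

# Hypothetical grouped data with one missing value
df = pd.DataFrame({'group': ['a', 'a', 'b', 'b', 'b'],
                   'data': [1.0, 3.0, 10.0, np.nan, 30.0]})

# Fill each gap with the mean of its own group
df['data_filled'] = df.groupby('group')['data'].transform(lambda x: x.fillna(x.mean()))

# Broadcast the last value of each group to all of its rows
df['data_last'] = df.groupby('group')['data'].transform('last')

# z-score within each group, using the population standard deviation
df['z'] = df.groupby('group')['data_filled'].transform(
    lambda x: (x - x.mean()) / x.std(ddof=0))
```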

2. Pivot tables

# Columns: 商业销量 (commercial sales volume), 年月 (year-month), 品牌 (brand), 经销商 (dealer), 系列 (series)
# Single index; when aggfunc is not set it defaults to 'mean'; fill_value=0 fills empty cells with 0
df_pivot = pd.pivot_table(df, values='商业销量', index=['年月'], aggfunc='sum', fill_value=0)
# Multiple index levels; aggfunc also accepts a list such as ['mean', len, 'sum']
# or a per-column dict such as {'数量': len, '价格': 'sum'} (数量 = quantity, 价格 = price)
df_pivot = pd.pivot_table(df, values='商业销量', index=['品牌', '经销商', '年月'], columns=['系列'], aggfunc='sum', fill_value=0)
# Filtering a pivot table
df_pivot = pd.pivot_table(df, values='商业销量', index=['品牌', '经销商', '年月'], columns=['系列'], aggfunc='sum', fill_value=0)
df = df_pivot.reset_index()  # turn the index levels back into columns

# Keep the rows for the brands 奔驰 (Mercedes-Benz) and 宝马 (BMW)
df_pivot.query("品牌 == ['奔驰', '宝马']")
# get_level_values returns the values of one index level
df_pivot.index.get_level_values(0).unique()  # brands
df_pivot.index.get_level_values(1).unique()  # dealers
# xs takes one key per level: use xs for a single brand and loc for several brands
df_pivot.xs('奔驰', level=0)
df_pivot.loc[['奔驰', '宝马']]
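The pivot-and-filter steps can be run end to end on a few rows. This is a sketch: the sales figures are made up, and English column names stand in for the article's Chinese ones.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    'brand': ['Benz', 'Benz', 'BMW', 'Audi'],
    'dealer': ['d1', 'd1', 'd2', 'd3'],
    'series': ['C', 'E', 'X3', 'A4'],
    'volume': [5, 7, 3, 4],
})

# Rows are (brand, dealer) pairs, columns are series, cells are summed volume
pivot = pd.pivot_table(sales, values='volume', index=['brand', 'dealer'],
                       columns=['series'], aggfunc='sum', fill_value=0)

# query can refer to index level names; comparing with a list acts like isin
two_brands = pivot.query("brand == ['Benz', 'BMW']")

# xs selects one key on one level and drops that level from the result
benz_only = pivot.xs('Benz', level='brand')

brands = pivot.index.get_level_values('brand').unique().tolist()
```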

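3. Percentage of zeros and missing values per column

null_per = 100 * df.isnull().sum(axis=0) / len(df)  # percentage of missing values
zero_per = 100 * (df == 0).sum(axis=0) / len(df)  # percentage of zeros
new = null_per + zero_per  # combined percentage

4. Checking whether the data contains any missing values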

df.isnull().values.any()

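5. Shifting the data forward or backward by 5 days

data.shift(-5)  # each row takes the value from 5 rows later
data.shift(5)   # each row takes the value from 5 rows earlier

6. Weekly resampling, taking the maximum of each week

# Requires a DatetimeIndex; 收盘价 is the closing price
data['收盘价'].resample('W').max()

7. Rate of change from one day to the next

# (current value - previous value) / previous value
data['收盘价'].pct_change()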
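The three time-series calls can be checked on a short daily series. The prices are made up, and the series starts on Monday 2024-01-01 so that the weekly bins (which end on Sunday by default) are easy to follow.

```python
import pandas as pd

# Hypothetical closing prices for 14 consecutive days
idx = pd.date_range('2024-01-01', periods=14, freq='D')
close = pd.Series(range(100, 114), index=idx, dtype='float64')

lagged = close.shift(5)    # value from 5 days earlier; first 5 rows become NaN
lead = close.shift(-5)     # value from 5 days later; last 5 rows become NaN

# 'W' bins end on Sunday: 2024-01-07 and 2024-01-14
weekly_max = close.resample('W').max()

# Relative change from the previous day; the first row is NaN
change = close.pct_change()
```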

8. Displaying decimals as percentages

# Styler formatting only changes how the 'data' column is displayed, not the stored values
df.style.format({'data': '{:.2%}'})

9. List comprehensions: adding a new column

# Label each row with its month number followed by '_weekday'
Season_Weekday['Time_label'] = [str(i) + '_weekday' for i in pd.to_datetime(Season_Weekday.index).month]

# From a single iterable
[i for i in range(9) if i % 2 == 0]
# From two iterables
[(i, j) for i in range(5) for j in range(5)]

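10. Creating an Excel file and writing data to it

# ExcelWriter.save() was removed in pandas 2.0; the with block saves and closes the file.
# The folder name 各行业重点企业清单 means "list of key enterprises by industry".
with pd.ExcelWriter(r'各行业重点企业清单\%s.xlsx' % (hy[0])) as writer:
    jl.to_excel(writer, sheet_name='xx')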

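V. Time data

1. Selecting a date range: date_range

import pandas as pd

# Year is assumed to be defined beforehand, for example Year = 2024
Start = pd.to_datetime('%s-04-15' % Year)
End = pd.to_datetime('%s-05-15' % Year)
Tyday = pd.date_range(start=Start, end=End)  # one entry per day, both endpoints included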

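2. Converting a time column to month-day (or month, or year-month-day)

# Format the first column (assumed to hold datetimes) as month-day strings
first_col = df.columns[0]
df[first_col] = df[first_col].dt.strftime("%m-%d")
# Year-month string for each date
Weather['Year_Month'] = Weather['Date'].map(lambda x: x.to_pydatetime().strftime("%Y-%m"))
# Parse yyyymm values into datetimes
Industry_Factor_Data["Date"] = pd.to_datetime(Industry_Factor_Data["Date"], format='%Y%m')
# Parse yyyymmdd values into datetimes
df['order_date'] = pd.to_datetime(df.order_dt, format="%Y%m%d")
# Truncate each date to the first day of its month
df['month'] = df.order_date.values.astype('datetime64[M]')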
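The conversions can be exercised on a few made-up order dates. This sketch uses the .dt accessor throughout, and truncates to month start via to_period, which is an alternative to the NumPy cast above rather than the article's method.

```python
import pandas as pd

# Hypothetical order dates stored as yyyymmdd integers
df = pd.DataFrame({'order_dt': [20240115, 20240228, 20240301]})

# Convert to str first so the format string applies unambiguously
df['order_date'] = pd.to_datetime(df['order_dt'].astype(str), format='%Y%m%d')

df['month_day'] = df['order_date'].dt.strftime('%m-%d')
df['year_month'] = df['order_date'].dt.strftime('%Y-%m')

# First day of each month
df['month_start'] = df['order_date'].dt.to_period('M').dt.to_timestamp()

# Daily range with both endpoints included
days = pd.date_range(start='2024-04-15', end='2024-05-15')
```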

Suppressing warnings

import warnings
warnings.filterwarnings("ignore")

Summary

The above reflects my personal experience. I hope it gives you a useful reference, and I hope you will continue to support Jiaoben Zhijia (脚本之家).
