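Examples of Selecting Subsets of Data with Pandas
Data Analysis: How to Select Subsets of Data with Pandas
How do you select a single column, a single row, or a sub-region from the data in a DataFrame?
Selecting a single column
For example, if you are only interested in the ages of the passengers in the Titanic table (assumed here to be already loaded into a DataFrame named titanic), you can do this: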
In [4]: ages = titanic["Age"]

In [5]: ages.head()
Out[5]:
0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: Age, dtype: float64

In [6]: type(titanic["Age"])
Out[6]: pandas.core.series.Series

In [7]: titanic["Age"].shape
Out[7]: (891,)
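The session above shows that selecting one label with single brackets returns a one-dimensional Series. As a minimal sketch, the following contrasts that with passing a one-element list, which returns a DataFrame instead. The small table `df` is a hypothetical stand-in for the Titanic data, and its values are illustrative only.

```python
import pandas as pd

# Hypothetical miniature stand-in for the Titanic table (illustrative values).
df = pd.DataFrame(
    {
        "Name": ["A", "B", "C"],
        "Age": [22.0, 38.0, 26.0],
        "Sex": ["male", "female", "female"],
    }
)

as_series = df["Age"]    # single label: one-dimensional Series
as_frame = df[["Age"]]   # list with one label: two-dimensional DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
```

The shape of a Series has a single entry (the row count), while the one-column DataFrame keeps both dimensions.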
Selecting multiple columns
For example, if you want to study several attributes of the Titanic table together, such as the sex of each passenger in addition to the age, you can do this:
In [8]: age_sex = titanic[["Age", "Sex"]]

In [9]: age_sex.head()
Out[9]:
    Age     Sex
0  22.0    male
1  38.0  female
2  26.0  female
3  35.0  female
4  35.0    male

In [10]: type(titanic[["Age", "Sex"]])
Out[10]: pandas.core.frame.DataFrame

In [11]: titanic[["Age", "Sex"]].shape
Out[11]: (891, 2)
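Two details of list-based column selection may be worth checking on a small example: the result follows the order of the list rather than the order in the original table, and a label that does not exist raises a `KeyError`. The table `df` and the flag `missing_label_raised` below are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical miniature table (illustrative values).
df = pd.DataFrame(
    {
        "Name": ["A", "B"],
        "Age": [22.0, 38.0],
        "Sex": ["male", "female"],
    }
)

# The columns come back in the order given in the list.
sex_age = df[["Sex", "Age"]]
print(list(sex_age.columns))  # ['Sex', 'Age']

# "Cabin" is not a column of this small table, so the selection fails.
try:
    df[["Age", "Cabin"]]
    missing_label_raised = False
except KeyError:
    missing_label_raised = True

print(missing_label_raised)  # True
```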
Filtering rows by value
For example, suppose you are interested in the passengers in the Titanic table who are older than 35:
In [12]: above_35 = titanic[titanic["Age"] > 35]

In [13]: above_35.head()
Out[13]:
    PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
1             2         1       1  ...  71.2833   C85         C
6             7         0       1  ...  51.8625   E46         S
11           12         1       1  ...  26.5500  C103         S
13           14         0       3  ...  31.2750   NaN         S
15           16         1       2  ...  16.0000   NaN         S

[5 rows x 12 columns]

In [15]: above_35.shape
Out[15]: (217, 12)
In fact, the condition inside the brackets is itself a Series of Boolean values, one per row; only the rows marked True are kept:
In [14]: titanic["Age"] > 35
Out[14]:
0      False
1       True
2      False
3      False
4      False
       ...
886    False
887    False
888    False
889    False
890    False
Name: Age, Length: 891, dtype: bool
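To make the Boolean mask concrete, here is a minimal sketch on a hypothetical four-element `ages` Series. It also shows one behaviour that matters for the Age column: a missing value (NaN) compares as False, so rows with unknown age are silently excluded by a condition such as `> 35`.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including one missing value.
ages = pd.Series([22.0, 38.0, np.nan, 54.0], name="Age")

mask = ages > 35           # element-wise comparison, yields a bool Series
print(mask.tolist())       # [False, True, False, True]

# True counts as 1, so summing the mask counts the matching rows.
print(int(mask.sum()))     # 2

# Indexing with the mask keeps only the rows where it is True.
selected = ages[mask]
print(selected.tolist())   # [38.0, 54.0]
```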
If you are also interested in the passenger class (the Pclass column) and want only the passengers in classes 2 and 3, you can do this:
In [16]: class_23 = titanic[titanic["Pclass"].isin([2, 3])]

In [17]: class_23.head()
Out[17]:
   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
2            3         1       3  ...   7.9250   NaN         S
4            5         0       3  ...   8.0500   NaN         S
5            6         0       3  ...   8.4583   NaN         Q
7            8         0       3  ...  21.0750   NaN         S

[5 rows x 12 columns]

# Equivalent to:

In [18]: class_23 = titanic[(titanic["Pclass"] == 2) | (titanic["Pclass"] == 3)]

In [19]: class_23.head()
Out[19]:
   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
2            3         1       3  ...   7.9250   NaN         S
4            5         0       3  ...   8.0500   NaN         S
5            6         0       3  ...   8.4583   NaN         Q
7            8         0       3  ...  21.0750   NaN         S

[5 rows x 12 columns]
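The equivalence between `isin` and the `|` form can be checked directly on a small table. The sketch below uses a hypothetical five-row `df` and also shows the related element-wise operators `&` (and) and `~` (not). Each comparison is wrapped in parentheses because `|` and `&` bind more tightly than `==` and `>` in Python.

```python
import pandas as pd

# Hypothetical miniature table (illustrative values).
df = pd.DataFrame(
    {
        "Pclass": [1, 2, 3, 3, 1],
        "Age": [40.0, 30.0, 36.0, 20.0, 50.0],
    }
)

by_isin = df[df["Pclass"].isin([2, 3])]
by_or = df[(df["Pclass"] == 2) | (df["Pclass"] == 3)]
print(by_isin.equals(by_or))        # True

# Combine two conditions with & (both must hold).
older_in_23 = df[df["Pclass"].isin([2, 3]) & (df["Age"] > 35)]
print(list(older_in_23.index))      # [2]

# Negate a condition with ~ (everything outside classes 2 and 3).
first_class = df[~df["Pclass"].isin([2, 3])]
print(list(first_class.index))      # [0, 4]
```

Python's `and`, `or`, and `not` keywords do not work element-wise on a Series, which is why the operator forms are used here.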
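In data cleaning, you often need to pick out the rows whose values are NA, or the rows whose values are not NA, and handle them separately. You can do this: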
In [20]: age_no_na = titanic[titanic["Age"].notna()]

In [21]: age_no_na.head()
Out[21]:
   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
1            2         1       1  ...  71.2833   C85         C
2            3         1       3  ...   7.9250   NaN         S
3            4         1       1  ...  53.1000  C123         S
4            5         0       3  ...   8.0500   NaN         S

[5 rows x 12 columns]

In [22]: age_no_na.shape
Out[22]: (714, 12)
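As a sketch of the "NA or non-NA" split described above, the following partitions a hypothetical four-row `df` into rows with a known age and rows with a missing age, using `notna()` and its complement `isna()`. It also checks that `dropna(subset=["Age"])` keeps the same rows as the `notna()` filter.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature table with two missing ages.
df = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Age": [22.0, np.nan, 26.0, np.nan],
    }
)

known = df[df["Age"].notna()]    # rows where Age is present
missing = df[df["Age"].isna()]   # rows where Age is NA

print(known.shape, missing.shape)  # (2, 2) (2, 2)

# The two parts together cover every row exactly once.
print(len(known) + len(missing) == len(df))  # True

# dropna restricted to the Age column selects the same rows as notna().
print(known.equals(df.dropna(subset=["Age"])))  # True
```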
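Selecting specific rows and columns
For example, suppose you are interested in the names of the Titanic passengers who are older than 35. With loc, the part before the comma selects the rows and the part after it selects the columns: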
In [23]: adult_names = titanic.loc[titanic["Age"] > 35, "Name"]

In [24]: adult_names.head()
Out[24]:
1     Cumings, Mrs. John Bradley (Florence Briggs Th...
6                               McCarthy, Mr. Timothy J
11                             Bonnell, Miss. Elizabeth
13                          Andersson, Mr. Anders Johan
15                      Hewlett, Mrs. (Mary D Kingcome)
Name: Name, dtype: object
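The same `loc` pattern accepts a list of column labels, and it can also appear on the left-hand side of an assignment to change only the selected cells. The sketch below illustrates both on a hypothetical three-row `df`; the replacement value "anonymous" is purely illustrative.

```python
import pandas as pd

# Hypothetical miniature table (illustrative values).
df = pd.DataFrame(
    {
        "Name": ["A", "B", "C"],
        "Age": [22.0, 38.0, 54.0],
        "Sex": ["male", "female", "male"],
    }
)

# One column label gives a Series; a list of labels gives a DataFrame.
names = df.loc[df["Age"] > 35, "Name"]
name_sex = df.loc[df["Age"] > 35, ["Name", "Sex"]]
print(names.tolist())    # ['B', 'C']
print(name_sex.shape)    # (2, 2)

# Assigning through loc modifies only the matching rows of that column.
df.loc[df["Age"] > 35, "Name"] = "anonymous"
print(df["Name"].tolist())  # ['A', 'anonymous', 'anonymous']
```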
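If you are interested in rows 10 through 25 and columns 3 through 5, select by position with iloc. Positions are zero-based and the end of each slice is excluded, so the slices are 9:25 and 2:5: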
In [25]: titanic.iloc[9:25, 2:5]
Out[25]:
    Pclass                                 Name     Sex
9        2  Nasser, Mrs. Nicholas (Adele Achem)  female
10       3      Sandstrom, Miss. Marguerite Rut  female
11       1             Bonnell, Miss. Elizabeth  female
12       3       Saundercock, Mr. William Henry    male
13       3          Andersson, Mr. Anders Johan    male
..     ...                                  ...     ...
20       2                 Fynney, Mr. Joseph J    male
21       2                Beesley, Mr. Lawrence    male
22       3          McGowan, Miss. Anna "Annie"  female
23       1         Sloper, Mr. William Thompson    male
24       3        Palsson, Miss. Torborg Danira  female

[16 rows x 3 columns]
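A common source of off-by-one errors is that `iloc` slices by position and excludes the end point, while `loc` slices by label and includes it. The following minimal sketch shows the difference on a hypothetical six-row `df` with the default integer index, where the same slice `1:3` selects a different number of rows in each case.

```python
import pandas as pd

# Hypothetical table with the default index 0..5 (illustrative values).
df = pd.DataFrame(
    {
        "a": range(10, 16),
        "b": range(20, 26),
        "c": range(30, 36),
    }
)

# Position-based: rows at positions 1 and 2, columns at positions 0 and 1.
by_position = df.iloc[1:3, 0:2]
print(by_position.shape)          # (2, 2)

# Label-based: rows labelled 1, 2 and 3, columns "a" through "b" inclusive.
by_label = df.loc[1:3, "a":"b"]
print(by_label.shape)             # (3, 2)

print(list(by_position.index))    # [1, 2]
print(list(by_label.index))       # [1, 2, 3]
```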
The code above is only a simple example; the expressions and the row and column ranges can be adapted to the problem at hand.