Pandas Data Types: Using the category Type
Creating a category
Creating from a Series
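Passing dtype="category" when creating a Series produces a categorical Series. A categorical has two parts: the categories (the literal values) and their order. The examples assume import numpy as np and import pandas as pd.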
In [1]: s = pd.Series(["a", "b", "c", "a"], dtype="category")

In [2]: s
Out[2]:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): ['a', 'b', 'c']
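Internally, a categorical Series stores each value as a small integer code that points into its list of categories. This sketch uses only the public .cat accessor to show both parts:

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# The distinct values, stored once (inferred and sorted from the data)
categories = list(s.cat.categories)
# One integer code per row, indexing into `categories`
codes = s.cat.codes.tolist()

print(categories)  # ['a', 'b', 'c']
print(codes)       # [0, 1, 2, 0]
```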
You can also convert a Series (column) in a DataFrame to category:
In [3]: df = pd.DataFrame({"A": ["a", "b", "c", "a"]})

In [4]: df["B"] = df["A"].astype("category")

In [5]: df["B"]
Out[5]:
0    a
1    b
2    c
3    a
Name: B, dtype: category
Categories (3, object): ['a', 'b', 'c']
You can also build a pandas.Categorical first and pass it to the Series constructor. Values that are not among the given categories become NaN:
In [10]: raw_cat = pd.Categorical(
   ....:     ["a", "b", "c", "a"], categories=["b", "c", "d"], ordered=False
   ....: )
   ....:

In [11]: s = pd.Series(raw_cat)

In [12]: s
Out[12]:
0    NaN
1      b
2      c
3    NaN
dtype: category
Categories (3, object): ['b', 'c', 'd']
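The NaN values in this output are ordinary missing values: a value outside the declared categories is stored with the code -1. A small check of that behavior:

```python
import pandas as pd

raw_cat = pd.Categorical(
    ["a", "b", "c", "a"], categories=["b", "c", "d"], ordered=False
)
s = pd.Series(raw_cat)

# "a" is not a declared category, so it is stored as missing (code -1)
missing = s.isna().tolist()
codes = s.cat.codes.tolist()

print(missing)  # [True, False, False, True]
print(codes)    # [-1, 0, 1, -1]
```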
Creating from a DataFrame
When creating a DataFrame you can also pass dtype="category":
In [17]: df = pd.DataFrame({"A": list("abca"), "B": list("bccd")}, dtype="category")

In [18]: df.dtypes
Out[18]:
A    category
B    category
dtype: object
Columns A and B are both categorical, and each column gets its own categories inferred from its own values:
In [19]: df["A"]
Out[19]:
0    a
1    b
2    c
3    a
Name: A, dtype: category
Categories (3, object): ['a', 'b', 'c']

In [20]: df["B"]
Out[20]:
0    b
1    c
2    c
3    d
Name: B, dtype: category
Categories (3, object): ['b', 'c', 'd']
Alternatively, use df.astype("category") to convert every column of a DataFrame to category:
In [21]: df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})

In [22]: df_cat = df.astype("category")

In [23]: df_cat.dtypes
Out[23]:
A    category
B    category
dtype: object
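If only some columns should become categorical, astype also accepts a dict that maps column names to dtypes. A minimal sketch that converts only column A:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})

# Only column "A" becomes categorical; "B" keeps its original dtype
df_partial = df.astype({"A": "category"})

a_is_cat = isinstance(df_partial["A"].dtype, CategoricalDtype)
b_is_cat = isinstance(df_partial["B"].dtype, CategoricalDtype)
print(a_is_cat, b_is_cat)  # True False
```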
Controlling creation
By default, a categorical created with dtype='category' uses these defaults:
1. The categories are inferred from the data.
2. The categories are unordered.
You can create a CategoricalDtype explicitly to override both defaults:
In [26]: from pandas.api.types import CategoricalDtype

In [27]: s = pd.Series(["a", "b", "c", "a"])

In [28]: cat_type = CategoricalDtype(categories=["b", "c", "d"], ordered=True)

In [29]: s_cat = s.astype(cat_type)

In [30]: s_cat
Out[30]:
0    NaN
1      b
2      c
3    NaN
dtype: category
Categories (3, object): ['b' < 'c' < 'd']
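Two CategoricalDtype objects compare equal when they have the same categories and the same ordered flag; for unordered dtypes the order of the categories does not matter, and every CategoricalDtype compares equal to the string "category". A short illustration:

```python
from pandas.api.types import CategoricalDtype

unordered_1 = CategoricalDtype(["a", "b"], ordered=False)
unordered_2 = CategoricalDtype(["b", "a"], ordered=False)
ordered_1 = CategoricalDtype(["a", "b"], ordered=True)
ordered_2 = CategoricalDtype(["b", "a"], ordered=True)

# For unordered dtypes only the set of categories matters
same_unordered = unordered_1 == unordered_2
# For ordered dtypes the order is part of the type
same_ordered = ordered_1 == ordered_2
# Any CategoricalDtype compares equal to the string "category"
matches_string = ordered_1 == "category"

print(same_unordered, same_ordered, matches_string)  # True False True
```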
The same CategoricalDtype can also be applied to a DataFrame, in which case every column shares the same categories:
In [31]: from pandas.api.types import CategoricalDtype

In [32]: df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})

In [33]: cat_type = CategoricalDtype(categories=list("abcd"), ordered=True)

In [34]: df_cat = df.astype(cat_type)

In [35]: df_cat["A"]
Out[35]:
0    a
1    b
2    c
3    a
Name: A, dtype: category
Categories (4, object): ['a' < 'b' < 'c' < 'd']

In [36]: df_cat["B"]
Out[36]:
0    b
1    c
2    c
3    d
Name: B, dtype: category
Categories (4, object): ['a' < 'b' < 'c' < 'd']
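Because both columns now share identical, ordered categories, they can be compared element by element. A sketch reusing the data above:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})
cat_type = CategoricalDtype(categories=list("abcd"), ordered=True)
df_cat = df.astype(cat_type)

# Same categories and ordered=True, so ordering comparisons are allowed
a_less_than_b = (df_cat["A"] < df_cat["B"]).tolist()
print(a_less_than_b)  # [True, True, False, True]
```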
Converting back to the original type
Use Series.astype(original_dtype) or np.asarray(categorical) to convert a categorical back to its original type:
In [39]: s = pd.Series(["a", "b", "c", "a"])

In [40]: s
Out[40]:
0    a
1    b
2    c
3    a
dtype: object

In [41]: s2 = s.astype("category")

In [42]: s2
Out[42]:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): ['a', 'b', 'c']

In [43]: s2.astype(str)
Out[43]:
0    a
1    b
2    c
3    a
dtype: object

In [44]: np.asarray(s2)
Out[44]: array(['a', 'b', 'c', 'a'], dtype=object)
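When the original dtype is not at hand, the categories Index still carries it, so it can be passed straight to astype. A sketch with integer data:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 1])
s_cat = s.astype("category")

# The categories Index keeps the original value dtype (int64 here)
original_dtype = s_cat.cat.categories.dtype
restored = s_cat.astype(original_dtype)

print(original_dtype)      # int64
print(restored.tolist())   # [1, 2, 3, 1]
print(restored.equals(s))  # True
```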
Working with categories
Getting category attributes
Categorical data has two attributes, categories and ordered, which you can read through s.cat.categories and s.cat.ordered:
In [57]: s = pd.Series(["a", "b", "c", "a"], dtype="category")

In [58]: s.cat.categories
Out[58]: Index(['a', 'b', 'c'], dtype='object')

In [59]: s.cat.ordered
Out[59]: False
Specifying the order of the categories when creating the categorical:
In [60]: s = pd.Series(pd.Categorical(["a", "b", "c", "a"], categories=["c", "b", "a"]))

In [61]: s.cat.categories
Out[61]: Index(['c', 'b', 'a'], dtype='object')

In [62]: s.cat.ordered
Out[62]: False
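Even on an unordered categorical, the category order controls sorting, while min() and max() raise TypeError until the categorical is ordered. A sketch with the same data:

```python
import pandas as pd

s = pd.Series(pd.Categorical(["a", "b", "c", "a"], categories=["c", "b", "a"]))

# Sorting uses the category order ("c", "b", "a"), not alphabetical order
sorted_values = s.sort_values().tolist()
print(sorted_values)  # ['c', 'b', 'a', 'a']

# min()/max() need an ordered categorical
try:
    s.min()
    min_error = None
except TypeError as exc:
    min_error = exc
print(type(min_error).__name__)  # TypeError
```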
Renaming categories
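In the pandas version used here, categories can be renamed by assigning to s.cat.categories. Newer pandas releases discourage or no longer support this assignment, so rename_categories() is the safer choice: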
In [67]: s = pd.Series(["a", "b", "c", "a"], dtype="category")

In [68]: s
Out[68]:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): ['a', 'b', 'c']

In [69]: s.cat.categories = ["Group %s" % g for g in s.cat.categories]

In [70]: s
Out[70]:
0    Group a
1    Group b
2    Group c
3    Group a
dtype: category
Categories (3, object): ['Group a', 'Group b', 'Group c']
rename_categories() achieves the same result:
In [71]: s = s.cat.rename_categories([1, 2, 3])

In [72]: s
Out[72]:
0    1
1    2
2    3
3    1
dtype: category
Categories (3, int64): [1, 2, 3]
It also accepts a dict-like object:
# You can also pass a dict-like object to map the renaming
In [73]: s = s.cat.rename_categories({1: "x", 2: "y", 3: "z"})

In [74]: s
Out[74]:
0    x
1    y
2    z
3    x
dtype: category
Categories (3, object): ['x', 'y', 'z']
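rename_categories() also accepts a callable that is applied to each category, which reproduces the "Group ..." renaming without assigning to s.cat.categories. A minimal sketch:

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# The callable maps each old category label to a new one;
# the integer codes of the rows are unchanged
renamed = s.cat.rename_categories(lambda c: f"Group {c}")

print(list(renamed.cat.categories))  # ['Group a', 'Group b', 'Group c']
print(renamed.tolist())              # ['Group a', 'Group b', 'Group c', 'Group a']
```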
Adding categories with add_categories
Use add_categories() to add new categories:
In [77]: s = s.cat.add_categories([4])

In [78]: s.cat.categories
Out[78]: Index(['x', 'y', 'z', 4], dtype='object')

In [79]: s
Out[79]:
0    x
1    y
2    z
3    x
dtype: category
Categories (4, object): ['x', 'y', 'z', 4]
Removing categories with remove_categories
In [80]: s = s.cat.remove_categories([4])

In [81]: s
Out[81]:
0    x
1    y
2    z
3    x
dtype: category
Categories (3, object): ['x', 'y', 'z']
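Removing a category that is still in use does not drop rows: those values become NaN. Adding a category that already exists is rejected with ValueError. A sketch of both cases:

```python
import pandas as pd

s = pd.Series(["x", "y", "z", "x"], dtype="category")

# Removing a category that is in use turns those values into NaN
without_z = s.cat.remove_categories(["z"])
print(without_z.isna().tolist())  # [False, False, True, False]

# Adding a category that already exists is rejected
try:
    s.cat.add_categories(["x"])
    add_error = None
except ValueError as exc:
    add_error = exc
print(type(add_error).__name__)  # ValueError
```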
Removing unused categories
In [82]: s = pd.Series(pd.Categorical(["a", "b", "a"], categories=["a", "b", "c", "d"]))

In [83]: s
Out[83]:
0    a
1    b
2    a
dtype: category
Categories (4, object): ['a', 'b', 'c', 'd']

In [84]: s.cat.remove_unused_categories()
Out[84]:
0    a
1    b
2    a
dtype: category
Categories (2, object): ['a', 'b']
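A common source of unused categories is filtering: selecting rows keeps the full category list. A sketch showing the leftover category and its removal:

```python
import pandas as pd

s = pd.Series(list("abcab"), dtype="category")

# Filtering rows does not shrink the categories
filtered = s[s != "c"]
print(list(filtered.cat.categories))  # ['a', 'b', 'c']

# remove_unused_categories() drops "c" now that no row uses it
cleaned = filtered.cat.remove_unused_categories()
print(list(cleaned.cat.categories))   # ['a', 'b']
```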
Setting categories
set_categories() adds and removes categories in a single step; values that are not in the new categories become NaN:
In [85]: s = pd.Series(["one", "two", "four", "-"], dtype="category")

In [86]: s
Out[86]:
0     one
1     two
2    four
3       -
dtype: category
Categories (4, object): ['-', 'four', 'one', 'two']

In [87]: s = s.cat.set_categories(["one", "two", "three", "four"])

In [88]: s
Out[88]:
0     one
1     two
2    four
3     NaN
dtype: category
Categories (4, object): ['one', 'two', 'three', 'four']
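set_categories() also takes an ordered argument, so the new categories can be made ordered in the same call. With the example data, max() then follows the new order and skips the NaN:

```python
import pandas as pd

s = pd.Series(["one", "two", "four", "-"], dtype="category")

# Replace the categories and make them ordered in one call
s2 = s.cat.set_categories(["one", "two", "three", "four"], ordered=True)

print(s2.cat.ordered)        # True
print(int(s2.isna().sum()))  # 1  ("-" is not among the new categories)
print(s2.max())              # four  (NaN is skipped)
```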
Sorting categories
For a categorical created with ordered=True, sorting follows the order of the categories, and min() and max() are available:
In [91]: s = pd.Series(["a", "b", "c", "a"]).astype(CategoricalDtype(ordered=True))

In [92]: s.sort_values(inplace=True)

In [93]: s
Out[93]:
0    a
3    a
1    b
2    c
dtype: category
Categories (3, object): ['a' < 'b' < 'c']

In [94]: s.min(), s.max()
Out[94]: ('a', 'c')
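Ordered categories are most useful when the natural order is not alphabetical. A sketch with hypothetical size labels (the labels are illustrative, not from the article):

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical size labels whose natural order is not alphabetical
size_type = CategoricalDtype(["small", "medium", "large"], ordered=True)
sizes = pd.Series(["large", "small", "medium", "small"]).astype(size_type)

print(sizes.sort_values().tolist())  # ['small', 'small', 'medium', 'large']
print(sizes.max())                   # large
```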
as_ordered() and as_unordered() mark a categorical as ordered or unordered:
In [95]: s.cat.as_ordered()
Out[95]:
0    a
3    a
1    b
2    c
dtype: category
Categories (3, object): ['a' < 'b' < 'c']

In [96]: s.cat.as_unordered()
Out[96]:
0    a
3    a
1    b
2    c
dtype: category
Categories (3, object): ['a', 'b', 'c']
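These methods return a new Series rather than modifying the original in place, so the result has to be assigned. A short check:

```python
import pandas as pd

s = pd.Series(["b", "a", "c"], dtype="category")

# as_ordered() returns a new Series; the original stays unordered
s_ordered = s.cat.as_ordered()
print(s.cat.ordered, s_ordered.cat.ordered)  # False True
print(s_ordered.min(), s_ordered.max())      # a c
```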
Reordering
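Categorical.reorder_categories() (available on a Series as s.cat.reorder_categories()) reorders the existing categories. The new list must contain exactly the same categories: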
In [103]: s = pd.Series([1, 2, 3, 1], dtype="category")

In [104]: s = s.cat.reorder_categories([2, 3, 1], ordered=True)

In [105]: s
Out[105]:
0    1
1    2
2    3
3    1
dtype: category
Categories (3, int64): [2 < 3 < 1]
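Because reorder_categories() only permutes existing categories, passing a list that adds or drops a value raises ValueError; set_categories() is the method for changing the set. A sketch:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 1], dtype="category")

# reorder_categories only permutes the existing categories;
# replacing 1 with a new value (4) is rejected
try:
    s.cat.reorder_categories([2, 3, 4], ordered=True)
    reorder_error = None
except ValueError as exc:
    reorder_error = exc
print(type(reorder_error).__name__)  # ValueError
```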
Sorting by multiple columns
sort_values supports sorting by multiple columns; a categorical column is sorted by its category order, not alphabetically:
In [109]: dfs = pd.DataFrame(
   .....:     {
   .....:         "A": pd.Categorical(
   .....:             list("bbeebbaa"),
   .....:             categories=["e", "a", "b"],
   .....:             ordered=True,
   .....:         ),
   .....:         "B": [1, 2, 1, 2, 2, 1, 2, 1],
   .....:     }
   .....: )
   .....:

In [110]: dfs.sort_values(by=["A", "B"])
Out[110]:
   A  B
2  e  1
3  e  2
7  a  1
6  a  2
0  b  1
5  b  1
1  b  2
4  b  2
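The ascending argument can take one flag per column. This sketch sorts A in descending category order (b, a, e) and B ascending within each group:

```python
import pandas as pd

dfs = pd.DataFrame(
    {
        "A": pd.Categorical(
            list("bbeebbaa"), categories=["e", "a", "b"], ordered=True
        ),
        "B": [1, 2, 1, 2, 2, 1, 2, 1],
    }
)

# Descending on A (category order reversed), ascending on B
result = dfs.sort_values(by=["A", "B"], ascending=[False, True])
print(result["A"].tolist())  # ['b', 'b', 'b', 'b', 'a', 'a', 'e', 'e']
print(result["B"].tolist())  # [1, 1, 2, 2, 1, 2, 1, 2]
```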
Comparison operations
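If a categorical was created with ordered=True, its values can be compared with the ==, !=, >, >=, < and <= operators. == and != also work on unordered categoricals, but >, >=, < and <= require ordered=True. When two categorical Series are compared, they must have the same categories and ordering.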
In [113]: cat = pd.Series([1, 2, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

In [114]: cat_base = pd.Series([2, 2, 2]).astype(CategoricalDtype([3, 2, 1], ordered=True))

In [115]: cat_base2 = pd.Series([2, 2, 2]).astype(CategoricalDtype(ordered=True))

In [119]: cat > cat_base
Out[119]:
0     True
1    False
2    False
dtype: bool

In [120]: cat > 2
Out[120]:
0     True
1    False
2    False
dtype: bool
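On an unordered categorical, equality still works but an ordering comparison raises TypeError. A sketch:

```python
import pandas as pd

s = pd.Series(["a", "b", "a"], dtype="category")  # unordered

# Equality works on any categorical
eq_result = (s == "a").tolist()
print(eq_result)  # [True, False, True]

# Ordering comparisons require ordered=True
try:
    s > "a"
    compare_error = None
except TypeError as exc:
    compare_error = exc
print(type(compare_error).__name__)  # TypeError
```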
Other operations
A categorical Series is still a Series, so most Series operations work on it as well, for example Series.min(), Series.max() and Series.mode() (min() and max() require an ordered categorical).
value_counts:
In [131]: s = pd.Series(pd.Categorical(["a", "b", "c", "c"], categories=["c", "a", "b", "d"]))

In [132]: s.value_counts()
Out[132]:
c    2
a    1
b    1
d    0
dtype: int64
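The zero count for "d" is specific to categoricals: the same values stored as plain objects only report what actually occurs. A comparison sketch:

```python
import pandas as pd

values = ["a", "b", "c", "c"]
as_object = pd.Series(values)
as_category = pd.Series(pd.Categorical(values, categories=["c", "a", "b", "d"]))

# Only the categorical version reports the unused category "d"
d_in_object = "d" in as_object.value_counts().index
d_count = int(as_category.value_counts()["d"])
print(d_in_object, d_count)  # False 0
```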
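DataFrame.sum() with the level argument (this argument was deprecated in pandas 1.3 and removed in pandas 2.0, so the call below only runs on older versions):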
In [133]: columns = pd.Categorical(
   .....:     ["One", "One", "Two"], categories=["One", "Two", "Three"], ordered=True
   .....: )
   .....:

In [134]: df = pd.DataFrame(
   .....:     data=[[1, 2, 3], [4, 5, 6]],
   .....:     columns=pd.MultiIndex.from_arrays([["A", "B", "B"], columns]),
   .....: )
   .....:

In [135]: df.sum(axis=1, level=1)
Out[135]:
   One  Two  Three
0    3    3      0
1    9    6      0
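Since the level argument of sum() is gone in pandas 2.0, one possible replacement is to transpose, group by the column level, and transpose back. This is a sketch, not the only way; observed=False asks pandas to keep unobserved categories such as "Three", and the checks below cover only the observed columns:

```python
import pandas as pd

columns = pd.Categorical(
    ["One", "One", "Two"], categories=["One", "Two", "Three"], ordered=True
)
df = pd.DataFrame(
    data=[[1, 2, 3], [4, 5, 6]],
    columns=pd.MultiIndex.from_arrays([["A", "B", "B"], columns]),
)

# Transpose so the column levels become row levels, group by level 1,
# sum each group, then transpose back to the original orientation
result = df.T.groupby(level=1, observed=False).sum().T

print(result["One"].tolist())  # [3, 9]
print(result["Two"].tolist())  # [3, 6]
```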
Groupby:
In [136]: cats = pd.Categorical(
   .....:     ["a", "b", "b", "b", "c", "c", "c"], categories=["a", "b", "c", "d"]
   .....: )
   .....:

In [137]: df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})

In [138]: df.groupby("cats").mean()
Out[138]:
      values
cats
a        1.0
b        2.0
c        4.0
d        NaN

In [139]: cats2 = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"])

In [140]: df2 = pd.DataFrame(
   .....:     {
   .....:         "cats": cats2,
   .....:         "B": ["c", "d", "c", "d"],
   .....:         "values": [1, 2, 3, 4],
   .....:     }
   .....: )
   .....:

In [141]: df2.groupby(["cats", "B"]).mean()
Out[141]:
        values
cats B
a    c     1.0
     d     2.0
b    c     3.0
     d     4.0
c    c     NaN
     d     NaN
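The empty group "d" appears above because categorical groupers default to observed=False. Passing observed=True keeps only categories that actually occur:

```python
import pandas as pd

cats = pd.Categorical(
    ["a", "b", "b", "b", "c", "c", "c"], categories=["a", "b", "c", "d"]
)
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})

# observed=True drops categories with no rows, so "d" is not in the result
result = df.groupby("cats", observed=True).mean()

print(list(result.index))         # ['a', 'b', 'c']
print(result["values"].tolist())  # [1.0, 2.0, 4.0]
```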
Pivot tables:
In [142]: raw_cat = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"])

In [143]: df = pd.DataFrame({"A": raw_cat, "B": ["c", "d", "c", "d"], "values": [1, 2, 3, 4]})

In [144]: pd.pivot_table(df, values="values", index=["A", "B"])
Out[144]:
     values
A B
a c       1
  d       2
b c       3
  d       4