Common Tips for Processing CSV Files with Python Pandas

Updated: 2022-06-08 11:31:54  Author: SpikeKing

This article shares several common techniques for processing CSV files with Python Pandas, such as counting how often each value occurs in a column, filtering rows by column value, and iterating over rows.

Reading a CSV File with Pandas

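import pandas as pd

df = pd.read_csv(file_path, encoding='GB2312')  # file_path: path to the CSV file
df.info()  # info() prints by itself; wrapping it in print() also prints "None"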

Note: pandas reads files as UTF-8 by default, so a Chinese CSV that was not saved as UTF-8 raises an error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 2: invalid continuation byte

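Setting the encoding to GB2312 fixes this. Alternatively, encoding='unicode_escape' avoids the exception, but it decodes the bytes as Latin-1 rather than as Chinese text, so Chinese characters come out garbled:

df = pd.read_csv(file_path, encoding='GB2312')  # decodes the Chinese text correctly
df = pd.read_csv(file_path, encoding='unicode_escape')  # no UnicodeDecodeError, but Chinese text is garbled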
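When the encoding of a CSV file is not known in advance, one option is to try a few candidates in order. The sketch below is only an illustration and is not from the original article: the helper name `read_csv_with_fallback`, the candidate list, and the sample file are all assumptions. GB18030 is used as a candidate because it is a superset of GB2312 and GBK.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "gb18030")):
    """Try each encoding in turn and return the first DataFrame that decodes."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc)
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next candidate
    raise last_error


# Write a small sample CSV encoded as GB2312 (made-up data).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample_gb2312.csv")
with open(sample_path, "w", encoding="gb2312", newline="") as f:
    f.write("名称,数量\n细胞,3\n")

# UTF-8 fails on these bytes, so the helper falls through to GB18030.
df_sample = read_csv_with_fallback(sample_path)
print(df_sample.columns.tolist())
```

Trying UTF-8 first is a deliberate choice: a strict UTF-8 decode fails loudly on GB-encoded bytes, whereas many single-byte encodings would accept any input and silently produce wrong characters.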

df.info() prints a summary of the DataFrame, for example:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3840 entries, 0 to 3839
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 實驗時間批次 3840 non-null object
1 物鏡倍數 3840 non-null object
2 板子編號 3840 non-null object
3 板子編號及物鏡倍數 3840 non-null object
4 圖名稱 3840 non-null object
5 細胞類型 3840 non-null object
6 板子孔位置 3840 non-null object
7 孔拍攝位置 3840 non-null int64
8 細胞培養基 3840 non-null object
9 細胞培養時間（小時） 3840 non-null int64
10 擾動類別 3840 non-null object
11 擾動處理時間（小時） 3840 non-null int64
12 擾動處理濃度（ug/ml） 3840 non-null float64
13 標注激活(1/0) 3840 non-null int64
14 unique 3840 non-null object
15 tvt 3840 non-null int64
dtypes: float64(1), int64(5), object(10)
memory usage: 480.1+ KB

Counting Occurrences of Column Values

Use df[column_name].value_counts(). For example, for the 擾動類別 (perturbation type) column:

df["擾動類別"].value_counts()

Output:

coated OKT3 720
OKT3 720
coated OKT3+anti-CD28 576
DMSO 336
anti-CD28 288
PBS 288
Nivo 288
Pemb 288
empty 192
coated OKT3 + anti-CD28 144
Name: 擾動類別, dtype: int64
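In the output above, coated OKT3+anti-CD28 and coated OKT3 + anti-CD28 are counted separately because the two strings differ only in the whitespace around the plus sign. If they are meant to be the same category (an assumption; the article does not say), the labels can be normalized before counting. A minimal sketch on made-up data, where the column name `treatment` is hypothetical:

```python
import pandas as pd

# Made-up data: "A+B" and "A + B" are two spellings of the same label.
df_demo = pd.DataFrame({
    "treatment": ["OKT3", "A+B", "A + B", "OKT3", "A+B", "PBS"],
})

# Counting the raw strings keeps the two spellings apart.
raw_counts = df_demo["treatment"].value_counts()

# Remove whitespace around "+" so both spellings collapse into one label.
normalized = df_demo["treatment"].str.replace(r"\s*\+\s*", "+", regex=True)
clean_counts = normalized.value_counts()

# normalize=True returns fractions of the total instead of absolute counts.
clean_fractions = normalized.value_counts(normalize=True)

print(raw_counts)
print(clean_counts)
print(clean_fractions)
```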

To plot value_counts() directly as a bar chart (see the pandas Chart Visualization guide):

import matplotlib.pyplot as plt
%matplotlib inline
# The line above is a Jupyter/IPython magic; omit it in a plain script

plt.close("all")
plt.figure(figsize=(20, 8))
df["擾動類別"].value_counts().plot(kind="bar")
# plt.xticks(rotation='vertical', fontsize=10)
plt.show()

[Figure: bar chart of the value counts]

Filtering Rows by Column Value

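Use df.loc[condition] to keep only the rows that match. Assign the result to a new variable to work on just those rows, or write it to a CSV file.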

df_plate1 = df.loc[df["板子編號"] == "plate1"]
df_plate1.info()
# df.loc[df["板子編號"] == "plate1"].to_csv("batch3_IOStrain_klasses_utf8_plate1.csv")  # save as a CSV file

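Note: the DataFrame inside the condition must be the same as the one being indexed, so that the boolean Series and the indexed object share the same index. Otherwise pandas raises:

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).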

Output: the number of rows drops from 3840 to 1280.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1280 entries, 0 to 1279
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 實驗時間批次 1280 non-null object
1 物鏡倍數 1280 non-null object
2 板子編號 1280 non-null object
3 板子編號及物鏡倍數 1280 non-null object
4 圖名稱 1280 non-null object
5 細胞類型 1280 non-null object
6 板子孔位置 1280 non-null object
7 孔拍攝位置 1280 non-null int64
8 細胞培養基 1280 non-null object
9 細胞培養時間（小時） 1280 non-null int64
10 擾動類別 1280 non-null object
11 擾動處理時間（小時） 1280 non-null int64
12 擾動處理濃度（ug/ml） 1280 non-null float64
13 標注激活(1/0) 1280 non-null int64
14 unique 1280 non-null object
15 tvt 1280 non-null int64
dtypes: float64(1), int64(5), object(10)
memory usage: 170.0+ KB
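The filtering step and the alignment error can both be reproduced on a small example. The sketch below uses made-up data with hypothetical column names (`plate`, `value`). It first filters with a mask built from the same DataFrame, then shows that a mask built from a shorter DataFrame fails because its index does not cover every row being indexed.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "plate": ["plate1", "plate2", "plate1", "plate3"],
    "value": [10, 20, 30, 40],
})

# Correct: the mask comes from the same DataFrame that is being indexed.
df_p1 = df_demo.loc[df_demo["plate"] == "plate1"]
print(len(df_demo), "->", len(df_p1))

# Incorrect: the mask comes from a different DataFrame whose index (0, 1)
# does not cover all the rows of df_demo (0 to 3).
df_other = df_demo.iloc[:2]
error_name = None
try:
    df_demo.loc[df_other["plate"] == "plate1"]
except Exception as exc:  # pandas raises IndexingError in this case
    error_name = type(exc).__name__
print(error_name)
```

The filtered rows keep their original index labels. If consecutive numbering is needed afterwards, `df_p1.reset_index(drop=True)` renumbers them from 0.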

Iterating Over Rows

Loop with for idx, row in df_plate1_lb0.iterrows(): and read each value with row["column_name"], as follows:

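import os
import cv2

# df_plate1_lb0, img_format and plate1_img_folder are not defined in this excerpt
for idx, row in df_plate1_lb0.iterrows():
    img_name = row["圖名稱"]  # value of the image-name column for this row
    img_ch_format = img_format.format(img_name, "{}")  # keep a "{}" slot for the index below
    for i in range(1, 7):  # six files per image name, indices 1 to 6
        img_path = os.path.join(plate1_img_folder, img_ch_format.format(i))
        img = cv2.imread(img_path)  # returns None if the file cannot be read
        print('[Info] img shape: {}'.format(img.shape))
    break  # stop after the first row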

Output:

[Info] img shape: (1080, 1080, 3)
[Info] img shape: (1080, 1080, 3)
[Info] img shape: (1080, 1080, 3)
[Info] img shape: (1080, 1080, 3)
[Info] img shape: (1080, 1080, 3)
[Info] img shape: (1080, 1080, 3)
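For readers without the image files, the same loop pattern can be tried on a small in-memory table. This is a sketch under assumptions: the column name `img_name`, the folder `images/plate1`, and the file-name pattern `{}_ch{}.png` are all hypothetical and are not the author's actual values.

```python
import os

import pandas as pd

# Made-up table with one row per image name.
df_demo = pd.DataFrame({"img_name": ["A01_s1", "A01_s2"], "label": [0, 1]})

img_folder = os.path.join("images", "plate1")  # hypothetical folder
img_format = "{}_ch{}.png"  # hypothetical pattern: image name, then file index

paths = []
for idx, row in df_demo.iterrows():
    # row is a Series, so values are read by column name.
    name = row["img_name"]
    # Fill in the name now and keep a "{}" slot for the file index.
    per_file = img_format.format(name, "{}")
    for i in range(1, 7):
        paths.append(os.path.join(img_folder, per_file.format(i)))

print(len(paths))
print(paths[0])
```

iterrows() builds a Series for every row, which is convenient for a handful of rows like this but can be slow on large tables.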

Plotting a Histogram (Bar Chart)

Build a dictionary of gray-level counts for the grayscale image after removing the background color:

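import collections
import cv2
import numpy as np

# Remove the background color: the most frequent gray level is treated as background
# (img_gray is not defined in this excerpt; assumed to be a single-channel 8-bit image)
pix_bkg = np.argmax(np.bincount(img_gray.ravel()))
img_gray = np.where(img_gray <= pix_bkg + 2, 0, img_gray)
img_gray = img_gray.astype(np.uint8)

# Build the array of counts, one per gray level
hist = cv2.calcHist([img_gray], [0], None, [256], [0, 256])
hist = hist.ravel()

# Dictionary of gray level -> count
hist_dict = collections.defaultdict(int)
for i, v in enumerate(hist):
    hist_dict[i] += int(v)

# The removed background is now all counted at 0, so that count is very large;
# reset it to 0 to make the rest of the distribution visible
hist_dict[0] = 0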
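The same counting steps can be tried without an image file or OpenCV. The sketch below swaps cv2.calcHist for np.bincount with minlength=256, which yields the same per-level counts for an 8-bit image, and runs on a small synthetic array (made-up data, not the author's images).

```python
import collections

import numpy as np

# Synthetic 8-bit "image": mostly background level 12, plus a few brighter pixels.
img_gray = np.full((4, 5), 12, dtype=np.uint8)
img_gray[0, 0] = 13   # within background + 2, so it is treated as background
img_gray[1, 1] = 50
img_gray[2, 2] = 50
img_gray[3, 3] = 200

# The most frequent gray level is taken as the background.
pix_bkg = int(np.argmax(np.bincount(img_gray.ravel())))
img_gray = np.where(img_gray <= pix_bkg + 2, 0, img_gray).astype(np.uint8)

# One count per gray level 0..255 (a stand-in for cv2.calcHist).
hist = np.bincount(img_gray.ravel(), minlength=256)

hist_dict = collections.defaultdict(int)
for level, count in enumerate(hist):
    hist_dict[level] += int(count)

# All background pixels now sit at level 0; reset it to see the rest.
hist_dict[0] = 0
print({k: v for k, v in hist_dict.items() if v > 0})
```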

Plot the bar chart:

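plt.subplots: creates the figure and its subplots; figsize sets the figure size and facecolor the background color
ax.set_title: sets the title
ax.bar: draws the bars from the x values and the y values (bar heights)
ax.set_xticks: sets the positions of the x-axis ticks
plt.savefig: saves the figure to a file
plt.show: displays the figure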

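# ci (used in the title) and res_path (output image path) are not defined in this excerpt
fig, ax = plt.subplots(1, 1, figsize=(10, 8), facecolor='white')
ax.set_title('channel {}'.format(ci))
n_bins = 100  # plot gray levels 0 to 100 only
ax.bar(range(n_bins+1), [hist_dict.get(xtick, 0) for xtick in range(n_bins+1)])
ax.set_xticks(range(0, n_bins + 1, 5))  # one tick every 5 levels, including 100

plt.savefig(res_path)  # save before plt.show()
plt.show()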

[Figure: resulting bar chart of the gray-level counts]