Reading and Writing CSV Files in Python (pandas Usage)

Updated: 2020-12-14 10:22:43 Author: 小朱小朱絕不服輸

This article introduces reading and writing CSV files in Python with pandas. It works through the topic with detailed example code and should be a useful reference for study or work.

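Data processing is convenient in Python. The most common tasks are reading and writing files and extracting data, and this post covers some of those uses. Pandas is a powerful toolkit for analyzing structured data. It is built on NumPy (which provides high-performance array and matrix operations), is used for data mining and data analysis, and also provides data-cleaning features.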

I. Reading CSV files with pandas

CSV files come up a lot in data processing.

import pandas as pd
data = pd.read_csv('F:/Zhu/test/test.csv')

The full pd.read_csv signature, as documented for an older pandas release (several of these parameters, such as squeeze, prefix, tupleize_cols and error_bad_lines, have since been deprecated or removed):

pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)

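Commonly used parameters of read_csv and read_table (see the official documentation for the rest):

filepath_or_buffer # path of the file to read (or a file-like object)
sep / delimiter # column separator; a plain text file has to be laid out in a structured, delimited way before it can be loaded into a DataFrame
header # row that holds the column names: header=None means the file has no header row (set column names with names, otherwise they default to 0, 1, 2, ...); header=0 uses the first row as the column names
names # list of column names to use; with header=None and no names, the columns are simply numbered 0, 1, 2, ...
skiprows = 10 # skip the first 10 lines
nrows = 10 # read only the first 10 rows
usecols = [0,1,2,...] # columns to read, given either by position or by name
parse_dates = ['col_name'] # parse the named column as dates
index_col = None / False / 0 # column to use as the row index, by position or name (0 is the first column); a list of names/numbers gives a hierarchical index; None (the default) normally generates a new 0, 1, 2, ... index, and False forces pandas not to use the first column as the index
error_bad_lines = False # skip malformed rows instead of raising an error, useful for dirty data (pandas 1.3 and later use on_bad_lines='skip' instead)
na_values = 'NULL' # treat NULL as a missing value
encoding = 'utf-8' # encoding of the file; utf-8 is the default

read_csv reads a csv/txt/tsv file and returns a DataFrame object.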
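To make the parameters above concrete, here is a minimal sketch that combines several of them. The file contents are hypothetical and are read from an in-memory io.StringIO buffer instead of a real path, so the column names and values are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical file contents: two junk lines, no header row, ';' as separator.
raw = """exported by some tool
second junk line
zhu;20;2000-01-05;NULL
wang;20;2000-06-18;3.5
zhang;21;1999-11-11;4.0
zhu;22;1998-10-24;2.5
"""

data = pd.read_csv(
    io.StringIO(raw),                          # a path or any file-like object
    sep=';',                                   # column separator
    header=None,                               # the file has no header row
    names=['name', 'age', 'birth', 'score'],   # so column names are supplied here
    skiprows=2,                                # skip the two junk lines at the top
    nrows=3,                                   # read only the first 3 data rows
    parse_dates=['birth'],                     # parse this column as dates
    na_values='NULL',                          # treat the string NULL as missing
)
print(data)
print(data.dtypes)
```

With these settings only the first three data rows are loaded, the birth column becomes a datetime column, and the NULL entry shows up as NaN.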

Example:


import pandas as pd
data = pd.read_csv('F:/Zhu/test/test.csv')
print(data)

    name  age       birth
0    zhu   20    2000.1.5
1   wang   20   2000.6.18
2  zhang   21  1999.11.11
3    zhu   22  1998.10.24
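In the output above the birth column is loaded as plain strings, because read_csv does not parse dates unless asked to. The sketch below shows one way to convert such a column afterwards with pd.to_datetime. The in-memory text is a stand-in for test.csv rebuilt from the printed output, and the format string '%Y.%m.%d' is an assumption based on how the values look.

```python
import io
import pandas as pd

# Stand-in for test.csv, rebuilt from the printed output (assumed contents).
raw = (
    "name,age,birth\n"
    "zhu,20,2000.1.5\n"
    "wang,20,2000.6.18\n"
    "zhang,21,1999.11.11\n"
    "zhu,22,1998.10.24\n"
)
data = pd.read_csv(io.StringIO(raw))

# Convert the year.month.day strings into real datetime values.
data['birth'] = pd.to_datetime(data['birth'], format='%Y.%m.%d')
print(data.dtypes)

# Once the column is a datetime, date parts can be used for filtering.
born_in_2000 = data[data['birth'].dt.year == 2000]
print(born_in_2000)
```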

Extracting data with iloc and loc

Extracting rows:

loc: selects rows by the label in the row index (for example, the row whose index label is "A")

iloc: selects rows by integer position, counting from 0 (for example, data.iloc[2] is the row at position 2)

import pandas as pd
import numpy as np
# Create a DataFrame
data = pd.DataFrame(np.arange(16).reshape(4, 4), index=list('abcd'), columns=list('ABCD'))
print(data)

    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15

Select the row labelled 'a' with loc:

print(data.loc['a'])

A  0
B  1
C  2
D  3
Name: a, dtype: int32

Select the row at position 2 with iloc (positions start at 0, so this is the third row):

print(data.iloc[2])

A   8
B   9
C  10
D  11
Name: c, dtype: int32

Extracting columns:

print(data.loc[:, ['A']])  # all rows of column 'A'; for several columns use data.loc[:, ['A', 'B']]

    A
a   0
b   4
c   8
d  12

print(data.iloc[:, [0]])

    A
a   0
b   4
c   8
d  12

Extracting specific rows and columns:

print(data.loc[['a','b'],['A','B']])  # rows with index 'a', 'b' and columns named 'A', 'B'

   A  B
a  0  1
b  4  5

print(data.iloc[[0,1],[0,1]])  # rows at positions 0, 1 and columns at positions 0, 1

   A  B
a  0  1
b  4  5

Extracting all rows and all columns:

print(data.loc[:,:])  # all rows of columns A, B, C, D
print(data.iloc[:,:])

    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15

Selecting rows that match a given value:

print(data.loc[data['A']==0])  # filter: the rows whose value in column A is 0

   A  B  C  D
a  0  1  2  3
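Filtering by value extends to several conditions. As a small sketch on the same 4x4 DataFrame, conditions can be combined with & (and) or | (or), with each condition in its own parentheses, and a column list can be passed to loc at the same time. The thresholds used here are arbitrary.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape(4, 4), index=list('abcd'), columns=list('ABCD'))

# Rows where A > 0 and D < 15, keeping only columns A and D.
subset = data.loc[(data['A'] > 0) & (data['D'] < 15), ['A', 'D']]
print(subset)
```

Only rows 'b' and 'c' satisfy both conditions.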

II. Writing CSV files with pandas

Writing several lists to a CSV file with pandas

import pandas as pd

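# Any number of lists
a = [1,2,3]
b = [4,5,6]

# The dictionary keys become the column names in the CSV
dataframe = pd.DataFrame({'a_name':a,'b_name':b})

# Save the DataFrame as CSV; index controls whether the row labels are written (default True)
dataframe.to_csv("test.csv",index=False,sep=',')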

Result (contents of test.csv):

a_name,b_name
1,4
2,5
3,6
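To see what the index argument changes, the following sketch writes the same DataFrame twice. It relies on to_csv returning the CSV text as a string when no path is given, and then reads the text back as a rough round-trip check.

```python
import io
import pandas as pd

dataframe = pd.DataFrame({'a_name': [1, 2, 3], 'b_name': [4, 5, 6]})

with_index = dataframe.to_csv()                 # no path: the CSV text is returned
without_index = dataframe.to_csv(index=False)   # row labels are left out
print(with_index)
print(without_index)

# Reading the text back should give an equal DataFrame.
restored = pd.read_csv(io.StringIO(without_index))
print(restored.equals(dataframe))
```

With index=True the first column of the file holds the row labels under an empty header, which is why reading such a file back usually calls for index_col=0.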

Sometimes each list you hold is one row of data, and you want to write it to the CSV file as a row.

In that case you can use the csv module and write row by row.

import csv

with open("test.csv","w") as csvfile: 
  writer = csv.writer(csvfile)

  # Write the column names first
  writer.writerow(["index","a_name","b_name"])
  # Use writerow to write a single row
  #writer.writerow([0,1,2])
  # Use writerows to write several rows
  writer.writerows([[0,1,3],[1,2,3],[2,3,4]])


Written this way on Windows, the file has an empty line after every row. The fix is to pass newline='' when opening the file:

import csv

with open("F:/zhu/test/test.csv","w", newline='') as csvfile:
  writer = csv.writer(csvfile)

  # Write the column names first
  writer.writerow(["index","a_name","b_name"])
  # Use writerows to write several rows
  writer.writerows([[0,1,3],[1,2,3],[2,3,4]])

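The reason newline='' matters can be seen by inspecting what csv.writer actually emits. The sketch below writes into an in-memory buffer instead of a file: by default the writer ends every row with '\r\n'. A text file opened on Windows without newline='' then translates the '\n' into '\r\n' again, producing '\r\r\n', which shows up as an empty line.

```python
import csv
import io

buffer = io.StringIO()          # in-memory stand-in for the opened file
writer = csv.writer(buffer)
writer.writerow(["index", "a_name", "b_name"])
writer.writerow([0, 1, 3])

# repr() makes the row terminators visible.
print(repr(buffer.getvalue()))
```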

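Writing and reading txt files works similarly

(1) Create the txt data file. Remember to close the file once it has been created, otherwise its contents cannot be read back.

(2) Read the txt file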

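# Read the txt file; the with block closes it automatically
with open("G:\\info.txt",'r',encoding='utf-8') as file:
  userlines=file.readlines()
for line in userlines:
  fields=line.strip().split(',') # strip the trailing newline before splitting
  username=fields[0] # user name
  password=fields[1] # password
  print(username,password)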
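Step (1), creating the file, is not shown in code above. A minimal sketch of both steps follows, assuming one "username,password" pair per line. The sample names and the temporary path are hypothetical and stand in for the real info.txt. Using with blocks guarantees the file is closed before it is read back.

```python
import os
import tempfile

# Hypothetical location and sample data, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "info.txt")

# (1) Create the txt file; leaving the with block closes it.
with open(path, "w", encoding="utf-8") as f:
    f.write("alice,secret1\n")
    f.write("bob,secret2\n")

# (2) Read it back line by line.
users = []
with open(path, "r", encoding="utf-8") as f:
    for line in f:
        username, password = line.strip().split(",")
        users.append((username, password))
print(users)
```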

III. Inspecting a DataFrame with pandas

1) Dimensions: data.shape

import pandas as pd
data = pd.read_csv('F:/Zhu/test/test.csv')
print(data)
print(data.shape)

   index  a_name  b_name
0      0       1       3
1      1       2       3
2      2       3       4
(3, 3)

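2) Basic information about the table: data.info()

import pandas as pd
data = pd.read_csv('F:/Zhu/test/test.csv')
print(data)
data.info()  # info() prints its summary itself; print(data.info) without parentheses only shows the bound method

   index  a_name  b_name
0      0       1       3
1      1       2       3
2      2       3       4

After the table, info() reports the index range, each column's name, non-null count and dtype (int64 here), and the memory usage.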

3) Data type of each column: data.dtypes

import pandas as pd
data = pd.read_csv('F:/Zhu/test/test.csv')
print(data.dtypes)

index   int64
a_name  int64
b_name  int64
dtype: object

4) View the first 2 and last 2 rows

df.head() # first 5 rows by default; pass a number to choose how many rows to show
df.tail() # last 5 rows by default

import pandas as pd
data = pd.read_csv('F:/Zhu/test/test.csv')
print(data)
print(data.head(2))
print(data.tail(2))

   index  a_name  b_name
0      0       1       3
1      1       2       3
2      2       3       4
   index  a_name  b_name
0      0       1       3
1      1       2       3
   index  a_name  b_name
1      1       2       3
2      2       3       4

IV. Data cleaning

1) Handling NaN values: fill missing values with 0

data.fillna(value=0,inplace=True)

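Note: by default fillna returns a new object and leaves the original unchanged; set inplace=True (or assign the result back) for the change to take effect.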
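The note above can be checked with a small sketch. The DataFrame here is hypothetical: it counts the missing values before and after calling fillna without inplace, and then assigns the result back as an alternative to inplace=True.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing values.
data = pd.DataFrame({'a_name': [1.0, np.nan, 3.0], 'b_name': [np.nan, 5.0, 6.0]})

filled = data.fillna(value=0)                        # returns a new DataFrame
missing_in_original = int(data.isna().sum().sum())   # the original is untouched
missing_in_filled = int(filled.isna().sum().sum())
print(missing_in_original, missing_in_filled)

data = data.fillna(value=0)                          # assigning back also works
print(data)
```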

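2) Strip whitespace from string fields

This removes spaces at the start and end of each string, along with leading and trailing \n, \t and similar characters.

data['customername']=data['customername'].map(str.strip)  # e.g. strip the whitespace around customername values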
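One caveat with map(str.strip): it calls str.strip on every element, so a missing value (NaN) in the column typically raises a TypeError. As an alternative sketch, not the author's method, the .str accessor strips the strings and leaves missing values as they are. The sample names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column with stray whitespace and one missing value.
data = pd.DataFrame({'customername': ['  Zhu ', 'WANG\n', np.nan, '\tzhang']})

try:
    data['customername'].map(str.strip)
except TypeError as exc:
    print('map(str.strip) failed:', exc)

# .str.strip() skips missing values instead of failing on them.
data['customername'] = data['customername'].str.strip()
cleaned = data['customername'].tolist()
print(cleaned)
```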

3) Case conversion

data['customername']=data['customername'].str.lower()

4) Remove duplicate values

data.drop_duplicates(['customername'],inplace=True)
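Which of the duplicated rows survives is controlled by the keep argument of drop_duplicates. A brief sketch with hypothetical data shows the difference between the default keep='first' and keep='last'.

```python
import pandas as pd

# Hypothetical data: 'zhu' appears twice in customername.
data = pd.DataFrame({
    'customername': ['zhu', 'wang', 'zhu', 'zhang'],
    'amount': [10, 20, 30, 40],
})

first = data.drop_duplicates(['customername'])               # keep='first' is the default
last = data.drop_duplicates(['customername'], keep='last')   # keep the last occurrence
print(first)
print(last)
```

Both calls return new DataFrames and keep the original row labels, so a reset_index(drop=True) may be wanted afterwards.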

5) Replace values

data['customername']=data['customername'].replace('111','qqq')
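It is worth knowing that Series.replace with plain values matches whole cell values, not substrings. The sketch below, on hypothetical data, contrasts it with .str.replace, which replaces inside each string.

```python
import pandas as pd

# Hypothetical data: one value only contains '111' as a substring.
data = pd.DataFrame({'customername': ['111', 'zhu', '1112', '111']})

whole = data['customername'].replace('111', 'qqq')                  # whole-value match
inside = data['customername'].str.replace('111', 'qqq', regex=False)  # substring match
print(whole.tolist())
print(inside.tolist())
```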

