Python中的pandas表格模塊、文件模塊和數(shù)據(jù)庫模塊

更新時間：2022年05月30日 10:34:18 作者：springsnow

這篇文章介紹了Python中的pandas表格模塊、文件模塊和數(shù)據(jù)庫模塊，文中通過示例代碼介紹的非常詳細。對大家的學(xué)習(xí)或工作具有一定的參考借鑒價值，需要的朋友可以參考下

支持文件存取操作，支持?jǐn)?shù)據(jù)庫(sql)、html、json、pickle、csv(txt、excel)、sas、stata、hdf等。
支持增刪改查、切片、高階函數(shù)、分組聚合等單表操作，以及和dict、list的互相轉(zhuǎn)換。
支持多表拼接合并操作。
支持簡單的繪圖操作。
支持簡單的統(tǒng)計分析操作。

一、Series數(shù)據(jù)結(jié)構(gòu)

Series是一種類似于一維數(shù)組的對象，由一組數(shù)據(jù)和一組與之相關(guān)的數(shù)據(jù)標(biāo)簽（索引）組成。

Series比較像列表（數(shù)組）和字典的結(jié)合體

import numpy as np
import pandas as pd

df = pd.Series(0, index=['a', 'b', 'c', 'd'])
print(df)
# a    0
# b    0
# c    0
# d    0
# dtype: int64

print(df.values) # 值
# [0 0 0 0]

print(df.index) # 索引
# Index(['a', 'b', 'c', 'd'], dtype='object')

1、Series的創(chuàng)建

import numpy as np
import pandas as pd

df = pd.Series(np.array([1, 2, 3, 4, np.nan]), index=['a', 'b', 'c', 'd', 'e']) # 1、從ndarray創(chuàng)建Series
print(df)
# a    1.0
# b    2.0
# c    3.0
# d    4.0
# e    NaN
# dtype: float64

df = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': np.nan}) # 2、也可以從字典創(chuàng)建Series

dates = pd.date_range('20190101', periods=6, freq='M')
print(type(dates)) # 
print(dates)
# DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
#                '2019-05-31', '2019-06-30'],
#               dtype='datetime64[ns]', freq='M')

df=pd.Series(0,index=dates) # 3、時間序列索引
print(df)
# 2019-01-31    0
# 2019-02-28    0
# 2019-03-31    0
# 2019-04-30    0
# 2019-05-31    0
# 2019-06-30    0
# Freq: M, dtype: int64

產(chǎn)生時間對象數(shù)組：date_range參數(shù)詳解：

start:開始時間
end:結(jié)束時間
periods:時間長度
freq:時間頻率，默認(rèn)為'D'，可選H(our),W(eek),B(usiness),S(emi-)M(onth),(min)T(es), S(econd), A(year),…

2、Series屬性

print(df ** 2) # 3、與標(biāo)量運算
# a     1.0
# b     4.0
# c     9.0
# d    16.0
# e     NaN
# dtype: float64


print(df + df) # 4、兩個Series運算
# a    2.0
# b    4.0
# c    6.0
# d    8.0
# e    NaN
# dtype: float64

print(df[0] )  # 5、數(shù)字索引； 1.0
print(df[[0, 1, 2]]) # 行索引
# a    1.0
# b    2.0
# c    3.0
# dtype: float64
print(df['a'] )  #  6、鍵索引（行標(biāo)簽） ；1.0
  
print(df[['b','c']])

print('a' in df)  # 7、in運算；True

print(df[0:2] ) # 8、切片
# a    1.0
# b    2.0
# dtype: float64

print(np.sin(df)) # 9、通用函數(shù)
# a    0.841471
# b    0.909297
# c    0.141120
# d   -0.756802
# e         NaN
# dtype: float64

print(df[df > 1] ) # 10、布爾值過濾
# b    2.0
# c    3.0
# d    4.0
# dtype: float64

2、Series缺失數(shù)據(jù)處理

df = pd.Series([1, 2, 3, 4, np.nan], index=['a', 'b', 'c', 'd', 'e'])
print(df)
# a    1.0
# b    2.0
# c    3.0
# d    4.0
# e    NaN
# dtype: float64

print(df.dropna() ) # 1、過濾掉值為NaN的行
# a    1.0
# b    2.0
# c    3.0
# d    4.0
# dtype: float64

print(df.fillna(5) ) # 2、用指定值填充缺失數(shù)據(jù)
# a    1.0
# b    2.0
# c    3.0
# d    4.0
# e    5.0
# dtype: float64

print(df.isnull() ) #  3、返回布爾數(shù)組，缺失值對應(yīng)為True
# a    False
# b    False
# c    False
# d    False
# e     True
# dtype: bool

print(df.notnull() ) # 4、返回布爾數(shù)組，缺失值對應(yīng)為False
# a     True
# b     True
# c     True
# d     True
# e    False
# dtype: bool

二、DataFrame數(shù)據(jù)結(jié)構(gòu)

DataFrame是一個表格型的數(shù)據(jù)結(jié)構(gòu)，含有一組有序的列。

DataFrame可以被看做是由Series組成的字典，并且共用一個索引。

1、DataFrame的創(chuàng)建

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((3, 4))) # 創(chuàng)建一個三行四列的DataFrame
print(df1)
#      0    1    2    3
# 0  0.0  0.0  0.0  0.0
# 1  0.0  0.0  0.0  0.0
# 2  0.0  0.0  0.0  0.0

dates = pd.date_range('20190101', periods=6, freq='M')

np.random.seed(1)
arr = 10 * np.random.randn(6, 4)
print(arr)
# [[ 16.24345364  -6.11756414  -5.28171752 -10.72968622]
#  [  8.65407629 -23.01538697  17.44811764  -7.61206901]
#  [  3.19039096  -2.49370375  14.62107937 -20.60140709]
#  [ -3.22417204  -3.84054355  11.33769442 -10.99891267]
#  [ -1.72428208  -8.77858418   0.42213747   5.82815214]
#  [-11.00619177  11.4472371    9.01590721   5.02494339]]

df = pd.DataFrame(arr, index=dates, columns=['c1', 'c2', 'c3', 'c4']) # 自定義index和column
print(df)
#                    c1         c2         c3         c4
# 2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
# 2019-02-28   8.654076 -23.015387  17.448118  -7.612069
# 2019-03-31   3.190391  -2.493704  14.621079 -20.601407
# 2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
# 2019-05-31  -1.724282  -8.778584   0.422137   5.828152
# 2019-06-30 -11.006192  11.447237   9.015907   5.024943

2、DataFrame屬性

print(df.dtypes) # 1、查看數(shù)據(jù)類型
# 0    float64
# 1    float64
# 2    float64
# 3    float64
# dtype: object


print(df.index) # 2、查看行索引
# DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
#                '2019-05-31', '2019-06-30'],
#               dtype='datetime64[ns]', freq='M')

print(df.columns) # 3、查看各列的標(biāo)簽
# Index(['c1', 'c2', 'c3', 'c4'], dtype='object')

print(df.values) # 4、查看數(shù)據(jù)框內(nèi)的數(shù)據(jù)，也即不含行標(biāo)簽和列頭的數(shù)據(jù) 
# [[ 16.24345364  -6.11756414  -5.28171752 -10.72968622]
#  [  8.65407629 -23.01538697  17.44811764  -7.61206901]
#  [  3.19039096  -2.49370375  14.62107937 -20.60140709]
#  [ -3.22417204  -3.84054355  11.33769442 -10.99891267]
#  [ -1.72428208  -8.77858418   0.42213747   5.82815214]
#  [-11.00619177  11.4472371    9.01590721   5.02494339]]

print(df.describe()) # 5、查看數(shù)據(jù)每一列的極值，均值，中位數(shù)，只可用于數(shù)值型數(shù)據(jù)
#               c1         c2         c3         c4
# count   6.000000   6.000000   6.000000   6.000000
# mean    2.022213  -5.466424   7.927203  -6.514830
# std     9.580084  11.107772   8.707171  10.227641
# min   -11.006192 -23.015387  -5.281718 -20.601407
# 25%    -2.849200  -8.113329   2.570580 -10.931606
# 50%     0.733054  -4.979054  10.176801  -9.170878
# 75%     7.288155  -2.830414  13.800233   1.865690
# max    16.243454  11.447237  17.448118   5.828152

print(df.T) # 6、transpose轉(zhuǎn)置，也可用Ｔ來操作
#     2019-01-31  2019-02-28  2019-03-31  2019-04-30  2019-05-31  2019-06-30
# c1   16.243454    8.654076    3.190391   -3.224172   -1.724282  -11.006192

# c2   -6.117564  -23.015387   -2.493704   -3.840544   -8.778584   11.447237
# c3   -5.281718   17.448118   14.621079   11.337694    0.422137    9.015907
# c4  -10.729686   -7.612069  -20.601407  -10.998913    5.828152    5.024943


print(df.sort_index(axis=0)) # 7、排序，axis=0 可按行標(biāo)簽排序輸出； 按行標(biāo)簽][2019-01-01, 2019-01-02...]從大到小排序
#                    c1         c2         c3         c4
# 2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
# 2019-02-28   8.654076 -23.015387  17.448118  -7.612069
# 2019-03-31   3.190391  -2.493704  14.621079 -20.601407
# 2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
# 2019-05-31  -1.724282  -8.778584   0.422137   5.828152
# 2019-06-30 -11.006192  11.447237   9.015907   5.024943

print(df.sort_index(axis=1)) # 7、排序，axis=1 可按列頭標(biāo)簽排序輸出；按列標(biāo)簽[c1, c2, c3, c4從大到小排序
#                    c1         c2         c3         c4
# 2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
# 2019-02-28   8.654076 -23.015387  17.448118  -7.612069
# 2019-03-31   3.190391  -2.493704  14.621079 -20.601407
# 2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
# 2019-05-31  -1.724282  -8.778584   0.422137   5.828152
# 2019-06-30 -11.006192  11.447237   9.015907   5.024943

print(df.sort_values(by='c2')) # 8、按數(shù)據(jù)值來排序 ；按c2列的值從大到小排序
#                    c1         c2         c3         c4
# 2019-02-28   8.654076 -23.015387  17.448118  -7.612069
# 2019-05-31  -1.724282  -8.778584   0.422137   5.828152
# 2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
# 2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
# 2019-03-31   3.190391  -2.493704  14.621079 -20.601407
# 2019-06-30 -11.006192  11.447237   9.015907   5.024943

3、DataFrame取值

print(df['c2']) # 1、 通過columns標(biāo)簽取值
# 2019-01-31    -6.117564
# 2019-02-28   -23.015387
# 2019-03-31    -2.493704
# 2019-04-30    -3.840544
# 2019-05-31    -8.778584
# 2019-06-30    11.447237
# Freq: M, Name: c2, dtype: float64

print(df[['c2', 'c3']])
#                    c2         c3
# 2019-01-31  -6.117564  -5.281718
# 2019-02-28 -23.015387  17.448118
# 2019-03-31  -2.493704  14.621079
# 2019-04-30  -3.840544  11.337694
# 2019-05-31  -8.778584   0.422137
# 2019-06-30  11.447237   9.015907

print(df[0:3])  # 2、 通過columns索引取值
#                    c1         c2         c3         c4
# 2019-01-31  16.243454  -6.117564  -5.281718 -10.729686
# 2019-02-28   8.654076 -23.015387  17.448118  -7.612069
# 2019-03-31   3.190391  -2.493704  14.621079 -20.601407

print(df.loc['20200228':'20200430']) # 3、loc 通過行標(biāo)簽取值：
#                  c1         c2         c3         c3
# 2020-02-29  8.654076 -23.015387  17.448118  -7.612069
# 2020-03-31  3.190391  -2.493704  14.621079 -20.601407
# 2020-04-30 -3.224172  -3.840544  11.337694 -10.998913

print(df.iloc[1:3]) # 4、iloc 通過行索引選擇數(shù)據(jù),取第二行到三行。
#                  c1         c2         c3         c3
# 2020-02-29  8.654076 -23.015387  17.448118  -7.612069
# 2020-03-31  3.190391  -2.493704  14.621079 -20.601407

print(df.iloc[2, 1])  # 第三行第二列值：-2.493703754774101

print(df.iloc[1:4, 1:4]) # 第 2-4行與第2-4列：
#                    c2         c3         c4
# 2019-02-28 -23.015387  17.448118  -7.612069
# 2019-03-31  -2.493704  14.621079 -20.601407
# 2019-04-30  -3.840544  11.337694 -10.998913

print(df['c3'] > 10) # 5、 使用邏輯判斷取值
# 2020-01-31       False
# 2020-02-29         True
# 2020-03-31         True
# 2020-04-30         True
# 2020-05-31       False
# 2020-06-30       False
# Freq: M, Name: c3, dtype: bool
print(df[df['c3'] > 10]) # 5、 使用邏輯判斷取值
# c1 c2 c3 c4 
# 2020-02-29 8.654076 -23.015387 17.448118 -7.612069 
# 2020-03-31 3.190391 -2.493704 14.621079 -20.601407 
# 2020-04-30 -3.224172 -3.840544 11.337694 -10.998913
print(df[(df['c1'] > 0) & (df['c2'] > -8)])
#                    c1        c2         c3         c4
# 2019-01-31  16.243454 -6.117564  -5.281718 -10.729686
# 2019-03-31   3.190391 -2.493704  14.621079 -20.601407

4、DataFrame值替換

df.iloc[1:3]=5 # 將2-3行的值設(shè)為5
print(df)

#                    c1         c2         c3         c4
# 2020-01-31  16.243454  -6.117564  -5.281718 -10.729686
# 2020-02-29   5.000000   5.000000   5.000000   5.000000
# 2020-03-31   5.000000   5.000000   5.000000   5.000000
# 2020-04-30  -3.224172  -3.840544  11.337694 -10.998913
# 2020-05-31  -1.724282  -8.778584   0.422137   5.828152


df.iloc[0:3, 0:2] = 0 # 將1-3行1-2列的值設(shè)為0
print(df)
#                    c1         c2         c3         c4
# 2019-01-31   0.000000   0.000000  -5.281718 -10.729686
# 2019-02-28   0.000000   0.000000  17.448118  -7.612069
# 2019-03-31   0.000000   0.000000  14.621079 -20.601407
# 2019-04-30  -3.224172  -3.840544  11.337694 -10.998913
# 2019-05-31  -1.724282  -8.778584   0.422137   5.828152
# 2019-06-30 -11.006192  11.447237   9.015907   5.024943


# 針對行做處理
df[df['c3'] > 10] = 100 # 將C3列的大于10的行數(shù)值設(shè)為0
print(df)
#                     c1          c2          c3          c4
# 2019-01-31    0.000000    0.000000   -5.281718  -10.729686
# 2019-02-28  100.000000  100.000000  100.000000  100.000000
# 2019-03-31  100.000000  100.000000  100.000000  100.000000
# 2019-04-30  100.000000  100.000000  100.000000  100.000000
# 2019-05-31   -1.724282   -8.778584    0.422137    5.828152
# 2019-06-30  -11.006192   11.447237    9.015907    5.024943


# 針對行做處理
df = df.astype(np.int32)
df[df['c3'].isin([100])] = 1000 # 將C3列的等于100的行數(shù)值設(shè)為1000
print(df)
#               c1    c2    c3    c4
# 2019-01-31     0     0    -5   -10
# 2019-02-28  1000  1000  1000  1000
# 2019-03-31  1000  1000  1000  1000
# 2019-04-30  1000  1000  1000  1000
# 2019-05-31    -1    -8     0     5
# 2019-06-30   -11    11     9     5

5、處理丟失數(shù)據(jù)

print(df.isnull())
#       c1     c2     c3     c4
# 0  False   True  False  False
# 1  False  False  False  False
# 2  False  False   True  False
# 3  False  False  False  False
# 4  False  False  False  False
# 5  False  False  False   True
# 6   True   True   True   True

print(df.isnull().sum()) # 1、通過在isnull()方法后使用sum()方法即可獲得該數(shù)據(jù)集某個特征含有多少個缺失值
# c1    1
# c2    2
# c3    2
# c4    2
# dtype: int64

print(df.dropna(axis=0)) # 2、axis=0刪除有NaN值的行
#     c1   c2   c3   c4
# 1  4.9  3.0  1.4  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5

print(df.dropna(axis=1)) # 3、axis=1刪除有NaN值的列
# Empty DataFrame
# Columns: []
# Index: [0, 1, 2, 3, 4, 5, 6]

print(df.dropna(how='all')) # 4、刪除全為NaN值得行或列
#     c1   c2   c3   c4
# 0  5.1  NaN  1.4  0.2
# 1  4.9  3.0  1.4  0.2
# 2  4.7  3.2  NaN  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5
# 5  6.9  3.1  4.9  NaN

print(df.dropna(thresh=4)) #5、 保留至少有4個非NaN數(shù)據(jù)的行，刪除行不為4個值的，
#     c1   c2   c3   c4
# 1  4.9  3.0  1.4  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5

print(df.dropna(subset=['c2'])) # 6、刪除c2中有NaN值的行
#     c1   c2   c3   c4
# 1  4.9  3.0  1.4  0.2
# 2  4.7  3.2  NaN  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5
# 5  6.9  3.1  4.9  NaN

print(df.fillna(value=10)) # 7、用指定值填充nan值
#      c1    c2    c3    c4
# 0   5.1  10.0   1.4   0.2
# 1   4.9   3.0   1.4   0.2
# 2   4.7   3.2  10.0   0.2
# 3   7.0   3.2   4.7   1.4
# 4   6.4   3.2   4.5   1.5
# 5   6.9   3.1   4.9  10.0
# 6  10.0  10.0  10.0  10.0

6、合并數(shù)據(jù)

print(df.isnull())
#       c1     c2     c3     c4
# 0  False   True  False  False
# 1  False  False  False  False
# 2  False  False   True  False
# 3  False  False  False  False
# 4  False  False  False  False
# 5  False  False  False   True
# 6   True   True   True   True

print(df.isnull().sum()) # 1、通過在isnull()方法后使用sum()方法即可獲得該數(shù)據(jù)集某個特征含有多少個缺失值
# c1    1
# c2    2
# c3    2
# c4    2
# dtype: int64

print(df.dropna(axis=0)) # 2、axis=0刪除有NaN值的行
#     c1   c2   c3   c4
# 1  4.9  3.0  1.4  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5

print(df.dropna(axis=1)) # 3、axis=1刪除有NaN值的列
# Empty DataFrame
# Columns: []
# Index: [0, 1, 2, 3, 4, 5, 6]

print(df.dropna(how='all')) # 4、刪除全為NaN值得行或列
#     c1   c2   c3   c4
# 0  5.1  NaN  1.4  0.2
# 1  4.9  3.0  1.4  0.2
# 2  4.7  3.2  NaN  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5
# 5  6.9  3.1  4.9  NaN

print(df.dropna(thresh=4)) #5、 保留至少有4個非NaN數(shù)據(jù)的行，刪除行不為4個值的，
#     c1   c2   c3   c4
# 1  4.9  3.0  1.4  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5

print(df.dropna(subset=['c2'])) # 6、刪除c2中有NaN值的行
#     c1   c2   c3   c4
# 1  4.9  3.0  1.4  0.2
# 2  4.7  3.2  NaN  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5
# 5  6.9  3.1  4.9  NaN

print(df.fillna(value=10)) # 7、用指定值填充nan值
#      c1    c2    c3    c4
# 0   5.1  10.0   1.4   0.2
# 1   4.9   3.0   1.4   0.2
# 2   4.7   3.2  10.0   0.2
# 3   7.0   3.2   4.7   1.4
# 4   6.4   3.2   4.5   1.5
# 5   6.9   3.1   4.9  10.0
# 6  10.0  10.0  10.0  10.0

二、讀取CSV文件

import pandas as pd
from io import StringIO

test_data = '''
5.1,,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,,0.2
7.0,3.2,4.7,1.4
6.4,3.2,4.5,1.5
6.9,3.1,4.9,
,,,
'''

test_data = StringIO(test_data)
df = pd.read_csv(test_data, header=None)
df.columns = ['c1', 'c2', 'c3', 'c4']
print(df)
#     c1   c2   c3   c4
# 0  5.1  NaN  1.4  0.2
# 1  4.9  3.0  1.4  0.2
# 2  4.7  3.2  NaN  0.2
# 3  7.0  3.2  4.7  1.4
# 4  6.4  3.2  4.5  1.5
# 5  6.9  3.1  4.9  NaN
# 6  NaN  NaN  NaN  NaN

三、導(dǎo)入導(dǎo)出數(shù)據(jù)

pandas的讀寫Excel需要依賴xlrd模塊，所以我們需要去安裝一下，命令：pip install xlrd

使用df = pd.read_excel(filename)讀取文件，使用df.to_excel(filename)保存文件。

1、讀取文件導(dǎo)入數(shù)據(jù)

df = pd.read_excel(filename)

讀取文件導(dǎo)入數(shù)據(jù)函數(shù)主要參數(shù)：

sep ：指定分隔符，可用正則表達式如'\s+'
header=None ：指定文件無行名
name ：指定列名
index_col ：指定某列作為索引
skip_row ：指定跳過某些行
na_values ：指定某些字符串表示缺失值
parse_dates ：指定某些列是否被解析為日期，布爾值或列表

2、寫入文件導(dǎo)出數(shù)據(jù)

df.to_excel(filename)

寫入文件函數(shù)的主要參數(shù)：

sep 分隔符
na_rep 指定缺失值轉(zhuǎn)換的字符串，默認(rèn)為空字符串
header=False 不保存列名
index=False 不保存行索引
cols 指定輸出的列，傳入列表

3、實例

import pandas as pd
import numpy as np
df = pd.read_excel("http://pbpython.com/extras/excel-comp-data.xlsx")
print(df.head())
print(len(df.index)) # 行數(shù) （不包含表頭，且一下均如此）
print(df.index.values) # 行索引 
print(len(df.columns)) # 列數(shù)
print(df.columns.values) # 列索引

data = df.loc[0].values #  表示第0行數(shù)據(jù)
data = df.loc[[1, 2]].values #  讀取多行數(shù)據(jù)（這里是第1行和第2行）

data = df.iloc[:, 1].values #  讀第1列數(shù)據(jù)
data = df.iloc[:, [1, 2]].values #  讀取多列數(shù)據(jù)（這里是第1列和第2列）
data = df.iloc[1, 2] #  讀取指定單元格數(shù)據(jù)（這里是第1行第一列數(shù)據(jù)）
data = df.iloc[[1, 2], [1, 2]].values #  讀取多行多列數(shù)據(jù)（第1,2行1,2列的數(shù)據(jù)）

#  任務(wù)：輸出滿足成績大于等于90的數(shù)據(jù)
temp = []
for i in range(len(df.index.values)):
    if df.iloc[i, 3] >= 90:
        temp.append(df.iloc[i].values)
df2 = pd.DataFrame(data=temp, columns=df.columns.values)
writer = pd.ExcelWriter('out_test.xlsx')#  不寫index會輸出索引
df2.to_excel(writer, 'Sheet', index=False)
writer.save()

四、pandas讀取json文件

import pandas as pd

strtext = '[{"ttery":"min","issue":"20130801-3391","code":"8,4,5,2,9","code1":"297734529","code2":null,"time":1013395466000},\
{"ttery":"min","issue":"20130801-3390","code":"7,8,2,1,2","code1":"298058212","code2":null,"time":1013395406000},\
{"ttery":"min","issue":"20130801-3389","code":"5,9,1,2,9","code1":"298329129","code2":null,"time":1013395346000},\
{"ttery":"min","issue":"20130801-3388","code":"3,8,7,3,3","code1":"298588733","code2":null,"time":1013395286000},\
{"ttery":"min","issue":"20130801-3387","code":"0,8,5,2,7","code1":"298818527","code2":null,"time":1013395226000}]'

df = pd.read_json(strtext, orient='records')
print(df)
#   ttery          issue       code      code1  code2           time
# 0   min  20130801-3391  8,4,5,2,9  297734529    NaN  1013395466000
# 1   min  20130801-3390  7,8,2,1,2  298058212    NaN  1013395406000
# 2   min  20130801-3389  5,9,1,2,9  298329129    NaN  1013395346000
# 3   min  20130801-3388  3,8,7,3,3  298588733    NaN  1013395286000
# 4   min  20130801-3387  0,8,5,2,7  298818527    NaN  1013395226000

df = pd.read_json(strtext, orient='records')
df.to_excel('pandas處理json.xlsx', index=False, columns=["ttery", "issue", "code", "code1", "code2", "time"])

orient參數(shù)的五種形式

orient是表明預(yù)期的json字符串格式。orient的設(shè)置有以下五個值：

1.'split' : dict like {index -> [index], columns -> [columns], data -> [values]}

這種就是有索引，有列字段,和數(shù)據(jù)矩陣構(gòu)成的json格式。key名稱只能是index,columns和data。

s = '{"index":[1,2,3],"columns":["a","b"],"data":[[1,3],[2,8],[3,9]]}'
df = pd.read_json(s, orient='split')
print(df)
#    a  b
# 1  1  3
# 2  2  8
# 3  3  9

2.'records' : list like [{column -> value}, ... , {column -> value}]

這種就是成員為字典的列表。如我今天要處理的json數(shù)據(jù)示例所見。構(gòu)成是列字段為鍵,值為鍵值,每一個字典成員就構(gòu)成了dataframe的一行數(shù)據(jù)。

strtext = '[{"ttery":"min","issue":"20130801-3391","code":"8,4,5,2,9","code1":"297734529","code2":null,"time":1013395466000},\
{"ttery":"min","issue":"20130801-3390","code":"7,8,2,1,2","code1":"298058212","code2":null,"time":1013395406000}]'

df = pd.read_json(strtext, orient='records')
print(df)
#   ttery          issue       code      code1  code2           time
# # 0   min  20130801-3391  8,4,5,2,9  297734529    NaN  1013395466000
# # 1   min  20130801-3390  7,8,2,1,2  298058212    NaN  1013395406000

3.'index' : dict like {index -> {column -> value}}

以索引為key,以列字段構(gòu)成的字典為鍵值。如：

s = '{"0":{"a":1,"b":2},"1":{"a":9,"b":11}}'
df = pd.read_json(s, orient='index')
print(df)
#    a   b
# 0  1   2
# 1  9  11

4.'columns' : dict like {column -> {index -> value}}

這種處理的就是以列為鍵，對應(yīng)一個值字典的對象。這個字典對象以索引為鍵,以值為鍵值構(gòu)成的json字符串。如下圖所示:

s = '{"a":{"0":1,"1":9},"b":{"0":2,"1":11}}'
df = pd.read_json(s, orient='columns')
print(df)
#    a   b
# 0  1   2
# 1  9  11

5.'values' : just the values array。

values這種我們就很常見了。就是一個嵌套的列表。里面的成員也是列表，2層的。

s = '[["a",1],["b",2]]'
df = pd.read_json(s, orient='values')
print(df)
#    0  1
# 0  a  1
# 1  b  2

五、pandas讀取sql語句

import numpy as np
import pandas as pd
import pymysql


def conn(sql):
    # 連接到mysql數(shù)據(jù)庫
    conn = pymysql.connect(
        host="localhost",
        port=3306,
        user="root",
        passwd="123",
        db="db1",
    )
    try:
        data = pd.read_sql(sql, con=conn)
        return data
    except Exception as e:
        print("SQL is not correct!")
    finally:
        conn.close()


sql = "select * from test1 limit 0, 10"  # sql語句
data = conn(sql)
print(data.columns.tolist())  # 查看字段
print(data)  # 查看數(shù)據(jù)

到此這篇關(guān)于Python中的pandas表格模塊、文件模塊和數(shù)據(jù)庫模塊的文章就介紹到這了。希望對大家的學(xué)習(xí)有所幫助，也希望大家多多支持腳本之家。

您可能感興趣的文章:

欧美bbbwbbbw肥妇,免费乱码人妻系列日韩,一级黄片

軟件下載

源碼下載

軟件編程

網(wǎng)絡(luò)編程

在線工具

數(shù)據(jù)庫

CMS

常用工具

Python中的pandas表格模塊、文件模塊和數(shù)據(jù)庫模塊

目錄

一、Series數(shù)據(jù)結(jié)構(gòu)

1、Series的創(chuàng)建

2、Series屬性

2、Series缺失數(shù)據(jù)處理

二、DataFrame數(shù)據(jù)結(jié)構(gòu)

1、DataFrame的創(chuàng)建

2、DataFrame屬性

3、DataFrame取值

4、DataFrame值替換

5、處理丟失數(shù)據(jù)

6、合并數(shù)據(jù)

二、讀取CSV文件

三、導(dǎo)入導(dǎo)出數(shù)據(jù)

1、讀取文件導(dǎo)入數(shù)據(jù)

2、寫入文件導(dǎo)出數(shù)據(jù)

3、實例

四、pandas讀取json文件

五、pandas讀取sql語句

相關(guān)文章

最新評論

大家感興趣的內(nèi)容

最近更新的內(nèi)容

常用在線小工具

Python中的pandas表格模塊、文件模塊和數(shù)據(jù)庫模塊

目錄

一、Series數(shù)據(jù)結(jié)構(gòu)

1、Series的創(chuàng)建

2、Series屬性

2、Series缺失數(shù)據(jù)處理

二、DataFrame數(shù)據(jù)結(jié)構(gòu)

1、DataFrame的創(chuàng)建

2、DataFrame屬性

3、DataFrame取值

4、DataFrame值替換

5、處理丟失數(shù)據(jù)

6、合并數(shù)據(jù)

二、讀取CSV文件

三、導(dǎo)入導(dǎo)出數(shù)據(jù)

1、讀取文件導(dǎo)入數(shù)據(jù)

2、寫入文件導(dǎo)出數(shù)據(jù)

3、實例

四、pandas讀取json文件

五、pandas讀取sql語句

相關(guān)文章

最新評論

大家感興趣的內(nèi)容

最近更新的內(nèi)容

常用在線小工具

Python中的pandas表格模塊、文件模塊和數(shù)據(jù)庫模塊

一、Series數(shù)據(jù)結(jié)構(gòu)

1、Series的創(chuàng)建

2、Series屬性

2、Series缺失數(shù)據(jù)處理

二、DataFrame數(shù)據(jù)結(jié)構(gòu)

2、DataFrame屬性

3、DataFrame取值

4、DataFrame值替換

5、處理丟失數(shù)據(jù)

三、導(dǎo)入導(dǎo)出數(shù)據(jù)

1、讀取文件導(dǎo)入數(shù)據(jù)

2、寫入文件導(dǎo)出數(shù)據(jù)

五、pandas讀取sql語句