Analyzing nginx logs with Python and pandas: an example

Updated: 2018-04-28 15:17:24 Author: man8er

This post shares an example of analyzing nginx logs with Python and pandas. It may serve as a useful reference.

Requirements

Analyze the nginx access log to obtain, for each interface, the maximum, minimum, and average response time, along with the number of requests.

How it works

Load the $uri and $upstream_response_time fields from the nginx log into a pandas DataFrame, then use pandas grouping and aggregation to compute the statistics.
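
The grouping step can be sketched on a small in-memory table. This is only a minimal illustration, not the script used in this post: the sample rows and interface paths are made up, and it uses `agg` to compute all four statistics in one call instead of concatenating separate results.

```python
import pandas as pd

# Hypothetical sample rows standing in for ($uri, $upstream_response_time) pairs.
rows = [
    ("/api/login", 0.120),
    ("/api/login", 0.080),
    ("/api/orders", 0.300),
    ("/api/orders", 0.500),
    ("/api/orders", 0.100),
]
df = pd.DataFrame(rows, columns=["interface", "response_time"])

# Group by interface, then compute max, min, mean, and request count together.
stats = df.groupby("interface")["response_time"].agg(["max", "min", "mean", "count"])
print(stats)
```

Each row of `stats` corresponds to one interface, and each column to one statistic, which is the same layout the full script writes to Excel.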

Implementation

1. Preparation

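# Create the directory that holds the nginx log files
mkdir -p /home/test/python/log/log
# Create the file that stores the $uri and $upstream_response_time fields extracted from the nginx logs
touch /home/test/python/log/log.txt
# Create a conda environment with the required modules
conda create -n science numpy scipy matplotlib pandas
# Install the module used to write Excel files (run this inside the science environment)
pip install xlwt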

2. Code

#!/usr/local/miniconda2/envs/science/bin/python
#-*- coding: utf-8 -*-
# Compute response-time statistics for each interface
# Create log.txt and set logdir before running
import os
import pandas as pd
# Directory that contains this script
mulu=os.path.dirname(os.path.abspath(__file__))
# Directory that holds the nginx log files
logdir="/home/test/python/log/log"
# File that stores the log fields needed for the statistics
logfile_format=os.path.join(mulu,"log.txt")
print("read from logfile \n")
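# Append the extracted fields to log.txt, opening the output file only once
with open(logfile_format, 'a') as fw:
    for eachfile in os.listdir(logdir):
        logfile = os.path.join(logdir, eachfile)
        with open(logfile, 'r') as fo:
            for line in fo:
                spline = line.split()
                # Skip lines too short to have a field at index 6 (e.g. blank lines)
                if len(spline) < 7:
                    continue
                # Skip malformed entries: missing request, shifted fields, or no upstream time
                if spline[6] in ("-", "GET") or spline[-1] == "-":
                    continue
                fw.write(spline[6] + '\t' + spline[-1] + '\n')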
print("output pandas")
# Read the extracted fields into a DataFrame, one chunk at a time
reader = pd.read_table(logfile_format, sep='\t', engine='python',
                       names=["interface", "response_time"], header=None, iterator=True)
loop = True
chunksize = 10000000
chunks = []
while loop:
    try:
        chunk = reader.get_chunk(chunksize)
        chunks.append(chunk)
    except StopIteration:
        # The reader is exhausted once get_chunk raises StopIteration
        loop = False
        print("Iteration is stopped.")
df = pd.concat(chunks)
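#df=df.set_index("interface")
#df=df.drop(["GET","-"])
# Group by interface and compute the max, min, mean, and request count
df_groupd=df.groupby('interface')
df_groupd_max=df_groupd.max()
df_groupd_min= df_groupd.min()
df_groupd_mean= df_groupd.mean()
df_groupd_size= df_groupd.size()
#print df_groupd_max
#print df_groupd_min
#print df_groupd_mean
# Combine the four results side by side, one labelled column group per statistic
df_ana=pd.concat([df_groupd_max,df_groupd_min,df_groupd_mean,df_groupd_size],axis=1,keys=["max","min","average","count"])
print("output excel")
# Writing .xls relies on xlwt, which pandas 2.0 and later no longer support;
# with a newer pandas, write to an .xlsx file instead
df_ana.to_excel("test.xls")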
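
The field-extraction step above depends on where the fields fall after splitting each line on whitespace. The sketch below isolates that logic in a small function so it can be checked on a single line. It assumes a log format in which the request path is the seventh whitespace-separated field and `$upstream_response_time` is appended at the end of the line; the helper name `extract_fields` and the sample log line are hypothetical, and your own `log_format` may place the fields elsewhere.

```python
def extract_fields(line):
    """Return (uri, upstream_response_time) as strings, or None for skipped lines."""
    parts = line.split()
    # Lines with fewer than 7 fields cannot hold a URI at index 6.
    if len(parts) < 7:
        return None
    uri, resp = parts[6], parts[-1]
    # Same checks as the script: malformed request or missing upstream time.
    if uri in ("-", "GET") or resp == "-":
        return None
    return uri, resp


# Hypothetical log line: combined-style format with the upstream time appended.
sample = ('192.0.2.1 - - [28/Apr/2018:15:17:24 +0800] "GET /api/login HTTP/1.1" '
          '200 512 "-" "curl/7.58.0" 0.120')
print(extract_fields(sample))
```

Running the function on a few real lines from your own log is a quick way to confirm that index 6 and the last field hold the values you expect before processing the whole directory.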

3. The resulting table looks like this: