
Verifying Files with CRC32 in Python

Updated: 2023-10-30 08:41:34  Author: lyshark


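CRC file verification is a way to check file integrity: compute a file's CRC value and compare it with a CRC recorded earlier to decide whether the file has changed. The same idea extends to a whole directory. Record a checksum for every file, recompute the checksums later, and print each file whose value differs. This article builds a small Python tool that verifies a given directory in this way.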
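To make the comparison concrete, here is a minimal sketch that applies zlib.crc32 to two in-memory byte strings. The sample data and variable names are made up for illustration.

```python
from zlib import crc32

original = b"hello world"
modified = b"hello World"   # exactly one byte differs

baseline = crc32(original)  # checksum recorded ahead of time

# Unchanged data reproduces the recorded checksum.
unchanged = crc32(original) == baseline
# Changing a single byte yields a different checksum.
changed = crc32(modified) != baseline

print(f"baseline: {baseline:#010x}")
print("unchanged matches:", unchanged)
print("modification detected:", changed)
```

CRC32 is an error-detection code rather than a cryptographic hash, so it is suited to spotting accidental changes, not deliberate tampering.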

The first step is to traverse files and directories recursively. Python offers two ways to do this: the built-in os.walk function, or os.listdir combined with recursion. Both are wrapped in a function here. ordinary_all_file uses the first approach and recursion_all_file uses the second. Each returns a list named _file, which the caller can store and iterate over.

import os
import hashlib
import time
import argparse
from zlib import crc32

# Recursive traversal built on os.listdir: returns every file under rootdir
def recursion_all_file(rootdir):
    _file = []
    for name in os.listdir(rootdir):
        path = os.path.join(rootdir, name)
        if os.path.isdir(path):
            _file.extend(recursion_all_file(path))
        elif os.path.isfile(path):
            _file.append(path)
    # Normalize Windows separators so stored paths always use "/"
    return [item.replace("\\", "/") for item in _file]

# Traversal built on os.walk: returns every file and directory under rootdir
def ordinary_all_file(rootdir):
    _file = []
    for root, dirs, files in os.walk(rootdir, topdown=False):
        for name in files:
            _file.append(os.path.join(root, name))
        for name in dirs:
            _file.append(os.path.join(root, name))
    # Normalize separators once, after the walk has finished
    return [item.replace("\\", "/") for item in _file]
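As a point of comparison, the same file listing can also be produced with pathlib from the standard library. The sketch below is an alternative, not the article's method. The helper name list_files and the throwaway directory tree are assumptions made for the example.

```python
import tempfile
from pathlib import Path

def list_files(rootdir):
    """Return every regular file under rootdir as sorted '/'-separated paths."""
    return sorted(p.as_posix() for p in Path(rootdir).rglob("*") if p.is_file())

# Build a small temporary tree to show what the helper returns.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.txt").write_text("a")
    (Path(tmp) / "sub" / "b.txt").write_text("b")
    # Strip the temporary prefix so the result is easy to read.
    found = [Path(p).relative_to(tmp).as_posix() for p in list_files(tmp)]

print(found)
```

Path.as_posix() already emits forward slashes, so the manual backslash replacement is not needed in this variant.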

Two checksum functions are provided as well. Calculation_md5sum uses md5() from the hashlib module to compute a file's MD5 digest, and Calculation_crc32 uses crc32 from the zlib library to compute a file's CRC32 value, as shown below.

# Read the file in chunks with hashlib and compute its MD5 digest
def Calculation_md5sum(filename):
    try:
        md5 = hashlib.md5()
        with open(filename, "rb") as fp:
            while True:
                temp = fp.read(8192)
                if not temp:
                    break
                md5.update(temp)
        return md5.hexdigest()
    except OSError:
        return 0

# Compute the CRC32 of a file, reading it in chunks
def Calculation_crc32(filename):
    try:
        crc = 0
        with open(filename, "rb") as fp:
            while True:
                temp = fp.read(8192)
                if not temp:
                    break
                # Feed the running value back in so the result covers the whole file
                crc = crc32(temp, crc)
        return crc
    except OSError:
        # 0 signals "could not read"; note that an empty file also has CRC32 0
        return 0
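Chunked reading only gives the right answer if each call to crc32 receives the running value from the previous call. The sketch below checks that a chunked computation matches a one-shot computation on a temporary file. The function name crc32_chunked and the test data are assumptions for illustration.

```python
import os
import tempfile
from zlib import crc32

def crc32_chunked(filename, chunk_size=8192):
    """Compute a file's CRC32 by feeding fixed-size chunks to zlib.crc32."""
    crc = 0
    with open(filename, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            crc = crc32(chunk, crc)  # second argument carries the running value
    return crc

# Compare against a one-shot computation on a temporary file of 102400 bytes.
data = bytes(range(256)) * 400
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as fp:
        fp.write(data)
    same = crc32_chunked(path) == crc32(data)
finally:
    os.remove(path)

print("chunked == one-shot:", same)
```

Reading in chunks keeps memory use flat regardless of file size, which matters when a directory contains large files.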

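The main block parses its arguments with argparse and implements three modes. dump saves the checksum of every file in a directory to dump.json. check reads dump.json and reports files that have been modified since the snapshot was taken. set applies one timestamp to a batch of files. All three are fairly common tasks. Note that, despite its name, dump.json holds one Python list literal per line rather than JSON.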

if __name__ == "__main__":
    # literal_eval parses the stored list literals without executing arbitrary code
    from ast import literal_eval

    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", dest="mode", help="operation to run (set/dump/check)")
    parser.add_argument("-d", "--dir", dest="dir", help="directory to traverse (not a file)")
    parser.add_argument("-f", "--files", dest="files", help="snapshot file to read or write")
    parser.add_argument("-t", "--time", dest="time", help="timestamp to apply to every file")
    args = parser.parse_args()

    # Save a snapshot: main.py --mode=dump -d "D:/lyshark" -f dump.json
    if args.mode == "dump" and args.dir and args.files:
        with open(args.files, "w", encoding="utf-8") as fp:
            for item in recursion_all_file(args.dir):
                Single = [Calculation_crc32(item), item]
                fp.write(str(Single) + "\n")
                print("[+] CRC: {} ---> path: {}".format(Single[0], Single[1]))

    # Check file integrity: main.py --mode=check -d "D:/lyshark" -f dump.json
    elif args.mode == "check" and args.dir and args.files:
        with open(args.files, "r", encoding="utf-8") as fp:
            for item in fp:
                if not item.strip():
                    continue
                # Each snapshot line holds [crc, path]
                _crc, _path = literal_eval(item.strip())
                # Test for existence directly so an empty file (CRC32 0) is not reported as missing
                if not os.path.isfile(_path):
                    print("[x] missing file: {}".format(_path))
                # A CRC32 that differs from the snapshot means the content changed
                elif Calculation_crc32(_path) != _crc:
                    print("[-] modified file: {}".format(_path))

    # Set file modification time: main.py --mode=set -d "D:/lyshark" -t "2019-01-01 11:22:30"
    elif args.mode == "set" and args.dir and args.time:
        _time = int(time.mktime(time.strptime(args.time, "%Y-%m-%d %H:%M:%S")))
        for item in ordinary_all_file(args.dir):
            os.utime(item, (_time, _time))
            print("[+] timestamp: {} ---> path: {}".format(_time, item))
    else:
        parser.print_help()

Setting mode to dump computes the CRC value of each file in the target directory and saves the results to dump.json, as shown in the figure below.
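The tool above stores one list literal per line. If a real JSON snapshot is preferred, the json module can write and read a path-to-CRC mapping instead. The following is a minimal sketch of that variant, not the article's format. The function names file_crc32, save_snapshot, and load_snapshot are hypothetical.

```python
import json
import os
import tempfile
from zlib import crc32

def file_crc32(path):
    """CRC32 of a whole file (small files only; read in one call)."""
    with open(path, "rb") as fp:
        return crc32(fp.read())

def save_snapshot(paths, snapshot_file):
    """Write {path: crc32} for the given paths as a JSON object."""
    snapshot = {p: file_crc32(p) for p in paths}
    with open(snapshot_file, "w", encoding="utf-8") as fp:
        json.dump(snapshot, fp, indent=2)
    return snapshot

def load_snapshot(snapshot_file):
    """Read a snapshot written by save_snapshot."""
    with open(snapshot_file, "r", encoding="utf-8") as fp:
        return json.load(fp)

# Round-trip a snapshot of one temporary file.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "a.txt")
    with open(target, "wb") as fp:
        fp.write(b"sample")
    snap_path = os.path.join(tmp, "dump.json")
    written = save_snapshot([target], snap_path)
    round_trip = load_snapshot(snap_path) == written

print("round trip ok:", round_trip)
```

A JSON snapshot can be loaded with json.load alone, so no expression parsing is needed when checking.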

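Setting mode to check and passing the dump.json file saved earlier verifies whether the directory now contains abnormal files. A file whose checksum has changed is reported as abnormal, and a file that has been deleted or renamed is reported as missing, as shown in the figure below.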
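The classification described above can be sketched on plain dictionaries, independent of how the snapshot is stored. The function name compare_snapshots and the checksum values below are made up for illustration.

```python
def compare_snapshots(recorded, current):
    """Classify recorded paths as modified or missing.

    recorded and current map path -> CRC32. A path that is absent from
    current is treated as missing.
    """
    modified = sorted(p for p, crc in recorded.items()
                      if p in current and current[p] != crc)
    missing = sorted(p for p in recorded if p not in current)
    return modified, missing

recorded = {"a.txt": 111, "b.txt": 222, "c.txt": 333}
current = {"a.txt": 111, "b.txt": 999}  # b.txt changed, c.txt is gone

modified, missing = compare_snapshots(recorded, current)
print("modified:", modified)
print("missing:", missing)
```

A renamed file appears as missing because the check looks paths up by their recorded name.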

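Setting mode to set changes the timestamp of every file and subdirectory in the given directory. For example, to reset everything under d://lyshark to 2019-01-01 11:22:30, run main.py --mode=set -d "D:/lyshark" -t "2019-01-01 11:22:30" and then inspect the file times to confirm the change, as shown in the figure below.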
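The timestamp conversion and the os.utime call can be tried on a single temporary file. In this minimal sketch the helper name set_mtime is an assumption, and the time string is interpreted in the local time zone, as time.mktime does.

```python
import os
import tempfile
import time

def set_mtime(path, timestamp_text, fmt="%Y-%m-%d %H:%M:%S"):
    """Set a file's access and modification time from a local-time string."""
    seconds = int(time.mktime(time.strptime(timestamp_text, fmt)))
    os.utime(path, (seconds, seconds))
    return seconds

# Apply the timestamp to a temporary file and read it back.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    expected = set_mtime(path, "2019-01-01 11:22:30")
    actual = int(os.path.getmtime(path))
finally:
    os.remove(path)

print("mtime applied:", actual == expected)
```

os.utime takes a tuple of (access time, modification time) in seconds since the epoch, which is why the same value is passed twice.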

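Directory traversal is useful for more than checksum scans. Combined with open and similar functions, it can search file contents for specific text. The program below scans the files in a directory for a keyword. To run it, pass the path to scan, the keyword to search for, and the file type to include.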

import os
import re
import argparse

# Collect every file under script_path whose name ends with script_type
def spider(script_path, script_type):
    final_files = []
    for root, dirs, files in os.walk(script_path, topdown=False):
        for fi in files:
            dfile = os.path.join(root, fi)
            if dfile.endswith(script_type):
                final_files.append(dfile.replace("\\", "/"))
    print("[+] found {} {} files".format(len(final_files), script_type))
    return final_files

# Search each file line by line; func is treated as a regular expression
def scanner(files_list, func):
    for item in files_list:
        # errors="ignore" keeps the scan going when a file is not valid UTF-8
        with open(item, "r", encoding="utf-8", errors="ignore") as fp:
            # enumerate yields the true line number even when identical lines repeat
            for Code_line, line in enumerate(fp, start=1):
                Now_code = line.strip("\n")
                # for unsafe in ["system", "insert", "include", "eval", "select \*"]:
                for unsafe in [func]:
                    flag = re.findall(unsafe, Now_code)
                    if len(flag) != 0:
                        print("match: {} ---> line: {} ---> path: {}"
                              .format(flag, Code_line, item))

if __name__ == "__main__":
    # Usage: main.py -p "D://lyshark" -w eval -t .php
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--path", dest="path", help="path to scan")
    parser.add_argument("-w", "--word", dest="func", help="keyword to search for")
    parser.add_argument("-t", "--type", dest="type", default=".php",
                        help="file suffix to scan, default .php")
    args = parser.parse_args()
    if args.path and args.func:
        ret = spider(args.path, args.type)
        scanner(ret, args.func)
    else:
        parser.print_help()

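As the figure below shows, passing d://lyshark as the path, gumbo_normalized_tagname as the keyword, and .c as the file suffix makes the program list every matching file in that directory along with the line on which the function appears. Because the suffix is matched with endswith, pass it as .c rather than *.c. The line numbers make it easy to jump straight to each match and analyze it.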
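The line-by-line matching can be sketched on an in-memory list of lines. This version escapes the keyword by default so that characters such as "(" are matched literally instead of being read as regex syntax. The function name scan_lines and the sample lines are assumptions for illustration.

```python
import re

def scan_lines(lines, keyword, literal=True):
    """Return (line_number, text) pairs for lines that contain keyword.

    With literal=True the keyword is escaped, so regex metacharacters
    match themselves. With literal=False it is used as a regex pattern.
    """
    pattern = re.compile(re.escape(keyword) if literal else keyword)
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)
            if pattern.search(line)]

sample = [
    "int main(void) {\n",
    "    foo(a);\n",
    "    bar(b);\n",
    "    foo(a);\n",
    "}\n",
]

hits = scan_lines(sample, "foo(")
print(hits)
```

Numbering with enumerate gives each line its own number, so two identical lines are reported at their separate positions.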