Performing MD5 checksums on files with Python
An MD5 checksum verifies the correctness of received data by running a hash computation over it. MD5 checksums are used in many areas, such as verifying confidential material, validating downloaded files, and hashing plaintext passwords.
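As a quick illustration of the idea before the full script, the following minimal sketch (not part of the original article) uses Python's standard hashlib module to compute an MD5 hex digest, first for an in-memory byte string and then for a file read in chunks; the file name "data.bin" is only a placeholder.

import hashlib

# MD5 of an in-memory byte string: a 32-character hexadecimal digest
print(hashlib.md5(b"hello world").hexdigest())

# MD5 of a file, read in blocks so large files need not fit in memory
md5 = hashlib.md5()
with open("data.bin", "rb") as f:  # "data.bin" is a placeholder path
    for chunk in iter(lambda: f.read(4096), b""):
        md5.update(chunk)
print(md5.hexdigest())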
This article shows how to use Python to compute MD5 checksums for files and use them to detect duplicate files.
The implementation code is below; hopefully it is helpful.
import os
import time
import hashlib
import re
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def generate_md5_for_file(file_path, block_size=4096):
    # Calculate the MD5 hash for a given file
    md5_hash = hashlib.md5()
    with open(file_path, "rb") as f:
        for byte_block in iter(partial(f.read, block_size), b""):
            md5_hash.update(byte_block)
    return file_path, md5_hash.hexdigest()


def generate_md5_for_files_parallel(folder_path, block_size=4096):
    # Generate MD5 hashes for all files in a folder using parallel processing
    md5_dict = {}
    with ProcessPoolExecutor() as executor:
        # Get all file paths in the specified folder
        file_paths = [os.path.join(root, file)
                      for root, _, files in os.walk(folder_path)
                      for file in files]
        # Use parallel processing to calculate MD5 hashes for each file
        results = executor.map(partial(generate_md5_for_file, block_size=block_size), file_paths)
        # Update the dictionary with the calculated MD5 values
        md5_dict.update(results)
    return md5_dict


def write_md5_to_file(md5_dict, output_file):
    # Write MD5 values and file paths to a text file
    with open(output_file, "w") as f:
        for file_path, md5_value in md5_dict.items():
            f.write(f"{md5_value} {file_path}\n")


def check_duplicate_md5(file_path):
    # Check for duplicate MD5 values in a text file
    md5_dict = {}
    with open(file_path, "r") as f:
        for line in f:
            line = line.strip()
            if line:
                md5_value, file_path = line.split(" ", 1)
                if md5_value in md5_dict:
                    # Print information about duplicate MD5 values
                    print(f"Duplicate MD5 found: {md5_value}")
                    print(f"Original file: {md5_dict[md5_value]}")
                    print(f"Duplicate file: {file_path}\n")
                else:
                    md5_dict[md5_value] = file_path


def split_and_check_duplicate_part(filename, part_index, seen_parts):
    # Split a filename using "_" and check for duplicate parts
    parts = filename.split("_")
    if len(parts) == 4:
        selected_part = parts[part_index]
        if selected_part in seen_parts:
            # Print information about duplicate parts
            print(f'Duplicate part found at index {part_index}: {selected_part}')
        else:
            seen_parts.add(selected_part)
    else:
        # Print information if the filename does not have four parts
        print(f'File "{filename}" does not have four parts.')


def process_folder(folder_path, part_index):
    # Process all filenames in a folder
    files = os.listdir(folder_path)
    seen_parts = set()
    for filename in files:
        # Call the split_and_check_duplicate_part function
        split_and_check_duplicate_part(filename, part_index, seen_parts)


def find_max_execution_time(file_path):
    # Find the maximum execution time from a log file
    try:
        with open(file_path, 'r') as file:
            numbers = []
            pattern = re.compile(r'Program execution time: (\d+) microseconds')
            for line in file:
                match = pattern.search(line)
                if match:
                    numbers.append(int(match.group(1)))
            if not numbers:
                raise ValueError("No execution time found in the file.")
            max_number = max(numbers)
            return max_number
    except FileNotFoundError:
        raise FileNotFoundError(f"Error: File '{file_path}' not found.")
    except Exception as e:
        raise Exception(f"An error occurred: {e}")


if __name__ == "__main__":
    # Record the start time of the program
    start_time = time.time()

    # Set the folder path and log file path
    folder_path = r"D:/outputFile/bmp"
    file_path = r"D:/log.txt"

    try:
        # Try to find and print the maximum execution time
        max_execution_time = find_max_execution_time(file_path)
        print(f"The maximum execution time is: {max_execution_time} microseconds")
    except Exception as e:
        # Print an error message if an exception occurs
        print(e)

    # Set the index of the part to be compared
    selected_part_index = 1

    # Call the process_folder function to handle filenames
    process_folder(folder_path, selected_part_index)

    # Set the MD5 file path and block size
    MD5_file = "D:/md5sums.txt"
    block_size = 8192

    # Generate MD5 values for files in parallel and write them to a file
    md5_dict = generate_md5_for_files_parallel(folder_path, block_size=block_size)
    write_md5_to_file(md5_dict, MD5_file)

    # Print a message indicating successful MD5 generation
    print(f"MD5 values generated and saved to {MD5_file}")

    # Check for duplicate MD5 values in the generated file
    check_duplicate_md5(MD5_file)

    # Record the end time of the program
    end_time = time.time()

    # Calculate the total execution time in milliseconds
    execution_time = (end_time - start_time) * 1000
    print(f"Function execution time: {execution_time} milliseconds")
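As a possible follow-up to the script above, the sketch below (an assumption, not part of the original article) re-reads the generated checksum file and reports any file whose current MD5 no longer matches the recorded value, similar in spirit to "md5sum -c". It reuses generate_md5_for_file from the code above and assumes the "md5value<space>path" line format produced by write_md5_to_file.

def verify_md5_file(md5_file):
    # Re-hash each listed file and compare against the stored MD5 value
    mismatches = []
    with open(md5_file, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            expected_md5, path = line.split(" ", 1)
            _, actual_md5 = generate_md5_for_file(path)
            if actual_md5 != expected_md5:
                mismatches.append(path)
                print(f"MISMATCH: {path}")
    return mismatches

# Example usage (assumes md5sums.txt was produced by write_md5_to_file above):
# bad_files = verify_md5_file("D:/md5sums.txt")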
This concludes the article on performing MD5 checksums on files with Python. For more on MD5 file verification in Python, please search 腳本之家's earlier articles, and we hope you will continue to support 腳本之家!