Verifying files with MD5 checksums in Python

 Updated: 2024-01-26 08:55:08   Author: XXYBMOOO  
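This article explains in detail how to use Python to compute MD5 checksums for files and compare them to find duplicates. The example code is explained step by step, and interested readers are welcome to follow along.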

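An MD5 checksum verifies data by running a hash function over the data received and comparing the resulting digest with the expected value. A change of even a single byte almost certainly produces a different digest. MD5 checksums are used in many areas, such as checking confidential documents, verifying downloaded files and, historically, storing passwords as hashes. Note that MD5 is a hash, not encryption, and it is no longer considered secure against deliberate tampering because collisions can be produced in practice. It remains useful for catching accidental corruption and finding duplicate files, but passwords should be stored with a dedicated password-hashing algorithm instead.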
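To make the idea concrete: hashing the same bytes always yields the same 32-character hex digest, and a file is considered intact when its digest matches the published one. The following is a minimal standard-library sketch; md5_of_bytes, md5_of_file and verify_file are hypothetical helper names (not from the original script), and the two digests shown are the test vectors from RFC 1321.

```python
import hashlib
import os
import tempfile


def md5_of_bytes(data: bytes) -> str:
    """Return the hex MD5 digest of an in-memory byte string."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: str, block_size: int = 8192) -> str:
    """Hash a file in fixed-size chunks so large files are never fully loaded into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path: str, expected_md5: str) -> bool:
    """Compare a file's digest with a published checksum (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # RFC 1321 test vectors
    print(md5_of_bytes(b""))     # d41d8cd98f00b204e9800998ecf8427e
    print(md5_of_bytes(b"abc"))  # 900150983cd24fb0d6963f7d28e17f72

    # Simulate a "downloaded" file and check it against its expected checksum
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "download.bin")
        with open(path, "wb") as f:
            f.write(b"abc")
        print(verify_file(path, "900150983CD24FB0D6963F7D28E17F72"))  # True
```

On Python 3.11 and later, hashlib.file_digest(f, "md5") performs the same chunked read on a binary file object.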

This article shows how to use Python to compute MD5 checksums for files and compare them to detect duplicate files.
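Duplicate detection rests on one property: identical contents always yield identical digests, so files can be grouped by digest. A minimal in-memory sketch follows; md5_of_file and find_duplicates are hypothetical names, not part of the original script. Because MD5 collisions can be crafted deliberately, matches between files from untrusted sources can additionally be confirmed byte by byte, for example with filecmp.cmp(a, b, shallow=False).

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def md5_of_file(path, block_size=8192):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(folder):
    """Return {digest: [paths]} for every digest shared by two or more files under folder."""
    by_digest = defaultdict(list)
    for root, _, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            by_digest[md5_of_file(path)].append(path)
    # Keep only the groups that contain more than one file
    return {digest: paths for digest, paths in by_digest.items() if len(paths) > 1}


if __name__ == "__main__":
    # Build a throwaway folder: a.txt and b.txt share contents, c.txt differs
    with tempfile.TemporaryDirectory() as tmp:
        for name, data in [("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"other")]:
            with open(os.path.join(tmp, name), "wb") as f:
                f.write(data)
        for digest, paths in find_duplicates(tmp).items():
            print(digest, sorted(os.path.basename(p) for p in paths))
```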

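Below is the implementation. Besides hashing, the script also extracts the longest execution time from a log file and checks underscore-separated file names for a repeated part. We hope it helps.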

import os
import time
import hashlib
import re
from concurrent.futures import ProcessPoolExecutor
from functools import partial
 
def generate_md5_for_file(file_path, block_size=4096):
    # Calculate the MD5 hash for a given file
    md5_hash = hashlib.md5()
    with open(file_path, "rb") as f:
        for byte_block in iter(partial(f.read, block_size), b""):
            md5_hash.update(byte_block)
    return file_path, md5_hash.hexdigest()
 
def generate_md5_for_files_parallel(folder_path, block_size=4096):
    # Generate MD5 hashes for all files in a folder (including subfolders) using parallel processes
    # Collect every file path under folder_path
    file_paths = [os.path.join(root, file) for root, _, files in os.walk(folder_path) for file in files]
    with ProcessPoolExecutor() as executor:
        # Each worker returns a (file_path, md5_hex) tuple
        results = executor.map(partial(generate_md5_for_file, block_size=block_size), file_paths)
        # Consume the results while the pool is still open; dict() turns the tuples into {path: md5}
        md5_dict = dict(results)
    return md5_dict
 
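def write_md5_to_file(md5_dict, output_file):
    # Write one "<md5>  <path>" line per file (two spaces, the same layout md5sum prints)
    # UTF-8 keeps non-ASCII file names intact regardless of the system's default encoding
    with open(output_file, "w", encoding="utf-8") as f:
        for file_path, md5_value in md5_dict.items():
            f.write(f"{md5_value}  {file_path}\n")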
 
def check_duplicate_md5(md5_file):
    # Report files whose MD5 value already appeared earlier in the MD5 list file
    md5_dict = {}
    with open(md5_file, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                # Lines have the form "<md5>  <path>"; splitting on the two-space separator
                # leaves no stray leading space on the path (which may itself contain spaces)
                md5_value, file_path = line.split("  ", 1)
                if md5_value in md5_dict:
                    # Print information about duplicate MD5 values
                    print(f"Duplicate MD5 found: {md5_value}")
                    print(f"Original file: {md5_dict[md5_value]}")
                    print(f"Duplicate file: {file_path}\n")
                else:
                    md5_dict[md5_value] = file_path
 
def split_and_check_duplicate_part(filename, part_index, seen_parts):
    # Split a filename using "_" and check for duplicate parts
    parts = filename.split("_")
    if len(parts) == 4:
        selected_part = parts[part_index]
        if selected_part in seen_parts:
            # Print information about duplicate parts
            print(f'Duplicate part found at index {part_index}: {selected_part}')
        else:
            seen_parts.add(selected_part)
    else:
        # Print information if the filename does not have four parts
        print(f'File "{filename}" does not have four parts.')
 
def process_folder(folder_path, part_index):
    # Process all filenames in a folder
    files = os.listdir(folder_path)
    seen_parts = set()
    for filename in files:
        # Call the split_and_check_duplicate_part function
        split_and_check_duplicate_part(filename, part_index, seen_parts)
 
def find_max_execution_time(file_path):
    # Find the largest "Program execution time: N microseconds" value in a log file
    pattern = re.compile(r'Program execution time: (\d+) microseconds')
    try:
        with open(file_path, 'r') as file:
            numbers = [int(m.group(1)) for m in map(pattern.search, file) if m]
    except FileNotFoundError as e:
        raise FileNotFoundError(f"Error: File '{file_path}' not found.") from e
    if not numbers:
        # Raised as ValueError (not wrapped in a generic Exception) so callers can tell the cases apart
        raise ValueError("No execution time found in the file.")
    return max(numbers)
 
if __name__ == "__main__":
    # Record the start time of the program
    start_time = time.time()
 
    # Set the folder path and log file path
    folder_path = r"D:/outputFile/bmp"
    file_path = r"D:/log.txt"
 
    try:
        # Try to find and print the maximum execution time
        max_execution_time = find_max_execution_time(file_path)
        print(f"The maximum execution time is: {max_execution_time} microseconds")
    except Exception as e:
        # Print an error message if an exception occurs
        print(e)
 
    # Set the index of the part to be compared
    selected_part_index = 1
 
    # Call the process_folder function to handle filenames
    process_folder(folder_path, selected_part_index)
 
    # Set the MD5 file path and block size
    MD5_file = "D:/md5sums.txt"
    block_size = 8192
 
    # Generate MD5 values for files in parallel and write them to a file
    md5_dict = generate_md5_for_files_parallel(folder_path, block_size=block_size)
    write_md5_to_file(md5_dict, MD5_file)
 
    # Print a message indicating successful MD5 generation
    print(f"MD5 values generated and saved to {MD5_file}")
 
    # Check for duplicate MD5 values in the generated file
    check_duplicate_md5(MD5_file)
 
    # Record the end time of the program
    end_time = time.time()
 
    # Calculate the total execution time in milliseconds
    execution_time = (end_time - start_time) * 1000
    print(f"Total execution time: {execution_time:.2f} milliseconds")
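The script writes md5sums.txt but never reads it back to check whether files have changed since. A minimal sketch of such a check follows, assuming the "<md5>  <path>" two-space format produced by write_md5_to_file; verify_md5_list and md5_of_file are hypothetical helpers, not part of the original script.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, block_size=8192):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_md5_list(list_file):
    """Re-hash each file listed as "<md5>  <path>"; return (ok, changed, missing) path lists."""
    ok, changed, missing = [], [], []
    with open(list_file, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Same two-space separator that write_md5_to_file uses
            expected, path = line.split("  ", 1)
            if not os.path.isfile(path):
                missing.append(path)
            elif md5_of_file(path) != expected:
                changed.append(path)
            else:
                ok.append(path)
    return ok, changed, missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, n) for n in ("kept.bin", "edited.bin", "deleted.bin")]
        for p in paths:
            with open(p, "wb") as f:
                f.write(b"original")
        # Record checksums, as the original script does with md5sums.txt
        list_file = os.path.join(tmp, "md5sums.txt")
        with open(list_file, "w", encoding="utf-8") as f:
            for p in paths:
                f.write(f"{md5_of_file(p)}  {p}\n")
        # Then modify one file and delete another
        with open(paths[1], "ab") as f:
            f.write(b" + change")
        os.remove(paths[2])
        ok, changed, missing = verify_md5_list(list_file)
        print(len(ok), len(changed), len(missing))  # 1 1 1
```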

That concludes this article on verifying files with MD5 checksums in Python. For more on this topic, search the earlier articles on 脚本之家. We hope you will continue to support 脚本之家!
