A Detailed Guide to Batch-Scraping Images with Python

Updated: December 11, 2023 09:17:17  Author: 開心就好啦啦啦

This article shows how to batch-scrape images with Python and walks through each step with code examples. It may be a useful reference for study or work.

Crawler steps

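Fetch the page source from the request URL (the code below uses the standard-library urllib.request)
Parse the source with XPath to extract the data you need
Download the extracted data to local disk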

Scraping the first ten pages of images to local disk

Fetching the page source by page number

import urllib.request


def create_request(page):
    # Page 1 has no numeric suffix; later pages use "_<page>.html"
    if page == 1:
        url = 'https://sc.chinaz.com/tupian/qinglvtupian.html'
    else:
        url = 'https://sc.chinaz.com/tupian/qinglvtupian_' + str(page) + '.html'
    # Send a browser-like User-Agent with the request
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'
    }
    request = urllib.request.Request(url, headers=header)
    # Fetch the page source and decode it as UTF-8
    response = urllib.request.urlopen(request)
    content = response.read().decode('utf-8')
    return content
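
The URL rule inside create_request can also be pulled out into a small pure function, which makes it easy to check without touching the network. This is only a sketch: the names BASE and build_url are made up for illustration and do not appear in the original script.

```python
# Hypothetical helper: isolates the page-number -> URL rule used above.
BASE = 'https://sc.chinaz.com/tupian/qinglvtupian'


def build_url(page: int) -> str:
    """Return the listing URL for a 1-based page number."""
    if page < 1:
        raise ValueError('page numbers start at 1')
    # Page 1 has no numeric suffix; later pages use "_<page>".
    suffix = '' if page == 1 else f'_{page}'
    return f'{BASE}{suffix}.html'


print(build_url(1))
print(build_url(3))
```

With a helper like this, create_request would only need to build the Request object and read the response.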

Parsing the page with XPath

To try out XPath expressions in the browser, install an XPath extension in Chrome. Once it is installed, press Alt+Shift+X and a black input box appears.

//img[@class="lazy"]/@alt  # image name
//img[@class="lazy"]/@data-original  # image URL
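
The same two selections can be tried offline on a small hand-written snippet. The sketch below swaps in the standard-library xml.etree.ElementTree instead of lxml so that it runs without extra packages. ElementTree supports only a subset of XPath and has no /@attr step, so the attributes are read with .get() after selecting the elements. The markup and the example.com addresses are invented for illustration, not copied from the site.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup that mimics lazy-loaded <img> tags.
SAMPLE = """
<div>
  <img class="lazy" alt="couple-1" data-original="//example.com/img/1.jpg" src="placeholder.gif"/>
  <img class="lazy" alt="couple-2" data-original="//example.com/img/2.jpg" src="placeholder.gif"/>
  <img class="logo" alt="site logo" src="logo.png"/>
</div>
"""

root = ET.fromstring(SAMPLE)
# Same predicate as the XPath expressions above: only <img class="lazy">.
lazy_imgs = root.findall(".//img[@class='lazy']")
names = [img.get('alt') for img in lazy_imgs]
srcs = [img.get('data-original') for img in lazy_imgs]

print(names)
print(srcs)
```

Note that the third image is skipped because its class is not "lazy", and that the real address comes from data-original rather than src.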


Parsing the page and downloading the images

Images, web pages, and videos can all be downloaded with urllib.request.urlretrieve(), which saves the resource at a URL to a local file.
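
To see what urlretrieve() does without hitting a real server, it can be pointed at a file:// URL. The following is a minimal sketch under that assumption: the temporary file stands in for a remote image, and the file names are invented.

```python
import tempfile
import urllib.request
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A local file standing in for a remote image (hypothetical content).
    source = Path(tmp) / 'source.bin'
    source.write_bytes(b'fake image bytes')

    target = Path(tmp) / 'copy.jpg'
    # urlretrieve(url, filename) writes the resource to filename and
    # returns a (filename, headers) tuple.
    saved_path, headers = urllib.request.urlretrieve(
        source.as_uri(), filename=str(target)
    )
    copied = target.read_bytes()

print(saved_path.endswith('copy.jpg'), copied)
```

With an https:// image address the call looks the same; only the first argument changes.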

import urllib.request

from lxml import etree


def down_load(content):
    # Parse the HTML string; for a local HTML file use etree.parse('D:/pages/test.html')
    tree = etree.HTML(content)
    name_list = tree.xpath('//img[@class="lazy"]/@alt')
    # Images are lazy-loaded, so the real address is in data-original
    src_list = tree.xpath('//img[@class="lazy"]/@data-original')
    for name, src in zip(name_list, src_list):
        # data-original is expected to start with "//", so prepend the scheme
        url = 'https:' + src
        # The loveImg folder must already exist at ../loveImg before running
        urllib.request.urlretrieve(url, filename='../loveImg/' + name + '.jpg')
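
Two details in down_load are fragile: it assumes every address starts with "//", and it uses the alt text directly as a file name. The sketch below shows one possible way to harden both. The helpers absolute_url and safe_filename are hypothetical names, not part of the original script, and the character set treated as unsafe is an assumption.

```python
import re
from urllib.parse import urljoin

PAGE_URL = 'https://sc.chinaz.com/tupian/qinglvtupian.html'


def absolute_url(src: str) -> str:
    """Resolve src against the listing page, whatever form it takes."""
    # urljoin handles "//host/path", "/path", and full URLs alike.
    return urljoin(PAGE_URL, src)


def safe_filename(name: str, ext: str = '.jpg') -> str:
    """Replace characters that commonly break file names with underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', '_', name).strip('_')
    return (cleaned or 'unnamed') + ext


print(absolute_url('//example.com/img/1.jpg'))
print(safe_filename('sunset / beach: couple?'))
```

Calling os.makedirs('../loveImg', exist_ok=True) before the loop would also remove the need to create the folder by hand.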

The main function is as follows

if __name__ == '__main__':
    start_page = int(input("Start page: "))
    end_page = int(input("End page: "))
    # range() excludes its upper bound, so add 1 to include end_page
    for page in range(start_page, end_page + 1):
        content = create_request(page)
        down_load(content)
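
As written, one failed request stops the whole run. A possible variation, sketched below, catches network errors per page, records which pages failed, and pauses between requests. The function crawl_pages and its fetch, handle, and delay parameters are illustrative assumptions. The demonstration uses stand-in functions so that it runs offline.

```python
import time
import urllib.error


def crawl_pages(start_page, end_page, fetch, handle, delay=0.0):
    """Run fetch -> handle for each page; return the pages that failed."""
    failed = []
    for page in range(start_page, end_page + 1):
        try:
            handle(fetch(page))
        except urllib.error.URLError as exc:
            # HTTPError is a subclass of URLError, so HTTP errors land here too.
            print(f'page {page} failed: {exc}')
            failed.append(page)
        time.sleep(delay)  # pause between pages to go easy on the server
    return failed


# Offline demonstration with a stand-in fetch function (no network).
def fake_fetch(page):
    if page == 2:
        raise urllib.error.URLError('simulated timeout')
    return f'<html>page {page}</html>'


seen = []
failed_pages = crawl_pages(1, 3, fake_fetch, seen.append)
print(seen, failed_pages)
```

In the real script, create_request and down_load would be passed as fetch and handle, with a delay of perhaps a second.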

The downloaded images are saved in the loveImg directory.
