
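Downloading Images from Web Pages with Python 3

Updated: 2016-01-28 14:41:29  Submitted by: lijiao

This article shows how to download the images on a web page with Python 3 using request.urlopen. Interested readers may find it a useful reference.

First, some basics of Python web programming.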

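1. Differences between GET and POST

1) POST is designed to send data to a web server, while GET is designed to retrieve data from it. GET can also carry a small amount of data to the server, but only to tell the server which data you want. POST data is sent as the body of the HTTP request, whereas GET parameters are sent in the URL on the request line, ahead of the headers and not in the body;

2) POST and GET therefore transmit data differently: GET parameters travel in the query string of the URL, while POST data travels in the request body;

3) Data sent with POST does not appear in the URL, whereas GET parameters are visible in the URL;

4) GET is constrained by URL length limits, which are imposed by browsers and servers rather than by HTTP itself; the figure commonly quoted is about 1024 bytes, though actual limits vary. POST can carry much more data; the figure commonly quoted is about 2 MB, which in practice depends on the server configuration.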
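The difference in where the data travels can be seen without sending anything over the network. The following is a minimal sketch using urllib; the endpoint and the parameter names are made up for illustration and do not come from the article.

```python
from urllib import parse, request

# Hypothetical endpoint and parameters, used only for illustration.
base_url = "http://example.com/search"
params = {"q": "python", "page": "2"}
encoded = parse.urlencode(params)

# GET: the parameters become the query string of the URL.
get_req = request.Request(base_url + "?" + encoded)

# POST: the same parameters travel in the request body, as bytes.
post_req = request.Request(base_url, data=encoded.encode("ascii"))

# Nothing is sent here; we only inspect the prepared requests.
print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

urllib chooses the method from the presence of `data`: a request with a body is sent as POST, one without as GET.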

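2. Cookies

Cookies come up often these days, but what exactly are they, and what are they for?

Cookies are a mechanism that lets a website's server store a small amount of data on the client's disk or in memory and get that data back on later visits. A cookie is a very small piece of text that a web server places on your machine while you browse a site; it can record information such as your user ID, password, the pages you visited, and how long you stayed.

When you return to the site, it reads the cookies to learn who you are and can respond accordingly, for example by showing a welcome message or logging you in without asking for your ID and password again.

In essence, a cookie can be thought of as your ID card. Cookies cannot be executed as code and do not carry viruses; they are specific to you and can only be read by the site that issued them.

The stored information takes the form of name-value pairs; a name-value pair is simply a named piece of data.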
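To make the name-value structure concrete, here is a small sketch using the standard library's http.cookies module. The cookie names and values are invented for illustration.

```python
from http.cookies import SimpleCookie

# A hypothetical Cookie header, as a browser might send it back to a site.
raw_header = "user_id=42; theme=dark"

cookie = SimpleCookie()
cookie.load(raw_header)

# Each entry is a named piece of data: a name and its value.
pairs = {name: morsel.value for name, morsel in cookie.items()}
print(pairs)

# The server side sets a cookie by emitting a Set-Cookie header.
reply = SimpleCookie()
reply["theme"] = "dark"
reply["theme"]["path"] = "/"
print(reply.output())
```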

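A website can only retrieve the information it placed on your computer itself; it cannot read other sites' cookies or anything else on your computer.

The contents of cookies are often encoded or encrypted, so to an ordinary user they look like meaningless strings of letters and digits; only the server-side program (a CGI handler, for example) knows what they actually mean.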

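Basic features of the Python 3 web page image downloader

Main features to implement:
Parse the image links in a web page
Check each image link and skip the download if the image format or size does not meet the requirements
Add exception handling
Extract file names automatically, directly from the image links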
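The link check and the file-name extraction from the list above could look roughly like this. It is only a sketch: the helper names, the allowed extensions, and the size limits are assumptions, since the article does not state the actual requirements.

```python
import posixpath
from urllib.parse import urlparse, unquote

# Assumed policy for this sketch; the article does not give real limits.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}
MIN_BYTES = 1024              # skip tiny icons and spacers
MAX_BYTES = 5 * 1024 * 1024   # skip very large files


def filename_from_url(url, default="default.jpg"):
    """Return the last path component of url, ignoring query and fragment."""
    path = unquote(urlparse(url).path)
    return posixpath.basename(path) or default


def is_wanted(url, content_length=None):
    """Decide from the extension (and the size, if known) whether to download."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False
    if content_length is None:
        return True  # size unknown: accept on the extension alone
    return MIN_BYTES <= content_length <= MAX_BYTES


print(filename_from_url("http://example.com/img/cat.JPG?size=large"))
print(is_wanted("http://example.com/img/cat.JPG?size=large", 20000))
print(is_wanted("http://example.com/page.html"))
```

The size would typically come from the response's Content-Length header when the server provides one; servers may omit it, which is why the sketch treats the size as optional.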

Reference code for downloading web page images with Python 3:

from urllib import request
from urllib.parse import urljoin
import os
import threading
from html import parser

DstDir = "E:\\image\\"  # directory where downloaded images are saved


def downjpg(filepath, FileName="default.jpg"):
    """Download one image URL and save it as DstDir + FileName."""
    try:
        print("Fetching " + filepath + "\n")
        with request.urlopen(filepath) as web:
            jpg = web.read()
        print("Saving " + DstDir + FileName + "\n")
        os.makedirs(DstDir, exist_ok=True)
        with open(DstDir + FileName, "wb") as File:
            File.write(jpg)
    except Exception as err:
        # Network and file errors both end up here; report and move on.
        print("error: %s\n" % err)


def downjpgmutithread(filepathlist):
    """Download every URL in filepathlist, one thread per file."""
    print("%d files to download" % len(filepathlist))
    for file in filepathlist:
        print(file)
    print("Starting multithreaded download")
    task_threads = []  # holds the worker threads
    for count, file in enumerate(filepathlist, start=1):
        t = threading.Thread(target=downjpg, args=(file, "%d.jpg" % count))
        task_threads.append(t)
    for task in task_threads:
        task.start()
    for task in task_threads:
        task.join()  # wait for all threads to finish
    print("All threads finished")


class parserLinks(parser.HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.filelist = []  # per-instance list, not shared between parsers

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            for name, value in attrs:
                if name == 'src' and value:
                    print(value)
                    self.filelist.append(value)

    def getfilelist(self):
        return self.filelist


def main(WebUrl):
    lparser = parserLinks()
    with request.urlopen(WebUrl) as web:
        # Decode the bytes to text before feeding the parser;
        # fall back to UTF-8 if the server does not declare a charset.
        charset = web.headers.get_content_charset() or "utf-8"
        context = web.read().decode(charset, errors="replace")
    lparser.feed(context)
    # Resolve relative src values against the page URL.
    imagelist = [urljoin(WebUrl, src) for src in lparser.getfilelist()]
    downjpgmutithread(imagelist)


if __name__ == "__main__":
    # WebUrl = "http://www.baidu.com/"  # page to scrape; images are saved to drive E: by default
    WebUrl = "http://hi.baidu.com/yuyinxuezi/item/df0b59140a06be27f6625cd4"
    main(WebUrl)
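
The reference code starts one thread per image, which can mean a great many threads on an image-heavy page. One possible alternative, not part of the original code, is a bounded thread pool. The sketch below assumes a hypothetical helper `download_all` and takes the download function as a parameter, so it can be tried without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, max_workers=4):
    """Run fetch(url, filename) for every URL using at most max_workers threads.

    Returns a list of (url, error) pairs, where error is None on success.
    """
    def run_one(item):
        index, url = item
        try:
            fetch(url, "%d.jpg" % index)
            return url, None
        except Exception as err:  # report the failure instead of hiding it
            return url, err

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(run_one, enumerate(urls, start=1)))


# Stand-in for a real downloader, so the sketch runs offline.
def fake_fetch(url, filename):
    if "bad" in url:
        raise ValueError("cannot fetch " + url)


for url, error in download_all(["http://a/1.jpg", "http://a/bad.jpg"], fake_fetch):
    print(url, "ok" if error is None else "failed: %s" % error)
```

With the article's code this would be called as `download_all(imagelist, downjpg)`. Since `downjpg` catches its own exceptions, every entry would then report no error unless `downjpg` is changed to let exceptions propagate.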

That concludes this introduction to downloading images from web pages with Python 3. I hope it helps with your studies.
