
Example: a Python crawler that scrapes JD store information and downloads images

Updated: August 7, 2018 08:37:28    Author: 1443539042@qq.com

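This article shows how to use a Python crawler to scrape JD store information and download images. It covers sending page requests in Python, handling the responses, and parsing the HTML. Readers who need it may find it a useful reference.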

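This article walks through an example of a Python crawler that scrapes JD store information and downloads images. It is shared here for reference. The details are as follows: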

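This script scrapes the listing information (note that the example URL is actually a Tmall search-results page on list.tmall.com):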

from bs4 import BeautifulSoup
import requests

url = 'https://list.tmall.com/search_product.htm?q=%CB%AE%BA%F8+%C9%D5%CB%AE&type=p&vmarket=&spm=875.7931836%2FA.a2227oh.d100&from=mallfp..pc_1_searchbutton'
response = requests.get(url)                    # fetch the search-results page
response.raise_for_status()                     # stop early if the request failed
soup = BeautifulSoup(response.text, 'lxml')     # .text is the decoded HTML; parse it with lxml
storenames = soup.select('#J_ItemList > div > div > p.productTitle > a')      # select the store info
prices = soup.select('#J_ItemList > div > div > p.productPrice > em')         # select the prices
sales = soup.select('#J_ItemList > div > div > p.productStatus > span > em')  # select the sales figures
# zip() pairs the three lists item by item and stops at the shortest one
for storename, price, sale in zip(storenames, prices, sales):
    storename = storename.get_text().strip()   # get_text() extracts the tag's text; strip() removes the surrounding newlines
    price = price.get_text()
    sale = sale.get_text()
    print('Store:%-40sPrice:%-40sSales:%s' % (storename, price, sale))   # fixed-width columns keep the output neat
    print('----------------------------------------------------------------------------------------------')
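The `%-40s` in the print call left-aligns each value in a field 40 characters wide, which is what lines the columns up. Below is a minimal sketch of that formatting on its own, next to the equivalent f-string spec `:<40`. The helper names `format_row` and `format_row_fstring` and the sample values are hypothetical, chosen only for illustration.

```python
def format_row(storename, price, sale):
    """Left-align store name and price in 40-character fields (%-40s), as in the loop above."""
    return 'Store:%-40sPrice:%-40sSales:%s' % (storename, price, sale)


def format_row_fstring(storename, price, sale):
    """Same layout with f-string alignment: ':<40' means left-aligned, width 40."""
    return f'Store:{storename:<40}Price:{price:<40}Sales:{sale}'


# Hypothetical sample values, not real scraped data
row = format_row('Example shop', '99.00', '1200')
print(row)
```

Note that both forms pad by character count, not by display width, so rows containing Chinese (full-width) characters may still not line up perfectly in a terminal.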

This script downloads the images:

from bs4 import BeautifulSoup
import requests
import urllib.request

url = 'https://list.tmall.com/search_product.htm?q=%CB%AE%BA%F8+%C9%D5%CB%AE&type=p&vmarket=&spm=875.7931836%2FA.a2227oh.d100&from=mallfp..pc_1_searchbutton'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
imgs = soup.select('#J_ItemList > div > div > div.productImg-wrap > a > img')
a = 1
for i in imgs:
    src = i.get('src')
    if src is None:   # some <img> tags have no src (possibly lazily loaded images); skip them instead of stopping
        continue
    # This cost me a long time: the src has no scheme, so 'http:' has to be added for the URL to work
    img = 'http:' + src if src.startswith('//') else src
    # print(img)
    urllib.request.urlretrieve(img, '%s.jpg' % a)   # save as 1.jpg, 2.jpg, ...
    a = a + 1
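Prepending `'http:'` works because the `src` values are presumably protocol-relative (they start with `//`). A more general option is the standard library's `urllib.parse.urljoin`, which resolves protocol-relative, absolute and relative `src` values against the page URL. This is a minimal sketch, not the author's method; the helper name `to_absolute_url` and the `img.example.com` URLs are hypothetical.

```python
from urllib.parse import urljoin

PAGE_URL = 'https://list.tmall.com/search_product.htm'


def to_absolute_url(src, page_url=PAGE_URL):
    """Resolve an <img> src against the page URL.

    '//host/path' inherits the page's scheme, absolute URLs are returned
    unchanged, and relative paths are joined onto the page's directory.
    """
    return urljoin(page_url, src)


# Hypothetical src values covering the three cases
for src in ['//img.example.com/a.jpg', 'https://img.example.com/b.jpg', 'c.jpg']:
    print(to_absolute_url(src))
```

In the download script, passing `response.url` as `page_url` would resolve each `src` against the page that was actually fetched.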

PS:

1. Use CSS selectors (soup.select) to pick out the information.

2. Use the get_text() method to extract the text inside a tag.

3. Usage of strip, lstrip and rstrip:

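In Python, strip removes characters from both ends of a string; likewise, lstrip removes characters from the left end and rstrip from the right end.

All three methods accept an optional argument that specifies which characters to remove from the ends.

Note that the argument is treated as a set of characters: the method keeps removing characters at the ends as long as they belong to that set, and stops at the first character that does not. For example: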

theString = 'saaaay yes no yaaaass'
print(theString.strip('say'))

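Characters in the set ['s', 'a', 'y'] are removed from both ends of theString, one by one, until a character outside the set is reached (here, the spaces). So the output is (note the leading and trailing space that remain):

 yes no 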

Simple enough; lstrip and rstrip work the same way, but only on one end.
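Because the argument is a set of characters rather than a substring, the order of the characters in it does not matter, and a prefix is not removed as a single unit. The sketch below checks this on the example string. For removing an exact prefix or suffix, Python 3.9 and later also provide `str.removeprefix` and `str.removesuffix`; the string `'saysay_ok'` is just a made-up example.

```python
the_string = 'saaaay yes no yaaaass'

# 'say' and 'yas' describe the same set {'s', 'a', 'y'}, so the results match.
print(repr(the_string.strip('say')))
print(repr(the_string.strip('yas')))

# lstrip keeps removing any of s/a/y; removeprefix (Python 3.9+) removes 'say' exactly once.
print(repr('saysay_ok'.lstrip('say')))
print(repr('saysay_ok'.removeprefix('say')))
```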

Note: when no argument is passed, all leading and trailing whitespace (spaces, newlines, tabs and so on) is removed by default.
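This default is what makes `get_text().strip()` in the scraping script clean up the listing titles. A quick check with a made-up string: with no argument every kind of surrounding whitespace goes, while passing `' '` removes only spaces, so a string that starts and ends with newlines is left unchanged.

```python
raw = '\n\t  Example title \r\n'

# No argument: spaces, tabs, carriage returns and newlines at both ends are all removed.
print(repr(raw.strip()))

# Only ' ' in the set: the first and last characters are newlines, so nothing is removed.
print(repr(raw.strip(' ')))
```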

theString = 'saaaay yes no yaaaass'
print(theString.strip('say'))
print(theString.strip('say '))  # note the space after 'say'
print(theString.lstrip('say'))
print(theString.rstrip('say'))

Output: