Illustrated Guide: Implementing a Web Scraper with a Chrome Extension (Python)

Updated: 2020-06-09 12:07:13 Author: Johnthegreat

This article shows how to build a simple scraper with a Chrome extension, walking through the process step by step with a sample configuration. It may be a useful reference for study or work.

In e-commerce, customer reviews of a product matter a great deal, but what if you can't write code? There is a Chrome extension that handles simple data scraping without a single line of code. Here is a sample of the scraped data:

As you can see, the page URL, reviewer, review text, time, and product color have all been captured. So which tools does this take? Just two:

1. The Chrome browser;

2. The extension: Web Scraper

Extension download: https://chromecj.com/productivity/2018-05/942.html

Finally, if you want to try the scrape yourself, here is the detailed process used for this run:

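1. First, copy the configuration below. You don't have to write any code, but copying this one is the easiest way to get started; afterwards you can customize and pick the selectors yourself, still without writing code.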

{
  "_id": "jdreview",
  "startUrl": [
    "https://item.jd.com/100000680365.html#comment"
  ],
  "selectors": [
    {
      "id": "user",
      "type": "SelectorText",
      "selector": "div.user-info",
      "parentSelectors": [
        "main"
      ],
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "comments",
      "type": "SelectorText",
      "selector": "div.comment-column > p.comment-con",
      "parentSelectors": [
        "main"
      ],
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "time",
      "type": "SelectorText",
      "selector": "div.comment-message:nth-of-type(5) span:nth-of-type(4), div.order-info span:nth-of-type(4)",
      "parentSelectors": [
        "main"
      ],
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "color",
      "type": "SelectorText",
      "selector": "div.order-info span:nth-of-type(1)",
      "parentSelectors": [
        "main"
      ],
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "main",
      "type": "SelectorElementClick",
      "selector": "div.comment-item",
      "parentSelectors": [
        "_root"
      ],
      "multiple": true,
      "delay": 10000,
      "clickElementSelector": "div.com-table-footer a.ui-pager-next",
      "clickType": "clickMore",
      "discardInitialElements": false,
      "clickElementUniquenessType": "uniqueHTMLText"
    }
  ]
}
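
To make the structure of this configuration easier to see, here is a minimal Python sketch that parses an abbreviated copy of it and groups the selectors by parent. It is only a reading aid, not part of Web Scraper: the helper names `children_by_parent` and `unknown_parents` are hypothetical, and the embedded JSON keeps only the fields the sketch needs.

```python
import json

# Abbreviated copy of the configuration above; fields such as "regex",
# "delay" and the click settings are omitted because this sketch ignores them.
SITEMAP_JSON = """
{
  "_id": "jdreview",
  "startUrl": ["https://item.jd.com/100000680365.html#comment"],
  "selectors": [
    {"id": "user", "type": "SelectorText",
     "selector": "div.user-info", "parentSelectors": ["main"]},
    {"id": "comments", "type": "SelectorText",
     "selector": "div.comment-column > p.comment-con", "parentSelectors": ["main"]},
    {"id": "time", "type": "SelectorText",
     "selector": "div.order-info span:nth-of-type(4)", "parentSelectors": ["main"]},
    {"id": "color", "type": "SelectorText",
     "selector": "div.order-info span:nth-of-type(1)", "parentSelectors": ["main"]},
    {"id": "main", "type": "SelectorElementClick",
     "selector": "div.comment-item", "parentSelectors": ["_root"]}
  ]
}
"""


def children_by_parent(sitemap):
    """Map each parent selector id to the ids of the selectors nested under it."""
    tree = {}
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            tree.setdefault(parent, []).append(sel["id"])
    return tree


def unknown_parents(sitemap):
    """Return parent ids that are neither "_root" nor a selector defined in the config."""
    known = {"_root"} | {sel["id"] for sel in sitemap["selectors"]}
    used = {p for sel in sitemap["selectors"] for p in sel["parentSelectors"]}
    return sorted(used - known)


sitemap = json.loads(SITEMAP_JSON)
tree = children_by_parent(sitemap)

for parent, children in tree.items():
    print(parent, "->", ", ".join(children))
print("unknown parents:", unknown_parents(sitemap))
```

The grouping shows how the configuration is laid out: `main` hangs off `_root` and matches each `div.comment-item`, while `user`, `comments`, `time`, and `color` are text selectors evaluated inside each `main` element. The parent check is a quick way to catch a mistyped id after editing the configuration by hand.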

2. Next, open Chrome and press Ctrl+Shift+I on any page, then find the Web Scraper tab in the window that opens, as shown below: