
使用Scrapy框架爬取網(wǎng)頁并保存到Mysql的實現(xiàn)

Updated: 2022-07-07 10:17:50 Author: 鄙人阿彬

本文主要介紹了使用Scrapy框架爬取網(wǎng)頁并保存到Mysql的實現(xiàn)，文中通過示例代碼介紹的非常詳細，對大家的學(xué)習(xí)或者工作具有一定的參考學(xué)習(xí)價值，需要的朋友們下面隨著小編來一起學(xué)習(xí)學(xué)習(xí)吧

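Hi everyone. In this post A-Bin shares how to use the Scrapy crawler framework together with a local MySQL database. The site crawled today is Hupu Sports (虎撲體育), specifically its NBA player statistics page.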

(1) Open the Hupu Sports page, analyze how its data is laid out, and locate the elements with XPath.

（2）在第一部分析網(wǎng)頁之后就開始創(chuàng)建一個scrapy爬蟲工程，在終端執(zhí)行以下命令：
“scrapy startproject huty（注：‘hpty’是爬蟲項目名稱）”,得到了下圖所示的工程包：

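(3) Change into the “hpty/hpty/spiders” directory and create a spider named “sww” by running: “scrapy genspider sww nba.hupu.com” (genspider takes both a spider name and a domain).

(4) With the project and spider in place, edit the files that make up the crawler.

1. Editing settings.py: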

Change the robots.txt “gentleman's agreement” setting, ROBOTSTXT_OBEY, from True to False.

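Then uncomment the line that is commented out by default (presumably the ITEM_PIPELINES block, which must be enabled for the pipeline in step 4 to run).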
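For reference, the two settings changes might look like this in hpty/settings.py. This is a sketch that assumes the uncommented line is the ITEM_PIPELINES block and that the pipeline class is hpty.pipelines.HptyPipeline (the class defined in step 4); 300 is the priority value in Scrapy's generated template.

```python
# hpty/settings.py (excerpt)

BOT_NAME = "hpty"

# Do not fetch or obey robots.txt for this crawl (the generated template sets True)
ROBOTSTXT_OBEY = False

# Assumption: this is the block uncommented in this step.
# Pipelines with lower numbers run earlier; 300 is the template's value.
ITEM_PIPELINES = {
    "hpty.pipelines.HptyPipeline": 300,
}
```

Without the ITEM_PIPELINES entry, Scrapy still runs the spider but never calls the pipeline, so nothing reaches MySQL.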

2. Edit items.py, which declares the fields of each scraped item:

# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html
 
import scrapy
 
 
class HptyItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
 
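    球員 = scrapy.Field()        # player
    球隊 = scrapy.Field()        # team
    排名 = scrapy.Field()        # rank
    場均得分 = scrapy.Field()    # points per game
    命中率 = scrapy.Field()      # field-goal percentage
    三分命中率 = scrapy.Field()  # three-point percentage
    罰球命中率 = scrapy.Field()  # free-throw percentage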

3. Edit the most important file, the spider itself (spiders/sww.py, created in step 3):

import scrapy
from ..items import HptyItem
 
 
class SwwSpider(scrapy.Spider):
    name = 'sww'
    # allowed_domains takes bare domain names, not URLs
    allowed_domains = ['nba.hupu.com']
    start_urls = ['https://nba.hupu.com/stats/players']
 
    def parse(self, response):
        # Data rows: every <tr> in the table body that has no class attribute
        whh = response.xpath('//tbody/tr[not(@class)]')
        for i in whh:
            # .extract() returns a list of matching text nodes for each cell
            排名 = i.xpath('./td[1]/text()').extract()  # rank
            球員 = i.xpath('./td[2]/a/text()').extract()  # player
            球隊 = i.xpath('./td[3]/a/text()').extract()  # team
            場均得分 = i.xpath('./td[4]/text()').extract()  # points per game

            命中率 = i.xpath('./td[6]/text()').extract()  # field-goal percentage
            三分命中率 = i.xpath('./td[8]/text()').extract()  # three-point percentage
            罰球命中率 = i.xpath('./td[10]/text()').extract()  # free-throw percentage
 
            data = HptyItem(球員=球員, 球隊=球隊, 排名=排名, 場均得分=場均得分, 命中率=命中率, 三分命中率=三分命中率, 罰球命中率=罰球命中率)
            yield data
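To see what the parse loop does without hitting the live site, here is a minimal offline sketch that uses Python's standard xml.etree.ElementTree instead of Scrapy selectors. The markup, its placeholder values and the names SAMPLE_HTML, COLUMNS and parse_rows are all hypothetical. ElementTree only parses well-formed markup and supports a subset of XPath, so the not(@class) predicate is written as an attribute check in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mimicking the column layout the spider assumes.
# Values are placeholders, not real statistics.
SAMPLE_HTML = """
<table><tbody>
  <tr class="header"><td>rank</td><td>player</td><td>team</td><td>pts</td></tr>
  <tr>
    <td>1</td><td><a>Player A</a></td><td><a>Team X</a></td><td>30.0</td>
    <td>-</td><td>50.0%</td><td>-</td><td>40.0%</td><td>-</td><td>90.0%</td>
  </tr>
</tbody></table>
"""

# Item field -> path relative to a row (positions are 1-based, as in XPath)
COLUMNS = {
    "排名": "td[1]",
    "球員": "td[2]/a",
    "球隊": "td[3]/a",
    "場均得分": "td[4]",
    "命中率": "td[6]",
    "三分命中率": "td[8]",
    "罰球命中率": "td[10]",
}


def parse_rows(html):
    """Return one dict per data row, mirroring SwwSpider.parse."""
    root = ET.fromstring(html)
    rows = []
    for tr in root.iter("tr"):
        # Same effect as the predicate tr[not(@class)]: skip rows with a class
        if "class" in tr.attrib:
            continue
        # Like .extract(), each field holds a list of matched texts
        rows.append({field: [el.text for el in tr.findall(path)]
                     for field, path in COLUMNS.items()})
    return rows
```

Each field stays a list, just as .extract() returns a list, which is why the pipeline reads element [0].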

4. Edit pipelines.py, which writes each item into MySQL:

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html


# useful for handling different item types with a single interface
from itemadapter import ItemAdapter
import pymysql


class HptyPipeline:
    def open_spider(self, spider):
        # Open one connection for the whole crawl instead of one per item
        self.db = pymysql.connect(host="localhost", user="root", password="root",
                                  database="sww", charset="utf8mb4")
        self.cursor = self.db.cursor()

    def process_item(self, item, spider):
        # Each field holds the list returned by .extract(); keep its first element
        球員 = item["球員"][0]
        球隊 = item["球隊"][0]
        排名 = item["排名"][0]
        場均得分 = item["場均得分"][0]
        命中率 = item["命中率"][0]
        三分命中率 = item["三分命中率"][0]
        罰球命中率 = item["罰球命中率"][0]
        # 三分命中率 = item["三分命中率"][0].strip('%')
        # 罰球命中率 = item["罰球命中率"][0].strip('%')

        self.cursor.execute(
            'INSERT INTO nba(球員,球隊,排名,場均得分,命中率,三分命中率,罰球命中率) VALUES (%s,%s,%s,%s,%s,%s,%s)',
            (球員, 球隊, 排名, 場均得分, 命中率, 三分命中率, 罰球命中率)
        )
        # Commit the transaction
        self.db.commit()
        return item

    def close_spider(self, spider):
        # Close the cursor and the connection once the spider finishes
        self.cursor.close()
        self.db.close()
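Indexing with [0] raises IndexError whenever a cell is empty. One possible guard is a small helper that builds the INSERT parameters in the pipeline's column order, returning None (stored as NULL) for empty cells and optionally stripping the trailing “%” as the commented-out lines suggest. This is a hedged sketch; FIELDS, first and row_from_item are hypothetical names, not part of Scrapy or PyMySQL.

```python
# Column order used by the INSERT statement in HptyPipeline
FIELDS = ("球員", "球隊", "排名", "場均得分", "命中率", "三分命中率", "罰球命中率")


def first(values, default=None):
    """Return the first string of an .extract()-style list, stripped, or default."""
    if values:
        return values[0].strip()
    return default


def row_from_item(item, strip_percent=False):
    """Build the INSERT parameter tuple from a mapping of field -> list of texts."""
    row = []
    for field in FIELDS:
        value = first(item.get(field))
        # Optionally drop the trailing '%' from the three percentage columns
        if strip_percent and value is not None and field.endswith("命中率"):
            value = value.rstrip("%")
        row.append(value)
    return tuple(row)
```

Inside process_item this would replace the seven [0] lookups with a single call, e.g. self.cursor.execute(sql, row_from_item(item)).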

（5）在scrapy框架設(shè)計好了之后，先到mysql創(chuàng)建一個名為“sww”的數(shù)據(jù)庫，在該數(shù)據(jù)庫下創(chuàng)建名為“nba”的數(shù)據(jù)表，代碼如下： 1、創(chuàng)建數(shù)據(jù)庫

create database sww default character set utf8mb4;

2. Create the table

use sww;
create table nba (球員 char(20),球隊 char(10),排名 char(10),場均得分 char(25),命中率 char(20),三分命中率 char(20),罰球命中率 char(20));
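To check the table layout and the parameterized INSERT without a MySQL server, one option is to run equivalent statements against Python's built-in sqlite3 module in memory. This swaps MySQL for SQLite, so it is only a rough stand-in: column types become TEXT, identifiers are double-quoted, and placeholders are ? rather than PyMySQL's %s. CREATE_SQL, INSERT_SQL and store are hypothetical names.

```python
import sqlite3

# Same columns as the MySQL table; SQLite's TEXT stands in for char(n)
CREATE_SQL = (
    'CREATE TABLE nba ("球員" TEXT, "球隊" TEXT, "排名" TEXT, "場均得分" TEXT, '
    '"命中率" TEXT, "三分命中率" TEXT, "罰球命中率" TEXT)'
)
# sqlite3 uses ? placeholders where PyMySQL uses %s
INSERT_SQL = (
    'INSERT INTO nba ("球員", "球隊", "排名", "場均得分", "命中率", "三分命中率", "罰球命中率") '
    "VALUES (?, ?, ?, ?, ?, ?, ?)"
)


def store(rows):
    """Insert rows into a fresh in-memory table and return the row count."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(CREATE_SQL)
        conn.executemany(INSERT_SQL, rows)
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM nba").fetchone()[0]
    finally:
        conn.close()
```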

3. Once the database and table exist, you can inspect the table structure (for example with desc nba;).

（6）在mysql創(chuàng)建數(shù)據(jù)表之后，再次回到終端，輸入如下命令：“scrapy crawl sww”，得到的結(jié)果

到此這篇關(guān)于使用Scrapy框架爬取網(wǎng)頁并保存到Mysql的實現(xiàn)的文章就介紹到這了,更多相關(guān)Scrapy爬取網(wǎng)頁并保存內(nèi)容請搜索腳本之家以前的文章或繼續(xù)瀏覽下面的相關(guān)文章希望大家以后多多支持腳本之家！
