
Python Crawler: Collecting Job Postings from Across the Web and Visualizing Them, a Worked Example

Updated: 2023-07-19 11:01:05  Author: 輕松學Python

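This article walks through a Python crawler that collects job postings and then visualizes them, using the latest qcwu postings as the data source. It lists part of the code together with the reasoning behind it, for readers who want a reference.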

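Preparation

Software tools

First, a look at what needs to be set up.

Environment

Python 3.8
Pycharm

Modules used

# Third-party module, needs to be installed
requests  >>> pip install requests
# Built-in module, no installation needed
csv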

Basic crawler workflow

I. Data source analysis: the approach is always the same

1. Define the requirement: decide which site to collect from and which data you want

Site: 51job
Content: job postings

2. Use the browser's developer tools to capture traffic and find where the data actually comes from

Open the developer tools: press F12, or right-click, choose Inspect, and select the Network tab
Refresh the page so the data is loaded again
Use the search box <search for the data you want> to locate the request that carries it
Job postings request: https://we.***.com/api/job/search-pc?api_key=51job&timestamp=1688645783&keyword=python&searchType=2&function=&industry=&jobArea=010000%2C020000%2C030200%2C040000%2C090200&jobArea2=&landmark=&metro=&salary=&workYear=&degree=&companyType=&companySize=&jobType=&issueDate=&sortType=0&pageNum=1&requestId=&pageSize=20&source=1&accountId=&pageCode=sou%7Csou%7Csoulb
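
To see which part of a captured URL is the endpoint and which parts are parameters, the query string can be taken apart with the standard library. The sketch below is only an illustration: the host stays masked as in the article, the query is shortened, and the variable names are made up.

```python
from urllib.parse import urlsplit, parse_qs

# Captured request URL (host masked as in the article; query shortened for brevity).
captured = (
    "https://we.***.com/api/job/search-pc"
    "?api_key=51job&timestamp=1688645783&keyword=python&searchType=2"
    "&jobArea=010000%2C020000%2C030200%2C040000%2C090200"
    "&sortType=0&pageNum=1&pageSize=20&source=1"
    "&pageCode=sou%7Csou%7Csoulb"
)

parts = urlsplit(captured)
endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
# parse_qs percent-decodes the values and returns a list per key.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(endpoint)
for key, value in params.items():
    print(f"{key!r}: {value!r},")
```

The printed pairs can be pasted into a Python dict of request parameters. Note that parse_qs drops parameters with empty values (such as `function=`) unless `keep_blank_values=True` is passed.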

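II. Code implementation: the steps are always the same

1. Send the request: imitate a browser and request the URL
   Request URL: the job postings request URL found above
2. Get the data: receive the response data returned by the server <all of the data>
   Developer tools: response
3. Parse the data: extract the content we want
   Basic information about each posting
4. Save the data: write the information to a spreadsheet (CSV) file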
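
The four steps can be sketched as three small functions. This is a rough outline under stated assumptions, not the article's final code: `fetch` is a hypothetical stub that returns a hard-coded payload so the example runs offline, the payload shape is inferred from the parsing code later in the article, and only two columns are kept for brevity.

```python
import csv
import io

def fetch(page):
    """Steps 1-2 stand-in: return the decoded JSON payload for one page.

    Hypothetical stub so the sketch runs offline; a real version would call
    requests.get(...).json() on the search URL.
    """
    return {"resultbody": {"job": {"items": [
        {"jobName": "Python Developer", "provideSalaryString": "1-1.5萬"},
    ]}}}

def parse(payload):
    """Step 3: pick the wanted fields out of each job item."""
    for item in payload["resultbody"]["job"]["items"]:
        yield {"職位": item["jobName"], "薪資": item["provideSalaryString"]}

def save(rows, fileobj):
    """Step 4: write the rows as CSV, header line first."""
    writer = csv.DictWriter(fileobj, fieldnames=["職位", "薪資"])
    writer.writeheader()
    writer.writerows(rows)

buffer = io.StringIO()  # in-memory stand-in for the output file
save(parse(fetch(page=1)), buffer)
print(buffer.getvalue())
```

Keeping the steps separate makes it possible to check the parsing and saving logic without touching the network.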

Code walkthrough

Modules

# Import the HTTP request module
import requests
# Import the pretty-printing module
from pprint import pprint
# Import the csv module
import csv

Send the request: imitate a browser and request the URL

headers = {
    # Cookie copied from a browser session; replace it with the one from your own session.
    # The value is split into two adjacent string literals, which Python joins into one string.
    'Cookie': (
        'guid=54b7a6c4c43a33111912f2b5ac6699e2; sajssdk_2015_cross_new_user=1; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%2254b7a6c4c43a33111912f2b5ac6699e2%22%2C%22first_id%22%3A%221892b08f9d11c8-09728ce3464dad8-26031d51-3686400-1892b08f9d211e7%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_referrer%22%3A%22%22%7D%2C%22identities%22%3A%22eyIkaWRlbnRpdHlfY29va2llX2lkIjoiMTg5MmIwOGY5ZDExYzgtMDk3MjhjZTM0NjRkYWQ4LTI2MDMxZDUxLTM2ODY0MDAtMTg5MmIwOGY5ZDIxMWU3IiwiJGlkZW50aXR5X2xvZ2luX2lkIjoiNTRiN2E2YzRjNDNhMzMxMTE5MTJmMmI1YWM2Njk5ZTIifQ%3D%3D%22%2C%22history_login_id%22%3A%7B%22name%22%3A%22%24identity_login_id%22%2C%22value%22%3A%2254b7a6c4c43a33111912f2b5ac6699e2%22%7D%2C%22%24device_id%22%3A%221892b08f9d11c8-09728ce3464dad8-26031d51-3686400-1892b08f9d211e7%22%7D; nsearch=jobarea%3D%26%7C%26ord_field%3D%26%7C%26recentSearch0%3D%26%7C%26recentSearch1%3D%26%7C%26recentSearch2%3D%26%7C%26recentSearch3%3D%26%7C%26recentSearch4%3D%26%7C%26collapse_expansion%3D; search=jobarea%7E%60010000%2C020000%2C030200%2C040000%2C090200%7C%21recentSearch0%7E%60010000%2C020000%2C030200%2C040000%2C090200%A1%FB%A1%FA000000%A1%FB%A1%FA0000%A1%FB%A1%FA00%A1%FB%A1%FA99%A1%FB%A1%FA%A1%FB%A1%FA99%A1%FB%A1%FA99%A1%FB%A1%FA99%A1%FB%A1%FA99%A1%FB%A1%FA9%A1%FB%A1%FA99%A1%FB%A1%FA%A1%FB%A1%FA0%A1%FB%A1%FApython%A1%FB%A1%FA2%A1%FB%A1%FA1%7C%21; privacy=1688644161; Hm_lvt_1370a11171bd6f2d9b1fe98951541941=1688644162; Hm_lpvt_1370a11171bd6f2d9b1fe98951541941=1688644162; JSESSIONID=BA027715BD408799648B89C132AE93BF; acw_tc=ac11000116886495592254609e00df047e220754059e92f8a06d43bc419f21; '
        'ssxmod_itna=Qqmx0Q0=K7qeqD5itDXDnBAtKeRjbDce3=e8i=Ax0vTYPGzDAxn40iDtrrkxhziBemeLtE3Yqq6j7rEwPeoiG23pAjix0aDbqGkPA0G4GG0xBYDQxAYDGDDPDocPD1D3qDkD7h6CMy1qGWDm4kDWPDYxDrjOKDRxi7DDvQkx07DQ5kQQGxjpBF=FHpu=i+tBDkD7ypDlaYj9Om6/fxMp7Ev3B3Ix0kl40Oya5s1aoDUlFsBoYPe723tT2NiirY6QiebnnDsAhWC5xyVBDxi74qTZbKAjtDirGn8YD===; ssxmod_itna2=Qqmx0Q0=K7qeqD5itDXDnBAtKeRjbDce3=e8i=DnIfwqxDstKhDL0iWMKV3Ekpun3DwODKGcDYIxxD==; acw_sc__v2=64a6bf58f0b7feda5038718459a3b1e625849fa8'
    ),
    'Referer': 'https://we.51job.com/pc/search?jobArea=010000,020000,030200,040000,090200&keyword=python&searchType=2&sortType=0&metro=',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
}
# Request URL
url = 'https://we.***.com/api/job/search-pc'
# Query parameters
data = {
    'api_key': '51job',
    'timestamp': '*****',
    'keyword': '****',
    'searchType': '2',
    'function': '',
    'industry': '',
    'jobArea': '010000,020000,030200,040000,090200',
    'jobArea2': '',
    'landmark': '',
    'metro': '',
    'salary': '',
    'workYear': '',
    'degree': '',
    'companyType': '',
    'companySize': '',
    'jobType': '',
    'issueDate': '',
    'sortType': '0',
    'pageNum': '1',
    'requestId': '',
    'pageSize': '20',
    'source': '1',
    'accountId': '',
    'pageCode': 'sou|sou|soulb',
}
# Send the request
response = requests.get(url=url, params=data, headers=headers)
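
The request above fetches only page 1. To collect more pages, the same parameters can be regenerated with a different pageNum. The sketch below rests on assumptions: that `timestamp` is simply the current Unix time in seconds (the captured URL carries 1688645783, which fits that pattern), that `keyword` is the search term from the captured URL, and that the empty parameters can be left out; `BASE_PARAMS` and `page_params` are made-up names.

```python
import time
from urllib.parse import urlencode

# Non-empty parameters from the captured request; timestamp and pageNum are filled in per page.
BASE_PARAMS = {
    "api_key": "51job",
    "keyword": "python",
    "searchType": "2",
    "jobArea": "010000,020000,030200,040000,090200",
    "sortType": "0",
    "pageSize": "20",
    "source": "1",
    "pageCode": "sou|sou|soulb",
}

def page_params(page_num, now=None):
    """Return the query parameters for one results page."""
    params = dict(BASE_PARAMS)
    # Assumption: the server expects the current Unix time in seconds.
    params["timestamp"] = str(int(time.time() if now is None else now))
    params["pageNum"] = str(page_num)
    return params

# Show the encoded query strings for the first three pages (fixed time for a stable demo).
for page in range(1, 4):
    print(urlencode(page_params(page, now=1688645783)))
```

Each dict could then be passed as `params=page_params(n)` to `requests.get`, ideally with a short pause between requests.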

Get the data

Receive the response data returned by the server <all of the data>

Developer tools: response

- response.json() returns the response body parsed as JSON
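
Once response.json() has turned the body into nested dicts and lists, the job records still have to be dug out of it. A minimal sketch, assuming the payload nests them under resultbody, job, items with the field names used by the parsing step of this article; `sample_payload`, `FIELD_MAP`, and `extract` are made up for illustration. Using `.get` means a missing key yields an empty value instead of raising KeyError.

```python
# CSV column name -> key in the JSON item (a subset, for brevity).
FIELD_MAP = {
    "職位": "jobName",
    "公司": "fullCompanyName",
    "薪資": "provideSalaryString",
    "城市": "jobAreaString",
}

def extract(item, field_map=FIELD_MAP, default=""):
    """Build one row, substituting `default` for keys the item lacks."""
    return {column: item.get(key, default) for column, key in field_map.items()}

# Hypothetical payload standing in for response.json().
sample_payload = {"resultbody": {"job": {"items": [
    {"jobName": "Python Developer", "fullCompanyName": "Example Co.",
     "provideSalaryString": "1-1.5萬", "jobAreaString": "上海·浦東新區"},
    {"jobName": "Data Engineer"},  # item with missing fields
]}}}

# Chained .get calls fall back to empty containers if a level is absent.
items = sample_payload.get("resultbody", {}).get("job", {}).get("items", [])
rows = [extract(item) for item in items]
for row in rows:
    print(row)
```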

Parse the data

Extract the content we want

Iterate with a for loop

for index in response.json()['resultbody']['job']['items']:
    # index holds the details of one posting --> a dict
    dit = {
        '職位': index['jobName'],  # job title
        '公司': index['fullCompanyName'],  # company
        '薪資': index['provideSalaryString'],  # salary
        '城市': index['jobAreaString'],  # city
        '經驗': index['workYearString'],  # experience
        '學歷': index['degreeString'],  # education
        '公司性質': index['companyTypeString'],  # company type
        '公司規模': index['companySizeString'],  # company size
        '職位詳情頁': index['jobHref'],  # posting detail page
        '公司詳情頁': index['companyHref'],  # company detail page
    }

Save each record as a dictionary (these two lines belong inside the for loop above)

    csv_writer.writerow(dit)
    print(dit)

Create the spreadsheet file (this code has to run before the loop, because the loop uses csv_writer)

f = open('python.csv', mode='w', encoding='utf-8', newline='')
# The field names must match the keys of dit exactly
csv_writer = csv.DictWriter(f, fieldnames=[
    '職位',
    '公司',
    '薪資',
    '城市',
    '經驗',
    '學歷',
    '公司性質',
    '公司規模',
    '職位詳情頁',
    '公司詳情頁',
])
csv_writer.writeheader()
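
The fragments above only work in a particular order: the writer has to exist, and the header has to be written, before the loop calls writerow. A minimal sketch of that ordering, using a `with` block so the file is closed even if an error occurs. The rows are made up, only three columns are shown, and the file goes to a temporary directory so the example does not touch your working folder; `write_jobs` is a hypothetical helper name.

```python
import csv
import os
import tempfile

FIELDNAMES = ["職位", "公司", "薪資"]

def write_jobs(path, rows):
    """Create the CSV, write the header once, then one line per row."""
    # newline='' lets the csv module control line endings itself.
    with open(path, mode="w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)

rows = [
    {"職位": "Python Developer", "公司": "Example Co.", "薪資": "1-1.5萬"},
    {"職位": "Data Engineer", "公司": "Sample Ltd.", "薪資": "8千-1.2萬"},
]
path = os.path.join(tempfile.mkdtemp(), "python.csv")
write_jobs(path, rows)

# Read the file back to confirm what was written.
with open(path, encoding="utf-8", newline="") as f:
    print(f.read())
```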

Visualization

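import pandas as pd
# Load the scraped data; the crawler above saves python.csv, so point this at whichever file holds your data
df = pd.read_csv('data.csv')
df.head()
# '學歷' is the education column; postings without a stated requirement are filled with '不限學歷' (no requirement)
df['學歷'] = df['學歷'].fillna('不限學歷')
edu_type = df['學歷'].value_counts().index.to_list()  # category names
edu_num = df['學歷'].value_counts().to_list()  # count per category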
from pyecharts import options as opts
from pyecharts.charts import Pie
from pyecharts.faker import Faker
from pyecharts.globals import CurrentConfig, NotebookType
CurrentConfig.NOTEBOOK_TYPE = NotebookType.JUPYTER_LAB
c = (
    Pie()
    .add(
        "",
        [
            list(z)
            for z in zip(edu_type,edu_num)
        ],
        center=["40%", "50%"],
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Python學歷要求"),  # "Python education requirements"
        legend_opts=opts.LegendOpts(type_="scroll", pos_left="80%", orient="vertical"),
    )
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}"))  # {b} = category name, {c} = count
)
c.load_javascript()
c.render_notebook()
df['城市'] = df['城市'].str.split('·').str[0]
city_type = df['城市'].value_counts().index.to_list()
city_num = df['城市'].value_counts().to_list()
c = (
    Pie()
    .add(
        "",
        [
            list(z)
            for z in zip(city_type,city_num)
        ],
        center=["40%", "50%"],
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Python招聘城市分布"),  # "Python postings by city"
        legend_opts=opts.LegendOpts(type_="scroll", pos_left="80%", orient="vertical"),
    )
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}"))  # {b} = category name, {c} = count
)
c.render_notebook()
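def LowMoney(i):
    # Lower bound of a salary string such as '1-1.5萬', '8千-1.2萬' or '6-8千', in yuan per month
    if '萬' in i:  # the string uses units of 10,000 (萬)
        low = i.split('-')[0]
        if '千' in low:  # the lower bound carries its own unit, thousands (千)
            low_num = low.replace('千', '')
            low_money = int(float(low_num) * 1000)
        else:  # the lower bound shares the 萬 unit
            low_money = int(float(low) * 10000)
    else:
        low = i.split('-')[0]
        if '元/天' in low:  # daily wage (元/天): assume 30 days per month
            low_num = low.replace('元/天', '')
            low_money = int(low_num) * 30
        else:  # otherwise the unit is thousands (千)
            low_money = int(float(low) * 1000)
    return low_money
df['最低薪資'] = df['薪資'].apply(LowMoney)  # new column: minimum salary
def MaxMoney(j):
    # Upper bound in yuan per month; anything after '·' is dropped
    Max = j.split('-')[-1].split('·')[0]
    if '萬' in Max and '萬/年' not in Max:  # monthly, in units of 10,000
        max_num = int(float(Max.replace('萬', '')) * 10000)
    elif '千' in Max:  # monthly, in thousands
        max_num = int(float(Max.replace('千', '')) * 1000)
    elif '元/天' in Max:  # daily wage: assume 30 days per month
        max_num = int(Max.replace('元/天', ''))  * 30
    else:  # annual salary (萬/年), converted to a monthly figure
        max_num = int((int(Max.replace('萬/年', ''))  * 10000) / 12)
    return max_num
df['最高薪資'] = df['薪資'].apply(MaxMoney)  # new column: maximum salary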
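
The two functions above rely on substring checks, so a string in an unexpected shape (a single value with no '-', or a daily range such as '100-150元/天') can raise an error or be scaled by the wrong unit. One possible alternative, offered as a sketch and not as the article's method, is to parse both bounds with a single regular expression. The accepted formats are assumptions based on the strings the functions above handle; `parse_salary` is a made-up name, and it rounds where the functions above truncate.

```python
import re

UNIT = {"千": 1_000, "萬": 10_000, "元": 1}
PERIOD = {"年": 1 / 12, "天": 30, "月": 1}  # 30 days per month, as in the article

PATTERN = re.compile(
    r"^(?P<low>\d+(?:\.\d+)?)(?P<low_unit>[千萬元])?"
    r"(?:-(?P<high>\d+(?:\.\d+)?)(?P<high_unit>[千萬元])?)?"
    r"(?:/(?P<period>[年天月]))?"
)

def parse_salary(text):
    """Return (low, high) in yuan per month, or None if the text does not match."""
    match = PATTERN.match(str(text).strip())
    if match is None:
        return None
    # A bound without its own unit borrows the unit of the other bound.
    high_unit = match["high_unit"] or match["low_unit"]
    low_unit = match["low_unit"] or high_unit
    if low_unit is None:
        return None  # a bare number carries no unit to scale by
    factor = PERIOD[match["period"] or "月"]
    low = float(match["low"]) * UNIT[low_unit] * factor
    high = float(match["high"] or match["low"]) * UNIT[high_unit] * factor
    return round(low), round(high)

for sample in ["1-1.5萬", "8千-1.2萬", "100-150元/天", "10-20萬/年", "面議"]:
    print(sample, parse_salary(sample))
```

Returning None for unparseable text lets the caller decide whether to drop or fill those rows instead of stopping with an exception.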
def tranform_price(x):
    if x <= 5000.0:
        return '0~5000元'
    elif x <= 8000.0:
        return '5001~8000元'
    elif x <= 15000.0:
        return '8001~15000元'
    elif x <= 25000.0:
        return '15001~25000元'
    else:
        return '25000以上'
df['最低薪資分級'] = df['最低薪資'].apply(lambda x:tranform_price(x))
price_1 = df['最低薪資分級'].value_counts()
datas_pair_1 = [(i, int(j)) for i, j in zip(price_1.index, price_1.values)]
df['最高薪資分級'] = df['最高薪資'].apply(lambda x:tranform_price(x))
price_2 = df['最高薪資分級'].value_counts()
datas_pair_2 = [(i, int(j)) for i, j in zip(price_2.index, price_2.values)]
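
The banding done by tranform_price, followed by the count per band, can also be written as a lookup into a sorted list of upper bounds. This is a stdlib-only sketch with made-up sample values; the band labels are the ones returned by tranform_price, and `salary_band` is a hypothetical name.

```python
from bisect import bisect_left
from collections import Counter

UPPER_BOUNDS = [5000, 8000, 15000, 25000]
LABELS = ["0~5000元", "5001~8000元", "8001~15000元", "15001~25000元", "25000以上"]

def salary_band(amount):
    """Map a monthly salary to its band; each upper bound is inclusive."""
    return LABELS[bisect_left(UPPER_BOUNDS, amount)]

sample = [4500, 5000, 5001, 12000, 25000, 30000]  # made-up salaries
counts = Counter(salary_band(x) for x in sample)
# (label, count) pairs in band order, skipping empty bands: the shape Pie().add() takes.
data_pairs = [(label, counts[label]) for label in LABELS if counts[label]]
print(data_pairs)
```

Unlike value_counts, which orders by frequency, this keeps the bands in ascending salary order, which can make the legend easier to read.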
pie1 = (
    Pie(init_opts=opts.InitOpts(theme='dark',width='1000px',height='600px'))
    .add('', datas_pair_1, radius=['35%', '60%'])
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}:{d}%"))  # {b} = category name, {d} = percentage
    .set_global_opts(
        title_opts=opts.TitleOpts(
            title="Python工作薪資\n\n最低薪資區間",  # "Python salaries / minimum salary band"
            pos_left='center', 
            pos_top='center',
            title_textstyle_opts=opts.TextStyleOpts(
                color='#F0F8FF', 
                font_size=20, 
                font_weight='bold'
            ),
        )
    )
    .set_colors(['#EF9050', '#3B7BA9', '#6FB27C', '#FFAF34', '#D8BFD8', '#00BFFF', '#7FFFAA'])
)
pie1.render_notebook()
pie1 = (
    Pie(init_opts=opts.InitOpts(theme='dark',width='1000px',height='600px'))
    .add('', datas_pair_2, radius=['35%', '60%'])
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}:{d}%"))  # {b} = category name, {d} = percentage
    .set_global_opts(
        title_opts=opts.TitleOpts(
            title="Python工作薪資\n\n最高薪資區間",  # "Python salaries / maximum salary band"
            pos_left='center', 
            pos_top='center',
            title_textstyle_opts=opts.TextStyleOpts(
                color='#F0F8FF', 
                font_size=20, 
                font_weight='bold'
            ),
        )
    )
    .set_colors(['#EF9050', '#3B7BA9', '#6FB27C', '#FFAF34', '#D8BFD8', '#00BFFF', '#7FFFAA'])
)
pie1.render_notebook() 
exp_type = df['經驗'].value_counts().index.to_list()  # experience categories
exp_num = df['經驗'].value_counts().to_list()  # count per category
c = (
    Pie()
    .add(
        "",
        [
            list(z)
            for z in zip(exp_type,exp_num)
        ],
        center=["40%", "50%"],
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Python招聘經驗要求"),  # "Python experience requirements"
        legend_opts=opts.LegendOpts(type_="scroll", pos_left="80%", orient="vertical"),
    )
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}"))  # {b} = category name, {c} = count
)
c.render_notebook()
# Group by city and compute the average salary
avg_salary = df.groupby('城市')['最低薪資'].mean()
CityType = avg_salary.index.tolist()
CityNum = [int(a) for a in avg_salary.values.tolist()]
avg_salary_1 = df.groupby('城市')['最高薪資'].mean()
CityType_1 = avg_salary_1.index.tolist()
CityNum_1 = [int(a) for a in avg_salary_1.values.tolist()]
from pyecharts.charts import Bar
# Create the bar chart
c = (
    Bar()
    .add_xaxis(CityType)
    .add_yaxis("", CityNum)
    .set_global_opts(
        title_opts=opts.TitleOpts(title="各大城市Python低平均薪資"),  # "Average minimum Python salary by city"
        visualmap_opts=opts.VisualMapOpts(
            dimension=1,
            pos_right="5%",
            max_=30,
            is_inverse=True,
        ),
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=45))  # rotate the x-axis labels by 45 degrees
    )
    .set_series_opts(
        label_opts=opts.LabelOpts(is_show=False),
        markline_opts=opts.MarkLineOpts(
            data=[
                opts.MarkLineItem(type_="min", name="最小值"),
                opts.MarkLineItem(type_="max", name="最大值"),
                opts.MarkLineItem(type_="average", name="平均值"),
            ]
        ),
    )
)
c.render_notebook()
# Create the bar chart
c = (
    Bar()
    .add_xaxis(CityType_1)
    .add_yaxis("", CityNum_1)
    .set_global_opts(
        title_opts=opts.TitleOpts(title="各大城市Python高平均薪資"),  # "Average maximum Python salary by city"
        visualmap_opts=opts.VisualMapOpts(
            dimension=1,
            pos_right="5%",
            max_=30,
            is_inverse=True,
        ),
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=45))  # rotate the x-axis labels by 45 degrees
    )
    .set_series_opts(
        label_opts=opts.LabelOpts(is_show=False),
        markline_opts=opts.MarkLineOpts(
            data=[
                opts.MarkLineItem(type_="min", name="最小值"),
                opts.MarkLineItem(type_="max", name="最大值"),
                opts.MarkLineItem(type_="average", name="平均值"),
            ]
        ),
    )
)
c.render_notebook()
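### Conclusions:
    1. Most postings require a junior-college diploma (大專) or above
    2. Salaries: roughly 8,000 to 25,000
    3. Salaries run somewhat higher in Beijing, Shanghai, and Guangzhou
### How to build a simple visual analysis
    1. Collect the full data set with a crawler --> spreadsheet / database
    2. Read the file contents
    3. Count the data in each category
    4. Chart it with a visualization module: <build on the code templates in the official documentation>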
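
Steps 2 and 3 can be tried without pandas. The sketch below reads CSV text with the standard library and counts one column, which is roughly what value_counts does in the pandas code of this article. The rows are made up, and an in-memory string stands in for the scraped file.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for the scraped CSV file.
sample_csv = io.StringIO(
    "職位,城市\n"
    "Python Developer,上海·浦東新區\n"
    "Data Engineer,北京\n"
    "Backend Developer,上海\n"
)

rows = list(csv.DictReader(sample_csv))               # step 2: read the file
cities = [row["城市"].split("·")[0] for row in rows]   # keep only the city name
counts = Counter(cities).most_common()                # step 3: count per category
labels = [name for name, _ in counts]
values = [n for _, n in counts]
print(labels, values)
```

The two lists are the category names and counts that a chart's x-axis and y-axis (or a pie's data pairs) are built from.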
import pandas as pd
# Read the data
df = pd.read_csv('data.csv')
# Show the first five rows
df.head()
c_type = df['公司性質'].value_counts().index.to_list() # category names ('公司性質' = company type)
c_num = df['公司性質'].value_counts().to_list() # count per category
c_type
from pyecharts.charts import Bar # bar chart from pyecharts
from pyecharts.faker import Faker # random sample data generator
from pyecharts.globals import ThemeType # theme settings
c = (
    Bar({"theme": ThemeType.MACARONS}) # theme
    .add_xaxis(c_type)  # x-axis data
    .add_yaxis("", c_num) # y-axis data
    .set_global_opts(
        # Title ("Distribution of company types among Python employers") and subtitle (the category names)
        title_opts={"text": "Python招聘企業公司性質分布", "subtext": "民營, 已上市, 外資（非歐美）, 合資, 國企, 外資（歐美）, 事業單位"}
    )
    # Save as an HTML file
#     .render("bar_base_dict_config.html")
)
# print(Faker.choose()) # ['小米', '三星', '華為', '蘋果', '魅族', 'VIVO', 'OPPO'] category names
# print(Faker.values()) # [38, 54, 20, 85, 71, 22, 38] counts
c.render_notebook() # render directly in Jupyter
