Python crawler example: home-region lookup by the first 7 digits of a mobile number

Updated: 2020-03-31 10:26:21   Author: wanli001
This article presents a Python crawler that collects the home region for the first 7 digits of mobile numbers. It walks through the example code in detail and may serve as a reference for study or work.

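Requirements analysis

The project needs the first 7 digits of a mobile number to check whether the number is valid and to look up its home region. The existing data is several years old and out of date, so I decided to crawl a fresh copy with a Python crawler.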
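
To make the requirement concrete, here is a minimal sketch of how the crawled data might be used once it exists. It assumes each line has the form number,province,mobile_area,postcode (the shape the crawler below writes to result.txt) and that the first field is the 7-digit prefix. The function names and the sample rows are hypothetical and made up for illustration.

```python
import re


def load_prefix_table(lines):
    """Build {7-digit prefix: (province, mobile_area, postcode)} from CSV-like lines."""
    table = {}
    for raw in lines:
        parts = raw.strip().split(",")
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        prefix, province, mobile_area, postcode = parts
        table[prefix] = (province, mobile_area, postcode)
    return table


def lookup(table, phone):
    """Return the record for an 11-digit number, or None if unknown or malformed."""
    if not re.fullmatch(r"1\d{10}", phone):
        return None
    return table.get(phone[:7])


# Made-up sample rows, not real crawl results.
sample = ["1330000,ProvinceA,CarrierX,000000", "1380013,ProvinceB,CarrierY,111111"]
table = load_prefix_table(sample)
print(lookup(table, "13800138000"))
print(lookup(table, "12345"))
```

A number counts as valid here only if it has 11 digits, starts with 1, and its first 7 digits appear in the table; a real project may want stricter or looser rules.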

Single-threaded version

# coding:utf-8
import requests
from datetime import datetime


class PhoneInfoSpider:
  def __init__(self, phoneSections):
    self.phoneSections = phoneSections

  def phoneInfoHandler(self, textData):
    # Split the response into lines; fields are picked by line position.
    text = textData.splitlines(True)

    if len(text) >= 9:
      # For each field, take the text between the first pair of single quotes.
      number = text[1].split('\'')[1]
      province = text[2].split('\'')[1]
      mobile_area = text[3].split('\'')[1]
      postcode = text[5].split('\'')[1]
      line_text = number + "," + province + "," + mobile_area + "," + postcode
      print(line_text)

      try:
        # Append one comma-separated line; "with" closes the file each time.
        with open('./result.txt', 'a', encoding='utf-8') as f:
          f.write(line_text + '\n')
      except Exception as e:
        print(type(e).__name__, ":", e)

  def requestPhoneInfo(self, phoneNum):
    try:
      url = 'https://tcc.taobao.com/cc/json/mobile_tel_segment.htm?tel=' + phoneNum
      # A timeout keeps one stalled request from hanging the whole crawl.
      response = requests.get(url, timeout=10)
      self.phoneInfoHandler(response.text)
    except Exception as e:
      print(type(e).__name__, ":", e)

  def requestAllSections(self):
    # "last" is the resume point: set it to the index reached before an abnormal exit
    last = 0
    # last = 4
    # Build each number automatically; the last four digits are padded with 0
    for head in self.phoneSections:
      head_begin = datetime.now()
      print(head + " begin time:" + str(head_begin))

      # 10000 middle values (0000-9999) per segment; use range(last, 10) for a quick test
      for i in range(last, 10000):
        middle = str(i).zfill(4)
        phoneNum = head + middle + "0000"
        self.requestPhoneInfo(phoneNum)
      last = 0

      head_end = datetime.now()
      print(head + " end time:" + str(head_end))


if __name__ == '__main__':
  task_begin = datetime.now()
  print("phone check begin time:" + str(task_begin))

  # China Telecom, China Unicom, China Mobile, virtual operators
  dx = ['133', '149', '153', '173', '177', '180', '181', '189', '199']
  lt = ['130', '131', '132', '145', '146', '155', '156', '166', '171', '175', '176', '185', '186']
  yd = ['134', '135', '136', '137', '138', '139', '147', '148', '150', '151', '152', '157', '158', '159', '172',
     '178', '182', '183', '184', '187', '188', '198']
  add = ['170']
  all_num = dx + lt + yd + add

  # print(all_num)
  print(len(all_num))

  # Number segments to crawl
  spider = PhoneInfoSpider(all_num)
  spider.requestAllSections()

  task_end = datetime.now()
  print("phone check end time:" + str(task_end))

Crawling a single number segment takes 10,000 queries, and the single-threaded version needs over an hour and a half for that, which is too slow.
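
Apart from speed, the handler above picks each field by its line position, so it would silently read the wrong values if the service ever added or reordered lines. A hedged alternative is to parse by key instead. The sketch below assumes only that fields appear as key:'value' pairs, which is what the split on single quotes implies. The response body, the field names, and the helper name are hypothetical, not the service's documented format.

```python
import re

# Matches key:'value' pairs such as  province:'...'
PAIR_RE = re.compile(r"(\w+)\s*:\s*'([^']*)'")


def parse_fields(text):
    """Return every key:'value' pair found in the response text as a dict."""
    return dict(PAIR_RE.findall(text))


# Hypothetical response body; the real field names and order may differ.
sample = """
__Result__ = {
    number:'1380013',
    province:'ProvinceB',
    mobile_area:'CarrierY',
    extra:'x',
    postcode:'111111'
}
"""
fields = parse_fields(sample)
print(fields.get("province"))
```

Looking fields up with fields.get(...) returns None for a missing key, which makes an unexpected response easy to detect and skip.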

Multi-threaded version

# coding:utf-8
import requests
from datetime import datetime
import queue
import threading

threadNum = 32


class MyThread(threading.Thread):
  def __init__(self, func):
    threading.Thread.__init__(self)
    self.func = func

  def run(self):
    self.func()


def requestPhoneInfo():
  while True:
    # Check and take a task under the lock so the size check and get stay atomic.
    lock.acquire()
    if q.qsize() != 0:
      print("queue size:" + str(q.qsize()))
      p = q.get() # take one task (1..10000)
      lock.release()

      # Derive the middle digits from the task itself. Reading q.qsize() here,
      # outside the lock, would race with other threads and repeat or skip numbers.
      middle = str(p - 1).zfill(4)
      phoneNum = phone_head + middle + "0000"
      print("phoneNum:" + phoneNum)

      try:
        url = 'https://tcc.taobao.com/cc/json/mobile_tel_segment.htm?tel=' + phoneNum
        response = requests.get(url, timeout=10)
        phoneInfoHandler(response.text)
      except Exception as e:
        print(type(e).__name__, ":", e)
    else:
      lock.release()
      break


def phoneInfoHandler(textData):
  # Split the response into lines; fields are picked by line position.
  text = textData.splitlines(True)

  if len(text) >= 9:
    # For each field, take the text between the first pair of single quotes.
    number = text[1].split('\'')[1]
    province = text[2].split('\'')[1]
    mobile_area = text[3].split('\'')[1]
    postcode = text[5].split('\'')[1]
    line_text = number + "," + province + "," + mobile_area + "," + postcode
    print(line_text)

    try:
      # Append one comma-separated line; "with" closes the file each time.
      with open('./result.txt', 'a', encoding='utf-8') as f:
        f.write(line_text + '\n')
    except Exception as e:
      print(type(e).__name__, ":", e)


if __name__ == '__main__':
  task_begin = datetime.now()
  print("phone check begin time:" + str(task_begin))

  dx = ['133', '149', '153', '173', '177', '180', '181', '189', '199']
  lt = ['130', '131', '132', '145', '155', '156', '166', '171', '175', '176', '185', '186']
  yd = ['134', '135', '136', '137', '138', '139', '147', '150', '151', '152', '157', '158', '159', '172', '178',
     '182', '183', '184', '187', '188', '198']
  all_num = dx + lt + yd
  print(len(all_num))

  for head in all_num:
    head_begin = datetime.now()
    print(head + " begin time:" + str(head_begin))

    q = queue.Queue()
    threads = []
    lock = threading.Lock()

    for p in range(10000):
      q.put(p + 1)

    print(q.qsize())

    # Worker threads read the current segment from this module-level variable.
    phone_head = head

    for i in range(threadNum):
      thread = MyThread(requestPhoneInfo)
      thread.start()
      threads.append(thread)
    for thread in threads:
      thread.join()

    head_end = datetime.now()
    print(head + " end time:" + str(head_end))

  task_end = datetime.now()
  print("phone check end time:" + str(task_end))

With the multi-threaded version, the 10,000 queries for one number segment finish in roughly 2 to 3 minutes. CPU usage spikes and then holds at around 70%.
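
The same fan-out can also be written with the standard library's thread pool, which removes the hand-rolled queue, lock, and Thread subclass. This is a sketch of an alternative, not the code that produced the timings above. The names numbers_for, crawl_segment, and fake_fetch are hypothetical, and fake_fetch stands in for the real HTTP request so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def numbers_for(head, count=10000):
    """Yield head + 4-digit middle + '0000', following the article's scheme."""
    for i in range(count):
        yield head + str(i).zfill(4) + "0000"


def crawl_segment(head, fetch, workers=32, count=10000):
    """Run fetch(number) for every number in a segment on a thread pool.

    fetch is a caller-supplied callable; results come back in the same
    order as the generated numbers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, numbers_for(head, count)))


# Stand-in for the real request: just returns the 7-digit prefix.
def fake_fetch(number):
    return number[:7]


results = crawl_segment("138", fake_fetch, workers=4, count=5)
print(results)
```

In real use, fetch would wrap the request and the handler. Because each number is generated up front and passed to exactly one worker, no number can be queried twice or skipped.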

There are more than 40 number segments in total. Crawling all of them takes roughly 1 to 2 hours and yields about 410,000 records.

That is all for this article. I hope it helps with your studies, and please keep supporting 腳本之家.
