
Solving font encryption in Python web scraping

 Updated: 2023-03-03 08:33:39   Author: L'y  
This article explains how to handle font encryption when scraping with Python. The sample code is worked through in detail and should be a useful reference for study or work.

Straight to the point: the target is a certain "8" site, https://*****.b*b.h*****y*8*.com/

The real URL follows this pattern, but I have masked it for safety.

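The problem

The phone number is displayed normally on the rendered page.

In the F12 developer tools, however, it shows up as something else, so it cannot be read directly.

The requests library does not return the source we want either, because the font is encrypted.

"View page source" shows the same thing. So the question is how to decrypt it.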
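
The screenshots are not reproduced here, but judging from the decoding code later in this article, the raw source carries the number as hexadecimal character references. The sketch below uses `&#x880fb;` as a sample reference (an assumption, borrowed from the cmap dump further down) to show why such a reference means nothing without the site's own font: the code point is not assigned to any character in Unicode.

```python
import html
import unicodedata


def describe(entity):
    """Return (hex code point, Unicode general category) for one character reference."""
    char = html.unescape(entity)
    return hex(ord(char)), unicodedata.category(char)


# "Cn" means "not assigned": only the page's custom font gives it a shape.
print(describe("&#x880fb;"))
# An ordinary digit, for comparison, is category "Nd" (decimal number).
print(describe("&#x31;"))
```

Nothing in the page text says which digit a reference stands for; that information lives only in the font file.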

Solution steps

  • Get the real page source
  • Find the corresponding font file
  • Parse it.

Getting the real page source

Why webdriver? Because requests cannot fetch the real source.

from selenium import webdriver

# --- Configure Chrome
options = webdriver.ChromeOptions()

prefs = {"profile.managed_default_content_settings.images": 2}  # do not load images
options.add_experimental_option("prefs", prefs)
options.add_argument("--ignore-certificate-errors")  # tolerate SSL certificate errors
options.binary_location = r'C:\Program Files\Google\Chrome\Application\chrome.exe'
# --- Run Chrome in incognito mode
options.add_argument('--incognito')

driver = webdriver.Chrome(options=options)
driver.set_page_load_timeout(5)
# --- Maximize the browser window
driver.maximize_window()
# --- Hide navigator.webdriver from the site's webdriver detection code
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument",
                       {"source": """
                  Object.defineProperty(navigator, 'webdriver', {
                  get: () => undefined
                  })
                  """})

Finding the corresponding font file

The declaration in the page tells us that the font is embedded as base64, so we take that string and write it out as a file.

# Example
import base64

# A very long part is omitted...
b64_code = 'AAEAAAAKAIAAAwAgT1MvMla19RMAAACsAAAAYGNtYXAGQAPOAAABDAAAAa5nbHlmZrwdwAAAArwAAAakaGVhZBQx4JoAAAlgAAAANmhoZWEFswFxAAAJmAAAACRobXR4DVYBYgAACbwAAAAubG9jYQwQCnYAAAnsAAAAIm1heHAAFABOAAAKEAAAACBuYW1lUuodRwAACjAAAAGecG9zdDHgxUkAAAvQAAAAdAAEAgsBkAAFAAACmQLMAAAAjwKZAswAAAHrADMBCQAAAgAGAwAAAAAAAAAAAAEQAAAAAAAAAAAAAABQZkVkAMAAI4EEAyz/LABcAywA1AAAAAEAAAAAAxgAAAAAACAAAQAAAAQAAAADAAAAJAABAAAAAABcAAMAAQAAACQAAwAKAAABYgAEADgAAAAKAAgAAgACACMAKwAtAC///wAAACMAKgAtAC/e/9j/1//WAAEAAAAAAAAAAAAAAAABBgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAAAAgMABAAFAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMAAAAAABMAAAAAAAAAAUAAAAjAAAAIwAAAAEAAAAqAAAAKwAAAAIAAAAtAAAALQAAAAQAAAAvAAAALwAAAAUACID7AAiBBAAAAAYAAAACACIAAAEyAqoAAwAHAAA3ESERJzMRIyIBEO7MzAACqv1WIgJmAAAAAgAdAAACIALbABsAHwAAARUjByM3IwcjNyM1MzcjNTM3MwczNzMHMxUjByMzNyMB/4AmSCZrJ0knZnQjdoQkSSVrJkkmYnAitWwkbAEUR83Nzc1HuUjGxsbGSLm5AAAAAQAkAKQB3gI2ABEAABM3FyczBzcXBxcHJxcjNwcnNyQumSJzJZkun58umSRyIZguoAGXZ26mpGpmKClma6anbWYqAAABAEMAkwH6AkoACwAAARUjNSM1MzUzFTMVAUNKtrZKtwFKt7dJt7dJAAAAAAEAGgFCASQBrQADAAATNSEVGgEKAUJrawAAAAABAAD/gwEnAwoAAwAAFycTM0pK30h9AQOGAAAAAgAj//YCGgLmABMAJwAAARQOAiMiLgI1ND4CMzIeAgUUHgIzMj4CNTQuAiMiDgICGhw9X0NGYDwaGjxgR0JfPRz+qAgUJB0cJBUHBxQkHB0kFQgBb1WLYzY2Y4xVVYpiNTVii1VKc08qKk9zSklzTykpT3MAAAAAAQArAAACCgLfACEAADc1MzI+AjURDgMjIi4CNT4DPwEzERQeAjsBFWRUDRMNBhQiIB8PDRUQChAiJiwaSHIFCxUQUgA3Bg8aEwIBGCccDwoUHBEEDBIbEjX9mhAZEQg3AAAAAAEAJAAAAg4C5gArAAABFA4EDwEzMjY/ATMHITU3PgM1NCYjIgYVIi4CNTQ+AjMyHgIB9AsYKDtPM2fvHy0JCD0G/hyYLz0jDiomNCodMCMTHThUODpXPB4CPBgtMDZATjFhJCMf12qaMU5HRSg6NllYCxgnGxwyJhcYLD8AAAAAAQAd//YCDgLmAEQAABciLgI1ND4CMxQeAjMyPgI1NC4CKwE1MzI+AjU0JiMiDgIVIiY1ND4CMzIeAhUUDgIHHgMVFA4C+TpTNhkOGB8SEiEvHBktIxUVKDsnP0MhMSAQKyobIxMHQEUdOVQ4N1c+IRgqOSIfQTUiL01kChQiLRgTHhUKITEhEA4iOiwe
MSMUQBUoOCE4PxstOR4tLxsvJBQWKz4oIzouIgwFGSo/LD5VNBYAAgAOAAACKQLbABgAIwAAJRUUHgI7ARUhNTMyPgI9ASE1ATMRMxUlNDY3DgMPATMBvw0XHxEN/pkcEh4XDf7lASKPav8AAwQFFhkXBorUvz8YHQ8FNzcFDx0YPz4B3v4nQ/YtaDAMKiwoCeUAAQAp//YCBgLbADoAADcyPgI1NCYjIg4CBycTIRcjJy4DKwEUDgIPAT4DMzIeAhUUDgIjIi4CNTQ2MxQeAuwZLiIVSUMTIBsYCy8gAYQFOwgCBgsQDNUCAgMBCAgZHiIPPGBFJTBNXy85UDIXLSUMGis+ECVAL0xLAwUHAxIBYrojCQ4KBgEQGyISXgMGBAMcNlI3Q1o3GBUiLRgkIxYsIxYAAAACAC7/9gIZAuYALAA8AAABIg4CBz4DMzIeAhUUDgIjIi4CNTQ+AjMyHgIVFA4CIzQuAgMiDgIHFB4CMzI2NTQmAUkeMSMVAwobIysaL0s2HR48WDs5XUMlJEhuSjJFKxMNHS4iBg8bNw4fHBgGEh4pFygtMgKpJEVkQQcNCwcdN04yN1tBJCpWg1lVk20/EyAoFhAdFg0XLyYY/tkIDhIJSWpEIFBZU0wAAAAAAQAtAAACGwLbAAsAADcBISIGDwEjNyEVAakBEf7yHBwDBj4FAen+5QACbBsZNNcy/VcAAAMAH//2Ah4C5gAlADkATQAANzQ+AjcuATU0PgIzMh4CFRQOAgceAxUUDgIjIi4CFzI+AjU0LgInDgMVFB4CEzQuAiMiDgIVFB4CFz4DHxUoOCE9QRg4W0I2UjcbEyQzIC5BKBMkQ2E+QF4+Hf4aKx4QESU4KBEeFQ0RHit6DBgkFxUhFgsOHCkbExsSCLshNSslESNaPCRDNCAbMEInHi8nIRAXLTI2HzFLNBwfNUhiEyIvHBkpIyISCx0jLBseMiMUAgQWKyEUER8qGBsoIBkNCxkgKAAAAAIAJP/2Ag8C5gAoADYAABciLgI1NDY3HgMzMjY3DgMjIi4CNTQ+AjMyHgIVFA4CAzI2NzQuAiMiBhUUFukvQCgRGBoHFR4nGkVKBQwdJS0aLEo1HiA9Vzc3XkUmIUdvHyU1DxEcKBgsMDAKFCAqFhYfBRcoHRGVkw8ZEwobNk80N1tCJChUglpVlG9AAW4lH0JePB1WV0dJAAAAAAEAAAABAACt4Ie1Xw889QALBAAAAAAA2XTOiAAAAADZdM6IAAD/gwIpAwoAAAAIAAIAAAAAAAAAAQAAAyz/LABcAj0AAAAAAikAAQAAAAAAAAAAAAAAAAAAAAcBdgAiAj0AHQICACQCPQBDAT4AGgEnAAACPQAjACsAJAAdAA4AKQAuAC0AHwAkAAAAAAAUAEQAZgB8AIoAmADUAQYBRgGgAdYCKAJ+ApgDBANSAAAAAQAAABAATgADAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAwAlgABAAAAAAABAA0AAAABAAAAAAACAAYADQABAAAAAAADAA0AEwABAAAAAAAEAA0AIAABAAAAAAAFAB4ALQABAAAAAAAGAA0ASwADAAEECQABABoAWAADAAEECQACAAwAcgADAAEECQADABoAfgADAAEECQAEABoAmAADAAEECQAFADwAsgADAAEECQAGABoA7kxlZVRyZWVzaGFkb3dNZWRpdW1MZWVUcmVlc2hhZG93TGVlVHJlZXNoYWRvd1ZlcnNpb24gMS4wOyBGb250RWRpdG9yICh2MS4wKUxlZVRyZWVzaGFkb3cATABlAGUAVAByAGUAZQBzAGgAYQBkAG8AdwBNAGUAZABpAHUAbQBMAGUAZQBUAHIAZQBlAHMAaABhAGQAbwB3AEwAZQBlAFQAcgBlAGUAcwBoAGEAZABvAHcAVgBlAHIAcwBpAG8AbgAgADEALgAwADsAIABGAG8AbgB0AEUAZABpAHQAbwByACAAKAB2ADEALgAw
ACkATABlAGUAVAByAGUAZQBzAGgAYQBkAG8AdwAAAAIAAAAAAAAAMgAAAAAAAAAAAAAAAAAAAAAAAAAAABAAEAAAAAYADQAOABAAEgECAQMBBAEFAQYBBwEIAQkBCgELBHplcm8Db25lA3R3bwV0aHJlZQRmb3VyBGZpdmUDc2l4BXNldmVuBWVpZ2h0BG5pbmU='

with open('font.ttf', 'wb') as f:
    f.write(base64.decodebytes(b64_code.encode()))


from fontTools.ttLib import TTFont  # import the font parser

font = TTFont('font.ttf')
font.saveXML('font.xml')

# A simple wrapper
import base64
import re

def w_tff(one_html):
    # Pull the base64 payload out of the page and save it as a .ttf file
    res_tff = re.findall(r';base64,(.*?)"', one_html, re.S)
    if res_tff and len(res_tff) == 1:
        new_res_ttf = res_tff[0]
        with open('123_new_ttf.ttf', 'wb') as f:
            f.write(base64.decodebytes(new_res_ttf.encode()))
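
A page may contain other base64 data URIs (inline images, for example), in which case the wrapper above finds more than one match and writes nothing. A possible variation, offered only as a sketch: try every match and keep the first one whose decoded bytes start with the TrueType signature `00 01 00 00`, which is also how the sample string above begins once decoded. The function name `extract_font_bytes` and the regular expression are assumptions, not part of the original code.

```python
import base64
import re

# Version tag at the very start of a TrueType-outline font file.
TRUETYPE_MAGIC = b"\x00\x01\x00\x00"

# Assumed shape of the embedded font: "...;base64,<payload>" inside a data URI.
FONT_DATA_RE = re.compile(r";base64,([A-Za-z0-9+/=\s]+)")


def extract_font_bytes(page_html):
    """Return the first base64 payload that decodes to a TrueType font, else None."""
    for match in FONT_DATA_RE.finditer(page_html):
        try:
            raw = base64.b64decode(match.group(1))
        except ValueError:  # malformed base64 (binascii.Error is a ValueError)
            continue
        if raw.startswith(TRUETYPE_MAGIC):
            return raw
    return None
```

The returned bytes can be written to a .ttf file exactly as the wrapper above does.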

Read the file and find the mapping inside it: the shapes of these digits are stored in the .ttf file.

from fontTools.ttLib import TTFont

# Glyph names in the font and the values they display
GLYPH_VALUES = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
    "hyphen": "-",
}

def get_num_phone(es_str: str):
    # Load the font to build the mapping
    path = '123_new_ttf.ttf'
    font = TTFont(path)

    # font.saveXML('font.xml')   # dump the font to an XML file
    # Get the code point -> glyph name mapping
    bestcmap = font.getBestCmap()

    ss = {}
    for key, value in bestcmap.items():
        keys = hex(key).replace('0x', '').replace("&#x", "")  # decimal to hexadecimal
        # Names not listed in GLYPH_VALUES (e.g. "plus") are kept unchanged
        ss[keys] = GLYPH_VALUES.get(value, value)

    list_phone = ""
    try:
        # es_str looks like "&#x880fb;&#x880fc;...": one reference per ";"
        for item in es_str.split(";"):
            if item:
                new_item = item.replace("&#x", "")
                list_phone += str(ss[new_item])
        if not list_phone or len(list_phone) < 2:
            return None
        return list_phone
    except Exception:
        # Any reference missing from the mapping means the string cannot be decoded
        return None

The cmap section of the generated font.xml:
<cmap>
    <tableVersion version="0"/>
    <cmap_format_4 platformID="0" platEncID="3" language="0">
      <map code="0x23" name="numbersign"/><!-- NUMBER SIGN -->
      <map code="0x2a" name="asterisk"/><!-- ASTERISK -->
      <map code="0x2b" name="plus"/><!-- PLUS SIGN -->
      <map code="0x2d" name="hyphen"/><!-- HYPHEN-MINUS -->
      <map code="0x2f" name="slash"/><!-- SOLIDUS -->
    </cmap_format_4>
    <cmap_format_0 platformID="1" platEncID="0" language="0">
      <map code="0x23" name="numbersign"/>
      <map code="0x2a" name="asterisk"/>
      <map code="0x2b" name="plus"/>
      <map code="0x2d" name="hyphen"/>
      <map code="0x2f" name="slash"/>
    </cmap_format_0>
    <cmap_format_4 platformID="3" platEncID="1" language="0">
      <map code="0x23" name="numbersign"/><!-- NUMBER SIGN -->
      <map code="0x2a" name="asterisk"/><!-- ASTERISK -->
      <map code="0x2b" name="plus"/><!-- PLUS SIGN -->
      <map code="0x2d" name="hyphen"/><!-- HYPHEN-MINUS -->
      <map code="0x2f" name="slash"/><!-- SOLIDUS -->
    </cmap_format_4>
    <cmap_format_12 platformID="3" platEncID="10" format="12" reserved="0" length="76" language="0" nGroups="5">
      <map code="0x23" name="numbersign"/><!-- NUMBER SIGN -->
      <map code="0x2a" name="asterisk"/><!-- ASTERISK -->
      <map code="0x2b" name="plus"/><!-- PLUS SIGN -->
      <map code="0x2d" name="hyphen"/><!-- HYPHEN-MINUS -->
      <map code="0x2f" name="slash"/><!-- SOLIDUS -->
      <map code="0x880fb" name="zero"/><!-- ???? -->
      <map code="0x880fc" name="one"/><!-- ???? -->
      <map code="0x880fd" name="two"/><!-- ???? -->
      <map code="0x880fe" name="three"/><!-- ???? -->
      <map code="0x880ff" name="four"/><!-- ???? -->
      <map code="0x88100" name="five"/><!-- ???? -->
      <map code="0x88101" name="six"/><!-- ???? -->
      <map code="0x88102" name="seven"/><!-- ???? -->
      <map code="0x88103" name="eight"/><!-- ???? -->
      <map code="0x88104" name="nine"/><!-- ???? -->
    </cmap_format_12>
  </cmap>

Read the .ttf file (and dump it to XML; this is only needed the first time, to work out the mapping).

  • font.getBestCmap() returns the mapping table.
  • Studying the cmap section of the XML file shows exactly the result we need.
  • keys = hex(key).replace('0x', '').replace("&#x", "") converts decimal to hexadecimal and yields a mapping table such as {'23': 'numbersign', '2a': 'asterisk', '2b': 'plus', '2d': '-', '2f': 'slash', '8826e': 0, '8826f': 1, '88270': 2, '88271': 3, '88272': 4, '88273': 5, '88274': 6, '88275': 7, '88276': 8, '88277': 9}. The code points here differ from those in the XML dump above, presumably because the two came from different font files.
  • Match this table against the values taken from the page, one by one.

Points to note

The page source returned by webdriver can be slightly off, so I process it with soup_text = bs4.BeautifulSoup(driver.page_source, 'lxml').text (import bs4).

The rest is a matter of small details. That is the basic approach.
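
For the first exploratory pass, the same mapping can also be read straight from the XML dump with the standard library, which makes it easy to check against what fontTools reports. This is only a sketch: `cmap_from_xml`, `decode_entities` and the trimmed sample XML are illustrative names and data, not part of the original code.

```python
import re
import xml.etree.ElementTree as ET

# Glyph names used by this font for the characters we care about.
GLYPH_TO_CHAR = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "hyphen": "-",
}

# Trimmed copy of the cmap dump shown earlier, used as sample input.
SAMPLE_CMAP_XML = """
<cmap>
  <cmap_format_12 platformID="3" platEncID="10">
    <map code="0x2d" name="hyphen"/>
    <map code="0x880fb" name="zero"/>
    <map code="0x880fc" name="one"/>
    <map code="0x880fd" name="two"/>
  </cmap_format_12>
</cmap>
"""


def cmap_from_xml(xml_text):
    """Map each code point (int) to the character its glyph draws."""
    table = {}
    for node in ET.fromstring(xml_text).iter("map"):
        char = GLYPH_TO_CHAR.get(node.get("name"))
        if char is not None:
            table[int(node.get("code"), 16)] = char
    return table


def decode_entities(text, table):
    """Replace &#x...; references found in the table; leave the rest alone."""
    def swap(match):
        return table.get(int(match.group(1), 16), match.group(0))
    return re.sub(r"&#x([0-9a-fA-F]+);", swap, text)


table = cmap_from_xml(SAMPLE_CMAP_XML)
print(decode_entities("&#x880fc;&#x880fb;&#x880fd;", table))
```

Unlike splitting on ";", the regular expression leaves surrounding text and unknown references untouched, so it can be run over a larger fragment of the page.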
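
One thing to watch for, as far as I know: BeautifulSoup converts character references into real characters while parsing, so the text obtained this way may contain the obfuscated digits as single characters rather than as &#x...; strings. A sketch for that case follows; the table is hard-coded from the code points in the cmap dump above purely for illustration, and a real run would build it from font.getBestCmap().

```python
import html

# Illustrative table: 0x880fb..0x88104 are "zero".."nine" in the dump above.
CODEPOINT_TO_CHAR = {0x880fb + i: str(i) for i in range(10)}
CODEPOINT_TO_CHAR[0x2d] = "-"


def decode_chars(text):
    """Translate obfuscated characters one by one; keep everything else."""
    return text.translate(CODEPOINT_TO_CHAR)


# html.unescape stands in here for what an HTML parser does to the references.
raw = html.unescape("&#x880fc;&#x880fe;-&#x880fb;")
print(decode_chars(raw))
```

str.translate accepts a dict keyed by code point, so no splitting or prefix stripping is needed in this form.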
