JavaScript escape characters and JSON.parse errors: an investigation

Updated: 2022-10-26 14:31:02 Author: justjavac

This article looks into the JSON.parse errors caused by escape characters in JavaScript. It may be a useful reference if you have run into the same problem.

Converting a JSON string to a JavaScript object

JSON.parse converts a JSON string into a JavaScript object.

JSON.parse('{"hello":"\world"}')

The code above outputs:

{
  hello: "world"
}

The result is a JavaScript object, but look closely and you will see that "\world" has become "world".

Now run the following code:

JSON.parse('{"hello":"\\world"}')

This throws an exception:

VM376:1 Uncaught SyntaxError: Unexpected token w in JSON at position 11
at JSON.parse (<anonymous>)
at <anonymous>:1:6

Unexpected token w.

Curiosity is not satisfied yet, so let's try 3 backslashes:

JSON.parse('{"hello":"\\\world"}')

The result is:

VM16590:1 Uncaught SyntaxError: Unexpected token w in JSON at position 11
at JSON.parse (<anonymous>)
at <anonymous>:1:6

Next, 4 backslashes:

JSON.parse('{"hello":"\\\\world"}')

This time the result is normal:

{
  hello: "\world"
}

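Summarizing by the number of backslashes:

1 backslash: "world"
2 backslashes: Error
3 backslashes: Error
4 backslashes: "\world"
5 backslashes: "\world"
6 backslashes: Error
7 backslashes: Error
8 backslashes: "\\world"
...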
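
The pattern can be reproduced mechanically. The following is only a sketch and not part of the original experiment: parseWithBackslashes is a hypothetical helper that builds the source text of a string literal containing n backslashes, evaluates it with an indirect eval (standing in for typing the literal into the console), and passes the result to JSON.parse.

```javascript
// Hypothetical helper: build the literal '{"hello":"<n backslashes>world"}',
// let JavaScript unescape it, then let JSON.parse unescape the result.
function parseWithBackslashes(n) {
  const literalSource = "'" + '{"hello":"' + "\\".repeat(n) + 'world"}' + "'";
  const jsonText = (0, eval)(literalSource); // first layer: JavaScript string literal
  try {
    return JSON.parse(jsonText).hello; // second layer: JSON string
  } catch (error) {
    return "Error";
  }
}

const results = [];
for (let n = 1; n <= 8; n++) {
  results.push(parseWithBackslashes(n));
}
console.log(results);
```

The console may print backslashes inside the resulting strings in escaped form, so it is safer to compare string lengths than to count backslashes by eye.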

Let's change approach: drop JSON.parse and only output the JavaScript strings:

> 'hello'
"hello"
> '\hello'
"hello"
> '\\hello'
"\hello"
> '\\\hello'
"\hello"
> '\\\\hello'
"\\hello"

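That more or less pins down the problem: the JavaScript string literal is unescaped first, and JSON.parse only sees the result.

Apply the rule above to the earlier JSON.parse calls and every result is explained.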
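
To make the two layers visible, the sketch below counts the backslash characters that actually reach JSON.parse. The helper name countBackslashes is made up for this illustration; String.raw is the standard template tag that returns the text exactly as typed.

```javascript
// Count the backslash characters that are really present in a string value.
const countBackslashes = (text) => text.split("\\").length - 1;

const fromTwo = '{"hello":"\\world"}';    // 2 backslashes in the source
const fromFour = '{"hello":"\\\\world"}'; // 4 backslashes in the source

console.log(countBackslashes(fromTwo));  // 1: JSON.parse sees \w and rejects it
console.log(countBackslashes(fromFour)); // 2: JSON.parse sees \\ and accepts it

// String.raw skips the JavaScript layer, so what is typed is what JSON.parse gets.
const rawText = String.raw`{"hello":"\\world"}`;
console.log(rawText === fromFour);                        // true
console.log(countBackslashes(JSON.parse(rawText).hello)); // 1
```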

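Now look at how JSON parses a string (the string diagram from the JSON grammar):

Following that rule, let's parse "\hello". The first character is a backslash (\), so after the opening quote we take the bottom branch (the one marked with a red line):

The second character is h, but only 9 paths lead on from the backslash (", \, /, b, f, n, r, t, and u followed by 4 hexadecimal digits). h is not on any of them, so it is an illegal character.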
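
The nine paths can be written down as a small checker. This is a sketch for illustration only: firstIllegalEscape and LEGAL_JSON_ESCAPE are hypothetical names, and the function inspects only the escapes, not the rest of the JSON grammar.

```javascript
// The 9 continuations JSON allows after a backslash:
// " \ / b f n r t, or u followed by exactly 4 hexadecimal digits.
const LEGAL_JSON_ESCAPE = /^\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/;

// Hypothetical helper: index of the first illegal escape in the text of a
// JSON string, or -1 if every backslash starts a legal escape.
function firstIllegalEscape(jsonStringText) {
  for (let i = 0; i < jsonStringText.length; i++) {
    if (jsonStringText[i] !== "\\") continue;
    const match = LEGAL_JSON_ESCAPE.exec(jsonStringText.slice(i));
    if (match === null) return i;
    i += match[0].length - 1; // skip the rest of the escape
  }
  return -1;
}

console.log(firstIllegalEscape(String.raw`"\hello"`));  // 1
console.log(firstIllegalEscape(String.raw`"\\hello"`)); // -1
console.log(firstIllegalEscape(String.raw`"\u0041"`));  // -1
```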

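This is not unique to JSON. Many languages report an error along the lines of Error:(7, 27) Illegal escape: '\h'.

JavaScript, for whatever reason, accepts this illegal escape, and handles it bluntly: the backslash is simply ignored.

I could not find the relevant section in the ES specification, so let's look at how V8 parses it.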

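After reading JavaScript source code, the engine first performs lexical analysis. The file /src/parsing/scanner.cc reads the source and splits it into tokens (the latest version at the time of writing is 6.4.286).

Here is the key code in the Scanner::Scan() function: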

case '"':
case '\'':
  token = ScanString();
  break;

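This is part of a long switch statement: when the scanner meets a double quote (") or a single quote ('), it calls ScanString().

A quick note: the code above is C++, where single quotes delimit a character and double quotes delimit a string. So in a character literal a double quote needs no escaping while a single quote does, and in a string literal it is the other way round. This C++ escaping is not the escaping we are studying here.

The ScanString() function

In ScanString() we again look only at the key code: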

while (c0_ != quote && c0_ != kEndOfInput && !IsLineTerminator(c0_)) {
  uc32 c = c0_;
  Advance();
  if (c == '\\') {
    if (c0_ == kEndOfInput || !ScanEscape<false, false>()) {
      return Token::ILLEGAL;
    }
  } else {
    AddLiteralChar(c);
  }
}
if (c0_ != quote) return Token::ILLEGAL;
literal.Complete();

If the input ends right after the backslash, or ScanEscape() reports that the escape is invalid, the scanner returns Token::ILLEGAL. So does ScanEscape return false in our case?

template <bool capture_raw, bool in_template_literal>
bool Scanner::ScanEscape() {
  uc32 c = c0_;
  Advance<capture_raw>();
  // Skip escaped newlines.
  if (!in_template_literal && c0_ != kEndOfInput && IsLineTerminator(c)) {
    // Allow escaped CR+LF newlines in multiline string literals.
    if (IsCarriageReturn(c) && IsLineFeed(c0_)) Advance<capture_raw>();
    return true;
  }
  switch (c) {
    case '\'':  // fall through
    case '"' :  // fall through
    case '\\': break;
    case 'b' : c = '\b'; break;
    case 'f' : c = '\f'; break;
    case 'n' : c = '\n'; break;
    case 'r' : c = '\r'; break;
    case 't' : c = '\t'; break;
    case 'u' : {
      c = ScanUnicodeEscape<capture_raw>();
      if (c < 0) return false;
      break;
    }
    case 'v':
      c = '\v';
      break;
    case 'x': {
      c = ScanHexNumber<capture_raw>(2);
      if (c < 0) return false;
      break;
    }
    case '0':  // Fall through.
    case '1':  // fall through
    case '2':  // fall through
    case '3':  // fall through
    case '4':  // fall through
    case '5':  // fall through
    case '6':  // fall through
    case '7':
      c = ScanOctalEscape<capture_raw>(c, 2);
      break;
  }
  // Other escaped characters are interpreted as their non-escaped version.
  AddLiteralChar(c);
  return true;
}

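This function returns false in only 2 places.

1. The character after the backslash is u, and what follows u is not a valid Unicode escape: return false.

2. The character after the backslash is x, and what follows x is not hexadecimal digits: return false.

In other words, '\u', '\uhello', '\u1', '\x' and '\xx' all throw: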

Uncaught SyntaxError: Invalid Unicode escape sequence

or

Uncaught SyntaxError: Invalid hexadecimal escape sequence
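
These cases can be checked without crashing the script by evaluating the literal's source text at run time. The sketch below uses a hypothetical helper, literalErrorName, and reports only the error name, because the message wording differs between engines.

```javascript
// Hypothetical helper: evaluate the source text of a single-quoted literal and
// report the error name, or null if the literal is accepted.
function literalErrorName(literalBody) {
  try {
    (0, eval)("'" + literalBody + "'");
    return null;
  } catch (error) {
    return error.name;
  }
}

for (const body of ["\\u", "\\uhello", "\\u1", "\\x", "\\xx"]) {
  console.log(body, literalErrorName(body)); // each reports SyntaxError
}
console.log(literalErrorName("\\h")); // null: \h is accepted and read as h
```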

Every other character, including those that are not escape characters at all, falls through to the code that follows:

AddLiteralChar(c);
return true;

The comment just above it says as much:

Other escaped characters are interpreted as their non-escaped version.
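
The same decision logic can be sketched in JavaScript. This is a simplified illustration, not a port of V8's code: decodeLiteralBody and SINGLE_ESCAPES are hypothetical names, the error messages are copied from the ones shown earlier, and octal escapes, \u{...} and escaped line terminators are deliberately left out.

```javascript
// Single-character escapes that map to a different character.
const SINGLE_ESCAPES = new Map([
  ["b", "\b"], ["f", "\f"], ["n", "\n"],
  ["r", "\r"], ["t", "\t"], ["v", "\v"],
]);

// Hypothetical, simplified decoder for the body of a string literal.
// Not covered: octal escapes, \u{...}, and escaped line terminators.
function decodeLiteralBody(body) {
  let result = "";
  for (let i = 0; i < body.length; i++) {
    if (body[i] !== "\\") {
      result += body[i];
      continue;
    }
    i++; // move to the character after the backslash
    if (i >= body.length) throw new SyntaxError("Invalid or unexpected token");
    const c = body[i];
    if (c === "x" || c === "u") {
      const width = c === "x" ? 2 : 4;
      const digits = body.slice(i + 1, i + 1 + width);
      if (digits.length !== width || !/^[0-9a-fA-F]+$/.test(digits)) {
        throw new SyntaxError(
          c === "x" ? "Invalid hexadecimal escape sequence"
                    : "Invalid Unicode escape sequence"
        );
      }
      result += String.fromCharCode(parseInt(digits, 16));
      i += width;
    } else if (SINGLE_ESCAPES.has(c)) {
      result += SINGLE_ESCAPES.get(c);
    } else {
      // Any other escaped character is taken as the character itself.
      result += c;
    }
  }
  return result;
}

console.log(decodeLiteralBody(String.raw`\hello`));  // hello
console.log(decodeLiteralBody(String.raw`\\hello`)); // \hello
```

The final else branch is the whole story of this article: there is no error path for an unknown escape, so the backslash silently disappears.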

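In short, the root cause is that JavaScript and JSON handle escape characters differently, which leads to bugs that are hard to spot. When JSON meets a character that cannot be escaped it throws an exception, whereas JavaScript interprets it as the corresponding non-escaped character.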
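
One way to sidestep the mismatch, offered here as a suggestion rather than something the analysis above prescribes, is to let JSON.stringify produce the JSON text instead of writing it by hand inside a string literal. A minimal sketch:

```javascript
const value = { hello: "\\world" }; // the value holds one backslash

// JSON.stringify doubles the backslash, producing valid JSON text.
const jsonText = JSON.stringify(value);
console.log(jsonText === String.raw`{"hello":"\\world"}`); // true

// Parsing that text gives back the original value.
console.log(JSON.parse(jsonText).hello === value.hello); // true
```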
