一个有意思的问题

Wednesday. November 29, 2017 - 1 min

最近在开发一个简单的 webide 时，发现了一个诡异的问题。

使用 fetch 更新数据，如果 header 的 content-type 设置成 application/x-www-form-urlencoded;, 发送类似 {srouce: "1+2+3"} 这样的数据，会被替换成 {source: "1 2 3"}。+号都被替换成了空格，后来又试了一下 -, ? 和 &。发现 & 也会被替换。

但是把 content-type 改成 application/json; 就不会存在这个问题了。

抱着求证的态度跟同事讨论了一下，发现了这样的一个描述

This is the default content type. Forms submitted with this content type must be encoded as follows:

Control names and values are escaped. Space characters are replaced by +', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by %HH’, a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as “CR LF” pairs (i.e., `%0D%0A’).

The control names/values are listed in the order they appear in the document. The name is separated from the value by =' and name/value pairs are separated from each other by &’.

看起来是和 url 的 encoding 有关，点进文档中的链接又继续看了一下，发现了如下的描述：

2.2. URL Character Encoding Issues

URLs are sequences of characters, i.e., letters, digits, and special characters. A URLs may be represented in a variety of ways: e.g., ink on paper, or a sequence of octets in a coded character set. The interpretation of a URL depends only on the identity of the characters used. In most URL schemes, the sequences of characters in different parts of a URL are used to represent sequences of octets used in Internet protocols. For example, in the ftp scheme, the host name, directory name and file names are such sequences of octets, represented by parts of the URL. Within those parts, an octet may be represented by

大概想了一下，应该是因为空格在请求中会被替换成+，所以如果数据中有+的话，会被当成空格，所以就直接被转成了空格。

针对这个问题，如果不想更改 header 的话，使用 encodeURIComponent 把字符串 encode 一下也是可以的。