How to make a single-quoted string act like a double-quoted string in Ruby?

I have a file that contains HTML code, and the HTML tags are encoded like this:

\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e

The decoded HTML should be:

<div data-name="region-name" class="main-id">UK</div>

In Ruby, I used the cgi library's unescapeHTML, but it does not work: when it reads the content, it does not recognize the encoded tags. Here is another example:

require 'cgi'

single_quoted_string = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'
double_quoted_string = "\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e"


puts 'unescape single_quoted_string ' + CGI.unescapeHTML(single_quoted_string)
puts 'unescape double_quoted_string ' + CGI.unescapeHTML(double_quoted_string)

The output of the previous code is:

unescape single_quoted_string \x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e
unescape double_quoted_string <div data-name="region-name" class="main-id">UK</div>

My question is: how can I make single_quoted_string act as if its content were double-quoted, so that the function understands the encoded tags?

Thanks

Your problem has nothing to do with HTML: \x3c is a hex escape for the character whose code in the ASCII table is 0x3C. Double-quoted strings look for these patterns and convert them to the corresponding character; single-quoted strings keep them as literal text.
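A quick check you could run in irb to see the hex escape at work (the variable names are just for illustration):

```ruby
# In a double-quoted literal, \x3c is a hex escape for a single character.
escaped = "\x3c"
# In a single-quoted literal, the same text is not interpreted.
literal = '\x3c'

puts escaped == "<"          # true
puts escaped.ord             # 60, i.e. 0x3C
puts escaped.ord.to_s(16)    # 3c
puts literal == "<"          # false
```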

You can check for yourself that CGI is not doing anything:

CGI.unescapeHTML(double_quoted_string) == double_quoted_string
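A self-contained version of that check, reusing the two strings from the question. As far as I can tell, unescapeHTML only decodes &...; character references, so it leaves both strings unchanged:

```ruby
require 'cgi'

single_quoted_string = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'
double_quoted_string = "\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e"

# Neither string contains an &...; reference, so nothing gets decoded.
double_unchanged = CGI.unescapeHTML(double_quoted_string) == double_quoted_string
single_unchanged = CGI.unescapeHTML(single_quoted_string) == single_quoted_string

puts double_unchanged # true
puts single_unchanged # true
```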

The easiest way I know to solve your problem is with gsub:

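# Replace every literal \xNN escape with the character it encodes.
def convert(str)
  str.gsub(/\\x(\h\h)/) do
    # Pack the two captured hex digits into a single character.
    [Regexp.last_match(1)].pack("H*")
  end
end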

single_quoted_string = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'

puts convert(single_quoted_string)

What convert does is find every \xNN escape and pack its pair of hex digits into the corresponding character.
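To see the packing step on its own: Array#pack with the "H*" directive reads a string of hex digits (high nibble first) and turns each pair into one byte. Note that the result is a binary (ASCII-8BIT) string; for plain ASCII characters like these it still compares equal to the ordinary literal.

```ruby
# "3c" becomes the single byte 0x3C, i.e. "<".
lt = ["3c"].pack("H*")
# Multiple pairs produce multiple characters.
both = ["3c3e"].pack("H*")

puts lt          # <
puts both        # <>
puts lt == "<"   # true
puts lt.encoding # ASCII-8BIT
```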

Ruby's parser allows certain escape sequences in string literals.

The double-quoted string literal "\x3c" is recognized as containing a hexadecimal escape \xnn, which represents the single character < (0x3C in ASCII).

The single-quoted string literal '\x3c', however, is treated literally, i.e. it represents four characters: \, x, 3, and c.
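Listing the characters of each literal makes the difference visible (a small irb-style sketch):

```ruby
double_chars = "\x3c".chars
single_chars = '\x3c'.chars

p double_chars # ["<"]
p single_chars # ["\\", "x", "3", "c"]  (the first element is one backslash)
```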

how can I make the single_quoted_string act as if its content is double-quoted

You can't. To turn these four characters into <, you have to parse the string yourself:

str = '\x3c'

str[2, 2]         #=> "3c"  take hex part
str[2, 2].hex     #=> 60    convert to number
str[2, 2].hex.chr #=> "<"   convert to character

You can apply this with gsub:

str = '\x3cdiv data-name\x3d\x22region-name\x22 class\x3d\x22main-id\x22\x3eUK\x3c/div\x3e'

str.gsub(/\\x\h{2}/) { |m| m[2, 2].hex.chr }
#=> "<div data-name=\"region-name\" class=\"main-id\">UK</div>"

/\\x\h{2}/ matches a literal backslash (\\) followed by x and two ({2}) hex characters (\h).

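Because \h only matches hex digits (0-9, a-f, A-F), text that merely looks like an escape is left alone. A short sketch with a made-up input string:

```ruby
pattern = /\\x\h{2}/
input = '\x3c \xzz \x3E' # \xzz is not a valid hex escape

matches = input.scan(pattern)
decoded = input.gsub(pattern) { |m| m[2, 2].hex.chr }

p matches # ["\\x3c", "\\x3E"]
p decoded # "< \\xzz >"
```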
Just for reference, an HTML-escaped string (as produced by CGI.escapeHTML) would look like this:

str = "<div data-name=\"region-name\" class=\"main-id\">UK</div>"

CGI.escapeHTML(str)
#=> "&lt;div data-name=&quot;region-name&quot; class=&quot;main-id&quot;&gt;UK&lt;/div&gt;"

It uses &...; style character references.
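Those &...; references are exactly what CGI.unescapeHTML is meant to decode, so escaping and then unescaping should give back the original string. A quick round-trip check:

```ruby
require 'cgi'

str = "<div data-name=\"region-name\" class=\"main-id\">UK</div>"
escaped = CGI.escapeHTML(str)

# unescapeHTML turns &lt;, &gt;, &quot;, etc. back into characters.
round_trip = CGI.unescapeHTML(escaped)

puts round_trip == str                # true
puts CGI.unescapeHTML("&lt;div&gt;")  # <div>
```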
