

Windows tool to decode HTML entities in a file

Is there a command line/batch script tool for Windows that can be used to decode HTML entities like &nbsp; (non-breaking space), &weierp; (℘), and &permil; (‰) to readable UTF-8 text?

I found this web tool (https://mothereff.in/html-entities), which uses JavaScript and can do exactly this, but I need it done from a Windows batch file. I know of the amazing JREPL.bat utility, which incorporates JavaScript into the Windows command shell to make regex replacements in files. I just can't find a similar tool for HTML entity conversion.

Edit: To the bright coders out there, I hope you can write a batch tool that performs HTML entity decoding/encoding, to help me and future readers looking for the same solution. Here are GitHub pages I think could be of use:
https://github.com/mathiasbynens/he
https://github.com/mathiasbynens/mothereff.in/tree/master/html-entities

You don't need extensive applications (like JREPL.bat or my own FindRepl.bat) or complicated programs to perform a replacement as simple as this one. The small batch file below is an example that replaces 3 HTML entities:

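@set @a=0 // & cscript //nologo //E:JScript "%~F0" < input.txt & goto :EOF

// The first line is read differently by each interpreter: cmd.exe sets a dummy
// variable, runs this same file through cscript with input.txt as stdin, and exits;
// JScript sees a conditional-compilation statement followed by a comment.

// Map each HTML entity to the character it stands for
var rep = {};
rep["&#xA9;"]   = "\u00A9";
rep["&#xD306;"] = "\uD306";
rep["&#x2603;"] = "\u2603";

// CreateTextFile(name, overwrite, unicode): unicode=true writes UTF-16LE, not UTF-8
var f = new ActiveXObject("Scripting.FileSystemObject").CreateTextFile("output.txt", true, true);
f.Write(WScript.StdIn.ReadAll().replace(/&#xA9;|&#xD306;|&#x2603;/g, function (A) { return rep[A]; }));
f.Close();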

input.txt:

Foo &#xA9; bar &#xD306; baz &#x2603; qux

output.txt:

Foo © bar 팆 baz ☃ qux

To convert more entities, just add one entry to rep (and one alternative to the regex) for each additional entity you want to convert...
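If you need more than a handful of entities, a table-free variant may be easier to maintain. Below is a minimal sketch, not part of the original answer, assuming you only need numeric character references (decimal like &#169; or hex like &#xA9;) plus a few named entities; decodeEntities, codePointToString and NAMED are hypothetical names. It sticks to ES3 so the same code should also work inside the cscript hybrid, but it does not implement the full HTML5 named-entity list or its edge cases, which is what he.js is for.

```javascript
// Hypothetical helper (not from the original answer): decodes numeric
// character references (&#169; / &#xA9;) and a few named entities.
// Written in ES3 style so it should also run under cscript's JScript engine.
var NAMED = {
  amp: "&", lt: "<", gt: ">", quot: "\"", apos: "'",
  nbsp: "\u00A0", weierp: "\u2118", permil: "\u2030"
};

// Turn a code point into a string, splitting astral code points into a
// UTF-16 surrogate pair (JScript has no String.fromCodePoint).
function codePointToString(cp) {
  if (cp <= 0xFFFF) return String.fromCharCode(cp);
  cp -= 0x10000;
  return String.fromCharCode(0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
}

function decodeEntities(text) {
  return text.replace(/&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g,
    function (match, body) {
      if (body.charAt(0) === "#") {
        var isHex = body.charAt(1) === "x" || body.charAt(1) === "X";
        var cp = isHex ? parseInt(body.substring(2), 16) : parseInt(body.substring(1), 10);
        // Leave zero, out-of-range and surrogate code points untouched
        if (cp === 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return match;
        return codePointToString(cp);
      }
      // hasOwnProperty guards against inherited names such as "constructor"
      return NAMED.hasOwnProperty(body) ? NAMED[body] : match;
    });
}
```

In the hybrid above, you could then drop the rep table and replace the f.Write line with f.Write(decodeEntities(WScript.StdIn.ReadAll()));. Because the replacement runs in a single pass, "&amp;lt;" becomes "&lt;" rather than "<".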

It is trivial to incorporate JScript into a batch file, so you could easily write your own custom hybrid JScript/batch script that includes the he.js library found at https://github.com/mathiasbynens/he.

But it is even simpler to use the JREPL.BAT tool that you already mentioned. You can use the /JLIB option to load the he.js code, which makes all of the he (HTML entities) functionality accessible to JREPL.

Here is a trivial example that decodes test.txt, overwriting the original file:

jrepl "^.*" "he.decode($0)" /jlib "he.js" /f test.txt /o -

This isn't the most efficient way to do it, but it is probably plenty fast, and it is certainly convenient.

Here is another example that encodes every character in test.txt (including newlines), writing the result to out.txt:

jrepl "^[\s\S]*" "he.encode($0,{encodeEverything:true})" /m /j /jlib he\he.js /f test.txt /o out.txt

You should study the documentation for both he and JREPL to discover all the possibilities.

The regex portion of the examples might seem to be more of a hindrance than a help. But it is easy to envision how it could be useful for selectively encoding only portions of your input text. Or you could use the JREPL /T option to apply different encoding options to different sections of text.
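The idea of letting a regex decide which text gets encoded can be illustrated without JREPL or he.js. The following minimal sketch (encodeNonAscii is a hypothetical name, written in ES3 style so it should also run under cscript) encodes only characters outside ASCII as hexadecimal references; astral characters are first recombined from their UTF-16 surrogate pair into a single code point.

```javascript
// Hypothetical helper: encode only characters outside ASCII as hexadecimal
// references, leaving every ASCII character untouched.
// ES3 style so it should run under cscript (JScript) as well as Node.
function encodeNonAscii(text) {
  // Match a surrogate pair (astral character) first, else any non-ASCII code unit
  return text.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]|[^\x00-\x7F]/g, function (ch) {
    var cp = ch.length === 2
      ? (ch.charCodeAt(0) - 0xD800) * 0x400 + (ch.charCodeAt(1) - 0xDC00) + 0x10000
      : ch.charCodeAt(0);
    return "&#x" + cp.toString(16).toUpperCase() + ";";
  });
}
```

Applied to the decoded sample text "Foo © bar 팆 baz ☃ qux" from the first answer, it should reproduce input.txt. Because the regex skips all ASCII, characters such as & and < are left as they are, so this is not a substitute for he.encode; it only shows how the search pattern limits which text is touched.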
