Converting text to HTML in D

Question

I'm trying to figure out the best way of encoding text (either 8-bit ubyte[] or string) to its HTML counterpart.

My proposal so far is to use a lookup table to map the 8-bit characters

string[256] lutLatin1ToHTML;
lutLatin1ToHTML[0x22] = "&quot;";
lutLatin1ToHTML[0x26] = "&amp;";
...

that have special meaning in HTML, using the function

pure string toHTML(in string src,
                   ref in string[256] lut) {
    return src.map!(a => (lut[a] ? lut[a] : new string(a))).reduce!((a, b) => a ~ b) ;
}

It almost works, except that I don't know how to create a string from a `ubyte` (the no-translation case).

I tried

writeln(new string('a'));

but it prints garbage and I don't know why.

For more details on HTML encoding see https://en.wikipedia.org/wiki/Character_entity_reference

Answer 1

You can make a string from a ubyte most easily by doing "" ~ b, for example:

ubyte b = 65;
string a = "" ~ b;
writeln(a); // prints A
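
Applied to the lookup table from the question, the idea might look roughly like the sketch below. The helper name makeLatin1ToHTML, the &lt; and &gt; entries, and the plain foreach loop (instead of map/reduce) are assumptions made for illustration, not part of the original code. Untranslated bytes are appended as char, which is only safe for ASCII input; Latin-1 bytes above 0x7F would still need transcoding to UTF-8.

```d
import std.stdio : writeln;

// Hypothetical helper: builds a 256-entry table like the one in the question.
string[256] makeLatin1ToHTML()
{
    string[256] lut;            // every entry starts out as null (no translation)
    lut[0x22] = "&quot;";
    lut[0x26] = "&amp;";
    lut[0x3C] = "&lt;";
    lut[0x3E] = "&gt;";
    return lut;
}

// Replaces bytes that have a table entry; copies all other bytes unchanged.
string toHTML(const(ubyte)[] src, ref const string[256] lut)
{
    string result;
    foreach (b; src)
    {
        if (lut[b] !is null)
            result ~= lut[b];        // special character: append its entity
        else
            result ~= cast(char) b;  // no translation: append the byte as a char (ASCII only)
    }
    return result;
}

void main()
{
    auto lut = makeLatin1ToHTML();
    auto input = cast(const(ubyte)[]) "a < b & \"c\"";
    writeln(toHTML(input, lut)); // prints a &lt; b &amp; &quot;c&quot;
}
```

Appending to a single result string also avoids allocating a temporary one-character string for every untranslated byte.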

BTW, if you want to do a lot of HTML stuff, my dom.d and characterencodings.d might be useful: https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff

It has an HTML parser, DOM manipulation functions similar to JavaScript (e.g. ele.querySelector(), getElementById, ele.innerHTML, ele.innerText, etc.), conversion from a few different character encodings, including latin1, and it outputs ASCII-safe HTML with all special and Unicode characters properly encoded.

assert(htmlEntitiesEncode("foo < bar") == "foo &lt; bar");

Stuff like that.

Answer 2

In this case Adam's solution works just fine, of course. (It takes advantage of the fact that ubyte is implicitly convertible to char, which is then appended to the immutable(char)[] array for which string is an alias.)
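
Implicit conversion most likely also explains the garbage printed by new string('a') in the question. As far as I can tell, the argument to new string(...) is a length, so 'a' is converted to the integer 97 and a 97-element string is allocated, with every element set to char.init (0xFF, which is not valid UTF-8). The small check below is a sketch of that reading, not something stated in the original posts.

```d
void main()
{
    // 'a' is implicitly converted to its code unit value 97 and used as the length
    auto s = new string('a');
    assert(s.length == 97);

    // every element holds the default char value, 0xFF, an invalid UTF-8 code unit
    assert(char.init == 0xFF);
    foreach (c; s)
        assert(c == char.init);
}
```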

In general, the safe way of converting types is to use std.conv.

import std.stdio, std.conv;

void main() {
    // utf-8
    char cc = 'a';
    string s1 = text(cc);
    string s2 = to!string(cc);
    writefln("%c %s %s", cc, s1, s2);

    // utf-16
    wchar wc = 'a';
    wstring s3 = wtext(wc);
    wstring s4 = to!wstring(wc);
    writefln("%c %s %s", wc, s3, s4);    

    // utf-32
    dchar dc = 'a';
    dstring s5 = dtext(dc);
    dstring s6 = to!dstring(dc); 
    writefln("%c %s %s", dc, s5, s6);

    ubyte b = 65;
    string a = to!string(b); // note: a ubyte is formatted as a number, so a == "65", not "A"
}

NB. text() is actually intended for processing multiple arguments, but is conveniently short.
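
One caveat for the ubyte case: std.conv treats a ubyte as a number, so to get the character rather than its decimal representation, the value has to be cast to char first. A minimal sketch of the difference, assuming the byte holds an ASCII code:

```d
import std.conv : to;
import std.stdio : writeln;

void main()
{
    ubyte b = 65;

    // to!string formats a ubyte as a decimal number
    string asNumber = to!string(b);
    assert(asNumber == "65");

    // casting to char first makes std.conv treat the byte as a character
    string asChar = to!string(cast(char) b);
    assert(asChar == "A");

    writeln(asNumber, " ", asChar); // prints 65 A
}
```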

Answer 1
Score 2, accepted, 2013-09-23 21:41:15

Answer 2
Score 1, 2013-09-24 14:53:19
