
Converting Text to HTML in D

I'm trying to figure out the best way of encoding text (either 8-bit ubyte[] or string) to its HTML counterpart.

My proposal so far is to use a lookup table to map the 8-bit characters

string[256] lutLatin1ToHTML;
lutLatin1ToHTML[0x22] = "&quot;"; // '"'
lutLatin1ToHTML[0x26] = "&amp;";  // '&'
...

that have special meaning in HTML, using the function

pure string toHTML(in string src,
                   ref in string[256] lut) {
    return src.map!(a => (lut[a] ? lut[a] : new string(a))).reduce!((a, b) => a ~ b) ;
}

It almost works, except that I don't know how to create a string from a ubyte (the no-translation case).

I tried

writeln(new string('a'));

but it prints garbage and I don't know why.

For more details on HTML encoding see https://en.wikipedia.org/wiki/Character_entity_reference

You can make a string from a ubyte most easily by doing "" ~ b, for example:

ubyte b = 65;
string a = "" ~ b;
writeln(a); // prints A
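As for why new string('a') prints garbage: as far as I can tell, string is just an alias for immutable(char)[], so new string('a') is read as an array allocation whose length is the numeric value of 'a' (97), not as "a string containing a". The elements are left at char.init, which is 0xFF and not valid UTF-8 by itself. A small sketch to check that reading:

```d
import std.stdio : writeln;

void main()
{
    // 'a' is implicitly converted to its numeric value (97)
    // and used as the LENGTH of the newly allocated array.
    auto s = new string('a');

    writeln(s.length);          // the array length, not the content
    writeln(cast(ubyte) s[0]);  // numeric value of the first element

    assert(s.length == 97);
    assert(s[0] == char.init);  // every element is the default char value
}
```

So the "garbage" is 97 default-initialized chars being written to the terminal.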

BTW, if you want to do a lot of HTML stuff, my dom.d and characterencodings.d might be useful: https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff

It has an HTML parser, DOM manipulation functions similar to JavaScript (e.g. ele.querySelector(), getElementById, ele.innerHTML, ele.innerText, etc.), conversion from a few different character encodings, including latin1, and it outputs ASCII-safe HTML with all special and Unicode characters properly encoded.

assert(htmlEntitiesEncode("foo < bar") == "foo &lt; bar");

stuff like that.

In this case Adam's solution works just fine, of course. (It takes advantage of the fact that ubyte is implicitly convertible to char, which is then appended to the immutable(char)[] array for which string is an alias.)
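Putting that together with the lookup table from the question, a minimal sketch could look like the following. The helper name makeHtmlLut and the choice of four entities are my own assumptions, not something from the question, and I iterate with foreach over chars instead of map so that the string is walked code unit by code unit (map on a string decodes to dchar, which could index past the 256-entry table).

```d
import std.stdio : writeln;

// Hypothetical helper: fills the table for the characters that are
// special in HTML. All other entries stay null (the default for string).
string[256] makeHtmlLut()
{
    string[256] lut;
    lut['"'] = "&quot;";
    lut['&'] = "&amp;";
    lut['<'] = "&lt;";
    lut['>'] = "&gt;";
    return lut;
}

// Sketch of the question's toHTML: translate a char if the table has
// an entry for it, otherwise append the char unchanged.
string toHTML(const(char)[] src, const ref string[256] lut)
{
    string result;
    foreach (char c; src)       // code units, no UTF decoding
    {
        if (lut[c].length)
            result ~= lut[c];   // translated case
        else
            result ~= c;        // no-translation case: append the char itself
    }
    return result;
}

void main()
{
    auto lut = makeHtmlLut();
    writeln(toHTML(`a < b & "c"`, lut));
    assert(toHTML(`a < b & "c"`, lut) == "a &lt; b &amp; &quot;c&quot;");
}
```

Bytes of multi-byte UTF-8 sequences have no table entry, so they pass through untouched. Latin-1 input held in a ubyte[] would still need a separate conversion step, which this sketch does not attempt.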

In general the safe way of converting types is to use std.conv.

import std.stdio, std.conv;

void main() {
    // utf-8
    char cc = 'a';
    string s1 = text(cc);
    string s2 = to!string(cc);
    writefln("%c %s %s", cc, s1, s2);

    // utf-16
    wchar wc = 'a';
    wstring s3 = wtext(wc);
    wstring s4 = to!wstring(wc);
    writefln("%c %s %s", wc, s3, s4);    

    // utf-32
    dchar dc = 'a';
    dstring s5 = dtext(dc);
    dstring s6 = to!dstring(dc); 
    writefln("%c %s %s", dc, s5, s6);

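    ubyte b = 65;
    string a = to!string(b);            // "65": to!string formats the number
    string c = to!string(cast(char) b); // "A": cast to char first to get the character
}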

NB: text() is actually intended for processing multiple arguments, but is conveniently short.
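For completeness, here is a small sketch of the multi-argument use of text(); the values are only illustrative:

```d
import std.conv : text;
import std.stdio : writeln;

void main()
{
    // text() converts each argument to its textual form and
    // concatenates the results into a single string.
    string msg = text("offset ", 0x26, " maps to ", "&amp;");
    writeln(msg);
    assert(msg == "offset 38 maps to &amp;");
}
```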
