
How to replace sentences with an array of word positions

I have a sentence like this:

ジェーンは先週日本に来て、毎日4時間日本語のクラスで勉強しています

And token data like this:

[
  {"token":"ジェーン","type":"word","start_offset":0,"end_offset":4,"position":0},
  {"token":"は","type":"word","start_offset":4,"end_offset":5,"position":1},
  {"token":"先週","type":"word","start_offset":5,"end_offset":7,"position":2},
  {"token":"日本","type":"word","start_offset":7,"end_offset":9,"position":3},
  {"token":"に","type":"word","start_offset":9,"end_offset":10,"position":4},
  {"token":"来","type":"word","start_offset":10,"end_offset":11,"position":5},
  {"token":"て","type":"word","start_offset":11,"end_offset":12,"position":6},
  {"token":"毎日","type":"word","start_offset":13,"end_offset":15,"position":7},
  {"token":"4","type":"word","start_offset":15,"end_offset":16,"position":8},
  {"token":"時間","type":"word","start_offset":16,"end_offset":18,"position":9},
  {"token":"日本語","type":"word","start_offset":18,"end_offset":21,"position":10},
  {"token":"の","type":"word","start_offset":21,"end_offset":22,"position":11},
  {"token":"クラス","type":"word","start_offset":22,"end_offset":25,"position":12},
  {"token":"で","type":"word","start_offset":25,"end_offset":26,"position":13},
  {"token":"勉強","type":"word","start_offset":26,"end_offset":28,"position":14},
  {"token":"し","type":"word","start_offset":28,"end_offset":29,"position":15},
  {"token":"て","type":"word","start_offset":29,"end_offset":30,"position":16},
  {"token":"い","type":"word","start_offset":30,"end_offset":31,"position":17}
]

How can I wrap each token in the sentence using its start_offset and end_offset, like this:

<span>ジェーン</span><span>は</span><span>先週</span>... 

I tried using StringBuilder to insert the tags at each position, but every insertion shifts the indexes of the words after it, so from token 2 onward the result is wrong.
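The forward StringBuilder approach from the question can also be made to work by keeping a running count of the characters already inserted and adding it to every later offset. A minimal sketch, written as a C# 9+ top-level program and assuming the tokens are sorted by start_offset and do not overlap; the (start, end) tuples stand in for the parsed token objects, and `WrapForward` is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: wraps each (start, end) range in <span> tags, inserting
// left to right and adding the length of every tag already inserted to later offsets.
// Assumes the ranges are sorted by start offset and do not overlap.
static string WrapForward(string sentence, IReadOnlyList<(int Start, int End)> tokens)
{
    const string Open = "<span>";
    const string Close = "</span>";
    var sb = new StringBuilder(sentence);
    int shift = 0; // characters inserted so far, all of them before the current token

    foreach (var (start, end) in tokens)
    {
        sb.Insert(start + shift, Open);
        shift += Open.Length;
        sb.Insert(end + shift, Close);
        shift += Close.Length;
    }
    return sb.ToString();
}

// First three tokens of the question's data, as (start_offset, end_offset).
var firstTokens = new List<(int Start, int End)> { (0, 4), (4, 5), (5, 7) };
System.Console.WriteLine(WrapForward("ジェーンは先週日本に来て", firstTokens));
// prints <span>ジェーン</span><span>は</span><span>先週</span>日本に来て
```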

Inserting new elements shifts the position of everything after them. Therefore, start from the end of the string and work backwards. That way you don't have to recalculate any positions, because the only positions affected are the ones you've already dealt with.

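// Work from the last token to the first, so each insertion only shifts
// text that has already been wrapped. Requires using System.Linq;
string result = sentence;

foreach (var token in dataTokens.OrderByDescending(x => x.position))
{
    result = result.Insert(token.end_offset, "</span>");  // close first: it lies to the right of start_offset
    result = result.Insert(token.start_offset, "<span>");
}

return result;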

Testing this out yields the following string:

 <span>ジェーン</span><span>は</span><span>先週</span><span>日本</span><span>に</span><span>来</span><span>て</span>、<span>毎日</span><span>4</span><span>時間</span><span>日本語</span><span>の</span><span>クラス</span><span>で</span><span>勉強</span><span>し</span><span>て</span><span>い</span>ます
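For reference, a self-contained version of the backwards approach, runnable as a C# 9+ top-level program. The (start, end) tuples are copied from the question's token data; `WrapBackward` is a hypothetical name, and it orders by start offset instead of `position`, which gives the same order for this data:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper mirroring the answer above: insert tags from the last
// token to the first, so each insertion only shifts text that was already handled.
static string WrapBackward(string sentence, IEnumerable<(int Start, int End)> tokens)
{
    string result = sentence;
    foreach (var (start, end) in tokens.OrderByDescending(t => t.Start))
    {
        result = result.Insert(end, "</span>"); // closing tag first: it lies further right
        result = result.Insert(start, "<span>");
    }
    return result;
}

const string Sentence = "ジェーンは先週日本に来て、毎日4時間日本語のクラスで勉強しています";

// (start_offset, end_offset) pairs copied from the question's token data.
var offsets = new List<(int Start, int End)>
{
    (0, 4), (4, 5), (5, 7), (7, 9), (9, 10), (10, 11), (11, 12), (13, 15), (15, 16),
    (16, 18), (18, 21), (21, 22), (22, 25), (25, 26), (26, 28), (28, 29), (29, 30), (30, 31)
};

System.Console.WriteLine(WrapBackward(Sentence, offsets));
```

`WrapBackward(Sentence, offsets)` returns the string shown above; the "、" and the trailing "ます" stay unwrapped because no token covers them.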

I would suggest creating a new string and then doing something like this (not exact code):

string s = string.Empty;
foreach (var token in dataTokens)
{
    s += "<span>" + token.token + "</span>"; // token.token: the word text of each token
}
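Building a new string, as suggested here, avoids the shifting-offset problem, but looping over the token text alone drops characters that belong to no token (the "、" after て and the trailing "ます"). A minimal sketch that keeps them by copying the gaps between offsets in one left-to-right pass, written as a C# 9+ top-level program; it assumes tokens sorted by start_offset with no overlaps, and `WrapInOnePass` is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: builds a new string in one left-to-right pass. Text between
// tokens (such as "、" or a trailing "ます") is copied through unchanged.
// Assumes tokens are sorted by start offset and do not overlap.
static string WrapInOnePass(string sentence, IEnumerable<(int Start, int End)> tokens)
{
    var sb = new StringBuilder(sentence.Length * 2);
    int cursor = 0; // index of the first character not yet copied

    foreach (var (start, end) in tokens)
    {
        sb.Append(sentence, cursor, start - cursor); // untokenized gap, possibly empty
        sb.Append("<span>").Append(sentence, start, end - start).Append("</span>");
        cursor = end;
    }
    sb.Append(sentence, cursor, sentence.Length - cursor); // remainder after the last token
    return sb.ToString();
}

var demo = new List<(int Start, int End)> { (11, 12), (13, 15) };
System.Console.WriteLine(WrapInOnePass("ジェーンは先週日本に来て、毎日", demo));
// prints ジェーンは先週日本に来<span>て</span>、<span>毎日</span>
```

Unlike repeated `string.Insert` calls, this copies each character once, so it stays linear in the length of the sentence.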

UPD: After the comments, I tried to simulate your scenario with a token class like this:

class Token
{
    // Fields mirror the JSON keys in the question's token data.
    private int start_offset;
    private int end_offset;
    private int position;
    private string type;
    private string token;

    public Token(int so, int eo, int pos, string type, string token)
    {
        start_offset = so;
        end_offset = eo;
        position = pos;
        this.type = type;
        this.token = token;
    }

    // Exposes the word text of the token.
    public string TokenProp
    {
        get { return token; }
    }
}
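Instead of constructing tokens by hand, the JSON array from the question can be parsed directly. A minimal sketch using `System.Text.Json.JsonDocument` (available since .NET Core 3.0), written as a C# 9+ top-level program and assuming only the `token`, `start_offset` and `end_offset` fields are needed; `ParseTokens` is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: reads the token array from the question into
// (Text, Start, End) tuples using System.Text.Json's JsonDocument.
static List<(string Text, int Start, int End)> ParseTokens(string json)
{
    var result = new List<(string Text, int Start, int End)>();
    using JsonDocument doc = JsonDocument.Parse(json);
    foreach (JsonElement item in doc.RootElement.EnumerateArray())
    {
        result.Add((
            item.GetProperty("token").GetString() ?? "",
            item.GetProperty("start_offset").GetInt32(),
            item.GetProperty("end_offset").GetInt32()));
    }
    return result;
}

// First two entries of the question's data.
string sample = "[{\"token\":\"ジェーン\",\"type\":\"word\",\"start_offset\":0,\"end_offset\":4,\"position\":0}," +
                "{\"token\":\"は\",\"type\":\"word\",\"start_offset\":4,\"end_offset\":5,\"position\":1}]";

foreach (var (text, start, end) in ParseTokens(sample))
    System.Console.WriteLine($"{text}: {start}-{end}");
```

The Start and End values are what the offset-based wrapping code in this thread needs; the token text itself is only used for display here.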

And I wrapped the tokens in spans like this:

// Requires: using System.Collections.Generic; using System.Text;
List<Token> tokens = new List<Token>();

tokens.Add(new Token(0, 4, 0, "word", "abcd"));
tokens.Add(new Token(4, 5, 1, "word", "e"));
tokens.Add(new Token(6, 9, 2, "word", "fgh"));
tokens.Add(new Token(9, 11, 3, "word", "ijk"));

// Concatenates the token texts only; offsets are not used, so any
// characters between tokens are dropped.
StringBuilder sb = new StringBuilder();
foreach (Token t in tokens)
{
    sb.Append("<span>");
    sb.Append(t.TokenProp);
    sb.Append("</span>");
}

UPD2: The answer above is way better :)

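Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.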

 
© 2020-2024 STACKOOM.COM