
How to replace sentences with an array of word positions

I have a sentence like this:

ジェーンは先週日本に來て、毎日4時間日本語のクラスで勉強しています

And token data like this:

[
  {"token":"ジェーン","type":"word","start_offset":0,"end_offset":4,"position":0},
  {"token":"は","type":"word","start_offset":4,"end_offset":5,"position":1},
  {"token":"先週","type":"word","start_offset":5,"end_offset":7,"position":2},
  {"token":"日本","type":"word","start_offset":7,"end_offset":9,"position":3},
  {"token":"に","type":"word","start_offset":9,"end_offset":10,"position":4},
  {"token":"來","type":"word","start_offset":10,"end_offset":11,"position":5},
  {"token":"て","type":"word","start_offset":11,"end_offset":12,"position":6},
  {"token":"毎日","type":"word","start_offset":13,"end_offset":15,"position":7},
  {"token":"4","type":"word","start_offset":15,"end_offset":16,"position":8},
  {"token":"時間","type":"word","start_offset":16,"end_offset":18,"position":9},
  {"token":"日本語","type":"word","start_offset":18,"end_offset":21,"position":10},
  {"token":"の","type":"word","start_offset":21,"end_offset":22,"position":11},
  {"token":"クラス","type":"word","start_offset":22,"end_offset":25,"position":12},
  {"token":"で","type":"word","start_offset":25,"end_offset":26,"position":13},
  {"token":"勉強","type":"word","start_offset":26,"end_offset":28,"position":14},
  {"token":"し","type":"word","start_offset":28,"end_offset":29,"position":15},
  {"token":"て","type":"word","start_offset":29,"end_offset":30,"position":16},
  {"token":"い","type":"word","start_offset":30,"end_offset":31,"position":17}
]
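For anyone who wants to try this out, here is a rough sketch of loading token data like the above into typed objects. It assumes System.Text.Json is available (.NET Core 3.0 or later); the `AnalyzedToken` and `TokenLoader` names are made up for illustration and are not from the question. The property names deliberately mirror the JSON keys so that no mapping attributes are needed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical DTO: property names match the JSON keys exactly,
// because System.Text.Json matches names case-sensitively by default.
public class AnalyzedToken
{
    public string token { get; set; } = "";
    public string type { get; set; } = "";
    public int start_offset { get; set; }
    public int end_offset { get; set; }
    public int position { get; set; }
}

public static class TokenLoader
{
    // Parses a JSON array of tokens; returns an empty list if the JSON is the literal null.
    public static List<AnalyzedToken> Parse(string json)
    {
        return JsonSerializer.Deserialize<List<AnalyzedToken>>(json)
               ?? new List<AnalyzedToken>();
    }
}

public static class Program
{
    public static void Main()
    {
        string json =
            "[{\"token\":\"は\",\"type\":\"word\",\"start_offset\":4,\"end_offset\":5,\"position\":1}]";

        foreach (var t in TokenLoader.Parse(json))
        {
            Console.WriteLine($"{t.position}: {t.token} [{t.start_offset}, {t.end_offset})");
        }
    }
}
```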

How can I use start_offset and end_offset to wrap the text in the sentence, like this:

<span>ジェーン</span><span>は</span><span>先週</span>... 

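I tried replacing at each position with a StringBuilder, but every insertion shifts the word indexes, so everything from the second token onward ends up wrong.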

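Inserting new text shifts the position of everything after it. So try starting from the end of the string and working backwards. That way you never have to recalculate positions, because the only positions affected are ones you have already processed.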

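string result = sentence;

// Process tokens from last to first so the offsets of earlier tokens stay valid.
foreach (var token in dataTokens.OrderByDescending(x => x.position))
{
    // Insert the closing tag first: end_offset >= start_offset, so the
    // second insert at start_offset is not affected by the first one.
    result = result.Insert(token.end_offset, "</span>");
    result = result.Insert(token.start_offset, "<span>");
}

return result;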

When tested, this produces the following string:

 <span>ジェーン</span><span>は</span><span>先週</span><span>日本</span><span>に</span><span>來</span><span>て</span>、<span>毎日</span><span>4</span><span>時間</span><span>日本語</span><span>の</span><span>クラス</span><span>で</span><span>勉強</span><span>し</span><span>て</span><span>い</span>ます
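To make the ordering point concrete, here is a small self-contained sketch that runs the same two `Insert` calls in both directions on the first three tokens. The `TokenSpan` and `InsertOrderDemo` names are hypothetical helpers for this illustration, not part of the question's code. Only the back-to-front order yields the expected markup; the front-to-back order applies the original offsets to a string that has already grown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal token type; only the offsets matter for this demo.
public class TokenSpan
{
    public int start_offset;
    public int end_offset;

    public TokenSpan(int start, int end)
    {
        start_offset = start;
        end_offset = end;
    }
}

public static class InsertOrderDemo
{
    // Applies the two inserts per token in whatever order the caller supplies.
    private static string Wrap(string sentence, IEnumerable<TokenSpan> ordered)
    {
        string result = sentence;
        foreach (var t in ordered)
        {
            result = result.Insert(t.end_offset, "</span>");
            result = result.Insert(t.start_offset, "<span>");
        }
        return result;
    }

    // Last token first: inserts never move text that is still to be processed.
    public static string BackToFront(string sentence, IEnumerable<TokenSpan> tokens)
    {
        return Wrap(sentence, tokens.OrderByDescending(t => t.start_offset));
    }

    // First token first: later offsets are stale after the first insert.
    public static string FrontToBack(string sentence, IEnumerable<TokenSpan> tokens)
    {
        return Wrap(sentence, tokens.OrderBy(t => t.start_offset));
    }
}

public static class Program
{
    public static void Main()
    {
        var tokens = new List<TokenSpan>
        {
            new TokenSpan(0, 4), // ジェーン
            new TokenSpan(4, 5), // は
            new TokenSpan(5, 7), // 先週
        };

        Console.WriteLine(InsertOrderDemo.BackToFront("ジェーンは先週", tokens));
        Console.WriteLine(InsertOrderDemo.FrontToBack("ジェーンは先週", tokens));
    }
}
```

The first line printed is `<span>ジェーン</span><span>は</span><span>先週</span>`; the second is scrambled, which is the symptom described in the question.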

I suggest building a new string instead, something like this (not exact code):

string s = "";
foreach (var token in dataTokens)
{
    // token.token holds the token's text
    s += "<span>" + token.token + "</span>";
}

UPD: After the comments, I tried to simulate your scenario with the following Token class:

 class Token
{
    private int start_offset;
    private int end_offset;
    private int position;
    string type;
    string token;

    public Token(int so, int se, int pos, string type, string token)
    {
        start_offset = so;
        end_offset = se;
        position = pos;
        this.type = type;
        this.token = token;
    }

    public string TokenProp
    {
       get { return token; }
    }
}

I did the span wrapping like this:

List<Token> tokens = new List<Token>();

tokens.Add(new Token(0, 4, 0, "word", "abcd"));
tokens.Add(new Token(4, 5, 1, "word", "e"));
tokens.Add(new Token(6, 9, 2, "word", "fgh"));
tokens.Add(new Token(9, 11, 3, "word", "ijk"));

StringBuilder sb = new StringBuilder();
foreach (Token t in tokens)
{
    sb.Append("<span>");
    sb.Append(t.TokenProp);
    sb.Append("</span>");
}
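One caveat with appending only the tokens: any text the tokenizer skipped (the 、 and the trailing ます in the original sentence, or the character at index 5 in this simulation) is lost. A possible variation, sketched below under the assumption that tokens do not overlap, walks the tokens in ascending order and copies the untokenized gaps from the original sentence as it goes. `TokenSpan` and `GapPreservingWrapper` are hypothetical names used only for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical minimal token type; only the offsets are needed.
public class TokenSpan
{
    public int start_offset;
    public int end_offset;

    public TokenSpan(int start, int end)
    {
        start_offset = start;
        end_offset = end;
    }
}

public static class GapPreservingWrapper
{
    // Builds a new string in one forward pass. Assumes tokens do not overlap
    // and that all offsets lie within the sentence.
    public static string Wrap(string sentence, IEnumerable<TokenSpan> tokens)
    {
        var sb = new StringBuilder();
        int cursor = 0; // index of the first character not yet copied

        foreach (var t in tokens.OrderBy(x => x.start_offset))
        {
            // Copy any untokenized text between the previous token and this one.
            sb.Append(sentence, cursor, t.start_offset - cursor);

            sb.Append("<span>");
            sb.Append(sentence, t.start_offset, t.end_offset - t.start_offset);
            sb.Append("</span>");

            cursor = t.end_offset;
        }

        // Copy whatever follows the last token.
        sb.Append(sentence, cursor, sentence.Length - cursor);
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        var tokens = new List<TokenSpan>
        {
            new TokenSpan(0, 4), // abcd
            new TokenSpan(4, 5), // e
            new TokenSpan(6, 9), // fgh
        };

        // Prints: <span>abcd</span><span>e</span> <span>fgh</span>!
        Console.WriteLine(GapPreservingWrapper.Wrap("abcde fgh!", tokens));
    }
}
```

Because the output is built in a separate buffer, the original offsets are only ever applied to the original sentence, so no index adjustment is needed here either.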

UPD2: The answer above is better :)

