C# Splitting retrieved string to list/array

I fetch a string from a website using code from another question I posted here. This works really well when I put it into a rich text box, but now I need to split the string into separate sentences in a list or array (I suppose a list will be easier, since you don't need to know in advance how long the input is going to be).

Yesterday I found the following code in another question (I didn't note which one, sorry):

List<string> list = new List<string>(Regex.Split(lyrics, Environment.NewLine));

But the input is now split into only two parts: the first three sentences and the rest.

I retrieve the text from musixmatch.com with the following code (using a fixed URL for simplicity):

var source = "https://www.musixmatch.com/lyrics/Krewella/Alive";
var htmlWeb = new HtmlWeb();
var documentNode = htmlWeb.Load(source).DocumentNode;

var findclasses = documentNode
    .Descendants("p")
    .Where(d => d.Attributes["class"]?.Value.Contains("mxm-lyrics__content") == true);

var text = string.Join(Environment.NewLine, findclasses.Select(x => x.InnerText));

More information about this code can be found here. In a nutshell, it retrieves the specific HTML that contains the lyrics. I need to split the lyrics line by line for a synchronization process that I'm building (like the one that was built into Spotify a while ago). I need something I can index (preferably a list or array), because that would make the database that stores all this data a bit smaller. What am I supposed to use for this?

Edit: In response to the possible-duplicate flag: C# Splitting retrieved string to list/array

You can split on both characters:

var lines = text.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
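
To see how this behaves on mixed line endings, here is a minimal sketch that wraps the call in a helper. The names `LyricSplitter` and `SplitLines` are made up for illustration and are not part of the question's code; the only library call used is `String.Split(char[], StringSplitOptions)`.

```csharp
using System;
using System.Collections.Generic;

public static class LyricSplitter
{
    // Hypothetical helper: treats '\r' and '\n' as independent separators,
    // so "\r\n", "\n" and "\r" all act as line breaks.
    public static List<string> SplitLines(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // RemoveEmptyEntries drops the empty string that appears between
        // '\r' and '\n' in "\r\n", and also drops genuinely blank lines.
        string[] parts = text.Split(new[] { '\r', '\n' },
                                    StringSplitOptions.RemoveEmptyEntries);
        return new List<string>(parts);
    }
}
```

Calling `LyricSplitter.SplitLines(text)` on the scraped string should give an indexable list with one entry per lyric line. Note that blank lines between verses are discarded, which may or may not be what the synchronization step wants.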

What I would do is ensure that there is a common concept of "NewLine" in the code. It could be \r, \n or \r\n. Simply replace all '\n' with "", so that each line break is reduced to a single '\r' (this assumes every line break in the text contains a '\r'). (Edited this one)

Now, all you have to do is

var lyricLines = lyricsWithCommonNewLine.Split('\r');
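
If the text might also contain bare '\n' breaks, one possible variant of the normalization idea is sketched below. It is not the answer's exact code: it maps every line-break style to a single '\r' instead of deleting '\n', and the names `NewLineNormalizer` and `SplitPreservingBlankLines` are hypothetical.

```csharp
using System;

public static class NewLineNormalizer
{
    // Hypothetical helper: normalizes "\r\n", "\n" and "\r" to a single '\r',
    // then splits on that one character.
    public static string[] SplitPreservingBlankLines(string lyrics)
    {
        if (lyrics == null) throw new ArgumentNullException(nameof(lyrics));

        // Collapse "\r\n" first so it is not counted as two breaks,
        // then turn any remaining lone '\n' into '\r'.
        string lyricsWithCommonNewLine = lyrics.Replace("\r\n", "\r")
                                               .Replace('\n', '\r');

        // No RemoveEmptyEntries here: blank lines survive as empty strings.
        return lyricsWithCommonNewLine.Split('\r');
    }
}
```

Unlike splitting with `StringSplitOptions.RemoveEmptyEntries`, this keeps blank lines as empty entries, so verse boundaries stay visible in the resulting array.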
