Combining text and strings

I have the following file containing some information about an audio file.

Language = es-ES

Duration = 00:00:00.9100000

Unos amigos.
*

Language = es-ES

Duration = 00:00:03.5600000

Yo sé vamos a la fiesta en English with.
*

Language = en-US

Duration = 00:00:05.0200000

Hey, let us go to the party and Spanish. We say bye Marcella.
*

Language = es-ES

Duration = 00:00:02.2700000

Fiesta que yo use.
*

Language = es-ES

Duration = 00:00:00.8300000

La fiesta.

I want to combine the durations and the sentences of all entries that share the same language. I was thinking of first splitting the text into an array of strings using * as a delimiter, but I don't know how to combine the durations or the sentences. Any help? I'm using C#, by the way. Is it better to create an object for each paragraph?

string[] subs = textFile.Split('*');

The wanted output:

Language = es-ES

Duration = 00:00:07.57

Unos amigos. Yo sé vamos a la fiesta en English with. Fiesta que yo use. La fiesta.

Language = en-US

Duration = 00:00:05.0200000

Hey, let us go to the party and Spanish. We say bye Marcella.

    var source = @"Language = es-ES

Duration = 00:00:00.9100000

Unos amigos.
*

Language = es-ES

Duration = 00:00:03.5600000

Yo sé vamos a la fiesta en English with.
*

Language = en-US

Duration = 00:00:05.0200000

Hey, let's go to the party and Spanish. We say bye Marcella.
*

Language = es-ES

Duration = 00:00:02.2700000

Fiesta que yo use.
*

Language = es-ES

Duration = 00:00:00.8300000

La fiesta.";

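var results =
    // Split the text into sections at each "*" line.
    // This assumes the text uses Environment.NewLine as its line ending.
    from section in source.Split(new string[] { $"*{Environment.NewLine}" }, StringSplitOptions.None)
    // Drop the blank lines, leaving three parts: language, duration, sentence.
    let parts = section.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
    // Take the value to the right of the '=' sign.
    let language = parts[0].Split('=', StringSplitOptions.TrimEntries)[1]
    let duration = TimeSpan.Parse(parts[1].Split('=', StringSplitOptions.TrimEntries)[1])
    let text = parts[2]
    group new { duration, text } by language into languages
    select new
    {
        language = languages.Key,
        // Sum the durations and join the sentences within each language group.
        duration = languages.Select(x => x.duration).Aggregate((x, y) => x.Add(y)),
        text = String.Join(" ", languages.Select(x => x.text)),
    };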

Given this source data I got this:

[Image: result]
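To get from the grouped results to the layout asked for in the question, something along these lines might work. It is only a sketch: the sample text is shortened and joined with explicit "\n" line endings (an assumption, so that the split does not depend on how the file was saved), the parse uses the invariant culture, and the names `report` and `g` are made up for the example. It needs .NET 5 or later for `StringSplitOptions.TrimEntries` and top-level statements.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Shortened sample with explicit "\n" line endings (assumption for this sketch).
var source = string.Join("\n",
    "Language = es-ES", "Duration = 00:00:00.9100000", "Unos amigos.", "*",
    "Language = en-US", "Duration = 00:00:05.0200000", "Hey.", "*",
    "Language = es-ES", "Duration = 00:00:00.8300000", "La fiesta.");

var results =
    from section in source.Split('*')
    // Trim every line and drop the empty ones: language, duration, sentence remain.
    let parts = section.Split('\n', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
    let language = parts[0].Split('=', StringSplitOptions.TrimEntries)[1]
    let duration = TimeSpan.Parse(parts[1].Split('=', StringSplitOptions.TrimEntries)[1], CultureInfo.InvariantCulture)
    group new { duration, text = parts[2] } by language into g
    select new
    {
        Language = g.Key,
        // Start from zero and add up every duration in the group.
        Duration = g.Aggregate(TimeSpan.Zero, (sum, x) => sum + x.duration),
        Text = string.Join(" ", g.Select(x => x.text)),
    };

// One block per language, separated by a blank line, as in the wanted output.
var report = string.Join("\n\n", results.Select(r =>
    $"Language = {r.Language}\n\nDuration = {r.Duration:c}\n\n{r.Text}"));

Console.WriteLine(report);
```

The groups come out in the order in which each language first appears, so es-ES is printed before en-US here. The "c" format writes the fraction with seven digits, so a different format string would be needed to get a shorter fraction.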

I would do something like this. It is very messy and not good code, but I am in a hurry, so make of it what you want. It would probably be best practice to make a class for each language.

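    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        // Parallel lists: index n holds the language line, total duration and joined text of one group.
        static List<string> language = new List<string>();
        static List<TimeSpan> duration = new List<TimeSpan>();
        static List<string> text = new List<string>();

        static void Main(string[] args)
        {
            string file = System.IO.File.ReadAllText(@"path\file.txt");
            // Drop blank lines and the "*" separators so that the remaining lines
            // come in groups of three: language, duration, sentence.
            string[] lines = file.Split('\n')
                .Select(l => l.Trim())
                .Where(l => l.Length > 0 && l != "*")
                .ToArray();

            for (int i = 0; i + 2 < lines.Length; i += 3)
            {
                // Take everything after the '=' instead of relying on fixed offsets.
                TimeSpan d = TimeSpan.Parse(lines[i + 1].Substring(lines[i + 1].IndexOf('=') + 1).Trim());
                int pos = language.IndexOf(lines[i]);
                if (pos != -1)
                {
                    // TimeSpan is immutable, so the sum has to be assigned back.
                    duration[pos] = duration[pos].Add(d);
                    text[pos] += " " + lines[i + 2];
                }
                else
                {
                    language.Add(lines[i]);
                    duration.Add(d);
                    text.Add(lines[i + 2]);
                }
            }

            // Print one block per language.
            for (int n = 0; n < language.Count; n++)
                Console.WriteLine($"{language[n]}\nDuration = {duration[n]}\n{text[n]}\n");
        }
    }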
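As a possible alternative to three parallel lists, the totals could be kept in a dictionary keyed by language. This is a rough sketch, not a drop-in replacement: the inline `file` string stands in for the real file contents, `ValueAfterEquals` is a helper invented for the example, and a named tuple is used where a small class per language would do the same job. It assumes .NET 5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for the file contents, using "\n" line endings.
var file = string.Join("\n",
    "Language = es-ES", "Duration = 00:00:00.9100000", "Unos amigos.", "*",
    "Language = en-US", "Duration = 00:00:05.0200000", "Hey.", "*",
    "Language = es-ES", "Duration = 00:00:00.8300000", "La fiesta.");

// Helper made up for this sketch: returns the text to the right of the first '='.
string ValueAfterEquals(string line) => line.Substring(line.IndexOf('=') + 1).Trim();

// Language -> running total and the sentences seen so far.
var totals = new Dictionary<string, (TimeSpan Duration, List<string> Sentences)>();

foreach (var block in file.Split('*'))
{
    var lines = block.Split('\n', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
    if (lines.Length < 3) continue; // skip incomplete blocks

    var language = ValueAfterEquals(lines[0]);
    var duration = TimeSpan.Parse(ValueAfterEquals(lines[1]), CultureInfo.InvariantCulture);

    // First time this language is seen: start from zero with an empty list.
    if (!totals.TryGetValue(language, out var entry))
        entry = (TimeSpan.Zero, new List<string>());

    entry.Sentences.Add(lines[2]);
    // Tuples are value types, so the updated total has to be stored back.
    totals[language] = (entry.Duration + duration, entry.Sentences);
}

foreach (var pair in totals)
{
    var sentences = string.Join(" ", pair.Value.Sentences);
    Console.WriteLine($"Language = {pair.Key}\nDuration = {pair.Value.Duration:c}\n{sentences}\n");
}
```

The lookup by key replaces the `IndexOf` search, and there is only one place where the duration is added, which makes the "forgot to assign the sum back" mistake harder to make.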
