
How to split a string containing multiple delimiters on each line of a text file?

This is the input my file contains:

50|Hallogen|Mercury|M:4;C:40;A:1
90|Oxygen|Mars|M:10;C:20;A:00
5|Hydrogen|Saturn|M:33;C:00;A:3

Now I want to split each line of my text file and store it in my class, like this:

Expected output:

Planets[0]:
{
    Number: 50
    name: Hallogen
    object: Mercury
    proportion[0]:
    {
        Number: 4
    },
    proportion[1]:
    {
        Number: 40
    },
    proportion[2]:
    {
        Number: 1
    }
}

etc.

My class to store all these values:

public class Planets
{
    public int Number { get; set; }    // First cell of every row: 50, 90, 5
    public string name { get; set; }   // Second cell of every row: Hallogen, Oxygen, Hydrogen
    public string object { get; set; } // Third cell of every row: Mercury, Mars, Saturn
    public List<proportion> proportion { get; set; } // All proportions for this planet object.
    // For Hallogen it will store 4, 40, 1. Just store the number; ignore the M, C, A initials.
    // For Oxygen it will store 10, 20, 00. Just store the number; ignore the M, C, A initials.
}

public class proportion
{
    public int Number { get; set; }
}

This is what I have done:

List<Planets> Planets = new List<Planets>();
using (StreamReader sr = new StreamReader(args[0]))
{
    String line;
    while ((line = sr.ReadLine()) != null)
    {
        string[] parts = Regex.Split(line, @"(?<=[|;-])");
        foreach (var item in parts)
        {
            var Obj = new Planets(); // Not getting how to store it but not getting proper output in parts
        }

        Console.WriteLine(line);
    }
}

To my understanding, the multiple delimiters are there to express a nested structure.

You need to split the whole string on the pipe first, then on the semicolon, and lastly on the colon.

The order of splitting matters here. I don't think you can get all the tokens in a usable form by splitting on all three delimiters at once.

Try the following code for this kind of data:

var values = new List<string>
{
     "50|Hallogen|Mercury|M:4;C:40;A:1",
     "90|Oxygen|Mars|M:10;C:20;A:00",
     "5|Hydrogen|Saturn|M:33;C:00;A:3"
};
foreach (var value in values)
{
     // 1) Split on the pipe to get the four top-level fields.
     var pipeSplit = value.Split('|');
     var firstNumber = pipeSplit[0];
     var name = pipeSplit[1];
     var objectName = pipeSplit[2];
     // 2) Split only the last field on the semicolon to get the "M:4"-style pairs.
     var semiSplit = pipeSplit[3].Split(';');
     // 3) Split each pair on the colon and keep the number after it.
     var secondNumber = semiSplit[0].Split(':')[1];
     var thirdNumber = semiSplit[1].Split(':')[1];
     var lastNumber = semiSplit[2].Split(':')[1];
}
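
To illustrate why the order matters, here is a minimal sketch comparing a single split on all three delimiters with a nested split. The class name `SplitOrderDemo` and its method names are hypothetical, not from the question. The flat split returns every token but no longer tells you which tokens belong to the proportion list; the nested split keeps that information.

```csharp
using System;
using System.Linq;

public static class SplitOrderDemo
{
    // Splits on all three delimiters at once: every token comes back,
    // but the nesting (which tokens are proportions) is lost.
    public static string[] FlatSplit(string line)
    {
        return line.Split('|', ';', ':');
    }

    // Splits on '|' first, then ';', then ':' and keeps only the numeric
    // part of each "X:n" pair from the fourth field.
    public static int[] NestedProportions(string line)
    {
        string[] fields = line.Split('|');
        return fields[3]
            .Split(';')
            .Select(pair => int.Parse(pair.Split(':')[1]))
            .ToArray();
    }
}
```

For the first sample line, `FlatSplit` yields nine tokens (`50`, `Hallogen`, `Mercury`, `M`, `4`, `C`, `40`, `A`, `1`), while `NestedProportions` yields the three numbers 4, 40 and 1.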


If I understand correctly, your input is well formed. In that case you could use something like this:

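string[] parts = Regex.Split(line, @"[|;-]");
var planet = new Planets(parts);


...

public Planets(string[] parts)
{
    // A property cannot be passed as an out argument, so parse into a local first.
    int.TryParse(parts[0], out int number);
    this.Number = number;
    this.name = parts[1];
    // "object" is a C# keyword: declare the property as @object, or rename it.
    this.@object = parts[2];
    this.proportion = new List<proportion>();
    Regex propRegex = new Regex(@"\d+");
    for (int i = 3; i < parts.Length; i++)
    {
        // Each remaining part looks like "M:4"; pick out the digits.
        Match propMatch = propRegex.Match(parts[i]);
        if (propMatch.Success)
        {
            this.proportion.Add(new proportion { Number = int.Parse(propMatch.Value) });
        }
    }
}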

Without you having to change any of the logic in your "Planets" class, my quick solution to your problem would look like this:

List<Planets> Planets = new List<Planets>();
using (StreamReader sr = new StreamReader(args[0]))
{
    String line;
    while ((line = sr.ReadLine()) != null)
    {
        Planets planet = new Planets();
        String[] parts = line.Split('|');
        planet.Number = Convert.ToInt32(parts[0]);
        planet.name = parts[1];
        planet.obj = parts[2];

        String[] smallerParts = parts[3].Split(';');
        planet.proportion = new List<proportion>();
        foreach (var item in smallerParts)
        {
            proportion prop = new proportion();
            prop.Number = Convert.ToInt32(item.Split(':')[1]);
            planet.proportion.Add(prop);
        }
        Planets.Add(planet);
    }
}

Oh, before I forget: you should not name the property of class Planets "object", because "object" is a C# keyword for the base class of everything. Use something like "obj", "myObject" or "planetObject", just not "object"; your compiler will tell you the same ;)
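
As a small sketch of the two ways around the keyword clash: C# lets you prefix a keyword with `@` so it can be used as an identifier, or you can simply rename the property. The class names `PlanetEscaped` and `PlanetRenamed` are made up for this illustration and are not part of the question's code.

```csharp
public class PlanetEscaped
{
    // "@object" compiles: the @ prefix turns the keyword into an ordinary identifier.
    public string @object { get; set; }
}

public class PlanetRenamed
{
    // Renaming avoids the escape entirely and is usually easier to read.
    public string PlanetObject { get; set; }
}
```

With the escaped form, every use site also needs the prefix (for example `planet.@object`), which is one reason renaming tends to be the more comfortable choice.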

The most straightforward solution is to use a regex where every (sub)field is matched inside a group:

var subjectString = @"50|Hallogen|Mercury|M:4;C:40;A:1
90|Oxygen|Mars|M:10;C:20;A:00
5|Hydrogen|Saturn|M:33;C:00;A:3";

Regex regexObj = new Regex(@"^(.*?)\|(.*?)\|(.*?)\|M:(.*?);C:(.*?);A:(.*?)$", RegexOptions.Multiline);
Match match = regexObj.Match(subjectString);
while (match.Success)
{
    // Groups 1-3 are the pipe-separated fields; groups 4-6 are the M, C and A values.
    for (int i = 1; i <= 6; i++)
    {
        Console.WriteLine(match.Groups[i].Value);
    }

    match = match.NextMatch();
}
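
A sketch of how the captured groups could be turned into objects, assuming the `M`, `C`, `A` order is fixed as in the sample data. The `PlanetRecord` class and the `RegexPlanetParser` helper are hypothetical stand-ins for the question's `Planets` class, and the named groups are a readability choice rather than a requirement. The optional `\r?` before `$` is there because, with `RegexOptions.Multiline`, `$` matches before `\n` only, so a Windows line ending would otherwise leave a stray `\r` in the last field.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container, standing in for the question's Planets class.
public class PlanetRecord
{
    public int Number { get; set; }
    public string Name { get; set; }
    public string ObjectName { get; set; }
    public List<int> Proportions { get; set; }
}

public static class RegexPlanetParser
{
    // One named group per (sub)field; [^|]* keeps a field from running past a pipe.
    private static readonly Regex LineRegex = new Regex(
        @"^(?<num>[0-9]+)\|(?<name>[^|]*)\|(?<obj>[^|]*)\|M:(?<m>[0-9]+);C:(?<c>[0-9]+);A:(?<a>[0-9]+)\r?$",
        RegexOptions.Multiline);

    // Returns one record per line that matches the expected layout;
    // lines that do not match are silently skipped.
    public static List<PlanetRecord> Parse(string text)
    {
        var result = new List<PlanetRecord>();
        foreach (Match match in LineRegex.Matches(text))
        {
            result.Add(new PlanetRecord
            {
                Number = int.Parse(match.Groups["num"].Value),
                Name = match.Groups["name"].Value,
                ObjectName = match.Groups["obj"].Value,
                Proportions = new List<int>
                {
                    int.Parse(match.Groups["m"].Value),
                    int.Parse(match.Groups["c"].Value),
                    int.Parse(match.Groups["a"].Value)
                }
            });
        }
        return result;
    }
}
```

Calling `RegexPlanetParser.Parse(File.ReadAllText(path))` would then give one `PlanetRecord` per well-formed line; whether skipping malformed lines is acceptable depends on how strict the input needs to be.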
