Extract specific results from a text file in C#

I have the following input file:

INPUT FILE

a    00002098    0    0.75    unable#1    (usually followed by `to') not having the necessary means or skill or know-how; "unable to get to town without a car"; "unable to obtain funds"
a    00002312    0.23    0.43    dorsal#2 abaxial#1    facing away from the axis of an organ or organism; "the abaxial surface of a leaf is the underside or side facing away from the stem"
a    00023655    0    0.5    outside#10 away#3 able#2    (of a baseball pitch) on the far side of home plate from the batter; "the pitch was away (or wide)"; "an outside pitch"    

I want the following result for this file:
OUTPUT

a,00002098,0,0.75,unable#1
a,00002312,0.23,0.43,dorsal#2
a,00002312,0.23,0.43,abaxial#1
a,00023655,0,0.5,outside#10
a,00023655,0,0.5,away#3
a,00023655,0,0.5,able#2

I wrote the following code to extract the result above:

 TextWriter tw = new StreamWriter("D:\\output.txt");

        private void button1_Click(object sender, EventArgs e)
        {
            if (textBox1.Text != null)
            {
                StreamReader reader = new StreamReader(@"C:\Users\Zia\Desktop\input.txt");
                string line;
                String lines = "";
                while ((line = reader.ReadLine()) != null)
                {
                    String[] str = line.Split('\t');
                    String[] words = str[3].Split(' ');
                    for (int k = 0; k < words.Length; k++)
                    {
                        for (int i = 0; i < str.Length; i++)
                        {
                            if (i + 1 != str.Length)
                            {
                                lines = lines + str[i] + ",";
                            }
                            else
                            {
                                lines = lines + words[k] + "\r\n";
                            }
                        }
                    }
                }
                tw.Write(lines);
                tw.Close();
                reader.Close();
            }
        }    

When I change the index, this code throws the following error and does not give the desired result.
ERROR
Index was outside the bounds of the array.
Thanks in advance.

Why not try this algorithm, looping for each line in the text:

var elements = line.Split('\t');
var words = elements[4].Split(' ');
foreach(var word in words)
{
    Console.WriteLine(string.Concat(elements[0], ",", elements[1], ",", elements[2], ",", elements[3], ",", word));
}

This seems to output exactly what you need. Just change the Console.WriteLine to write to your file.
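
As a minimal sketch of that change, the loop could be wrapped so that it writes to a file and skips lines with fewer than five columns, which is one likely cause of the "Index was outside the bounds of the array" error (a blank line in the input, for example). The names SynsetSplitter, ExpandLine and Convert are made up for illustration, and the sketch assumes the columns are separated by tabs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SynsetSplitter   // hypothetical helper, not part of the question's code
{
    // Turns one tab-separated input line into one comma-separated line per word#sense entry.
    public static IEnumerable<string> ExpandLine(string line)
    {
        var elements = line.Split('\t');

        // Blank or malformed lines have fewer than five columns; skipping them
        // avoids indexing past the end of the array.
        if (elements.Length < 5)
            yield break;

        // First four columns joined with commas, e.g. "a,00002312,0.23,0.43".
        var prefix = string.Join(",", elements, 0, 4);

        foreach (var word in elements[4].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            yield return prefix + "," + word;
    }

    // Reads inputPath line by line and writes the expanded lines to outputPath.
    public static void Convert(string inputPath, string outputPath)
    {
        using (var writer = new StreamWriter(outputPath))
        {
            foreach (var line in File.ReadLines(inputPath))
                foreach (var row in ExpandLine(line))
                    writer.WriteLine(row);
        }
    }
}

static class Program
{
    static void Main()
    {
        // In-memory sample line, so the sketch runs without the question's files.
        var sample = "a\t00002312\t0.23\t0.43\tdorsal#2 abaxial#1\tfacing away from the axis";
        foreach (var row in SynsetSplitter.ExpandLine(sample))
            Console.WriteLine(row);
    }
}
```

To process the real file, Convert would be called with the input and output paths from the question in place of the in-memory sample.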

As I understand it, you want each word containing # (in the fifth column) to produce its own result line, so it should be something like this:

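        List<string> result = new List<string>();

        // str is assumed to hold the whole file text; splitting on both \r and \n copes with Windows line endings
        var lines = str.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            var words = line.Split('\t');
            if (words.Length < 5) continue; // skip malformed lines instead of throwing

            // first four columns, comma-separated
            string res = String.Join(",", words[0], words[1], words[2], words[3]);

            // one output line per word#sense entry in the fifth column
            var xx = words[4].Split(' ').Where(word => word.Contains("#"));
            foreach (var s in xx)
            {
                result.Add(res + "," + s);
            }
        }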

       private void Extract()
       {
            char[] delimiters = new char[] { '\r', '\n' };
            using (StreamReader reader = new StreamReader(@"C:\Users\Zia\Desktop\input.txt"))
            {
                string text = reader.ReadToEnd();
                string[] lines = text.Split(delimiters);
                foreach (var item in lines)
                {
                    foreach (var i in findItems(item))
                    {
                        // findItems returns " " as a placeholder for empty lines
                        if (i != " ")
                            Console.WriteLine(i);
                    }
                }
            }
        }
        private static List<string> findItems(string item)
        {
            List<string> items = new List<string>();

            // Columns are tab-separated; the fifth column holds the
            // space-separated word#sense entries.
            string[] columns = item.Split('\t');
            if (columns.Length < 5)
            {
                // Empty or malformed line: return the placeholder the caller skips.
                items.Add(" ");
                return items;
            }

            // First four columns joined with commas, e.g. "a,00023655,0,0.5,".
            string prefix = string.Join(",", columns, 0, 4) + ",";

            // Taking whole tokens keeps multi-digit sense numbers such as outside#10 intact.
            foreach (string name in columns[4].Split(' ').Where(x => x.Contains('#')))
            {
                items.Add(prefix + name);
            }

            return items;
        }

