
Need help sorting the data from a TSV file in a C# console program

I am trying to sort the data from a tab-separated text file by its 3rd column. I have tried a couple of ways, but I am not sure how to sort by the 3rd column, so I have sorted by the first one for now. I also need to remove duplicates from the 3rd column (case-sensitive, i.e. riVEr is different from River). Here is my code so far. I will mark an answer as accepted as soon as I make progress. Thanks ;)

string[] lines = File.ReadAllLines(@"d:\instance_test.txt");
//Dictionary<String, Int32> EAR_appcode = new Dictionary<String, Int32>();
//Console.WriteLine();
//Console.ReadLine();
//// Display the file contents by using a foreach loop.
//System.Console.WriteLine("Contents of WriteLines2.txt = ");
//foreach (string line in lines)
//{
//    // Use a tab to indent each line of the file.
//    Console.WriteLine("\t" + line.Substring(4));
//    Console.ReadLine();
//}
var no = lines;

var orderedScores = lines.OrderBy(x => x.Split(' ')[0]);
//string result = Regex.Split(no, @"[,\t ]+");
foreach (var score in orderedScores)
{
    string replacement = Regex.Replace(score, @"\t|\n|\r", "           ");
    DataTable table = new DataTable();
    table.Columns.Add("myCol", typeof(string));
    table.Columns.Add("myCol2", typeof(string));
    table.Columns.Add("EAR_appcode", typeof(string));
    table.Rows.Add(11, "abc11");
    table.Rows.Add(13, "abc13");
    table.Rows.Add(12, "abc12");
    Console.WriteLine(replacement) ;
    Console.ReadLine();

}
// Keep the console window open in debug mode.
Console.WriteLine("Press any key to exit.");
System.Console.ReadKey();

}

Something like:

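// read lines somehow
// ...
// create a list
var list = new List<Tuple<string, string, string>>();
foreach (string line in lines)
{
    var split = line.Split('\t'); // '\t' is the tab character (same as '\x9')
    list.Add(new Tuple<string, string, string>(split[0], split[1], split[2]));
}
// sort by the 3rd column; OrderBy returns a new sequence, so convert it back to a list
list = list.OrderBy(x => x.Item3, StringComparer.Ordinal).ToList();
// remove duplicates (== on strings is case-sensitive); walk backwards so that
// removing an item does not skip the one after it
for (int i = list.Count - 1; i > 0; i--)
    if (list[i].Item3 == list[i - 1].Item3)
        list.RemoveAt(i);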

I believe all of the above can be done with just one LINQ expression, but I am very bad at it. I had to steal the OrderBy part from you anyway ^^.
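
For what it's worth, a single LINQ expression along those lines might look like the sketch below. It assumes every line has at least three tab-separated fields and that "duplicate" means an exact, case-sensitive match on the 3rd column. The `SortAndDedupe` helper name and the sample rows are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TsvSketch
{
    // Hypothetical helper: keeps the first row for each distinct 3rd-column
    // value (ordinal comparison, so "riVEr" and "River" are different),
    // then sorts the surviving rows by that column.
    public static List<string[]> SortAndDedupe(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split('\t'))
            .GroupBy(cols => cols[2], StringComparer.Ordinal)
            .Select(group => group.First())
            .OrderBy(cols => cols[2], StringComparer.Ordinal)
            .ToList();
    }

    static void Main()
    {
        // Made-up rows standing in for File.ReadAllLines(...)
        var lines = new[] { "1\ta\tRiver", "2\tb\triVEr", "3\tc\tRiver", "4\td\tApple" };
        foreach (var cols in SortAndDedupe(lines))
            Console.WriteLine(string.Join("\t", cols));
    }
}
```

Grouping before sorting means the row that survives for each value is the first one in file order, which may or may not be what you want.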

If you don't have .NET Framework 4.0, then replace the generic Tuple with your own non-generic version (declare the list as List<Tuple> and create items with new Tuple(...)):

class Tuple
{
    public string Item1;
    public string Item2;
    public string Item3;
    public Tuple(string i1, string i2, string i3)
    {
        Item1 = i1;
        Item2 = i2;
        Item3 = i3;
    }
}

This is my sample data:

Col1    Col2    Col3
zxcv    789     14:02
asdf    123     12:00
qwer    456     13:01
asdf    123     12:00

I used this LINQ statement to:

  1. Create a range of indexes from "start" to "lines.Length - 1"
  2. Split each line by '\t'
  3. Dump each column into an anonymous type
  4. Group by string that is a combination of all columns
  5. Select only the first item for each group
  6. Sort by column 3

    static void Main(string[] args)
    {
        string[] lines = File.ReadAllLines("Tab.txt");
        int start = 1; // set to zero, if no header
        // the second argument of Range is a count, so use Length - start
        var records = (from i in Enumerable.Range(start, lines.Length - start)
                       let pieces = lines[i].Split('\t')
                       select new { Col1 = pieces[0], Col2 = pieces[1], Col3 = pieces[2] })
            .GroupBy(c => c.Col1 + c.Col2 + c.Col3)
            .Select(gr => gr.First())
            .OrderBy(c => c.Col3);
        foreach (var r in records)
            Console.WriteLine("{0}, {1}, {2}", r.Col1, r.Col2, r.Col3);
        Console.WriteLine();
        Console.WriteLine("Done");
        Console.ReadLine();
    }

Of course, you can add parsing/conversion code in the last line of the LINQ statement to order by int or DateTime.
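
As a rough illustration of that last point, the sketch below orders rows by the 3rd column parsed as a time instead of as text. It assumes the column always holds values like 14:02 that `TimeSpan.Parse` accepts. The `OrderByTime` helper and the rows are made up for the example.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class ParsedSortSketch
{
    // Hypothetical helper: orders rows by the 3rd column read as hours:minutes.
    public static string[][] OrderByTime(string[][] rows)
    {
        return rows
            .OrderBy(r => TimeSpan.Parse(r[2], CultureInfo.InvariantCulture))
            .ToArray();
    }

    static void Main()
    {
        // Made-up rows; compared as plain strings, "14:02" would sort before "9:30".
        var rows = new[]
        {
            new[] { "zxcv", "789", "14:02" },
            new[] { "asdf", "123", "9:30" },
        };
        foreach (var r in OrderByTime(rows))
            Console.WriteLine(string.Join(", ", r));
    }
}
```

The same pattern should work for a numeric column by swapping the key for something like `int.Parse(r[1])`. A value that fails to parse will throw, so real data may need `TryParse` instead.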

And I tested it ...
