
How to get data from a text file in C#

I have a text file containing a large amount of data. First I have to filter out the leaf cell data, which is scattered throughout the file. The first lines I filter are those beginning with ADD GCELL, which contain the primary data. Next I have to get the related data from the same text file, using the CELLID found in that ADD GCELL line. The related data is in lines beginning with ADD GTRX, and the fields I need are FREQ, TRXNO and ISMAINBCCH. In a nutshell, CELLID is the common value between the ADD GCELL and ADD GTRX lines.

I have written some code in C#, but I got stuck. Here is part of the text file:

..............................

ADD GCELL:CELLID=13, CELLNAME="NR_0702_07021_G1_A", MCC="424", MNC="02", LAC=6112, CI=7021, NCC=6, BCC=0, EXTTP=Normal_cell, IUOTP=Concentric_cell, ENIUO=ON, DBFREQBCCHIUO=Extra, FLEXMAIO=OFF, CSVSP=3, CSDSP=5, PSHPSP=4, PSLPSVP=6, BSPBCCHBLKS=1, BSPAGBLKSRES=4, BSPRACHBLKS=1, TYPE=GSM900_DCS1800, OPNAME="Tester", VIPCELL=NO
..............................
ADD GTRX:TRXID=11140, TRXNAME="T_RAK_JaziratHamra_G_702_7021_A-0", FREQ=99, TRXNO=0, CELLID=13, IDTYPE=BYID, ISMAINBCCH=YES, ISTMPTRX=NO, GTRXGROUPID=80;

The code I have so far is:

using (StreamReader sr = File.OpenText(filename))
{
    while ((s = sr.ReadLine()) != null)
    {
        if (s.Contains("ADD GCELL:"))
        {
            s = s.Replace("ADD GCELL:", "");
            string[] items = s.Split(',');
            foreach (string str in items)
            {
                string[] str1 = str.Split('=');
                if (str1[0] == "CELLID")
                {
                    cellidnew = str1[1];
                }
                string fieldname = str1[0];
                string value = str1[1].Replace(";", string.Empty).Replace("\"", string.Empty);

            }

            Getgtrxvalues(filename, ref cellname, ref cellidnew, ref Frequency, ref TRXNO ,ref ISMAINBCCH);


        }
    }
}

private static void Getgtrxvalues(string filename, ref string cellname, ref string cellid, ref int Frequency,  ref int TRXNO ,ref bool ISMAINBCCH)
{
    using (StreamReader sr = File.OpenText(filename))
    {
        while ((s = sr.ReadLine()) != null)
        {
            if (s.Contains("ADD GTRX:"))
            {
                try
                {


}
}
}
}

UPDATE

Everything works fine except for one more condition I have to satisfy. For the ADD GTRX lines I take all values, including FREQ, when ISMAINBCCH=YES. But there are also FREQ values on lines where ISMAINBCCH=NO, and I need those as comma-separated values. For example, first I take FREQ where CELLID=639 (the id is dynamic; it can be any value) and ISMAINBCCH=YES; that part is done. The next task is to concatenate the FREQ values, comma-separated, where CELLID=639 and ISMAINBCCH=NO, so the output I want here is 24,28,67. How can I achieve this?

The lines are:

 ADD GTRX:TRXID=0, TRXNAME="M_RAK_JeerExch_G_1879_18791_A-0", FREQ=81, TRXNO=0, CELLID=639, IDTYPE=BYID, ISMAINBCCH=YES, ISTMPTRX=NO, GTRXGROUPID=2556;
 ADD GTRX:TRXID=1, TRXNAME="M_RAK_JeerExch_G_1879_18791_A-1", FREQ=24, TRXNO=1, CELLID=639, IDTYPE=BYID, ISMAINBCCH=NO, ISTMPTRX=NO, GTRXGROUPID=2556;
 ADD GTRX:TRXID=5, TRXNAME="M_RAK_JeerExch_G_1879_18791_A-2", FREQ=28, TRXNO=2, CELLID=639, IDTYPE=BYID, ISMAINBCCH=NO, ISTMPTRX=NO, GTRXGROUPID=2556;
 ADD GTRX:TRXID=6, TRXNAME="M_RAK_JeerExch_G_1879_18791_A-3", FREQ=67, TRXNO=3, CELLID=639, IDTYPE=BYID, ISMAINBCCH=NO, ISTMPTRX=NO, GTRXGROUPID=2556;

UPDATE

In the end I did it as shown in the code below.

I created one more property, DEFINED_TCH_FRQ = null, to hold the concatenated string. The problem is that it is very slow: I iterate over the text file twice, first with sr.ReadLine() and a second time with File.ReadLines to build the concatenated string (previously I used File.ReadAllLines for this and got an out-of-memory exception).

List<int> intarr = new List<int>();
intarr.Clear();
var gtrx = new Gtrx
{
    CellId = int.Parse(PullValue(s, "CELLID")),
    Freq = int.Parse(PullValue(s, "FREQ")),
    TrxNo = int.Parse(PullValue(s, "TRXNO")),
    IsMainBcch = PullValue(s, "ISMAINBCCH").ToUpper() == "YES",
    Commabcch = new List<string> { PullValue(s, "ISMAINBCCH") },
    DEFINED_TCH_FRQ = null,
    TrxName = PullValue(s, "TRXNAME"),
};

if (!intarr.Contains(gtrx.CellId))
{
    if (!_dictionary.ContainsKey(gtrx.CellId))
    {
        // No GCell record for this id. Do something!
        continue;
    }
    intarr.Add(gtrx.CellId);
    string results = string.Empty;

    var result = String.Join(",",
        from ss in File.ReadLines(filename)
        where ss.Contains("ADD GTRX:")
        where int.Parse(PullValue(ss, "CELLID")) == gtrx.CellId
        where PullValue(ss, "ISMAINBCCH").ToUpper() != "YES"
        select int.Parse(PullValue(ss, "FREQ")));
    results = result;

    var gtrxnew = new Gtrx
    {
        DEFINED_TCH_FRQ = results
    };

    _dictionary[gtrx.CellId].Gtrx = gtrx;

UPDATE

Finally, I first saved the lines starting with ADD GTRX into an array using File.ReadAllLines, and then used only that array to build the concatenated string instead of storing the entire text file. That improved performance somewhat. Now my question is: if I convert my text files, each containing hundreds of thousands of lines, to XML and then retrieve the data from the XML file, will that improve performance? And if I use DataTable and DataSet rather than classes here, will that improve performance?

Assuming the data is consistent, and also assuming each GCell line comes before the GTrx lines that refer to it (since a GTrx references the id of its GCell), you could create a simple parser for this and store the values in a dictionary.

The first thing to do is create classes to hold the Gtrx data and the GCell data. Keep in mind that I am only grabbing a subset of the data. You can add more fields if you need them:

private class Gtrx
{
    public int Freq { get; set; }
    public int TrxNo { get; set; }
    public string TrxName { get; set; }
    public int CellId { get; set; }
    public bool IsMainBcch { get; set; }
}

private class Gcell
{
    public int CellId { get; set; }
    public string CellName { get; set; }
    public string Mcc { get; set; }
    public int Lac { get; set; }
    public int Ci { get; set; }
}

In addition to these classes, we also need a class to "link" the two together:

private class GcellGtrx
{
    public Gcell Gcell { get; set; }
    public Gtrx Gtrx { get; set; }
}

Now we can build a simple parser:

private readonly Dictionary<int, GcellGtrx> _dictionary = new Dictionary<int, GcellGtrx>();

string data = "ADD GCELL:CELLID=13, CELLNAME=\"NR_0702_07021_G1_A\", MCC=\"424\", MNC=\"02\", LAC=6112, CI=7021, NCC=6, BCC=0, EXTTP=Normal_cell, IUOTP=Concentric_cell, ENIUO=ON, DBFREQBCCHIUO=Extra, FLEXMAIO=OFF, CSVSP=3, CSDSP=5, PSHPSP=4, PSLPSVP=6, BSPBCCHBLKS=1, BSPAGBLKSRES=4, BSPRACHBLKS=1, TYPE=GSM900_DCS1800, OPNAME=\"Tester\", VIPCELL=NO" + Environment.NewLine;
data = data + "ADD GTRX:TRXID=11140, TRXNAME=\"T_RAK_JaziratHamra_G_702_7021_A-0\", FREQ=99, TRXNO=0, CELLID=13, IDTYPE=BYID, ISMAINBCCH=YES, ISTMPTRX=NO, GTRXGROUPID=80;" + Environment.NewLine;

using (var sr = new StringReader(data))
{
    string line;
    // Read inside the loop condition so that "continue" below cannot skip
    // the next read and spin forever on the same line.
    while ((line = sr.ReadLine()) != null)
    {
        line = line.Trim();
        if (line.StartsWith("ADD GCELL:"))
        {
            var gcell = new Gcell
            {
                CellId = int.Parse(PullValue(line, "CELLID")),
                CellName = PullValue(line, "CELLNAME"),
                Ci = int.Parse(PullValue(line, "CI")),
                Lac = int.Parse(PullValue(line, "LAC")),
                Mcc = PullValue(line, "MCC")
            };
            var gcellGtrx = new GcellGtrx();
            gcellGtrx.Gcell = gcell;
            _dictionary.Add(gcell.CellId, gcellGtrx);
        }
        else if (line.StartsWith("ADD GTRX:"))
        {
            var gtrx = new Gtrx
            {
                CellId = int.Parse(PullValue(line, "CELLID")),
                Freq = int.Parse(PullValue(line, "FREQ")),
                TrxNo = int.Parse(PullValue(line, "TRXNO")),
                IsMainBcch = PullValue(line, "ISMAINBCCH").ToUpper() == "YES",
                TrxName = PullValue(line, "TRXNAME")
            };

            if (!_dictionary.ContainsKey(gtrx.CellId))
            {
                // No GCell record for this id. Do something!
                continue;
            }
            // Note: a later GTRX line for the same cell overwrites the earlier one.
            _dictionary[gtrx.CellId].Gtrx = gtrx;
        }
    }
}

// Now you can pull your data using a CellId:
// GcellGtrx cell13 = _dictionary[13];
// 
// Or you could iterate through each one:
// foreach (KeyValuePair<int, GcellGtrx> kvp in _dictionary)
// {
//     int key = kvp.Key;
//     GcellGtrx gCellGtrxdata = kvp.Value;
//     // Do Stuff
// }

And finally, we need to define a simple helper method:

private string PullValue(string line, string key)
{
    key = key + "=";
    int ndx = line.IndexOf(key, 0, StringComparison.InvariantCultureIgnoreCase);
    if (ndx >= 0)
    {
        int start = ndx + key.Length;
        int ndx2 = line.IndexOf(",", start, StringComparison.Ordinal);
        if (ndx2 == -1)
            ndx2 = line.Length; // last field on the line: read to the end
        // Strip surrounding whitespace, a trailing ';' and enclosing quotes.
        return line.Substring(start, ndx2 - start).Trim().TrimEnd(';').Trim('"');
    }

    return "";
}
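
One caveat with PullValue: it locates the key by substring search, so a key such as TYPE would also be found inside IDTYPE, and a quoted value containing a comma would be cut short. The following is a minimal sketch of a stricter lookup, not part of the original answer; the class name FieldReader and the method name PullValueExact are hypothetical, and it assumes fields are separated by ':' , ',' or whitespace as in the sample lines.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldReader
{
    // Hypothetical stricter variant of PullValue. The key must begin a field
    // (start of line, or preceded by ':', ',' or whitespace), so "TYPE" does
    // not match inside "IDTYPE". Returns "" when the key is absent.
    public static string PullValueExact(string line, string key)
    {
        // Value is either a quoted string (commas allowed inside)
        // or a run of characters up to the next ',' or ';'.
        string pattern = @"(?:^|[:,\s])" + Regex.Escape(key) +
                         @"=(?:""(?<v>[^""]*)""|(?<v>[^,;]*))";
        Match m = Regex.Match(line, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["v"].Value.Trim() : "";
    }
}
```

Called as FieldReader.PullValueExact(line, "FREQ"), it can stand in for PullValue in the parser. A regular expression is likely slower per call than IndexOf, so for very large files it may be worth splitting each line into fields once instead of searching it once per key.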

That should do it! See if that works for you. Keep in mind that this is very basic. You'd probably want to handle some possible errors (such as a key not existing, a value that is not a number, and so on).
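
For the follow-up in the question's updates (the comma-separated FREQ values of the ISMAINBCCH=NO lines per CELLID), re-reading the file for every cell is what makes it slow. A minimal sketch of a single-pass alternative follows; it is an illustration under assumptions, not the original answer's code. The names TchFrequencySketch, CollectTchFrequencies and ParseFields are hypothetical, and it assumes no field value contains a comma.

```csharp
using System;
using System.Collections.Generic;

public static class TchFrequencySketch
{
    // Hypothetical helper: one pass over the lines, returning
    // CELLID -> comma-separated FREQ values of every "ADD GTRX:" line
    // whose ISMAINBCCH is not YES.
    public static Dictionary<int, string> CollectTchFrequencies(IEnumerable<string> lines)
    {
        var perCell = new Dictionary<int, List<string>>();
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (!line.StartsWith("ADD GTRX:", StringComparison.Ordinal))
                continue;

            var fields = ParseFields(line.Substring("ADD GTRX:".Length));
            if (!fields.TryGetValue("CELLID", out string cellText) ||
                !int.TryParse(cellText, out int cellId) ||
                !fields.TryGetValue("FREQ", out string freq))
                continue; // skip malformed lines

            fields.TryGetValue("ISMAINBCCH", out string bcch);
            if (string.Equals(bcch, "YES", StringComparison.OrdinalIgnoreCase))
                continue; // the main BCCH carrier is handled separately

            if (!perCell.TryGetValue(cellId, out List<string> list))
                perCell[cellId] = list = new List<string>();
            list.Add(freq);
        }

        var result = new Dictionary<int, string>();
        foreach (var kvp in perCell)
            result[kvp.Key] = string.Join(",", kvp.Value);
        return result;
    }

    // Splits "A=1, B=\"x\", C=3;" into exact key -> value pairs.
    private static Dictionary<string, string> ParseFields(string body)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string part in body.TrimEnd(';').Split(','))
        {
            int eq = part.IndexOf('=');
            if (eq <= 0) continue; // not a NAME=value pair
            fields[part.Substring(0, eq).Trim()] = part.Substring(eq + 1).Trim().Trim('"');
        }
        return fields;
    }
}
```

Passing File.ReadLines(filename) as the argument streams the file lazily, so the whole file is never held in memory, and the result for a cell can then be looked up by its CELLID. For the four sample GTRX lines in the question, the entry for 639 is "24,28,67".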

You didn't specify what exactly is going wrong, but my guess is that the problem you are having is caused by your split:

string[] str1 = str.Split('=');

For any field that follows a comma and a space (for example CELLID in the ADD GTRX line of your file sample), this split yields " CELLID" and "13". Notice the space in front of "CELLID". For those fields the following comparison never passes:

if (str1[0] == "CELLID")

You could change it to:

if (str1[0].Trim() == "CELLID")

and it might work.
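
Applied to the whole field loop from the question, the Trim fix could look like the sketch below. This is only an illustration: the class name SplitTrimSketch and the method ReadFields are hypothetical, the variable names follow the question's code, and it assumes no value contains a comma.

```csharp
using System;
using System.Collections.Generic;

public static class SplitTrimSketch
{
    // Hypothetical helper: strips the command prefix (e.g. "ADD GTRX:") and
    // returns each field name mapped to its value, with whitespace trimmed.
    public static Dictionary<string, string> ReadFields(string s, string prefix)
    {
        var values = new Dictionary<string, string>();
        s = s.Trim().Replace(prefix, "");
        foreach (string str in s.Split(','))
        {
            // Split on the first '=' only, so a value containing '=' stays whole.
            string[] str1 = str.Split(new[] { '=' }, 2);
            if (str1.Length != 2)
                continue; // not a NAME=value pair

            string fieldname = str1[0].Trim(); // " CELLID" becomes "CELLID"
            string value = str1[1].Replace(";", string.Empty)
                                  .Replace("\"", string.Empty)
                                  .Trim();
            values[fieldname] = value;
        }
        return values;
    }
}
```

With the fields in a dictionary, the lookup becomes values["CELLID"] instead of a comparison inside the loop, which also avoids the index-out-of-range error that str1[1] would raise on a fragment without an '='.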
