How to parse a text file with C#

Question

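By "text formatting" I mean something more complicated.

At first I started manually adding the 5000 lines from the text file I'm asking about to my project.

The text file has 5000 lines of varying length. For example: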

1   1   ITEM_ETC_GOLD_01    골드(소)   xxx xxx xxx_TT_DESC 0   0   3   3   5   0   180000  3   0   1   0   0   255 1   1   0   0   0   0   0   0   0   0   0   0   -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_money_small.bsr    xxx xxx xxx 0   2   0   0   1   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1   표현할 골드의 양(param1이상) -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0

1   4   ITEM_ETC_HP_POTION_01   HP 회복 약초    xxx SN_ITEM_ETC_HP_POTION_01    SN_ITEM_ETC_HP_POTION_01_TT_DESC    0   0   3   3   1   1   180000  3   0   1   1   1   255 3   1   0   0   1   0   60  0   0   0   1   21  -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_bag.bsr    item\etc\hp_potion_01.ddj   xxx xxx 50  2   0   0   1   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 120 HP회복양   0   HP회복양(%)    0   MP회복양   0   MP회복양(%)    -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0

1   5   ITEM_ETC_HP_POTION_02   HP 회복약 (소)  xxx SN_ITEM_ETC_HP_POTION_02    SN_ITEM_ETC_HP_POTION_02_TT_DESC    0   0   3   3   1   1   180000  3   0   1   1   1   255 3   1   0   0   1   0   110 0   0   0   2   39  -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_bag.bsr    item\etc\hp_potion_02.ddj   xxx xxx 50  2   0   0   2   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 220 HP회복양   0   HP회복양(%)    0   MP회복양   0   MP회복양(%)    -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0

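The whitespace between the first character (1) and the second character (1/4/5) is not spaces, it is a tab. There are no spaces in the text file.

What I want:

I want to get the second integer (in the three lines I posted above, the second integers are 1, 4 and 5) and the string in the middle of each line that represents a path (starting with "item\" and ending with the file extension ".ddj").

My problem:

When I google "text formatting C#", all I get is how to open text files and how to write text files in C#. I don't know how to search for text within a text file. I also can't search for the first integer, because if it is a small integer like the ones in the three lines I posted above, I won't be able to find the right position, since for example "1" may occur in several different places.

My question:

It would be best if I wrote a program that deletes everything except what I need.

The other way I have in mind is to search the file directly, but as I mentioned above, if the number is too small I might get the wrong position for the second integer.

Please advise; I can't format all of this by hand.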

Answer 1

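OK, here's what we do: open the file, read it line by line, and split each line on tabs. Then we grab the second integer and loop through the fields to find the path.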

// Requires: using System.IO;
using (StreamReader reader = File.OpenText("filename.txt"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        string[] items = line.Split('\t');
        int myInteger = int.Parse(items[1]);   // Here's your integer.

        // Now let's find the path.
        string path = null;
        foreach (string item in items)
        {
            if (item.StartsWith("item\\") && item.EndsWith(".ddj"))
                path = item;
        }

        // At this point, `myInteger` and `path` contain the values we want
        // for the current line. We can then store those values or print them,
        // or anything else we like.
    }
}
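The loop above assumes every line has at least two fields and that the second one is numeric. Below is a minimal sketch of a more defensive variant, assuming blank or malformed lines should simply be skipped. The `ItemLineParser` class and `TryParseLine` method are names made up for illustration, and unlike the loop above it stops at the first matching field rather than keeping the last one.

```csharp
using System;

public static class ItemLineParser
{
    // Hypothetical helper: reads the second tab-separated field as an int and
    // looks for the first field shaped like item\....ddj.
    // Returns false when the line has no usable second integer.
    public static bool TryParseLine(string line, out int id, out string path)
    {
        id = 0;
        path = null;

        string[] fields = line.Split('\t');
        if (fields.Length < 2 || !int.TryParse(fields[1], out id))
            return false;

        foreach (string field in fields)
        {
            if (field.StartsWith("item\\", StringComparison.Ordinal) &&
                field.EndsWith(".ddj", StringComparison.Ordinal))
            {
                path = field;
                break; // stop at the first matching field
            }
        }

        // path stays null when the line has no .ddj entry.
        return true;
    }
}
```

Inside the read loop this could be called as `if (ItemLineParser.TryParseLine(line, out id, out path)) { ... }`, checking `path` for null before using it.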

Answer 2

Another solution, this time using regular expressions:

using System.IO;
using System.Text.RegularExpressions;

...

Regex parts = new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");

// File.OpenText is the static method that takes a path.
using (StreamReader reader = File.OpenText("filename.txt"))
{
    string line;
    while ((line = reader.ReadLine()) != null) {
        Match match = parts.Match(line);
        if (match.Success) {
            // Captured groups are exposed through the Groups indexer.
            int number = int.Parse(match.Groups[1].Value);
            string path = match.Groups[2].Value;

            // At this point, `number` and `path` contain the values we want
            // for the current line. We can then store those values or print them,
            // or anything else we like.
        }
    }
}

That expression is a bit complex, so here's a breakdown:

^        Start of string
\d+      "\d" means "digit" - 0-9. The "+" means "one or more."
         So this means "one or more digits."
\t       This matches a tab.
(\d+)    This also matches one or more digits. This time, though, we capture it
         using parentheses. This means we can access it through the Groups property.
\t       Another tab.
.+?      "." means "anything." So "one or more of anything". In addition, it's lazy.
         This is to stop it grabbing everything in sight - it'll only grab as much
         as it needs to for the regex to work.
\t       Another tab.

(item\\[^\t]+\.ddj)
    Here's the meat. This matches: "item\<one or more of anything but a tab>.ddj"
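To see the breakdown in action without a file, here is a small sketch that applies the same pattern to a single in-memory line. The `RegexDemo` class and `Describe` method are illustrative names, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // Same pattern as in the answer above.
    private static readonly Regex Parts =
        new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");

    // Returns "<number> <path>" for a matching line, or null when the
    // line does not contain an item\....ddj field.
    public static string Describe(string line)
    {
        Match match = Parts.Match(line);
        if (!match.Success)
            return null;

        return match.Groups[1].Value + " " + match.Groups[2].Value;
    }
}
```

Note that a line with no `.ddj` field, such as the first sample line in the question (it only has a `.bsr` path), does not match at all, so this approach skips it entirely, whereas the Split-based approach still yields the integer.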

Answer 3

You can do it like this:

using (TextReader rdr = OpenYourFile()) {
    string line;
    while ((line = rdr.ReadLine()) != null) {
        string[] fields = line.Split('\t'); // THIS LINE DOES THE MAGIC
        int theInt = Convert.ToInt32(fields[1]);
    }
}

The reason you didn't find relevant results when searching for "formatting" is that what you're doing is called "parsing".

Answer 4

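As has already been mentioned, I'd strongly recommend using regular expressions (in System.Text.RegularExpressions) for this kind of work.

Combined with a utility like RegexBuddy, you should be able to handle just about any complex text record parsing situation and get results quickly. The tool makes it very simple.

Hope that helps.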

Answer 5

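One approach I've found very useful in situations like this is to go old-school and use the Jet OLEDB provider, together with a schema.ini file, to read large tab-delimited files using ADO.NET. Obviously, this approach is only useful if you know the format of the file being imported.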

// Requires: using System.Data; using System.Data.OleDb; using System.IO;
public void ImportCsvFile(string filename)
{
    FileInfo file = new FileInfo(filename);

    using (OleDbConnection con =
            new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" +
            file.DirectoryName + "\";" +
            "Extended Properties='text;HDR=Yes;FMT=TabDelimited';"))
    {
        using (OleDbCommand cmd = new OleDbCommand(string.Format
                                  ("SELECT * FROM [{0}]", file.Name), con))
        {
            con.Open();

            // Using a DataReader to process the data
            using (OleDbDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Process the current reader entry...
                }
            }

            // Using a DataTable to process the data
            using (OleDbDataAdapter adp = new OleDbDataAdapter(cmd))
            {
                DataTable tbl = new DataTable("MyTable");
                adp.Fill(tbl);

                foreach (DataRow row in tbl.Rows)
                {
                    // Process the current row...
                }
            }
        }
    }
}

Once you have the data in a nicely structured form like a DataTable, filtering out the data you need becomes very simple.

Answer 6

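Try regular expressions. You can find a certain pattern in the text and replace it with whatever you want. I can't give you the exact code right now, but you can use this to test your expressions.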

http://www.radsoftware.com.au/regexdesigner/

Answer 7

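You can open the file and read it line by line using StreamReader.ReadLine. Then you can use String.Split to break each line into parts (using the \t separator) to extract the second number.

Since the number of items varies, you'll need to search the string for the pattern 'item\*.ddj'.

To delete items, you could (for example) keep the whole file's contents in memory and write out a new file when the user clicks "Save".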
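One way to read the "keep only what I need" idea is to write each line's second field and its path to a new file. The following is a rough sketch under that assumption. `ItemFileReducer`, `Reduce` and `ReduceFile` are hypothetical names, the tab-separated output format is just one possible choice, and lines without a `.ddj` field are dropped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ItemFileReducer
{
    // Keeps only the second field and the first item\....ddj field of each line.
    public static List<string> Reduce(IEnumerable<string> lines)
    {
        var result = new List<string>();
        foreach (string line in lines)
        {
            string[] fields = line.Split('\t');
            if (fields.Length < 2)
                continue; // blank or malformed line

            // Array.Find returns null when no field matches.
            string path = Array.Find(fields, f =>
                f.StartsWith("item\\", StringComparison.Ordinal) &&
                f.EndsWith(".ddj", StringComparison.Ordinal));

            if (path != null)
                result.Add(fields[1] + "\t" + path);
        }
        return result;
    }

    // File.ReadLines streams the input lazily; the paths are caller-supplied.
    public static void ReduceFile(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, Reduce(File.ReadLines(inputPath)));
    }
}
```

Keeping the filtering in `Reduce` separate from the file I/O makes it easy to try the logic on a few in-memory lines before running it on the full 5000-line file.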

7 answers

Answer 1: 53 (accepted), 2009-05-13 15:59:21

Answer 2: 34, 2009-05-13 16:09:24

Answer 3: 5, 2009-05-13 15:58:02

Answer 4: 1, 2009-05-13 16:15:05

Answer 5: 1, 2009-05-13 16:28:31

Answer 6: 0, 2009-05-13 15:58:09

Answer 7: 0, 2009-05-13 16:00:51
