Columns not separated by commas when converting multiple .txt files to .csv in Python

Question

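I have a folder of 4000 txt files containing time series data that I want to analyze with pandas and similar tools. I've managed to rename them and convert them to csv files, but the columns are not grouped correctly. I've read several posts here and watched some videos on parsing txt files, but nothing I've tried has worked so far.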

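Here is an example of one of the txt files. In Notepad I can't see any leading or trailing spaces, tabs, or newlines:

Here is the code I'm working with, mostly taken from here: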

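import csv

file = 'patient0.txt'
csv_f = "patient0.csv"
with open(file, 'r') as in_txt:
    # strip every line and drop the empty ones
    stripped = (line.strip() for line in in_txt)
    lines = (line for line in stripped if line)
    # take the non-empty lines three at a time as one output row
    grouped = zip(*[lines]*3)
    # newline='' is what the csv module expects for files it writes to
    with open(csv_f, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerows(grouped)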
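The zip(*[lines]*3) idiom passes the same iterator to zip three times, so each tuple consumes the next three non-empty lines, and any leftover lines at the end are silently dropped. That only yields sensible rows if every record spans exactly three lines. If each line is already a complete Time,Parameter,Value record (an assumption, since the sample file is not shown here), three whole records end up in one row, and csv.writer quotes each of them because they contain commas. A small stand-alone sketch with made-up sample text (group_nonempty_lines is a hypothetical helper name):

```python
import csv
import io


def group_nonempty_lines(text, n=3):
    """Mimic the question's pipeline: strip lines, drop empty ones, chunk by n."""
    stripped = (line.strip() for line in text.splitlines())
    lines = (line for line in stripped if line)
    # zip() pulls from the *same* iterator n times per tuple, so it chunks it;
    # an incomplete final chunk is discarded
    return list(zip(*[lines] * n))


# hypothetical sample: one complete record per line, plus a header line
sample = "Time,Parameter,Value\n00:00,RecordID,5\n00:00,Age,73\n\n00:42,PaCO2,34\n"

buf = io.StringIO()
csv.writer(buf).writerows(group_nonempty_lines(sample))
# each cell holds a whole comma-separated line, so the writer quotes it
print(buf.getvalue())
```

Note that the fourth record (00:42,PaCO2,34) never reaches the output, because it does not complete a group of three.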

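Here is the resulting csv file.

This is how I need it formatted:

I only learned about generators today. Here is what they contain when converted to lists: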

Answer 1

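It looks like the colons in the original csv mark where the line breaks should be, so convert those colons in the original text file to newlines and then save it as a csv. It should then parse easily with: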

import pandas as pd

df = pd.read_csv(csv_file_name)
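For a quick check without pandas, the standard library does the same comma splitting. This sketch assumes (not confirmed by the question) that, once the line breaks are fixed, the file has a Time,Parameter,Value header followed by one record per line; read_records is a hypothetical helper, and the sample text is made up. With pandas, pd.read_csv(io.StringIO(text)) would read the same string.

```python
import csv
import io


def read_records(text):
    """Parse assumed 'Time,Parameter,Value' text into a list of dicts."""
    # DictReader takes the column names from the first (header) line
    return list(csv.DictReader(io.StringIO(text)))


text = "Time,Parameter,Value\n00:00,RecordID,5\n00:00,Age,73\n"
records = read_records(text)
print(records[1]["Parameter"], records[1]["Value"])  # Age 73
```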

Answer 2

**EDIT**

I just realized this does not produce the exact format you want. I'm leaving it here in case others find it useful.

**EDIT**

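The data you're working with looks like a custom format made of key, value pairs. I'm not sure you want to use the csv module to read these files (although it is very useful for writing the output csv file).

The format is as follows:

Different lines may carry different parameters (I can't tell from the very small data snippet you provided). It looks like you prepended "Time,Parameter,Value" to the file, which is why we see the odd "Value00:00" entry. I think you meant to put a newline after Value.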
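If that is what happened, the missing newline can be put back before any parsing. A minimal sketch, assuming the header text is exactly Time,Parameter,Value and appears only at the very start of the file (split_glued_header is a hypothetical name):

```python
HEADER = "Time,Parameter,Value"


def split_glued_header(text):
    """Re-insert a newline if the header runs straight into the first record."""
    rest = text[len(HEADER):]
    # only touch the start of the text, and only when the newline is missing
    if text.startswith(HEADER) and not rest.startswith(("\n", "\r")):
        return HEADER + "\n" + rest
    return text


print(split_glued_header("Time,Parameter,Value00:00,RecordID,5\n"))
```

Files that already have the newline are returned unchanged, so the function is safe to run over every file in the folder.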

I made a dummy file with some data, the way I think yours looks:

00:00, RecordID,5,Age,73
00:42,PaCO2,3400:42,PaO2,34401:11
01:11,SysABP,10501:11,Temp,35.201:11

Here, the only column names we expect in the output csv file are

RecordID, Age, PaCO2, PaO2, SysABP, Temp

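We need a first pass over the file to discover all of them. Once we have them, we can create a csv.DictWriter with those columns. Then we go through the input file a second time, turning each line into a dict and writing it out.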
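The blank cells in the result come from csv.DictWriter itself: any field name missing from a row dict is written as restval, which defaults to an empty string. Conversely, extrasaction defaults to 'raise', so a key that is not in fieldnames raises ValueError, which is why every column name has to be known before writing starts. A small in-memory illustration with made-up rows shaped like the dummy file:

```python
import csv
import io

fieldnames = ["Age", "RecordID", "timestamp"]
rows = [
    {"timestamp": "00:00", "RecordID": "5", "Age": "73"},
    {"timestamp": "00:42"},  # this line had no RecordID or Age
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)  # missing keys are filled with restval ("")
print(buf.getvalue())
```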

I tested this script successfully on the dummy file above. Hopefully the comments in the script make it clear what is going on.

import csv
import os


def txt_to_csv(input_filenames):

    for input_filename in input_filenames:
        column_names = set()
        # swap the .txt extension for .csv
        output_filename = os.path.splitext(input_filename)[0] + '.csv'
        # Python 3: open in text mode, since the csv module works with str, not bytes
        with open(input_filename, 'r', newline='') as in_txt:

            # figure out which column names appear on at least one line
            for line in in_txt:

                # get a list of parameters that were split by comma in the input txt file
                params = line.strip().split(",")

                # params[1::2] slices out every other entry starting with the first column name;
                # OR-ing them into a set keeps only one copy of each unique column name.
                # Starting at 1 always skips the first entry, the timestamp.
                column_names |= set(params[1::2])

            # strip off any extra whitespace in column names
            column_names = {x.strip() for x in column_names}

            # add the timestamp column, which is not part of any key/value pair
            column_names.add('timestamp')

            # sort column names so the header order is deterministic
            sorted_column_names = sorted(column_names)

            # bring the pointer back to the beginning of the file
            in_txt.seek(0)

            # open a csv file and start writing the output
            # (newline='' stops blank lines appearing between rows on Windows)
            with open(output_filename, 'w', newline='') as out_csv:
                writer = csv.DictWriter(out_csv, sorted_column_names, dialect='excel')

                # write column names
                writer.writeheader()

                for line in in_txt:
                    # skip blank lines instead of writing empty rows
                    if not line.strip():
                        continue

                    # create a list of values for this line
                    params = [x.strip() for x in line.strip().split(",")]

                    # turn key value pairs into a dictionary
                    row_dict = dict(zip(params[1::2], params[2::2]))

                    # write timestamp entry to the dictionary
                    row_dict['timestamp'] = params[0]

                    # write row to file
                    writer.writerow(row_dict)


if __name__ == '__main__':
    input_filenames = [r'C:\Users\cruse\Desktop\dummy_data.txt']
    txt_to_csv(input_filenames)

The output I get is:

Age PaCO2       PaO2      RecordID  SysABP      Temp            timestamp
73                        5                                     0:00
    3400:42:00  34401:11                                        0:42
                                    10501:11    35.201:11       1:11

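which is correct for this dataset. You would then use something like pandas to propagate values forward in time (for example, forward-fill with DataFrame.ffill, or DataFrame.fillna with a forward-fill method, so that the same RecordID is assigned to all subsequent rows).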
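Forward filling just carries the last non-empty value of a column down to the following rows. In pandas that is roughly df['RecordID'] = df['RecordID'].ffill(), since read_csv turns empty fields into NaN by default. For readers without pandas, a pure-Python sketch, assuming rows are dicts (as csv.DictReader would return them) with empty strings for missing cells; forward_fill is a hypothetical helper:

```python
def forward_fill(rows, columns):
    """Return copies of rows where empty cells in `columns` take the last seen value."""
    last = {}
    filled = []
    for row in rows:
        new_row = dict(row)  # do not modify the caller's dicts
        for col in columns:
            value = new_row.get(col, "")
            if value != "":
                last[col] = value  # remember the most recent real value
            elif col in last:
                new_row[col] = last[col]  # fill the gap from above
        filled.append(new_row)
    return filled


rows = [
    {"timestamp": "00:00", "RecordID": "5", "Age": "73"},
    {"timestamp": "00:42", "RecordID": "", "Age": ""},
]
print(forward_fill(rows, ["RecordID"]))
```

Only the listed columns are filled, so measurements such as PaCO2 can stay empty where they were not recorded.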

To process more files, just add more paths to the input_filenames list at the bottom.
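With about 4000 files, listing every path by hand is impractical. A sketch that builds the list from a folder instead, assuming all inputs end in .txt and sit directly inside one directory (list_txt_files and the folder path are placeholders):

```python
from pathlib import Path


def list_txt_files(folder):
    """Return the .txt files directly inside `folder`, sorted by name, as strings."""
    return sorted(str(p) for p in Path(folder).glob("*.txt"))


# Hypothetical usage with the txt_to_csv function above:
# txt_to_csv(list_txt_files(r"C:\path\to\txt_folder"))
```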


2 solutions

Solution 1
1  2016-12-05 23:56:45

Solution 2
1  accepted  2016-12-06 00:58:19
