How to format/parse a text file into CSV using Python (pandas)

Question

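I want to read a text file of test results that is laid out as a single column (one test case per line) and convert it into a CSV file with multiple columns. The columns should be the names of the people who took the tests, and each person's results should go in their own column.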

The column headers in the CSV file will be: "Matt Test, Mark Test, John Test, Mike Test"

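Under each person's column, their results will be ordered from slowest to fastest time. For example, under "Matt Test" he will have 3 rows of trl_matt_test and 6 rows of get_trl_time, "Mark Test" will have 2 rows of trl_mark_test and 3 rows of get_trl_time, and so on. The number of results is different every time, so I can't hard-code the number of rows.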
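One detail worth checking for "slowest to fastest": the times are text such as "15s", and sorting text compares characters, not numbers. A minimal sketch using only the standard library, with Matt's rows from testdata.txt (shortened to one get_trl_time row); the `seconds` helper is hypothetical and assumes every line ends in `<digits>s`:

```python
import re


def seconds(line):
    """Extract the number of seconds from a line such as 'trl_matt_test: 15s'."""
    match = re.search(r'(\d+)s$', line)
    return int(match.group(1))


# Matt's rows from testdata.txt (only one get_trl_time row kept for brevity)
matt_rows = ['trl_matt_test: 15s', 'trl_matt_test: 10s', 'trl_matt_test: 12s', 'get_trl_time: 1s']

# Sorting the raw time strings compares characters:
# '1s' sorts after '15s' because 's' > '5'
times = [line.split(':')[1].strip() for line in matt_rows]
print(sorted(times, reverse=True))  # ['1s', '15s', '12s', '10s']

# Sorting with a numeric key gives slowest to fastest
print(sorted(matt_rows, key=seconds, reverse=True))
# ['trl_matt_test: 15s', 'trl_matt_test: 12s', 'trl_matt_test: 10s', 'get_trl_time: 1s']
```

The same `key=` idea applies in pandas by sorting on a numeric column extracted from the text rather than on the text itself.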

testdata.txt (this is the text file data I'm reading):

trl_matt_test: 15s
trl_matt_test: 10s
trl_matt_test: 12s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
trl_mark_test: 13s
trl_mark_test: 20s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
trl_john_test: 20s
trl_john_test: 25s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
trl_mike_test: 2s
get_trl_time: 1s
get_trl_time: 1s
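Since the file has one result per line and no header row, a minimal pandas sketch for reading it and splitting each line into a test name and a number of seconds might look like this. It assumes every line has the form `<name>: <digits>s`; the column names `test` and `seconds` are hypothetical, and `io.StringIO` stands in for the real file so the example is self-contained:

```python
import io

import pandas as pd

# A few lines in the same format as testdata.txt; with the real file,
# pass the path "testdata.txt" instead of the StringIO object
sample = io.StringIO(
    "trl_matt_test: 15s\n"
    "trl_matt_test: 10s\n"
    "get_trl_time: 1s\n"
    "trl_mark_test: 13s\n"
    "get_trl_time: 1s\n"
)

# The file has no header line, so header=None keeps the first line as data
df = pd.read_csv(sample, header=None, names=['Results'])

# Split each line into the test name and the number of seconds
# (assumed pattern: "<name>: <digits>s")
parts = df['Results'].str.extract(r'^(?P<test>\w+):\s*(?P<seconds>\d+)s$')
df['test'] = parts['test']
df['seconds'] = parts['seconds'].astype(int)
print(df)
```

With a numeric `seconds` column, the rows can later be sorted by time instead of by their text.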

# I want to use pandas and a DataFrame if possible
import pandas as pd

# These are the headers I want to use for the columns in the CSV
header_list = ['Matt Test', 'Mark Test', 'John Test', 'Mike Test']

# I want to use a substring of the test name as a delimiter of where to split
# (lowercase, because the test names in the file are lowercase)
delimiter_list = ['mark', 'john', 'mike']

# I want to store the row number where each delimiter first appears, to know
# how many rows of data each person has
delimiter_row_nums = []

# The idea behind this is that I can know that Matt's tests are in rows 0-15 and
# Mark's tests are in rows 16-20, etc. This is just an example, but then I can
# create a list for Matt's data [0:15], a list for Mark's data [16:20], etc.

# Read the file with pandas; it has no header line, so use header=None and
# name the single column explicitly
data_file = pd.read_csv("testdata.txt", header=None, names=['Results'])

# for each element in the delimiter list
for delim in delimiter_list:
    # for each row (line) in the file, together with its row number
    for row_num, row in enumerate(data_file['Results']):
        # if the delimiter is a substring of this row/line, record the row
        # number and stop at the first match
        if row.find(delim) != -1:
            delimiter_row_nums.append(row_num)
            break

# take the new lists, sort them, then place them under their respective headers
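The row-number idea above can be finished by slicing the lines between consecutive start rows. A minimal standard-library sketch; `split_by_first_match` is a hypothetical helper, and it assumes every name in `delimiter_list` occurs in the file and that the people appear in the same order as `header_list` (otherwise the slices will be misaligned):

```python
def split_by_first_match(lines, delimiters):
    """Hypothetical helper: split lines into blocks at the first row containing each delimiter."""
    starts = [0]  # the first person's block always starts at row 0
    for delim in delimiters:
        for row_num, line in enumerate(lines):
            if delim in line:
                starts.append(row_num)
                break  # only the first occurrence marks the start of a block
    ends = starts[1:] + [len(lines)]
    return [lines[start:end] for start, end in zip(starts, ends)]


# The 29 lines of testdata.txt, in file order
lines = (
    ['trl_matt_test: 15s', 'trl_matt_test: 10s', 'trl_matt_test: 12s'] + ['get_trl_time: 1s'] * 6
    + ['trl_mark_test: 13s', 'trl_mark_test: 20s'] + ['get_trl_time: 1s'] * 3
    + ['trl_john_test: 20s', 'trl_john_test: 25s'] + ['get_trl_time: 1s'] * 10
    + ['trl_mike_test: 2s'] + ['get_trl_time: 1s'] * 2
)
header_list = ['Matt Test', 'Mark Test', 'John Test', 'Mike Test']
delimiter_list = ['mark', 'john', 'mike']

blocks = dict(zip(header_list, split_by_first_match(lines, delimiter_list)))
for name, rows in blocks.items():
    print(name, len(rows))  # Matt Test 9, Mark Test 5, John Test 12, Mike Test 3
```

The start rows found here are 0, 9, 14 and 26, so no row count is hard-coded; each block can then be sorted and placed under its header.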

Answer 1

It's not entirely clear what you're looking for, but this might get you started. I created a txt file with the data you provided.

import pandas as pd

# the file has no header line, so header=None keeps the first result
# (trl_matt_test: 15s) as data instead of using it as the header row
df = pd.read_csv('testdata.txt', header=None, names=['Results'])

# map the tester to the data: 'trl_matt_test: 15s' -> 'matt_test' -> 'Matt Test';
# get_trl_time lines map to NaN and are filled in from a neighbouring test line
dd = df.Results.str.split('_', n=1).str[1].str.split(':').str[0]
cmap = {'matt_test': 'Matt Test', 'mark_test': 'Mark Test', 'john_test': 'John Test', 'mike_test': 'Mike Test'}
df['Tester'] = dd.map(cmap).ffill()  # not sure here whether you want a forward fill or a back fill (.bfill())

# re-orient the data: one column per tester
df_pivot = df.pivot(columns=['Tester'])

                       Results
Tester           John Test           Mark Test           Matt Test          Mike Test
0                      NaN                 NaN  trl_matt_test: 15s                NaN
1                      NaN                 NaN  trl_matt_test: 10s                NaN
2                      NaN                 NaN  trl_matt_test: 12s                NaN
3                      NaN                 NaN    get_trl_time: 1s                NaN
4                      NaN                 NaN    get_trl_time: 1s                NaN
5                      NaN                 NaN    get_trl_time: 1s                NaN
6                      NaN                 NaN    get_trl_time: 1s                NaN
7                      NaN                 NaN    get_trl_time: 1s                NaN
8                      NaN                 NaN    get_trl_time: 1s                NaN
9                      NaN  trl_mark_test: 13s                 NaN                NaN
10                     NaN  trl_mark_test: 20s                 NaN                NaN
11                     NaN    get_trl_time: 1s                 NaN                NaN
12                     NaN    get_trl_time: 1s                 NaN                NaN
13                     NaN    get_trl_time: 1s                 NaN                NaN
14      trl_john_test: 20s                 NaN                 NaN                NaN
15      trl_john_test: 25s                 NaN                 NaN                NaN
16        get_trl_time: 1s                 NaN                 NaN                NaN
17        get_trl_time: 1s                 NaN                 NaN                NaN
18        get_trl_time: 1s                 NaN                 NaN                NaN
19        get_trl_time: 1s                 NaN                 NaN                NaN
20        get_trl_time: 1s                 NaN                 NaN                NaN
21        get_trl_time: 1s                 NaN                 NaN                NaN
22        get_trl_time: 1s                 NaN                 NaN                NaN
23        get_trl_time: 1s                 NaN                 NaN                NaN
24        get_trl_time: 1s                 NaN                 NaN                NaN
25        get_trl_time: 1s                 NaN                 NaN                NaN
26                     NaN                 NaN                 NaN  trl_mike_test: 2s
27                     NaN                 NaN                 NaN   get_trl_time: 1s
28                     NaN                 NaN                 NaN   get_trl_time: 1s

# do a count
df_pivot.count()

         Tester
Results  John Test    12
         Mark Test     5
         Matt Test     9
         Mike Test     3
dtype: int64
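The pivot keeps each result on its original row, so the columns are still mostly NaN and not yet sorted. One possible way to finish the job (sort each tester's rows slowest to fastest and produce the CSV) is sketched below; instead of pivoting, it builds one column per tester and lets pandas pad the shorter columns with NaN. It assumes test lines look like `trl_<name>_test: <digits>s`, and it uses a shortened stand-in for testdata.txt:

```python
import io

import pandas as pd

# Shortened stand-in for testdata.txt (use the file path with the real data)
text = (
    "trl_matt_test: 15s\n"
    "trl_matt_test: 10s\n"
    "trl_matt_test: 12s\n"
    "get_trl_time: 1s\n"
    "trl_mark_test: 13s\n"
    "trl_mark_test: 20s\n"
    "get_trl_time: 1s\n"
    "trl_mike_test: 2s\n"
)
df = pd.read_csv(io.StringIO(text), header=None, names=['Results'])

# Tester label from "trl_<name>_test" lines, forward-filled onto the
# get_trl_time lines that follow them (the same ffill idea as above)
name = df['Results'].str.extract(r'^trl_(\w+)_test:', expand=False)
df['Tester'] = name.ffill().str.title() + ' Test'

# Numeric time, used only for sorting
df['seconds'] = df['Results'].str.extract(r'(\d+)s$', expand=False).astype(int)

# One column per tester, sorted slowest to fastest; sort=False keeps the
# testers in file order, and shorter columns are padded with NaN
columns = {
    tester: group.sort_values('seconds', ascending=False, kind='stable')['Results'].reset_index(drop=True)
    for tester, group in df.groupby('Tester', sort=False)
}
wide = pd.DataFrame(columns)

# NaN cells become empty fields; pass a file path to to_csv to write a file
csv_text = wide.to_csv(index=False)
print(csv_text)
```

Because the column lengths come from the data, the number of rows per person never has to be hard-coded.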

Solution 1
1 2021-05-14 17:32:22