
How to format/parse a text file into a CSV using Python (pandas)

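I want to read a text file that holds test results in a single column (one test case per line) and convert it into a CSV file with multiple columns, where each column is headed by the name of a person who took the test and contains that person's results.

The column headers in the CSV file will be: "Matt Test, Mark Test, John Test, Mike Test"

Under each person's column, their results will be ordered from slowest to fastest time. For example, under "Matt Test" there will be 3 rows of trl_matt_test and 6 rows of get_trl_time, "Mark Test" will have 2 rows of trl_mark_test and 3 rows of get_trl_time, and so on. Each run produces a different number of results, so I cannot hard-code the number of rows.

testdata.txt (this is the text file data I am reading):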

trl_matt_test: 15s
trl_matt_test: 10s
trl_matt_test: 12s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
trl_mark_test: 13s
trl_mark_test: 20s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
trl_john_test: 20s
trl_john_test:25s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
trl_mike_test: 2s
get_trl_time: 1s
get_trl_time: 1s

# I want to use pandas and data frame if possible
import pandas as pd

# These are the headers I want to use for the columns in the CSV
header_list = ['Matt Test', 'Mark Test', 'John Test', 'Mike Test']

# I want to use a substring of the test name as a delimiter of where to split off
# (lowercase, because the test names in the file are lowercase)
delimiter_list = ['mark', 'john', 'mike']

# I want to put the row number where the delimiter is to know how many rows
# of data each person has
delimiter_row_nums = []

# the idea behind this is I can know that Matts test are from rows 0-15 and that Marks
# test are from rows 16-20 etc... this
# is just an example but then I can create a list for Matts data [0:15] then a list of 
# Marks data [16:20] etc...    

# read the file with pandas as a single column; header=None keeps the first
# line as data, and the column name goes in names, not in header
data_file = pd.read_csv("testdata.txt", header=None, names=["Results"])

# for each element in the delimiter list
for delim in delimiter_list:
    # for each row or line in the file, together with its row number
    for row_num, row in enumerate(data_file["Results"]):
        # if an element in delimiter list is a substring of a row/line in the data file
        if row.find(delim) != -1:
            # remember the first row where this person's results start
            delimiter_row_nums.append(row_num)
            break

# take the new lists and sort them, then place them under their respective headers
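
The row-range idea described in the comments above can also be sketched without a hand-written delimiter list. This is only an illustration under one assumption: every person's block starts with a line of the form trl_<name>_test and the get_trl_time lines that follow belong to that person. The helper name split_by_person and the sample lines are hypothetical.

```python
def split_by_person(lines):
    """Group each line under the most recent trl_<name>_test line seen."""
    groups = {}      # person key (e.g. "matt") -> list of that person's lines
    current = None   # person whose block we are currently inside
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key = line.split(":", 1)[0]
        # Assumption: a line named trl_<name>_test opens (or continues) a block.
        if key.startswith("trl_") and key.endswith("_test"):
            current = key[len("trl_"):-len("_test")]
        if current is not None:
            groups.setdefault(current, []).append(line)
    return groups


# Hypothetical sample in the same shape as testdata.txt
sample = [
    "trl_matt_test: 15s",
    "trl_matt_test: 10s",
    "get_trl_time: 1s",
    "get_trl_time: 1s",
    "trl_mark_test: 13s",
    "get_trl_time: 1s",
]
groups = split_by_person(sample)
counts = {person: len(rows) for person, rows in groups.items()}
```

Because the block boundaries are found while reading, the number of rows per person never has to be known in advance.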

It is still not entirely clear what you are looking for, but this might get you started. I created a txt file with the data you provided.

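import pandas as pd

# read every line into a single column named Results
# note: header=0 consumes the first line of the file as a header row, so
# "trl_matt_test: 15s" is missing from the output below; use header=None to keep it
df = pd.read_csv('testdata.txt', header=0, names=['Results'])

# map the tester to the data
dd = df.Results.str.split('_', n=1).str[1].str.split(':').str[0]
cmap = {'matt_test': 'Matt Test', 'mark_test': 'Mark Test', 'john_test': 'John Test', 'mike_test': 'Mike Test'}
df['Tester'] = dd.map(cmap).ffill()  # not sure here if you want forward or back fill

# re-orient the data
df_pivot = df.pivot(columns=['Tester'])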

                       Results
Tester           John Test           Mark Test           Matt Test          Mike Test
0                      NaN                 NaN  trl_matt_test: 10s                NaN
1                      NaN                 NaN  trl_matt_test: 12s                NaN
2                      NaN                 NaN    get_trl_time: 1s                NaN
3                      NaN                 NaN    get_trl_time: 1s                NaN
4                      NaN                 NaN    get_trl_time: 1s                NaN
5                      NaN                 NaN    get_trl_time: 1s                NaN
6                      NaN                 NaN    get_trl_time: 1s                NaN
7                      NaN                 NaN    get_trl_time: 1s                NaN
8                      NaN  trl_mark_test: 13s                 NaN                NaN
9                      NaN  trl_mark_test: 20s                 NaN                NaN
10                     NaN    get_trl_time: 1s                 NaN                NaN
11                     NaN    get_trl_time: 1s                 NaN                NaN
12                     NaN    get_trl_time: 1s                 NaN                NaN
13      trl_john_test: 20s                 NaN                 NaN                NaN
14       trl_john_test:25s                 NaN                 NaN                NaN
15        get_trl_time: 1s                 NaN                 NaN                NaN
16        get_trl_time: 1s                 NaN                 NaN                NaN
17        get_trl_time: 1s                 NaN                 NaN                NaN
18        get_trl_time: 1s                 NaN                 NaN                NaN
19        get_trl_time: 1s                 NaN                 NaN                NaN
20        get_trl_time: 1s                 NaN                 NaN                NaN
21        get_trl_time: 1s                 NaN                 NaN                NaN
22        get_trl_time: 1s                 NaN                 NaN                NaN
23        get_trl_time: 1s                 NaN                 NaN                NaN
24        get_trl_time: 1s                 NaN                 NaN                NaN
25                     NaN                 NaN                 NaN  trl_mike_test: 2s
26                     NaN                 NaN                 NaN   get_trl_time: 1s
27                     NaN                 NaN                 NaN   get_trl_time: 1s



# do a count
df_pivot.count()

         Tester
Results  John Test    12
         Mark Test     5
         Matt Test     8
         Mike Test     3
dtype: int64
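
To get from here to the layout the question asks for (one dense column per person, slowest result first), something along the following lines might work. It is a sketch, not a definitive implementation: the function name results_to_columns and the sample text are hypothetical, the header is assumed to be the title-cased name from trl_<name>_test plus " Test", and every line is assumed to end in a whole number of seconds followed by "s".

```python
import io

import pandas as pd

# Hypothetical sample; in practice pass the path "testdata.txt" instead.
SAMPLE = """\
trl_matt_test: 15s
trl_matt_test: 10s
get_trl_time: 1s
trl_mark_test: 13s
get_trl_time: 1s
get_trl_time: 1s
trl_john_test:25s
"""


def results_to_columns(source):
    # One line per row; header=None keeps the first line as data.
    df = pd.read_csv(source, header=None, names=["Results"])

    # Take the person's name from lines such as "trl_matt_test: 15s".
    # get_trl_time lines yield NaN and inherit the name from the line above.
    name = df["Results"].str.extract(r"^trl_(\w+?)_test", expand=False)
    df["Tester"] = name.map(lambda n: f"{n.title()} Test", na_action="ignore").ffill()

    # Numeric seconds, used only for ordering (assumes every line ends in "<digits>s").
    df["Seconds"] = df["Results"].str.extract(r"(\d+)\s*s\s*$", expand=False).astype(int)

    # Slowest first within each person; sort=False keeps people in file order.
    columns = {}
    for tester, group in df.groupby("Tester", sort=False):
        ordered = group.sort_values("Seconds", ascending=False, kind="stable")
        columns[tester] = ordered["Results"].reset_index(drop=True)

    # Columns of different lengths are padded with NaN when combined.
    return pd.DataFrame(columns)


wide = results_to_columns(io.StringIO(SAMPLE))
csv_text = wide.to_csv(index=False)  # pass a file path to to_csv to write a file instead
```

Resetting the index of each person's results is what removes the staircase of NaN values seen in the pivot: every column starts at row 0, and shorter columns are simply left empty at the bottom of the CSV.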
