How to format/parse a text file into a CSV using Python (pandas)
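I want to read a text file containing test results as a single column (one test case per line) and convert it into a CSV file with multiple columns, where each column is headed by the name of a person who took the tests and holds that person's results.
The column headers in the CSV file will be: "Matt Test, Mark Test, John Test, Mike Test"
Under each person's column, their results should be ordered from slowest to fastest time. For example, "Matt Test" will have 3 rows of trl_matt_test and 6 rows of get_trl_time, "Mark Test" will have 2 rows of trl_mark_test and 3 rows of get_trl_time, and so on. Each run produces a different number of results, so I can't hard-code the number of rows.
testdata.txt (this is the text file data I am reading):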
trl_matt_test: 15s
trl_matt_test: 10s
trl_matt_test: 12s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
trl_mark_test: 13s
trl_mark_test: 20s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
trl_john_test: 20s
trl_john_test:25s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
trl_mike_test: 2s
get_trl_time: 1s
get_trl_time: 1s
# I want to use pandas and a DataFrame if possible
import pandas as pd
# These are the headers I want to use for the columns in the CSV
header_list = ['Matt Test', 'Mark Test', 'John Test', 'Mike Test']
# I want to use a substring of the test name as a delimiter of where to split off
# (lowercase, to match the names as they appear in the file)
delimiter_list = ['mark', 'john', 'mike']
# I want to store the row number where each delimiter first appears to know how many rows
# of data each person has
delimiter_row_nums = []
# the idea behind this is I can know that Matt's tests are from rows 0-15 and that Mark's
# tests are from rows 16-20 etc... this
# is just an example but then I can create a list for Matt's data [0:15] then a list of
# Mark's data [16:20] etc...
# read the file with pandas, one line per row, into a single column named "Results"
data_file = pd.read_csv("testdata.txt", header=None, names=["Results"])
# for each element in the delimiter list
for delim in delimiter_list:
    # for each row or line in the file, using count as the row number
    for count, row in enumerate(data_file["Results"]):
        # if an element in delimiter list is a substring of a row/line in the data file
        if row.find(delim) != -1:
            # record the first row of this person's data and move on to the next delimiter
            delimiter_row_nums.append(count)
            break
# take the new lists and sort them, then place them under their respective headers
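The row-number idea described in the comments above could be sketched in plain Python roughly as follows. This is only an illustration under assumptions: the helper names `split_by_start_rows` and `seconds` are hypothetical, the sample is a shortened version of the file, every marker is assumed to occur at least once, and every line is assumed to end in a whole number of seconds followed by `s`.

```python
import re

def split_by_start_rows(lines, markers):
    """Group lines into one list per marker (hypothetical helper).

    Each block starts at the first row containing its marker and runs up to
    the row where the next marker first appears.
    """
    # Row number of the first line containing each marker.
    starts = [next(i for i, line in enumerate(lines) if m in line) for m in markers]
    # A block ends where the next one starts; the last block ends at the end of the data.
    ends = starts[1:] + [len(lines)]
    return {m: lines[s:e] for m, s, e in zip(markers, starts, ends)}

def seconds(line):
    """Extract the integer number of seconds from a line such as 'trl_matt_test: 15s'."""
    return int(re.search(r"(\d+)\s*s\s*$", line).group(1))

# Shortened sample of the file contents (assumption: same layout as testdata.txt).
sample = [
    "trl_matt_test: 15s",
    "trl_matt_test: 10s",
    "get_trl_time: 1s",
    "trl_mark_test: 13s",
    "trl_mark_test: 20s",
    "get_trl_time: 1s",
]

blocks = split_by_start_rows(sample, ["matt", "mark"])
# Sort each person's rows from slowest (largest time) to fastest.
sorted_blocks = {m: sorted(rows, key=seconds, reverse=True) for m, rows in blocks.items()}
```

Because the blocks are found by searching rather than by fixed row ranges, the number of rows per person does not need to be hard-coded.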
It's not entirely clear what you're looking for, but this might get you started. I created a txt file with the data you provided.
import pandas as pd
# header=None keeps the first line of the file as data instead of consuming it as a header
df = pd.read_csv('testdata.txt', header=None, names=['Results'])
# map the tester to the data
dd = df.Results.str.split('_', n=1).str[1].str.split(':').str[0]
cmap = {'matt_test': 'Matt Test', 'mark_test': 'Mark Test', 'john_test': 'John Test', 'mike_test': 'Mike Test'}
df['Tester'] = dd.map(cmap).ffill() # not sure here if you want forward or back fill
# re-orient the data
df_pivot = df.pivot(columns=['Tester'])
                  Results
Tester           John Test           Mark Test           Matt Test           Mike Test
0                      NaN                 NaN  trl_matt_test: 15s                 NaN
1                      NaN                 NaN  trl_matt_test: 10s                 NaN
2                      NaN                 NaN  trl_matt_test: 12s                 NaN
3                      NaN                 NaN    get_trl_time: 1s                 NaN
4                      NaN                 NaN    get_trl_time: 1s                 NaN
5                      NaN                 NaN    get_trl_time: 1s                 NaN
6                      NaN                 NaN    get_trl_time: 1s                 NaN
7                      NaN                 NaN    get_trl_time: 1s                 NaN
8                      NaN                 NaN    get_trl_time: 1s                 NaN
9                      NaN  trl_mark_test: 13s                 NaN                 NaN
10                     NaN  trl_mark_test: 20s                 NaN                 NaN
11                     NaN    get_trl_time: 1s                 NaN                 NaN
12                     NaN    get_trl_time: 1s                 NaN                 NaN
13                     NaN    get_trl_time: 1s                 NaN                 NaN
14      trl_john_test: 20s                 NaN                 NaN                 NaN
15       trl_john_test:25s                 NaN                 NaN                 NaN
16        get_trl_time: 1s                 NaN                 NaN                 NaN
17        get_trl_time: 1s                 NaN                 NaN                 NaN
18        get_trl_time: 1s                 NaN                 NaN                 NaN
19        get_trl_time: 1s                 NaN                 NaN                 NaN
20        get_trl_time: 1s                 NaN                 NaN                 NaN
21        get_trl_time: 1s                 NaN                 NaN                 NaN
22        get_trl_time: 1s                 NaN                 NaN                 NaN
23        get_trl_time: 1s                 NaN                 NaN                 NaN
24        get_trl_time: 1s                 NaN                 NaN                 NaN
25        get_trl_time: 1s                 NaN                 NaN                 NaN
26                     NaN                 NaN                 NaN   trl_mike_test: 2s
27                     NaN                 NaN                 NaN    get_trl_time: 1s
28                     NaN                 NaN                 NaN    get_trl_time: 1s
# do a count
df_pivot.count()
         Tester
Results  John Test    12
         Mark Test     5
         Matt Test     9
         Mike Test     3
dtype: int64
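To get from there to the layout the question asks for (each person's results packed at the top of their own column, slowest first, then saved as CSV), something along these lines might work. It is a sketch under assumptions rather than a definitive solution: the inline `raw` text is a shortened stand-in for testdata.txt, the `Seconds` and `Row` helper columns are names introduced here for illustration, and every line is assumed to end in a whole number of seconds followed by `s`.

```python
import io
import pandas as pd

# Shortened stand-in for testdata.txt (assumption: same layout as the real file).
raw = """trl_matt_test: 15s
trl_matt_test: 10s
trl_matt_test: 12s
get_trl_time: 1s
trl_mark_test: 13s
trl_mark_test: 20s
get_trl_time: 1s
trl_john_test: 20s
trl_john_test:25s
get_trl_time: 1s
trl_mike_test: 2s
get_trl_time: 1s
"""
df = pd.read_csv(io.StringIO(raw), header=None, names=["Results"])

cmap = {"matt_test": "Matt Test", "mark_test": "Mark Test",
        "john_test": "John Test", "mike_test": "Mike Test"}

# Tag every row with its tester; get_trl_time rows inherit the tester above them.
key = df["Results"].str.split("_", n=1).str[1].str.split(":").str[0]
df["Tester"] = key.map(cmap).ffill()

# Numeric time in seconds, used only for sorting (hypothetical helper column).
df["Seconds"] = df["Results"].str.extract(r"(\d+)\s*s\s*$", expand=False).astype(int)

# Sort each tester's rows slowest first, then number the rows within each tester.
df = df.sort_values(["Tester", "Seconds"], ascending=[True, False])
df["Row"] = df.groupby("Tester").cumcount()

# One column per tester, with results packed from the top instead of staggered.
wide = df.pivot(index="Row", columns="Tester", values="Results")
wide = wide.reindex(columns=list(cmap.values()))

# With no path, to_csv returns the CSV text; pass a file name to write a file instead.
csv_text = wide.to_csv(index=False)
```

Columns with fewer results are padded with NaN, which `to_csv` writes as empty cells, so the number of rows per person does not have to be known in advance.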