How to process data in a text file

Question

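I am trying to write a program that takes a large data file of integers and creates a new CSV in a different format, where it takes the x, y, z values from 30 rows and merges them into a single row.

The large dataset has the format (timestamp, x, y, z)

For example: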

0.000, 5, 6, 8,

1.000, -6, 7, 9,

2.000, -15, 25, 23,

Or:

timestamp, x1, y1, z1

timestamp, x2, y2, z2

timestamp, x3, y3, z3

The new dataset would look like this:

delta timestamp, x1, y1, z1, x2, y2, z2, x3, y3, z3....x30, y30, z30,

delta timestamp, x31, y31, z31, x32, y32, z32, x33,... x60, y60, z60,

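And so on (each row contains 30 x, y, z triples).

My idea was to add a \n after every 30 rows and then replace each newline with a comma. My code below does not work, though. It just adds an extra comma, so the new data looks like this: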

timestamp, x1, y1, z1,, timestamp, x2, y2, z2,, timestamp...

Do you have any ideas?

list = []
import csv
i=0
results = []
with open('bikefall.csv', newline='') as inputfile:
    for row in csv.reader(inputfile):
        i+=1
        if i%30==0:
            results.append(row)
            results.append('\n')
        else:
            results.append(row)

print("\n".join([item.replace('\n', ',') for item in 
open('bikefall.csv').read().split('\n\n')]))

Answer 1

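I don't know how you want to calculate the delta, so I just put in a placeholder function.

As for your code, you can improve it slightly by using enumerate, so you don't have to update i manually.

You can also use slice notation to get the first 4 items of each row in the CSV file.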

import csv

def calculate_delta(timestamps):
    pass

output = ""

with open('bikefall.csv', "r") as inputfile:
    timestamps = []
    results = []
    for i, row in enumerate(csv.reader(inputfile)):
        timestamp, x, y, z = row[:4]
        timestamps.append(timestamp)
        results.extend((x, y, z))
        if len(timestamps) == 30:
            delta = calculate_delta(timestamps)
            str_timestamps = ", ".join(results)
            output += "{}, {}\n".format(delta, str_timestamps)
            timestamps = []
            results = []

print(output)

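This code has a bug: what happens when the CSV has only 29 rows?

Those 29 rows will be ignored, so you still need to check whether the current row is the last one in the CSV file and handle it accordingly.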
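
One way to handle the leftover rows is to flush whatever has accumulated once the loop ends, rather than detecting the last row inside the loop. The sketch below assumes, purely for illustration, that the delta is the last timestamp minus the first one in each group; the question does not say how it should be computed. The names `merge_rows` and `group_size` are hypothetical, and an in-memory sample stands in for bikefall.csv.

```python
import csv
import io


def calculate_delta(timestamps):
    # Assumption: delta = last timestamp minus first timestamp in the group.
    return float(timestamps[-1]) - float(timestamps[0])


def merge_rows(lines, group_size=30):
    """Merge every `group_size` (timestamp, x, y, z) rows into one output row."""
    merged = []
    timestamps, values = [], []
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        timestamp, x, y, z = (field.strip() for field in row[:4])
        timestamps.append(timestamp)
        values.extend((x, y, z))
        if len(timestamps) == group_size:
            merged.append([calculate_delta(timestamps)] + values)
            timestamps, values = [], []
    if timestamps:  # flush the final, shorter group instead of dropping it
        merged.append([calculate_delta(timestamps)] + values)
    return merged


# Three sample rows in the question's format, grouped two at a time.
sample = io.StringIO("0.000, 5, 6, 8,\n1.000, -6, 7, 9,\n2.000, -15, 25, 23,\n")
result = merge_rows(sample, group_size=2)
print(result)
```

With a group size of 2, the third row ends up in a group of its own instead of being lost. Passing an open file object in place of `sample` should work the same way, since `csv.reader` accepts any iterable of lines.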

Answer 2

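One approach is to read the CSV file in chunks of 30 rows at a time and then merge those rows. I am assuming the delta is calculated by subtracting the first timestamp from the last timestamp of each chunk (another possibility is the difference between the start of each chunk, so the first would be 0?):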

from itertools import zip_longest
import csv

f_input = open('bikefall.csv', newline='')
f_output = open('output.csv', 'w', newline='')

with f_input, f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    for rows in zip_longest(*[iter(csv_input)] * 30, fillvalue=None):
        rows = [[float(row[0])] + row[1:] for row in rows if row]
        delta = rows[-1][0] - rows[0][0]
        combined = [delta]

        for row in rows:
            combined.extend([row[1], row[2], row[3]])

        csv_output.writerow(combined)

The grouping is based on the grouper() recipe from the itertools section of the Python documentation.
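
To see why that idiom produces fixed-size chunks, here is a small standalone sketch of the same pattern on a short range. The list holds the same iterator object n times, so each tuple that `zip_longest` builds pulls n consecutive items from it, and the last tuple is padded with the fill value. The filtering step is an illustration of how the padding can be dropped; it assumes `None` never appears as real data.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    # One iterator repeated n times: each output tuple pulls n consecutive items from it.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


chunks = [
    [item for item in chunk if item is not None]  # drop the padding
    for chunk in grouper(range(7), 3)
]
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```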

Answer 3

This is a perfect job for zip. Here is a solution that is more Pythonic than the previous answers:

import csv

with open('bikefall.csv', newline='') as inputfile:
    # version using csv reader
    matrix = [[line[0], ','.join(line[1:])] for line in csv.reader(inputfile)]
    # version using standard text file reader
    #matrix = [line.strip().split(',', maxsplit=1) for line in inputfile]

stamps, coords = zip(*matrix)  # split matrix into stamps and coords

# each output line: first timestamp of the group, then up to 30 coordinate triples
for n in range(0, len(stamps), 30):
    print(','.join((stamps[n],) + coords[n:n+30]))

Note: because slice notation is used, the last row, which may have fewer than 30 items, is handled automatically.
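
The slicing behaviour can be checked on a tiny in-memory example. This is only a sketch: the three-row `matrix` is made-up sample data in the shape the answer builds, and `group_size` is a hypothetical name set to 2 so that the last group comes out short.

```python
matrix = [["0.000", "5,6,8"], ["1.000", "-6,7,9"], ["2.000", "-15,25,23"]]
stamps, coords = zip(*matrix)  # transpose: one tuple of stamps, one of coords

group_size = 2  # small value for illustration; the answer uses 30
lines = [
    ",".join((stamps[n],) + coords[n:n + group_size])
    for n in range(0, len(stamps), group_size)
]
print(lines)
```

A slice that runs past the end of a tuple simply returns the items that exist, so the final group holds one triple and no explicit length check is needed.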

3 answers

Answer 1
0 2018-03-16 17:37:06

Answer 2
0 accepted 2018-03-16 18:03:23

Answer 3
0 2018-03-16 19:10:39
