Remove redundant timestamps from CSV
I have created a CSV file from the recordings of different sensors using a pandas DataFrame. The CSV file basically looks like this:
[image: example CSV]
I would like to get rid of the redundant timestamps and instead have all sensor entries that share a timestamp appear in the same row (for example, x2 and x3 in the image). Also, the labels that share a timestamp are always identical, but they would need to be reduced to a single value as well.
So far, I've come across the drop_duplicates() function, which only drops entire rows.
Edit: here's a text version of the example above:
timestamp,sensor_a,sensor_b,sensor_c,label
1,x1,,,0
2,,x2,,0
2,,,x3,0
3,x4,,,1
4,,,,1
5,,x6,,1
5,,,x7,1
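A minimal sketch of why drop_duplicates() alone does not solve this, assuming pandas and reading the text version above from an in-memory string instead of a file. No two rows are identical across every column, so the plain call drops nothing; restricting the comparison to the timestamp (the `subset` parameter) drops rows instead of merging them.

```python
import io

import pandas as pd

# Text version of the example from the question, read from a string
# instead of a file for illustration.
CSV_TEXT = """timestamp,sensor_a,sensor_b,sensor_c,label
1,x1,,,0
2,,x2,,0
2,,,x3,0
3,x4,,,1
4,,,,1
5,,x6,,1
5,,,x7,1
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))  # empty cells become NaN

# No row is an exact copy of another, so nothing is removed.
full = df.drop_duplicates()

# Comparing only the timestamp keeps the first row of each timestamp,
# which discards x3 and x7 rather than moving them into the kept row.
by_ts = df.drop_duplicates(subset='timestamp')

print(len(df), len(full), len(by_ts))  # prints: 7 7 5
print(by_ts)
```

In the second result the sensor_c column is entirely empty, which is exactly the data loss the question wants to avoid.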
I will assume that you store the data in a text file sensors.txt, so we can consolidate the data based on timestamp with the following code:
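import pandas as pd

# Empty cells in the CSV are read as NaN
df = pd.read_csv('sensors.txt', delimiter=',', header=0)
# Forward-fill within each timestamp group; the grouping column is dropped, so restore it
df2 = df.groupby('timestamp').ffill()
df2['timestamp'] = df['timestamp']
# Back-fill within each group so every row of a timestamp carries all of its readings
df2 = df2.groupby('timestamp').bfill()
df2['timestamp'] = df['timestamp']
# Rows sharing a timestamp are now identical here; keep one of each
df2 = df2.drop_duplicates()
# Restore the original column order
df = df2[['timestamp', 'sensor_a', 'sensor_b', 'sensor_c', 'label']]
print(df)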
output
timestamp sensor_a sensor_b sensor_c label
0 1 x1 NaN NaN 0
1 2 NaN x2 x3 0
3 3 x4 NaN NaN 1
4 4 NaN NaN NaN 1
5 5 NaN x6 x7 1
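A shorter alternative, offered as a sketch rather than as the method above: pandas' GroupBy.first() returns the first non-null value of each column within each group, so one groupby on timestamp both merges the sensor readings and collapses the label. It assumes the labels really are identical within a timestamp (checked with an assertion) and reads the sample from a string instead of sensors.txt; the name csv_out is hypothetical.

```python
import io

import pandas as pd

# Sample data from the question; with a real file use pd.read_csv('sensors.txt').
CSV_TEXT = """timestamp,sensor_a,sensor_b,sensor_c,label
1,x1,,,0
2,,x2,,0
2,,,x3,0
3,x4,,,1
4,,,,1
5,,x6,,1
5,,,x7,1
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# The question says rows sharing a timestamp share a label; verify it,
# because first() would otherwise silently pick one of the labels.
assert (df.groupby('timestamp')['label'].nunique() <= 1).all()

# first() skips NaN, so x2 and x3 land in the same row for timestamp 2.
merged = df.groupby('timestamp', as_index=False).first()
print(merged)

# Serialize without the integer index; pass a path to write a file instead.
csv_out = merged.to_csv(index=False)
```

Unlike the ffill/bfill version, this returns a fresh 0..4 index. If a sensor could report twice within one timestamp, first() would keep only the earlier reading, so that case is worth checking in real data.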
Further editing will be based on your questions in the comments below.
Good luck