
Append only unlike data from one .csv to another .csv

I have managed to use Python with the speedtest-cli package to run a speed test of my Internet connection. I run this every 15 minutes and append the results to a .csv file I call "speedtest.csv". I then have this .csv file emailed to me every 12 hours, which is a lot of data.

I am only interested in keeping the rows of data that report a Download speed below 13 Mbps. Using the following code, I am able to filter for this data and append it to a second .csv file I call speedtestfilteronly.csv.

import pandas as pd

# Raw strings keep Windows backslashes from being read as escapes
df = pd.read_csv(r'c:\speedtest.csv', header=0)
df = df[df['Download'] < 13000000.0]
# index=False so appended rows contain only the data columns
df.to_csv(r'c:\speedtestfilteronly.csv', mode='a', header=False, index=False)
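The filter itself can be seen in a minimal, self-contained sketch that uses an in-memory DataFrame with made-up readings instead of the CSV on disk (the 'Date' values here are hypothetical; 'Download' is in bits per second, as in the speedtest results):

```python
import pandas as pd

# Hypothetical speedtest rows; 'Download' is in bits per second
df = pd.DataFrame({
    'Date': ['2020-01-01', '2020-01-02', '2020-01-03'],
    'Download': [15000000.0, 9500000.0, 12000000.0],
})

# Boolean indexing keeps only the rows below the 13 Mbps threshold
slow = df[df['Download'] < 13000000.0]
```

Here `slow` holds the second and third rows, the only ones under 13000000.0.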

The problem now is that it appends every row matching my filter criteria each time I run this code. So if I run it 4 times, I receive the same 4 sets of appended data in the speedtestfilteronly.csv file.

I am looking to append only the rows from speedtest.csv that are not already in speedtestfilteronly.csv.

How can I achieve this?

I have got the following code to work, except the one thing it is not doing is filtering the results to < 13000000.0 (13 Mbps). Any other ideas?

import pandas as pd

df = pd.read_csv(r'c:\speedtest.csv', header=0)
df = df[df['Download'] < 13000000.0]

history_df = pd.read_csv(r'c:\speedtest.csv')
master_df = pd.concat([history_df, df], axis=0)
new_master_df = master_df.drop_duplicates(keep="first")
new_master_df.to_csv(r'c:\emailspeedtest.csv', header=None, index=False)

There are a few different ways you could approach this. One would be to read in your filtered dataset, append the new rows in memory, and then drop duplicates, like this:

import pandas as pd

# Filter the latest results down to rows under 13 Mbps
df = pd.read_csv(r'c:\speedtest.csv', header=0)
df = df[df['Download'] < 13000000.0]

# The filtered file was written without a header, so reuse the column
# names from df; drop_duplicates only matches rows when columns align
history_df = pd.read_csv(r'c:\speedtestfilteronly.csv', header=None, names=df.columns)
master_df = pd.concat([history_df, df], axis=0)
new_master_df = master_df.drop_duplicates(keep="first")
new_master_df.to_csv(r'c:\speedtestfilteronly.csv', header=None, index=False)
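The dedup step can be seen in isolation with two small in-memory frames (made-up rows standing in for the history file and the newly filtered results; the second frame repeats one row already present in the first):

```python
import pandas as pd

# Stand-ins for the filtered history and the newly filtered rows;
# the second frame repeats one row already present in the first
history_df = pd.DataFrame([['2020-01-01', 9500000.0],
                           ['2020-01-02', 12000000.0]])
new_df = pd.DataFrame([['2020-01-02', 12000000.0],
                       ['2020-01-03', 8000000.0]])

master_df = pd.concat([history_df, new_df], axis=0)
# keep="first" retains the copy that was already in the history
deduped = master_df.drop_duplicates(keep="first")
```

Of the four concatenated rows, the repeated one is dropped, leaving three unique rows to write back out.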
