For loop in pandas DataFrame data only saves last iteration to excel file

I am looping through the rows of the DataFrame data below and checking whether, for example, the value in the Power column is > 0, and then I want to export those rows to an Excel file. This works, but it only writes the last iteration to the Excel file. I have come to the conclusion that I need to use some kind of append, but I cannot figure out how to make it work.

Location    UnitName    Timestamp           Power        Windspeed   Yaw
Bull Creek  F10         01/11/2014 00:00:00 7,563641548  3,957911002 280,5478821
Bull Creek  F10         01/11/2014 00:20:00 60,73444748  4,24157236  280,4075012
Bull Creek  F10         01/11/2014 00:30:00 63,15441132  4,241089859 280,3903809
Bull Creek  F10         01/11/2014 00:40:00 59,09280396  4,38904965  280,4152527
Bull Creek  F10         01/11/2014 00:50:00 69,26197052  4,374599175 280,3750916
Bull Creek  F10         01/11/2014 01:00:00 101,0624237  5,343887005 280,5173035
Bull Creek  F10         01/11/2014 01:10:00 122,7936935  5,183885235 280,4681702
Bull Creek  F10         01/11/2014 01:20:00 86,57110596  5,046733923 280,3834534
Bull Creek  F10         01/11/2014 01:40:00 16,74042702  3,024427626 280,1408386
Bull Creek  F10         01/11/2014 01:50:00 12,5870142   2,931351769 280,1185913
Bull Creek  F10         01/11/2014 02:00:00 -1,029753685 3,116549245 279,9686279
Bull Creek  F10         01/11/2014 02:10:00 13,35998058  3,448055706 279,8687134
Bull Creek  F10         01/11/2014 02:20:00 17,42461395  2,943588415 280,1383057
Bull Creek  F10         01/11/2014 02:30:00 -9,614940643 2,744164819 280,6514893
Bull Creek  F10         01/11/2014 02:50:00 -11,01966286 3,554833538 283,1451416
Bull Creek  F10         01/11/2014 03:00:00 -4,383010387 4,279259377 283,3281555


import pandas as pd
import os

os.chdir(r'C:\Users\NIK\.spyder2\PythonScripts')

fileREF = 'FilterDataREF.xlsx'
dataREF = pd.read_excel(fileREF, sheetname='Sheet1')

filePCU = 'FilterDataPCU.xlsx'
dataPCU = pd.read_excel(filePCU, sheetname='Ark1')

for i in range(len(dataREF)):
    for j in range(len(dataPCU)):
        if dataREF['Timestamp'][i] == dataPCU['Timestamp'][j] and dataREF['Power'][i] > 0 and dataPCU['Power'][j] > 0:

            data_REF = pd.DataFrame([dataREF.loc[i]])
            data_PCU = pd.DataFrame([dataPCU.loc[j]])

            writer = pd.ExcelWriter('common_data.xlsx', engine='xlsxwriter')
            # Convert the dataframe to an XlsxWriter Excel object.
            data_REF.to_excel(writer, sheet_name='Sheet1')
            data_PCU.to_excel(writer, sheet_name='Sheet1', startcol=7)

            writer.save()

It does save all the values; you're just overwriting the previous iteration's output every time, because the ExcelWriter is created and saved inside the loop.

There are several possible solutions. You can aggregate the results by appending them to a list or DataFrame in each iteration and write once at the end, you can keep track of your position in the Excel sheet and pass it as startrow to to_excel in the next iteration, you could generate multiple Excel files by changing the filename, and probably a whole lot of other options.
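As a minimal sketch of the first option (aggregate, then write once), reusing the dataREF / dataPCU frames and the common_data.xlsx file name from the question:

import pandas as pd

# Collect the matching rows while looping, build the output frames once
# afterwards, and create/save the ExcelWriter a single time so nothing
# gets overwritten.
ref_rows, pcu_rows = [], []
for i in range(len(dataREF)):
    for j in range(len(dataPCU)):
        if (dataREF['Timestamp'][i] == dataPCU['Timestamp'][j]
                and dataREF['Power'][i] > 0 and dataPCU['Power'][j] > 0):
            ref_rows.append(dataREF.loc[i])
            pcu_rows.append(dataPCU.loc[j])

writer = pd.ExcelWriter('common_data.xlsx', engine='xlsxwriter')
pd.DataFrame(ref_rows).to_excel(writer, sheet_name='Sheet1')
pd.DataFrame(pcu_rows).to_excel(writer, sheet_name='Sheet1', startcol=7)
writer.save()  # newer pandas versions use writer.close() instead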

There's lots of ways to do this. Might I suggest... instead of looping over each row of the DataFrames, try joining or merging them?

merged_data = data_REF.merge(data_PCU, on=['Timestamp'], suffixes=('', '_PCU'))

The above will inner join data_REF and data_PCU on the field Timestamp. I did this since I saw you had dataREF['Timestamp'][i] == dataPCU['Timestamp'][j] in your code. Note that the empty left suffix in suffixes=('', '_PCU') means that columns in data_REF whose names also appear in data_PCU keep their original names, while the overlapping columns coming from data_PCU get _PCU appended, for example Power_PCU. (The join key Timestamp itself is not suffixed.)
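A tiny, made-up example (not the real data) just to show how the suffixes are applied to overlapping column names:

import pandas as pd

# Two toy frames sharing the columns Timestamp and Power.
ref = pd.DataFrame({'Timestamp': ['01/11/2014 00:00:00', '01/11/2014 00:20:00'],
                    'Power': [7.5, 60.7]})
pcu = pd.DataFrame({'Timestamp': ['01/11/2014 00:00:00', '01/11/2014 00:20:00'],
                    'Power': [5.1, 55.2]})

merged = ref.merge(pcu, on=['Timestamp'], suffixes=('', '_PCU'))
print(list(merged.columns))  # ['Timestamp', 'Power', 'Power_PCU']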

Once you have a merged DataFrame you can start doing something like

pow_gt_zero = (merged_data['Power'] > 0) & (merged_data['Power_PCU'] > 0)
valid_df = merged_data.loc[pow_gt_zero]

Using .loc above, you are getting a subset of the DataFrame where the condition pow_gt_zero is satisfied.

Now that you have the rows that meet your conditions, you can reference those Timestamps again and use them to subset the original DataFrames so that you can write them out to Excel.

data_REF = data_REF.loc[data_REF['Timestamp'].isin(valid_df['Timestamp'])]
data_PCU = data_PCU.loc[data_PCU['Timestamp'].isin(valid_df['Timestamp'])]
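From there, a possible final write step (just a sketch, mirroring the layout and file name from the question's own code, with the second frame starting at column 7):

writer = pd.ExcelWriter('common_data.xlsx', engine='xlsxwriter')
data_REF.to_excel(writer, sheet_name='Sheet1')              # filtered REF rows
data_PCU.to_excel(writer, sheet_name='Sheet1', startcol=7)  # filtered PCU rows alongside
writer.save()  # newer pandas versions use writer.close() instead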
