For loop in pandas DataFrame data only saves last iteration to excel file
I am looping through the rows of the DataFrame data below and checking whether, e.g., the value in the Power column is > 0, and I then want to export those rows to an Excel file. This works, but it only writes the last iteration to the Excel file. I have come to the conclusion that I need to append in some way, but I cannot figure out how to make it work.
Location   UnitName  Timestamp            Power         Windspeed    Yaw
Bull Creek F10       01/11/2014 00:00:00  7,563641548   3,957911002  280,5478821
Bull Creek F10       01/11/2014 00:20:00  60,73444748   4,24157236   280,4075012
Bull Creek F10       01/11/2014 00:30:00  63,15441132   4,241089859  280,3903809
Bull Creek F10       01/11/2014 00:40:00  59,09280396   4,38904965   280,4152527
Bull Creek F10       01/11/2014 00:50:00  69,26197052   4,374599175  280,3750916
Bull Creek F10       01/11/2014 01:00:00  101,0624237   5,343887005  280,5173035
Bull Creek F10       01/11/2014 01:10:00  122,7936935   5,183885235  280,4681702
Bull Creek F10       01/11/2014 01:20:00  86,57110596   5,046733923  280,3834534
Bull Creek F10       01/11/2014 01:40:00  16,74042702   3,024427626  280,1408386
Bull Creek F10       01/11/2014 01:50:00  12,5870142    2,931351769  280,1185913
Bull Creek F10       01/11/2014 02:00:00  -1,029753685  3,116549245  279,9686279
Bull Creek F10       01/11/2014 02:10:00  13,35998058   3,448055706  279,8687134
Bull Creek F10       01/11/2014 02:20:00  17,42461395   2,943588415  280,1383057
Bull Creek F10       01/11/2014 02:30:00  -9,614940643  2,744164819  280,6514893
Bull Creek F10       01/11/2014 02:50:00  -11,01966286  3,554833538  283,1451416
Bull Creek F10       01/11/2014 03:00:00  -4,383010387  4,279259377  283,3281555
import pandas as pd
import os

os.chdir(r'C:\Users\NIK\.spyder2\PythonScripts')

fileREF = 'FilterDataREF.xlsx'
dataREF = pd.read_excel(fileREF, sheetname='Sheet1')
filePCU = 'FilterDataPCU.xlsx'
dataPCU = pd.read_excel(filePCU, sheetname='Ark1')

for i in range(len(dataREF)):
    for j in range(len(dataPCU)):
        if dataREF['Timestamp'][i] == dataPCU['Timestamp'][j] and dataREF['Power'][i] > 0 and dataPCU['Power'][j] > 0:
            data_REF = pd.DataFrame([dataREF.loc[i]])
            data_PCU = pd.DataFrame([dataPCU.loc[j]])
            writer = pd.ExcelWriter('common_data.xlsx', engine='xlsxwriter')
            # Convert the dataframes to an XlsxWriter Excel object.
            data_REF.to_excel(writer, sheet_name='Sheet1')
            data_PCU.to_excel(writer, sheet_name='Sheet1', startcol=7)
            writer.save()
It saves all the values; you're just overwriting the previous iteration's output every time.

There are several possible solutions. You can aggregate results by appending to a DataFrame in each iteration; you can track your position in the Excel sheet and pass it as the startrow for to_excel in the next iteration; you could generate multiple Excel files by changing the filename; and there are probably plenty of other options.
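To illustrate the first suggestion, here is a minimal sketch of the "aggregate first, write once" pattern, using small toy frames in place of the real spreadsheets (the column names are taken from the question; the toy timestamps and values are made up):

```python
import pandas as pd

# Toy stand-ins for dataREF and dataPCU read from Excel.
dataREF = pd.DataFrame({'Timestamp': ['t1', 't2', 't3'],
                        'Power': [7.5, -1.0, 60.7]})
dataPCU = pd.DataFrame({'Timestamp': ['t1', 't2', 't3'],
                        'Power': [3.2, 5.0, 12.0]})

# Collect the matching rows in plain lists instead of writing inside the loop.
ref_rows, pcu_rows = [], []
for i in range(len(dataREF)):
    for j in range(len(dataPCU)):
        if (dataREF['Timestamp'][i] == dataPCU['Timestamp'][j]
                and dataREF['Power'][i] > 0 and dataPCU['Power'][j] > 0):
            ref_rows.append(dataREF.loc[i])
            pcu_rows.append(dataPCU.loc[j])

# Build the result frames once, after the loop has seen every row.
data_REF = pd.DataFrame(ref_rows)
data_PCU = pd.DataFrame(pcu_rows)
```

A single ExcelWriter block after the loop (exactly as in the question, but executed only once) then writes both frames to common_data.xlsx without anything being overwritten.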
There are lots of ways to do this. Might I suggest, instead of looping over each row of the DataFrames, joining or merging them?

merged_data = dataREF.merge(dataPCU, on='Timestamp', suffixes=('', '_PCU'))

The above will inner join dataREF and dataPCU on the field Timestamp. I did this since I saw you had dataREF['Timestamp'][i] == dataPCU['Timestamp'][j] in your code. Note that the empty left suffix means that any columns in dataREF that share a name with columns in dataPCU keep their original names, while the overlapping columns coming from dataPCU get _PCU appended, e.g. Power_PCU. (The join key Timestamp itself is never suffixed.)
Once you have a merged DataFrame you can start doing something like

pow_gt_zero = (merged_data['Power'] > 0) & (merged_data['Power_PCU'] > 0)
valid_df = merged_data.loc[pow_gt_zero]

Using .loc above, you get the subset of the DataFrame where the condition pow_gt_zero is satisfied.
Now that you have the rows that meet your conditions, you can reference these Timestamps again. You can use them to subset the original DataFrames so that you may write them out to Excel.

data_REF = dataREF.loc[dataREF['Timestamp'].isin(valid_df['Timestamp'])]
data_PCU = dataPCU.loc[dataPCU['Timestamp'].isin(valid_df['Timestamp'])]
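Putting the whole merge-based answer together, here is a runnable sketch with toy data standing in for the two spreadsheets (the timestamps and Power values are made up for illustration):

```python
import pandas as pd

# Toy stand-ins for the two spreadsheets.
dataREF = pd.DataFrame({'Timestamp': ['t1', 't2', 't3', 't4'],
                        'Power': [7.5, -1.0, 60.7, 16.7]})
dataPCU = pd.DataFrame({'Timestamp': ['t1', 't2', 't3'],
                        'Power': [3.2, 5.0, -9.6]})

# Inner join on Timestamp; overlapping columns from dataPCU get '_PCU'.
merged_data = dataREF.merge(dataPCU, on='Timestamp', suffixes=('', '_PCU'))

# Keep only rows where both Power readings are positive.
pow_gt_zero = (merged_data['Power'] > 0) & (merged_data['Power_PCU'] > 0)
valid_df = merged_data.loc[pow_gt_zero]

# Subset the originals by the surviving timestamps, ready to write to Excel.
data_REF = dataREF.loc[dataREF['Timestamp'].isin(valid_df['Timestamp'])]
data_PCU = dataPCU.loc[dataPCU['Timestamp'].isin(valid_df['Timestamp'])]
```

With the toy data only t1 survives: t2 fails the REF Power check, t3 fails the PCU Power check, and t4 has no PCU row to join against. The two filtered frames can then be written side by side with a single ExcelWriter, as in the question.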