Operations on columns across multiple files in Pandas
I am trying to perform some arithmetic operations in Python Pandas and merge the result into one of the files.
Path_1: File_1.csv, File_2.csv, ....
This path contains several files whose time headers are supposed to increase from file to file. They have the following columns:
File_1.csv
Nos,12:00:00
123,1451
656,4544
853,5484
File_2.csv
Nos,12:30:00
485,5464
456,4865
658,4584
Path_2: Master_1.csv
Nos,00:00:00
123,2000
485,1500
656,1000
853,2500
456,4500
658,5000
I am trying to read the n .csv files from Path_1 and compare the col[1] header (the time series) of each file with the last time-series column, col[last], of Master_1.csv.
If Master_1.csv does not have that time, it should create a new column with the time series from the Path_1 .csv file and fill in the values by col['Nos'], subtracting them from col[1] of Master_1.csv.
If the column with the time from the Path_1 file is already present, then look up col['Nos'] and replace the NaN with the subtracted value for that col['Nos'].
i.e. the expected output in Master_1.csv:
Nos,00:00:00,12:00:00,12:30:00
123,2000,549,NaN
485,1500,NaN,3964
656,1000,3544,NaN
853,2500,2984,NaN
456,4500,NaN,365
658,5000,NaN,-416
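The update rule described above (create the time column if it is missing, otherwise only fill the NaN cells) can be sketched as a small function. This is a minimal sketch, not a definitive implementation: `update_master` is a hypothetical helper name, each file is assumed to hold exactly two columns (`Nos` plus one time header) with unique `Nos` values, and the difference is taken as file value minus the `00:00:00` value. The sign direction is an assumption, since the expected table is not consistent about it (549 for 123 is master minus file, while -416 for 658 is file minus master).

```python
import pandas as pd


def update_master(master, new, base_col="00:00:00"):
    """Return a copy of `master` updated with the single time column in `new`.

    Assumes `new` has exactly two columns: 'Nos' and one time header.
    """
    time_col = new.columns[1]                    # e.g. '12:00:00'
    out = master.set_index("Nos")                # new frame, `master` is untouched
    values = new.set_index("Nos")[time_col]
    # File value minus baseline; NaN for every Nos the file does not contain.
    diff = values.reindex(out.index) - out[base_col]
    if time_col in out.columns:
        # Column already present: keep existing values, fill only the gaps.
        out[time_col] = out[time_col].fillna(diff)
    else:
        out[time_col] = diff
    return out.reset_index()


# Sample data copied from the question.
master = pd.DataFrame({"Nos": [123, 485, 656, 853, 456, 658],
                       "00:00:00": [2000, 1500, 1000, 2500, 4500, 5000]})
file_1 = pd.DataFrame({"Nos": [123, 656, 853], "12:00:00": [1451, 4544, 5484]})
file_2 = pd.DataFrame({"Nos": [485, 456, 658], "12:30:00": [5464, 4865, 4584]})

result = master
for frame in (file_1, file_2):
    result = update_master(result, frame)
print(result)
```

Swapping the operands in the `diff` line gives the opposite sign if master minus file is what is wanted.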
I understand the arithmetic, but I am not able to loop over Nos and the time series. I have tried to put some code together and am trying to work out the looping. I need help with that. Thanks.
import pandas as pd
import numpy as np
path_1 = '/'
path_2 = '/'
df_1 = pd.read_csv(os.path_1('/.*csv'), Index=None, columns=['Nos', 'timeseries'] #times series is different in every file eg: 12:00, 12:30, 17:30 etc
df_2 = pd.read_csv('master_1.csv', Index=None, columns=['Nos', '00:00:00']) #00:00:00 time series
for Nos in df_1 and df_2:
    df_1['Nos'] = df_2['Nos']
    new_tseries = df_2['00:00:00'] - df_1['timeseries']
merged.concat('master_1.csv', Index=None, columns=['Nos', '00:00:00', 'new_tseries'], axis=0) # new_timeseries is the dynamic time series that every .csv file will have from path_1
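You can do it in three steps: read the dataframes into a list, merge them into the master dataframe on Nos, and then calculate each column's difference with the master column.
Here's some code you could try: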
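import glob
import os
import pandas as pd

path_1 = '/'  # directory holding the File_*.csv files, as in the question

# read the dataframes into a list
L = []
for fname in sorted(glob.glob(os.path.join(path_1, '*.csv'))):
    L.append(pd.read_csv(fname))

# read the master dataframe and merge in the other dataframes
df_2 = pd.read_csv('master_1.csv')
for df in L:
    df_2 = pd.merge(df_2, df, on='Nos', how='left')

# for each time column, calculate the difference with the master column
# (skip 'Nos' and the master column itself, and keep the result)
time_cols = df_2.columns.difference(['Nos', '00:00:00'])
df_2[time_cols] = df_2[time_cols].sub(df_2['00:00:00'], axis=0)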
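One case the merge approach does not cover is a time column that already exists in the master: `pd.merge` would then produce suffixed `_x`/`_y` columns instead of filling the NaN cells. A possible alternative, sketched below under assumptions, is to stack all files into long form, pivot to one column per time header, subtract the baseline, and let `combine_first` fill only what the master lacks. The function name `update_master_csv`, the directory layout, and the file-minus-master sign are illustrative choices, not something the question confirms. The sketch also assumes the master file lives outside the data directory, that at least one data file exists, and that no (Nos, time) pair appears twice.

```python
import glob
import os
import tempfile

import pandas as pd


def update_master_csv(file_dir, master_path, base_col="00:00:00"):
    """Add or fill one difference column per time header in file_dir/*.csv."""
    master = pd.read_csv(master_path).set_index("Nos")
    paths = sorted(glob.glob(os.path.join(file_dir, "*.csv")))
    frames = [pd.read_csv(p) for p in paths]

    # Long form: one row per (Nos, time header, value) across all files.
    long = pd.concat(
        [f.melt(id_vars="Nos", var_name="time", value_name="value") for f in frames],
        ignore_index=True,
    )
    # Wide form: one column per time header, rows in the master's Nos order.
    wide = long.pivot(index="Nos", columns="time", values="value")
    wide = wide.reindex(master.index)
    wide.columns.name = None

    # File value minus baseline (the sign convention is an assumption).
    diff = wide.sub(master[base_col], axis=0)

    # Existing master values win; NaN cells and missing columns come from diff.
    updated = master.combine_first(diff)
    new_cols = [c for c in diff.columns if c not in master.columns]
    updated = updated.reindex(index=master.index,
                              columns=list(master.columns) + new_cols)

    updated.reset_index().to_csv(master_path, index=False)
    return updated.reset_index()


# Usage with the sample data from the question, written to a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "path_1")
    os.makedirs(data_dir)
    master_path = os.path.join(tmp, "Master_1.csv")
    with open(os.path.join(data_dir, "File_1.csv"), "w") as fh:
        fh.write("Nos,12:00:00\n123,1451\n656,4544\n853,5484\n")
    with open(os.path.join(data_dir, "File_2.csv"), "w") as fh:
        fh.write("Nos,12:30:00\n485,5464\n456,4865\n658,4584\n")
    with open(master_path, "w") as fh:
        fh.write("Nos,00:00:00\n123,2000\n485,1500\n656,1000\n"
                 "853,2500\n456,4500\n658,5000\n")
    result = update_master_csv(data_dir, master_path)
    reread = pd.read_csv(master_path)

print(result)
```

Because existing master values take priority in `combine_first`, running the function again on the same files leaves already-filled cells unchanged.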