Recursive pd.merge() output error
I want to take a collection of CSV files that share a common index and time t and merge them all together with a single function called mergedf(). It looked to me like it worked, except that it printed the same set of values 3 times. It seems as though it is printing filepath[0] 3 times because of my if statement. Alternatively, the problem could be intdf in the prepdf() function.
If you could help me spot my error, that would be amazing.
In:
import numpy as np
import pandas as pd
from pandas import DataFrame
from scipy import interpolate

def prepdf(path, mi, ma):
    # Read the first two columns (time 't' and signal 'b'), skipping the header row
    csv = pd.read_csv(path, usecols=[0,1], skiprows=1, names = ['t','b'])
    df = DataFrame(csv)
    fs = 2        # resampling frequency
    T = 1/fs      # sample period
    ts = np.arange(mi, ma, T)
    # Linearly interpolate 'b' onto the uniform time grid ts
    interpdata = {}
    for key in ['b']:
        spl = interpolate.interp1d(df['t'], df[key])
        interpdata[key] = spl(ts)
    interpframe = pd.DataFrame(interpdata, index=ts)
    interpframe.index.name = 'ts'
    interpframe.reset_index(inplace=True)
    interpframe['t'] = interpframe['ts']
    temp = interpframe.loc[interpframe['b'] > 0.5, 't']
    interpframe.loc[interpframe['b'] > 0.5, 't'] = temp
    interpframe['t'] = interpframe['t'].ffill()
    interpframe.set_index('t', inplace=True)
    # NOTE: interp_frame (with an underscore) is never defined in this function
    inttmp = interp_frame
    intdf = interp_frame.head(n=len(inttmp))
    return intdf

PATHS = ['data1.csv', 'data2.csv', 'data3.csv']
filepath = [file for file in PATHS]
for path in PATHS:
    df = prepdf(path, 650, 1000)
    print(df)
print(len(PATHS))

def mergedf(n):
    # Base case: the innermost call returns the frame for filepath[0]
    if len(PATHS)-1-n == 0:
        return prepdf(filepath[0], 650, 1000)
    else:
        # Merge file number len(PATHS)-1-n with the merged result of the remaining files
        return pd.merge(prepdf(filepath[len(PATHS)-1-n], 650, 1000), mergedf(n+1), left_on='t', right_on='t')

mergedf(0)
Out(mergedf(0)):
t b b_x b_y
0 650.0 0.105299 0.105299 0.105299
1 650.5 0.193072 0.193072 0.193072
2 651.0 0.115404 0.115404 0.115404
3 651.5 0.047509 0.047509 0.047509
4 652.0 0.119501 0.119501 0.119501
5 652.5 -0.187888 -0.187888 -0.187888
... ... ... ... ...
695 997.5 0.165262 0.165262 0.165262
696 998.0 -0.131729 -0.131729 -0.131729
697 998.5 0.038266 0.038266 0.038266
698 999.0 0.093568 0.093568 0.093568
699 999.5 0.022013 0.022013 0.022013
700 rows × 4 columns
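The column names b, b_x and b_y in the output above follow from pd.merge's default suffixes=('_x', '_y'). A minimal sketch with three tiny hypothetical frames (f0, f1, f2 stand in for the prepdf() results and are not real data) reproduces the same nesting as mergedf(0):

```python
import pandas as pd

# Hypothetical stand-ins for prepdf(): each frame has the key column 't'
# and one data column named 'b'.
f0 = pd.DataFrame({"t": [650.0, 650.5], "b": [0.1, 0.2]})
f1 = pd.DataFrame({"t": [650.0, 650.5], "b": [0.3, 0.4]})
f2 = pd.DataFrame({"t": [650.0, 650.5], "b": [0.5, 0.6]})

# Innermost merge (like mergedf(1)): both sides have 'b', so pandas renames
# them with the default suffixes, left -> 'b_x', right -> 'b_y'.
inner = pd.merge(f1, f0, on="t")
print(list(inner.columns))  # ['t', 'b_x', 'b_y']

# Outer merge (like mergedf(0)): no data column overlaps any more,
# so the left frame's 'b' keeps its name.
outer = pd.merge(f2, inner, on="t")
print(list(outer.columns))  # ['t', 'b', 'b_x', 'b_y']
```

So the three columns really do come from three different frames; if they hold identical values, the frames themselves are identical.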
Here is an example of a CSV DataFrame:
t b
0 650.0 0.105299
1 650.5 0.193072
2 651.0 0.115404
3 651.5 0.047509
4 652.0 0.119501
5 652.5 -0.187888
... ...
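For reference, mergedf() folds the files together pairwise with inner joins on t. The same fold can be sketched with functools.reduce; the frames below are hypothetical stand-ins for prepdf(path, 650, 1000), and renaming each b column after its file name is an added assumption that avoids the _x/_y suffixes:

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for prepdf(path, 650, 1000), keyed by file name.
frames = {
    "data1.csv": pd.DataFrame({"t": [650.0, 650.5, 651.0], "b": [0.1, 0.2, 0.3]}),
    "data2.csv": pd.DataFrame({"t": [650.0, 650.5, 651.0], "b": [0.4, 0.5, 0.6]}),
    "data3.csv": pd.DataFrame({"t": [650.0, 650.5, 651.0], "b": [0.7, 0.8, 0.9]}),
}
# Rename each 'b' column after its source so the merged columns are unique.
renamed = [df.rename(columns={"b": name}) for name, df in frames.items()]

# reduce() computes merge(merge(f1, f2), f3): the same chain of inner joins on 't'
# that the recursion builds, without the len(PATHS)-1-n index arithmetic.
merged = reduce(lambda left, right: pd.merge(left, right, on="t"), renamed)
print(merged)
```

The column order differs from mergedf(0) (which puts the last file first), but the rows kept by the inner joins are the same.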
IIUC (if I understand correctly), you don't need the recursion at all:
df = pd.concat([prepdf(x, 650, 1000) for x in PATHS], axis=1)
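Because prepdf() returns frames indexed by t, pd.concat(..., axis=1) aligns them on that index (an outer join by default, unlike the inner join of pd.merge). One caveat: every frame has a column named b, so the result has three columns all called b. A minimal sketch, using hypothetical in-memory frames instead of the real CSV files, shows how the keys= argument labels each block:

```python
import pandas as pd

# Hypothetical stand-ins for prepdf(x, 650, 1000): each frame is indexed by 't'.
idx = pd.Index([650.0, 650.5, 651.0], name="t")
parts = {
    "data1.csv": pd.DataFrame({"b": [0.1, 0.2, 0.3]}, index=idx),
    "data2.csv": pd.DataFrame({"b": [0.4, 0.5, 0.6]}, index=idx),
    "data3.csv": pd.DataFrame({"b": [0.7, 0.8, 0.9]}, index=idx),
}

# Without keys, the columns are ambiguous: ['b', 'b', 'b'].
plain = pd.concat(list(parts.values()), axis=1)
print(list(plain.columns))

# With keys, the columns become a (file, column) MultiIndex.
labelled = pd.concat(list(parts.values()), axis=1, keys=list(parts.keys()))
print(labelled[("data2.csv", "b")].tolist())  # [0.4, 0.5, 0.6]
```

With the real files this would read pd.concat([prepdf(x, 650, 1000) for x in PATHS], axis=1, keys=PATHS).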
UPDATE:
I guess the problem of the same data set being shown three times was caused by the following lines:
intdf = interp_frame.head(n=len(inttmp))
return intdf
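interp_frame is not defined in the function (the local variable is called interpframe, without the underscore). Most probably it was defined earlier in your Python environment (IPython, Jupyter, etc.), so every call to prepdf() returns that same pre-existing DataFrame, no matter which file was read.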
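A minimal sketch of prepdf() that builds its result only from the local frame might look like this. Two assumptions to note: np.interp is swapped in for scipy.interpolate.interp1d (both do linear interpolation inside the data range, but np.interp clamps outside it where interp1d raises an error), and the thresholding/ffill lines are dropped because, as written, they do not change the frame. The sketch returns only b indexed by t, and the CSV contents are hypothetical in-memory data:

```python
import io

import numpy as np
import pandas as pd


def prepdf(path, mi, ma):
    """Resample column 'b' of a two-column CSV onto a uniform grid in [mi, ma)."""
    df = pd.read_csv(path, usecols=[0, 1], skiprows=1, names=["t", "b"])
    fs = 2                          # resampling frequency
    ts = np.arange(mi, ma, 1 / fs)  # uniform time grid
    # Linear interpolation (np.interp used instead of scipy's interp1d).
    interpframe = pd.DataFrame({"b": np.interp(ts, df["t"], df["b"])}, index=ts)
    interpframe.index.name = "t"
    # Return the local frame, not a global such as interp_frame.
    return interpframe


# Two hypothetical "files" with different values; read_csv also accepts file-like objects.
csv_a = "t,b\n650,0.0\n651,1.0\n652,2.0\n"
csv_b = "t,b\n650,10.0\n651,11.0\n652,12.0\n"

frame_a = prepdf(io.StringIO(csv_a), 650, 652)
frame_b = prepdf(io.StringIO(csv_b), 650, 652)
print(frame_a["b"].tolist())  # [0.0, 0.5, 1.0, 1.5]
print(frame_b["b"].tolist())  # [10.0, 10.5, 11.0, 11.5]
```

Since each call now returns its own file's data, mergedf(0) or the concat above should yield three different columns.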