
Recursive pd.merge() output error

I want to take a collection of CSV files that share a common index and time column t and merge them all with a single function, mergedf(). It appeared to work, except that it printed the same set of values three times. Judging by my if statement, it seems to be printing filepath[0] three times. Alternatively, the problem could be intdf in the prepdf() function.

If you could help me spot my error, that would be amazing.

In:

def prepdf(path, mi, ma):
    csv = pd.read_csv(path, usecols=[0,1], skiprows=1, names = ['t','b'])
    df = DataFrame(csv)

    fs = 2  
    T = 1/fs  
    ts = np.arange(mi, ma, T)

    interpdata = {}

    for key in ['b']:
        spl = interpolate.interp1d(df['t'], df[key])
        interpdata[key] = spl(ts)

    interpframe = pd.DataFrame(interpdata, index=ts)
    interpframe.index.name = 'ts'
    interpframe.reset_index(inplace=True)
    interpframe['t'] = interpframe['ts']
    temp = interpframe.loc[interpframe['b'] > 0.5, 't']
    interpframe.loc[interpframe['b'] > 0.5, 't'] = temp
    interpframe['t'] = interpframe['t'].fillna(method='ffill')
    interpframe.set_index('t', inplace=True)
    inttmp = interp_frame
    intdf = interp_frame.head(n=len(inttmp))

    return intdf   

PATHS = ['data1.csv', 'data2.csv', 'data3.csv']
filepath = [file for file in PATHS]

for path in PATHS:
    df = prepdf(path, 650, 1000)
    print(df)

print(len(PATHS))

def mergedf(n):
    if len(PATHS)-1-n == 0:
        return prepdf(filepath[0], 650, 1000)
    else:
        return pd.merge(prepdf(filepath[len(PATHS)-1-n], 650, 1000), mergedf(n+1), left_on='t', right_on='t')

mergedf(0)

Out (mergedf(0)):

    t       b           b_x         b_y
0   650.0   0.105299    0.105299    0.105299
1   650.5   0.193072    0.193072    0.193072
2   651.0   0.115404    0.115404    0.115404
3   651.5   0.047509    0.047509    0.047509
4   652.0   0.119501    0.119501    0.119501
5   652.5   -0.187888   -0.187888   -0.187888
...     ...     ...     ...     ...
695     997.5   0.165262    0.165262    0.165262
696     998.0   -0.131729   -0.131729   -0.131729
697     998.5   0.038266    0.038266    0.038266
698     999.0   0.093568    0.093568    0.093568
699     999.5   0.022013    0.022013    0.022013

700 rows × 4 columns

Here is an example of one CSV file loaded as a DataFrame:

     t         b
0    650.0  0.105299
1    650.5  0.193072
2    651.0  0.115404
3    651.5  0.047509
4    652.0  0.119501
5    652.5 -0.187888
     ...    ...

IIUC, you can concatenate the prepared frames column-wise instead of merging them recursively:

df = pd.concat([prepdf(x, 650, 1000) for x in PATHS], axis=1)
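One caveat with the one-liner above is that every frame contributes a column named b, so the result has three identically named columns. A minimal sketch of one way to keep them apart is shown below. It assumes the frames share the same t index, as the output of prepdf() would; the toy frames and the labels data1, data2, data3 are hypothetical stand-ins, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for what prepdf() returns:
# one column 'b', indexed by the shared time grid 't'.
t = pd.Index(np.arange(650, 652, 0.5), name='t')
frames = [pd.DataFrame({'b': np.arange(4) * k}, index=t) for k in (1.0, 2.0, 3.0)]
names = ['data1', 'data2', 'data3']  # hypothetical labels, one per file

# axis=1 aligns rows on the index; keys= tags each frame's columns
merged = pd.concat(frames, axis=1, keys=names)

# Flatten the two-level columns, e.g. ('data1', 'b') -> 'b_data1'
merged.columns = [f'{col}_{name}' for name, col in merged.columns]
print(merged)
```

Because the alignment happens on the index, this only lines rows up when each frame uses exactly the same t values, which holds here since prepdf() resamples every file onto the same np.arange grid.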

UPDATE:

I guess the problem of seeing the same data set three times is caused by the following lines:

intdf = interp_frame.head(n=len(inttmp))

return intdf   

interp_frame is not defined in the function; the frame built there is called interpframe. Most probably interp_frame was defined earlier in your Python environment (IPython, Jupyter, etc.), so every call to prepdf() returns that same leftover DataFrame regardless of the path passed in.
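The effect can be reproduced in isolation. The sketch below is not the asker's function: buggy() and fixed() are hypothetical reductions of prepdf() that keep only the naming mismatch, and the stale global is created by hand to mimic a leftover notebook variable.

```python
import pandas as pd

# Mimics a variable left over from an earlier notebook cell (hypothetical)
interp_frame = pd.DataFrame({'b': [0.1, 0.2]})

def buggy(values):
    interpframe = pd.DataFrame({'b': values})  # built from the input, then ignored
    # 'interp_frame' is not a local name, so Python falls back to the global
    return interp_frame.head(n=len(interp_frame))

def fixed(values):
    interpframe = pd.DataFrame({'b': values})
    return interpframe  # return the frame that was actually built

print(buggy([1.0, 2.0]).equals(buggy([3.0, 4.0])))  # True: input is ignored
print(fixed([1.0, 2.0]).equals(fixed([3.0, 4.0])))  # False: output follows input
```

In a fresh interpreter without the stale global, the buggy version would raise a NameError instead, which is a quick way to confirm the diagnosis: restart the kernel and rerun prepdf().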
