Pandas join grouped and normal dataframe

I'm using Pandas (0.9.1) to write a physics code. I have two dataframes:

Levels:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 37331 entries, 0 to 37330
Data columns:
atomic_number    37331  non-null values
ion_number       37331  non-null values
level_number     37331  non-null values
energy           37331  non-null values
g                37331  non-null values
metastable       37331  non-null values

Lines:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 314338 entries, 0 to 314337
Data columns:
id                    314338  non-null values
wavelength            314338  non-null values
atomic_number         314338  non-null values
ion_number            314338  non-null values
f_ul                  314338  non-null values
f_lu                  314338  non-null values
level_number_lower    314338  non-null values
level_number_upper    314338  non-null values
dtypes: float64(3), int64(7)

There are a couple of things I need to do. I need to join levels with lines on (atom, ion, level): first on atom, ion, level_number_upper and then on atom, ion, level_number_lower. Is there a way to precompute the join? Memory is not an issue, but speed is.

I also need to group levels (on atom, ion) and perform an operation on each group. I have done this already (it is incredibly fast), but then had trouble joining the resulting Series with the lines dataframe.

How do I do this?

Cheers, Wolfgang

Update v1:

To show what I want to join/merge, here is a code snippet:

import numpy as np

def calc_group_func(group):
    # Sum of g * exp(-energy) over the levels in one (atom, ion) group
    return np.sum(group['g'] * np.exp(-group['energy']))

grouped_data = levels.groupby(['atomic_number', 'ion_number'])
grouped_data.apply(calc_group_func)

and then I want to join/merge the grouped data with lines on atomic_number and ion_number.

There may be a better way, but perhaps df.merge() would work here. df.merge() works on two DataFrames, so the values computed for each (atom, ion) pair, which are in a Series after apply(), need to be placed in a DataFrame first. The final column name can be specified at the same time.

In [9]: grouped_vals = grouped_data.apply(calc_group_func)

In [10]: grouped_vals
Out[10]: 
atomic_number  ion_number
0              0             0.517541
               1             0.046833
1              0             0.253188
               1             0.440194

In [11]: lines.merge(pd.DataFrame({'group_val': grouped_vals}),
   ....:             left_on=['atomic_number', 'ion_number'],
   ....:             right_index=True)
Out[11]: 
    atomic_number  ion_number  group_val
id                                      
a               0           0   0.517541
b               0           0   0.517541
c               0           1   0.046833
d               0           1   0.046833
e               1           0   0.253188
f               1           0   0.253188
g               1           1   0.440194
h               1           1   0.440194
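
The same pattern also seems to extend to the level joins asked about in the question. The following is only a minimal sketch: it assumes a recent pandas release rather than 0.9.1, the toy frames and their values are made up, and the names `group_val`, `level_cols` and the `_lower`/`_upper` suffixes are illustrative choices, not anything from the original code. Indexing levels once by (atom, ion, level) is one way to read "precomputing" the join. Whether it is actually faster on the real data has not been measured here.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real frames (values are made up for illustration).
levels = pd.DataFrame({
    "atomic_number": [1, 1, 1, 2, 2],
    "ion_number":    [0, 0, 0, 1, 1],
    "level_number":  [0, 1, 2, 0, 1],
    "energy":        [0.0, 1.0, 2.0, 0.0, 0.5],
    "g":             [2, 4, 6, 1, 3],
})
lines = pd.DataFrame({
    "id":                 [10, 11, 12],
    "atomic_number":      [1, 1, 2],
    "ion_number":         [0, 0, 1],
    "level_number_lower": [0, 1, 0],
    "level_number_upper": [1, 2, 1],
})

keys = ["atomic_number", "ion_number"]

# 1) One value per (atom, ion) group: sum of g * exp(-energy).
#    The Series is named so that it becomes a column after the merge.
group_val = (
    levels.assign(weight=levels["g"] * np.exp(-levels["energy"]))
          .groupby(keys)["weight"].sum()
          .rename("group_val")
)

# 2) Attach the per-group value to every line, as in the session above.
merged = lines.merge(group_val.to_frame(), left_on=keys, right_index=True)

# 3) Index levels once by (atom, ion, level), then reuse that indexed frame
#    for both the lower-level and the upper-level lookup.
level_cols = levels.set_index(keys + ["level_number"])[["energy", "g"]]
result = (
    merged
    .join(level_cols.add_suffix("_lower"), on=keys + ["level_number_lower"])
    .join(level_cols.add_suffix("_upper"), on=keys + ["level_number_upper"])
)

print(result)
```

Each row of `result` then carries the group value together with `energy_lower`, `g_lower`, `energy_upper` and `g_upper` for its transition. The two `join` calls are left joins, so a line whose level is missing from levels would get NaN in those columns rather than being dropped.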
