Pandas: joining grouped and normal DataFrames

Question

I'm using Pandas (0.9.1) to write a physics code. I have two dataframes:

Levels:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 37331 entries, 0 to 37330
Data columns:
atomic_number    37331  non-null values
ion_number       37331  non-null values
level_number     37331  non-null values
energy           37331  non-null values
g                37331  non-null values
metastable       37331  non-null values

Lines:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 314338 entries, 0 to 314337
Data columns:
id                    314338  non-null values
wavelength            314338  non-null values
atomic_number         314338  non-null values
ion_number            314338  non-null values
f_ul                  314338  non-null values
f_lu                  314338  non-null values
level_number_lower    314338  non-null values
level_number_upper    314338  non-null values
dtypes: float64(3), int64(7)

There are a couple of things I need to do. I need to join levels with lines on (atom, ion, level): first on atom, ion, level_number_upper, and then on atom, ion, level_number_lower. Is there a way to precompute the join? Memory is not an issue, but speed is.

I also need to group levels (on atom, ion) and do an operation on each group. I did this already (incredibly fast), but then had trouble joining the resulting Series with the lines dataframe.

How do I do this?

Cheers, Wolfgang

Update v1:

To show what I want to join/merge, here is a code snippet:

import numpy as np

def calc_group_func(group):
    # Sum of g * exp(-energy) over the levels in one (atom, ion) group
    return np.sum(group['g'] * np.exp(-group['energy']))

grouped_data = levels.groupby(['atomic_number', 'ion_number'])
grouped_data.apply(calc_group_func)

and then I want to join/merge the grouped data with lines on atomic_number and ion_number.

Answer 1

There may be a better way, but perhaps df.merge() would work here. df.merge() works on two DataFrames, so the values computed for each (atom, ion) pair, which are in a Series after apply(), need to be placed in a DataFrame first, at which point the final column name can also be specified.

In [9]: grouped_vals = grouped_data.apply(calc_group_func)

In [10]: grouped_vals
Out[10]: 
atomic_number  ion_number
0              0             0.517541
               1             0.046833
1              0             0.253188
               1             0.440194

In [11]: lines.merge(pd.DataFrame({'group_val': grouped_vals}),
   ....:             left_on=['atomic_number', 'ion_number'],
   ....:             right_index=True)
Out[11]: 
    atomic_number  ion_number  group_val
id                                      
a               0           0   0.517541
b               0           0   0.517541
c               0           1   0.046833
d               0           1   0.046833
e               1           0   0.253188
f               1           0   0.253188
g               1           1   0.440194
h               1           1   0.440194
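
For reference, here is a minimal self-contained sketch of the same idea that should run on a recent pandas. The toy tables and their values are made up for illustration, and the names `keys`, `level_cols`, `lines_with_group` and `joined` are hypothetical. It also shows one possible way to attach level data to lines for both the upper and the lower level, which the answer above does not cover: index levels once by (atomic_number, ion_number, level_number) and reuse that indexed frame for both joins.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real tables (values are invented for illustration).
levels = pd.DataFrame({
    "atomic_number": [1, 1, 2, 2],
    "ion_number":    [0, 0, 0, 0],
    "level_number":  [0, 1, 0, 1],
    "energy":        [0.0, 1.0, 0.0, 2.0],
    "g":             [2, 8, 1, 3],
})
lines = pd.DataFrame({
    "id":                 [10, 11],
    "atomic_number":      [1, 2],
    "ion_number":         [0, 0],
    "level_number_lower": [0, 0],
    "level_number_upper": [1, 1],
})

keys = ["atomic_number", "ion_number"]

# One value per (atom, ion) group: sum of g * exp(-energy).
# The result is a named Series indexed by (atomic_number, ion_number).
group_val = (
    levels.assign(term=levels["g"] * np.exp(-levels["energy"]))
          .groupby(keys)["term"]
          .sum()
          .rename("group_val")
)

# Same approach as the answer: wrap the Series in a DataFrame and merge
# the lines' key columns against its MultiIndex.
lines_with_group = lines.merge(
    group_val.to_frame(), left_on=keys, right_index=True, how="left"
)

# Index levels once, then join it twice: once on the upper level number
# and once on the lower one. Suffixes keep the two sets of columns apart.
level_cols = levels.set_index(keys + ["level_number"])[["energy", "g"]]
joined = (
    lines
    .join(level_cols.add_suffix("_upper"), on=keys + ["level_number_upper"])
    .join(level_cols.add_suffix("_lower"), on=keys + ["level_number_lower"])
)

print(lines_with_group)
print(joined)
```

Setting the index on levels up front is the closest thing to "precomputing" the join here; whether it is fast enough for 314338 lines is something to measure rather than assume.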

Answer 1 (accepted 2012-12-16 03:21:12)