Filling NaN values with those contained in a different DataFrame
The problem looks like this:
I have a dataframe left with a 2-level MultiIndex, representing events tpc occurring at point onset in the time region mc. Every event occurs in a layer defined by (staff, voice):
            mc onset  staff  voice  tpc dynamics chords
section ix
0       0    0     0      2      1    0      NaN    NaN
        1    0     0      2      1    0      NaN    NaN
        2    0     0      1      1    0      NaN    NaN
        3    0     0      1      1    4      NaN    NaN
        4    0     0      1      1    1      NaN    NaN
        5    0     0      1      1    0      NaN    NaN
        6    0   3/4      2      2    1      NaN    NaN
        7    0   3/4      2      1    1      NaN    NaN
Then, there is the dataframe right with other events ('dynamics', 'chords'), which need to be filled into left:
   mc onset  staff  voice dynamics chords
0   0     0      1      1        f    NaN
1   0     0      1      1      NaN      I
2   0   1/2      2      1        p    NaN
3   0   3/4      1      1      NaN     I6
4   0   3/4      2      1      NaN    I64
- All events from right need to appear in left.
- If they occur simultaneously with left events in the same layer, fill in the respective column of left for those events (i.e., join on ['mc', 'onset', 'staff', 'voice']; e.g. rows 0, 1, 4).
- Otherwise, if they occur simultaneously with left events in the same staff, fill in the respective column of left for those events (i.e., join on ['mc', 'onset', 'staff']; e.g. row 4).
- Otherwise, if they occur simultaneously with left events in some other layer, fill in the respective column of left for those events (i.e., join on ['mc', 'onset']; e.g. row 3).
- Otherwise, if they do not occur simultaneously with any left events, throw a warning and keep them for further treatment (e.g. row 2).
- If two events of the same type from right occur simultaneously, throw a warning and concatenate the values (e.g. rows 3 & 4).
Expected result:
     mc onset  staff  voice  tpc dynamics chords
0 0   0     0      2      1    0      NaN    NaN
  1   0     0      2      1    0      NaN    NaN
  2   0     0      1      1    0        f      I
  3   0     0      1      1    4        f      I
  4   0     0      1      1    1        f      I
  5   0     0      1      1    0        f      I
  6   0   3/4      2      2    1      NaN     I6
  7   0   3/4      2      1    1      NaN  I6I64
WARNING: These events could not be attached:
   mc onset  staff  voice dynamics chords
2   0   1/2      2      1        p    NaN
WARNING: These events are simultaneous:
   mc onset  staff  voice dynamics chords
3   0   3/4      1      1      NaN     I6
4   0   3/4      2      1      NaN    I64
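The last rule (two events of the same type at the same time) can be checked on right alone, before any join. The following is only a sketch under assumptions: it uses floats instead of Fractions for the onsets, and the names features, counts, conflict and simultaneous are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy version of `right`; onsets are floats here instead of Fractions for brevity.
right = pd.DataFrame({
    "mc": [0, 0, 0, 0, 0],
    "onset": [0.0, 0.0, 0.5, 0.75, 0.75],
    "staff": [1, 1, 2, 1, 2],
    "voice": [1, 1, 1, 1, 1],
    "dynamics": ["f", np.nan, "p", np.nan, np.nan],
    "chords": [np.nan, "I", np.nan, "I6", "I64"],
})

features = ["dynamics", "chords"]
# transform("count") gives, for every row, the number of non-null values
# of each feature within that row's (mc, onset) group.
counts = right.groupby(["mc", "onset"])[features].transform("count")
# A row takes part in a conflict if it carries a value for a feature that
# occurs more than once at the same (mc, onset).
conflict = ((counts > 1) & right[features].notna()).any(axis=1)
simultaneous = right[conflict]

if len(simultaneous) > 0:
    print("WARNING: These events are simultaneous:")
    print(simultaneous)
```

On this data only rows 3 and 4 are flagged, which matches the second warning in the expected result.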
Since I would like to avoid an approach where I iterate through right, I tried the following:
left_features = ['mc', 'onset', 'staff', 'voice']
right_features = ['dynamics', 'chords']
join_on = [['mc', 'onset', 'staff', 'voice'], ['mc', 'onset', 'staff'], ['mc', 'onset']]
for on in join_on:
    match = right[on + right_features].merge(left[left_features], on=on, left_index=True)
    left_ix = match.index
    left.loc[left_ix, match.columns] = match
    # left.loc[left_ix].fillna(match, inplace=True)
    right_ix = right.merge(left[left_features], on=on, right_index=True).index
    right.drop(right_ix, inplace=True)
    if len(right) == 0:
        break
if len(right) > 0:
    print("WARNING: These events could not be attached:")
    print(right)
This approach does not work because after the first merge, match looks like this:
     mc onset  staff  voice dynamics chords  tpc
0 2   0     0      1      1        f    NaN    0
  3   0     0      1      1        f    NaN    4
  4   0     0      1      1        f    NaN    1
  5   0     0      1      1        f    NaN    0
  2   0     0      1      1      NaN      I    0
  3   0     0      1      1      NaN      I    4
  4   0     0      1      1      NaN      I    1
  5   0     0      1      1      NaN      I    0
  7   0   3/4      2      1      NaN    I64    1
Since the index of match is not unique, the assignment to left (left.loc[left_ix, match.columns] = match) does not fully work (dynamics are missing in the result), and the commented-out approach with fillna silently does nothing. Also, it bothers me that I have to do the same merge twice: once to get the left index for the assignment, and once to get the right index for dropping the matched rows.
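Both symptoms can be reproduced on toy data. The sketch below is not the original code: df, fill and collapsed are hypothetical names, and it assumes that the fillna call did nothing because left.loc[left_ix] returns a new object, so the in-place fill never reaches left. It also shows one possible way to make a repeated index unique before assigning.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"val": [np.nan, np.nan, 3.0]})
fill = pd.DataFrame({"val": [1.0, 2.0, 9.0]})

# .loc with a list of labels returns a new object, so the in-place fillna
# changes only that temporary; `df` keeps its NaNs.
df.loc[[0, 1]].fillna(fill, inplace=True)
nan_count_after_inplace = int(df["val"].isna().sum())

# Assigning the filled rows back does modify `df`.
df.loc[[0, 1]] = df.loc[[0, 1]].fillna(fill)

# A merge result with repeated index labels, like `match` in the question.
match = pd.DataFrame(
    {"dynamics": ["f", np.nan], "chords": [np.nan, "I"]},
    index=[2, 2],
)
# first() takes the first non-null value per column within each label,
# which leaves exactly one row per label.
collapsed = match.groupby(level=0).first()
```

Note that first() keeps only one value per column and label, so it does not cover the rule that asks for concatenating simultaneous events.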
Facing these problems, I preprocess right before the join to unite simultaneous events in one row:
def unite_vals(df):
    r = pd.Series(index=right_features)
    for col in right_features:
        u = df[col][df[col].notna()].unique()
        if len(u) > 1:
            r[col] = ''.join(str(val) for val in u)
            print(f"WARNING:Two simultaneous events in row {df.iloc[0].name}")
        elif len(u) == 1:
            r[col] = u[0]
    return r

left_features = ['mc', 'onset', 'staff', 'voice']
right_features = ['dynamics', 'chords']
on = ['mc', 'onset']
right = right.groupby(on).apply(unite_vals).reset_index()
match = right.merge(left[left_features], on=on, left_index=True)
left_ix = match.index
left.loc[left_ix, match.columns] = match
# left.loc[left_ix].fillna(match, inplace=True)
right_ix = right.merge(left[left_features], on=on, right_index=True).index
right.drop(right_ix, inplace=True)
if len(right) > 0:
    print("WARNING: These events could not be attached:")
    print(right)
(For some unknown reason, the commented-out approach with fillna again doesn't do anything. The issue of doing the same merge twice remains.) The result is one I could live with; however, it does not differentiate between the layers of right and therefore looks like this:
     mc onset  staff  voice  tpc dynamics chords
0 0   0     0      2      1    0        f      I
  1   0     0      2      1    0        f      I
  2   0     0      1      1    0        f      I
  3   0     0      1      1    4        f      I
  4   0     0      1      1    1        f      I
  5   0     0      1      1    0        f      I
  6   0   3/4      2      2    1      NaN  I6I64
  7   0   3/4      2      1    1      NaN  I6I64
WARNING:Two simultaneous events at:
   mc onset
3   0   3/4
WARNING: These events could not be attached:
   mc onset dynamics chords
1   0   1/2        p    NaN
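Regarding the repeated merge: one possible way around it is to turn both indices into ordinary columns first, so that a single merge carries them along. This is a sketch under assumptions, not a drop-in fix: it uses flat integer indices and float onsets (with the MultiIndex of left, reset_index would produce two index columns instead of one), and the column names left_ix and right_ix are chosen here for illustration.

```python
import pandas as pd

left = pd.DataFrame({"mc": [0, 0, 0], "onset": [0.0, 0.0, 0.75], "tpc": [0, 4, 1]})
right = pd.DataFrame({"mc": [0, 0], "onset": [0.0, 0.5], "chords": ["I", "V"]})
on = ["mc", "onset"]

# Expose both indices as columns so they survive the merge.
l = left[on].rename_axis("left_ix").reset_index()
r = right.rename_axis("right_ix").reset_index()

# how="left" keeps unmatched right rows; indicator=True adds a `_merge`
# column telling whether a row found a partner ("both") or not ("left_only").
merged = r.merge(l, on=on, how="left", indicator=True)

matched = merged[merged["_merge"] == "both"]
left_ix = matched["left_ix"].astype(int)      # left rows to fill
unmatched_right_ix = merged.loc[merged["_merge"] == "left_only", "right_ix"]
```

Here left_ix holds the left rows 0 and 1, and unmatched_right_ix holds the right row 1, both taken from the same merge result.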
How would this typically be solved?
Here is the source code for reproduction:
import pandas as pd
import numpy as np
from fractions import Fraction
left_dict = {'mc': {(0, 0): 0,
                    (0, 1): 0,
                    (0, 2): 0,
                    (0, 3): 0,
                    (0, 4): 0,
                    (0, 5): 0,
                    (0, 6): 0,
                    (0, 7): 0},
             'onset': {(0, 0): Fraction(0, 1),
                       (0, 1): Fraction(0, 1),
                       (0, 2): Fraction(0, 1),
                       (0, 3): Fraction(0, 1),
                       (0, 4): Fraction(0, 1),
                       (0, 5): Fraction(0, 1),
                       (0, 6): Fraction(3, 4),
                       (0, 7): Fraction(3, 4)},
             'staff': {(0, 0): 2,
                       (0, 1): 2,
                       (0, 2): 1,
                       (0, 3): 1,
                       (0, 4): 1,
                       (0, 5): 1,
                       (0, 6): 2,
                       (0, 7): 2},
             'voice': {(0, 0): 1,
                       (0, 1): 1,
                       (0, 2): 1,
                       (0, 3): 1,
                       (0, 4): 1,
                       (0, 5): 1,
                       (0, 6): 2,
                       (0, 7): 1},
             'tpc': {(0, 0): 0,
                     (0, 1): 0,
                     (0, 2): 0,
                     (0, 3): 4,
                     (0, 4): 1,
                     (0, 5): 0,
                     (0, 6): 1,
                     (0, 7): 1},
             'dynamics': {(0, 0): np.nan,
                          (0, 1): np.nan,
                          (0, 2): np.nan,
                          (0, 3): np.nan,
                          (0, 4): np.nan,
                          (0, 5): np.nan,
                          (0, 6): np.nan,
                          (0, 7): np.nan},
             'chords': {(0, 0): np.nan,
                        (0, 1): np.nan,
                        (0, 2): np.nan,
                        (0, 3): np.nan,
                        (0, 4): np.nan,
                        (0, 5): np.nan,
                        (0, 6): np.nan,
                        (0, 7): np.nan}}
left = pd.DataFrame.from_dict(left_dict)
right_dict = {'mc': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
              'onset': {0: Fraction(0, 1),
                        1: Fraction(0, 1),
                        2: Fraction(1, 2),
                        3: Fraction(3, 4),
                        4: Fraction(3, 4)},
              'staff': {0: 1, 1: 1, 2: 2, 3: 1, 4: 2},
              'voice': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
              'dynamics': {0: 'f', 1: np.nan, 2: 'p', 3: np.nan, 4: np.nan},
              'chords': {0: np.nan, 1: 'I', 2: np.nan, 3: 'I6', 4: 'I64'}}
right = pd.DataFrame.from_dict(right_dict)
attempt1 = True
if attempt1:
    left_features = ['mc', 'onset', 'staff', 'voice', 'tpc']
    right_features = ['dynamics', 'chords']
    join_on = [['mc', 'onset', 'staff', 'voice'], ['mc', 'onset', 'staff'], ['mc', 'onset']]
    for on in join_on:
        match = right[on + right_features].merge(left[left_features], on=on, left_index=True)
        left_ix = match.index
        left.loc[left_ix, match.columns] = match
        #left.loc[left_ix].fillna(match, inplace=True)
        right_ix = right.merge(left[left_features], on=on, right_index=True).index
        right.drop(right_ix, inplace=True)
        if len(right) == 0:
            break
    if len(right) > 0:
        print("WARNING: These events could not be attached:")
        print(right)
    print(left)
else:
    def unite_vals(df):
        r = pd.Series(index=right_features)
        for col in right_features:
            u = df[col][df[col].notna()].unique()
            if len(u) > 1:
                r[col] = ''.join(str(val) for val in u)
                print("WARNING:Two simultaneous events at:")
                print(df.iloc[:1][['mc', 'onset']])
            elif len(u) == 1:
                r[col] = u[0]
        return r
    left_features = ['mc', 'onset', 'staff', 'voice']
    right_features = ['dynamics', 'chords']
    on = ['mc', 'onset']
    right = right.groupby(on).apply(unite_vals).reset_index()
    match = right.merge(left[left_features], on=on, left_index=True)
    left_ix = match.index
    left.loc[left_ix, match.columns] = match
    # left.loc[left_ix].fillna(match, inplace=True)
    right_ix = right.merge(left[left_features], on=on, right_index=True).index
    right.drop(right_ix, inplace=True)
    if len(right) > 0:
        print("WARNING: These events could not be attached:")
        print(right)
    print(left)
In the end, it turned out that the easiest way to solve my problem is to use loops:
isnan = lambda num: num != num
right_features = ['dynamics', 'chords']
for i, r in right.iterrows():
    same_os = left.loc[(left.mc == r.mc) & (left.onset == r.onset)]
    if len(same_os) > 0:
        same_staff = same_os.loc[same_os.staff == r.staff]
        same_voice = same_staff.loc[same_staff.voice == r.voice]
        if len(same_voice) > 0:
            fill = same_voice
        elif len(same_staff) > 0:
            fill = same_staff
        else:
            fill = same_os
        for f in right_features:
            if not isnan(r[f]):
                F = left.loc[fill.index, f]
                notna = F.notna()
                if notna.any():
                    print(f"WARNING:Feature existed and was concatenated: {F[notna]}")
                    left.loc[F[notna].index, f] += r[f]
                    left.loc[F[~notna].index, f] = r[f]
                else:
                    left.loc[fill.index, f] = r[f]
    else:
        print(f"WARNING:Event could not be attached: {r}")
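For comparison, a variant without iterrows might look like the sketch below. It is not the method used above, only one possible reading of the rules: each right event is matched at the finest key level that finds a partner, all matches are collected, and values landing on the same left row are concatenated in the order of right. The names attach_events, lpos and rix are hypothetical, and the sketch assumes that the feature columns of left are empty beforehand (existing values would be overwritten).

```python
from fractions import Fraction

import numpy as np
import pandas as pd


def attach_events(left, right, features, key_levels):
    """Fill `features` in a copy of `left` from `right`, trying each key list in order."""
    # Positional id of every left row, exposed as a column so that a single
    # merge per level yields both the left position and the right label.
    lkeys = left.reset_index(drop=True).rename_axis("lpos").reset_index()
    pending = right.rename_axis("rix").reset_index()
    pairs = []
    for on in key_levels:
        m = pending.merge(lkeys[["lpos"] + on], on=on, how="inner")
        pairs.append(m[["rix", "lpos"] + features])
        # Right rows matched at this level are not retried at coarser levels.
        pending = pending[~pending["rix"].isin(m["rix"])]
    matched = pd.concat(pairs).sort_values("rix")

    out = left.copy()
    for f in features:
        present = matched.dropna(subset=[f])
        # Concatenate, in `right` order, all values landing on the same left row.
        vals = present.groupby("lpos")[f].agg("".join)
        if (present.groupby("lpos").size() > 1).any():
            print(f"WARNING: simultaneous {f} events were concatenated")
        col = out[f].astype(object).to_numpy(copy=True)
        col[vals.index.to_numpy(dtype=int)] = vals.to_numpy()
        out[f] = col
    unattached = pending.set_index("rix")
    if len(unattached) > 0:
        print("WARNING: These events could not be attached:")
        print(unattached)
    return out, unattached


# Same values as in the question, built compactly with a default index.
left = pd.DataFrame({
    "mc": 0,
    "onset": [Fraction(0)] * 6 + [Fraction(3, 4)] * 2,
    "staff": [2, 2, 1, 1, 1, 1, 2, 2],
    "voice": [1, 1, 1, 1, 1, 1, 2, 1],
    "tpc": [0, 0, 0, 4, 1, 0, 1, 1],
    "dynamics": np.nan,
    "chords": np.nan,
})
right = pd.DataFrame({
    "mc": 0,
    "onset": [Fraction(0), Fraction(0), Fraction(1, 2), Fraction(3, 4), Fraction(3, 4)],
    "staff": [1, 1, 2, 1, 2],
    "voice": [1, 1, 1, 1, 1],
    "dynamics": ["f", np.nan, "p", np.nan, np.nan],
    "chords": [np.nan, "I", np.nan, "I6", "I64"],
})
levels = [["mc", "onset", "staff", "voice"], ["mc", "onset", "staff"], ["mc", "onset"]]
filled, unattached = attach_events(left, right, ["dynamics", "chords"], levels)
print(filled)
```

Unlike the loop, the sketch returns a new frame instead of modifying left in place, and it hands back the unattached events so they can be treated further.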