Filling NaN values with those contained in a different DataFrame
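The problem looks like this:

I have a DataFrame `left` with a two-level MultiIndex, representing events (`tpc`) that occur at the timepoint (`mc`, `onset`). Each event takes place in a layer defined by `(staff, voice)`: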
```
            mc onset  staff  voice  tpc dynamics chords
section ix
0       0    0     0      2      1    0      NaN    NaN
        1    0     0      2      1    0      NaN    NaN
        2    0     0      1      1    0      NaN    NaN
        3    0     0      1      1    4      NaN    NaN
        4    0     0      1      1    1      NaN    NaN
        5    0     0      1      1    0      NaN    NaN
        6    0   3/4      2      2    1      NaN    NaN
        7    0   3/4      2      1    1      NaN    NaN
```
Then, the DataFrame `right`, which holds other events (`dynamics`, `chords`), needs to be filled into `left`:
```
   mc onset  staff  voice dynamics chords
0   0     0      1      1        f    NaN
1   0     0      1      1      NaN      I
2   0   1/2      2      1        p    NaN
3   0   3/4      1      1      NaN     I6
4   0   3/4      2      1      NaN    I64
```
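All events from `right` need to appear in `left`:

- If they occur simultaneously with `left` events in the same layer, fill in the corresponding `left` columns for those events (i.e. join on `['mc', 'onset', 'staff', 'voice']`; e.g. rows 0, 1, 4).
- Otherwise, if they occur simultaneously with `left` events in the same `staff`, fill in the corresponding `left` columns for those events (i.e. join on `['mc', 'onset', 'staff']`; e.g. row 4).
- Otherwise, if they occur simultaneously with any `left` events, fill in the corresponding `left` columns for those events (i.e. join on `['mc', 'onset']`; e.g. row 3).
- If they do not occur simultaneously with any `left` event, issue a warning and keep them for further processing (e.g. row 2).
- If two events of the same type from `right` occur simultaneously, issue a warning and concatenate the values (e.g. rows 3 and 4).

The expected result is:

```
            mc onset  staff  voice  tpc dynamics chords
section ix
0       0    0     0      2      1    0      NaN    NaN
        1    0     0      2      1    0      NaN    NaN
        2    0     0      1      1    0        f      I
        3    0     0      1      1    4        f      I
        4    0     0      1      1    1        f      I
        5    0     0      1      1    0        f      I
        6    0   3/4      2      2    1      NaN     I6
        7    0   3/4      2      1    1      NaN  I6I64
WARNING: These events could not be attached:
   mc onset  staff  voice dynamics chords
2   0   1/2      2      1        p    NaN
WARNING: These events are simultaneous:
   mc onset  staff  voice dynamics chords
3   0   3/4      1      1      NaN     I6
4   0   3/4      2      1      NaN    I64
```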
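One way to express these rules without iterating over `right` is a cascade of merges, one per key level, in which the original row identifiers of both frames travel along as ordinary columns. The following is a minimal sketch under stated assumptions: `attach_events` is a hypothetical helper name, the `dynamics`/`chords` columns of `left` are assumed to start out empty (existing values would be overwritten, not concatenated), and simultaneous values are concatenated as strings in the order of `right`'s index. Problems are reported with `warnings.warn` instead of `print`.

```python
import warnings
from fractions import Fraction

import numpy as np
import pandas as pd

# Sample data mirroring the question's `left` and `right` frames.
idx = pd.MultiIndex.from_product([[0], range(8)], names=['section', 'ix'])
left = pd.DataFrame({
    'mc': [0] * 8,
    'onset': [Fraction(0)] * 6 + [Fraction(3, 4)] * 2,
    'staff': [2, 2, 1, 1, 1, 1, 2, 2],
    'voice': [1, 1, 1, 1, 1, 1, 2, 1],
    'tpc': [0, 0, 0, 4, 1, 0, 1, 1],
    'dynamics': np.nan,
    'chords': np.nan,
}, index=idx)
right = pd.DataFrame({
    'mc': [0] * 5,
    'onset': [Fraction(0), Fraction(0), Fraction(1, 2), Fraction(3, 4), Fraction(3, 4)],
    'staff': [1, 1, 2, 1, 2],
    'voice': [1, 1, 1, 1, 1],
    'dynamics': ['f', np.nan, 'p', np.nan, np.nan],
    'chords': [np.nan, 'I', np.nan, 'I6', 'I64'],
})


def attach_events(left, right, key_levels, features):
    """Hypothetical helper: fill left[features] from right, finest keys first.

    Each right row is attached at the first key level where it matches any
    left row. Values landing in the same left cell are concatenated in the
    order of right's index. Returns (filled_left, unattached_right_rows).
    Assumes left[features] starts empty; existing values are overwritten.
    """
    # Copy with object dtype so strings can be written into all-NaN columns.
    out = left.astype({f: object for f in features})
    # Left row positions ('lid') and right labels ('rid') become columns,
    # so one merge per level says both what to fill and what to drop.
    lpos = out.reset_index(drop=True).rename_axis('lid').reset_index()
    remaining = right.rename_axis('rid').reset_index()
    matches = []
    for keys in key_levels:
        m = remaining.merge(lpos[['lid'] + keys], on=keys, how='inner')
        if not m.empty:
            matches.append(m[['lid', 'rid'] + features])
            # Matched rows are not tried again at coarser levels.
            remaining = remaining[~remaining['rid'].isin(m['rid'])]
    if matches:
        found = pd.concat(matches).sort_values(['lid', 'rid'])
        for f in features:
            vals = found.dropna(subset=[f])
            n = vals.groupby('lid')[f].size()
            if (n > 1).any():
                warnings.warn(f"Concatenated simultaneous {f!r} values at "
                              f"left positions {n[n > 1].index.tolist()}")
            joined = vals.groupby('lid')[f].agg(lambda s: ''.join(s))
            out.iloc[joined.index, out.columns.get_loc(f)] = joined.to_numpy()
    unattached = remaining.set_index('rid').rename_axis(None)
    if not unattached.empty:
        warnings.warn(f"These events could not be attached:\n{unattached}")
    return out, unattached


join_on = [['mc', 'onset', 'staff', 'voice'], ['mc', 'onset', 'staff'], ['mc', 'onset']]
filled, unattached = attach_events(left, right, join_on, ['dynamics', 'chords'])
print(filled)
```

With the sample data this should reproduce the expected result above: row 2 of `right` is returned as unattached, and the last `left` row receives `I6I64` together with a warning. Because a `right` row leaves `remaining` as soon as it matches, a finer key level always takes precedence over a coarser one.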
Since I want to avoid an approach that iterates over `right`, I tried the following:
```python
left_features = ['mc', 'onset', 'staff', 'voice']
right_features = ['dynamics', 'chords']
join_on = [['mc', 'onset', 'staff', 'voice'], ['mc', 'onset', 'staff'], ['mc', 'onset']]
for on in join_on:
    match = right[on + right_features].merge(left[left_features], on=on, left_index=True)
    left_ix = match.index
    left.loc[left_ix, match.columns] = match
    # left.loc[left_ix].fillna(match, inplace=True)
    right_ix = right.merge(left[left_features], on=on, right_index=True).index
    right.drop(right_ix, inplace=True)
    if len(right) == 0:
        break
if len(right) > 0:
    print("WARNING: These events could not be attached:")
    print(right)
```
This approach does not work because after the first merge, `match` looks like this:
```
      mc onset  staff  voice dynamics chords  tpc
0 2    0     0      1      1        f    NaN    0
  3    0     0      1      1        f    NaN    4
  4    0     0      1      1        f    NaN    1
  5    0     0      1      1        f    NaN    0
  2    0     0      1      1      NaN      I    0
  3    0     0      1      1      NaN      I    4
  4    0     0      1      1      NaN      I    1
  5    0     0      1      1      NaN      I    0
  7    0   3/4      2      1      NaN    I64    1
```
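Since the index of `match` is not unique, the assignment `left.loc[left_ix, match.columns] = match` did not fully work (`dynamics` is missing from the result), and the commented-out `fillna` approach silently did nothing. Furthermore, it bothers me that I had to perform the same merge twice: once to get the `left_index` for the correct assignment and once to get the `right_index` for dropping the matched rows.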
Faced with these problems, I tried preprocessing `right` before the join, uniting simultaneous events into a single row:
```python
def unite_vals(df):
    # dtype=object so that string values can be stored without upcasting
    r = pd.Series(index=right_features, dtype=object)
    for col in right_features:
        u = df[col][df[col].notna()].unique()
        if len(u) > 1:
            r[col] = ''.join(str(val) for val in u)
            print(f"WARNING:Two simultaneous events in row {df.iloc[0].name}")
        elif len(u) == 1:
            r[col] = u[0]
    return r

left_features = ['mc', 'onset', 'staff', 'voice']
right_features = ['dynamics', 'chords']
on = ['mc', 'onset']
right = right.groupby(on).apply(unite_vals).reset_index()
match = right.merge(left[left_features], on=on, left_index=True)
left_ix = match.index
left.loc[left_ix, match.columns] = match
# left.loc[left_ix].fillna(match, inplace=True)
right_ix = right.merge(left[left_features], on=on, right_index=True).index
right.drop(right_ix, inplace=True)
if len(right) > 0:
    print("WARNING: These events could not be attached:")
    print(right)
```
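(For some unknown reason, the commented-out `fillna` approach again did nothing, and the problem of performing the same merge twice remains.) The result is acceptable to me; however, it does not distinguish between the layers of `right` and therefore looks like this: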
```
            mc onset  staff  voice  tpc dynamics chords
section ix
0       0    0     0      2      1    0        f      I
        1    0     0      2      1    0        f      I
        2    0     0      1      1    0        f      I
        3    0     0      1      1    4        f      I
        4    0     0      1      1    1        f      I
        5    0     0      1      1    0        f      I
        6    0   3/4      2      2    1      NaN  I6I64
        7    0   3/4      2      1    1      NaN  I6I64
WARNING:Two simultaneous events at:
   mc onset
3   0   3/4
WARNING: These events could not be attached:
   mc onset dynamics chords
1   0   1/2        p    NaN
```
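The per-group `unite_vals` function can probably be replaced by a single `groupby(...).agg` call that joins the distinct non-null values of each column, with `nunique` flagging the groups where more than one value gets concatenated. A minimal sketch, where `unite_simultaneous` is a hypothetical helper name and `right` is rebuilt from the question's data:

```python
import warnings
from fractions import Fraction

import numpy as np
import pandas as pd

right = pd.DataFrame({
    'mc': [0] * 5,
    'onset': [Fraction(0), Fraction(0), Fraction(1, 2), Fraction(3, 4), Fraction(3, 4)],
    'staff': [1, 1, 2, 1, 2],
    'voice': [1, 1, 1, 1, 1],
    'dynamics': ['f', np.nan, 'p', np.nan, np.nan],
    'chords': [np.nan, 'I', np.nan, 'I6', 'I64'],
})


def unite_simultaneous(df, keys, features):
    """Hypothetical helper: collapse rows sharing `keys` into one row per group.

    Distinct non-null values of each feature are concatenated in order of
    appearance; a group with only NaN stays NaN (as in unite_vals).
    """
    grouped = df.groupby(keys)[features]
    n_distinct = grouped.nunique()  # NaN is not counted by default
    clashes = n_distinct[(n_distinct > 1).any(axis=1)]
    if not clashes.empty:
        warnings.warn(f"Simultaneous events at: {list(clashes.index)}")
    united = grouped.agg(lambda s: ''.join(map(str, s.dropna().unique())) or np.nan)
    return united.reset_index()


united = unite_simultaneous(right, ['mc', 'onset'], ['dynamics', 'chords'])
print(united)
```

Passing `['mc', 'onset', 'staff', 'voice']` as `keys` instead would unite simultaneous events only within the same layer, which keeps the layer information that grouping by `['mc', 'onset']` discards.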
How is this usually solved?
Here is the source code to reproduce the problem:
```python
import pandas as pd
import numpy as np
from fractions import Fraction

left_dict = {'mc': {(0, 0): 0,
                    (0, 1): 0,
                    (0, 2): 0,
                    (0, 3): 0,
                    (0, 4): 0,
                    (0, 5): 0,
                    (0, 6): 0,
                    (0, 7): 0},
             'onset': {(0, 0): Fraction(0, 1),
                       (0, 1): Fraction(0, 1),
                       (0, 2): Fraction(0, 1),
                       (0, 3): Fraction(0, 1),
                       (0, 4): Fraction(0, 1),
                       (0, 5): Fraction(0, 1),
                       (0, 6): Fraction(3, 4),
                       (0, 7): Fraction(3, 4)},
             'staff': {(0, 0): 2,
                       (0, 1): 2,
                       (0, 2): 1,
                       (0, 3): 1,
                       (0, 4): 1,
                       (0, 5): 1,
                       (0, 6): 2,
                       (0, 7): 2},
             'voice': {(0, 0): 1,
                       (0, 1): 1,
                       (0, 2): 1,
                       (0, 3): 1,
                       (0, 4): 1,
                       (0, 5): 1,
                       (0, 6): 2,
                       (0, 7): 1},
             'tpc': {(0, 0): 0,
                     (0, 1): 0,
                     (0, 2): 0,
                     (0, 3): 4,
                     (0, 4): 1,
                     (0, 5): 0,
                     (0, 6): 1,
                     (0, 7): 1},
             'dynamics': {(0, 0): np.nan,
                          (0, 1): np.nan,
                          (0, 2): np.nan,
                          (0, 3): np.nan,
                          (0, 4): np.nan,
                          (0, 5): np.nan,
                          (0, 6): np.nan,
                          (0, 7): np.nan},
             'chords': {(0, 0): np.nan,
                        (0, 1): np.nan,
                        (0, 2): np.nan,
                        (0, 3): np.nan,
                        (0, 4): np.nan,
                        (0, 5): np.nan,
                        (0, 6): np.nan,
                        (0, 7): np.nan}}
left = pd.DataFrame.from_dict(left_dict)

right_dict = {'mc': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
              'onset': {0: Fraction(0, 1),
                        1: Fraction(0, 1),
                        2: Fraction(1, 2),
                        3: Fraction(3, 4),
                        4: Fraction(3, 4)},
              'staff': {0: 1, 1: 1, 2: 2, 3: 1, 4: 2},
              'voice': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
              'dynamics': {0: 'f', 1: np.nan, 2: 'p', 3: np.nan, 4: np.nan},
              'chords': {0: np.nan, 1: 'I', 2: np.nan, 3: 'I6', 4: 'I64'}}
right = pd.DataFrame.from_dict(right_dict)

attempt1 = True
if attempt1:
    left_features = ['mc', 'onset', 'staff', 'voice', 'tpc']
    right_features = ['dynamics', 'chords']
    join_on = [['mc', 'onset', 'staff', 'voice'], ['mc', 'onset', 'staff'], ['mc', 'onset']]
    for on in join_on:
        # Note: recent pandas versions raise MergeError when on= is combined
        # with left_index=True or right_index=True (here and below).
        match = right[on + right_features].merge(left[left_features], on=on, left_index=True)
        left_ix = match.index
        left.loc[left_ix, match.columns] = match
        # left.loc[left_ix].fillna(match, inplace=True)
        right_ix = right.merge(left[left_features], on=on, right_index=True).index
        right.drop(right_ix, inplace=True)
        if len(right) == 0:
            break
    if len(right) > 0:
        print("WARNING: These events could not be attached:")
        print(right)
    print(left)
else:
    def unite_vals(df):
        # dtype=object so that string values can be stored without upcasting
        r = pd.Series(index=right_features, dtype=object)
        for col in right_features:
            u = df[col][df[col].notna()].unique()
            if len(u) > 1:
                r[col] = ''.join(str(val) for val in u)
                print("WARNING:Two simultaneous events at:")
                print(df.iloc[:1][['mc', 'onset']])
            elif len(u) == 1:
                r[col] = u[0]
        return r

    left_features = ['mc', 'onset', 'staff', 'voice']
    right_features = ['dynamics', 'chords']
    on = ['mc', 'onset']
    right = right.groupby(on).apply(unite_vals).reset_index()
    match = right.merge(left[left_features], on=on, left_index=True)
    left_ix = match.index
    left.loc[left_ix, match.columns] = match
    # left.loc[left_ix].fillna(match, inplace=True)
    right_ix = right.merge(left[left_features], on=on, right_index=True).index
    right.drop(right_ix, inplace=True)
    if len(right) > 0:
        print("WARNING: These events could not be attached:")
        print(right)
    print(left)
```
In the end, the simplest way to solve my problem was a loop:
```python
right_features = ['dynamics', 'chords']
# Object dtype so that strings can be written into the all-NaN float columns.
left = left.astype({f: object for f in right_features})
for i, r in right.iterrows():
    same_os = left.loc[(left.mc == r.mc) & (left.onset == r.onset)]
    if len(same_os) > 0:
        same_staff = same_os.loc[same_os.staff == r.staff]
        same_voice = same_staff.loc[same_staff.voice == r.voice]
        # Prefer the finest matching layer: voice, then staff, then any.
        if len(same_voice) > 0:
            fill = same_voice
        elif len(same_staff) > 0:
            fill = same_staff
        else:
            fill = same_os
        for f in right_features:
            if pd.notna(r[f]):
                F = left.loc[fill.index, f]
                notna = F.notna()
                if notna.any():
                    print(f"WARNING:Feature existed and was concatenated: {F[notna]}")
                    left.loc[F[notna].index, f] += r[f]
                    left.loc[F[~notna].index, f] = r[f]
                else:
                    left.loc[fill.index, f] = r[f]
    else:
        print(f"WARNING:Event could not be attached: {r}")
```