The problem looks like this: I have a dataframe left with a 2-level MultiIndex, representing events tpc occurring at point onset in the time region mc. Every event occurs in a layer defined by (staff, voice):
            mc onset  staff  voice  tpc dynamics chords
section ix
0       0    0     0      2      1    0      NaN    NaN
        1    0     0      2      1    0      NaN    NaN
        2    0     0      1      1    0      NaN    NaN
        3    0     0      1      1    4      NaN    NaN
        4    0     0      1      1    1      NaN    NaN
        5    0     0      1      1    0      NaN    NaN
        6    0   3/4      2      2    1      NaN    NaN
        7    0   3/4      2      1    1      NaN    NaN
Then there is the dataframe right with other events ('dynamics', 'chords'), which need to be filled into left:
   mc onset  staff  voice dynamics chords
0   0     0      1      1        f    NaN
1   0     0      1      1      NaN      I
2   0   1/2      2      1        p    NaN
3   0   3/4      1      1      NaN     I6
4   0   3/4      2      1      NaN    I64
- All events from right need to appear in left.
- If there are left events in the same layer, fill in the respective column of left for those events (i.e., join on ['mc', 'onset', 'staff', 'voice']; e.g. rows 0, 1, 4).
- If there are left events in the same staff, fill in the respective column of left for those events (i.e., join on ['mc', 'onset', 'staff']; e.g. row 4).
- If there are left events in some other layer, fill in the respective column of left for those events (i.e., join on ['mc', 'onset']; e.g. row 3).
- If there are no simultaneous left events, throw a warning and keep them for further treatment (e.g. row 2).
- If several events from right occur simultaneously, throw a warning and concatenate values (e.g. rows 3 & 4).

Expected result:

     mc onset  staff  voice  tpc dynamics chords
0 0   0     0      2      1    0      NaN    NaN
  1   0     0      2      1    0      NaN    NaN
  2   0     0      1      1    0        f      I
  3   0     0      1      1    4        f      I
  4   0     0      1      1    1        f      I
  5   0     0      1      1    0        f      I
  6   0   3/4      2      2    1      NaN     I6
  7   0   3/4      2      1    1      NaN  I6I64
WARNING: These events could not be attached:
   mc onset  staff  voice dynamics chords
2   0   1/2      2      1        p    NaN
WARNING: These events are simultaneous:
   mc onset  staff  voice dynamics chords
3   0   3/4      1      1      NaN     I6
4   0   3/4      2      1      NaN    I64
Since I would like to avoid an approach where I iterate through right, I tried the following:
left_features = ['mc', 'onset', 'staff', 'voice']
right_features = ['dynamics', 'chords']
join_on = [['mc', 'onset', 'staff', 'voice'], ['mc', 'onset', 'staff'], ['mc', 'onset']]
for on in join_on:
    match = right[on + right_features].merge(left[left_features], on=on, left_index=True)
    left_ix = match.index
    left.loc[left_ix, match.columns] = match
    # left.loc[left_ix].fillna(match, inplace=True)
    right_ix = right.merge(left[left_features], on=on, right_index=True).index
    right.drop(right_ix, inplace=True)
    if len(right) == 0:
        break
if len(right) > 0:
    print("WARNING: These events could not be attached:")
    print(right)
This approach does not work because after the first merge, match looks like this:
     mc onset  staff  voice dynamics chords  tpc
0 2   0     0      1      1        f    NaN    0
  3   0     0      1      1        f    NaN    4
  4   0     0      1      1        f    NaN    1
  5   0     0      1      1        f    NaN    0
  2   0     0      1      1      NaN      I    0
  3   0     0      1      1      NaN      I    4
  4   0     0      1      1      NaN      I    1
  5   0     0      1      1      NaN      I    0
  7   0   3/4      2      1      NaN    I64    1
Since the index of match is not unique, the assignment left.loc[left_ix, match.columns] = match does not fully work (dynamics are missing in the result), and the commented-out approach with fillna silently does nothing. It also bothers me to do the same merge twice: once to get the left_index for correct assignment and then again to get the right_index for dropping the matched rows.
Facing these problems, I preprocess right before the join to unite simultaneous events in one row:
def unite_vals(df):
    r = pd.Series(index=right_features)
    for col in right_features:
        u = df[col][df[col].notna()].unique()
        if len(u) > 1:
            r[col] = ''.join(str(val) for val in u)
            print(f"WARNING:Two simultaneous events in row {df.iloc[0].name}")
        elif len(u) == 1:
            r[col] = u[0]
    return r

left_features = ['mc', 'onset', 'staff', 'voice']
right_features = ['dynamics', 'chords']
on = ['mc', 'onset']
right = right.groupby(on).apply(unite_vals).reset_index()
match = right.merge(left[left_features], on=on, left_index=True)
left_ix = match.index
left.loc[left_ix, match.columns] = match
# left.loc[left_ix].fillna(match, inplace=True)
right_ix = right.merge(left[left_features], on=on, right_index=True).index
right.drop(right_ix, inplace=True)
if len(right) > 0:
    print("WARNING: These events could not be attached:")
    print(right)
(For some unknown reason, the commented-out approach with fillna again does nothing, and the issue of doing the same merge twice remains.) I could live with the result; however, it does not differentiate between the layers of right and therefore looks like this:
     mc onset  staff  voice  tpc dynamics  chords
0 0   0     0      2      1    0        f       I
  1   0     0      2      1    0        f       I
  2   0     0      1      1    0        f       I
  3   0     0      1      1    4        f       I
  4   0     0      1      1    1        f       I
  5   0     0      1      1    0        f       I
  6   0   3/4      2      2    1      NaN  I6I64
  7   0   3/4      2      1    1      NaN  I6I64
WARNING:Two simultaneous events at:
   mc onset
3   0   3/4
WARNING: These events could not be attached:
   mc onset dynamics chords
1   0   1/2        p    NaN
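Regarding the repeated merge: one possible way around it is to carry positional row ids through a single merge and ask for the merge indicator, so that the matched rows of left and the unmatched rows of right both come out of the same call. This is only a sketch on hypothetical toy data; the column names left_pos and right_pos are made up for illustration, and onset is a float here instead of a Fraction.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, much smaller than the frames in the question.
left = pd.DataFrame({"mc": [0, 0, 0], "onset": [0.0, 0.0, 0.75], "tpc": [0, 4, 1]})
right = pd.DataFrame({"mc": [0, 0], "onset": [0.0, 0.5], "chords": ["I", "V"]})

on = ["mc", "onset"]
# Tag every row with its position so both sides can be recovered afterwards.
pairs = right.assign(right_pos=np.arange(len(right))).merge(
    left[on].assign(left_pos=np.arange(len(left))),
    on=on,
    how="left",        # keep right rows that find no partner
    indicator=True,    # adds a `_merge` column: 'both' or 'left_only'
)

matched = pairs[pairs["_merge"] == "both"]
left_pos = matched["left_pos"].astype(int).to_numpy()   # rows of `left` to fill
unmatched = right.iloc[pairs.loc[pairs["_merge"] == "left_only", "right_pos"].to_numpy()]

print(matched[["left_pos", "right_pos", "chords"]])
print(unmatched)
```

Because positions are used instead of labels, this does not depend on the index of either frame being unique.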
How would this typically be solved?
Here is the source code for reproduction:
import pandas as pd
import numpy as np
from fractions import Fraction
left_dict = {'mc': {(0, 0): 0,
(0, 1): 0,
(0, 2): 0,
(0, 3): 0,
(0, 4): 0,
(0, 5): 0,
(0, 6): 0,
(0, 7): 0},
'onset': {(0, 0): Fraction(0, 1),
(0, 1): Fraction(0, 1),
(0, 2): Fraction(0, 1),
(0, 3): Fraction(0, 1),
(0, 4): Fraction(0, 1),
(0, 5): Fraction(0, 1),
(0, 6): Fraction(3, 4),
(0, 7): Fraction(3, 4)},
'staff': {(0, 0): 2,
(0, 1): 2,
(0, 2): 1,
(0, 3): 1,
(0, 4): 1,
(0, 5): 1,
(0, 6): 2,
(0, 7): 2},
'voice': {(0, 0): 1,
(0, 1): 1,
(0, 2): 1,
(0, 3): 1,
(0, 4): 1,
(0, 5): 1,
(0, 6): 2,
(0, 7): 1},
'tpc': {(0, 0): 0,
(0, 1): 0,
(0, 2): 0,
(0, 3): 4,
(0, 4): 1,
(0, 5): 0,
(0, 6): 1,
(0, 7): 1},
'dynamics': {(0, 0): np.nan,
(0, 1): np.nan,
(0, 2): np.nan,
(0, 3): np.nan,
(0, 4): np.nan,
(0, 5): np.nan,
(0, 6): np.nan,
(0, 7): np.nan},
'chords': {(0, 0): np.nan,
(0, 1): np.nan,
(0, 2): np.nan,
(0, 3): np.nan,
(0, 4): np.nan,
(0, 5): np.nan,
(0, 6): np.nan,
(0, 7): np.nan}}
left = pd.DataFrame.from_dict(left_dict)
right_dict = {'mc': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'onset': {0: Fraction(0, 1),
1: Fraction(0, 1),
2: Fraction(1, 2),
3: Fraction(3, 4),
4: Fraction(3, 4)},
'staff': {0: 1, 1: 1, 2: 2, 3: 1, 4: 2},
'voice': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
'dynamics': {0: 'f', 1: np.nan, 2: 'p', 3: np.nan, 4: np.nan},
'chords': {0: np.nan, 1: 'I', 2: np.nan, 3: 'I6', 4: 'I64'}}
right = pd.DataFrame.from_dict(right_dict)
attempt1 = True
if attempt1:
    left_features = ['mc', 'onset', 'staff', 'voice', 'tpc']
    right_features = ['dynamics', 'chords']
    join_on = [['mc', 'onset', 'staff', 'voice'], ['mc', 'onset', 'staff'], ['mc', 'onset']]
    for on in join_on:
        match = right[on + right_features].merge(left[left_features], on=on, left_index=True)
        left_ix = match.index
        left.loc[left_ix, match.columns] = match
        # left.loc[left_ix].fillna(match, inplace=True)
        right_ix = right.merge(left[left_features], on=on, right_index=True).index
        right.drop(right_ix, inplace=True)
        if len(right) == 0:
            break
    if len(right) > 0:
        print("WARNING: These events could not be attached:")
        print(right)
    print(left)
else:
    def unite_vals(df):
        r = pd.Series(index=right_features)
        for col in right_features:
            u = df[col][df[col].notna()].unique()
            if len(u) > 1:
                r[col] = ''.join(str(val) for val in u)
                print("WARNING:Two simultaneous events at:")
                print(df.iloc[:1][['mc', 'onset']])
            elif len(u) == 1:
                r[col] = u[0]
        return r

    left_features = ['mc', 'onset', 'staff', 'voice']
    right_features = ['dynamics', 'chords']
    on = ['mc', 'onset']
    right = right.groupby(on).apply(unite_vals).reset_index()
    match = right.merge(left[left_features], on=on, left_index=True)
    left_ix = match.index
    left.loc[left_ix, match.columns] = match
    # left.loc[left_ix].fillna(match, inplace=True)
    right_ix = right.merge(left[left_features], on=on, right_index=True).index
    right.drop(right_ix, inplace=True)
    if len(right) > 0:
        print("WARNING: These events could not be attached:")
        print(right)
    print(left)
In the end, it turned out that the easiest way to solve my problem is to use loops:
isnan = lambda num: num != num
right_features = ['dynamics', 'chords']
for i, r in right.iterrows():
    same_os = left.loc[(left.mc == r.mc) & (left.onset == r.onset)]
    if len(same_os) > 0:
        same_staff = same_os.loc[same_os.staff == r.staff]
        same_voice = same_staff.loc[same_staff.voice == r.voice]
        if len(same_voice) > 0:
            fill = same_voice
        elif len(same_staff) > 0:
            fill = same_staff
        else:
            fill = same_os
        for f in right_features:
            if not isnan(r[f]):
                F = left.loc[fill.index, f]
                notna = F.notna()
                if notna.any():
                    print(f"WARNING:Feature existed and was concatenated: {F[notna]}")
                    left.loc[F[notna].index, f] += r[f]
                    left.loc[F[~notna].index, f] = r[f]
                else:
                    left.loc[fill.index, f] = r[f]
    else:
        print(f"WARNING:Event could not be attached: {r}")
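For comparison, the same cascade can probably be written without iterating over the rows of right: try the most specific join keys first, remove the events that found a partner, and retry the rest with coarser keys. The sketch below is not a tested replacement for the loop; attach_events, join_vals and the _lrow/_rrow helper columns are hypothetical names, onset is stored as a float instead of a Fraction, and the first entry of levels is assumed to contain all key columns.

```python
import numpy as np
import pandas as pd


def join_vals(s):
    """Concatenate the distinct non-null values of a Series, or return NaN."""
    vals = list(dict.fromkeys(str(v) for v in s.dropna()))
    return "".join(vals) if vals else np.nan


def attach_events(left, right, features, levels):
    """Attach `right` events to `left` rows, trying the most specific keys first."""
    keys = left[levels[0]].assign(_lrow=np.arange(len(left)))
    pending = right.assign(_rrow=np.arange(len(right)))
    found = []
    for on in levels:
        m = pending.merge(keys[on + ["_lrow"]], on=on, how="inner")
        if not m.empty:
            found.append(m[["_lrow", "_rrow"] + features])
            # Events matched at this level are not retried at coarser levels.
            pending = pending[~pending["_rrow"].isin(m["_rrow"])]
    out = left.copy()
    unattached = pending.drop(columns="_rrow")
    if not found:
        return out, unattached, right.iloc[[]]
    pairs = pd.concat(found).sort_values(["_lrow", "_rrow"])
    clashes = set()
    for f in features:
        hits = pairs.dropna(subset=[f])
        if hits.empty:
            continue
        # Two or more right rows supplying the same feature to one left row.
        clashes.update(hits.loc[hits.duplicated("_lrow", keep=False), "_rrow"])
        new = hits.groupby("_lrow")[f].agg(join_vals).reindex(np.arange(len(left)))
        old = out[f].to_numpy(dtype=object)
        # Positional assignment: keep the old value where no event was attached.
        out[f] = np.where(new.isna().to_numpy(), old, new.to_numpy(dtype=object))
    return out, unattached, right.iloc[sorted(clashes)]


left = pd.DataFrame({
    "mc": [0] * 8,
    "onset": [0.0] * 6 + [0.75] * 2,
    "staff": [2, 2, 1, 1, 1, 1, 2, 2],
    "voice": [1, 1, 1, 1, 1, 1, 2, 1],
    "tpc": [0, 0, 0, 4, 1, 0, 1, 1],
    "dynamics": np.nan,
    "chords": np.nan,
})
right = pd.DataFrame({
    "mc": [0] * 5,
    "onset": [0.0, 0.0, 0.5, 0.75, 0.75],
    "staff": [1, 1, 2, 1, 2],
    "voice": [1, 1, 1, 1, 1],
    "dynamics": ["f", np.nan, "p", np.nan, np.nan],
    "chords": [np.nan, "I", np.nan, "I6", "I64"],
})
levels = [["mc", "onset", "staff", "voice"], ["mc", "onset", "staff"], ["mc", "onset"]]

out, unattached, simultaneous = attach_events(left, right, ["dynamics", "chords"], levels)
print(out)
print("WARNING: These events could not be attached:")
print(unattached)
print("WARNING: These events are simultaneous:")
print(simultaneous)
```

Unlike the loop, this sketch overwrites rather than extends values that already exist in left, and it returns the two warning tables instead of printing them inside the function, so the caller decides how to report them.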