
Filling NaN values with those contained in a different DataFrame

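The question reads as follows:

Problem

I have a DataFrame left with a two-level MultiIndex, representing events that occur at the time point given by mc and onset. Every event occurs in a layer defined by (staff, voice):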

            mc onset  staff  voice  tpc  dynamics  chords
section ix                                               
0       0    0     0      2      1    0       NaN     NaN
        1    0     0      2      1    0       NaN     NaN
        2    0     0      1      1    0       NaN     NaN
        3    0     0      1      1    4       NaN     NaN
        4    0     0      1      1    1       NaN     NaN
        5    0     0      1      1    0       NaN     NaN
        6    0   3/4      2      2    1       NaN     NaN
        7    0   3/4      2      1    1       NaN     NaN

Then there is the DataFrame right with other events ('dynamics', 'chords') that need to be filled into left:

   mc onset  staff  voice dynamics chords
0   0     0      1      1        f    NaN
1   0     0      1      1      NaN      I
2   0   1/2      2      1        p    NaN
3   0   3/4      1      1      NaN     I6
4   0   3/4      2      1      NaN    I64

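The filling rules are as follows:

  1. All events from right need to appear in left.
  2. If they occur simultaneously with left events in the same layer, fill in the respective columns of left for these events (i.e. join on ['mc', 'onset', 'staff', 'voice']; e.g. rows 0, 1, 4).
  3. Otherwise, if they occur simultaneously with left events in the same staff, fill in the respective columns of left for these events (i.e. join on ['mc', 'onset', 'staff']; e.g. row 4).
  4. Otherwise, if they occur simultaneously with left events in other layers, fill in the respective columns of left for these events (i.e. join on ['mc', 'onset']; e.g. row 3).
  5. Otherwise, if they do not occur simultaneously with any left event, emit a warning and keep them for further processing (e.g. row 2).
  6. If two events of the same type from right occur simultaneously, emit a warning and concatenate the values (e.g. rows 3 and 4).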
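The precedence in rules 2 to 5 can be sketched as a lookup for a single right event. This is only an illustration under simplifying assumptions: find_targets is a hypothetical helper name, the toy left frame is not the one from the question, and onsets are plain floats rather than Fraction objects.

```python
import pandas as pd

def find_targets(left, event):
    """Return the index of the left rows that one right event should fill.

    Hypothetical helper: tries the most specific match first (same layer),
    then the same staff, then any simultaneous event. An empty index means
    the event could not be attached (rule 5).
    """
    simultaneous = left[(left["mc"] == event["mc"]) & (left["onset"] == event["onset"])]
    same_staff = simultaneous[simultaneous["staff"] == event["staff"]]
    same_layer = same_staff[same_staff["voice"] == event["voice"]]
    for candidates in (same_layer, same_staff, simultaneous):
        if len(candidates) > 0:
            return candidates.index
    return simultaneous.index  # empty

# Toy data (an assumption, not the frames from the question)
left_demo = pd.DataFrame({
    "mc": [0, 0, 0, 0],
    "onset": [0.0, 0.0, 0.75, 0.75],
    "staff": [2, 1, 2, 2],
    "voice": [1, 1, 2, 1],
})

same_layer_hit = find_targets(left_demo, {"mc": 0, "onset": 0.75, "staff": 2, "voice": 1})
other_layer_hit = find_targets(left_demo, {"mc": 0, "onset": 0.75, "staff": 1, "voice": 1})
no_hit = find_targets(left_demo, {"mc": 0, "onset": 0.5, "staff": 2, "voice": 1})
```

Here the first event finds its own layer, the second falls back to every event at the same time point, and the third finds nothing and would trigger the warning.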

Expected result

     mc onset  staff  voice  tpc dynamics chords
0 0   0     0      2      1    0      NaN    NaN
  1   0     0      2      1    0      NaN    NaN
  2   0     0      1      1    0      f        I
  3   0     0      1      1    4      f        I
  4   0     0      1      1    1      f        I
  5   0     0      1      1    0      f        I
  6   0   3/4      2      2    1      NaN     I6
  7   0   3/4      2      1    1      NaN  I6I64
WARNING: These events could not be attached:
   mc onset  staff  voice dynamics chords
2   0   1/2      2      1        p    NaN
WARNING: These events are simultaneous:
   mc onset  staff  voice dynamics chords
3   0   3/4      1      1      NaN     I6
4   0   3/4      2      1      NaN    I64

Attempt 1

Since I wanted to avoid an approach that iterates over right, I tried the following:

left_features = ['mc', 'onset', 'staff', 'voice', 'tpc']
right_features = ['dynamics', 'chords']
join_on = [['mc', 'onset', 'staff', 'voice'], ['mc', 'onset', 'staff'], ['mc', 'onset']]
for on in join_on:
    match = right[on + right_features].merge(left[left_features], on=on, left_index=True)
    left_ix = match.index
    left.loc[left_ix, match.columns] = match
    # left.loc[left_ix].fillna(match, inplace=True)
    right_ix = right.merge(left[left_features], on=on, right_index=True).index
    right.drop(right_ix, inplace=True)
    if len(right) == 0:
        break
if len(right) > 0:
    print("WARNING: These events could not be attached:")
    print(right)

This approach does not work, because after the first merge match looks like this:

     mc onset  staff  voice dynamics chords  tpc
0 2   0     0      1      1        f    NaN    0
  3   0     0      1      1        f    NaN    4
  4   0     0      1      1        f    NaN    1
  5   0     0      1      1        f    NaN    0
  2   0     0      1      1      NaN      I    0
  3   0     0      1      1      NaN      I    4
  4   0     0      1      1      NaN      I    1
  5   0     0      1      1      NaN      I    0
  7   0   3/4      2      1      NaN    I64    1

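Because the index of match is not unique, the assignment left.loc[left_ix, match.columns] = match does not fully work (dynamics is missing from the result), and the commented-out fillna approach silently does nothing. It also bothers me that I had to perform the same merge twice: once to get the left index for the assignment and once to get the right index for dropping the matched rows.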
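A likely reason why the commented-out fillna call has no visible effect: indexing with .loc and a list of labels returns a new object, so inplace=True fills that temporary copy and not left itself. The following sketch shows the effect on a toy frame (the name demo and its values are made up for illustration); depending on the pandas version, the chained call may also emit a warning.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"a": [np.nan, 1.0, np.nan]})
rows = [0, 2]

# Chained call: .loc[rows] yields a copy, and only that copy is filled.
demo.loc[rows].fillna(0.0, inplace=True)
still_missing = int(demo["a"].isna().sum())

# Assigning the filled selection back modifies the original frame.
demo.loc[rows] = demo.loc[rows].fillna(0.0)
missing_after_assignment = int(demo["a"].isna().sum())
```

Assigning back still relies on index alignment, so it would not by itself resolve the problem caused by the non-unique index of match.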

Attempt 2

Facing these problems, I preprocessed right before joining, so that simultaneous events are combined into a single row:

def unite_vals(df):
    r = pd.Series(index=right_features, dtype=object)
    for col in right_features:
        u = df[col][df[col].notna()].unique()
        if len(u) > 1:
            r[col] = ''.join(str(val) for val in u)
            print("WARNING:Two simultaneous events at:")
            print(df.iloc[:1][['mc', 'onset']])
        elif len(u) == 1:
            r[col] = u[0]
    return r

left_features = ['mc', 'onset', 'staff', 'voice']
right_features = ['dynamics', 'chords']
on = ['mc', 'onset']
right = right.groupby(on).apply(unite_vals).reset_index()
match = right.merge(left[left_features], on=on, left_index=True)
left_ix = match.index
left.loc[left_ix, match.columns] = match
# left.loc[left_ix].fillna(match, inplace=True)
right_ix = right.merge(left[left_features], on=on, right_index=True).index
right.drop(right_ix, inplace=True)
if len(right) > 0:
    print("WARNING: These events could not be attached:")
    print(right)

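(For some unknown reason, the commented-out fillna approach again did nothing, and the problem of performing the same merge twice persists.) The result is one I could live with; however, it does not distinguish between the layers of right and therefore looks like this: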
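One possible way to avoid merging twice is the indicator parameter of DataFrame.merge, which marks each row as matched or unmatched in a single pass. The sketch below uses toy frames with float onsets (an assumption; the question uses Fraction objects) and joins only on ['mc', 'onset'], so it does not implement the layer precedence.

```python
import pandas as pd

left_demo = pd.DataFrame({"mc": [0, 0, 0], "onset": [0.0, 0.0, 0.75], "tpc": [0, 4, 1]})
right_demo = pd.DataFrame({"mc": [0, 0], "onset": [0.0, 0.5], "dynamics": ["f", "p"]})
on = ["mc", "onset"]

# Deduplicate the keys so that the left join keeps exactly one row per right event.
keys = left_demo[on].drop_duplicates()
flagged = right_demo.merge(keys, on=on, how="left", indicator=True)
flagged.index = right_demo.index  # a left join on unique keys preserves row order

is_matched = flagged["_merge"] == "both"
matched = right_demo[is_matched]
unmatched = right_demo[~is_matched]  # candidates for the warning

# Attach the matched events to every simultaneous left event.
filled = left_demo.merge(matched, on=on, how="left")
```

The unmatched frame can then be printed in the warning, and matched is the only part that needs to be written into left.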

     mc onset  staff  voice  tpc dynamics chords
0 0   0     0      2      1    0        f      I
  1   0     0      2      1    0        f      I
  2   0     0      1      1    0        f      I
  3   0     0      1      1    4        f      I
  4   0     0      1      1    1        f      I
  5   0     0      1      1    0        f      I
  6   0   3/4      2      2    1      NaN  I6I64
  7   0   3/4      2      1    1      NaN  I6I64
WARNING:Two simultaneous events at:
   mc onset
3   0   3/4
WARNING: These events could not be attached:
   mc onset dynamics chords
1   0   1/2        p    NaN

How is this generally solved?

Here is the source code for reproduction:

import pandas as pd
import numpy as np
from fractions import Fraction
left_dict = {'mc': {(0, 0): 0,
  (0, 1): 0,
  (0, 2): 0,
  (0, 3): 0,
  (0, 4): 0,
  (0, 5): 0,
  (0, 6): 0,
  (0, 7): 0},
 'onset': {(0, 0): Fraction(0, 1),
  (0, 1): Fraction(0, 1),
  (0, 2): Fraction(0, 1),
  (0, 3): Fraction(0, 1),
  (0, 4): Fraction(0, 1),
  (0, 5): Fraction(0, 1),
  (0, 6): Fraction(3, 4),
  (0, 7): Fraction(3, 4)},
 'staff': {(0, 0): 2,
  (0, 1): 2,
  (0, 2): 1,
  (0, 3): 1,
  (0, 4): 1,
  (0, 5): 1,
  (0, 6): 2,
  (0, 7): 2},
 'voice': {(0, 0): 1,
  (0, 1): 1,
  (0, 2): 1,
  (0, 3): 1,
  (0, 4): 1,
  (0, 5): 1,
  (0, 6): 2,
  (0, 7): 1},
 'tpc': {(0, 0): 0,
  (0, 1): 0,
  (0, 2): 0,
  (0, 3): 4,
  (0, 4): 1,
  (0, 5): 0,
  (0, 6): 1,
  (0, 7): 1},
 'dynamics': {(0, 0): np.nan,
  (0, 1): np.nan,
  (0, 2): np.nan,
  (0, 3): np.nan,
  (0, 4): np.nan,
  (0, 5): np.nan,
  (0, 6): np.nan,
  (0, 7): np.nan},
 'chords': {(0, 0): np.nan,
  (0, 1): np.nan,
  (0, 2): np.nan,
  (0, 3): np.nan,
  (0, 4): np.nan,
  (0, 5): np.nan,
  (0, 6): np.nan,
  (0, 7): np.nan}}
left = pd.DataFrame.from_dict(left_dict)

right_dict = {'mc': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
 'onset': {0: Fraction(0, 1),
  1: Fraction(0, 1),
  2: Fraction(1, 2),
  3: Fraction(3, 4),
  4: Fraction(3, 4)},
 'staff': {0: 1, 1: 1, 2: 2, 3: 1, 4: 2},
 'voice': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
 'dynamics': {0: 'f', 1: np.nan, 2: 'p', 3: np.nan, 4: np.nan},
 'chords': {0: np.nan, 1: 'I', 2: np.nan, 3: 'I6', 4: 'I64'}}
right = pd.DataFrame.from_dict(right_dict)

attempt1 = True
if attempt1:
    left_features = ['mc', 'onset', 'staff', 'voice', 'tpc']
    right_features = ['dynamics', 'chords']
    join_on = [['mc', 'onset', 'staff', 'voice'], ['mc', 'onset', 'staff'], ['mc', 'onset']]
    for on in join_on:
        match = right[on + right_features].merge(left[left_features], on=on, left_index=True)
        left_ix = match.index
        left.loc[left_ix, match.columns] = match
        #left.loc[left_ix].fillna(match, inplace=True)
        right_ix = right.merge(left[left_features], on=on, right_index=True).index
        right.drop(right_ix, inplace=True)
        if len(right) == 0:
            break
    if len(right) > 0:
        print("WARNING: These events could not be attached:")
        print(right)
    print(left)
else:
    def unite_vals(df):
        r = pd.Series(index=right_features, dtype=object)
        for col in right_features:
            u = df[col][df[col].notna()].unique()
            if len(u) > 1:
                r[col] = ''.join(str(val) for val in u)
                print("WARNING:Two simultaneous events at:")
                print(df.iloc[:1][['mc', 'onset']])
            elif len(u) == 1:
                r[col] = u[0]
        return r

    left_features = ['mc', 'onset', 'staff', 'voice']
    right_features = ['dynamics', 'chords']
    on = ['mc', 'onset']
    right = right.groupby(on).apply(unite_vals).reset_index()
    match = right.merge(left[left_features], on=on, left_index=True)
    left_ix = match.index
    left.loc[left_ix, match.columns] = match
    # left.loc[left_ix].fillna(match, inplace=True)
    right_ix = right.merge(left[left_features], on=on, right_index=True).index
    right.drop(right_ix, inplace=True)
    if len(right) > 0:
        print("WARNING: These events could not be attached:")
        print(right)
    print(left)

In the end, the simplest way to solve my problem was to use a loop:

isnan = lambda num: num != num  # NaN is the only value that is not equal to itself
right_features = ['dynamics', 'chords']
for i, r in right.iterrows():
    # all left events at the same time point as this right event
    same_os = left.loc[(left.mc == r.mc) & (left.onset == r.onset)]
    if len(same_os) > 0:
        same_staff = same_os.loc[same_os.staff == r.staff]
        same_voice = same_staff.loc[same_staff.voice == r.voice]
        # pick the most specific non-empty match: layer, then staff, then time point
        if len(same_voice) > 0:
            fill = same_voice
        elif len(same_staff) > 0:
            fill = same_staff
        else:
            fill = same_os

        for f in right_features:
            if not isnan(r[f]):
                F = left.loc[fill.index, f]
                notna = F.notna()
                if notna.any():
                    # a value is already present: warn and append the new one
                    print(f"WARNING:Feature existed and was concatenated: {F[notna]}")
                    left.loc[F[notna].index, f] += r[f]
                    left.loc[F[~notna].index, f] = r[f]
                else:
                    left.loc[fill.index, f] = r[f]
    else:
        print(f"WARNING:Event could not be attached: {r}")
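The isnan lambda in the loop relies on NaN being the only float that compares unequal to itself. A short check of that behaviour, next to pd.isna for comparison (the variable names are chosen here for illustration only):

```python
import numpy as np
import pandas as pd

isnan = lambda num: num != num

nan_detected = isnan(np.nan) and isnan(float("nan"))
string_detected = isnan("f")  # strings equal themselves, so this is False
none_detected = isnan(None)   # None == None, so the lambda does not flag it

# pd.isna additionally treats None as missing.
none_is_na = pd.isna(None)
```

For the frames in the question this makes no difference, since the missing values are np.nan; pd.isna would be the safer choice if None could appear as well.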
