Mask a DataFrame using another MultiIndex Series

Question

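I have a DataFrame that I want to mask (convert to NaN) using the boolean values of a MultiIndex Series, where the levels of the Series' MultiIndex are also column names in the DataFrame. For example, if df is: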

df = pd.DataFrame({ 'A': (188, 750, 1330, 1385, 188, 750, 810, 1330, 1385),
                     'B': (1, 2, 4, 5, 1, 2, 3, 4, 5),
                     'C': (2, 5, 7, 2, 5, 5, 3, 7, 2),
                     'D': ('foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'bar') })

    A    B  C   D
0   188  1  2   foo
1   750  2  5   foo
2   1330 4  7   foo
3   1385 5  2   foo
4   188  1  5   bar
5   750  2  5   bar
6   810  3  3   bar
7   1330 4  7   bar
8   1385 5  2   bar

and the MultiIndex Series ser is:

arrays = [('188', '750', '810', '1330', '1385'),
          ('1', '2', '3', '4', '5')]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['A', 'B'])
ser = pd.Series([False, False, True, False, True], index=index)

A     B
188   1    False
750   2    False
810   3    True
1330  4    False
1385  5    True
dtype: bool

How can I mask (convert to NaN) the values in column C of df for the rows whose (A, B) entry in ser is True, so that the final DataFrame looks like this:

    A    B  C   D
0   188  1  2   foo
1   750  2  5   foo
2   1330 4  7   foo
3   1385 5  NaN foo
4   188  1  5   bar
5   750  2  5   bar
6   810  3  NaN bar
7   1330 4  7   bar
8   1385 5  NaN bar

Answer 1

Change the initialization step of ser:

arrays = [('188', '750', '810', '1330', '1385'),
          ('1', '2', '3', '4', '5')]
# Note: The change is in this step - make the levels numeric.
tuples = list(zip(*map(pd.to_numeric, arrays)))
index = pd.MultiIndex.from_tuples(tuples, names=['A', 'B'])
ser = pd.Series([False, False, True, False, True], index=index)

Initialize the levels of index so that they have the same dtype as "A" and "B" in df. Hopefully this isn't a problem.
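To illustrate why the dtypes have to agree, here is a minimal sketch on a cut-down version of the data (the variable names `str_index`, `column_is_int` and so on are made up for this example). It only checks dtypes and label lookups; it is not part of the original answer.

```python
import pandas as pd

# Cut-down df: columns A and B hold integers.
df = pd.DataFrame({'A': (188, 750, 810), 'B': (1, 2, 3)})

# Index built from strings, as in the question.
str_index = pd.MultiIndex.from_tuples(
    [('188', '1'), ('750', '2'), ('810', '3')], names=['A', 'B'])

# The column is integer-typed, the index level is not.
column_is_int = pd.api.types.is_integer_dtype(df['A'])
level_is_int = pd.api.types.is_integer_dtype(str_index.levels[0])

# An integer key built from df's values is not found in the string index,
# while the equivalent string key is.
int_key_found = (188, 1) in str_index
str_key_found = ('188', '1') in str_index

print(column_is_int, level_is_int, int_key_found, str_key_found)
```

Because the integer key is not found, any label-based lookup from df into ser would come back empty until one side is converted.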

This lets us build a simpler solution using loc with index-based selection and assignment.

u = df.set_index(['A', 'B'])
u.loc[ser.index[ser], 'C'] = np.nan

u.reset_index()
      A  B    C    D
0   188  1  2.0  foo
1   750  2  5.0  foo
2  1330  4  7.0  foo
3  1385  5  NaN  foo
4   188  1  5.0  bar
5   750  2  5.0  bar
6   810  3  NaN  bar
7  1330  4  7.0  bar
8  1385  5  NaN  bar

If you are stuck with a given ser and need to change its index, you can quickly rebuild it using MultiIndex.set_levels with a list comprehension.

ser.index = ser.index.set_levels([l.astype(int) for l in ser.index.levels]) 
# Alternative,
# ser.index = ser.index.set_levels([
#     pd.to_numeric(l) for l in ser.index.levels])

Now, this works:

u = df.set_index(['A', 'B'])
u.loc[ser.index[ser], 'C'] = np.nan

u.reset_index()

      A  B    C    D
0   188  1  2.0  foo
1   750  2  5.0  foo
2  1330  4  7.0  foo
3  1385  5  NaN  foo
4   188  1  5.0  bar
5   750  2  5.0  bar
6   810  3  NaN  bar
7  1330  4  7.0  bar
8  1385  5  NaN  bar

Note the ser.index[ser] step inside loc: we index with the labels of ser that are flagged True, not with the full index.
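For anyone unsure what that expression selects, here is a small sketch (assuming the index has already been made numeric; `flagged` and `same_as_filter` are names invented for the example). Indexing an Index with a boolean Series keeps only the positions where the Series is True.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(188, 1), (750, 2), (810, 3), (1330, 4), (1385, 5)], names=['A', 'B'])
ser = pd.Series([False, False, True, False, True], index=index)

# Boolean indexing on the index: keep labels where ser is True.
flagged = ser.index[ser]

# Filtering the Series first and then taking its index selects the same labels.
same_as_filter = flagged.equals(ser[ser].index)

print(list(flagged))
print(same_as_filter)
```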

Answer 2

Use:

# Convert ser to a DataFrame so that its index levels become columns
ndf = pd.DataFrame(ser).reset_index()

# B values whose C values need to be set to NaN
# (only B is matched here, and the levels of ser must already be numeric like df.B)
idx = ndf[ndf.iloc[:, 2]].B.values

# Index labels of df where C needs to be set to NaN
idx_ = df[df.B.isin(idx)].index

# Set those C values to NaN (np.NaN was removed in NumPy 2.0; np.nan works everywhere)
df.loc[idx_, 'C'] = np.nan


+---+------+---+-----+-----+
|   |   A  | B |  C  |  D  |
+---+------+---+-----+-----+
| 0 |  188 | 1 | 2.0 | foo |
| 1 |  750 | 2 | 5.0 | foo |
| 2 | 1330 | 4 | 7.0 | foo |
| 3 | 1385 | 5 | NaN | foo |
| 4 |  188 | 1 | 5.0 | bar |
| 5 |  750 | 2 | 5.0 | bar |
| 6 |  810 | 3 | NaN | bar |
| 7 | 1330 | 4 | 7.0 | bar |
| 8 | 1385 | 5 | NaN | bar |
+---+------+---+-----+-----+
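One thing to watch with this approach is that it matches on B alone, so it gives the right result only when each B value belongs to a single A value, as in the question's data. A possible variant that matches on both A and B is sketched below with a left merge. This is an alternative technique, not what the answer above does; the toy data, the `flag` column name and the other variable names are made up for illustration, and the levels of ser are assumed to be numeric already.

```python
import pandas as pd

# Toy data where B == 2 appears under two different A values.
df = pd.DataFrame({'A': [188, 750, 810], 'B': [1, 2, 2], 'C': [2, 5, 3]})
index = pd.MultiIndex.from_tuples(
    [(188, 1), (750, 2), (810, 2)], names=['A', 'B'])
ser = pd.Series([False, True, False], index=index)

# Turn ser into a lookup table with columns A, B and flag.
flags = ser.rename('flag').reset_index()

# A left merge keeps df's rows in order; unmatched rows would get NaN in flag.
merged = df.merge(flags, on=['A', 'B'], how='left')

# eq(True) treats missing flags as False.
hits = merged['flag'].eq(True).to_numpy()

# Cast C to float first, since NaN cannot be stored in an integer column.
result = df.assign(C=df['C'].astype('float64').mask(hits))
print(result)
```

Here only the (750, 2) row is masked, whereas matching on B alone would also have masked (810, 2).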

Answer 3

Use isin to check membership between the two MultiIndexes:

#convert columns to strings for same types of levels
df[['A','B']] = df[['A','B']].astype(str)
df.loc[df.set_index(['A','B']).index.isin(ser.index[ser]), 'C'] = np.nan
print(df)
      A  B    C    D
0   188  1  2.0  foo
1   750  2  5.0  foo
2  1330  4  7.0  foo
3  1385  5  NaN  foo
4   188  1  5.0  bar
5   750  2  5.0  bar
6   810  3  NaN  bar
7  1330  4  7.0  bar
8  1385  5  NaN  bar
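If you would rather keep A and B numeric in df, a variation on the same isin idea is to convert the levels of ser instead and build a temporary MultiIndex from the two columns. The sketch below is one way to do that, under the assumption that every label in ser parses as an integer; `num_index`, `flagged`, `hits` and `result` are names chosen for this example. It also casts C to float before masking, because recent pandas versions may warn when NaN is assigned into an integer column.

```python
import pandas as pd

df = pd.DataFrame({
    'A': (188, 750, 1330, 1385, 188, 750, 810, 1330, 1385),
    'B': (1, 2, 4, 5, 1, 2, 3, 4, 5),
    'C': (2, 5, 7, 2, 5, 5, 3, 7, 2),
    'D': ('foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'bar')})

index = pd.MultiIndex.from_tuples(
    [('188', '1'), ('750', '2'), ('810', '3'), ('1330', '4'), ('1385', '5')],
    names=['A', 'B'])
ser = pd.Series([False, False, True, False, True], index=index)

# Numeric copy of ser's index; ser itself is left unchanged.
num_index = ser.index.set_levels([lvl.astype(int) for lvl in ser.index.levels])

# Labels flagged True in ser.
flagged = num_index[ser.to_numpy()]

# Row-wise membership test of df's (A, B) pairs against the flagged labels.
hits = pd.MultiIndex.from_frame(df[['A', 'B']]).isin(flagged)

# Mask C where the pair is flagged; A and B keep their integer dtype.
result = df.assign(C=df['C'].astype('float64').mask(hits))
print(result)
```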

3 answers

Answer 1
2 2018-12-18 11:53:32

Answer 2
1 2018-12-18 11:49:11

Answer 3
1 accepted 2018-12-18 12:11:00
