Pandas: overwrite values in a multiindex dataframe based on a non-multiindex mask
This is the first time I'm posting a question here myself. So far I've almost always found solutions to my problems in existing questions (what a great forum and community). Please bear with me, though, if this question (or a very similar one) has already been asked and answered elsewhere on Stack Overflow.
I have a multiindex dataframe (test_data) that contains different variables (outer level) for the same set of cities (inner level) and the same range of years (columns). It looks like this:
            1990  1991  1992  1993  1994
VAR CITY
1   Berlin    40    41    42    43    44
    Paris     36    35    34    33    32
    London    30    30    30    30    30
2   Berlin    35    34    33    32    31
    Paris     39    38    39    40    41
    London    45    44    43    42    41
3   Berlin    24    25    26    27    28
    Paris     24    24    25    26    27
    London    29    29    29    30    31
2m  Berlin     1     2     3     4     5
    Paris      2     3     4     5     6
    London     3     4     5     6     7
which can be created with this code:
import numpy as np
import pandas as pd

# keys are (variable, city) tuples; each list holds the values for 1990-1994
test_dict = {(1,'Berlin'): [40,41,42,43,44],
             (1,'Paris'): [36,35,34,33,32],
             (1,'London'): [30,30,30,30,30],
             (2,'Berlin'): [35,34,33,32,31],
             (2,'Paris'): [39,38,39,40,41],
             (2,'London'): [45,44,43,42,41],
             (3,'Berlin'): [24,25,26,27,28],
             (3,'Paris'): [24,24,25,26,27],
             (3,'London'): [29,29,29,30,31],
             ('2m','Berlin'): [1,2,3,4,5],
             ('2m','Paris'): [2,3,4,5,6],
             ('2m','London'): [3,4,5,6,7]}
# the tuple keys become MultiIndex columns; transposing moves them to the rows
test_data = pd.DataFrame(test_dict, index=[1990,1991,1992,1993,1994]).transpose()
Now I want to set all values for variables 1 and 2 to NaN wherever the sum of variables 1 to 3 is less than 98 or greater than 102, i.e. Berlin 1994, Paris 1991, and London 1990 and 1991 (see below).
I have computed a new DataFrame
df_sum = test_data.loc[[1,2,3]].sum(level=1)
df_sum
        1990  1991  1992  1993  1994
Berlin    99   100   101   102   103
Paris     99    97    98    99   100
London   104   103   102   102   102
and set
mask = (df_sum < 98) | (df_sum > 102)
mask
         1990   1991   1992   1993   1994
Berlin  False  False  False  False   True
Paris   False   True  False  False  False
London   True   True  False  False  False
Obviously, df_sum and mask are non-multiindex DataFrames and have the same dimensions as test_data.loc[1], ... Now I would like to do something like:
for var in [1,2]: test_data.loc[var][mask] = np.nan
I understand why this doesn't work and yields a SettingWithCopyWarning. However, so far I haven't managed to figure out an (elegant) way to do this. I found this thread (Pandas: Apply mask to multiindex dataframe) and thought it was probably the right direction, but the difference is that there the mask has the same dimensions as the original multiindex dataframe.
Any help is much appreciated.
Edit: I wouldn't call this an elegant solution, but it doesn't even work, and I really don't understand this behaviour:
for var in range(1,4):
    tmp = test_data.loc[var].copy()
    tmp[test_mask] = np.nan
    test_data.loc[var] = tmp.copy()
This leads to test_data.loc[1], ...loc[2] and ...loc[3] being all NaNs, although tmp has only 4 NaNs after applying test_mask.
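A possible explanation for the all-NaN result described in the edit, offered as an assumption rather than something confirmed in this thread: when a DataFrame is assigned through .loc, pandas first aligns it on index labels. tmp is indexed by city only, while the target rows carry (variable, city) labels, so no label matches and every cell ends up NaN. The sketch below sidesteps alignment by assigning the underlying NumPy array instead. It also casts the frame to float up front so that NaN fits the dtype, and it uses DataFrame.mask in place of boolean item assignment; both are choices made here for illustration.

```python
import numpy as np
import pandas as pd

test_dict = {
    (1, 'Berlin'): [40, 41, 42, 43, 44],
    (1, 'Paris'): [36, 35, 34, 33, 32],
    (1, 'London'): [30, 30, 30, 30, 30],
    (2, 'Berlin'): [35, 34, 33, 32, 31],
    (2, 'Paris'): [39, 38, 39, 40, 41],
    (2, 'London'): [45, 44, 43, 42, 41],
    (3, 'Berlin'): [24, 25, 26, 27, 28],
    (3, 'Paris'): [24, 24, 25, 26, 27],
    (3, 'London'): [29, 29, 29, 30, 31],
    ('2m', 'Berlin'): [1, 2, 3, 4, 5],
    ('2m', 'Paris'): [2, 3, 4, 5, 6],
    ('2m', 'London'): [3, 4, 5, 6, 7],
}
years = [1990, 1991, 1992, 1993, 1994]
# Cast to float first so NaN can be stored without an implicit dtype change.
test_data = pd.DataFrame(test_dict, index=years).T.astype(float)

df_sum = test_data.loc[[1, 2, 3]].groupby(level=1).sum()
mask = (df_sum < 98) | (df_sum > 102)

for var in [1, 2]:
    # DataFrame.mask aligns `mask` with the slice by city and year labels.
    tmp = test_data.loc[var].mask(mask)
    # Assign a plain ndarray: values are written by position, with no
    # attempt to align city labels against (variable, city) labels.
    test_data.loc[var] = tmp.to_numpy()
```

Because the array is written by position, this relies on tmp keeping the row order of test_data.loc[var], which it does here since tmp is derived directly from that slice.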
You're getting the warning because test_data.loc[var][mask] = ... is chained indexing: test_data.loc[var] may return a copy rather than a view, so the assignment may never reach test_data. If I understand correctly, all you need is this:
# you had this bit, but groupby syntax is preferred
df_sum = test_data.loc[[1,2,3]].groupby(level=1).sum()
for city, years in ((df_sum < 98) | (df_sum > 102)).iterrows():
    # get the years for which condition is True
    for year in years[years].index:
        test_data.loc[(slice(1,2), city), year] = np.nan
This uses the slice syntax for MultiIndex selection, which is described in the pandas documentation on MultiIndex slicers.
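For larger frames the nested loop can be avoided. The following is a sketch of one alternative, not part of the answer itself: it expands the city-level mask to one row per (variable, city) pair, keeps it only for variables 1 and 2, and applies it in a single DataFrame.mask call. The names var, city, row_mask and result are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

test_dict = {
    (1, 'Berlin'): [40, 41, 42, 43, 44],
    (1, 'Paris'): [36, 35, 34, 33, 32],
    (1, 'London'): [30, 30, 30, 30, 30],
    (2, 'Berlin'): [35, 34, 33, 32, 31],
    (2, 'Paris'): [39, 38, 39, 40, 41],
    (2, 'London'): [45, 44, 43, 42, 41],
    (3, 'Berlin'): [24, 25, 26, 27, 28],
    (3, 'Paris'): [24, 24, 25, 26, 27],
    (3, 'London'): [29, 29, 29, 30, 31],
    ('2m', 'Berlin'): [1, 2, 3, 4, 5],
    ('2m', 'Paris'): [2, 3, 4, 5, 6],
    ('2m', 'London'): [3, 4, 5, 6, 7],
}
years = [1990, 1991, 1992, 1993, 1994]
test_data = pd.DataFrame(test_dict, index=years).T

# Labels of the outer (variable) and inner (city) index levels, one per row.
var = test_data.index.get_level_values(0)
city = test_data.index.get_level_values(1)

# Sum variables 1 to 3 per city and flag sums outside [98, 102].
df_sum = test_data[var.isin([1, 2, 3])].groupby(level=1).sum()
mask = (df_sum < 98) | (df_sum > 102)

# Look up the mask row for every row's city, then switch it off for all
# variables except 1 and 2; [:, None] broadcasts across the year columns.
row_mask = mask.loc[city].to_numpy() & var.isin([1, 2])[:, None]

# Returns a new frame with NaN where row_mask is True; test_data is unchanged.
result = test_data.mask(row_mask)
```

Since DataFrame.mask returns a new frame, the integer columns are upcast to float in result without any in-place assignment to test_data.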