Using map and loc correctly

Question

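Here is my code:

df[df.loc[:,'medium']=='appsflyer']['source'] = df[df.loc[:,'medium']=='appsflyer']['source'].map(lambda x: x if x in appsflyer else "Others")

The variable "appsflyer" is a list containing all the values I want to keep. If a value is not in the list, I want to label it as "Others".

Because I only want to change values where the medium is appsflyer, I use loc to slice my DataFrame.

The code runs without warnings or errors, but the values don't change at all.

What is going wrong here?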

Answer 1

By using chained indexing in the assignment, you are creating a copy of the dataframe. See: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

df[df.loc[:, "medium"] == "appsflyer"]["source"].map(lambda x: x if x in appsflyer else "other")

This part produces the correct output, but because the result is assigned to a copy, the values in the original dataframe are not modified.
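To see that the assignment lands on a temporary copy, here is a minimal sketch. The sample data and source names are made up for illustration, and warnings are silenced only because different pandas versions report this pattern in different ways.

```python
import warnings

import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "medium": ["appsflyer", "appsflyer", "email"],
    "source": ["google", "unknown", "newsletter"],
})
before = df["source"].tolist()

with warnings.catch_warnings():
    # Depending on the pandas version, chained assignment may emit a warning.
    warnings.simplefilter("ignore")
    # Boolean-mask indexing builds a new object; the assignment targets that object.
    df[df["medium"] == "appsflyer"]["source"] = "other"

after = df["source"].tolist()
print(before == after)  # the original frame is untouched
```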

What you can do instead is slice the dataframe directly so that only the rows you want to change are selected. For example:

import pandas as pd

df = pd.DataFrame({
    "medium": ["appsflyer", "appsflyer", "not_appsflyer", "not_appsflyer"],
    "source": [1, 2, 3, 4]})
appsflyer = [1, 4]

df
          medium  source
0      appsflyer       1
1      appsflyer       2
2  not_appsflyer       3
3  not_appsflyer       4

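We need to select the rows that a) have the value "appsflyer" in the medium column and b) have a value in the source column that is not in the appsflyer list (so not 1 or 4).

The mask for these two conditions looks like this: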

mask = (df.medium == "appsflyer") & ~(df.source.isin(appsflyer))
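If it helps to see how the mask is built, the two conditions can be evaluated separately. This is only a sketch using the same sample data; the names `is_appsflyer` and `not_allowed` are my own and do not appear elsewhere in this answer.

```python
import pandas as pd

df = pd.DataFrame({
    "medium": ["appsflyer", "appsflyer", "not_appsflyer", "not_appsflyer"],
    "source": [1, 2, 3, 4],
})
appsflyer = [1, 4]

is_appsflyer = df["medium"] == "appsflyer"    # condition a)
not_allowed = ~df["source"].isin(appsflyer)   # condition b)
mask = is_appsflyer & not_allowed             # both conditions must hold

# Show the two conditions and the combined mask side by side.
print(pd.DataFrame({"a": is_appsflyer, "b": not_allowed, "mask": mask}))
```

Only row 1 satisfies both conditions, so it is the only row the mask selects.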

Now we can simply use it as the row indexer in loc and avoid chaining multiple indexing operations:

df.loc[mask, "source"] = "other"

df
          medium source
0      appsflyer      1
1      appsflyer  other
2  not_appsflyer      3
3  not_appsflyer      4
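If you would rather keep the map call from the question, the same idea applies: use a single loc on the left-hand side. The following is a sketch with made-up string sources. I chose strings because writing a string into an integer column, as in the small example above, may trigger dtype warnings or errors in recent pandas versions.

```python
import pandas as pd

# Hypothetical sample data with string sources, not taken from the question.
df = pd.DataFrame({
    "medium": ["appsflyer", "appsflyer", "email"],
    "source": ["google", "unknown", "unknown"],
})
appsflyer = ["google", "facebook"]

rows = df["medium"] == "appsflyer"
# One .loc call on the left-hand side, so the write reaches df itself.
df.loc[rows, "source"] = df.loc[rows, "source"].map(
    lambda x: x if x in appsflyer else "other"
)
print(df)
```

The "unknown" value in the email row is left alone because that row is not part of the selection.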

Answer 2

IIUC, try where:

Keep the original "source" for rows where:
- medium is not "appsflyer", or
- medium is "appsflyer" and "source" is in the appsflyer list
For the rest, change the source to "Others".

df["source"] = df["source"].where(df["medium"].ne("appsflyer") | df["source"].isin(appsflyer), "Others")
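A small runnable sketch of the where approach, using made-up sample data (the source names and the `keep` variable are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "medium": ["appsflyer", "appsflyer", "email", "email"],
    "source": ["google", "unknown", "unknown", "google"],
})
appsflyer = ["google", "facebook"]

# True for rows whose source should stay as it is.
keep = df["medium"].ne("appsflyer") | df["source"].isin(appsflyer)
# where() keeps values where the condition is True and replaces the rest.
df["source"] = df["source"].where(keep, "Others")
print(df["source"].tolist())
```

Note that where takes the condition for values to keep, which is the inverse of the mask you would pass to loc when overwriting.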
