Using map and loc properly

Here's my code:

df[df.loc[:,'medium']=='appsflyer']['source'] = df[df.loc[:,'medium']=='appsflyer']['source'].map(lambda x: x if x in appsflyer else "Others")

Here the variable "appsflyer" is a list of all the values I want to keep. If a value is not in the list, I want to mark it as "Others".

Since I only want to change the values for the appsflyer medium, I used the loc operator to slice my DataFrame.

The code runs without warnings or errors, but the values didn't change at all.

What's wrong here?

By using chained indexing in the assignment, you are assigning to a copy of the dataframe rather than to the dataframe itself. See: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

df[df.loc[:, "medium"] == "appsflyer"]["source"].map(lambda x: x if x in appsflyer else "other")

This part produces the correct output, but since the result is assigned to a copy, the values in the original dataframe are not modified.
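To make the effect concrete, here is a minimal sketch of the chained assignment on a small made-up dataframe. The sample rows and the contents of the appsflyer list are assumptions for illustration only, and the warning filter is there because, depending on the pandas version, this pattern may emit a warning.

```python
import warnings

import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "medium": ["appsflyer", "appsflyer", "email"],
    "source": ["organic", "facebook", "facebook"],
})
appsflyer = ["organic"]  # assumed list of values to keep

is_appsflyer = df["medium"] == "appsflyer"

with warnings.catch_warnings():
    # Some pandas versions warn about this pattern; silence it for the demo.
    warnings.simplefilter("ignore")
    # df[is_appsflyer] builds a new object, so the assignment lands on
    # that temporary object and never reaches df.
    df[is_appsflyer]["source"] = df[is_appsflyer]["source"].map(
        lambda x: x if x in appsflyer else "Others"
    )

print(df["source"].tolist())  # ['organic', 'facebook', 'facebook']
```

The second row is still "facebook", which matches what the question describes: the code runs, but df keeps its old values.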

What you can do instead is to directly slice the dataframe so that only the rows that you want to change are selected. For example:

df = pd.DataFrame({
"medium": ["appsflyer", "appsflyer", "not_appsflyer", "not_appsflyer"], 
"source": [1, 2, 3, 4]})
appsflyer = [1, 4]
df
          medium  source
0      appsflyer       1
1      appsflyer       2
2  not_appsflyer       3
3  not_appsflyer       4

We need to select rows that a) have the value "appsflyer" in the medium column and b) whose value in the source column is not in the appsflyer list (so not 1 or 4).

The mask for these two conditions looks like this:

mask = (df.medium == "appsflyer") & ~(df.source.isin(appsflyer))

Now, we can simply use that as the row index in loc and avoid chaining multiple indices:

df.loc[mask, "source"] = "other"

df
          medium source
0      appsflyer      1
1      appsflyer  other
2  not_appsflyer      3
3  not_appsflyer      4
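If you would rather keep the map call from the question, the same idea applies: use one loc call with the row mask and the column label on both sides of the assignment. The following is a sketch under assumed sample data (string sources are used so the column dtype stays the same when "Others" is written); it is one possible way to write it, not the only one.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "medium": ["appsflyer", "appsflyer", "email", "email"],
    "source": ["organic", "facebook", "facebook", "twitter"],
})
appsflyer = ["organic"]  # assumed list of values to keep

is_appsflyer = df["medium"] == "appsflyer"

# A single .loc on the left-hand side writes into df itself.
# The mapped Series is aligned to df by index, so only the
# appsflyer rows are touched.
df.loc[is_appsflyer, "source"] = df.loc[is_appsflyer, "source"].map(
    lambda x: x if x in appsflyer else "Others"
)

print(df["source"].tolist())  # ['organic', 'Others', 'facebook', 'twitter']
```

Rows with a different medium keep their source even when that source is missing from the list, because they are never selected by the mask.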

IIUC, try with where:

  • keep the original "source" for rows where:
    • the medium is not "appsflyer" OR
    • the medium is "appsflyer" and the "source" is in the appsflyer list
  • for the rest, change the source to "Others"
df["source"] = df["source"].where(df["medium"].ne("appsflyer") | df["source"].isin(appsflyer), "Others")
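As a quick check, the where approach and a mask passed to loc should give the same result, since the keep condition is the negation of the replace condition. The sketch below compares the two on made-up data; the sample rows and the appsflyer list are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "medium": ["appsflyer", "appsflyer", "email", "email"],
    "source": ["organic", "facebook", "facebook", "twitter"],
})
appsflyer = ["organic"]  # assumed list of values to keep

# Approach 1: where() keeps values where the condition is True
# and replaces the rest with "Others".
keep = df["medium"].ne("appsflyer") | df["source"].isin(appsflyer)
via_where = df["source"].where(keep, "Others")

# Approach 2: select the rows to overwrite and assign through .loc
# on an independent copy of the dataframe.
replace = df["medium"].eq("appsflyer") & ~df["source"].isin(appsflyer)
df_loc = df.copy()
df_loc.loc[replace, "source"] = "Others"

print(via_where.tolist())               # ['organic', 'Others', 'facebook', 'twitter']
print(via_where.equals(df_loc["source"]))  # True
```

Which one to use is mostly a matter of taste: where reads as "keep these rows", while the loc form reads as "overwrite these rows".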
