Apply a loop while replacing NaN values in a dataframe with values from another column

Question

Let's say I have a dataframe like this:

    Col1                       Col2
0  AAA_BBB_123_DD              123
1  AAA_123_BBB_DD              123
2  123_AAA_BBB_DD              123
3  123_AAA_BB_DDD              NaN
4  456_AAA_BBB_DD              456
5  AAA_BBB_456_DD              456
6  AAA_789_BBB_DD              NaN
7  AAA_BBB_789_DD              789
8  AAA_000_BBB_DD              NaN

For NaN values in Col2, I want to check the string in Col1, split it by "_", and if one of the pieces contains a value from a list, put that piece in Col2.

In a normal scenario without a dataframe, if I had a string like 123_AAA_BB_DDD, I would do this:

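s = "123_AAA_BB_DDD"  # quoted, and named s to avoid shadowing the built-in str
values = ['123', '456', '789']
col2_value = 'Not Found'  # default when no piece matches
for piece in s.split("_"):
    if any(value in piece for value in values):
        col2_value = piece
        break  # keep the first match instead of overwriting it on later pieces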

My desired output would look like this:

    Col1                       Col2
0  AAA_BBB_123_DD              123
1  AAA_123_BBB_DD              123
2  123_AAA_BBB_DD              123
3  123_AAA_BB_DDD              123
4  456_AAA_BBB_DD              456
5  AAA_BBB_456_DD              456
6  AAA_789_BBB_DD              789
7  AAA_BBB_789_DD              789
8  AAA_000_BBB_DD           Not Found

Edit:

The solution works well when a value from the list exactly matches a piece of the string in Col1 (e.g. 123 in the list and 123 in the Col1 string). But for a string like AAA_PORT123_BBB_DD, the solution puts 'Not Found' in Col2. So let's say I have a df like this:

    Col1                       Col2
0  AAA_BBB_PORT123_DD        PORT123
1  AAA_123_BBB_DD              123
2  STD123_AAA_BBB_DD          STD123
3  123_AAA_BB_DDD              NaN
4  456_AAA_BBB_DD              456
5  AAA_BBB_456_DD              456
6  AAA_MAN789_BBB_DD          NaN
7  AAA_BBB_789_DD              789
8  AAA_000_BBB_DD              NaN

My desired output would be:

    Col1                       Col2
0  AAA_BBB_PORT123_DD        PORT123
1  AAA_123_BBB_DD              123
2  STD123_AAA_BBB_DD          STD123
3  123_AAA_BB_DDD              123
4  456_AAA_BBB_DD              456
5  AAA_BBB_456_DD              456
6  AAA_MAN789_BBB_DD          MAN789
7  AAA_BBB_789_DD              789
8  AAA_000_BBB_DD            Not Found

Answer 1

For rows with missing values in Col2, call a custom function that returns the first piece of Col1 that matches a value from the list values. To run the function only on those rows, use DataFrame.loc with the mask on both sides:

values = ['123','456','789']

m = df['Col2'].isna()

f = lambda x: next((y for y in x.split('_') if y in values), 'Not Found')
df.loc[m, 'Col2'] = df.loc[m, 'Col1'].apply(f)
print(df)
             Col1       Col2
0  AAA_BBB_123_DD      123.0
1  AAA_123_BBB_DD      123.0
2  123_AAA_BBB_DD      123.0
3  123_AAA_BB_DDD        123
4  456_AAA_BBB_DD      456.0
5  AAA_BBB_456_DD      456.0
6  AAA_789_BBB_DD        789
7  AAA_BBB_789_DD      789.0
8  AAA_000_BBB_DD  Not Found
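
The edit in the question asks for pieces such as MAN789 to match as well. The lambda above tests whole pieces with `y in values`, so it would give 'Not Found' for them. A minimal sketch of one way to adapt it, assuming a substring test per piece is what is wanted; the helper name `first_matching_token` and the small sample frame are illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

values = ['123', '456', '789']

# Small sample based on the edited question (assumed data, for illustration).
df = pd.DataFrame({
    'Col1': ['AAA_BBB_PORT123_DD', '123_AAA_BB_DDD',
             'AAA_MAN789_BBB_DD', 'AAA_000_BBB_DD'],
    'Col2': ['PORT123', np.nan, np.nan, np.nan],
})

def first_matching_token(text, default='Not Found'):
    # Hypothetical helper: return the first "_"-separated piece that
    # contains any of the listed values as a substring.
    return next(
        (tok for tok in text.split('_') if any(v in tok for v in values)),
        default,
    )

# Apply the helper only to rows where Col2 is missing.
m = df['Col2'].isna()
df.loc[m, 'Col2'] = df.loc[m, 'Col1'].apply(first_matching_token)
print(df)
```

Note that a substring test is looser than an exact match: a piece such as X1234 would also be picked up because it contains 123.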

Answer 2

import numpy as np
import pandas as pd

df = pd.DataFrame([
    {
        "col1": "456_AAA_BBB_DD",
        "col2": "123",
    },
    {
        "col1": "456_AAA_BBB_DD",
        "col2": np.nan,  # np.NaN was removed in NumPy 2.0
    },
    {
        "col1": "000_AAA_BBB_DD",
        "col2": np.nan,
    }
])
values = ['123','456','789']
# Only the rows where col2 is null are overwritten; the right-hand side is aligned on the index.
df.loc[df['col2'].isnull(), 'col2'] = df['col1'].str.split("_").apply(lambda row: next((x for x in row if x in values), "Not Found"))

Initial dataframe:

             col1 col2
0  456_AAA_BBB_DD  123
1  456_AAA_BBB_DD  NaN
2  000_AAA_BBB_DD  NaN

Output:

             col1       col2
0  456_AAA_BBB_DD        123
1  456_AAA_BBB_DD        456
2  000_AAA_BBB_DD  Not Found

df.loc[df['col2'].isnull(), 'col2'] updates col2 only in the rows where col2 is null.
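
The right-hand side in this answer is computed for every row, yet only the null rows change, because a Series assigned through .loc is aligned on the index. A small sketch of that behaviour, using made-up replacement values ("a", "b", "c") that are not from the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col2": ["123", np.nan, np.nan]})

# A full-length Series of candidate replacements (illustrative values).
replacement = pd.Series(["a", "b", "c"])

# Only rows selected by the mask are overwritten; row 0 keeps "123".
mask = df["col2"].isnull()
df.loc[mask, "col2"] = replacement
print(df)
```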

First we split col1 with df['col1'].str.split("_").

Then we search each resulting list for elements that are in values: (x for x in row if x in values) returns a generator object.

next lets us take only the first value from the generator. Its second parameter is the default value, returned when the generator yields nothing.
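
To see the generator and next working outside of pandas, here is a minimal sketch; the function name `first_match` and the two sample strings are assumptions made for illustration:

```python
values = ['123', '456', '789']

def first_match(tokens, default="Not Found"):
    # Generator expression: nothing is evaluated until next() asks for an item.
    matches = (x for x in tokens if x in values)
    # Take the first match, or fall back to the default if there is none.
    return next(matches, default)

found = first_match("456_AAA_789_DD".split("_"))
missing = first_match("000_AAA_BBB_DD".split("_"))
print(found, missing)
```

Because next stops at the first hit, later matches in the same string (789 in the first example) are never examined.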
