
Apply for loop while replacing NaN values in a dataframe with values from another column

Let's say I have a dataframe like this:

    Col1                       Col2
0  AAA_BBB_123_DD              123
1  AAA_123_BBB_DD              123
2  123_AAA_BBB_DD              123
3  123_AAA_BB_DDD              NaN
4  456_AAA_BBB_DD              456
5  AAA_BBB_456_DD              456
6  AAA_789_BBB_DD              NaN
7  AAA_BBB_789_DD              789
8  AAA_000_BBB_DD              NaN

What I want is, for the NaN values in Col2, to check the string in Col1, split it by "_", and if one of the parts contains a value I am looking for, put that part into Col2.

In a normal scenario without a dataframe, if I have a string like 123_AAA_BB_DDD, I would do this:

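s = "123_AAA_BB_DDD"
values = ['123', '456', '789']
# Default result if no part of the string contains a listed value
col2_value = 'Not Found'
for part in s.split("_"):
    if any(value in part for value in values):
        col2_value = part
        break  # keep the first match instead of overwriting it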

My desired output would look like this:

    Col1                       Col2
0  AAA_BBB_123_DD              123
1  AAA_123_BBB_DD              123
2  123_AAA_BBB_DD              123
3  123_AAA_BB_DDD              123
4  456_AAA_BBB_DD              456
5  AAA_BBB_456_DD              456
6  AAA_789_BBB_DD              789
7  AAA_BBB_789_DD              789
8  AAA_000_BBB_DD           Not Found

EDITED:

The solution works well when a value from the list matches a part of the Col1 string exactly (e.g. 123 in the list and 123 in the Col1 string). But if I have something like AAA_PORT123_BBB_DD, the solution puts 'Not Found' in Col2. So let's say I have a df like this:

    Col1                       Col2
0  AAA_BBB_PORT123_DD        PORT123
1  AAA_123_BBB_DD              123
2  STD123_AAA_BBB_DD          STD123
3  123_AAA_BB_DDD              NaN
4  456_AAA_BBB_DD              456
5  AAA_BBB_456_DD              456
6  AAA_MAN789_BBB_DD          NaN
7  AAA_BBB_789_DD              789
8  AAA_000_BBB_DD              NaN

My desired output would be:

    Col1                       Col2
0  AAA_BBB_PORT123_DD        PORT123
1  AAA_123_BBB_DD              123
2  STD123_AAA_BBB_DD          STD123
3  123_AAA_BB_DDD              123
4  456_AAA_BBB_DD              456
5  AAA_BBB_456_DD              456
6  AAA_MAN789_BBB_DD          MAN789
7  AAA_BBB_789_DD              789
8  AAA_000_BBB_DD            Not Found

For rows with missing values in Col2, call a custom function that returns the first matching value from the list values. To run the function only on those rows, use DataFrame.loc with the mask on both sides:

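values = ['123','456','789']

# Mask of the rows where Col2 is missing
m = df['Col2'].isna()

# First '_'-separated part that is in values, else 'Not Found'
f = lambda x: next((y for y in x.split('_') if y in values), 'Not Found')
# Apply only to the masked rows and write back to the same rows
df.loc[m, 'Col2'] = df.loc[m, 'Col1'].apply(f)
print(df)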
             Col1       Col2
0  AAA_BBB_123_DD      123.0
1  AAA_123_BBB_DD      123.0
2  123_AAA_BBB_DD      123.0
3  123_AAA_BB_DDD        123
4  456_AAA_BBB_DD      456.0
5  AAA_BBB_456_DD      456.0
6  AAA_789_BBB_DD        789
7  AAA_BBB_789_DD      789.0
8  AAA_000_BBB_DD  Not Found
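
For the edited case in the question, where a part such as PORT123 or MAN789 only contains one of the listed values, the exact test y in values does not match. Below is a minimal sketch of one possible adjustment, assuming a substring test is what is wanted. The helper name first_token_containing is hypothetical and not part of the answer above; the sample data is copied from the edited question.

```python
import numpy as np
import pandas as pd

values = ['123', '456', '789']

# Sample data copied from the edited question.
df = pd.DataFrame({
    'Col1': ['AAA_BBB_PORT123_DD', 'AAA_123_BBB_DD', 'STD123_AAA_BBB_DD',
             '123_AAA_BB_DDD', '456_AAA_BBB_DD', 'AAA_BBB_456_DD',
             'AAA_MAN789_BBB_DD', 'AAA_BBB_789_DD', 'AAA_000_BBB_DD'],
    'Col2': ['PORT123', '123', 'STD123', np.nan, '456', '456',
             np.nan, '789', np.nan],
})

def first_token_containing(text, needles=values, default='Not Found'):
    """Return the first '_'-separated token that contains any needle.

    Hypothetical helper: uses a substring test (n in tok) instead of
    the exact membership test (tok in values).
    """
    return next(
        (tok for tok in text.split('_') if any(n in tok for n in needles)),
        default,
    )

# Same masking pattern as above: only rows with a missing Col2 are touched.
m = df['Col2'].isna()
df.loc[m, 'Col2'] = df.loc[m, 'Col1'].apply(first_token_containing)
print(df)
```

With this variant, row 6 should receive MAN789 and row 8 should still fall back to 'Not Found'. Note that a substring test is looser than an exact match, so a part like X1234 would also be picked up for the value 123.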

import numpy as np
import pandas as pd

df = pd.DataFrame([
    {"col1": "456_AAA_BBB_DD", "col2": "123"},
    {"col1": "456_AAA_BBB_DD", "col2": np.nan},
    {"col1": "000_AAA_BBB_DD", "col2": np.nan},
])
values = ['123', '456', '789']
# Fill only the null rows of col2 with the first part of col1 found in values
df.loc[df['col2'].isnull(), 'col2'] = df['col1'].str.split("_").apply(
    lambda row: next((x for x in row if x in values), "Not Found")
)

Initial DataFrame:

             col1 col2
0  456_AAA_BBB_DD  123
1  456_AAA_BBB_DD  NaN
2  000_AAA_BBB_DD  NaN

Output:

             col1       col2
0  456_AAA_BBB_DD        123
1  456_AAA_BBB_DD        456
2  000_AAA_BBB_DD  Not Found

df.loc[df['col2'].isnull(), 'col2'] updates col2 only in the rows where col2 is null.

First we split col1 with df['col1'].str.split("_").

Then we search each resulting list for elements that are in values: (x for x in row if x in values) returns a generator object.

next allows us to take only the first value of the generator. The second parameter of the function is the default value, which is returned when the generator yields nothing.
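
The steps above can be tried one at a time on a small Series. This is only an illustrative sketch: the variable names tokens, gen, first_match and no_match are made up for the example, and the two strings are taken from the sample dataframe.

```python
import pandas as pd

values = ['123', '456', '789']

# Step 1: split each string into a list of '_'-separated parts.
tokens = pd.Series(['456_AAA_BBB_DD', '000_AAA_BBB_DD']).str.split('_')

# Step 2: a generator expression is lazy; nothing is evaluated yet.
gen = (x for x in tokens.iloc[0] if x in values)

# Step 3: next() pulls the first match, or returns the default
# (its second argument) when the generator yields nothing.
first_match = next(gen, 'Not Found')
no_match = next((x for x in tokens.iloc[1] if x in values), 'Not Found')

print(first_match)  # 456
print(no_match)     # Not Found
```

Without the second argument, next would raise StopIteration for the row that has no match, which is why the default is needed here.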
