Apply for loop while replacing NaN values in a dataframe with values from another column
Let's say that I have a dataframe like this:
Col1 Col2
0 AAA_BBB_123_DD 123
1 AAA_123_BBB_DD 123
2 123_AAA_BBB_DD 123
3 123_AAA_BB_DDD NaN
4 456_AAA_BBB_DD 456
5 AAA_BBB_456_DD 456
6 AAA_789_BBB_DD NaN
7 AAA_BBB_789_DD 789
8 AAA_000_BBB_DD NaN
What I want is this: for NaN values in Col2, check the string in Col1, split it by "_", and if one of the parts contains one of my values, put that part in Col2.
In a normal scenario, without a dataframe, if I have a string like 123_AAA_BB_DDD I would do this:
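s = '123_AAA_BB_DDD'  # quoted; named s to avoid shadowing the built-in str
values = ['123','456','789']
split_str = s.split("_")
col2_value = 'Not Found'  # default if no part matches
for i in split_str:
    if any(value in i for value in values):
        col2_value = i
        break  # keep the first match instead of overwriting it on later parts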
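Wrapping that loop in a function makes it reusable and easy to call once per row later. A minimal sketch, assuming a hypothetical function name `first_matching_part` and the same substring test (`value in part`) that the loop uses:

```python
def first_matching_part(s, values, default="Not Found"):
    """Return the first '_'-separated part of s that contains any of values."""
    for part in s.split("_"):
        # substring test: a part such as 'PORT123' contains '123', so it matches
        if any(value in part for value in values):
            return part
    return default


values = ["123", "456", "789"]
print(first_matching_part("123_AAA_BB_DDD", values))  # 123
print(first_matching_part("AAA_000_BBB_DD", values))  # Not Found
```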
My desired output would look like this:
Col1 Col2
0 AAA_BBB_123_DD 123
1 AAA_123_BBB_DD 123
2 123_AAA_BBB_DD 123
3 123_AAA_BB_DDD 123
4 456_AAA_BBB_DD 456
5 AAA_BBB_456_DD 456
6 AAA_789_BBB_DD 789
7 AAA_BBB_789_DD 789
8 AAA_000_BBB_DD Not Found
EDIT:
The solution works well when a value from the list exactly matches a part of the Col1 string, e.g. 123 in the list and 123 in the Col1 string. But if I have something like AAA_PORT123_BBB_DD, the solution puts 'Not Found' in Col2. So let's say I have a df like this:
Col1 Col2
0 AAA_BBB_PORT123_DD PORT123
1 AAA_123_BBB_DD 123
2 STD123_AAA_BBB_DD STD123
3 123_AAA_BB_DDD NaN
4 456_AAA_BBB_DD 456
5 AAA_BBB_456_DD 456
6 AAA_MAN789_BBB_DD NaN
7 AAA_BBB_789_DD 789
8 AAA_000_BBB_DD NaN
My desired output would be:
Col1 Col2
0 AAA_BBB_PORT123_DD PORT123
1 AAA_123_BBB_DD 123
2 STD123_AAA_BBB_DD STD123
3 123_AAA_BB_DDD 123
4 456_AAA_BBB_DD 456
5 AAA_BBB_456_DD 456
6 AAA_MAN789_BBB_DD MAN789
7 AAA_BBB_789_DD 789
8 AAA_000_BBB_DD Not Found
For rows with missing values in Col2, call a custom function that returns the first part matching a value from the list values. To run the function only for those rows, use DataFrame.loc with the mask on both sides:
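import pandas as pd

values = ['123','456','789']
m = df['Col2'].isna()  # rows where Col2 is missing
# first '_'-separated part of Col1 that is in values, else 'Not Found'
f = lambda x: next((y for y in x.split('_') if y in values), 'Not Found')
# apply only to the masked rows and write back to the same rows
df.loc[m, 'Col2'] = df.loc[m, 'Col1'].apply(f)
print(df)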
Col1 Col2
0 AAA_BBB_123_DD 123.0
1 AAA_123_BBB_DD 123.0
2 123_AAA_BBB_DD 123.0
3 123_AAA_BB_DDD 123
4 456_AAA_BBB_DD 456.0
5 AAA_BBB_456_DD 456.0
6 AAA_789_BBB_DD 789
7 AAA_BBB_789_DD 789.0
8 AAA_000_BBB_DD Not Found
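The lambda above tests `y in values`, i.e. exact equality of a whole part against the list, which is why parts like PORT123 and MAN789 from the EDIT end up as 'Not Found'. A minimal sketch that keeps the same mask-and-`loc` pattern but switches to a substring test, using the EDIT's data typed in by hand (Col2 is assumed to hold strings):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col1": [
        "AAA_BBB_PORT123_DD", "AAA_123_BBB_DD", "STD123_AAA_BBB_DD",
        "123_AAA_BB_DDD", "456_AAA_BBB_DD", "AAA_BBB_456_DD",
        "AAA_MAN789_BBB_DD", "AAA_BBB_789_DD", "AAA_000_BBB_DD",
    ],
    "Col2": ["PORT123", "123", "STD123", np.nan, "456", "456", np.nan, "789", np.nan],
})
values = ["123", "456", "789"]

# rows where Col2 still needs a value
m = df["Col2"].isna()

# first part that *contains* any value (substring match), else 'Not Found'
f = lambda x: next(
    (y for y in x.split("_") if any(v in y for v in values)), "Not Found"
)

df.loc[m, "Col2"] = df.loc[m, "Col1"].apply(f)
print(df)
```

Note that a substring test also accepts parts such as 1234 for the value 123; if that matters, a stricter rule (for example a regular expression) would be needed.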
import numpy as np
import pandas as pd

df = pd.DataFrame([
    {
        "col1": "456_AAA_BBB_DD",
        "col2": "123",
    },
    {
        "col1": "456_AAA_BBB_DD",
        "col2": np.nan,  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
    },
    {
        "col1": "000_AAA_BBB_DD",
        "col2": np.nan,
    }
])
values = ['123','456','789']
# fill only the null col2 cells; the right-hand side Series is aligned on the index
df.loc[df['col2'].isnull(), 'col2'] = df['col1'].str.split("_").apply(lambda row: next((x for x in row if x in values), "Not Found"))
Initial DataFrame:
col1 col2
0 456_AAA_BBB_DD 123
1 456_AAA_BBB_DD NaN
2 000_AAA_BBB_DD NaN
Output:
col1 col2
0 456_AAA_BBB_DD 123
1 456_AAA_BBB_DD 456
2 000_AAA_BBB_DD Not Found
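The same fill can also be written without a per-row lambda. A sketch that swaps the split/`next` approach for a regular expression passed to `Series.str.extract`, built from `values`; like the answer above, it matches whole parts exactly:

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["456_AAA_BBB_DD", "456_AAA_BBB_DD", "000_AAA_BBB_DD"],
    "col2": ["123", np.nan, np.nan],
})
values = ["123", "456", "789"]

# Match one whole '_'-separated part that equals one of the values.
alternatives = "|".join(re.escape(v) for v in values)
pattern = rf"(?:^|_)({alternatives})(?:_|$)"

# With one capture group and expand=False, str.extract returns a Series:
# the first match per row, or NaN when there is none.
found = df["col1"].str.extract(pattern, expand=False)

# Fill only the missing col2 cells, then mark rows that had no match.
df["col2"] = df["col2"].fillna(found).fillna("Not Found")
print(df)
```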
df.loc[df['col2'].isnull(), 'col2'] updates only the column col2, and only in the rows where col2 is null.
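The right-hand side in that assignment is computed for every row, yet only the null rows change. A small sketch of why, with a made-up replacement Series: pandas aligns the assigned Series on the index, so only the labels selected by the mask are written.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col2": ["123", np.nan, np.nan]})
mask = df["col2"].isnull()  # [False, True, True]

# A full-length Series on the right-hand side is aligned on the index,
# so rows 1 and 2 take 'y' and 'z' while row 0 keeps '123'.
replacement = pd.Series(["x", "y", "z"])
df.loc[mask, "col2"] = replacement
print(df["col2"].tolist())  # ['123', 'y', 'z']
```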
First we split col1 with df['col1'].str.split("_").
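To make that step concrete, a short sketch of what `str.split` produces for the values in this example: each cell becomes a Python list of parts, which is what the lambda later receives as `row`.

```python
import pandas as pd

col1 = pd.Series(["456_AAA_BBB_DD", "000_AAA_BBB_DD"])
parts = col1.str.split("_")  # each cell becomes a list of '_'-separated parts
print(parts.tolist())
```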
Then we search each list for an element that is in values: (x for x in row if x in values) is a generator expression, so it returns a generator object rather than a list.
next allows us to take only the first value of the generator. Its second argument is the default value, returned when the generator yields nothing (here "Not Found").
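The generator-plus-`next` step on its own, as a plain-Python sketch (the function name `first_in_values` is hypothetical): `next` stops at the first matching part, and falls back to the default when no part matches.

```python
values = ["123", "456", "789"]


def first_in_values(row):
    # generator expression: lazily yields the parts that are in values
    gen = (x for x in row if x in values)
    # next() returns the first yielded item, or the default if there is none
    return next(gen, "Not Found")


print(first_in_values(["456", "AAA", "BBB", "DD"]))  # 456
print(first_in_values(["000", "AAA", "BBB", "DD"]))  # Not Found
```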