Replacing values in DataFrame column based on values in another column
As a test case, I have:
import pandas as pd

test = pd.DataFrame([[1,'A', 'B', 'A B r'], [0,'A', 'B', 'A A A'], [2,'B', 'C', 'B a c'], [1,'A', 'B', 's A B'], [1,'A', 'B', 'A'], [0,'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]
I would like to replace parts of the values in the last column with 0, but only if they appear in the sublist of replace whose position matches the number in the first column of that row.
For example, looking at the first three rows:
So, since 'r' is in replace[1], that cell becomes A B 0. 'A' is not in replace[0], so it stays as A A A; 'a' and 'c' are both in replace[2], so it becomes B 0 0, etc.
I tried something like
test[3] = test[3].apply(lambda x: ' '.join([n if n not in replace[test[0]] else 0 for n in test.split()]))
but it's not changing anything.
IIUC, use zip and a list comprehension to accomplish this.
I've simplified and created a custom replace_ function, but feel free to use regex to perform the replacement if needed.
def replace_(st, reps):
    # apply each (old, new) substitution in turn
    for old, new in reps:
        st = st.replace(old, new)
    return st

# the question's DataFrame is named test
test['new'] = [replace_(b, zip(replace[a], ['0'] * 3)) for a, b in zip(test[0], test[3])]
Output:
   0  1  2      3    new
0  1  A  B  A B r  A B 0
1  0  A  B  A A A  A A A
2  2  B  C  B a c  B 0 0
3  1  A  B  s A B  0 A B
4  1  A  B      A      A
5  0  B  C      x      0
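Since this answer mentions regex as an alternative, here is a minimal sketch of what that could look like. The helper name replace_regex is hypothetical, and the plain lists keys and texts simply stand in for test[0] and test[3]. It assumes the entries to replace are made of word characters, so that \b word boundaries apply; unlike chained str.replace calls, it then leaves a letter alone when it is only part of a longer word.

```python
import re

replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]

def replace_regex(text, tokens):
    # Build an alternation such as r'\b(?:r|s|t)\b'; re.escape guards
    # against entries that contain regex metacharacters.
    pattern = r'\b(?:' + '|'.join(map(re.escape, tokens)) + r')\b'
    return re.sub(pattern, '0', text)

# Plain lists standing in for test[0] and test[3]
keys = [1, 0, 2, 1, 1, 0]
texts = ['A B r', 'A A A', 'B a c', 's A B', 'A', 'x']

result = [replace_regex(b, replace[a]) for a, b in zip(keys, texts)]
print(result)

# Only the standalone 's' is replaced, not the letters inside 'star'
print(replace_regex('star s', replace[1]))
```

With the DataFrame from the question, the same comprehension can iterate over zip(test[0], test[3]) instead of the two lists.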
Use a list comprehension with lookup in sets:
test[3] = [' '.join('0' if i in set(replace[a]) else i for i in b.split())
           for a, b in zip(test[0], test[3])]
print(test)
   0  1  2      3
0  1  A  B  A B 0
1  0  A  B  A A A
2  2  B  C  B 0 0
3  1  A  B  0 A B
4  1  A  B      A
5  0  B  C      0
Or convert the lists to sets beforehand to improve performance:
r = [set(x) for x in replace]
test[3] = [' '.join('0' if i in r[a] else i for i in b.split()) for a, b in zip(test[0], test[3])]
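For comparison, the attempt in the question calls apply on a single column, so the lambda only ever sees the cell value and has no access to column 0 of the same row. A row-wise variant might look like the sketch below; the helper name replace_row and the output column 'new' are my own choices, not part of the question. Row-wise apply is typically slower than the list comprehension, so treat this as an illustration rather than a recommendation.

```python
import pandas as pd

test = pd.DataFrame([[1, 'A', 'B', 'A B r'], [0, 'A', 'B', 'A A A'],
                     [2, 'B', 'C', 'B a c'], [1, 'A', 'B', 's A B'],
                     [1, 'A', 'B', 'A'], [0, 'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]
r = [set(x) for x in replace]

def replace_row(row):
    # row[0] picks the set to check against; row[3] is the string to rewrite
    return ' '.join('0' if tok in r[row[0]] else tok for tok in row[3].split())

# axis=1 passes each row (as a Series) to the function
test['new'] = test.apply(replace_row, axis=1)
print(test['new'].tolist())
```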
Finally I know what you need
s = pd.Series(replace).reindex(test[0])
["".join([dict.fromkeys(y, '0').get(c, c) for c in x]) for x, y in zip(test[3], s)]
['A B 0', 'A A A', 'B 0 0', '0 A B', 'A', '0']
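One thing to keep in mind with this last version is that it walks each string character by character, so it gives the same result as the split-based answers only while every entry in replace is a single character. The sketch below uses a made-up entry list and string (not from the question) to show how the two lookups can diverge.

```python
tokens = ['ab', 'c']   # hypothetical list with a multi-character entry
text = 'ab c abc'

# Character-level lookup: 'ab' can never equal a single character,
# while every 'c' is replaced, even inside 'abc'
by_char = ''.join(dict.fromkeys(tokens, '0').get(ch, ch) for ch in text)

# Token-level lookup: whole whitespace-separated tokens are compared
lookup = set(tokens)
by_token = ' '.join('0' if t in lookup else t for t in text.split())

print(by_char)
print(by_token)
```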