Removing entries from a Pandas DataFrame that begin with a letter and two numbers
How can I remove string entries that begin with a letter followed by two digits from a Pandas DataFrame, replacing them with NaN?
A      B        C        D
Apple  Pear     N45 82f  John
Cat    P48 hH2  Mary     Sponge
Hat    P67 De1  Bed      S90 GGGF
I would like to replace every entry in the DataFrame that begins with a letter and two digits with NaN.
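For anyone who wants to reproduce this, the sample frame can be built roughly as follows. This is a sketch that assumes cells such as "N45 82f" and "S90 GGGF" are single strings containing a space, which is what the expected outputs further down suggest.

```python
import pandas as pd

# Sample data; entries with a space ("N45 82f", "P48 hH2", ...) are
# assumed to be single cells, not two separate values.
df = pd.DataFrame({
    "A": ["Apple", "Cat", "Hat"],
    "B": ["Pear", "P48 hH2", "P67 De1"],
    "C": ["N45 82f", "Mary", "Bed"],
    "D": ["John", "Sponge", "S90 GGGF"],
})
print(df)
```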
I have tried something along the lines of:
for columns in df.columns[1:]:
    for i in columns:
        if i[0].isalpha() and i[1].isdigit and i.[2].isdigit():
            i.replace(i,None)
Unfortunately, this does not seem to work. Any help would be appreciated.
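A note on why the loop has no effect: iterating over df.columns yields the column labels, so the inner loop walks over the characters of each label rather than the cell values. In addition, i.[2] is a syntax error, isdigit without parentheses is always truthy, and str.replace returns a new string without touching the DataFrame. A minimal element-wise sketch of the same idea is below; the helper name blank_if_coded and the sample data are illustrative assumptions, not part of the original post.

```python
import re

import numpy as np
import pandas as pd

# Sample data assumed from the question (cells with spaces are single strings).
df = pd.DataFrame({
    "A": ["Apple", "Cat", "Hat"],
    "B": ["Pear", "P48 hH2", "P67 De1"],
    "C": ["N45 82f", "Mary", "Bed"],
    "D": ["John", "Sponge", "S90 GGGF"],
})

# One ASCII letter followed by two digits; re.match anchors at the start.
pattern = re.compile(r"[A-Za-z]\d{2}")


def blank_if_coded(value):
    """Hypothetical helper: NaN for strings starting with letter + two digits."""
    if isinstance(value, str) and pattern.match(value):
        return np.nan
    return value


# Series.map applies the helper to every cell of each column.
cleaned = df.apply(lambda col: col.map(blank_if_coded))
print(cleaned)
```

This is slower than the vectorised string methods on large frames, but it mirrors the logic of the original loop most directly.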
You can try this:
df.mask(df.apply(lambda col: col.str.contains(r'^[a-zA-Z]\d{2}')))
Output:
       A     B     C       D
0  Apple  Pear   NaN    John
1    Cat   NaN  Mary  Sponge
2    Hat   NaN   Bed     NaN
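One detail worth checking is where in the string the pattern is allowed to match. str.contains searches anywhere in the string unless the pattern is anchored with ^, while str.match only tests the start. The sketch below uses str.match on the sample data (assumed from the question, with every column holding strings) and compares the two methods on a small probe Series that is purely illustrative.

```python
import pandas as pd

# Sample data assumed from the question; all columns hold strings.
df = pd.DataFrame({
    "A": ["Apple", "Cat", "Hat"],
    "B": ["Pear", "P48 hH2", "P67 De1"],
    "C": ["N45 82f", "Mary", "Bed"],
    "D": ["John", "Sponge", "S90 GGGF"],
})

code = r"[A-Za-z]\d{2}"  # one letter followed by two digits

# Illustrative probe: the code sits mid-string in the first entry only.
probe = pd.Series(["x N45", "N45 x"])
print(probe.str.contains(code).tolist())  # [True, True]
print(probe.str.match(code).tolist())     # [False, True]

# str.match tests the start of each string, so no "^" is needed.
starts_with_code = df.apply(lambda col: col.str.match(code))

# mask() replaces the cells where the condition is True with NaN.
result = df.mask(starts_with_code)
print(result)
```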
I like @coldspeed's stack approach too:
df[~df.stack().str.contains(r'^[a-zA-Z]\d{2}').unstack()]
Output:
       A     B     C       D
0  Apple  Pear   NaN    John
1    Cat   NaN  Mary  Sponge
2    Hat   NaN   Bed     NaN
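To make the stack one-liner easier to follow, here it is split into its intermediate steps. This is only a sketch on the sample data assumed from the question; the variable names are illustrative, and it assumes the frame has no missing values to begin with, since stack may drop them.

```python
import pandas as pd

# Sample data assumed from the question.
df = pd.DataFrame({
    "A": ["Apple", "Cat", "Hat"],
    "B": ["Pear", "P48 hH2", "P67 De1"],
    "C": ["N45 82f", "Mary", "Bed"],
    "D": ["John", "Sponge", "S90 GGGF"],
})

# 1. Flatten the frame into one Series indexed by (row label, column label).
stacked = df.stack()

# 2. Test every cell at once; "^" anchors the pattern to the start.
hits = stacked.str.contains(r"^[A-Za-z]\d{2}")

# 3. Reshape the boolean Series back to the original layout and invert it.
keep = ~hits.unstack()

# 4. Indexing with a boolean frame keeps True cells and puts NaN elsewhere.
result = df[keep]
print(result)
```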
Use stack and str.extract with a pattern that matches only the entries you want to keep. Entries that do not match the pattern come back as NaN.
df.stack().str.extract(r'^(?![a-zA-Z]\d{2})(.*)').unstack()[0]
       A     B     C       D
0  Apple  Pear   NaN    John
1    Cat   NaN  Mary  Sponge
2    Hat   NaN   Bed     NaN
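Another option, not mentioned in the answers above, is DataFrame.replace with regex=True, which avoids reshaping the frame. The sketch below runs on the sample data assumed from the question. Because the replacement value is NaN rather than a string, the whole cell is replaced whenever the pattern matches.

```python
import numpy as np
import pandas as pd

# Sample data assumed from the question.
df = pd.DataFrame({
    "A": ["Apple", "Cat", "Hat"],
    "B": ["Pear", "P48 hH2", "P67 De1"],
    "C": ["N45 82f", "Mary", "Bed"],
    "D": ["John", "Sponge", "S90 GGGF"],
})

# Match any cell that starts with one letter and two digits,
# and replace the entire cell with NaN.
result = df.replace(r"^[A-Za-z]\d{2}.*$", np.nan, regex=True)
print(result)
```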