Removing specific patterns from rows in pandas
I am trying to figure out how to count the rows that start with a number, for example:
My_col
24 was 2020 - There is a lot -
23 aka 2018 - how many ...
23 was 2020 - wonderful!
no numbers this time
and, only when a row starts with a number, how to remove the words before the first "-" (the three words preceding it):
My_col
There is a lot -
how many ...
wonderful!
no numbers this time
With SQL I would check it as follows:
SELECT CASE WHEN ISNUMERIC(SUBSTRING(LTRIM(My_Col), 1, 1)) = 1
THEN 'yes'
ELSE 'no'
END AS StartsWithNumber
FROM my_data
To remove the words before the "-", I think I should look at np.where or a regex, and then apply.
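The counting part of the question can be sketched in pandas roughly as follows. This is a minimal sketch, not the asker's code: it assumes "starts with a number" means the first non-blank character is a digit, which is close to but not identical to the SQL ISNUMERIC check. The column name StartsWithNumber is borrowed from the SQL alias; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"My_col": [
    "24 was 2020 - There is a lot -",
    "23 aka 2018 - how many ...",
    "23 was 2020 - wonderful!",
    "no numbers this time",
]})

# str.match anchors at the start of each string; \s* skips leading blanks (like LTRIM)
starts_with_number = df["My_col"].str.match(r"\s*\d")

# 'yes'/'no' flag, mirroring the CASE expression in the SQL query
df["StartsWithNumber"] = np.where(starts_with_number, "yes", "no")

# Count of rows whose first non-blank character is a digit
n_rows = int(starts_with_number.sum())
print(n_rows)
```

Summing the boolean mask gives the count directly, so no explicit loop or apply is needed for this step.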
import pandas as pd

df = pd.DataFrame({'My_col': [
    "24 was 2020 - There is a lot -",
    "no numbers this time"]})
# If the string starts with a digit, keep only what follows the first "-"
df['My_col'].apply(
    lambda x: x[x.find("-")+1:].strip() if x[0].isdigit() else x)
Output:
0 There is a lot -
1 no numbers this time
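One caveat with indexing x[0] is that it raises an IndexError on an empty string. A slightly more defensive variant of the same idea might look like the sketch below; strip_leading_part is a hypothetical helper name, and the choice to leave digit-led rows without a "-" unchanged is an assumption about the desired behavior.

```python
import pandas as pd

def strip_leading_part(text):
    """Hypothetical helper: drop everything up to the first '-' if text starts with a digit."""
    # text[:1] is safe on empty strings; non-string values (e.g. NaN) pass through unchanged
    if not isinstance(text, str) or not text[:1].isdigit():
        return text
    head, sep, tail = text.partition("-")
    # If there is no "-" at all, leave the row as it is
    return tail.strip() if sep else text

sample = pd.Series([
    "24 was 2020 - There is a lot -",
    "no numbers this time",
    "",
    "2020 only",
])
cleaned = sample.apply(strip_leading_part)
print(cleaned.tolist())
```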
Using df.replace() with a regular expression. I added a "-" to row 4 to show that no words are removed there:
import pandas as pd

data = {'My_col': ['24 was 2020 - There is a lot -', '23 aka 2018 - how many ...', '23 was 2020 - wonderful!', 'no numbers this - time']}
df = pd.DataFrame(data)
# Assign the result back; inplace=True on a selected column may not update df under Copy-on-Write
df['My_col'] = df['My_col'].replace(r'^\d.*?-', '', regex=True)
print(df)
Output:
My_col
0 There is a lot -
1 how many ...
2 wonderful!
3 no numbers this - time
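To see why the pattern uses the lazy quantifier ".*?", it may help to compare it with the greedy form on a single string. The sketch below uses the standard re module rather than pandas, and the variable names are illustrative. It also shows that the lazy pattern leaves the space after the "-" in place, which an optional trailing "\s*" would remove.

```python
import re

text = "24 was 2020 - There is a lot -"

# Lazy: the match stops at the first "-", so the trailing "-" survives
lazy = re.sub(r"^\d.*?-", "", text)

# Greedy: the match runs to the last "-", which here wipes the whole string
greedy = re.sub(r"^\d.*-", "", text)

# Lazy plus \s*: also drops the whitespace that follows the first "-"
trimmed = re.sub(r"^\d.*?-\s*", "", text)

# A row that does not start with a digit is left alone, even if it contains "-"
untouched = re.sub(r"^\d.*?-", "", "no numbers this - time")

print(repr(lazy), repr(greedy), repr(trimmed), repr(untouched))
```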