
Removing specific patterns from rows in pandas

I would like to figure out how to count the rows that start with a number, for example:

My_col

24 was 2020 - There is a lot -
23 aka 2018 -  how many ...
23 was 2020 - wonderful!
no numbers this time

and, only when a row starts with a number, remove the three words before the first '-' (along with the '-' itself):

My_col

There is a lot -
how many ...
wonderful!
no numbers this time

In SQL I would do the check as follows:

SELECT CASE WHEN ISNUMERIC(SUBSTRING(LTRIM(My_Col), 1, 1)) = 1 
         THEN 'yes' 
         ELSE 'no' 
       END AS StartsWithNumber
FROM my_data 
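
For comparison, a rough pandas counterpart of the SQL check above might look like the sketch below. It assumes the sample rows from the question; the names starts_with_number, labels and n_rows are illustrative, and the regex only tests for a leading digit, so it does not reproduce every case that ISNUMERIC accepts.

```python
import pandas as pd

df = pd.DataFrame({"My_col": [
    "24 was 2020 - There is a lot -",
    "23 aka 2018 -  how many ...",
    "23 was 2020 - wonderful!",
    "no numbers this time",
]})

# True where the first non-blank character is a digit
# (roughly mirrors LTRIM + SUBSTRING + ISNUMERIC in the SQL query)
starts_with_number = df["My_col"].str.match(r"\s*\d")

# Same 'yes'/'no' labels as the CASE expression
labels = starts_with_number.map({True: "yes", False: "no"})

# Summing a boolean Series counts the True values
n_rows = int(starts_with_number.sum())

print(labels.tolist())
print(n_rows)
```

The boolean Series doubles as a mask, so df[starts_with_number] would select just the rows that start with a number.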

To remove the words before the '-', I think I should consider using np.where with a regex, and then apply.
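
As a minimal sketch of that np.where idea (one possible reading of it, not a confirmed solution), the condition can be built with a regex match and the replacement text taken from a single split on '-'. The column name cleaned and the helper variables are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"My_col": [
    "24 was 2020 - There is a lot -",
    "23 aka 2018 -  how many ...",
    "23 was 2020 - wonderful!",
    "no numbers this time",
]})

# Condition: the row starts with a digit (leading blanks ignored)
starts_with_number = df["My_col"].str.match(r"\s*\d")

# Text after the first '-' only (n=1 splits once); rows without a '-' give NaN
after_dash = df["My_col"].str.split("-", n=1).str[1].str.strip()

# Take the trimmed remainder where the condition holds, otherwise keep the original
df["cleaned"] = np.where(
    starts_with_number & after_dash.notna(), after_dash, df["My_col"]
)

print(df["cleaned"].tolist())
```

Because the condition and both branches are whole columns, no apply step is needed in this variant.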

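import pandas as pd

df = pd.DataFrame({'My_col': [
          "24 was 2020 - There is a lot -", 
          "no numbers this time"] })

# For rows whose first character is a digit, keep only the text after the first '-'.
# x[:1] is used instead of x[0] so that an empty string does not raise IndexError.
df['My_col'].apply(
    lambda x: x[x.find("-")+1:].strip() if x[:1].isdigit() else x)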

Output:

0        There is a lot -
1    no numbers this time

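Use df.replace() with a regular expression. I added a '-' to the fourth row to show that a row that does not start with a number keeps its words: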

import pandas as pd

data = {'My_col':['24 was 2020 - There is a lot -', '23 aka 2018 -  how many ...', '23 was 2020 - wonderful!', 'no numbers this - time']}
df = pd.DataFrame(data)

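# Assign the result back: inplace=True on a column selection acts on a temporary
# copy in recent pandas versions. The trailing \s* also drops the spaces after '-'.
df['My_col'] = df['My_col'].replace(r'^\d.*?-\s*', '', regex=True)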
print(df)

                   My_col
0        There is a lot -
1            how many ...
2              wonderful!
3  no numbers this - time
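
To see why the pattern uses the lazy quantifier .*? rather than .*, the sketch below applies both forms to one sample string with Python's re module. The trailing \s* is an addition assumed here to trim the spaces after the '-'; the variable names are illustrative.

```python
import re

text = "24 was 2020 - There is a lot -"

# ^\d   the string must start with a digit
# .*?   as few characters as possible ...
# -     ... up to the first '-'
# \s*   plus any whitespace that follows it
lazy = re.sub(r"^\d.*?-\s*", "", text)

# With a greedy .* the match runs to the LAST '-', which here is the end of the string
greedy = re.sub(r"^\d.*-\s*", "", text)

# A string that does not start with a digit never matches, so its '-' is kept
untouched = re.sub(r"^\d.*?-\s*", "", "no numbers this - time")

print(repr(lazy))
print(repr(greedy))
print(repr(untouched))
```

The lazy form keeps "There is a lot -", while the greedy form would delete the whole string.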
