Removing a specific pattern from rows in Pandas

Question

I'm trying to figure out how to detect rows that start with a number, for example:

My_col

24 was 2020 - There is a lot -
23 aka 2018 -  how many ...
23 was 2020 - wonderful!
no numbers this time

Then, only for rows that start with a number, I want to remove the first three words and the - that follows them:

My_col

There is a lot -
how many ...
wonderful!
no numbers this time

In SQL I would do the check like this:

SELECT CASE WHEN ISNUMERIC(SUBSTRING(LTRIM(My_Col), 1, 1)) = 1 
         THEN 'yes' 
         ELSE 'no' 
       END AS StartsWithNumber
FROM my_data

To remove the words before the -, I think I should look at np.where, or a regex combined with apply.
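For reference, the SQL check above could be mirrored in pandas roughly as follows. This is only a sketch: the StartsWithNumber column name comes from the SQL, first_char and n_numeric are illustrative names, and str.isdigit() is an approximation of ISNUMERIC rather than an exact equivalent.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"My_col": [
    "24 was 2020 - There is a lot -",
    "23 aka 2018 -  how many ...",
    "23 was 2020 - wonderful!",
    "no numbers this time",
]})

# Mirror LTRIM + SUBSTRING(..., 1, 1): drop leading whitespace, keep the first character.
first_char = df["My_col"].str.lstrip().str[:1]

# Approximate ISNUMERIC on a single character with str.isdigit().
starts_with_number = first_char.str.isdigit()

# Same yes/no labels as the SQL CASE expression.
df["StartsWithNumber"] = np.where(starts_with_number, "yes", "no")

# Number of rows that start with a digit.
n_numeric = int(starts_with_number.sum())

print(df)
print(n_numeric)
```

The boolean mask starts_with_number can be reused afterwards to decide which rows should have their prefix removed.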

Answer 1

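import pandas as pd

df = pd.DataFrame({'My_col': [
          "24 was 2020 - There is a lot -", 
          "no numbers this time"] })

# If the row starts with a digit, keep only the text after the first "-";
# x[:1] (rather than x[0]) avoids an IndexError on empty strings.
df['My_col'].apply(
    lambda x: x[x.find("-")+1:].strip() if x[:1].isdigit() else x)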

Output:

0        There is a lot -
1    no numbers this time
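If apply with a lambda turns out to be slow on a large column, the same idea can probably be written with vectorized string methods. The following is a sketch under assumptions: the names s, mask, after_dash and result are illustrative, and the extra row "2021 with no dash" is made up to show that rows without a - are left alone.

```python
import pandas as pd

s = pd.Series([
    "24 was 2020 - There is a lot -",
    "23 aka 2018 -  how many ...",
    "23 was 2020 - wonderful!",
    "no numbers this time",
    "2021 with no dash",  # hypothetical row: starts with a digit but has no "-"
])

# Rows that start with a digit and actually contain a "-".
mask = s.str[:1].str.isdigit() & s.str.contains("-", regex=False)

# Text after the first "-", with surrounding whitespace removed.
after_dash = s.str.partition("-")[2].str.strip()

# Replace only the masked rows; every other row keeps its original value.
result = s.mask(mask, after_dash)
print(result)
```

Series.mask swaps in after_dash only where mask is True, so the rows that do not start with a digit pass through unchanged.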

Answer 2

Use df.replace() with a regular expression. I added a - to the fourth row to show that its words are not removed:

import pandas as pd

data = {'My_col':['24 was 2020 - There is a lot -', '23 aka 2018 -  how many ...', '23 was 2020 - wonderful!', 'no numbers this - time']}
df = pd.DataFrame(data)

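# Assign the result back; inplace=True on a selected column may not update df in recent pandas versions
df['My_col'] = df['My_col'].replace(r'^\d.*?-', '', regex=True)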
print(df)

                   My_col
0        There is a lot -
1            how many ...
2              wonderful!
3  no numbers this - time
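To see why the pattern ^\d.*?- stops at the first - and leaves the last row alone, it may help to try it with the standard re module. This is a small illustration, not part of the original answer; samples, cleaned and greedy are illustrative names, and the strip() call is added here only to drop the space left behind after the dash.

```python
import re

# ^   anchor at the start of the string
# \d  the first character must be a digit
# .*? as few characters as possible (non-greedy)
# -   up to and including the first "-"
pattern = re.compile(r"^\d.*?-")

samples = [
    "24 was 2020 - There is a lot -",
    "23 aka 2018 -  how many ...",
    "23 was 2020 - wonderful!",
    "no numbers this - time",
]

# Rows that do not start with a digit never match, so they are returned as-is.
cleaned = [pattern.sub("", text).strip() for text in samples]
print(cleaned)

# A greedy .* would instead run to the LAST "-" and delete the whole first row.
greedy = re.sub(r"^\d.*-", "", samples[0])
print(repr(greedy))
```

The non-greedy quantifier is what keeps the trailing "-" in the first row intact.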

Answer 1: score 1, accepted, 2020-10-07 18:29:12

Answer 2: score 0, 2020-10-07 18:39:04
