Remove unwanted characters from a set of strings in Python

Question

I am trying to clean up a set of strings by removing unwanted characters.

Input

Lethal Lunch t5+ 0 0 D 10 t5+ Michael Bell . Alex Jary7 .
Muscika 1 v5+ W5+ 0 0 D 5 v5+ W5+ D O'Meara . Cam Hardie . C5
Typhoon Ten 1 0 0 D 13 R Hannon . Luke Catton7 .
Wentworth Falls 1 cp5+ 0 0 C D 45 cp5+ G Harker . Connor Beasley .
One Night Stand 0 0 D 34 W Jarvis . Silvestre De Sousa . 30 C1 C5
Dancinginthewoods 1 0 0 D 24 D Ivory . 14 Jamie Spencer . 30
Case Key 1 v3 0 0 D 13 v3 M Appleby . Andrew Mullen . 14

Wanted output

Lethal Lunch
Muscika
Typhoon Ten
Wentworth Falls
One Night Stand
Dancinginthewoods 
Case Key

I tried this:

re.findall(r'([a-zA-Z ]*)\d*.*', final_df.loc[index, 'Horse'])

This removes everything after the digits, but it leaves the t on the first entry. Is there a better way?

Answer 1

I would use re.split instead:

import re

# data: the input above as one multi-line string
for d in data.splitlines():
    print(re.split(r'\s+t?[0-9]\+?', d)[0])

Result

Lethal Lunch 
Muscika 
Typhoon Ten 
Wentworth Falls 
One Night Stand 
Dancinginthewoods 
Case Key

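Explanation: it splits the string wherever the given pattern matches, then takes the first part. You may want to adjust the pattern so that other prefixes match as well.

In Pandas

I just noticed that you appear to be using Pandas. Assuming your df looks like this: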
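As one possible adjustment, here is a minimal sketch that does not hard-code the `t` prefix. It assumes that the horse name ends right before the first whitespace-separated token containing a digit; that assumption holds for the sample rows in the question but may not hold for other data. The names `rows`, `FIRST_DIGIT_TOKEN` and `horse_name` are made up for this illustration.

```python
import re

# Sample rows copied from the question.
rows = [
    "Lethal Lunch t5+ 0 0 D 10 t5+ Michael Bell . Alex Jary7 .",
    "Muscika 1 v5+ W5+ 0 0 D 5 v5+ W5+ D O'Meara . Cam Hardie . C5",
    "Typhoon Ten 1 0 0 D 13 R Hannon . Luke Catton7 .",
    "Wentworth Falls 1 cp5+ 0 0 C D 45 cp5+ G Harker . Connor Beasley .",
    "One Night Stand 0 0 D 34 W Jarvis . Silvestre De Sousa . 30 C1 C5",
    "Dancinginthewoods 1 0 0 D 24 D Ivory . 14 Jamie Spencer . 30",
    "Case Key 1 v3 0 0 D 13 v3 M Appleby . Andrew Mullen . 14",
]

# Assumption: whitespace followed by a token that contains a digit
# marks the end of the name (e.g. " t5", " 1", " cp5").
FIRST_DIGIT_TOKEN = re.compile(r"\s+\S*\d")


def horse_name(row: str) -> str:
    """Return the text before the first token that contains a digit."""
    # maxsplit=1 stops after the first match; a row without any digit
    # is returned unchanged.
    return FIRST_DIGIT_TOKEN.split(row, maxsplit=1)[0]


names = [horse_name(row) for row in rows]
for name in names:
    print(name)
```

Because the split consumes the whitespace before the matching token, the returned names carry no trailing space.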

                                               Horse
0  Lethal Lunch t5+ 0 0 D 10 t5+ Michael Bell . A...
1  Muscika 1 v5+ W5+ 0 0 D 5 v5+ W5+ D O'Meara . ...
2  Typhoon Ten 1 0 0 D 13 R Hannon . Luke Catton7 .
3  Wentworth Falls 1 cp5+ 0 0 C D 45 cp5+ G Harke...
4  One Night Stand 0 0 D 34 W Jarvis . Silvestre ...
5  Dancinginthewoods 1 0 0 D 24 D Ivory . 14 Jami...
6  Case Key 1 v3 0 0 D 13 v3 M Appleby . Andrew M...

you can do

from operator import itemgetter

# split each row on the pattern and keep only the first piece
df["name"] = df.Horse.str.split(r'\s+t?[0-9]\+?').map(itemgetter(0))

to get this:

                                               Horse               name
0  Lethal Lunch t5+ 0 0 D 10 t5+ Michael Bell . A...       Lethal Lunch
1  Muscika 1 v5+ W5+ 0 0 D 5 v5+ W5+ D O'Meara . ...            Muscika
2  Typhoon Ten 1 0 0 D 13 R Hannon . Luke Catton7 .        Typhoon Ten
3  Wentworth Falls 1 cp5+ 0 0 C D 45 cp5+ G Harke...    Wentworth Falls
4  One Night Stand 0 0 D 34 W Jarvis . Silvestre ...    One Night Stand
5  Dancinginthewoods 1 0 0 D 24 D Ivory . 14 Jami...  Dancinginthewoods
6  Case Key 1 v3 0 0 D 13 v3 M Appleby . Andrew M...           Case Key

Answer 2

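Something like this should work:

filtered_text = list()

for line in text:  # text: an iterable of input lines
    part = ""
    for word in line.split(" "):  # split the current line, not the whole input
        if len(word) <= 3:
            # stop at the first short token; note that short name words
            # such as "Ten", "One" or "Key" also end the loop early
            break
        else:
            part = str(part) + " " + str(word)

    part = part[1:] # skip first space
    filtered_text.append(part)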
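The loop above decides by word length, so it stops early on rows whose names contain short words. A possible variant, sketched here under the assumption that every word of a horse name is purely alphabetic, keeps leading words until the first token that contains a non-letter. The names `rows` and `leading_alpha_words` are hypothetical, and names with apostrophes or hyphens would be cut short by this rule.

```python
from itertools import takewhile

# Three sample rows from the question whose names contain short words.
rows = [
    "Typhoon Ten 1 0 0 D 13 R Hannon . Luke Catton7 .",
    "One Night Stand 0 0 D 34 W Jarvis . Silvestre De Sousa . 30 C1 C5",
    "Case Key 1 v3 0 0 D 13 v3 M Appleby . Andrew Mullen . 14",
]


def leading_alpha_words(row: str) -> str:
    """Join the leading words of a row that consist only of letters."""
    # takewhile stops at the first token for which str.isalpha is False,
    # e.g. "1", "0" or "t5+".
    words = takewhile(str.isalpha, row.split())
    return " ".join(words)


cleaned = [leading_alpha_words(row) for row in rows]
for name in cleaned:
    print(name)
```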

Answer 3

Would something like this be enough?

import re

# named "lines" rather than "input" to avoid shadowing the built-in
lines = [
    "Lethal Lunch t5+ 0 0 D 10 t5+ Michael Bell . Alex Jary7 .",
    "Muscika 1 v5+ W5+ 0 0 D 5 v5+ W5+ D O'Meara . Cam Hardie . C5",
    "Typhoon Ten 1 0 0 D 13 R Hannon . Luke Catton7 .",
    "Wentworth Falls 1 cp5+ 0 0 C D 45 cp5+ G Harker . Connor Beasley .",
    "One Night Stand 0 0 D 34 W Jarvis . Silvestre De Sousa . 30 C1 C5",
    "Dancinginthewoods 1 0 0 D 24 D Ivory . 14 Jamie Spencer . 30",
    "Case Key 1 v3 0 0 D 13 v3 M Appleby . Andrew Mullen . 14",
]

for line in lines:
    print(re.findall(r'\b[a-zA-Z ]+\b', line)[0])

We basically ignore words that contain digits or odd symbols. Output:

Lethal Lunch 
Muscika 
Typhoon Ten 
Wentworth Falls 
One Night Stand 
Dancinginthewoods 
Case Key
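One detail worth knowing: with this pattern the match can include the space that precedes the first rejected token. A small sketch of how that could be tidied up follows; the names `NAME_RUN` and `first_letter_run` are invented for this example, and using `search` plus `strip` is only one of several reasonable options.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
NAME_RUN = re.compile(r"\b[a-zA-Z ]+\b")


def first_letter_run(row: str) -> str:
    """Return the first run of letters and spaces, without outer whitespace."""
    match = NAME_RUN.search(row)
    # search returns None when the row has no letters at all.
    return match.group(0).strip() if match else ""


row = "Lethal Lunch t5+ 0 0 D 10 t5+ Michael Bell . Alex Jary7 ."
raw = NAME_RUN.search(row).group(0)

print(repr(raw))                    # 'Lethal Lunch ' (note the trailing space)
print(repr(first_letter_run(row)))  # 'Lethal Lunch'
```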
