
Regex: How to capture words with spaces/hyphens excluding numbers?

I have a dataset that looks like this:

Column1
-------
abcd - efghi 1234
aasdas - asdas 54321
asda-asd 2344
aasdas(asd) 5234

I want to pull out everything except the number, so that the result looks like this:

Column2
-------
abcd - efghi
aasdas - asdas
asda-asd
aasdas(asd)

This is my current regex:

df['Column2'] = df['Column1'].str.extract('([A-Z]\w{0,})', expand=True)

But it only extracts the first word, leaving out the parentheses and hyphens. Any help will be appreciated... thank you!

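You could use replace (regex=True is passed explicitly because str.replace treats the pattern as a literal string by default since pandas 2.0):

df.Column1.str.replace(r'\d+', '', regex=True)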
Out[775]: 
0      abcd-efghi 
1    aasdas-asdas 
2        asda-asd 
3     aasdas(asd) 
Name: Column1, dtype: object
#df.Column1 = df.Column1.str.replace(r'\d+', '', regex=True)
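
The output above still carries a trailing space on each row. A minimal sketch of the same replace approach with .str.strip() chained on, assuming the sample rows from the question (the DataFrame below is rebuilt here only for illustration):

```python
import pandas as pd

# Sample data copied from the question; rebuilt here so the snippet runs on its own.
df = pd.DataFrame({'Column1': ['abcd - efghi 1234',
                               'aasdas - asdas 54321',
                               'asda-asd 2344',
                               'aasdas(asd) 5234']})

# Remove every run of digits, then trim the whitespace left at the ends.
# regex=True is explicit because the default changed to False in pandas 2.0.
df['Column2'] = (df['Column1']
                 .str.replace(r'\d+', '', regex=True)
                 .str.strip())

print(df['Column2'].tolist())
```

Note that this removes digits anywhere in the string, not only the number at the end.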

Just removing numbers will leave you with unwanted space characters.

This list comprehension removes all digits and keeps the inner space characters, but strips whitespace from both ends.

df['Column2'] = df['Column1'].apply(
                   lambda x: ''.join([i for i in x if not i.isdigit()]).strip())
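
If only the number at the end should go and digits inside a word should stay, a capture group may be closer to what the question asks for. The following is a rough sketch, not a tested drop-in: the helper name strip_trailing_number and the pattern are my own, and it assumes each value is a single-line string.

```python
import re

# Lazily capture everything, then allow optional whitespace and digits at the end.
TRAILING_NUMBER = re.compile(r'^(.*?)\s*\d*$')

def strip_trailing_number(text):
    """Return text without a trailing number and the whitespace before it."""
    match = TRAILING_NUMBER.match(text)
    # Fall back to the input unchanged if the pattern does not match
    # (for example, a value containing embedded newlines).
    return match.group(1) if match else text

samples = ['abcd - efghi 1234',
           'aasdas - asdas 54321',
           'asda-asd 2344',
           'aasdas(asd) 5234']

for s in samples:
    print(repr(strip_trailing_number(s)))
```

With a DataFrame like the one in the question, the helper could be applied as df['Column1'].apply(strip_trailing_number).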
