
How to extract numbers of a certain length from a string in Python? [duplicate]

I have a DataFrame which looks like this:

description     
1906 RES 330 ML
1906 RES 330ML
RES 335 c/6
RES 332 c/12

I want to extract the three consecutive digits and save them in a new column 'volume'. My code is like this:

df['volume'] = df['description'].str.extract('([([\d]*[\d]){3,3}?])')

The expected results should look like this:

volume
330
330
335
332

However, it gives results like this:

volume
1906
1906
335
332

Can anyone help me fix this code? Thanks so much!

It might be overkill, but if you want to make sure you don't capture digits that are part of a 4-digit number, you could use this:

df['volume'] = df.description.str.extract(r'(?<!\d)(\d{3})(?!\d)', expand=False)    
print(df)

       description volume
0  1906 RES 330 ML    330
1   1906 RES 330ML    330
2      RES 335 c/6    335
3     RES 332 c/12    332

Specify expand=False so that the matches are returned as a single pd.Series rather than a DataFrame.
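For a self-contained version, here is a minimal sketch that rebuilds the question's DataFrame (assuming pandas is installed) and shows what expand=False changes: with a single capture group, expand=False gives a Series, while the default expand=True gives a one-column DataFrame.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({"description": [
    "1906 RES 330 ML",
    "1906 RES 330ML",
    "RES 335 c/6",
    "RES 332 c/12",
]})

# Exactly three digits with no digit directly before or after them
pattern = r"(?<!\d)(\d{3})(?!\d)"

# One capture group + expand=False -> a Series
volume = df["description"].str.extract(pattern, expand=False)
# Same pattern with the default expand=True -> a one-column DataFrame
volume_frame = df["description"].str.extract(pattern)

print(type(volume).__name__)        # Series
print(type(volume_frame).__name__)  # DataFrame

df["volume"] = volume
print(df)
```

The Series can be assigned straight to df["volume"]; the DataFrame form would first need its single column selected.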


The regex:

  • (?<!\d) - negative lookbehind: the three digits must not be preceded by a digit (the start of the string also qualifies)
  • (\d{3}) - matches exactly 3 digits and captures them
  • (?!\d) - negative lookahead: the three digits must not be followed by a digit (the end of the string also qualifies)
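To see the lookarounds in isolation, here is a minimal sketch using Python's built-in re module on the sample strings from the question, plus the same search without the lookarounds for contrast:

```python
import re

# Same pattern as in the answer: the lookbehind/lookahead reject neighbouring digits
pattern = re.compile(r"(?<!\d)(\d{3})(?!\d)")

samples = ["1906 RES 330 ML", "1906 RES 330ML", "RES 335 c/6", "RES 332 c/12"]
for text in samples:
    # With a single group, findall returns the captured text of each match
    print(f"{text!r:20} -> {pattern.findall(text)}")

# Without the lookarounds, the first three digits of 1906 are matched too
print(re.findall(r"\d{3}", "1906 RES 330 ML"))  # ['190', '330']
```

Every sample yields exactly one match, and "1906" never contributes one, because each 3-digit window inside it has a digit on at least one side.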

You need to:

  • not match any number of digits three times over, so delete the [\d]*
  • not match 3 digits that are part of a longer "word",
    and especially not part of a longer run of digits, so use the word boundary \b
  • not add the ? after {3,3} (it only makes the quantifier lazy, which changes nothing for an exact count)
  • not overdo the character set thing [] (\d on its own is enough)

You do not need to:

  • use two capture groups ()

This regex will find exactly three digits that stand alone as a whole word:

\b(\d{3})\b
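Note that \b requires a transition between a word character and a non-word character, and letters count as word characters. Here is a minimal sketch on the question's sample rows (assuming pandas is installed) that compares this pattern with the lookaround version from the first answer:

```python
import pandas as pd

df = pd.DataFrame({"description": [
    "1906 RES 330 ML",
    "1906 RES 330ML",
    "RES 335 c/6",
    "RES 332 c/12",
]})

# \b needs a non-word character (or string edge) on both sides of the digits
word_boundary = df["description"].str.extract(r"\b(\d{3})\b", expand=False)
# The lookarounds only require that the neighbours are not digits
lookaround = df["description"].str.extract(r"(?<!\d)(\d{3})(?!\d)", expand=False)

print(pd.DataFrame({"word_boundary": word_boundary, "lookaround": lookaround}))
```

On these rows the word-boundary column is NaN for "1906 RES 330ML", because "0" and "M" are both word characters, while the lookaround column gives 330 for every row that contains it.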

The regex you are looking for is \b[\d]{3}\b (to use it with str.extract, wrap the digits in a capture group: \b([\d]{3})\b)

For more information on \b, see the docs.
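pandas' Series.str.extract only accepts patterns that contain at least one capture group and raises ValueError otherwise. Here is a minimal sketch (assuming pandas is installed) that checks the bare pattern and the same pattern wrapped in a group:

```python
import pandas as pd

s = pd.Series(["1906 RES 330 ML", "RES 335 c/6"])

# Bare pattern: no capture group, so str.extract raises ValueError
try:
    s.str.extract(r"\b[\d]{3}\b", expand=False)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Same pattern wrapped in a capture group
fixed = s.str.extract(r"\b([\d]{3})\b", expand=False)
print(fixed.tolist())  # ['330', '335']
```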

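Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.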

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM