
Extract numbers from a string in Python

I am trying to extract only the area number from a column in a pandas DataFrame. A typical value is "568 sq mi (1,471 km2)", and I only want 568. The space between the number and "sq" is a non-breaking space.

You can probably do this:

df[col].apply(lambda x:x[:3])

This extracts the leading number for the whole column. Replace df with your DataFrame's name and col with your column name. Note that x[:3] takes the first three characters, so it only works when every number has exactly three digits.
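Here is a minimal sketch of why a fixed-width slice is fragile, assuming the column can hold numbers of different lengths. The sample strings and the leading_number helper are made up for illustration; "\xa0" is the non-breaking space mentioned in the question.

```python
import re

def leading_number(text):
    """Return the digits (and thousands commas) at the start of text, or None."""
    match = re.match(r"\s*(\d[\d,]*)", text)
    return match.group(1) if match else None

# Hypothetical sample values, not taken from the asker's data
samples = ["568\xa0sq mi (1,471 km2)", "12,345\xa0sq mi (31,973 km2)", "7 sq mi"]

sliced = [s[:3] for s in samples]               # ['568', '12,', '7 s']
matched = [leading_number(s) for s in samples]  # ['568', '12,345', '7']

print(sliced)
print(matched)
```

The slice is only right for the three-digit value, while the pattern stops at the first character that is not a digit or comma, whatever kind of space follows.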

Try using a regex on the string.

E.g.:

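import re

text = "568.78 sq mi (1,471 km2)"  # avoid the name str, which shadows the built-in
# Include "." in the character class so the decimal part is kept
num = re.findall(r"[0-9.]+", text)
print(num[0])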

Output:

568.78
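If you want an actual number rather than a string, something along these lines might help. It is only a sketch: the NUMBER pattern and the first_number helper are my own names, and it assumes commas are thousands separators and "." is the decimal point.

```python
import re

# Digits, optional ",ddd" thousands groups, optional decimal part
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")

def first_number(text):
    """Return the first number in text as a float, or None if there is none."""
    match = NUMBER.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))

text = "568\xa0sq mi (1,471\xa0km2)"  # \xa0 is a non-breaking space

print(first_number(text))  # 568.0

# findall picks up every number, including the 2 in "km2"
all_numbers = [float(m.replace(",", "")) for m in NUMBER.findall(text)]
print(all_numbers)  # [568.0, 1471.0, 2.0]
```

Taking only the first match sidesteps the stray 2 from "km2", and the pattern never looks at the space, so the non-breaking space does not matter.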

Since the values are in a DataFrame column, try something like this:

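def fmt(row):
    # First run of digits, commas, and dots in the cell, e.g. "568"
    number = re.findall(r"[0-9,.]+", row)
    return number[0]

# Apply fmt to every value in the column and store the results
numbers = list(map(fmt, df[col]))
df['fmt area'] = numbers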

I think this should work.

You can use the .str accessor and pass a regular expression pattern to extract.

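import pandas as pd
df = pd.DataFrame({"Area":["568 sq mi (1,471 km2)"]})

# Capture only the leading digits, so the result does not depend on the kind of space that follows
df["area changed"] = df.Area.str.extract(r"(\d+)")

Output: the "area changed" column contains 568 for the sample row.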
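To get a numeric column instead of strings, a variation like the following may work. Treat it as a sketch under assumptions: the second row, the missing value, and the area_sq_mi column name are invented for illustration, and commas are assumed to be thousands separators.

```python
import pandas as pd

# Hypothetical sample data; \xa0 is the non-breaking space from the question
df = pd.DataFrame({
    "Area": [
        "568\xa0sq mi (1,471\xa0km2)",
        "1,204\xa0sq mi (3,118\xa0km2)",
        None,
    ]
})

# expand=False returns a Series instead of a one-column DataFrame
leading = df["Area"].str.extract(r"^\s*([\d,]+)", expand=False)

# Drop thousands separators, then convert; missing values stay NaN
df["area_sq_mi"] = pd.to_numeric(leading.str.replace(",", "", regex=False))

print(df["area_sq_mi"])
```

Rows that are missing or do not start with a number come out as NaN rather than raising an error.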
