Extracting numbers from a pandas DataFrame column with 2 conditions
I have a DataFrame with one column in Python, and I need to extract the numbers that come after a "#" character or the string "door". For example:
String_column
#12-123,456
mom101, door 101, pop10
I only want the numbers that come after the # sign or the word door. How would I go about doing this? This is what I currently have, but I think it only picks up the numbers that come after the # sign:
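import pandas as pd

# The file path must be passed as a string
df = pd.read_csv("data.csv")
# The lookbehind (?<=#) only matches digits directly after "#"
df['qwerty'] = df['String_column'].str.extract(
    r'(?<=#)(\d+)', expand=False
).fillna(0).astype(int)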
You can use df.loc combined with apply to select all the rows for which the condition is True.
Here is a simple example:
In [5]: df= pd.DataFrame({'String_column':['not useful', 'door useful1', '! useful 2', 'not useful']})
In [6]: df
Out[6]:
String_column
0 not useful
1 door useful1
2 ! useful 2
3 not useful
Now, using our function:
In [7]: df.loc[df['String_column'].apply(lambda x: x.startswith(('!', 'door')))]
Out[7]:
String_column
1 door useful1
2 ! useful 2
We used startswith to check both conditions and keep only the values that start with '!' or 'door'.
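Note that startswith only matches at the beginning of each string, so a value such as "mom101, door 101, pop10" from the question would not be selected. A minimal sketch of the same row filter that matches "#" or "door" anywhere in the string, assuming the question's sample data (the third row and the names mask and filtered are made up for illustration):

```python
import pandas as pd

# Sample data from the question, plus one hypothetical non-matching row
df = pd.DataFrame(
    {"String_column": ["#12-123,456", "mom101, door 101, pop10", "nothing here"]}
)

# Boolean mask: True where the string contains "#" or "door" anywhere
mask = df["String_column"].str.contains(r"#|door", regex=True)

# Keep only the matching rows
filtered = df.loc[mask]
print(filtered)
```

This only selects rows; it does not extract the numbers themselves.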
IIUC, you can use a non-capturing group to list your different options (# or door\s*):
df['num'] = (df['String_column'].str.extract(r'(?:#|door\s*)(\d+)', expand=False)
.fillna(0).astype(int)
)
Output:
String_column num
0 #12-123,456 12
1 mom101, door 101, pop10 101
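For reference, here is a self-contained sketch of the same idea that can be run without a CSV file. The extra rows, the word boundary before door, the case-insensitive flag, and the use of pd.to_numeric are illustrative assumptions, not part of the original answer:

```python
import re

import pandas as pd

# First two rows come from the question; the last two are hypothetical
df = pd.DataFrame(
    {
        "String_column": [
            "#12-123,456",
            "mom101, door 101, pop10",
            "Door7",
            "no match here",
        ]
    }
)

# (?:...) groups the alternatives without capturing them;
# only the digits in (\d+) are returned by str.extract
pattern = r"(?:#|\bdoor\s*)(\d+)"

extracted = df["String_column"].str.extract(
    pattern, flags=re.IGNORECASE, expand=False
)

# Convert the matched strings to numbers; rows without a match become 0
df["num"] = pd.to_numeric(extracted, errors="coerce").fillna(0).astype(int)
print(df)
```

Using 0 for rows without a match follows the original answer; if 0 could be a real value in your data, keeping the missing values instead may be the safer choice.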