
Extracting a number from a pandas DataFrame column with 2 conditions

I have a DataFrame with 1 column in Python, and I need to extract the 2 numbers that come after a "#" character or the string "door". An example would be:

String_column
#12-123,456
mom101, door 101, pop10

I only want the 2 numbers that come after the # sign or the word door. How would I go about doing this? This is what I currently have, but I think it only picks up the numbers that come after the # character:

import pandas as pd

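df = pd.read_csv('data.csv')  # the file name must be a quoted string
# The lookbehind (?<=#) only matches digits directly after '#'
df['qwerty'] = df['String_column'].str.extract(
    r'(?<=#)(\d+)', expand=False
).fillna(0).astype(int)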

You can use df.loc combined with apply, which selects all the rows for which the condition is True.

Here is a simple example:

In [5]: df = pd.DataFrame({'String_column': ['not useful', 'door useful1', '! useful 2', 'not useful']})

In [6]: df
Out[6]: 
  String_column
0    not useful
1  door useful1
2    ! useful 2
3    not useful

Now using our function:

In [7]: df.loc[df['String_column'].apply(lambda x: x.startswith(('!', 'door')))]
Out[7]: 
  String_column
1  door useful1
2    ! useful 2

We used startswith to match all our conditions and keep the useful values that start with '!' or 'door'.
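Note that startswith only tests the beginning of each string, so a row such as 'mom101, door 101, pop10' from the question would not be selected. A minimal sketch of one possible variation, assuming the marker may appear anywhere in the string, is to build the mask with str.contains instead. The third row and the names mask and filtered are hypothetical and only there for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'String_column': [
        '#12-123,456',
        'mom101, door 101, pop10',
        'no marker 7',  # hypothetical row without '#' or 'door'
    ]
})

# Boolean mask: True where '#' or 'door' occurs anywhere in the string
mask = df['String_column'].str.contains(r'#|door', regex=True)

# Keep only the rows that contain one of the markers
filtered = df.loc[mask]
print(filtered)
```

This only filters rows; it does not extract the number itself.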

IIUC, you can use a non-capturing group to list your different options (# or door\s*):

df['num'] = (df['String_column'].str.extract(r'(?:#|door\s*)(\d+)', expand=False)
             .fillna(0).astype(int)
            )

output:

             String_column  num
0              #12-123,456   12
1  mom101, door 101, pop10  101

regex demo
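For reference, here is a self-contained sketch of the same pattern on the question's two sample rows. The third row is a hypothetical addition to show what happens when neither marker is present, and pd.to_numeric is swapped in for the fillna(0).astype(int) chain as one way to convert the captured strings. Treat it as an illustration under those assumptions, not the only way to do it:

```python
import pandas as pd

df = pd.DataFrame({
    'String_column': [
        '#12-123,456',
        'mom101, door 101, pop10',
        'pop10 only',  # hypothetical row with no '#' or 'door'
    ]
})

# (?:#|door\s*) matches either marker without capturing it;
# (\d+) captures the digits that immediately follow.
pattern = r'(?:#|door\s*)(\d+)'

# str.extract returns the first match per row (NaN when nothing matches)
extracted = df['String_column'].str.extract(pattern, expand=False)

# Convert to numbers, then use 0 for rows without a match
df['num'] = pd.to_numeric(extracted, errors='coerce').fillna(0).astype(int)
print(df)
```

Because str.extract keeps only the first match in each row, a row containing both '#' and 'door' would yield whichever marker appears first.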
