Create a new column using str.contains and where the condition fails, set it to null (NaN)

I am trying to create a new column in my pandas dataframe, but only with a value if another column contains a certain string.

My dataframe looks something like this:

    raw                                     val1    val2  
0   Vendor Invoice Numbe Inv Date                        
1   Vendor: Company Name 1                  123     456   
2   13445 07708-20-2019 US                  432     676   
3   79935 19028808-15-2019 US               444     234   
4   Vendor: company Name 2                  234     234  

I am trying to create a new column, vendor, that transforms the dataframe into:

    raw                                     val1    val2  vendor
0   Vendor Invoice Numbe Inv Date                         Vendor Invoice Numbe Inv Date
1   Vendor: Company Name 1                  123     456   Vendor: Company Name 1 
2   13445 07708-20-2019 US                  432     676   NaN
3   79935 19028808-15-2019 US               444     234   NaN
4   Vendor: company Name 2                  234     234   company Name 2  
5   Vendor: company Name 2                  928     528   company Name 2  

However, whenever I try,

df['vendor'] = df.loc[df['raw'].str.contains('Vendor', na=False), 'raw']

I get the error

ValueError: cannot reindex from a duplicate axis

I know that the rows at index 4 and 5 have the same company value, but what am I doing wrong, and how do I add the new column to my dataframe?

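The problem is that df.loc[df['raw'].str.contains('Vendor', na=False), 'raw'] has a different length than df, so pandas has to align it to df by index label before assigning. That alignment fails when the selected rows carry duplicate index labels, which is what the error message points to.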
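To make the alignment issue concrete, here is a minimal sketch. The three-row frame and its repeated index label 0 are assumptions made up for illustration (the question does not show how the real index looks); the point is only that the shorter Series cannot be aligned when its labels repeat, while the same assignment works once the index is unique.

```python
import pandas as pd

# Hypothetical frame: the repeated label 0 is an assumption, e.g. the result
# of concatenating several frames without resetting the index.
df = pd.DataFrame(
    {"raw": ["Vendor: Company Name 1", "13445 07708-20-2019 US", "Vendor: company Name 2"]},
    index=[0, 1, 0],
)

mask = df["raw"].str.contains("Vendor", na=False)
selected = df.loc[mask, "raw"]  # only 2 rows, both labelled 0

try:
    df["vendor"] = selected  # pandas must reindex `selected` to df.index
    alignment_failed = False
except ValueError:
    alignment_failed = True

# With a unique index the same assignment aligns cleanly and leaves NaN
# in the rows that were not selected.
df_unique = df[["raw"]].reset_index(drop=True)
mask_unique = df_unique["raw"].str.contains("Vendor", na=False)
df_unique["vendor"] = df_unique.loc[mask_unique, "raw"]
```

So resetting the index is one possible way out if the duplicate labels are not meaningful; whether that applies to the real data is something only the asker can check.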

You can try np.where, which builds the new column from a NumPy array of the same length as df, so no index alignment is needed.

import numpy as np
df['vendor'] = np.where(df['raw'].str.contains('Vendor', na=False), df['raw'], np.nan)
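As a self-contained sketch of the np.where approach: the sample rows are copied from the question, while the repeated index label 1 is an assumption added here to show that the assignment does not depend on the index at all.

```python
import numpy as np
import pandas as pd

# Rows taken from the question; the index with a repeated label (1) is an
# assumption, chosen to show that no index alignment takes place.
df = pd.DataFrame(
    {
        "raw": [
            "Vendor Invoice Numbe Inv Date",
            "Vendor: Company Name 1",
            "13445 07708-20-2019 US",
            "79935 19028808-15-2019 US",
            "Vendor: company Name 2",
        ]
    },
    index=[0, 1, 2, 3, 1],
)

# na=False turns missing values in `raw` into False instead of NaN.
mask = df["raw"].str.contains("Vendor", na=False)

# np.where returns a plain array with one element per row, so it is
# assigned positionally rather than by label.
df["vendor"] = np.where(mask, df["raw"], np.nan)
```

Rows whose raw text contains "Vendor" keep the full string, and every other row gets NaN.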

You can also use .extract() to pull out the part of the string that follows Vendor:, using a positive lookbehind:

df['vendor'] = df['raw'].str.extract(r'(?<=Vendor:\s)(.*)')
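Here is a small runnable sketch of the lookbehind approach on rows from the question. Passing expand=False is a choice made here so that the result is a Series rather than a one-column DataFrame. Note the single backslash in the raw string: inside r'...' a doubled backslash would match a literal backslash instead of whitespace.

```python
import pandas as pd

# Sample rows taken from the question.
df = pd.DataFrame(
    {
        "raw": [
            "Vendor Invoice Numbe Inv Date",
            "Vendor: Company Name 1",
            "13445 07708-20-2019 US",
            "Vendor: company Name 2",
        ]
    }
)

# (?<=Vendor:\s) is a fixed-width positive lookbehind: the match starts right
# after "Vendor:" plus one whitespace character, and (.*) captures the rest.
df["vendor"] = df["raw"].str.extract(r"(?<=Vendor:\s)(.*)", expand=False)
```

Rows without "Vendor:" followed by whitespace, including the header line "Vendor Invoice Numbe Inv Date", end up as NaN, so this differs from the np.where version, which keeps the whole string for any row containing "Vendor".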
