Pandas: writing values to a target column based on values in a source column without overwriting any existing values in the target column

I have the following data frame:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Manufacturer': ['Mercedes', 'BMW', 'Mercedes', 'Audi', 'Honda', 'Aston Martin', 'Audi', 'Jeep', 'Land Rover'],
    'Color': ['Blue', 'White', 'Black', 'Green', 'Red', 'White', 'Silver', 'Silver', 'Blue'],
    'Country': ['United States', '["United States", "Mexico"]', 'Ireland', 'Japan', '["United States","Canada"]', 'Sweden', 'United Kingdom', 'United Kingdom', '["Brazil","United States","Canada"]'],
    'Region': ['Americas', '', 'Europe', 'Asia', '', 'Europe', 'Europe', 'Europe', ''],
})


    Manufacturer    Color   Country                               Region
0   Mercedes        Blue    United States                         Americas
1   BMW             White   ["United States", "Mexico"]
2   Mercedes        Black   Ireland                               Europe
3   Audi            Green   Japan                                 Asia
4   Honda           Red     ["United States","Canada"]
5   Aston Martin    White   Sweden                                Europe
6   Audi            Silver  United Kingdom                        Europe
7   Jeep            Silver  United Kingdom                        Europe
8   Land Rover      Blue    ["Brazil","United States","Canada"]

I would like to write "Americas" to the Region column if:

a) there is no existing value in the Region column, and

b) the Country column has "United States" somewhere in the string.
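The two conditions map directly onto boolean Series that can be combined with `&`. A minimal sketch, assuming (as in the sample data) that a missing Region is stored either as an empty string or as NaN; `is_blank`, `has_us` and `needs_region` are illustrative names, and the `regex=False, na=False` arguments are additions not present in the question:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['United States', '["United States", "Mexico"]', 'Ireland', 'Japan',
                '["United States","Canada"]', 'Sweden', 'United Kingdom',
                'United Kingdom', '["Brazil","United States","Canada"]'],
    'Region': ['Americas', '', 'Europe', 'Asia', '', 'Europe', 'Europe', 'Europe', ''],
})

# a) Region has no value: treat both NaN and the empty string as "missing"
is_blank = df['Region'].isna() | (df['Region'] == '')

# b) Country mentions "United States" anywhere in the string;
#    regex=False makes it a plain substring test, na=False treats NaN as "no match"
has_us = df['Country'].str.contains('United States', regex=False, na=False)

# Only rows satisfying both conditions should receive a new Region value
needs_region = is_blank & has_us
print(needs_region.tolist())
# [False, True, False, False, True, False, False, False, True]
```

Rows 1, 4 and 8 are the only ones selected; row 0 also mentions the United States but already has a Region, so it is excluded.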

It's possible to use np.where, as follows:

df['Region'] = np.where(df['Country'].str.contains('United States'), 'Americas', '**ERROR**')

But this approach overwrites the existing values in the Region column:

    Manufacturer    Color   Country                                Region
0   Mercedes        Blue    United States                          Americas
1   BMW             White   ["United States", "Mexico"]            Americas
2   Mercedes        Black   Ireland                                **ERROR**
3   Audi            Green   Japan                                  **ERROR**
4   Honda           Red     ["United States","Canada"]             Americas
5   Aston Martin    White   Sweden                                 **ERROR**
6   Audi            Silver  United Kingdom                         **ERROR**
7   Jeep            Silver  United Kingdom                         **ERROR**
8   Land Rover      Blue    ["Brazil","United States","Canada"]    Americas

What's the best way to do this without overwriting any existing values in the Region column?

Thanks in advance!

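You can keep your own approach with a small change: add the "Region is empty" test to the condition, and pass the existing df['Region'] as the fallback instead of '**ERROR**', so rows that don't match keep their current value. I hope this code solves your problem: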


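# Write 'Americas' only where Region is empty (NaN or '') and Country mentions the US; otherwise keep Region
df['Region'] = np.where((df['Region'].isnull() | (df['Region'] == '')) & df['Country'].str.contains('United States'), 'Americas', df['Region'])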
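An alternative to `np.where` is to assign through `DataFrame.loc` with a boolean mask: only the selected rows are written, so every other Region value is left untouched by construction. `Series.mask` expresses the same idea and returns a new Series instead. A sketch on the question's data, reduced to the two relevant columns; `mask` and `region_via_mask` are illustrative names, and `regex=False, na=False` are additions not in the answer above:

```python
import pandas as pd

# Only the two columns involved in the rule are reproduced here
df = pd.DataFrame({
    'Country': ['United States', '["United States", "Mexico"]', 'Ireland', 'Japan',
                '["United States","Canada"]', 'Sweden', 'United Kingdom',
                'United Kingdom', '["Brazil","United States","Canada"]'],
    'Region': ['Americas', '', 'Europe', 'Asia', '', 'Europe', 'Europe', 'Europe', ''],
})

# Rows with no Region yet (NaN or '') whose Country mentions "United States"
mask = ((df['Region'].isna() | (df['Region'] == ''))
        & df['Country'].str.contains('United States', regex=False, na=False))

# Option A: Series.mask returns a new Series with 'Americas' where mask is True
region_via_mask = df['Region'].mask(mask, 'Americas')

# Option B: write through .loc; rows outside the mask are never touched
df.loc[mask, 'Region'] = 'Americas'

print(df['Region'].tolist())
# ['Americas', 'Americas', 'Europe', 'Asia', 'Americas', 'Europe', 'Europe', 'Europe', 'Americas']
```

Both options produce the same column here. The `.loc` form avoids rebuilding the whole column, which makes the "don't overwrite" intent explicit in the code itself.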
