Pandas: writing values to a target column based on values in a source column without overwriting any existing values in the target column
I have the following data frame:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Manufacturer':['Mercedes', 'BMW', 'Mercedes', 'Audi', 'Honda', 'Aston Martin', 'Audi', 'Jeep', 'Land Rover'],
'Color':['Blue', 'White', 'Black', 'Green', 'Red', 'White', 'Silver', 'Silver', 'Blue'],
'Country':['United States', '["United States", "Mexico"]', 'Ireland', 'Japan', '["United States","Canada"]', 'Sweden', 'United Kingdom', 'United Kingdom', '["Brazil","United States","Canada"]'],
'Region':['Americas','','Europe','Asia','','Europe', 'Europe', 'Europe', '']
})
   Manufacturer   Color                              Country    Region
0      Mercedes    Blue                        United States  Americas
1           BMW   White          ["United States", "Mexico"]
2      Mercedes   Black                              Ireland    Europe
3          Audi   Green                                Japan      Asia
4         Honda     Red           ["United States","Canada"]
5  Aston Martin   White                               Sweden    Europe
6          Audi  Silver                       United Kingdom    Europe
7          Jeep  Silver                       United Kingdom    Europe
8    Land Rover    Blue  ["Brazil","United States","Canada"]
I would like to write "Americas" to the Region column if:
a) there is no existing value in the Region column, and
b) the Country column has "United States" somewhere in the string.
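Each of the two conditions can be written as a boolean Series and combined with `&`. The following is a minimal sketch that rebuilds only the Country and Region columns from the question. It assumes that both an empty string and a missing value (NaN) count as "no existing value"; the sample data only contains empty strings, so the NaN part is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['United States', '["United States", "Mexico"]', 'Ireland', 'Japan',
                '["United States","Canada"]', 'Sweden', 'United Kingdom',
                'United Kingdom', '["Brazil","United States","Canada"]'],
    'Region': ['Americas', '', 'Europe', 'Asia', '', 'Europe', 'Europe', 'Europe', ''],
})

# a) no existing value: treat both NaN and the empty string as "empty" (assumption)
region_empty = df['Region'].isna() | (df['Region'] == '')

# b) "United States" appears anywhere in the Country string;
#    regex=False makes it a plain substring match, na=False treats a missing Country as no match
has_us = df['Country'].str.contains('United States', regex=False, na=False)

# A row qualifies only when both conditions hold
to_update = region_empty & has_us
print(to_update.tolist())
# [False, True, False, False, True, False, False, False, True]
```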
It's possible to use np.where, as follows:
df['Region'] = np.where(df['Country'].str.contains('United States'), 'Americas', '**ERROR**')
But this approach overwrites the existing values in the Region column:
   Manufacturer   Color                              Country     Region
0      Mercedes    Blue                        United States   Americas
1           BMW   White          ["United States", "Mexico"]   Americas
2      Mercedes   Black                              Ireland  **ERROR**
3          Audi   Green                                Japan  **ERROR**
4         Honda     Red           ["United States","Canada"]   Americas
5  Aston Martin   White                               Sweden  **ERROR**
6          Audi  Silver                       United Kingdom  **ERROR**
7          Jeep  Silver                       United Kingdom  **ERROR**
8    Land Rover    Blue  ["Brazil","United States","Canada"]   Americas
What's the best way to do this without overwriting any existing values in the Region column?
Thanks in advance!
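You can do this with your own approach by tweaking the code a little: add the "Region is empty" condition to the mask, and pass the existing Region column as the third argument of np.where, so rows that don't meet both conditions keep their current value. I hope this code solves your problem: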
mask = (df['Region'].isnull() | (df['Region'] == '')) & df['Country'].str.contains('United States')
df['Region'] = np.where(mask, 'Americas', df['Region'])
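An alternative, which is not part of the answer above, is to assign only to the matching rows with `DataFrame.loc`. Rows outside the mask are never written, so there is no "else" value that could overwrite anything. The following is a minimal sketch under the same assumption as before (an empty string or NaN counts as empty), using only the Country and Region columns from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['United States', '["United States", "Mexico"]', 'Ireland', 'Japan',
                '["United States","Canada"]', 'Sweden', 'United Kingdom',
                'United Kingdom', '["Brazil","United States","Canada"]'],
    'Region': ['Americas', '', 'Europe', 'Asia', '', 'Europe', 'Europe', 'Europe', ''],
})

# Rows whose Region is empty (NaN or '') and whose Country mentions the United States
mask = (
    (df['Region'].isna() | (df['Region'] == ''))
    & df['Country'].str.contains('United States', regex=False, na=False)
)

# Write only to the selected rows; every other Region value is left untouched
df.loc[mask, 'Region'] = 'Americas'
print(df['Region'].tolist())
# ['Americas', 'Americas', 'Europe', 'Asia', 'Americas', 'Europe', 'Europe', 'Europe', 'Americas']
```

Both versions give the same result on this data. The `.loc` form makes the "only touch empty rows" intent explicit, while `np.where` rebuilds the whole column from the old values.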