Pandas: writing values to a target column based on values in a source column without overwriting any existing values in the target column
I have the following DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Manufacturer':['Mercedes', 'BMW', 'Mercedes', 'Audi', 'Honda', 'Aston Martin', 'Audi', 'Jeep', 'Land Rover'],
'Color':['Blue', 'White', 'Black', 'Green', 'Red', 'White', 'Silver', 'Silver', 'Blue'],
'Country':['United States', '["United States", "Mexico"]', 'Ireland', 'Japan', '["United States","Canada"]', 'Sweden', 'United Kingdom', 'United Kingdom', '["Brazil","United States","Canada"]'],
'Region':['Americas','','Europe','Asia','','Europe', 'Europe', 'Europe', '']
})
   Manufacturer  Color   Country                              Region
0  Mercedes      Blue    United States                        Americas
1  BMW           White   ["United States", "Mexico"]
2  Mercedes      Black   Ireland                              Europe
3  Audi          Green   Japan                                Asia
4  Honda         Red     ["United States","Canada"]
5  Aston Martin  White   Sweden                               Europe
6  Audi          Silver  United Kingdom                       Europe
7  Jeep          Silver  United Kingdom                       Europe
8  Land Rover    Blue    ["Brazil","United States","Canada"]
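I want to write "Americas" to the Region column when both of the following hold:
a) the Region column has no existing value, and
b) the Country column contains "United States" somewhere in the string.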
This can be done with np.where, as follows:
df['Region'] = np.where(df['Country'].str.contains('United States'), 'Americas', '**ERROR**')
However, this approach overwrites the existing values in the Region column:
   Manufacturer  Color   Country                              Region
0  Mercedes      Blue    United States                        Americas
1  BMW           White   ["United States", "Mexico"]          Americas
2  Mercedes      Black   Ireland                              **ERROR**
3  Audi          Green   Japan                                **ERROR**
4  Honda         Red     ["United States","Canada"]           Americas
5  Aston Martin  White   Sweden                               **ERROR**
6  Audi          Silver  United Kingdom                       **ERROR**
7  Jeep          Silver  United Kingdom                       **ERROR**
8  Land Rover    Blue    ["Brazil","United States","Canada"]  Americas
What is the best way to do this without overwriting any existing values in the Region column?
Thanks in advance!
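You can keep your own approach with a small change to the code: add a check that Region is empty to the condition, and pass the existing Region column as the fallback value instead of a placeholder, so rows that do not match keep what they already had. I hope this solves your problem: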
df['Region'] = np.where((df['Region'].isnull()|(df['Region']==''))&(df['Country'].str.contains('United States')), 'Americas', df['Region'])
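As an alternative to np.where, the same rule can be written as a boolean mask combined with a .loc assignment, which only touches the rows the mask selects. The following is a minimal sketch, not the original answer's method: the shortened sample data and the variable names region_missing and has_us are made up for illustration, and it assumes an empty Region may appear either as '' or as a missing value.

```python
import pandas as pd

# Small illustrative frame (a shortened, hypothetical variant of the question's data).
df = pd.DataFrame({
    'Country': ['United States', '["United States", "Mexico"]', 'Ireland',
                'Japan', '["United States","Canada"]'],
    'Region': ['Americas', '', 'Europe', '', None],
})

# Rows whose Region is missing (NaN/None) or an empty string.
region_missing = df['Region'].isna() | (df['Region'] == '')

# Rows whose Country mentions "United States".
# regex=False matches the text literally; na=False treats a missing Country as "no match".
has_us = df['Country'].str.contains('United States', regex=False, na=False)

# Assign only where both conditions hold; every other row is left untouched.
df.loc[region_missing & has_us, 'Region'] = 'Americas'

print(df)
```

With this sample, rows 1 and 4 are filled with "Americas", row 3 (Japan) stays empty, and the existing "Americas" and "Europe" values are not modified.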