[英]Filling out empty cells based on another column
I want to match/map the missing value in a dataframe based on another column. 我想匹配/映射基于另一列的数据框中的缺失值。 For example,
例如,
City State Country
Chicago IL United States
Boston MA United States
San Diego
Los Angeles CA United States
San Francisco
Sacramento
Vancouver BC Canada
So, if I want to fill in the empty cells of the provinces and countries of those three cities same as Los Angeles. 因此,如果我要填写与洛杉矶相同的三个城市的省和国家的空白单元格。 What should I do?
我该怎么办?
Below is my code but I'm completely stuck in it. 下面是我的代码,但我完全陷入其中。
CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
df.loc[df['City'] == CA_cities, 'State' = 'CA' and 'Country' = 'United States']
Any help will be greatly appreciated. 任何帮助将不胜感激。
You can use groupby
with mask created by isin
, then replace NaN
s by back and forward filling: 您可以使用
groupby
通过创建面膜isin
,然后替换NaN
的前进和后退填充S:
CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
df = df.groupby(df['City'].isin(CA_cities)).apply(lambda x: x.ffill().bfill())
print (df)
City State Country
0 Chicago IL United States
1 Boston MA United States
2 San Diego CA United States
3 Los Angeles CA United States
4 San Francisco CA United States
5 Sacramento CA United States
6 Vancouver BC Canada
More general solution is create groups of cities, eg in dictionaries, swap keys
wih values and map
column: 更通用的解决方案是创建城市组,例如在词典中,使用值和
map
列交换keys
:
print (df)
City State Country
0 Chicago IL United States
1 Chicago1 NaN NaN
2 Boston MA United States
3 San Diego NaN NaN
4 Los Angeles CA United States
5 San Francisco NaN NaN
6 Sacramento NaN NaN
7 Vancouver BC Canada
cities = {'CA': ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento'],
'IL':['Chicago','Chicago1']}
d = {k: oldk for oldk, oldv in cities.items() for k in oldv}
df = df.groupby(df['City'].map(d).fillna(df['City'])).apply(lambda x: x.ffill().bfill())
#slowier alternative
#df = df.groupby(df['City'].replace(d)).apply(lambda x: x.ffill().bfill())
print (df)
City State Country
0 Chicago IL United States
1 Chicago1 IL United States
2 Boston MA United States
3 San Diego CA United States
4 Los Angeles CA United States
5 San Francisco CA United States
6 Sacramento CA United States
7 Vancouver BC Canada
Detail : 详细说明 :
print (df['City'].map(d).fillna(df['City']))
0 IL
1 IL
2 Boston
3 CA
4 CA
5 CA
6 CA
7 Vancouver
Name: City, dtype: object
print (d)
{'San Diego': 'CA', 'Los Angeles': 'CA', 'San Francisco': 'CA',
'Sacramento': 'CA', 'Chicago': 'IL', 'Chicago1': 'IL'}
Or just split it , and using fillna
. 或者只是将其拆分,然后使用
fillna
。
CA_cities = ['SanDiego', 'LosAngeles', 'SanFrancisco', 'Sacramento']
s=df.loc[df.City.isin(CA_cities),:]
t=df.loc[~df.City.isin(CA_cities),:]
pd.concat([s.fillna({'State':'CA','Country':'UnitedStates'}),t])
Out[1023]:
City State Country
2 SanDiego CA UnitedStates
3 LosAngeles CA UnitedStates
4 SanFrancisco CA UnitedStates
5 Sacramento CA UnitedStates
0 Chicago IL UnitedStates
1 Boston MA UnitedStates
6 Vancouver BC Canada
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.