Filling out empty cells based on another column

I want to fill in the missing values in a DataFrame based on another column. For example:

         City         State              Country
      Chicago            IL        United States
       Boston            MA        United States
    San Diego            
  Los Angeles            CA        United States
San Francisco
   Sacramento     
    Vancouver            BC               Canada

I want to fill in the empty State and Country cells of those three cities with the same values as Los Angeles. What should I do?

Below is my code, but I'm completely stuck.

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
df.loc[df['City'] == CA_cities, 'State' = 'CA' and 'Country' = 'United States']

Any help will be greatly appreciated.

You can use groupby with a mask created by isin, then replace the NaNs by forward and backward filling:

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']

# group_keys=False keeps the original row index on pandas 2.x
df = df.groupby(df['City'].isin(CA_cities), group_keys=False).apply(lambda x: x.ffill().bfill())
print (df)
            City State        Country
0        Chicago    IL  United States
1         Boston    MA  United States
2      San Diego    CA  United States
3    Los Angeles    CA  United States
4  San Francisco    CA  United States
5     Sacramento    CA  United States
6      Vancouver    BC         Canada
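
For readers who want to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same group-wise fill. It uses transform on the two target columns instead of apply, which is a swapped-in variant rather than the code from the answer; the names in_ca, cols and filled are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; NaN marks the empty cells.
df = pd.DataFrame({
    'City': ['Chicago', 'Boston', 'San Diego', 'Los Angeles',
             'San Francisco', 'Sacramento', 'Vancouver'],
    'State': ['IL', 'MA', np.nan, 'CA', np.nan, np.nan, 'BC'],
    'Country': ['United States', 'United States', np.nan, 'United States',
                np.nan, np.nan, 'Canada'],
})

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']

# Boolean mask: True for the California cities, False for everything else.
in_ca = df['City'].isin(CA_cities)

# Forward- and backward-fill each target column inside the two groups.
# transform returns a result aligned with the original index.
cols = ['State', 'Country']
filled = df.copy()
filled[cols] = df.groupby(in_ca)[cols].transform(lambda s: s.ffill().bfill())

print(filled)
```

Note that this only works because at least one row in the CA group (Los Angeles) already carries the values to propagate.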

A more general solution is to define groups of cities, e.g. in a dictionary, swap the keys with the values, and map the City column:

print (df)
            City State        Country
0        Chicago    IL  United States
1       Chicago1   NaN            NaN
2         Boston    MA  United States
3      San Diego   NaN            NaN
4    Los Angeles    CA  United States
5  San Francisco   NaN            NaN
6     Sacramento   NaN            NaN
7      Vancouver    BC         Canada

cities = {'CA': ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento'], 
          'IL':['Chicago','Chicago1']}
d = {k: oldk for oldk, oldv in cities.items() for k in oldv}

df = df.groupby(df['City'].map(d).fillna(df['City']), group_keys=False).apply(lambda x: x.ffill().bfill())
# slower alternative
# df = df.groupby(df['City'].replace(d), group_keys=False).apply(lambda x: x.ffill().bfill())
print (df)
            City State        Country
0        Chicago    IL  United States
1       Chicago1    IL  United States
2         Boston    MA  United States
3      San Diego    CA  United States
4    Los Angeles    CA  United States
5  San Francisco    CA  United States
6     Sacramento    CA  United States
7      Vancouver    BC         Canada

Details:

print (df['City'].map(d).fillna(df['City']))
0           IL
1           IL
2       Boston
3           CA
4           CA
5           CA
6           CA
7    Vancouver
Name: City, dtype: object

print (d)
{'San Diego': 'CA', 'Los Angeles': 'CA', 'San Francisco': 'CA', 
 'Sacramento': 'CA', 'Chicago': 'IL', 'Chicago1': 'IL'}
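
Since the swapped dictionary already maps each city to its state, the missing states could also be filled directly from it, without groupby. The sketch below shows one way this might look. The state_to_country lookup is a hypothetical addition that is not part of the original answer, and city_to_state is simply the answer's d under a more descriptive name.

```python
import numpy as np
import pandas as pd

# Second sample frame from the answer; NaN marks the empty cells.
df = pd.DataFrame({
    'City': ['Chicago', 'Chicago1', 'Boston', 'San Diego', 'Los Angeles',
             'San Francisco', 'Sacramento', 'Vancouver'],
    'State': ['IL', np.nan, 'MA', np.nan, 'CA', np.nan, np.nan, 'BC'],
    'Country': ['United States', np.nan, 'United States', np.nan,
                'United States', np.nan, np.nan, 'Canada'],
})

cities = {'CA': ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento'],
          'IL': ['Chicago', 'Chicago1']}

# Swap keys and values: one entry per city, pointing at its state.
city_to_state = {city: state for state, names in cities.items() for city in names}

# Hypothetical lookup (an assumption for this sketch, not in the answer).
state_to_country = {'CA': 'United States', 'IL': 'United States'}

# fillna only touches the missing cells; existing values are kept.
df['State'] = df['State'].fillna(df['City'].map(city_to_state))
df['Country'] = df['Country'].fillna(df['State'].map(state_to_country))

print(df)
```

Unlike the fill-based approach, this does not need a row in each group that already holds the values, but every state that can appear with a missing country has to be listed in the lookup.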

Alternatively, split the DataFrame into the CA rows and the rest, call fillna on the CA part, and concatenate the two parts again:

CA_cities = ['SanDiego', 'LosAngeles', 'SanFrancisco', 'Sacramento']
s=df.loc[df.City.isin(CA_cities),:]
t=df.loc[~df.City.isin(CA_cities),:]
pd.concat([s.fillna({'State':'CA','Country':'UnitedStates'}),t])
Out[1023]: 
           City State       Country
2      SanDiego    CA  UnitedStates
3    LosAngeles    CA  UnitedStates
4  SanFrancisco    CA  UnitedStates
5    Sacramento    CA  UnitedStates
0       Chicago    IL  UnitedStates
1        Boston    MA  UnitedStates
6     Vancouver    BC        Canada
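
For completeness, the attempt in the question can be repaired with a boolean mask and a single .loc assignment. This is a minimal sketch on the question's sample data, not code from any of the answers; mask is a name chosen for illustration. Be aware that it overwrites whatever is already in those cells, whereas fillna only touches the missing ones.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; NaN marks the empty cells.
df = pd.DataFrame({
    'City': ['Chicago', 'Boston', 'San Diego', 'Los Angeles',
             'San Francisco', 'Sacramento', 'Vancouver'],
    'State': ['IL', 'MA', np.nan, 'CA', np.nan, np.nan, 'BC'],
    'Country': ['United States', 'United States', np.nan, 'United States',
                np.nan, np.nan, 'Canada'],
})

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']

# isin tests membership row by row; == against a list does not do this.
mask = df['City'].isin(CA_cities)

# Assign both columns at once for the selected rows.
df.loc[mask, ['State', 'Country']] = ['CA', 'United States']

print(df)
```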
