Fill missing values based on another column in a pandas DataFrame
I'm working with pandas and NumPy. In the following DataFrame (let's call it data), for rows where data['Borough'] == 'Unspecified', I need to take the zip code in the Incident Zip field to its left, look up another row with the same value in the Incident Zip column, and take that row's Borough. Once it is found, 'Unspecified' should be replaced by that Borough name.
Here is the testing link: https://colab.research.google.com/drive/1PgPbS7KxOrNfok3jtKoC55vXAXzK2E_N#scrollTo=poYboz-jhRCN (click Runtime -> Run all).
    Created Date            Complaint Type    Incident Zip  Borough
0   09/14/2017 04:40:33 PM  New Tree Request  11374         QUEENS
1   03/18/2017 10:09:57 AM  General Construc  11420         QUEENS
2   03/29/2017 12:38:28 PM  General Construc  11230         Unspecified
3   06/05/2017 12:47:55 PM  New Tree Request  10028         Unspecified
4   06/15/2017 11:56:36 AM  Dead/Dying Tree   10467         BRONX
5   06/19/2017 10:30:46 AM  New Tree Request  11230         MANHATTAN
6   06/29/2017 09:25:59 AM  New Tree Request  10028         MANHATTAN
7   07/01/2017 12:23:20 PM  Damaged Tree      10467         BRONX
8   07/01/2017 11:47:03 AM  Damaged Tree      10467         BRONX
9   07/10/2017 10:27:37 AM  General Construc  11385         QUEENS
10  07/13/2017 09:20:53 PM  General Construc  11385         QUEENS
IIUC, you want to use other rows of the DataFrame to fill in the missing values. You can do this with map.
First, build a Series that maps each zip code to its Borough, using only the rows where the Borough is known.
mapping = (df.query('Borough != "Unspecified"')
.drop_duplicates('Incident Zip')
.set_index('Incident Zip')
.Borough)
mapping
Incident Zip
11374 QUEENS
11420 QUEENS
10467 BRONX
11230 MANHATTAN
10028 MANHATTAN
11385 QUEENS
Name: Borough, dtype: object
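A self-contained sketch of this lookup step, assuming a reduced frame with only the Incident Zip and Borough columns and integer zip codes (the dates and complaint types from the question are left out). Because drop_duplicates silently keeps the first Borough it sees for each zip, the sketch also checks that no zip code is tied to more than one known Borough before building the Series. The names known, boroughs_per_zip and conflicts are illustrative only.

```python
import pandas as pd

# Reduced copy of the question's data (assumption: only the two columns the
# lookup needs, with zip codes stored as integers).
df = pd.DataFrame({
    'Incident Zip': [11374, 11420, 11230, 10028, 10467, 11230,
                     10028, 10467, 10467, 11385, 11385],
    'Borough': ['QUEENS', 'QUEENS', 'Unspecified', 'Unspecified', 'BRONX',
                'MANHATTAN', 'MANHATTAN', 'BRONX', 'BRONX', 'QUEENS', 'QUEENS'],
})

# Rows whose Borough is already known
known = df[df['Borough'] != 'Unspecified']

# drop_duplicates keeps the first Borough per zip, so first make sure no zip
# code appears with two different known Boroughs.
boroughs_per_zip = known.groupby('Incident Zip')['Borough'].nunique()
conflicts = boroughs_per_zip[boroughs_per_zip > 1]
print(conflicts.empty)

# Zip -> Borough lookup Series
mapping = (known.drop_duplicates('Incident Zip')
                .set_index('Incident Zip')['Borough'])
print(mapping)
```

If conflicts is not empty, the data itself disagrees about which Borough a zip belongs to, and any first-match lookup will pick one of them arbitrarily.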
Now pass this to map, and use fillna to set any zip code that has no match back to "Unspecified".
df['Borough'] = df['Incident Zip'].map(mapping).fillna('Unspecified')
df
    Created Date            Complaint Type    Incident Zip  Borough
0   09/14/2017 04:40:33 PM  New Tree Request  11374         QUEENS
1   03/18/2017 10:09:57 AM  General Construc  11420         QUEENS
2   03/29/2017 12:38:28 PM  General Construc  11230         MANHATTAN
3   06/05/2017 12:47:55 PM  New Tree Request  10028         MANHATTAN
4   06/15/2017 11:56:36 AM  Dead/Dying Tree   10467         BRONX
5   06/19/2017 10:30:46 AM  New Tree Request  11230         MANHATTAN
6   06/29/2017 09:25:59 AM  New Tree Request  10028         MANHATTAN
7   07/01/2017 12:23:20 PM  Damaged Tree      10467         BRONX
8   07/01/2017 11:47:03 AM  Damaged Tree      10467         BRONX
9   07/10/2017 10:27:37 AM  General Construc  11385         QUEENS
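The map call above rewrites every Borough in the column, including rows that already had one. If only the 'Unspecified' rows should change, a boolean mask can restrict the assignment. This is a sketch on the same assumed reduced sample (two columns, integer zips); is_unspecified is an illustrative name, not part of the original answer.

```python
import pandas as pd

# Reduced sample of the question's data (assumption: two columns, int zips)
df = pd.DataFrame({
    'Incident Zip': [11374, 11420, 11230, 10028, 10467, 11230,
                     10028, 10467, 10467, 11385, 11385],
    'Borough': ['QUEENS', 'QUEENS', 'Unspecified', 'Unspecified', 'BRONX',
                'MANHATTAN', 'MANHATTAN', 'BRONX', 'BRONX', 'QUEENS', 'QUEENS'],
})

# Zip -> Borough lookup built only from rows whose Borough is known
mapping = (df[df['Borough'] != 'Unspecified']
           .drop_duplicates('Incident Zip')
           .set_index('Incident Zip')['Borough'])

# Rewrite only the 'Unspecified' rows; rows that already have a Borough keep
# their own value. Zips with no match stay 'Unspecified' via fillna.
is_unspecified = df['Borough'].eq('Unspecified')
df.loc[is_unspecified, 'Borough'] = (
    df.loc[is_unspecified, 'Incident Zip'].map(mapping).fillna('Unspecified')
)
print(df)
```

The right-hand side is a Series indexed by the masked rows, so df.loc aligns it back onto exactly those rows.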
Alternatively:
import numpy as np
df['Borough'] = df['Borough'].replace('Unspecified', np.nan)  # treat 'Unspecified' as missing
df['Borough'] = df.groupby('Incident Zip')['Borough'].transform(lambda x: x.ffill().bfill())  # fill within each zip
df
    Created Date            Complaint Type    Incident Zip  Borough
0   09/14/2017 04:40:33 PM  New Tree Request  11374         QUEENS
1   03/18/2017 10:09:57 AM  General Construc  11420         QUEENS
2   03/29/2017 12:38:28 PM  General Construc  11230         MANHATTAN
3   06/05/2017 12:47:55 PM  New Tree Request  10028         MANHATTAN
4   06/15/2017 11:56:36 AM  Dead/Dying Tree   10467         BRONX
5   06/19/2017 10:30:46 AM  New Tree Request  11230         MANHATTAN
6   06/29/2017 09:25:59 AM  New Tree Request  10028         MANHATTAN
7   07/01/2017 12:23:20 PM  Damaged Tree      10467         BRONX
8   07/01/2017 11:47:03 AM  Damaged Tree      10467         BRONX
9   07/10/2017 10:27:37 AM  General Construc  11385         QUEENS
10  07/13/2017 09:20:53 PM  General Construc  11385         QUEENS
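A runnable sketch of this alternative on the same assumed reduced sample (two columns, integer zips). It uses groupby(...).transform, which returns a result aligned to df's original index so it can be assigned straight back; groupby(...).apply may instead prepend the group key to the index in newer pandas versions, which breaks that assignment. No sort is needed, since the fill happens within each zip group. The final fillna is an addition not in the answer: it puts 'Unspecified' back for any zip with no known Borough in any row.

```python
import numpy as np
import pandas as pd

# Reduced sample of the question's data (assumption: two columns, int zips)
df = pd.DataFrame({
    'Incident Zip': [11374, 11420, 11230, 10028, 10467, 11230,
                     10028, 10467, 10467, 11385, 11385],
    'Borough': ['QUEENS', 'QUEENS', 'Unspecified', 'Unspecified', 'BRONX',
                'MANHATTAN', 'MANHATTAN', 'BRONX', 'BRONX', 'QUEENS', 'QUEENS'],
})

# Treat 'Unspecified' as missing so ffill/bfill can fill it
df['Borough'] = df['Borough'].replace('Unspecified', np.nan)

# Within each zip group, fill forward then backward so a missing value can
# take its Borough from a row either before or after it.
df['Borough'] = (df.groupby('Incident Zip')['Borough']
                   .transform(lambda s: s.ffill().bfill()))

# Zips with no known Borough in any row remain NaN; restore the label.
df['Borough'] = df['Borough'].fillna('Unspecified')
print(df)
```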