Fill missing values based on another column in a pandas DataFrame

I'm working with pandas and NumPy. In the following DataFrame (let's call it 'data'), some rows have data['Borough'] == 'Unspecified'. For each of those rows, I need to take the zip code in the Incident Zip field to its left, look that zip code up in the Incident Zip column, and find a row with the same zip and a known Borough. Once found, 'Unspecified' should be replaced by that Borough name. Here is the testing link: https://colab.research.google.com/drive/1PgPbS7KxOrNfok3jtKoC55vXAXzK2E_N#scrollTo=poYboz-jhRCN (click Runtime -> Run all).

Created Date               Complaint Type   Incident Zip    Borough
0   09/14/2017 04:40:33 PM  New Tree Request    11374       QUEENS
1   03/18/2017 10:09:57 AM  General Construc    11420       QUEENS
2   03/29/2017 12:38:28 PM  General Construc    11230       Unspecified
3   06/05/2017 12:47:55 PM  New Tree Request    10028       Unspecified
4   06/15/2017 11:56:36 AM  Dead/Dying Tree     10467       BRONX
5   06/19/2017 10:30:46 AM  New Tree Request    11230       MANHATTAN
6   06/29/2017 09:25:59 AM  New Tree Request    10028       MANHATTAN
7   07/01/2017 12:23:20 PM  Damaged Tree        10467       BRONX
8   07/01/2017 11:47:03 AM  Damaged Tree        10467       BRONX
9   07/10/2017 10:27:37 AM  General Construc    11385       QUEENS
10  07/13/2017 09:20:53 PM  General Construc    11385       QUEENS
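
To try the answers below without the Colab notebook, here is a minimal sketch that rebuilds the sample data from the table above. It assumes 'Incident Zip' holds integers and names the frame df, as the answers do (the question calls it data); the truncated "General Construc" labels are copied exactly as shown.

```python
import pandas as pd

# Rebuild the sample data shown in the question (zip codes stored as integers)
df = pd.DataFrame({
    'Created Date': [
        '09/14/2017 04:40:33 PM', '03/18/2017 10:09:57 AM', '03/29/2017 12:38:28 PM',
        '06/05/2017 12:47:55 PM', '06/15/2017 11:56:36 AM', '06/19/2017 10:30:46 AM',
        '06/29/2017 09:25:59 AM', '07/01/2017 12:23:20 PM', '07/01/2017 11:47:03 AM',
        '07/10/2017 10:27:37 AM', '07/13/2017 09:20:53 PM',
    ],
    'Complaint Type': [
        'New Tree Request', 'General Construc', 'General Construc', 'New Tree Request',
        'Dead/Dying Tree', 'New Tree Request', 'New Tree Request', 'Damaged Tree',
        'Damaged Tree', 'General Construc', 'General Construc',
    ],
    'Incident Zip': [11374, 11420, 11230, 10028, 10467, 11230, 10028, 10467, 10467, 11385, 11385],
    'Borough': [
        'QUEENS', 'QUEENS', 'Unspecified', 'Unspecified', 'BRONX', 'MANHATTAN',
        'MANHATTAN', 'BRONX', 'BRONX', 'QUEENS', 'QUEENS',
    ],
})

print(df)
```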

IIUC (if I understand correctly), you want to use other rows of the DataFrame to fill in the missing values: each 'Unspecified' borough should take the borough of another row with the same Incident Zip. You can do this with map.

First, generate a Series that maps each zip code to its borough, using only the rows whose borough is known.

mapping = (df.query('Borough != "Unspecified"')
             .drop_duplicates('Incident Zip')
             .set_index('Incident Zip')
             .Borough)
mapping

Incident Zip
11374       QUEENS
11420       QUEENS
10467        BRONX
11230    MANHATTAN
10028    MANHATTAN
11385       QUEENS
Name: Borough, dtype: object
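
The mapping approach assumes each zip code belongs to a single borough. As a sketch, assuming the same column names, you can check that assumption before building the mapping. The frame below is hypothetical, with zip 10001 deliberately given two boroughs so the check has something to find.

```python
import pandas as pd

# Hypothetical frame: zip 10001 is deliberately assigned two different boroughs
df = pd.DataFrame({
    'Incident Zip': [11374, 11230, 11230, 10001, 10001],
    'Borough': ['QUEENS', 'Unspecified', 'MANHATTAN', 'MANHATTAN', 'BROOKLYN'],
})

# Only rows whose borough is already known can serve as lookup sources
known = df.query('Borough != "Unspecified"')

# Zip codes that map to more than one known borough
counts = known.groupby('Incident Zip')['Borough'].nunique()
conflicts = counts[counts > 1].index.tolist()
print(conflicts)  # [10001]

# The answer's mapping: the first known borough per zip code
mapping = known.drop_duplicates('Incident Zip').set_index('Incident Zip')['Borough']
```

If conflicts is non-empty, drop_duplicates keeps whichever known borough appears first for that zip, so the fill silently picks one of them.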

Now, pass this Series to map, and use fillna to set any zip code that has no known borough back to "Unspecified".

df['Borough'] = df['Incident Zip'].map(mapping).fillna('Unspecified')

df
             Created Date    Complaint Type  Incident Zip    Borough
0  09/14/2017 04:40:33 PM  New Tree Request         11374     QUEENS
1  03/18/2017 10:09:57 AM  General Construc         11420     QUEENS
2  03/29/2017 12:38:28 PM  General Construc         11230  MANHATTAN
3  06/05/2017 12:47:55 PM  New Tree Request         10028  MANHATTAN
4  06/15/2017 11:56:36 AM   Dead/Dying Tree         10467      BRONX
5  06/19/2017 10:30:46 AM  New Tree Request         11230  MANHATTAN
6  06/29/2017 09:25:59 AM  New Tree Request         10028  MANHATTAN
7  07/01/2017 12:23:20 PM      Damaged Tree         10467      BRONX
8  07/01/2017 11:47:03 AM      Damaged Tree         10467      BRONX
9  07/10/2017 10:27:37 AM  General Construc         11385     QUEENS
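
The one-liner above rebuilds the entire Borough column from the mapping, so rows that already had a borough are overwritten with the first borough seen for their zip. If you only want to touch the 'Unspecified' rows, here is a minimal sketch using Series.mask; the small frame, including the unknown zip 99999, is hypothetical.

```python
import pandas as pd

# Hypothetical frame; zip 99999 has no known borough in any row
df = pd.DataFrame({
    'Incident Zip': [11374, 11230, 10028, 11230, 10028, 99999],
    'Borough': ['QUEENS', 'Unspecified', 'Unspecified', 'MANHATTAN', 'MANHATTAN', 'Unspecified'],
})

# Same zip -> borough lookup as in the answer
mapping = (df.query('Borough != "Unspecified"')
             .drop_duplicates('Incident Zip')
             .set_index('Incident Zip')['Borough'])

# Look up every row; zips with no known borough fall back to "Unspecified"
looked_up = df['Incident Zip'].map(mapping).fillna('Unspecified')

# Replace only the rows still marked "Unspecified"; other rows keep their value
df['Borough'] = df['Borough'].mask(df['Borough'].eq('Unspecified'), looked_up)

print(df['Borough'].tolist())
```

On the question's data both versions give the same result, because every known zip code there maps to a single borough.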

Alternatively, turn the placeholder into NaN and fill it from other rows in the same zip code group:

import numpy as np

df['Borough'] = df['Borough'].replace('Unspecified', np.nan)  # treat placeholder as missing
# fill each zip code's gaps from other rows with the same zip
df['Borough'] = df.groupby('Incident Zip')['Borough'].transform(lambda x: x.ffill().bfill())
df

              Created Date    Complaint Type  Incident Zip    Borough
0   09/14/2017 04:40:33 PM  New Tree Request         11374     QUEENS
1   03/18/2017 10:09:57 AM  General Construc         11420     QUEENS
2   03/29/2017 12:38:28 PM  General Construc         11230  MANHATTAN
3   06/05/2017 12:47:55 PM  New Tree Request         10028  MANHATTAN
4   06/15/2017 11:56:36 AM   Dead/Dying Tree         10467      BRONX
5   06/19/2017 10:30:46 AM  New Tree Request         11230  MANHATTAN
6   06/29/2017 09:25:59 AM  New Tree Request         10028  MANHATTAN
7   07/01/2017 12:23:20 PM      Damaged Tree         10467      BRONX
8   07/01/2017 11:47:03 AM      Damaged Tree         10467      BRONX
9   07/10/2017 10:27:37 AM  General Construc         11385     QUEENS
10  07/13/2017 09:20:53 PM  General Construc         11385     QUEENS
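
Here is a self-contained sketch of the group-wise fill, run on a small hypothetical frame (zip 99999 has no known borough). It uses GroupBy.transform, which returns a Series aligned to the original index, and restores the "Unspecified" placeholder for zips that have no known borough in any row.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; zip 99999 has no known borough in any row
df = pd.DataFrame({
    'Incident Zip': [11374, 11230, 10028, 11230, 10028, 99999],
    'Borough': ['QUEENS', 'Unspecified', 'Unspecified', 'MANHATTAN', 'MANHATTAN', 'Unspecified'],
})

# Treat the placeholder as missing so ffill/bfill can fill it
df['Borough'] = df['Borough'].replace('Unspecified', np.nan)

# Within each zip group, fill forward then backward; transform keeps df's index
df['Borough'] = df.groupby('Incident Zip')['Borough'].transform(lambda s: s.ffill().bfill())

# Zips with no known borough stay NaN; put the placeholder back for them
df['Borough'] = df['Borough'].fillna('Unspecified')

print(df['Borough'].tolist())
```

With a function that returns a same-length Series, groupby(...).apply(...) may, depending on the pandas version and the group_keys setting, prepend the group keys to the result's index, which breaks a direct column assignment; transform avoids that and needs no prior sort.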

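Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.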

© 2020-2024 STACKOOM.COM