Fill up empty values with same value of another column in pandas dataframe

I have a pandas dataframe like the following:

[image: enter image description here]

How do I fill the empty cells with the policy number that already exists for the same product type?

Any suggestion would be very much appreciated. Thank you.

Sorry for the confusion, I am adding my sample dataframe now:

import pandas as pd

sample = [
    {'POLICY NUMBER': ' ', 'PRODUCT TYPE': 'MED'},
    {'POLICY NUMBER': ' ', 'PRODUCT TYPE': 'MED'},
    {'POLICY NUMBER': '433M49763', 'PRODUCT TYPE': 'MED'},
    {'POLICY NUMBER': '433M86968', 'PRODUCT TYPE': 'MED'},
    {'POLICY NUMBER': ' ', 'PRODUCT TYPE': 'TED'},
    {'POLICY NUMBER': '566D158635 ', 'PRODUCT TYPE': 'TED'},
    {'POLICY NUMBER': '655D158635', 'PRODUCT TYPE': 'TED'},
    {'POLICY NUMBER': '789D158635', 'PRODUCT TYPE': 'TED'},
]

df = pd.DataFrame(sample)

Please note that the empty cells contain " " (a space); they are not NaN anywhere in the dataframe.
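To illustrate the point about blanks not being NaN, here is a small sketch. The miniature frame below is made up for illustration (it only mimics the shape of the sample), and the names `is_blank`, `nan_count` and `blank_count` are mine, not from the question.

```python
import pandas as pd

# Hypothetical miniature of the question's data: blanks are strings, not NaN.
df = pd.DataFrame({
    'POLICY NUMBER': [' ', '', '433M49763', '566D158635 '],
    'PRODUCT TYPE': ['MED', 'MED', 'MED', 'TED'],
})

# isna() does not treat empty or whitespace-only strings as missing.
nan_count = int(df['POLICY NUMBER'].isna().sum())

# Strip surrounding whitespace, then compare against the empty string.
is_blank = df['POLICY NUMBER'].str.strip().eq('')
blank_count = int(is_blank.sum())

print(nan_count, blank_count)
print(df[is_blank])
```

So any approach based on `fillna`, `ffill` or `bfill` would first need these blanks converted to real missing values.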

Adding to the question above: given the altered dataframe shown above, how do I get to the following dataframe?

[image: enter image description here]

I think you need groupby + transform:

If each group has only one distinct policy number and the missing values are empty strings:

df['POLICY NUMBER'] = (df.groupby('PRODUCT TYPE')['POLICY NUMBER']
                         .transform(lambda x: x[x != ''].iat[0]))

print (df)
  POLICY NUMBER PRODUCT TYPE
0     433M86968          MED
1     433M86968          MED
2     433M86968          MED
3     433M86968          MED
4    566D158635          TED
5    566D158635          TED
6    566D158635          TED
7    566D158635          TED
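One thing to watch with `x[x != ''].iat[0]`: if some product type has no policy number at all, the filtered group is empty and `.iat[0]` raises an IndexError. A possible guard, as a sketch only: the helper name `first_non_blank` and the tiny frame are my own, and returning NaN for an all-blank group is an assumption about what you would want.

```python
import numpy as np
import pandas as pd


def first_non_blank(values: pd.Series):
    """Return the first non-blank entry of a group, or NaN if there is none."""
    non_blank = values[values.str.strip() != '']
    return non_blank.iat[0] if len(non_blank) else np.nan


# Hypothetical data: the TED group has no policy number at all.
df = pd.DataFrame({
    'POLICY NUMBER': ['', '433M86968', '', ''],
    'PRODUCT TYPE': ['MED', 'MED', 'TED', 'TED'],
})

# transform broadcasts the scalar returned per group back to every row.
df['POLICY NUMBER'] = (df.groupby('PRODUCT TYPE')['POLICY NUMBER']
                         .transform(first_non_blank))

print(df)
```

Here the MED rows both receive 433M86968, while the TED rows end up as NaN instead of raising.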

Or, if the values are not always empty strings but sometimes contain trailing whitespace, strip them first:

df['POLICY NUMBER'] = (df['POLICY NUMBER'].str.strip().groupby(df['PRODUCT TYPE'])
                                  .transform(lambda x: x[x != ''].iat[0]))

print (df)
  POLICY NUMBER PRODUCT TYPE
0     433M86968          MED
1     433M86968          MED
2     433M86968          MED
3     433M86968          MED
4    566D158635          TED
5    566D158635          TED
6    566D158635          TED
7    566D158635          TED

A solution that sorts first and then transforms with the last value of each group:

df['POLICY NUMBER'] = (df.sort_values(['PRODUCT TYPE','POLICY NUMBER'])
                         .groupby('PRODUCT TYPE')['POLICY NUMBER']
                         .transform('last'))
print (df)
  POLICY NUMBER PRODUCT TYPE
0     433M86968          MED
1     433M86968          MED
2     433M86968          MED
3     433M86968          MED
4    566D158635          TED
5    566D158635          TED
6    566D158635          TED
7    566D158635          TED
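Why the sorting version works, as far as I can tell: an empty string sorts before any non-empty string, so after sorting, the last row of each group holds a real policy number, and the result of `transform` keeps the original index, so the assignment lines up row by row even though the frame was sorted. A small sketch on made-up data (one policy number per product type; `sorted_df` and `filled` are names I introduced):

```python
import pandas as pd

# Hypothetical data, deliberately not grouped or ordered.
df = pd.DataFrame({
    'POLICY NUMBER': ['', '433M86968', '', '566D158635'],
    'PRODUCT TYPE': ['TED', 'MED', 'MED', 'TED'],
})

# Within each product type the empty string comes first after sorting.
sorted_df = df.sort_values(['PRODUCT TYPE', 'POLICY NUMBER'])

# 'last' picks the final (greatest) value per group; the index is preserved.
filled = sorted_df.groupby('PRODUCT TYPE')['POLICY NUMBER'].transform('last')

# Assignment aligns on the index, so the original row order is untouched.
df['POLICY NUMBER'] = filled

print(df)
```

Note that if a group contains several different policy numbers, this approach assigns the lexicographically largest one to every row of that group, so it only fits the case of one number per product type.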

EDIT: You need to replace the empty strings with NaN, then use bfill to back-fill the NaNs, followed by ffill to forward-fill any that remain:

import numpy as np

df['POLICY NUMBER'] = (df['POLICY NUMBER'].str.strip()
                                          .replace('', np.nan)
                                          .groupby(df['PRODUCT TYPE'])
                                          .transform(lambda x: x.bfill().ffill()))

print (df)
  POLICY NUMBER PRODUCT TYPE
0     433M49763          MED
1     433M49763          MED
2     433M49763          MED
3     433M86968          MED
4    566D158635          TED
5    566D158635          TED
6    566D158635          TED
7    789D158635          TED  
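To show why both bfill and ffill are chained, here is a sketch on made-up data where one group has a blank at the start and another has a blank at the end. The frame and the names `cleaned`, `back_only` and `both` are hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: MED starts with a blank, TED starts and ends with one.
df = pd.DataFrame({
    'POLICY NUMBER': [' ', 'A1', 'A2', '', 'B1 ', ' '],
    'PRODUCT TYPE': ['MED', 'MED', 'MED', 'TED', 'TED', 'TED'],
})

# Normalise: strip whitespace, then turn empty strings into real NaN.
cleaned = df['POLICY NUMBER'].str.strip().replace('', np.nan)

# Back-fill alone leaves a trailing blank in a group unfilled.
back_only = cleaned.groupby(df['PRODUCT TYPE']).transform(lambda x: x.bfill())

# Back-fill followed by forward-fill covers both ends of each group.
both = cleaned.groupby(df['PRODUCT TYPE']).transform(lambda x: x.bfill().ffill())

print(pd.DataFrame({'back_only': back_only, 'both': both}))
```

Because the fill happens inside each group, a value from one product type never leaks into another.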


