
How to replace a string value with the means of a column's groups in the entire dataframe

I have a large dataset with 400 columns and 30,000 rows. The data is all numerical, but some columns contain a placeholder string ("#?") instead of being blank. This turns every column that contains "#?" into object dtype (150 columns end up as object dtype).

I need to convert all the columns to float or int dtype, and then fill the remaining NaN values with the mean of each column's group (e.g. the mean of the X rows and the mean of the Y rows in each column).

col1  col2  col3
X     21    32
X     NaN   3
Y     NaN   5

My end goal is to apply this to the entire data:

df.groupby("col1").transform(lambda x: x.fillna(x.mean()))
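For reference, here is a minimal sketch of what that line does when the columns are already numeric. The toy frame is made up for illustration (it is not the real data) and has an extra Y row so that both groups have a mean to fill with.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: col1 is the group key, the rest are numeric.
df = pd.DataFrame({
    "col1": ["X", "X", "Y", "Y"],
    "col2": [21.0, np.nan, 4.0, np.nan],
    "col3": [32.0, 3.0, 5.0, np.nan],
})

# For every column, fill each NaN with the mean of its own col1 group.
# The result contains only the value columns; col1 is used as the key.
filled = df.groupby("col1").transform(lambda x: x.fillna(x.mean()))
print(filled)
```

The missing col2 value in group X becomes 21.0 (the X mean), and the missing values in group Y become the Y means.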

But I can't apply this to the columns that have "#?" in them; they get dropped. I tried replacing the "#?" with a numerical value and then converting all the columns to float dtype, which works, but the replaced values should also be filled in by the code above.

I thought about replacing "#?" with a weird value like -123.456 so that it doesn't get mixed up with actual data points, and then replacing every -123.456 with the column group means, but the -123.456 would need to be excluded from the mean. I just don't know how that would even work. If I convert it back to NaN again, the dtype changes back to object.

I think the best way to go about it would be directly replacing the #? with the column group means.

Any ideas?

edit: I'm so dumb lol

df = df.replace('#?', '').astype(float, errors='ignore')

this works.
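One thing that may be worth checking after a conversion like this is whether the columns really ended up numeric, since `errors='ignore'` does not raise when a conversion fails. A rough sketch of such a check, using `pd.to_numeric(..., errors="coerce")` as an alternative conversion (this is a swapped-in technique, not the one-liner above), with a made-up frame and assuming `col1` is the only non-numeric column:

```python
import pandas as pd

# Hypothetical toy data: "#?" makes col2 a non-numeric column.
df = pd.DataFrame({
    "col1": ["X", "X", "Y"],
    "col2": ["21", "#?", "5"],
    "col3": [32, 3, 5],
})

value_cols = df.columns.drop("col1")  # assumption: col1 is the group key

# Values that cannot be parsed as numbers (such as "#?") become NaN.
converted = df[value_cols].apply(pd.to_numeric, errors="coerce")

# Confirm every value column is numeric now.
all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in converted.dtypes)

# Count how many values per column were turned into NaN by the conversion.
newly_missing = converted.isna().sum() - df[value_cols].isna().sum()

print(converted.dtypes)
print(all_numeric)
print(newly_missing)
```

The `newly_missing` counts give a quick idea of how many placeholder values each column had before they were treated as missing.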

Use:

print(df)
  col1 col2  col3
0    X   21    32
1    X   #?     3
2    Y  NaN     5

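import numpy as np

df = (df.set_index('col1')                         # keep the group key out of the float conversion
        .replace(r'#\?', np.nan, regex=True)       # treat the "#?" placeholder as missing
        .astype(float)                             # all value columns become float
        .groupby("col1")                           # group on the col1 index level
        .transform(lambda x: x.fillna(x.mean())))  # fill NaN with the group mean
print(df)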
      col2  col3
col1            
X     21.0  32.0
X     21.0   3.0
Y      NaN   5.0
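Note that col2 stays NaN for Y in the output above, because group Y has no col2 value to average. If that matters, one possible follow-up is to fall back to the overall column mean for whatever is still missing. This is only a sketch: the fallback rule is an assumption (the question does not say what should happen for such groups), and it uses `pd.to_numeric` rather than the regex replace above.

```python
import numpy as np
import pandas as pd

# Same toy data as in the example above.
df = pd.DataFrame({
    "col1": ["X", "X", "Y"],
    "col2": ["21", "#?", np.nan],
    "col3": [32, 3, 5],
})

# Convert the value columns; "#?" and anything else non-numeric becomes NaN.
values = df.drop(columns="col1").apply(pd.to_numeric, errors="coerce")

# First pass: fill with the mean of each col1 group.
group_filled = values.groupby(df["col1"]).transform(lambda s: s.fillna(s.mean()))

# Second pass (assumed fallback): groups with no data at all in a column
# get the overall mean of that column instead.
fully_filled = group_filled.fillna(values.mean())
print(fully_filled)
```

Here the Y row of col2 ends up as 21.0, the only value available in that column.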
