DASK: Replace infinite (inf) values in single column

I have a dask dataframe in which a few inf values appear. I wish to replace these on a per-column basis, because where inf exists I can substitute a value appropriate to the upper bound expected for that column.

I'm having some trouble understanding the documentation, or rather translating it into something I can use to replace infinite values.

What I have been trying is roughly the line below, replacing inf with 1000. However, the inf value seems to remain in place, unchanged.

Any advice on how to do this would be appreciated. Because this is a huge dataframe (10m rows, 40 cols), I'd prefer an approach that doesn't use lambda or loops, which the line below should basically achieve, but doesn't.

ddf['mycolumn'].replace(np.inf,1000)

Following @Enzo's comment: replace returns a new series rather than modifying the column in place, so make sure you assign the replaced values back to the original column:

import numpy as np
import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame([1, 2, np.inf], columns=['a'])
ddf = dd.from_pandas(df, npartitions=2)
ddf['a'] = ddf['a'].replace(np.inf, 1000)

# check results with: ddf.compute()
#         a
# 0     1.0
# 1     2.0
# 2  1000.0
