
How to ignore nan when taking mode of a dataset?

I am trying to ignore NaN values in my dataset, but I am unsure what to change, and where, in my function that finds multimodal data. Right now, if the majority of the values in a row are NaN, the function reports NaN as the mode, which is not what I want. How can I ignore them?

import pandas as pd
#example data
df = {'A':'nan','B':'nan','C':'Blue', 'D':'nan','E':'Blue', 'Index':[0]}
df = pd.DataFrame(df).set_index('Index')

def find_mode(x):

    if len(x) > 1: #
        #Creates dictionary of values in x and their count
        d = {}
        for value in x:
            if value not in d:
                d[value] = 1
            else:
                d[value] += 1

        if len(d) == 1:
            return [value]
        else:
            # Finds most common value
            i = 0
            for value in d:
                if i < d[value]:
                    i = d[value]

            # All values with greatest number of occurrences can be a mode if:
            # other values with less number of occurrences exist
            modes = []
            counter = 0
            for value in d:
                if d[value] == i:
                    mode = (value, i)
                    modes.append(mode)
                    counter += mode[1] # Create the counter that sums the number of most common occurrences

            # Example [1, 2, 2, 3, 3]
            # 2 appears twice, 3 appears twice, [2, 3] are a mode
            # because sum of counter for them: 2+2 != 5
            if counter != len(x):
                return [mode[0] for mode in modes]
            else:
                return 'NA'
    else:
        return x

mode = []
for x in df.itertuples(index = True):
    m = find_mode(x)
    mode.append(m)

It looks like complicated code for something that pandas can handle natively:

# ensure having real NAs
df = df.replace('nan', pd.NA)

# get mode per row
out = df.mode(axis=1)[0]

output:

Index
0    Blue
dtype: object
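
Note that taking column [0] keeps only the first mode of each row. If ties matter (the question mentions multimodal data), one possible approach is to call Series.mode on each row and keep every value it returns. The following is only a sketch under assumptions: the second row of the example frame and the name row_modes are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical example data: row 0 has a single mode, row 1 has a tie.
df = pd.DataFrame(
    {
        "A": ["nan", "Red"],
        "B": ["nan", "Red"],
        "C": ["Blue", "Green"],
        "D": ["nan", "Green"],
        "E": ["Blue", "nan"],
    }
)

# Turn the placeholder strings into real missing values.
df = df.replace("nan", pd.NA)

# Series.mode ignores missing values by default (dropna=True)
# and returns every value that ties for the highest count.
row_modes = pd.Series(
    [row.mode().tolist() for _, row in df.iterrows()],
    index=df.index,
)

print(row_modes)
```

Each entry of row_modes is a list, so a row with a single mode yields a one-element list and a row that contains only missing values yields an empty list.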


 