I am trying to ignore nan values in my dataset, but I am unsure what to put, and where, within my function that finds multimodal data. Right now, if the majority of values in a row are nan, the function reports the mode as nan, which is not true. How can I ignore them?
import pandas as pd
#example data
df = {'A':'nan','B':'nan','C':'Blue', 'D':'nan','E':'Blue', 'Index':[0]}
df = pd.DataFrame(df).set_index('Index')
def find_mode(x):
    if len(x) > 1:
        # Creates dictionary of values in x and their count
        d = {}
        for value in x:
            if value not in d:
                d[value] = 1
            else:
                d[value] += 1
        if len(d) == 1:
            return [value]
        else:
            # Finds most common value
            i = 0
            for value in d:
                if i < d[value]:
                    i = d[value]
            # All values with greatest number of occurrences can be a mode if:
            # other values with less number of occurrences exist
            modes = []
            counter = 0
            for value in d:
                if d[value] == i:
                    mode = (value, i)
                    modes.append(mode)
                    counter += mode[1]  # Create the counter that sums the number of most common occurrences
            # Example [1, 2, 2, 3, 3]
            # 2 appears twice, 3 appears twice, [2, 3] are a mode
            # because sum of counter for them: 2+2 != 5
            if counter != len(x):
                return [mode[0] for mode in modes]
            else:
                return 'NA'
    else:
        return x

mode = []
for x in df.itertuples(index = True):
    m = find_mode(x)
    mode.append(m)
It looks like complicated code for something that pandas can handle natively:
# ensure having real NAs
df = df.replace('nan', pd.NA)
# get mode per row
out = df.mode(axis=1)[0]
Output:
Index
0    Blue
dtype: object
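If a row can have several modes (the multimodal case from the question), taking only column `0` keeps just the first one. `df.mode(axis=1)` returns one column per tied value and skips missing values by default (`dropna=True`). Below is a minimal sketch, assuming hypothetical two-row data in which the first row matches the question and the second row is a tie; `modes_per_row` is an illustrative name, not something from the original post.

```python
import pandas as pd

# Hypothetical example data: row 0 matches the question, row 1 is a tie
# between 'Red' and 'Blue' with one 'nan' value.
df = pd.DataFrame(
    {
        "A": ["nan", "Red"],
        "B": ["nan", "Red"],
        "C": ["Blue", "Blue"],
        "D": ["nan", "Blue"],
        "E": ["Blue", "nan"],
    }
)

# Turn the 'nan' strings into real missing values so pandas can skip them.
df = df.replace("nan", pd.NA)

# One column per tied mode; rows with fewer modes are padded with
# missing values.
modes = df.mode(axis=1)

# Collect the non-missing modes of each row into a plain list.
modes_per_row = [row.dropna().tolist() for _, row in modes.iterrows()]
print(modes_per_row)
```

Each entry of `modes_per_row` is a list, so rows with a single mode and rows with ties can be handled the same way downstream.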