
Python DataFrame: count how many distinct elements it contains

I need to count how many different elements are in my DataFrame (df).

My df holds the day of the month (as a number: 1, 2, 3 ... 31) on which a certain variable was measured. Three columns describe the day number. There can be multiple measurements in one day, so my columns contain repeated values. I need to know on how many days in a month the variable was measured, regardless of how many times per day it was measured. So I was thinking of counting the days while ignoring repeated values.

As an example, the data in my df would look like this:

col1 col2 col3   
 2    2   2
 2    2   3
 3    3   3
 3    4   8

I need an output that tells me that the distinct numbers in that DataFrame are 2, 3, 4 and 8.

Thanks!

Just do:

import pandas as pd

df = pd.DataFrame({"col1": [2, 2, 3, 3], "col2": [2, 2, 3, 4], "col3": [2, 3, 3, 8]})

print(df.stack().unique())

Outputs:

[2 3 4 8]
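
Since the question asks how many distinct days there are, not only which ones, here is a small follow-up sketch. It assumes the same example df as above and calls nunique() on the stacked Series; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 2, 3, 3], "col2": [2, 2, 3, 4], "col3": [2, 3, 3, 8]})

# stack() turns the three columns into one long Series,
# so distinct values are taken over the whole DataFrame
stacked = df.stack()

distinct_days = sorted(stacked.unique().tolist())
n_days = stacked.nunique()

print(distinct_days)  # [2, 3, 4, 8]
print(n_days)         # 4
```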

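You can use drop_duplicates on your DataFrame. Note that it removes duplicate rows, so the count below is the number of distinct rows, which matches the number of distinct days here because every row holds the same day in all three columns: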

import pandas as pd
df = pd.DataFrame({'a':[2,2,3], 'b':[2,2,3], 'c':[2,2,3]})

   a  b  c
0  2  2  2
1  2  2  2
2  3  3  3

df = df.drop_duplicates()
print(df['a'].count())
out: 2
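
To illustrate the difference between counting distinct rows and counting distinct values, here is a minimal sketch with made-up data (not the asker's). Every row is different, yet only two distinct day numbers occur, so the two counts disagree.

```python
import pandas as pd

# Hypothetical data: three different rows, but only the values 2 and 3
df = pd.DataFrame({"col1": [2, 2, 3], "col2": [2, 3, 3]})

# drop_duplicates() compares whole rows
n_distinct_rows = len(df.drop_duplicates())

# stack() flattens the columns first, so values are compared individually
n_distinct_values = df.stack().nunique()

print(n_distinct_rows)    # 3
print(n_distinct_values)  # 2
```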

Or you can use numpy to get the unique values in the dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame({'X' : [2, 2, 3, 3], 'Y' : [2,2,3,4], 'Z' : [2,3,3,8]})

df_unique = np.unique(np.array(df))

print(df_unique)
# Output: [2 3 4 8]

# For the count of days:
print(len(df_unique))
# Output: 4
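
If the number of measurements per day is also of interest, np.unique accepts return_counts=True. This is a sketch on the same example data, not something the question strictly asks for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [2, 2, 3, 3], 'Y': [2, 2, 3, 4], 'Z': [2, 3, 3, 8]})

# values holds the sorted distinct entries, counts how often each one occurs
values, counts = np.unique(df.to_numpy(), return_counts=True)

for day, n in zip(values, counts):
    print(day, n)
# 2 5
# 3 5
# 4 1
# 8 1
```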

How about:

Assuming this is your initial df:

   col1  col2  col3
0     2     2     2
1     2     2     2
2     3     3     3

Then:

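# Collect the per-column value counts side by side
count_df = pd.DataFrame()

for i in df.columns:
    df2 = df[i].value_counts()  # occurrences of each value in this column
    count_df = pd.concat([count_df, df2], axis=1)

# Add up the counts across columns (missing values are skipped by sum)
final_df = count_df.sum(axis=1)
final_df = pd.DataFrame(data=final_df, columns=['Occurrences'])
print(final_df)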

   Occurrences
2            6
3            3
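
A possibly shorter route to the same table, sketched here on the same assumed df, is to stack the columns first and call value_counts() once. The length of the result is then the number of distinct days.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 2, 3], "col2": [2, 2, 3], "col3": [2, 2, 3]})

# One long Series of all entries, then occurrences per distinct value
occurrences = df.stack().value_counts().sort_index()
print(occurrences)

# Number of distinct values in the whole DataFrame
n_distinct = len(occurrences)
print(n_distinct)  # 2
```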

You can use pandas.unique() like so:

pd.unique(df.to_numpy().flatten())

In some basic benchmarking I did, this method appeared to be the fastest.
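
For anyone who wants to check the timing on their own machine, here is a rough sketch using timeit. The data size, the random data, and the set of candidates are my assumptions, and results will vary with the pandas and NumPy versions installed.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 rows of random day numbers (1..31) in three columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 32, size=(100_000, 3)),
                  columns=["col1", "col2", "col3"])

candidates = {
    "df.stack().unique()": lambda: df.stack().unique(),
    "np.unique(df.to_numpy())": lambda: np.unique(df.to_numpy()),
    "pd.unique(df.to_numpy().flatten())": lambda: pd.unique(df.to_numpy().flatten()),
}

# Sanity check: all candidates should find the same set of distinct values
results = {name: set(func().tolist()) for name, func in candidates.items()}

repeats = 20
for name, func in candidates.items():
    seconds = timeit.timeit(func, number=repeats)
    print(f"{name}: {seconds / repeats * 1e3:.2f} ms per call")
```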
