
Need help getting the frequency of each number in a pandas dataframe

I am trying to find a simple way to convert a pandas dataframe into another dataframe that holds the frequency of each value in every row. An example of what I'm trying to do is below.

Current dataframe example (feature labels are just index values here):

   0   1   2   3   4   ...   n
0  2   3   1   4   2         ~
1  4   3   4   3   2         ~
2  2   3   2   3   2         ~
3  1   3   0   3   2         ~
...
m  ~   ~   ~   ~   ~         ~

Dataframe I would like to convert this to:

   0   1   2   3   4   ...   n
0  0   1   2   1   1         ~
1  0   0   1   2   2         ~
2  0   0   3   2   0         ~
3  1   1   1   2   0         ~
...
m  ~   ~   ~   ~   ~         ~

As you can see, each column label corresponds to a possible number within the dataframe, and each cell holds how often that number occurs in the row in question. Is there a simple way to do this with Python? I have a large dataframe that I am trying to transform into a dataframe of frequencies for feature selection.

If any more information is needed I will update my post.

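Use value_counts with apply (the top-level pd.value_counts is deprecated in recent pandas releases, so pd.Series.value_counts is used here):

df.apply(pd.Series.value_counts, axis=1).fillna(0)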

     0    1    2    3    4
0  0.0  1.0  2.0  1.0  1.0
1  0.0  0.0  1.0  2.0  2.0
2  0.0  0.0  3.0  2.0  0.0
3  1.0  1.0  1.0  2.0  0.0
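
To try this end to end, the sample frame from the question can be rebuilt by hand. The sketch below assumes it is named df, sorts the resulting columns explicitly, and casts the counts to integers, since fillna leaves them as floats.

```python
import pandas as pd

# Sample data copied from the question (4 rows, 5 columns).
df = pd.DataFrame([[2, 3, 1, 4, 2],
                   [4, 3, 4, 3, 2],
                   [2, 3, 2, 3, 2],
                   [1, 3, 0, 3, 2]])

counts = (
    df.apply(pd.Series.value_counts, axis=1)  # count values row by row
      .fillna(0)                              # values absent from a row get 0
      .astype(int)                            # counts are whole numbers
      .sort_index(axis=1)                     # order the value columns 0, 1, 2, ...
)
print(counts)
```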

Alternatively, use DataFrame.melt with pd.crosstab:

df2 = df.T.melt()
pd.crosstab(df2['variable'], df2['value'])
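
As a runnable sketch of the same idea on the question's sample data: the names long and table, and the explicit var_name="row", are choices made here for readability and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame([[2, 3, 1, 4, 2],
                   [4, 3, 4, 3, 2],
                   [2, 3, 2, 3, 2],
                   [1, 3, 0, 3, 2]])

# After transposing, each original row is a column, so melt yields
# one (row label, value) pair per cell.
long = df.T.melt(var_name="row", value_name="value")

# Cross-tabulate row labels against values to get the counts.
table = pd.crosstab(long["row"], long["value"])
print(table)
```

Note that crosstab only creates columns for values that occur somewhere in the frame, so a value that never appears gets no column.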

NumPy

The advantage of this approach is speed, but it is obviously more complicated.

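import numpy as np
import pandas as pd

n, k = df.shape
i = np.arange(n).repeat(k)   # row position of every cell (works for any index labels)
j = np.ravel(df)             # flattened cell values, used as column positions
m = j.max() + 1              # number of count columns: 0 .. max value

a = np.zeros((n, m), int)

np.add.at(a, (i, j), 1)      # add 1 at each (row, value) pair, repeats included

pd.DataFrame(a, df.index, range(m))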

   0  1  2  3  4
0  0  1  2  1  1
1  0  0  1  2  2
2  0  0  3  2  0
3  1  1  1  2  0

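Here i holds the row position of each value in df, and j holds the values themselves, flattened in the same order. These two arrays are used as indices to add one to the array a at every position (i, j), so each cell of a ends up holding the number of times a value occurs in a row. This assumes the values are non-negative integers.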
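
One reason to use np.add.at here rather than a plain a[i, j] += 1 is that fancy-index assignment is buffered, so a repeated (i, j) pair is only counted once. A minimal sketch with made-up indices (not taken from the question) shows the difference:

```python
import numpy as np

i = np.array([0, 0, 0])   # three cells, all in row 0
j = np.array([2, 2, 3])   # value 2 occurs twice, value 3 once

buffered = np.zeros((1, 4), int)
buffered[i, j] += 1                # the repeated (0, 2) pair is applied only once

unbuffered = np.zeros((1, 4), int)
np.add.at(unbuffered, (i, j), 1)   # every occurrence is accumulated

print(buffered)     # [[0 0 1 1]]
print(unbuffered)   # [[0 0 2 1]]
```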


 