
Python: Count occurrences of each number in a pandas DataFrame

I have a dataset for itemset mining. I want to find the occurrences of each unique number, i.e., the candidate 1-itemsets.

The shape of the data is 3000x1, and I can't figure out how to count the occurrences of each unique value.

I have already stored the distinct values of the data in an ndarray called distinct.

Using that ndarray, how can I find the frequency of each item in the dataset?

Update: I got the solution with @jojo's help:

import numpy as np
import pandas as pd

dataset = pd.read_csv('sample.csv', sep=',')  # name must match the uses below
all_values = dataset.values.ravel()
notNan = np.logical_not(np.isnan(all_values))
distinct, counts = np.unique(all_values[notNan], return_counts=True)

First, note that for a normal (comma-separated) CSV file you should use sep=',', which is also the default for read_csv. Passing sep='\t' makes pandas treat TAB as the delimiter instead.
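A minimal sketch of the delimiter issue, using a small in-memory string in place of the real file (the sample data and the raw variable are hypothetical). Because the data contains no TAB characters, sep='\t' leaves each whole line in a single column, which is one way to end up with an N x 1 frame like the 3000x1 one in the question.

```python
import io

import pandas as pd

# Hypothetical comma-separated transactions standing in for the real CSV file.
raw = "1,2,3\n4,5,6\n"

# TAB as delimiter: there are no TABs in the data, so each line stays one string.
tab_df = pd.read_csv(io.StringIO(raw), sep='\t', header=None)

# Comma as delimiter: one column per item.
comma_df = pd.read_csv(io.StringIO(raw), sep=',', header=None)

print(tab_df.shape)       # (2, 1)
print(comma_df.shape)     # (2, 3)
print(tab_df.iloc[0, 0])  # 1,2,3
```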

Also, consider adding header=None to your read_csv call; otherwise the first line of the file is used as the column names of your data frame instead of being read as data.
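A small sketch of what the default header handling does to headerless transaction data (the two-line sample is hypothetical). With the default header='infer', the first transaction disappears from the data and becomes the column labels.

```python
import io

import pandas as pd

raw = "1,2,3\n4,5,6\n"  # hypothetical data: two transactions, no header row

# Default header='infer': the first line is consumed as column names.
with_header = pd.read_csv(io.StringIO(raw))

# header=None: every line is data, and columns are numbered 0, 1, 2.
no_header = pd.read_csv(io.StringIO(raw), header=None)

print(list(with_header.columns))  # ['1', '2', '3']
print(with_header.shape)          # (1, 3)
print(no_header.shape)            # (2, 3)
```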

Lastly, since the columns appear to have different lengths, the shorter ones will be padded with NaN values up to the length of the longest one. To remove them, you can mask out all NaN values before getting the unique values, with something like values[np.logical_not(np.isnan(values))], but see below.
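A sketch of the NaN padding and the mask on a tiny ragged sample (the data is hypothetical). With header=None, read_csv sizes the frame from the first line, so the sample puts the longest transaction first; if a later line has more fields than the first, the default parser raises an error instead.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical transactions of different lengths, longest line first.
raw = "1,2,3\n4,5\n6\n"
dataset = pd.read_csv(io.StringIO(raw), header=None)

# Shorter lines are padded with NaN, which also makes those columns float64.
print(dataset.shape)                # (3, 3)
print(dataset.isna().sum().sum())   # 3

# Flatten to 1-D and keep only the real items.
all_values = dataset.values.ravel()
notNan = np.logical_not(np.isnan(all_values))
print(all_values[notNan].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```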


Putting things together:

import pandas as pd

dataset = pd.read_csv('dataset.csv', sep=',', header=None)

all_values = dataset.values.ravel()

You can then use NumPy's unique directly; with return_counts=True it also returns the number of occurrences of each unique value:

import numpy as np
notNan = np.logical_not(np.isnan(all_values))
distinct, counts = np.unique(all_values[notNan], return_counts=True)
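The steps above, run end to end on a small in-memory sample instead of dataset.csv (the data is hypothetical), with distinct and counts paired up into a dictionary mapping each item to its count. The int conversion assumes every item ID is an integer; it only undoes the float64 dtype introduced by the NaN padding.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical ragged transaction data standing in for dataset.csv.
raw = "1,2,3\n2,3\n3\n"
dataset = pd.read_csv(io.StringIO(raw), sep=',', header=None)

all_values = dataset.values.ravel()
notNan = np.logical_not(np.isnan(all_values))
distinct, counts = np.unique(all_values[notNan], return_counts=True)

# Map each item to its number of occurrences (assumes integer item IDs).
item_counts = {int(item): int(n) for item, n in zip(distinct, counts)}
print(item_counts)  # {1: 1, 2: 2, 3: 3}
```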

If you want relative frequencies rather than raw counts, divide counts by all_values[notNan].size.
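A short sketch of the division, on hypothetical already-masked values. It also shows the itemset-mining notion of support, which divides by the number of transactions (rows) rather than by the number of item occurrences; the two agree on counts only if no item repeats within a transaction, and the transaction count of 3 here is an assumption about the sample.

```python
import numpy as np

# Hypothetical flattened item values, already stripped of NaN.
values = np.array([1.0, 2.0, 3.0, 2.0, 3.0, 3.0])
distinct, counts = np.unique(values, return_counts=True)

# Relative frequency: each count over the total number of item occurrences.
frequency = counts / values.size
print(frequency.tolist())  # [0.16666666666666666, 0.3333333333333333, 0.5]

# Support: each count over the number of transactions (hypothetical: 3 rows,
# no item repeated within a row), e.g. len(dataset) for the real data.
n_transactions = 3
support = counts / n_transactions
print(support.tolist())    # [0.3333333333333333, 0.6666666666666666, 1.0]
```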


Here is a simple example (from the np.unique documentation) that shows how np.unique works:

>>> import numpy as np
>>> a = np.array([1, 2, 6, 4, 2, 3, 2])
>>> values, counts = np.unique(a, return_counts=True)
>>> values  # sorted array of the unique values in a
array([1, 2, 3, 4, 6])
>>> counts  # count of the occurrences of each value in values
array([1, 3, 1, 1, 1])
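As an alternative that the original answer does not use, pandas' Series.value_counts counts occurrences and drops NaN by default (dropna=True), so no explicit mask is needed. This sketch reuses the values from the example above plus one NaN, like the padding in a ragged CSV; sort_index() orders the result by value to match np.unique.

```python
import numpy as np
import pandas as pd

# The np.unique example values, plus a NaN standing in for CSV padding.
a = np.array([1, 2, 6, 4, 2, 3, 2, np.nan])

# value_counts() skips NaN by default; sort_index() orders by item, not by count.
item_counts = pd.Series(a).value_counts().sort_index()
print(item_counts.index.tolist())  # [1.0, 2.0, 3.0, 4.0, 6.0]
print(item_counts.tolist())        # [1, 3, 1, 1, 1]
```

On the question's data, the equivalent call would be pd.Series(all_values).value_counts().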
