I have a dataset for itemset mining. I want to find the number of occurrences of each unique number, i.e. the candidate 1-itemsets.
The shape of the data is 3000x1. I'm unable to figure out how to count the unique occurrences.
The distinct values of the data are already stored in an ndarray, distinct.
Using the ndarray distinct, how can I find the frequency of each item in the dataset?
Update: Got the solution with @jojo's help.
import numpy as np
import pandas as pd
dataset = pd.read_csv('sample.csv', sep=',')
all_values = dataset.values.ravel()
notNan = np.logical_not(np.isnan(all_values))
distinct, counts = np.unique(all_values[notNan], return_counts=True)
First, note that if you have a normal (comma-separated) CSV, you should use sep=','. Passing '\t' instead would assume TAB as the delimiter.
Also, consider adding header=None in your read_csv call, as otherwise the first line will be taken as the column names of your data frame.
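To illustrate the header=None point, here is a minimal sketch. It assumes a tiny made-up CSV held in memory (csv_text is hypothetical, not the asker's file) and only compares the resulting shapes:

```python
import io

import pandas as pd

# Hypothetical two-line CSV with no header row
csv_text = "1,2,3\n4,5,6\n"

# Default behaviour: the first line is consumed as column names
with_header = pd.read_csv(io.StringIO(csv_text), sep=',')

# header=None: every line is treated as data
no_header = pd.read_csv(io.StringIO(csv_text), sep=',', header=None)

print(with_header.shape)  # one data row is left
print(no_header.shape)    # both rows are kept as data
```

Without header=None, the values on the first line would be missing from any later count.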
Lastly, since the columns appear to have different lengths, you will have nan values in all columns that are shorter than the longest one. To remove them, you can mask all nan values when getting the unique values, with something like values[np.logical_not(np.isnan(values))], but see below.
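As a small sketch of the masking step on its own, assuming a made-up array (values here is hypothetical sample data, not the asker's dataset):

```python
import numpy as np

# Hypothetical flattened data where shorter columns were padded with NaN
values = np.array([1.0, 2.0, np.nan, 2.0, np.nan, 3.0])

# Boolean mask that is True wherever the entry is an actual number
not_nan = np.logical_not(np.isnan(values))

# Boolean indexing keeps only the entries where the mask is True
clean = values[not_nan]
print(clean)
```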
Putting things together:
import pandas as pd
dataset = pd.read_csv('dataset.csv', sep=',', header=None)
all_values = dataset.values.ravel()
You can directly use unique from numpy, which can also return the count of each unique value:
import numpy as np
notNan = np.logical_not(np.isnan(all_values))
distinct, counts = np.unique(all_values[notNan], return_counts=True)
If you care about the relative frequency, simply divide counts by all_values[notNan].size.
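The division can be sketched as follows, again on made-up data (all_values below is a hypothetical stand-in for the flattened dataset, and not_nan and frequencies are illustrative names):

```python
import numpy as np

# Hypothetical stand-in for dataset.values.ravel()
all_values = np.array([1.0, 2.0, 2.0, np.nan, 3.0, 2.0, np.nan, 1.0])

# Drop the NaN padding before counting
not_nan = np.logical_not(np.isnan(all_values))
distinct, counts = np.unique(all_values[not_nan], return_counts=True)

# Relative frequency: each count divided by the number of non-NaN values
frequencies = counts / all_values[not_nan].size

for value, count, freq in zip(distinct, counts, frequencies):
    print(value, count, freq)
```

The frequencies sum to 1 because the divisor counts only the non-NaN entries.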
Here is a simple example (from the docs linked above) to highlight how np.unique works:
>>> import numpy as np
>>> a = np.array([1, 2, 6, 4, 2, 3, 2])
>>> values, counts = np.unique(a, return_counts=True)
>>> values # list of all unique values in a
array([1, 2, 3, 4, 6])
>>> counts # count of the occurrences of each value in values
array([1, 3, 1, 1, 1])