I am trying to identify which word occurs most often in a pandas DataFrame (df_temp in my code). So far I have this:
l = df_temp['word'].value_counts()
l is then a pandas Series whose first row corresponds to the most frequent index label (in my case the most frequent word) in df_temp['word']. Although I can see the word in my console, I cannot retrieve it programmatically. The only way I have found so far is to convert it into a dictionary, so I have:
dl = dict(l)
and then I can easily retrieve my index... after sorting the dictionary. This does the job, but I am pretty sure there is a smarter solution, as this one is dirty and inelegant.
The index of the result of value_counts() holds your values:
l.index
will give you the values that were counted
Example:
In [163]:
df = pd.DataFrame({'a':['hello','world','python','hello','python','python']})
df
Out[163]:
a
0 hello
1 world
2 python
3 hello
4 python
5 python
In [165]:
df['a'].value_counts()
Out[165]:
python 3
hello 2
world 1
Name: a, dtype: int64
In [164]:
df['a'].value_counts().index
Out[164]:
Index(['python', 'hello', 'world'], dtype='object')
So basically you can get a specific word count by indexing the series:
In [167]:
l = df['a'].value_counts()
l['hello']
Out[167]:
2
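Since value_counts() sorts by count in descending order by default, the most frequent word and its count can also be read from the first row by position. A minimal sketch, reusing the example frame above (the names counts, top_word and top_count are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': ['hello', 'world', 'python', 'hello', 'python', 'python']})

# value_counts() returns a Series sorted by count, largest first (default sort=True)
counts = df['a'].value_counts()

top_word = counts.index[0]   # label of the first row, i.e. the most frequent word
top_count = counts.iloc[0]   # count stored in the first row

print(top_word, top_count)
```

This relies on the default sort order; if value_counts(sort=False) is used, the first row is no longer guaranteed to hold the largest count.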
Using pandas, you can find the most frequent value in the word column:
df['word'].value_counts().idxmax()
and this code below will give you the count for that value, which is the max count in that column:
df['word'].value_counts().max()
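One thing to be aware of: idxmax() returns a single label, so if several words share the highest count only one of them is reported. A small sketch of how all tied words could be collected, assuming a made-up frame in which 'hello' and 'python' both appear twice:

```python
import pandas as pd

# Hypothetical data with a tie: 'hello' and 'python' both occur twice
df = pd.DataFrame({'word': ['hello', 'world', 'python', 'hello', 'python']})

counts = df['word'].value_counts()

most_common = counts.idxmax()   # a single label, even when there is a tie
max_count = counts.max()        # the highest count in the column

# Keep every word whose count equals the maximum
all_top = sorted(counts[counts == max_count].index)

# Series.mode() returns all most frequent values directly
modes = df['word'].mode().tolist()

print(most_common, max_count, all_top, modes)
```

If only one word is ever needed, idxmax() is enough; the boolean filter or mode() is useful when ties matter.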