Plot word count on x axis and its occurrence on y axis from pandas df

Question

The goal is to plot something like this:

I have the following dummy DataFrame. Note that the values in data are the numbers of words, i.e. the x-axis.

import pandas as pd

data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]
my_df = pd.DataFrame(data=data, columns=['number_of_words'])

Now I need to calculate the y-axis, namely how often each number of words occurs. E.g., how often is the number of words equal to 1, how often is it equal to 9, and so on. I did it this way:

data = my_df['number_of_words'].value_counts()

Then I created a new df with that:

df_occurrences = pd.DataFrame(data=data)
df_occurrences.rename(columns={"number_of_words": "occurrences"}, inplace=True)

Now I wanted to merge them, but their lengths differ because my_df includes duplicates.

So I removed the duplicates:

my_df.drop_duplicates(subset ="number_of_words", keep=False, inplace=True)

my_df and df_occurrences still have different lengths, so I cannot merge and plot them...

Any idea what went wrong?

Answer 1

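I used set and the list's count method. The comprehension iterates over set(data), and count returns the number of occurrences of each item in the list (here data is the original list, not the value_counts() result). sorted orders the pairs by value. In each pair of the nested list, the item at index 0 goes into b and the item at index 1 goes into c; b is the x-axis and c is the y-axis of the plot.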

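import matplotlib.pyplot as plt

# data must be the original list of word counts from the question
d = sorted([[x, data.count(x)] for x in set(data)])
b = []  # x-axis: distinct numbers of words
c = []  # y-axis: occurrences of each number
for i, j in d:
    b.append(i)
    c.append(j)
plt.plot(b, c)
plt.show()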
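
The same counting idea can also be sketched with collections.Counter from the standard library, which tallies the list in a single pass instead of calling count once per distinct value. This is only an illustrative variant, not part of the original answer; the names counts, xs and ys and the axis labels are my own choices.

```python
from collections import Counter

import matplotlib.pyplot as plt

# Word counts from the question
data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,
        5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]

counts = Counter(data)            # maps each number of words to its occurrences
xs = sorted(counts)               # x-axis: distinct numbers of words, ascending
ys = [counts[x] for x in xs]      # y-axis: occurrences, in the same order

plt.plot(xs, ys)
plt.xlabel("number of words")     # label text is an assumption
plt.ylabel("occurrences")
plt.show()
```

For a list this small the difference is not noticeable; the single pass mainly matters when the list is long and has many distinct values.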

Answer 2

As user BigBen wrote in a comment on the original question, my_df.value_counts().sort_index().plot() is all I needed to do. The other approaches mentioned by Quang Hoang and keithpjolley in the same comment section also work.
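
A minimal sketch of that one-liner, written against the number_of_words column so the index holds plain integers. It also shows why the merge attempt in the question could not work: drop_duplicates with keep=False discards every row whose value occurs more than once, so only the values that appear exactly once survive. The variable names counts and unique_only and the axis labels are assumptions of mine, not from the thread.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Word counts from the question
data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,
        5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]
my_df = pd.DataFrame(data=data, columns=["number_of_words"])

# value_counts() already yields both axes: index = number of words (x),
# values = occurrences (y). sort_index() orders the x-axis ascending.
counts = my_df["number_of_words"].value_counts().sort_index()

# keep=False drops ALL rows of any duplicated value, so this frame is
# shorter than counts and cannot be lined up with it.
unique_only = my_df.drop_duplicates(subset="number_of_words", keep=False)
print(len(unique_only), "rows left vs", counts.size, "distinct values")

ax = counts.plot()
ax.set_xlabel("number of words")  # label text is an assumption
ax.set_ylabel("occurrences")
plt.show()
```

Because counts carries x and y together, no second DataFrame or merge is needed before plotting.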

2 answers

solution1
1 2021-03-10 16:05:47

solution2
0 ACCEPTED 2021-03-10 15:42:24
