
Plot word count on x axis and its occurrence on y axis from pandas df

The goal is to plot something like this:

[image: plot]

I have the following dummy df. Note that data holds the number of words, which goes on the x-axis:

import pandas as pd

data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]
my_df = pd.DataFrame(data=data, columns=['number_of_words'])

Now I need to calculate the y-axis, namely the number of occurrences of each word count, e.g. how often the number of words is 1, how often it is 9, and so on. I did it this way:

data = my_df['number_of_words'].value_counts()
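To make the counting step concrete, here is a minimal sketch on a made-up sample (not the question's data). value_counts() maps each distinct word count to how often it occurs, but it orders the result by frequency; sort_index() reorders it by word count so it can go straight onto the x-axis.

```python
import pandas as pd

# Made-up sample of word counts (not the question's data)
sample = pd.DataFrame({'number_of_words': [1, 2, 2, 5, 1, 1, 9]})

# value_counts(): index = distinct word count, values = how often it occurs
counts = sample['number_of_words'].value_counts()

# value_counts() sorts by frequency; sort_index() sorts by word count instead
counts = counts.sort_index()
print(counts.to_dict())  # {1: 3, 2: 2, 5: 1, 9: 1}
```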

Then I created a new df with that:

df_occurrences = pd.DataFrame(data=data)
df_occurrences.rename(columns={"number_of_words": "occurrences"}, inplace=True)
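The rename above depends on the name pandas gives the value_counts() result: older versions name the Series after the column (number_of_words), while pandas 2.0 and later name it count, in which case the rename has no effect. A minimal sketch that sets both column names explicitly, using a short made-up stand-in for my_df so the snippet runs on its own:

```python
import pandas as pd

# Short made-up stand-in for the question's my_df
my_df = pd.DataFrame({'number_of_words': [1, 2, 2, 5, 1, 1, 9]})

counts = my_df['number_of_words'].value_counts()

# Name the index explicitly and turn the counts into a column called
# 'occurrences'; the resulting column names do not depend on the pandas version
df_occurrences = (
    counts.rename_axis('number_of_words')
          .reset_index(name='occurrences')
)
print(df_occurrences.columns.tolist())  # ['number_of_words', 'occurrences']
```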

Now I wanted to merge them, but their lengths differ because my_df contains duplicates.

Thus, I removed the duplicates.

my_df.drop_duplicates(subset="number_of_words", keep=False, inplace=True)
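For reference, keep=False removes every row whose value occurs more than once, so only word counts that appear exactly once survive, whereas value_counts() has one entry per distinct value. keep='first' (the default) keeps one row per distinct value instead. A small comparison on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'number_of_words': [1, 2, 2, 5, 1, 1, 9]})

# keep=False: drop ALL rows of any duplicated value -> only 5 and 9 remain
only_unique = df.drop_duplicates(subset='number_of_words', keep=False)

# keep='first' (default): keep one row per distinct value -> 1, 2, 5, 9
one_per_value = df.drop_duplicates(subset='number_of_words', keep='first')

# number of distinct values, i.e. the length of the value_counts() result
n_distinct = df['number_of_words'].value_counts().size

print(len(only_unique), len(one_per_value), n_distinct)  # 2 4 4
```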

my_df and df_occurrences still have different lengths, so I cannot merge and plot them...

Any idea what went wrong?
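A sketch of the merge the question was aiming for, assuming the goal is one row per distinct word count: deduplicate with keep='first' rather than keep=False, merge with the occurrence counts on number_of_words, sort by word count and plot. The names unique_words and merged are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]
my_df = pd.DataFrame(data=data, columns=['number_of_words'])

# Occurrences per distinct word count, with explicit column names
df_occurrences = (my_df['number_of_words'].value_counts()
                  .rename_axis('number_of_words')
                  .reset_index(name='occurrences'))

# One row per distinct value: keep='first', not keep=False
unique_words = my_df.drop_duplicates(subset='number_of_words', keep='first')

# Both frames now have one row per distinct word count, so the merge lines up
merged = (unique_words.merge(df_occurrences, on='number_of_words')
          .sort_values('number_of_words'))

merged.plot(x='number_of_words', y='occurrences', legend=False)
plt.xlabel('number of words')
plt.ylabel('occurrences')
plt.show()
```

Strictly speaking the merge is redundant here, since df_occurrences already holds both columns; it is kept only to mirror the approach in the question.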

I used a set and the list count method. The list comprehension iterates over set(data), and count returns the number of occurrences of each item in the list. sorted then orders the [value, count] pairs by value. b collects the first element (index 0) of each pair and c the second (index 1); b is the x-axis and c is the y-axis of the plot.

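import matplotlib.pyplot as plt

# data is the original list of word counts (before it is reassigned to value_counts())
# pairs [word_count, occurrences], sorted by word_count
d = sorted([[x, data.count(x)] for x in set(data)])
b = []  # x-axis: distinct word counts
c = []  # y-axis: occurrences of each word count
for i, j in d:
    b.append(i)
    c.append(j)
plt.plot(b, c)
plt.show()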
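The same idea can be written more compactly with collections.Counter, which counts every value in a single pass instead of scanning the whole list once per distinct value with data.count(x); zip(*...) then splits the sorted (value, count) pairs into the x and y sequences. This is a swapped-in standard-library alternative, not the answerer's code:

```python
from collections import Counter

import matplotlib.pyplot as plt

data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]

counts = Counter(data)               # {word_count: occurrences}
b, c = zip(*sorted(counts.items()))  # b: sorted word counts, c: their occurrences

plt.plot(b, c)
plt.xlabel('number of words')
plt.ylabel('occurrences')
plt.show()
```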

As user BigBen wrote in a comment on the original question, my_df.value_counts().sort_index().plot() is all I needed. The other approaches suggested by Quang Hoang and keithpjolley in the same comments also work.
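A runnable sketch of that one-liner, assuming the my_df from the question. It selects the number_of_words column before calling value_counts(), so the index holds plain word counts (the x-axis) and the values hold occurrences (the y-axis); the axis labels and marker are optional additions.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]
my_df = pd.DataFrame(data=data, columns=['number_of_words'])

# index: distinct word counts (x-axis); values: occurrences (y-axis)
occurrences = my_df['number_of_words'].value_counts().sort_index()

ax = occurrences.plot(marker='o')
ax.set_xlabel('number of words')
ax.set_ylabel('occurrences')
plt.show()
```

Since the word counts are discrete, occurrences.plot(kind='bar') is an alternative; note that a bar chart places the word counts at evenly spaced categorical positions, so the gap between 32 and 200 is not drawn to scale.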
