The goal is to plot the number of words on the x-axis against how often each word count occurs on the y-axis.
I have the following dummy DataFrame. Note that data holds the number of words, which goes on the x-axis:
import pandas as pd

data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]
my_df = pd.DataFrame(data=data, columns=['number_of_words'])
Now I need to calculate the y-axis, namely the occurrences of each number of words, e.g. how often the number of words is 1, how often it is 9, and so on. I did it this way:
data = my_df['number_of_words'].value_counts()
Then I created a new DataFrame from it:
df_occurrences = pd.DataFrame(data=data)
# set the column name directly: value_counts() names the Series "count" in pandas >= 2.0 and "number_of_words" before that
df_occurrences.columns = ["occurrences"]
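value_counts() returns a Series whose index holds each distinct word count and whose values are how often it occurs, ordered by frequency rather than by word count. A minimal sketch on the question's data (the name counts is illustrative) that also shows why the column name differs between pandas versions:

```python
import pandas as pd

data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]
my_df = pd.DataFrame(data=data, columns=["number_of_words"])

# Index: each distinct word count; values: how often it occurs.
# Rows are ordered by frequency (most common first), not by word count.
counts = my_df["number_of_words"].value_counts()

# The Series name depends on the pandas version:
# "count" in pandas >= 2.0, "number_of_words" in earlier versions.
print(counts.name)

# Every row is counted exactly once, so the occurrences add up to the row count.
print(counts.sum(), len(my_df))
```

Because the word counts live in the index of this result, not in a column, any later merge with my_df has to join a column of my_df against that index.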
Now I wanted to merge them, but their lengths differ because my_df includes duplicates. Thus, I removed the duplicates:
my_df.drop_duplicates(subset="number_of_words", keep=False, inplace=True)
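The keep parameter accounts for the length mismatch: keep=False discards every row whose value appears more than once, so only word counts that occur exactly once survive, whereas keep="first" keeps one row per distinct value. A minimal sketch comparing the two on the question's data (the variable names are illustrative):

```python
import pandas as pd

data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]
my_df = pd.DataFrame(data=data, columns=["number_of_words"])

# keep=False: drop ALL copies of any duplicated value, so only values that occur once remain
only_unique = my_df.drop_duplicates(subset="number_of_words", keep=False)

# keep="first": keep the first copy of each value, so one row per distinct word count remains
one_per_value = my_df.drop_duplicates(subset="number_of_words", keep="first")

# Number of distinct word counts, i.e. the length of the value_counts() result
n_distinct = my_df["number_of_words"].nunique()

print(len(only_unique), len(one_per_value), n_distinct)  # prints: 4 13 13
```

With keep=False only 12, 14, 32 and 200 are left, while the value_counts() result has one row for each of the 13 distinct word counts.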
my_df and df_occurrences now have different lengths, and I cannot merge and plot them anymore...
Any idea what went wrong?
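One way to line the two frames up, sketched under the assumption that the goal is one row per distinct word count together with its occurrence count: keep one copy of each value with keep="first", then merge the number_of_words column against the index of the value_counts() result. The names unique_words and merged are illustrative:

```python
import pandas as pd

data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]
my_df = pd.DataFrame(data=data, columns=["number_of_words"])

# Occurrences per word count; to_frame(name=...) sets the column name regardless of pandas version
df_occurrences = my_df["number_of_words"].value_counts().to_frame(name="occurrences")

# One row per distinct word count (keep="first", not keep=False)
unique_words = my_df.drop_duplicates(subset="number_of_words", keep="first")

# The word counts are the index of df_occurrences, so join column-to-index
merged = unique_words.merge(df_occurrences, left_on="number_of_words", right_index=True)

# Order by word count so a line plot runs left to right
merged = merged.sort_values("number_of_words")
print(merged.head())
```

merged.plot(x="number_of_words", y="occurrences") would then draw the curve, provided matplotlib is installed.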
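I used set() and the list count() method. The list comprehension iterates over set(data), and data.count(x) returns the number of occurrences of x in the list (data here is the original list, not the value_counts() result). sorted() orders the [value, count] pairs by value. b collects the first item (index 0) of each pair and c the second item (index 1), so b is the x-axis (number of words) and c is the y-axis (occurrences) in the plot.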
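import matplotlib.pyplot as plt

# data is the original list of word counts
# [number_of_words, occurrences] pairs, sorted by number_of_words
d = sorted([[x, data.count(x)] for x in set(data)])
b = []  # x-axis: number of words
c = []  # y-axis: occurrences
for i, j in d:
    b.append(i)
    c.append(j)
plt.plot(b, c)
plt.show()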
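data.count(x) rescans the whole list for every distinct value. A variant of the same idea using collections.Counter (a swapped-in standard-library tool, not what the answer above used) counts everything in a single pass, and zip(*...) splits the sorted pairs into the x and y sequences:

```python
from collections import Counter

data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]

# One pass over data: maps each distinct word count to its number of occurrences
counter = Counter(data)

# Sort the (word_count, occurrences) pairs by word count, then split them into
# the x values (b) and y values (c), as the loop above does
b, c = zip(*sorted(counter.items()))

print(b[:3], c[:3])  # prints: (1, 2, 3) (17, 10, 4)
```

b and c come out as tuples rather than lists, which plt.plot(b, c) accepts unchanged.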
As user BigBen wrote in a comment on the original question, my_df.value_counts().sort_index().plot() is all I needed to do. The other approaches suggested by Quang Hoang and keithpjolley in the same comment thread also work.
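Selecting the column before calling value_counts() yields a Series indexed by the plain integer word counts, which sort_index() puts in increasing order for the x-axis. Depending on the pandas version, DataFrame.value_counts() may index its result with a MultiIndex even for a single column, so the tick labels of the one-liner above can show up as tuples. A minimal sketch of the column-level variant on the question's data:

```python
import pandas as pd

data = [13,2,2,13,14,5,6,2,2,2,1,1,1,1,1,1,1,1,9,200,12,3,1,1,1,1,1,2,5,4,5,5,6,7,3,2,3,4,6,5,4,7,4,7,4,7,1,1,32,7,9,4,6,2,2,3,2,1,1]
my_df = pd.DataFrame(data=data, columns=["number_of_words"])

# Column-level value_counts: index = word count (int), values = occurrences;
# sort_index() orders the rows by word count instead of by frequency
counts = my_df["number_of_words"].value_counts().sort_index()

print(counts.head())
```

counts.plot() (which requires matplotlib) then draws occurrences against the number of words.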