Matplotlib: How can I reorder the graphs' x-axis?

Question

I have the following code:

import matplotlib.pyplot as plt
import nltk


def composition(tokens, title):
    '''creates a curve of composition for a corpus of texts'''
    token_lengths = [len(token) for token in tokens]
    fig = plt.figure()
    plt.gcf().subplots_adjust(bottom=0.15)
    len_distr = nltk.FreqDist(token_lengths)
    len_distr.plot(25, title=f'{title}')
    plt.show()
    fig.savefig(f'{title}.png')

It takes the output of a function tokenize as tokens, and you have to provide a title for the graph. For example, I tokenized a text and passed its title (longjumeau), which produced the graph in the picture.

The x-axis is the token length, i.e. word length (or shortness), sorted by occurrence, so that I can compare the graphs for different texts with each other. It might be a bar diagram; I don't care too much about the kind of graph at this moment.

Edit, because I wasn't too clear about what my question is: How can I order the x-axis values in ascending order (2, 3, 4, 5, 6), as opposed to the current order, which seems to be sorted by the highest value on the y-axis?

If further code is needed, this is my git repo (not perfect code, sorry): https://github.com/WunschK/Stylometry

Additional info (not edited, but maybe necessary), my tokenize function:

def tokenize(text, language):
    '''Tokenises a given text (text) defined above and returns a list of tokens (tokens)'''
    tokens = nltk.word_tokenize(text=text.lower(), language=f"{language}")
    # strip punctuation of the list of word tokens:
    tokens = ([token for token in tokens if any(c.isalpha() for c in token)])
    return tokens

Answer 1

You have a graph showing that shorter words are used more frequently than longer words. Assuming that you are counting every occurrence of every word, and perhaps filtering out 1- and 2-letter words (e.g. on the basis of their being stop words), the graph's shape is what I would expect.

For example, I took the text of your question and histogrammed by token length, with some filtering of punctuation and whatnot (which most tokenizers do).

text = """t takes the output of a function tokenize as tokens and you have to provide a title for the graph. For example, I tokenize a text with the title and provide the title longjumeau in the picture.
The x-axis is the token-length word length or shortness sorted after their occurence. So that I can compare different graphs for texts with each other. It might be a bar-diagram, I ont care too much about the kind of the graph at this moment.

Edit bcs I wasn't too clear about what question I have How can I order the x-axis values in ascending order as opposed to now seemingly being sorted by the highest value on the y-axis.

if further code is needed, this is my git-repo, not perfect code sorry 

additional info not edited but maybe necessary my tokenize function"""

import matplotlib.pyplot as plt

# Length of every whitespace-separated token in the text
lenList = [len(t) for t in text.split()]

plt.figure(figsize=(7, 5))
plt.hist(lenList, bins=10)
plt.grid(alpha=0.3)
plt.title("Word Length Instance Histogram - KWunsch SO question text")
plt.show()

The histogram shape looks kinda familiar, no?
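
As for the ordering itself: nltk's FreqDist.plot draws the samples in order of descending count, which is why the x-axis looks sorted by the y-values. One way around that is to sort the lengths yourself and plot them directly. Below is a minimal sketch, not a drop-in replacement. The names length_counts_ascending and plot_composition_sorted are hypothetical helpers made up for illustration, and it uses collections.Counter (which FreqDist subclasses) so the counting part runs without nltk.

```python
from collections import Counter


def length_counts_ascending(tokens):
    """Return (lengths, counts) with the lengths in ascending order."""
    counts = Counter(len(token) for token in tokens)
    lengths = sorted(counts)  # sort by token length, not by frequency
    return lengths, [counts[n] for n in lengths]


def plot_composition_sorted(tokens, title):
    """Hypothetical variant of composition() with an ascending x-axis."""
    # Imported here so the counting helper has no plotting dependency.
    import matplotlib.pyplot as plt

    lengths, counts = length_counts_ascending(tokens)
    fig, ax = plt.subplots()
    ax.plot(lengths, counts, marker="o")
    ax.set_xticks(lengths)  # one tick per token length that occurs
    ax.set_xlabel("Token length")
    ax.set_ylabel("Count")
    ax.set_title(title)
    fig.subplots_adjust(bottom=0.15)
    fig.savefig(f"{title}.png")  # save before the window is shown
    plt.show()
    return fig
```

Usage would be something like plot_composition_sorted(tokenize(text, "french"), "longjumeau"), where the language argument is only an assumed example. If you would rather keep your existing FreqDist, sorted(len_distr.items()) gives the same (length, count) pairs in ascending order of length.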

1 answers

solution1
0 2022-09-14 17:05:57