
Calculate punctuation percentage in a string in Python

I have been working on calculating the percentage of punctuation in a sentence. For some reason, my function runs when I count double spaces, but it then counts all the characters, including the white space. For example, I have the text DEACTIVATE: OK, whose full length is 14. When I subtract the white space the length is 13, so the percentage should be 1/13 = 7.69%. However, my function gives me 7.14%, which is 1/14.
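The arithmetic can be checked directly. This is only a small sketch using the sample string from the paragraph above; the variable names are illustrative and not part of the original function.

```python
text = "DEACTIVATE: OK"

total_len = len(text)                    # 14 characters, including the space
non_space = total_len - text.count(" ")  # 13 characters once the space is dropped
punct = text.count(":")                  # the only punctuation mark in this sample

print(round(100 * punct / total_len, 2))  # 7.14, share of all characters
print(round(100 * punct / non_space, 2))  # 7.69, share of non-space characters
```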

On the other hand, if I count just one white space, my function throws an error:

"ZeroDivisionError: division by zero".

Here is my code for reference, along with some simple text samples:

text= "Centre to position, remaining shift is still larger than maximum (retry nbr=1, centring_stroke.r=2.7662e-05, max centring stroke.r=2.5e-05)"
text2= "DEACTIVATE: KU-1421"

import string

def count_punct(text):
    count = sum([1 for char in text if char in string.punctuation])
    return round(count/(len(text) - text.count("  ")), 3)*100

df_sub['punct%'] = df_sub['Err_Text2'].apply(count_punct)
df_sub.head(20)

Make these small changes and your count_punct function should be up and running. Your code was breaking because you were counting runs of several consecutive spaces instead of single spaces. Such a run almost never occurs in your text, so text.count returned 0 and the denominator always stayed equal to len(text).

import string

def count_punct(text):
    if text.strip() == "":  # Guard against empty or whitespace-only input
        return 0
    # Count the characters that are ASCII punctuation
    count = sum(1 for char in text if char in string.punctuation)
    spaces = text.count(" ")  # Count single spaces, not runs of several spaces
    total_chars = len(text) - spaces  # Length excluding spaces

    # Scale to a percentage before rounding to avoid float artifacts
    return round(100 * count / total_chars, 1)

text= "DEACTIVATE: OK"

print(count_punct(text))

Outputs:

7.7
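To see why the original denominator never changed, it may help to compare what str.count returns for a single space and for a longer run of spaces. str.count counts non-overlapping occurrences of the exact substring, so a run that never appears in the text contributes nothing. The snippet below is just an illustration on the same sample string.

```python
text = "DEACTIVATE: OK"

single = text.count(" ")    # one space, between ":" and "OK"
double = text.count("  ")   # this string contains no run of two spaces

print(single, double)                          # 1 0
print(len(text) - single, len(text) - double)  # 13 14
```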

As for the ZeroDivisionError: it is a logic error that occurs when total_chars is 0. That happens when the string is empty or consists only of spaces, because then the length of the string and the number of spaces are equal, so their difference is 0.

To fix this, simply add an if statement at the top of the function (already included above):

if text.strip() == "":
    return 0
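A slightly more general guard is to test the denominator itself rather than the input. The sketch below is one possible variant, not the code from the answer: the name punct_percentage is made up for illustration, and it assumes that every whitespace character (tabs and newlines too, not only " ") should be excluded from the total.

```python
import string

PUNCT = set(string.punctuation)  # ASCII punctuation only

def punct_percentage(text):
    """Percentage of non-whitespace characters that are ASCII punctuation."""
    non_space = [ch for ch in text if not ch.isspace()]
    if not non_space:  # empty or whitespace-only input: nothing to divide by
        return 0.0
    count = sum(ch in PUNCT for ch in non_space)
    return round(100 * count / len(non_space), 1)

print(punct_percentage("DEACTIVATE: OK"))       # 7.7
print(punct_percentage("DEACTIVATE: KU-1421"))  # 11.1
print(punct_percentage("   "))                  # 0.0
print(punct_percentage(""))                     # 0.0
```

Checking the denominator means the function cannot divide by zero regardless of which whitespace characters the input contains.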


 