I have been working on calculating the percentage of punctuation in a sentence. For some reason, my function works when doing double spacing, but it counts all the characters, including the white space. For example, I have the text DEACTIVATE: OK, so the full length is 14. When I subtract the white space, the length is 13, so the percentage should be 1/13 = 7.69%. However, my function gives me 7.14%, which is basically 1/14 = 7.14%.
On the other hand, if I use just one white space, my function throws the error "ZeroDivisionError: division by zero".
Here is my code for reference, along with some simple text samples:
text = "Centre to position, remaining shift is still larger than maximum (retry nbr=1, centring_stroke.r=2.7662e-05, max centring stroke.r=2.5e-05)"
text2 = "DEACTIVATE: KU-1421"

import string

def count_punct(text):
    count = sum([1 for char in text if char in string.punctuation])
    return round(count/(len(text) - text.count(" ")), 3)*100

df_sub['punct%'] = df_sub['Err_Text2'].apply(lambda x: count_punct(x))
df_sub.head(20)
Make these small changes and your count_punct function should be up and running. Your code was breaking because you were checking for "   " (three consecutive spaces) instead of " " (one space). That is why the difference always resulted in the same value.
import string

def count_punct(text):
    if text.strip() == "":  # Take care of empty or all-whitespace input
        return 0
    count = sum([1 if char in string.punctuation else 0 for char in text])
    spaces = text.count(" ")  # Your error was here: count single spaces, not runs of 3 spaces
    total_chars = len(text) - spaces
    return round(count / total_chars, 3)*100

text = "DEACTIVATE: OK"
print(count_punct(text))
Outputs:
7.7
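To see why the search string matters, here is a minimal sketch that compares counting single spaces with counting runs of three spaces on the sample text. The helper name `punct_ratio` and its `space_token` parameter are made up for this illustration and are not part of the original code.

```python
import string

def punct_ratio(text, space_token):
    """Punctuation count divided by len(text) minus the occurrences of space_token."""
    punct = sum(1 for char in text if char in string.punctuation)
    return punct / (len(text) - text.count(space_token))

sample = "DEACTIVATE: OK"
three = punct_ratio(sample, "   ")  # no run of three spaces exists, so nothing is subtracted: 1/14
one = punct_ratio(sample, " ")      # the single space is subtracted: 1/13
print(round(three * 100, 2), round(one * 100, 2))  # prints: 7.14 7.69
```

With the three-space token, `text.count` returns 0 for this sample, which reproduces the 7.14% reported in the question.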
As for the ZeroDivisionError: it is a logic error that occurs when total_chars is 0, which happens when the length of the string and the number of spaces are equal, that is, when the text consists only of spaces. Hence the difference is 0.
To fix this, you can simply add an if statement at the top of the function (already added above):
    if text.strip() == "":
        return 0
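The guard can be checked directly. The sketch below first reproduces the error with the unguarded formula on an all-space string, then shows an alternative formulation that counts non-whitespace characters with `str.isspace()`, so tabs and newlines are excluded as well. The name `punct_percent` is hypothetical, and this variant rests on the assumption that all whitespace should be ignored; it is not the code from the answer above.

```python
import string

def punct_percent(text):
    """Percentage of punctuation among non-whitespace characters (0.0 if there are none)."""
    non_space = [char for char in text if not char.isspace()]
    if not non_space:  # empty or whitespace-only input: avoid dividing by zero
        return 0.0
    punct = sum(1 for char in non_space if char in string.punctuation)
    return round(100 * punct / len(non_space), 1)

blank = " "
try:
    # Unguarded formula: len(" ") - " ".count(" ") == 0, so this divides by zero.
    0 / (len(blank) - blank.count(" "))
    raised = False
except ZeroDivisionError:
    raised = True

print(raised, punct_percent(blank), punct_percent("DEACTIVATE: OK"))  # prints: True 0.0 7.7
```

Checking for an empty list of non-whitespace characters covers both the empty string and strings made only of spaces, so no separate `strip()` test is needed in this variant.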