I am trying to check whether a bullet point is part of an item in a list by iterating through the list with a for loop. I know that, at least in regex, a bullet point is written as \u2022, but I don't know how to use this. What I currently have, which obviously doesn't work, is something like this:
list = ['changing. • 5.0 oz.', 'hello', 'dfd','df', 'changing. • 5.0 oz.']
for items in list:
    if "\u2022" in items:
        print('yay')
Thanks in Advance!
It's best to use the re (regex) library. Something like this:
# import regex library
import re

# compile the regex pattern, using a raw string (that's what the r"" is);
# the re module in Python 3.3+ interprets the \u2022 escape itself
bullet_point = re.compile(r"\u2022")

list = ['changing. • 5.0 oz.', 'hello', 'dfd','df', 'changing. • 5.0 oz.']

# search each item in the list
for item in list:
    # search for bullet_point in item
    result = bullet_point.search(item)
    if result:
        print('yay')
In Python 3 your code will work fine because string literals are Unicode and UTF-8 is the default source code encoding. If you're going to be working with Unicode a lot, consider switching to Python 3.
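As a quick check of the Python 3 behaviour described above, here is a minimal sketch. It assumes Python 3; the names `items` and `has_bullet` are illustrative and not from the original post.

```python
# Python 3: str literals are Unicode, so "\u2022" is one bullet character.
items = ['changing. \u2022 5.0 oz.', 'hello', 'dfd', 'df', 'changing. • 5.0 oz.']


def has_bullet(text):
    """Hypothetical helper: True if text contains U+2022 BULLET."""
    return "\u2022" in text


# Both spellings (escape and literal character) denote the same code point.
matches = [text for text in items if has_bullet(text)]
print(len(matches))
```

The plain `in` test is enough here; a regex is only needed if the pattern becomes more complex than a single character.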
In Python 2, the default is to treat literal strings as sequences of bytes, so you have to explicitly declare which strings are Unicode by prefixing them with u.
First, set your source code encoding as UTF-8.
# -*- coding: utf-8 -*-
Then tell Python to treat those strings as Unicode. Otherwise they'll be treated as individual bytes, which leads to odd things like Python thinking the first string has a length of 21 instead of 19.
print len(u'changing. • 5.0 oz.') # 19 characters
print len('changing. • 5.0 oz.') # 21 bytes
This is because the Unicode code point U+2022 BULLET is UTF-8 encoded as three bytes, e2 80 a2. The first line treats it as a single character, the second as three bytes.
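The character-versus-byte counts can be reproduced in Python 3 with an explicit encode step. This is a small sketch assuming Python 3.5 or later (for `bytes.hex()`); the variable names are illustrative.

```python
bullet = "\u2022"                  # U+2022 BULLET as a Unicode string
encoded = bullet.encode("utf-8")   # the same character as UTF-8 bytes

print(len(bullet))     # 1 character
print(encoded.hex())   # e280a2, i.e. three bytes

text = "changing. \u2022 5.0 oz."
char_count = len(text)                   # counts code points
byte_count = len(text.encode("utf-8"))   # counts UTF-8 bytes
print(char_count, byte_count)            # 19 21
```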
Finally, write the character you're searching for as a Unicode string. That's either u'\u2022' or u'•'.
#!/usr/bin/env python
# -*- coding: utf-8 -*-

list = [u'changing. • 5.0 oz.', u'hello', u'dfd', u'df', u'changing. • 5.0 oz.']

for item in list:
    if u'•' in item:
        print('yay')
Real code probably won't be using constant strings, so you have to make sure that whatever is in list has been decoded from UTF-8 into Unicode strings.
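For data that arrives as raw bytes (from a file or socket, say), one way to handle this is to decode each item before searching. The sketch below assumes Python 3 and that the input really is UTF-8; `raw_items` is made-up sample data, not from the original post.

```python
# Hypothetical input: byte strings as they might arrive from a file or socket.
raw_items = [b'changing. \xe2\x80\xa2 5.0 oz.', b'hello', b'dfd']

# Decode the UTF-8 bytes into Unicode strings before searching.
decoded = [raw.decode('utf-8') for raw in raw_items]

# Now the bullet is a single character and a plain membership test works.
flags = ['\u2022' in text for text in decoded]
print(flags)  # [True, False, False]
```

If the input might not be valid UTF-8, `decode` raises `UnicodeDecodeError`, so real code may want to catch that or pass an `errors` argument.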