Beautiful Soup .find Chinese Characters

Question

a_string = soup.find(text='围')

soup.find_all('title', limit=1)
# [<title>The Dormouse's story</title>]

soup.find('title')
# <title>The Dormouse's story</title>

Is there any way I can use find with Chinese characters in Beautiful Soup?

I tried for a while, but it can't seem to detect the character. English characters work fine.

Source of the website I'm working with:

<!DOCTYPE html>
<html lang="zh-CN">
  <head>
        <meta charset="gbk" />

Answer 1

Try something like:

import re
a_string = soup.find(text=re.compile(u'围', re.U))

In other words, make sure the string you search for is Unicode. It might work without re.compile(), but at least make sure your Chinese string is written as a u'' literal.
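As a rough illustration of why the search string needs to be Unicode: the page declares charset gbk, so the bytes it serves for a character differ from the bytes a UTF-8 source file stores for the same character. A comparison only succeeds once both sides are decoded text. This is a minimal standard-library sketch, not the asker's code; sample_html is a made-up fragment, and it assumes Python 3, where the u prefix is accepted but optional.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample markup; not the asker's actual page.
sample_html = u'<p>围棋</p>'

# Bytes as a GBK-encoded site would serve them.
gbk_bytes = sample_html.encode('gbk')

# The same character has a different byte sequence in UTF-8,
# so a byte-level search across encodings does not find it.
bytes_match = u'围'.encode('utf-8') in gbk_bytes

# After decoding to Unicode text, the search succeeds.
decoded = gbk_bytes.decode('gbk')
text_match = u'围' in decoded

print(bytes_match, text_match)
```

Beautiful Soup converts the document to Unicode when it parses it, which is why the value passed to find should be Unicode text as well.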

Answer 2

When you use find(text='something'), it searches for text nodes containing exactly the text 'something' and nothing else.

If you want to find text that contains a particular character, or that matches any other regular expression, you must pass a compiled regular expression pattern instead (as @Yannis said):

soup.find(text=re.compile(u'定'))
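The difference between the two kinds of match can be sketched as follows. The markup is invented for illustration, and the example assumes bs4 4.4.0 or later, where the argument is named string (text is the older name for the same argument).

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; not the asker's actual page.
html = u'<html><body><p>围</p><p>包围圈</p></body></html>'
soup = BeautifulSoup(html, 'html.parser')

# A plain string matches only a text node that is exactly that string.
exact = soup.find_all(string=u'围')

# A compiled pattern matches any text node in which the pattern is found.
containing = soup.find_all(string=re.compile(u'围'))

print(exact)
print(containing)
```

Here the plain string finds only the first paragraph's text, while the pattern also finds the longer text node that merely contains the character.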

Note that the re.U flag is not required here, since the pattern does not use special sequences such as \s or \w, whose behavior the flag changes. If it did, then you might need to provide it. See the Python re module documentation for more on regular expressions.
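A small standard-library sketch of what the flag does and does not affect. It assumes Python 3, where str patterns are Unicode-aware by default and re.ASCII is the way to turn that off. On Python 2, which the u'' literals in these answers suggest, re.U was the opt-in for the same behavior.

```python
import re

text = u'包围圈'

# A literal character matches the same way with or without re.U.
literal_plain = re.search(u'围', text) is not None
literal_flag = re.search(u'围', text, re.U) is not None

# Character classes are where Unicode awareness matters:
# \w matches Chinese characters only when the pattern is Unicode-aware.
word_unicode = re.findall(r'\w', text)
word_ascii = re.findall(r'\w', text, re.ASCII)

print(literal_plain, literal_flag)
print(word_unicode, word_ascii)
```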

2 answers

solution1
1 2014-06-09 10:07:21

solution2
1 ACCEPTED 2014-06-12 15:48:34
