How do I recognize colors in a string using nltk in Python?

Question

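The title pretty much says it all: I want to recognize colors in a string using nltk, but all I can find is how to classify parts of speech. I know I could list every color I want to support, but since I want to support all the colors available in CSS, that would be a very long list (and some of those colors are rather odd, such as the various blue-green shades). If there is a simpler way than writing them all out, I would really appreciate it. Thanks!

Edit:

When I first asked the question, I seem to have forgotten to mention that I need the color names in natural-language form, with the words separated by spaces rather than run together, because they are used in speech recognition. I have chosen the answer by "Tadhg McDonald-Jensen" as the best one, since it answers my original question well. However, I have also posted my own answer, which provides color names with spaces. Hope this helps!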

Answer 1

You can use the webcolors package to get all the CSS color names it recognizes; just check for membership in webcolors.CSS3_NAMES_TO_HEX:

>>> import webcolors
>>> "green" in webcolors.CSS3_NAMES_TO_HEX
True
>>> "deepskyblue" in webcolors.CSS3_NAMES_TO_HEX
True
>>> "aquamarine" in webcolors.CSS3_NAMES_TO_HEX
True
>>> len(webcolors.CSS3_NAMES_TO_HEX)
147

This means webcolors.CSS3_NAMES_TO_HEX.keys() will give you all the CSS3 color names, as a list in Python 2 or as a dict keys view in Python 3.
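As a rough illustration of how such a set of names could be used to pick colors out of a string, here is a minimal sketch. The names `SAMPLE_COLOR_NAMES` and `find_colors` are hypothetical, and the small hard-coded set only stands in for the keys of webcolors.CSS3_NAMES_TO_HEX so that the example runs without any third-party package.

```python
import re

# Hypothetical sample of CSS color names; in practice this set could be
# built from the keys of webcolors.CSS3_NAMES_TO_HEX instead.
SAMPLE_COLOR_NAMES = {"green", "deepskyblue", "aquamarine", "red", "teal"}


def find_colors(text, color_names=SAMPLE_COLOR_NAMES):
    """Return the color names found in text, in order of appearance."""
    # Lowercase the text and split it into runs of letters
    words = re.findall(r"[a-z]+", text.lower())
    # Keep only the words that are known color names
    return [word for word in words if word in color_names]


print(find_colors("A DeepSkyBlue banner with red and teal stripes."))
# prints ['deepskyblue', 'red', 'teal']
```

This only finds names written as a single word, which is how the CSS names are spelled; names written with spaces need a different approach.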

Answer 2

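The solution (for me, anyway):

Note: if you only need colors without spaces ("deepskyblue" rather than "deep sky blue"), any of the answers above will work. However, since I am using this together with speech recognition, I need the color names separated by spaces as in natural language. That can be done with the code below (in Python 3), which I think is more complete: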

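import urllib.request
from bs4 import BeautifulSoup

def getColors():
    # Download the W3Schools page that lists the CSS color names
    html = urllib.request.urlopen('http://www.w3schools.com/colors/colors_names.asp').read()
    soup = BeautifulSoup(html, 'html.parser')
    # For each table row, collect its descendant tags; the text of the first one is used as the name
    children = [item.findChildren() for item in soup.find_all('tr')]
    # Drop non-breaking spaces, put a space before each capital letter, then lowercase
    colors = [''.join( ' '+x if 'A' <= x <= 'Z' else x for x in item[0].text.replace(u'\xa0', '')).strip().lower() for item in children]
    # Skip the first row (assumed to be the table header)
    return colors[1:]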

Then you run:

print(getColors())

and you get:

['alice blue', 'antique white', 'aqua', 'aquamarine', 'azure', 'beige', 'bisque', 'black', 'blanched almond', 'blue', 'blue violet', 'brown', 'burly wood', 'cadet blue', 'chartreuse', 'chocolate', 'coral', 'cornflower blue', 'cornsilk', 'crimson', 'cyan', 'dark blue', 'dark cyan', 'dark golden rod', 'dark gray', 'dark grey', 'dark green', 'dark khaki', 'dark magenta', 'dark olive green', 'dark orange', 'dark orchid', 'dark red', 'dark salmon', 'dark sea green', 'dark slate blue', 'dark slate gray', 'dark slate grey', 'dark turquoise', 'dark violet', 'deep pink', 'deep sky blue', 'dim gray', 'dim grey', 'dodger blue', 'fire brick', 'floral white', 'forest green', 'fuchsia', 'gainsboro', 'ghost white', 'gold', 'golden rod', 'gray', 'grey', 'green', 'green yellow', 'honey dew', 'hot pink', 'indian red', 'indigo', 'ivory', 'khaki', 'lavender', 'lavender blush', 'lawn green', 'lemon chiffon', 'light blue', 'light coral', 'light cyan', 'light golden rod yellow', 'light gray', 'light grey', 'light green', 'light pink', 'light salmon', 'light sea green', 'light sky blue', 'light slate gray', 'light slate grey', 'light steel blue', 'light yellow', 'lime', 'lime green', 'linen', 'magenta', 'maroon', 'medium aqua marine', 'medium blue', 'medium orchid', 'medium purple', 'medium sea green', 'medium slate blue', 'medium spring green', 'medium turquoise', 'medium violet red', 'midnight blue', 'mint cream', 'misty rose', 'moccasin', 'navajo white', 'navy', 'old lace', 'olive', 'olive drab', 'orange', 'orange red', 'orchid', 'pale golden rod', 'pale green', 'pale turquoise', 'pale violet red', 'papaya whip', 'peach puff', 'peru', 'pink', 'plum', 'powder blue', 'purple', 'rebecca purple', 'red', 'rosy brown', 'royal blue', 'saddle brown', 'salmon', 'sandy brown', 'sea green', 'sea shell', 'sienna', 'silver', 'sky blue', 'slate blue', 'slate gray', 'slate grey', 'snow', 'spring green', 'steel blue', 'tan', 'teal', 'thistle', 'tomato', 'turquoise', 'violet', 'wheat', 'white', 'white smoke', 'yellow', 'yellow green']

Hope this helps!
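For readers who already have the CamelCase names and only need the spacing step, it can be tried without any network access. The following is a minimal sketch, assuming the names arrive in forms such as "DeepSkyBlue"; the helper name `split_camel_case` and the sample list are hypothetical and not part of the answer above.

```python
import re


def split_camel_case(name):
    """Insert a space before each capital letter except the first, then lowercase."""
    # (?<!^) skips the start of the string; (?=[A-Z]) matches just before a capital
    return re.sub(r"(?<!^)(?=[A-Z])", " ", name).lower()


# Hypothetical sample input in the CamelCase form used by CSS reference tables
sample_names = ["AliceBlue", "DeepSkyBlue", "Aqua", "DarkGoldenRod"]
print([split_camel_case(name) for name in sample_names])
# prints ['alice blue', 'deep sky blue', 'aqua', 'dark golden rod']
```

Note that the split follows the capitalization of the source, which is why names such as "dark golden rod" come out as three words.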

Answer 3

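I would not use nltk for this, but a regular expression.

Get a list of all CSS colors (the code below uses http://colours.neilorangepeel.com/)
Extract the color names and build a list (using BeautifulSoup)
Build a regex pattern from the list
Use that regex pattern to match the strings you want

This works for me
(you only need to change the final matching lines, and the proxy settings if needed)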

import re
import urllib.request

from bs4 import BeautifulSoup

color_url = 'http://colours.neilorangepeel.com/'
proxies = {'http': 'http://proxy.foobar.fr:3128'}  # if needed

# GET THE HTML FILE
authinfo = urllib.request.HTTPBasicAuthHandler()  # set up authentication info
proxy_support = urllib.request.ProxyHandler(proxies)
# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)
urllib.request.install_opener(opener)  # install the opener
colorfile = urllib.request.urlopen(color_url)

soup = BeautifulSoup(colorfile, 'html.parser')

# BUILD THE REGEX PATTERN
colors = soup.find_all('h1')
# color.string is None when a heading contains nested tags, so skip those
colorsnames = [color.string for color in colors if color.string]
# escape each name so that it is matched literally
colorspattern = '|'.join(re.escape(name) for name in colorsnames)
colorregex = re.compile(colorspattern)

# MATCH WHAT YOU NEED
yourstring = 'the sky is blue'  # placeholder: replace with your own text
if colorregex.search(yourstring):
    print('found a color')  # placeholder: do what you want here
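One thing to watch with a plain '|'.join pattern is that a short name such as "blue" can match inside a longer name or inside an unrelated word. The sketch below shows one way this might be handled: longest names first, word boundaries, and case-insensitive matching. The list `COLOR_NAMES` and the function `build_color_regex` are hypothetical and are not taken from the answer above.

```python
import re

# Hypothetical sample of color names; a scraped list could be used instead
COLOR_NAMES = ["blue", "sky blue", "deep sky blue", "red", "orange red"]


def build_color_regex(names):
    """Compile a regex that matches any of the given names as whole words."""
    # Try longer names first so "deep sky blue" wins over "blue"
    ordered = sorted(names, key=len, reverse=True)
    # Escape each name and wrap the alternation in word boundaries
    pattern = r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


color_regex = build_color_regex(COLOR_NAMES)
print(color_regex.findall("A Deep Sky Blue door and an orange red roof"))
# prints ['Deep Sky Blue', 'orange red']
```

Because of the word boundaries, a string such as "blueberry" produces no match.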

Problem description

3 solutions

Solution 1
3 accepted 2016-06-24 17:50:09

Solution 2
2 2016-06-24 22:42:53

Solution 3
1 2016-06-24 14:50:59
