
Word cloud for a non-English corpus


Dear friends, I am having trouble generating a proper word cloud for non-English text. The cloud is generated, but the results are unsatisfactory: it shows isolated characters, whereas I need whole words. I ran the following code to generate the word cloud:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# scorpus (the corpus string) and sw (the stopword collection) are
# assumed to be defined earlier; they are not shown in the question.
text = scorpus
wordcloud = WordCloud(font_path='MBKhursheed.ttf',
                      relative_scaling=1.0,
                      stopwords=sw
                      ).generate(text)
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

First you need to import these two (you may have to install them first; on PyPI the packages are named arabic-reshaper and python-bidi):

from arabic_reshaper import arabic_reshaper
from bidi.algorithm import get_display

Then reshape the text and reorder it for right-to-left display before passing it to WordCloud:

text = get_display(arabic_reshaper.reshape(text))
wordcloud = WordCloud(font_path='MBKhursheed.ttf',
                      relative_scaling = 1.0,
                      stopwords = sw
                      ).generate(text)
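
One thing to watch with the approach above: reshaping replaces letters with their contextual presentation forms and get_display reorders them, so entries in sw written in the original script may no longer match the processed text. A minimal sketch of one way around this, assuming simple whitespace tokenization is good enough for the corpus, is to count and filter words on the raw text first and only then reshape each word. The helper names count_words, shape_keys and build_cloud are made up for this illustration; generate_from_frequencies is the WordCloud method that accepts a word-to-count mapping.

```python
from collections import Counter


def count_words(text, stopwords=()):
    """Count whitespace-separated tokens in the raw text, skipping stopwords."""
    stop = set(stopwords)
    return Counter(tok for tok in text.split() if tok not in stop)


def shape_keys(freqs, shape):
    """Apply `shape` to every word, summing counts if two words collide."""
    shaped = Counter()
    for word, n in freqs.items():
        shaped[shape(word)] += n
    return shaped


def build_cloud(text, stopwords, font_path):
    """Hypothetical helper: filter first, then reshape each word for display."""
    # Imported here so the helpers above work without the third-party packages.
    import arabic_reshaper
    from bidi.algorithm import get_display
    from wordcloud import WordCloud

    def shape(word):
        # Join the letters into contextual forms, then reorder for RTL display.
        return get_display(arabic_reshaper.reshape(word))

    freqs = shape_keys(count_words(text, stopwords), shape)
    cloud = WordCloud(font_path=font_path, relative_scaling=1.0)
    return cloud.generate_from_frequencies(freqs)
```

With the names from the question, this would be called as build_cloud(scorpus, sw, 'MBKhursheed.ttf'), and the result can be passed to plt.imshow as before.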
