Python bag-of-words encoding with a vocabulary

Question

I am trying to add new columns to my ML model. A numeric column should be created for each specific word that is found in the text of the scraped data. To test this, I wrote a dummy script.

import pandas as pd

bagOfWords = ["cool", "place"]
wordsFound = ""

mystring = "This is a cool new place"
mystring = mystring.lower()

for word in bagOfWords:
    if word in mystring: 
        wordsFound = wordsFound + word + " "

print(wordsFound)
pd.get_dummies(wordsFound)

The output is:

    cool place
0   1

This means there is one sentence (row 0) and a single entry for "cool place". This is not correct. The expected output would look like this:

    cool place
0   1    1

Answer 1

I found a different solution, as I could not find any way forward. It is a simple, direct one-hot encoding: for every word I need, I add a new column to the dataframe and create the encoding directly.

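vocabulary = ["achtung", "suchen"]

# df2 is the existing DataFrame; its "title" column holds the scraped text
for word in vocabulary:
    df2[word] = 0  # start with the word marked as absent in every row

    for index, row in df2.iterrows():
        if word in row["title"].lower():
            # DataFrame.set_value was removed in pandas 1.0; .at replaces it
            df2.at[index, word] = 1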
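
The row-by-row loop above can probably also be written with pandas string methods. The following is only a minimal sketch under assumptions: the sample titles are made up for illustration, and `df2` is assumed to have a string column named "title" as in the code above.

```python
import pandas as pd

# Hypothetical sample data; "title" matches the column name used above.
df2 = pd.DataFrame(
    {"title": ["Achtung Baustelle", "Wir suchen Hilfe", "Nothing here"]}
)
vocabulary = ["achtung", "suchen"]

# Lowercase the text once, then test each vocabulary word on the whole column.
titles = df2["title"].str.lower()
for word in vocabulary:
    # regex=False requests a plain substring match, like `word in text`.
    df2[word] = titles.str.contains(word, regex=False).astype(int)

print(df2)
```

As in the original loop, this is a substring match, so "place" would also match "placement". If whole-word matching is needed, the text would have to be tokenized first.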

Answer 1
0 accepted 2019-12-22 20:13:13
