KeyError even though the key exists

Question

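This is a function that I apply to my DataFrame. I have a CSV file called "100-contacts" on my computer that contains information about emails, such as first name, address, and city. My goal is to detect spam emails. I need to strip stopwords and punctuation from the data, and this part of the code should help with that, but I get a KeyError even though the key exists.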

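import string
from nltk.corpus import stopwords

def process_text(text):
  #1 Remove punctuation
  #2 Remove stopwords
  #3 Return a list of clean text words

  #1 Drop punctuation characters, then rejoin with '' so the words stay intact
  nopunc = [char for char in text if char not in string.punctuation]
  nopunc = ''.join(nopunc)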

  #2
  clean_words = [word for word in nopunc.split() if word.lower() not in stopwords.words('english')]

  #3
  return clean_words

df['text'].head().apply(process_text)
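To check what the cleaning steps do without needing the NLTK corpus download, here is a minimal sketch that swaps in a small hand-written stopword set. STOPWORDS is a hypothetical stand-in for stopwords.words('english'), and the sample sentence is made up. Note that the remaining characters are rejoined with '' rather than ' ': joining with a space would turn every word into single letters before the split.

```python
import string

# Hypothetical stand-in for stopwords.words('english'); a set gives fast lookups.
STOPWORDS = {"a", "an", "the", "is", "this", "to", "and", "of"}

def process_text(text):
    # 1. Drop every punctuation character and rejoin with '' so words stay whole.
    nopunc = "".join(char for char in text if char not in string.punctuation)
    # 2. Keep only the words whose lowercase form is not a stopword.
    return [word for word in nopunc.split() if word.lower() not in STOPWORDS]

print(process_text("This is a FREE offer, claim it now!!!"))
# ['FREE', 'offer', 'claim', 'it', 'now']
```

With the real NLTK list, building the set once (for example set(stopwords.words('english'))) outside the function avoids re-reading the list for every word.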

Answer 1

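Your column names probably contain spaces. Passing sep=r'\s*,\s*' to read_csv when you load the CSV into a DataFrame should help: the regular expression consumes any whitespace around each comma, so a header such as " text" becomes "text". Because a separator longer than one character is treated as a regular expression, pandas has to use its Python parsing engine; passing engine='python' explicitly avoids the ParserWarning about falling back to it.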
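The effect of that separator on the sample header row can be illustrated with the standard library's re.split. This is only an illustration of what the pattern matches, not pandas' own parser.

```python
import re

# The separator from the answer: a comma with any whitespace on either side.
SEP = r"\s*,\s*"

header = "name, age, text"
print(header.split(","))      # ['name', ' age', ' text']  (spaces kept)
print(re.split(SEP, header))  # ['name', 'age', 'text']    (spaces consumed)
```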

import pandas as pd
import string
from nltk.corpus import stopwords

# csv.csv
# name, age, text
# aa, 11, randomtext
# bb, 22, randomtexttext
# cc, 33, ra..ndo..mtexttext
df = pd.read_csv('csv.csv', header=0, sep=r'\s*,\s*', engine='python')

def process_text(text):
  #1 Remove punctuation
  #2 Remove stopwords
  #3 Return a list of clean text words

  #1 Drop punctuation characters, then rejoin with '' so the words stay intact
  nopunc = [char for char in text if char not in string.punctuation]
  nopunc = ''.join(nopunc)

  #2
  clean_words = [word for word in nopunc.split() if word.lower() not in stopwords.words('english')]

  #3
  return clean_words

print(df['text'].head().apply(process_text))
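The cause is easy to reproduce. With the default separator, the header row "name, age, text" produces a third column named " text" (leading space), so df['text'] raises KeyError. The sketch below uses an in-memory copy of the sample (io.StringIO standing in for csv.csv) and shows two alternatives to the regex separator: skipinitialspace=True, or stripping the column names after loading.

```python
import io
import pandas as pd

# In-memory stand-in for the csv.csv sample shown above.
CSV_TEXT = "name, age, text\naa, 11, randomtext\nbb, 22, randomtexttext\n"

# Default separator: the space after each comma becomes part of the header.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.columns.tolist())  # ['name', ' age', ' text']
try:
    df["text"]
except KeyError as exc:
    print("KeyError:", exc)

# Option 1: skip whitespace that follows each delimiter (headers and values).
df1 = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)
print(df1["text"].tolist())  # ['randomtext', 'randomtexttext']

# Option 2: load as usual, then strip whitespace from the header names only.
# The cell values still keep their leading space in this variant.
df2 = pd.read_csv(io.StringIO(CSV_TEXT))
df2.columns = df2.columns.str.strip()
print(df2["text"].tolist())
```

Printing df.columns.tolist() is a quick way to spot this kind of hidden whitespace before indexing a column.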

Solution 1 · score 0 · 2020-09-16 14:25:14
