Writing NLTK FreqDist results as a row to a .csv file in Python

Question

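I am trying to write out frequency counts for specific words in text files, based on a set of words held in a Python list (I have not included the list in the code listing because it has several hundred entries).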

file_path = 'D:/TestHedges/Hedges_Test_11.csv'
corpus_root = test_path
wordlists = PlaintextCorpusReader(corpus_root, '.*')
print(wordlists.fileids())
CIK_List = []
freq_out = []

for filename in glob.glob(os.path.join(test_path, '*.txt')):

 CIK = filename[33:39]
 CIK = CIK.strip('_')
 # CIK = CIK.strip('_0') commented out to see if it deals with just removing _.  It does not 13/9/2020



 newstext = wordlists.words()
 fdist = nltk.FreqDist([w.lower() for w in newstext])

 CIK_List.append(CIK)
 with open(file_path, 'w', newline='') as csv_file:
  writer = csv.writer(csv_file)
  writer.writerow(["CIK"] + word_list)
  for val in CIK_List:
   writer.writerow([val])
  for m in word_list:
     print(CIK, [fdist[m]], end='')
     writer.writerows([fdist[m]])

My problem is writing fdist[m] as a row in the .csv file. It raises this error:

_csv.Error: iterable expected, not int

How can I rewrite this so that the frequency distribution goes into a single row of the .csv file?

Thanks in advance.

Answer 1

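You have two options: use writerow instead of writerows, or build a list of values first and pass that to writer.writerows instead of fdist[m]. Each row in the list passed to writerows must itself be a tuple (or some other iterable), and fdist[m] is a plain int, which is why the error appears. So for writerows to work, you have to wrap the value in a tuple once more: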

 writer.writerows([(fdist[m],)])

Here the trailing comma makes it a 1-value tuple.
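As a minimal sketch of the difference between the two calls, the snippet below writes to an in-memory buffer instead of a real file. The value `count` is a hypothetical stand-in for a single fdist[m]; nothing else here comes from the question.

```python
import csv
import io

count = 3  # hypothetical stand-in for one fdist[m] value

buf = io.StringIO()
writer = csv.writer(buf)

# writerow takes ONE row, which must be an iterable of fields.
writer.writerow([count])

# writerows takes an iterable of rows, so the single row is wrapped again.
writer.writerows([(count,)])

# A bare int in place of a row reproduces the error from the question.
error_message = None
try:
    writer.writerows([count])
except csv.Error as exc:
    error_message = str(exc)

output_lines = buf.getvalue().splitlines()
```

Both successful calls produce the same one-field row; the failing call writes nothing and only sets `error_message`.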

To write all the values into one row, instead of this code:

for m in word_list:
     print(CIK, [fdist[m]], end='')
     writer.writerows([fdist[m]])

you should use:

for m in word_list:
     print(CIK, [fdist[m]], end='')
writer.writerows(([fdist[m] for m in word_list],))

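Note the list comprehension: it collects every count into a single list, which writerows then treats as one row.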
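Putting the header and the counts together, one possible shape for the output is a header row followed by one row per file, with the CIK in the first column. This is a sketch under assumptions: `word_list`, `fdist` and `CIK` are small hypothetical stand-ins for the question's variables, a Counter is used in place of a real FreqDist (both return 0 for missing words), and a StringIO buffer replaces the file on disk.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-ins for the question's variables.
word_list = ["may", "might", "could"]
fdist = Counter(["may", "may", "could", "profit"])  # looked up like a FreqDist
CIK = "123456"

buf = io.StringIO()  # stands in for open(file_path, 'w', newline='')
writer = csv.writer(buf)

# Header: "CIK" followed by one column per word of interest.
writer.writerow(["CIK"] + word_list)

# One data row: the CIK, then the count of each word in the same order.
writer.writerow([CIK] + [fdist[m] for m in word_list])

rows = buf.getvalue().splitlines()
```

Using writerow with a single combined list keeps the CIK and its counts on the same line, rather than writing the CIK and the counts as separate rows.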

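On a side note, just from looking at your code, it seems to me that you could do the same thing without involving the NLTK library, using collections.Counter from the standard library. It is the underlying container of the FreqDist class.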
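A minimal sketch of the Counter-based alternative follows. The helper name `hedge_counts` and the sample tokens are hypothetical, and tokenization is assumed to have happened already (the question uses NLTK's corpus reader for that step).

```python
from collections import Counter


def hedge_counts(tokens, word_list):
    """Return the count of each word in word_list, in order (hypothetical helper)."""
    # Lowercase every token before counting, as the question's code does.
    counts = Counter(w.lower() for w in tokens)
    # Counter returns 0 for words that never occur, so no KeyError handling is needed.
    return [counts[m] for m in word_list]


tokens = ["We", "MAY", "revise", "this", "and", "may", "not", "could"]
row = hedge_counts(tokens, ["may", "might", "could"])
```

The resulting list can be passed to writer.writerow directly, optionally with the CIK prepended.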

Answer 1 timestamp: 2020-09-27 07:58:29

请注意列表理解。

将 NLTK FreqDist 的结果作为一行在 Python 中写入 .csv 文件

问题描述

1 个解决方案

解决方案1 1 2020-09-27 07:58:29

请注意列表理解。

解决方案1
1 2020-09-27 07:58:29