Python and Pandas: 'Series' objects are mutable, thus they cannot be hashed

Question

With Python and Pandas, I'm trying to write a script that takes the data from the text column, evaluates that text with the textstat module, and then writes the results back into the csv under the word_count column.

Here is the structure of the csv:

   user_id         text text_number  word_count
0       10  test text A      text_0         NaN
1       11          NaN         NaN         NaN
2       12          NaN         NaN         NaN
3       13          NaN         NaN         NaN
4       14          NaN         NaN         NaN
5       15  test text B      text_1         NaN

Here is my attempt at looping over the text column and passing it to textstat:

df = pd.read_csv("texts.csv").fillna('')
text_data = df["text"]
length1 = len(text_data)

for x in range(length1):
    (text_data[x])

    #this is the textstat word count operation
    word_count = textstat.lexicon_count(text_data, removepunct=True)
    output_df = pd.DataFrame({"word_count":[word_count]})
    output_df.to_csv('texts.csv', mode="a", header=False, index=False)

However, I receive this error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

Any suggestions on how to proceed? All assistance appreciated.

Answer 1

The more idiomatic pandas approach would be to use fillna + apply, then write the Series directly out with to_csv:

(
    df["text"].fillna('')  # Replace NaN with empty String
        .apply(textstat.lexicon_count,
               removepunct=True)  # Call lexicon_count on each value
        .rename('word_count')  # Rename Series
        .to_csv('texts.csv', mode="a", index=False)  # Write to csv
)

texts.csv:

word_count
1
0
0
0
0
1

To add a column to the existing DataFrame/csv instead of appending to the end of the file, you can also do:

df['word_count'] = (
    df["text"].fillna('')  # Replace NaN with empty String
        .apply(textstat.lexicon_count,
               removepunct=True)  # Call lexicon_count on each value
)

df.to_csv('texts.csv', index=False)  # Write to csv

texts.csv:

user_id,text,text_number,word_count
text,A,text_0,1
,,,0
,,,0
,,,0
,,,0
text,B,text_1,1
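
For anyone who wants to try the column-assignment approach without touching a real file, here is a minimal sketch that round-trips through in-memory buffers. It is only an illustration: simple_word_count is a hypothetical stand-in for textstat.lexicon_count (it just counts whitespace-separated tokens), and the three sample rows are made up to resemble the csv in the question.

```python
import io

import pandas as pd


def simple_word_count(text):
    # Hypothetical stand-in for textstat.lexicon_count:
    # counts whitespace-separated tokens only.
    return len(text.split())


# Made-up sample data shaped like the csv in the question.
csv_in = io.StringIO(
    "user_id,text,text_number,word_count\n"
    "10,test text A,text_0,\n"
    "11,,,\n"
    "15,test text B,text_1,\n"
)
df = pd.read_csv(csv_in)

# Replace NaN with '' so every value is a string, then count per row.
df["word_count"] = df["text"].fillna("").apply(simple_word_count)

# Overwrite the whole table (no append mode), so the new column
# lines up with the existing rows.
csv_out = io.StringIO()
df.to_csv(csv_out, index=False)
result = csv_out.getvalue()
```

Because the whole DataFrame is rewritten, the counts end up in the word_count column of each row rather than as extra lines at the bottom of the file.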

To fix the current implementation, also use fillna, pass each value (not the whole Series) to lexicon_count, and conditionally write the header only on the first iteration:

text_data = df["text"].fillna('')

for i, x in enumerate(text_data):
    # this is the textstat word count operation
    word_count = textstat.lexicon_count(x, removepunct=True)
    output_df = pd.DataFrame({"word_count": [word_count]})
    output_df.to_csv('texts.csv', mode="a", header=(i == 0), index=False)

texts.csv:

word_count
1
0
0
0
0
1
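
As for why the original loop raised the TypeError: it passed the whole text_data Series to lexicon_count instead of a single string. My understanding (an assumption about textstat's internals, not something verified here) is that textstat caches results with something like functools.lru_cache, which has to hash its arguments, and a Series refuses to be hashed. The sketch below reproduces that behaviour with the standard library only; FakeSeries and count_words are hypothetical stand-ins, not pandas or textstat objects.

```python
from functools import lru_cache


class FakeSeries:
    """Hypothetical stand-in for pandas.Series: mutable, so hashing is disabled."""

    def __init__(self, values):
        self.values = list(values)

    def __hash__(self):
        raise TypeError(
            "'FakeSeries' objects are mutable, thus they cannot be hashed"
        )

    def __iter__(self):
        return iter(self.values)


@lru_cache(maxsize=128)
def count_words(text):
    """Hypothetical stand-in for a cached word counter."""
    return len(text.split())


texts = FakeSeries(["test text A", "", "test text B"])

try:
    # Whole container: the cache tries to hash the argument and fails.
    count_words(texts)
    whole_series_error = None
except TypeError as exc:
    whole_series_error = str(exc)

# One string at a time: strings are hashable, so the cached call succeeds.
per_row_counts = [count_words(text) for text in texts]
```

That is also why both apply and the enumerate loop work: each call only ever sees one string.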

DataFrame and imports:

import pandas as pd
import textstat
from numpy import nan

df = pd.DataFrame({
    'user_id': ['text', nan, nan, nan, nan, 'text'],
    'text': ['A', nan, nan, nan, nan, 'B'],
    'text_number': ['text_0', nan, nan, nan, nan, 'text_1'],
    'word_count': [nan, nan, nan, nan, nan, nan]
})

Answer 1 (3, accepted) 2021-07-26 01:14:13
