Python & Pandas: 'Series' objects are mutable, thus they cannot be hashed

With Python and Pandas, I'm seeking to write a script that takes the data from the text column, evaluates that text with the textstat module, and then writes the results back into the CSV under the word_count column.

Here is the structure of the CSV:

 user_id         text text_number  word_count
0       10  test text A      text_0         NaN
1       11          NaN         NaN         NaN
2       12          NaN         NaN         NaN
3       13          NaN         NaN         NaN
4       14          NaN         NaN         NaN
5       15  test text B      text_1         NaN

Here is my attempt at looping over the text column and passing each value to textstat:

import pandas as pd
import textstat

df = pd.read_csv("texts.csv").fillna('')
text_data = df["text"]
length1 = len(text_data)

for x in range(length1):
    (text_data[x])

    #this is the textstat word count operation
    word_count = textstat.lexicon_count(text_data, removepunct=True)
    output_df = pd.DataFrame({"word_count":[word_count]})
    output_df.to_csv('texts.csv', mode="a", header=False, index=False)

However, I receive this error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

Any suggestions on how to proceed? All assistance appreciated.

A more idiomatic pandas approach is to use fillna + apply, then write the Series directly out with to_csv:

(
    df["text"].fillna('')  # Replace NaN with empty String
        .apply(textstat.lexicon_count,
               removepunct=True)  # Call lexicon_count on each value
        .rename('word_count')  # Rename Series
        .to_csv('texts.csv', mode="a", index=False)  # Write to csv
)

texts.csv:

word_count
1
0
0
0
0
1
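Two details make the chain above work: Series.apply forwards extra keyword arguments (here removepunct=True) to the function it calls, and the Series name set by rename becomes the CSV header. The sketch below shows both, using a hypothetical count_words function as a stand-in for textstat.lexicon_count so it runs without textstat; its punctuation handling only approximates textstat's.

```python
import string

import pandas as pd


def count_words(text, removepunct=True):
    # Hypothetical stand-in for textstat.lexicon_count (approximation only).
    if removepunct:
        text = text.translate(str.maketrans("", "", string.punctuation))
    return len(text.split())


text = pd.Series(["test text A", None, "Hello, world!"], name="text")

word_count = (
    text.fillna("")                            # None/NaN -> "" so every value is a str
        .apply(count_words, removepunct=True)  # keyword args are forwarded to count_words
        .rename("word_count")                  # the Series name becomes the CSV header
)

print(word_count.tolist())            # [3, 0, 2]
print(word_count.to_csv(index=False))  # header "word_count", then one count per line
```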

To add a column to the existing DataFrame/CSV instead of appending to the end of the file, you can also do:

df['word_count'] = (
    df["text"].fillna('')  # Replace NaN with empty String
        .apply(textstat.lexicon_count,
               removepunct=True)  # Call lexicon_count on each value
)

df.to_csv('texts.csv', index=False)  # Write to csv

texts.csv:

user_id,text,text_number,word_count
text,A,text_0,1
,,,0
,,,0
,,,0
,,,0
text,B,text_1,1
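The column-assignment version can be checked end to end in memory. This sketch assumes a small CSV laid out like the one in the question (the rows are illustrative) and uses a hypothetical count_words function in place of textstat.lexicon_count:

```python
import io
import string

import pandas as pd


def count_words(text, removepunct=True):
    # Hypothetical stand-in for textstat.lexicon_count (approximation only).
    if removepunct:
        text = text.translate(str.maketrans("", "", string.punctuation))
    return len(text.split())


# In-memory stand-in for texts.csv; empty fields are read back as NaN.
csv_in = io.StringIO(
    "user_id,text,text_number,word_count\n"
    "10,test text A,text_0,\n"
    "11,,,\n"
    "15,test text B,text_1,\n"
)
df = pd.read_csv(csv_in)

# Replace the empty word_count column instead of appending rows to the file.
df["word_count"] = df["text"].fillna("").apply(count_words, removepunct=True)

csv_out = io.StringIO()
df.to_csv(csv_out, index=False)  # rewrite the whole table, new column included
print(csv_out.getvalue())
```

Because the whole file is rewritten, each count stays on the same row as the text it was computed from.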

To fix the current implementation, pass each individual value x to lexicon_count instead of the whole text_data Series, use fillna as well, and write the header only on the first iteration:

text_data = df["text"].fillna('')

for i, x in enumerate(text_data):
    # this is the textstat word count operation
    word_count = textstat.lexicon_count(x, removepunct=True)
    output_df = pd.DataFrame({"word_count": [word_count]})
    output_df.to_csv('texts.csv', mode="a", header=(i == 0), index=False)

texts.csv:

word_count
1
0
0
0
0
1
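Here is a self-contained version of the fixed loop that writes to a temporary file, so you can confirm the header is written only once. The count_words function and the file name word_counts.csv are hypothetical stand-ins for textstat.lexicon_count and texts.csv:

```python
import os
import string
import tempfile

import pandas as pd


def count_words(text, removepunct=True):
    # Hypothetical stand-in for textstat.lexicon_count (approximation only).
    if removepunct:
        text = text.translate(str.maketrans("", "", string.punctuation))
    return len(text.split())


text_data = pd.Series(["A", None, None, "B"]).fillna("")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "word_counts.csv")  # hypothetical output file
    for i, x in enumerate(text_data):
        # x is a single string here, not the whole Series
        word_count = count_words(x, removepunct=True)
        pd.DataFrame({"word_count": [word_count]}).to_csv(
            path, mode="a", header=(i == 0), index=False
        )
    with open(path) as f:
        lines = f.read().splitlines()

print(lines)  # ['word_count', '1', '0', '0', '1']
```

Opening the file once per row gets slow for large inputs; collecting the counts first and calling to_csv once, as the apply versions do, avoids that.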

DataFrame and imports:

import pandas as pd
import textstat
from numpy import nan

df = pd.DataFrame({
    'user_id': ['text', nan, nan, nan, nan, 'text'],
    'text': ['A', nan, nan, nan, nan, 'B'],
    'text_number': ['text_0', nan, nan, nan, nan, 'text_1'],
    'word_count': [nan, nan, nan, nan, nan, nan]
})
