How to resolve a Python Pandas assign error when creating a new column

Question

I have a df containing home descriptions:

description
0   Beautiful, spacious skylit studio in the heart...
1   Enjoy 500 s.f. top floor in 1899 brownstone, w...
2   The spaceHELLO EVERYONE AND THANKS FOR VISITIN...
3   We welcome you to stay in our lovely 2 br dupl...
4   Please don’t expect the luxury here just a bas...
5   Our best guests are seeking a safe, clean, spa...
6   Beautiful house, gorgeous garden, patio, cozy ...
7   Comfortable studio apartment with super comfor...
8   A charming month-to-month home away from home ...
9   Beautiful peaceful healthy homeThe spaceHome i...

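I am trying to count the number of sentences in each row (using sent_tokenize from nltk.tokenize) and append those values to the df as a new column, sentence_count. Since this is part of a larger data pipeline, I am using pandas assign so that I can chain operations.

I can't seem to get it to work, though. I have tried: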

df.assign(sentence_count=lambda x: len(sent_tokenize(x['description'])))

and

df.assign(sentence_count=len(sent_tokenize(df['description'])))

but both raise the following:

TypeError: expected string or bytes-like object

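I have confirmed that the value in every row is a str. Maybe it is because description has dtype('O')?

What am I doing wrong here? Using pipe with a custom function works fine, but I would prefer to use assign.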

Answer 1

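In the first example, when you pass x['description'] to sent_tokenize, it is a pandas.Series. It is not a string. It is a Series of strings (similar to a list), while sent_tokenize expects a single string, hence the TypeError. The second example passes df['description'], which is the same Series, so it fails the same way.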
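
To make the difference concrete, here is a minimal sketch. It assumes a tiny made-up DataFrame and uses re.findall as a stand-in for sent_tokenize so that it runs without NLTK data; NLTK's tokenizer relies on regular expressions internally, which is most likely where the same message comes from.

```python
import re

import pandas as pd

# Made-up sample data, standing in for the real descriptions.
df = pd.DataFrame(
    {"description": ["First sentence. Second one.", "Only one here."]}
)

col = df["description"]
print(type(col).__name__)          # the whole column is a Series
print(type(col.iloc[0]).__name__)  # a single cell is a str

# A regex function given the whole Series fails, because it expects one string.
error_message = None
try:
    re.findall(r"\w+", col)
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # starts with "expected string or bytes-like object"
```

The dtype('O') is not the problem in itself: object is simply how pandas stores a column of Python strings.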

So you should do this instead:

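# apply calls sent_tokenize once per row; each cell becomes a list of sentences (wrap in len for a count)
df['counts'] = df['description'].apply(sent_tokenize)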

Or, if you need to pass extra arguments to sent_tokenize:

# extra arguments to sent_tokenize go inside the lambda call
df['counts'] = df['description'].apply(lambda text: sent_tokenize(text))
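
Since the question asks for a chainable assign that stores the count itself, the same idea can be sketched as follows. The sample data is made up, and simple_sent_tokenize is a hypothetical regex-based stand-in used only so the snippet runs without NLTK data; in the real pipeline you would swap in nltk.tokenize.sent_tokenize.

```python
import re

import pandas as pd


def simple_sent_tokenize(text):
    """Hypothetical stand-in for nltk.tokenize.sent_tokenize.

    Splits after '.', '!' or '?' when followed by whitespace.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# Made-up sample data, standing in for the real descriptions.
df = pd.DataFrame(
    {
        "description": [
            "Beautiful studio. Great light! Book now.",
            "Comfortable apartment with a comfy bed.",
        ]
    }
)

# The outer lambda receives the whole DataFrame (as assign requires);
# the inner apply tokenizes one string at a time and takes the length.
result = df.assign(
    sentence_count=lambda d: d["description"].apply(
        lambda text: len(simple_sent_tokenize(text))
    )
)

print(result)
```

assign returns a new DataFrame and leaves df unchanged, so this step can sit in the middle of a method chain.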
