Error when iterating over text rows of a pandas DataFrame

Question

I get an error when I try to iterate over a Series in a pandas DataFrame that contains free text. The text is in df[1].

import pandas as pd
corpus = []
# df is loaded earlier; column 1 holds the free text
for i in range(0, 1000):
    review = df[1][i]  # the KeyError is raised on this line

The error is raised on the last line of code:

except KeyError as e1:
    if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
...
KeyError: 100

Despite searching, I can't figure out what the error message means.

EDIT: I realized the error is not caused by the regex, so I have removed all mention of it. The error still occurs with the code shown above.
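A KeyError with a number usually means that df[1][i] is a label lookup: with an integer index, pandas looks for the row whose index label is i, not the i-th row. So KeyError: 100 is consistent with label 100 being missing from the index, for example after rows were dropped or filtered, or because the frame has fewer rows than the loop expects. The sketch below reproduces this on small hypothetical data (the column values and the dropna step are assumptions, not the asker's data) and shows two ways around it: positional access with .iloc, and iterating the Series directly.

```python
import pandas as pd

# Hypothetical data: column 1 holds free text, column 0 an id.
df = pd.DataFrame({0: [1, 2, 3, 4], 1: ["good", None, "bad", "okay"]})

# Dropping the row with a missing value leaves a gap in the index labels.
df = df.dropna()
print(df.index.tolist())  # [0, 2, 3]

# df[1][i] looks up the *label* i, so the loop fails once it reaches label 1.
missing_label = None
try:
    for i in range(0, len(df)):
        review = df[1][i]
except KeyError as e:
    missing_label = e.args[0]
print("KeyError:", missing_label)  # KeyError: 1

# Positional access with .iloc ignores the index labels.
corpus = []
for i in range(len(df)):
    corpus.append(df[1].iloc[i])

# Iterating over the Series directly is simpler and also label-independent.
corpus_direct = [review for review in df[1]]
print(corpus == corpus_direct)  # True
```

Looping to len(df) rather than a hard-coded 1000 also avoids a KeyError when the frame is shorter than expected.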

Answer 1

Using a loop is considered the least desirable option in pandas. Take a look at df.replace() instead.

Consider this DataFrame:

df = pd.DataFrame({'col': ['sgra834', '%^$asgsg', '23hgfh*', 'sfg343^%adf']})

    col
0   sgra834
1   %^$asgsg
2   23hgfh*
3   sfg343^%adf

You can use replace with a regex:

df.replace('[^a-zA-Z]', '', regex=True)

and you get:

    col
0   sgra
1   asgsg
2   hgfh
3   sfgadf
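Note that DataFrame.replace with regex=True applies the pattern to every column. If only one text column should change, Series.str.replace does the same substitution on a single column. The sketch below runs both on the example frame; the new column name col_clean is hypothetical. regex=True is passed explicitly because the default for str.replace changed to False in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({'col': ['sgra834', '%^$asgsg', '23hgfh*', 'sfg343^%adf']})

# DataFrame.replace: strips every non-letter character in all columns.
cleaned_all = df.replace('[^a-zA-Z]', '', regex=True)

# Series.str.replace: same pattern, restricted to the 'col' column.
df['col_clean'] = df['col'].str.replace('[^a-zA-Z]', '', regex=True)

print(cleaned_all['col'].tolist())  # ['sgra', 'asgsg', 'hgfh', 'sfgadf']
print(df['col_clean'].tolist())     # ['sgra', 'asgsg', 'hgfh', 'sfgadf']
```

Both calls are vectorized, so no explicit loop or index lookup is needed, which also sidesteps the KeyError from the question.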