Python and Pandas: Appending data to a new column

Question

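Using Python and Pandas, I am writing a script that passes text data from a csv through the pylanguagetool library to count the number of grammatical errors in the text. The script runs successfully, but it appends the data to the end of the csv instead of writing it to a new column.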

The structure of the csv is:

The working code is:

import pandas as pd
from pylanguagetool import api

df = pd.read_csv("Streamlit\stack.csv")

text_data = df["text"].fillna('')
length1 = len(text_data)

for i, x in enumerate(range(length1)):
    # this is the pylanguagetool operation
    errors = api.check(text_data, api_url='https://languagetool.org/api/v2/', lang='en-US')
    result = str(errors)
    # this pulls the error count "message" from the pylanguagetool json
    error_count = result.count("message")
    output_df = pd.DataFrame({"error_count": [error_count]})
    output_df.to_csv("Streamlit\stack.csv", mode="a", header=(i == 0), index=False)

The output is:

Expected output:

What changes are needed so that the output is added like this, as a new column?

Answer 1

Instead of a loop, you could use Series.apply with a lambda, which does what you want in a single line:

df["error_count"] = df["text"].fillna("").apply(lambda x: len(api.check(x, api_url='https://languagetool.org/api/v2/', lang='en-US')["matches"]))

>>> df
   user_id  ... error_count
0       10  ...           2
1       11  ...           0
2       12  ...           0
3       13  ...           0
4       14  ...           0
5       15  ...           2
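To try the apply-plus-lambda pattern without calling the LanguageTool server, here is a minimal sketch. fake_check is a hypothetical stand-in for api.check: it only flags doubled words, but it returns a dict with a "matches" list, which is the shape the one-liner relies on. The sample rows are made up.

```python
import pandas as pd


def fake_check(text):
    """Hypothetical stand-in for api.check: one 'match' per doubled word."""
    words = text.lower().split()
    matches = [
        {"message": f"Possible typo: you repeated the word '{w}'"}
        for w, nxt in zip(words, words[1:])
        if w == nxt
    ]
    return {"matches": matches}


df = pd.DataFrame({
    "user_id": [10, 11, 12],
    "text": ["the the cat", None, "a fine day day"],
})

# apply produces one value per row, which is assigned directly as a new column;
# no file is appended to at any point.
df["error_count"] = df["text"].fillna("").apply(lambda x: len(fake_check(x)["matches"]))
print(df)
```

Because the counts are stored in the DataFrame first, writing the file afterwards is a single to_csv call.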

Edit:

You can write the result above to the .csv file with:

df.to_csv(r"Streamlit\stack.csv", index=False)

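You don't want to use mode="a", because it opens the file in append mode, so each count is added as a new line at the end of the file. You want the (default) write mode, which overwrites the file with the whole DataFrame, including the new error_count column.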
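A minimal sketch of the difference, using a small made-up file in a temporary directory in place of stack.csv and hard-coded counts in place of the LanguageTool results. The first part mimics the question's loop with mode="a"; the second part adds a column and rewrites the file in the default mode.

```python
import os
import tempfile

import pandas as pd

# A small made-up file standing in for stack.csv
path = os.path.join(tempfile.mkdtemp(), "stack.csv")
pd.DataFrame({"user_id": [10, 11], "text": ["foo bar", "baz"]}).to_csv(path, index=False)

# What the question's loop does: append one-row frames with mode="a"
for i, count in enumerate([2, 0]):
    pd.DataFrame({"error_count": [count]}).to_csv(path, mode="a", header=(i == 0), index=False)

with open(path) as f:
    appended_lines = f.read().splitlines()
print(appended_lines)  # the counts end up as extra lines after the data rows

# The fix: add a column to the DataFrame and rewrite the file in the default mode ("w")
df = pd.DataFrame({"user_id": [10, 11], "text": ["foo bar", "baz"]})
df["error_count"] = [2, 0]
df.to_csv(path, index=False)
rewritten = pd.read_csv(path)
print(rewritten)
```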

Answer 2

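My approach would be to collect the error counts in a list, then add that list as a separate column of the original DataFrame, and finally write that DataFrame to csv: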

import pandas as pd
from pylanguagetool import api

df = pd.read_csv(r"Streamlit\stack.csv")

text_data = df["text"].fillna('')
error_count_lst = []
for text in text_data:
    # check one row's text at a time, not the whole column
    errors = api.check(text, api_url='https://languagetool.org/api/v2/', lang='en-US')
    # each detected error is one entry in the "matches" list of the response
    error_count_lst.append(len(errors["matches"]))

# add the counts as a new column of the original DataFrame, then write it out
df['error_count'] = error_count_lst
df.to_csv('file.csv', index=False)
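The list approach works as long as the list holds exactly one count per row, in row order. The minimal sketch below shows that, again with a hypothetical fake_check in place of api.check (same assumed "matches" shape) and made-up rows, and also shows that pandas rejects a list whose length does not match the number of rows. The final to_csv step is left out so the sketch does not write files.

```python
import pandas as pd


def fake_check(text):
    """Hypothetical stand-in for api.check: one 'match' per doubled word."""
    words = text.lower().split()
    return {"matches": [{"message": "repeated word"} for w, nxt in zip(words, words[1:]) if w == nxt]}


df = pd.DataFrame({"user_id": [10, 11, 12], "text": ["the the cat", None, "a fine day day"]})

# Collect exactly one count per row, in row order
error_count_lst = []
for text in df["text"].fillna(""):
    error_count_lst.append(len(fake_check(text)["matches"]))

# The list lines up with the rows, so it becomes a new column of the DataFrame
df["error_count"] = error_count_lst

# A list whose length does not match the number of rows raises ValueError
mismatch_error = None
try:
    df["bad"] = [0]
except ValueError as exc:
    mismatch_error = str(exc)

print(df)
print(mismatch_error)
```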

Answer 1: 4 votes, accepted, posted 2021-07-26 13:43:47

Answer 2: 1 vote, posted 2021-07-26 13:41:56