Python and Pandas: Appending data to a new column

Question

With Python and Pandas, I'm writing a script that passes text data from a csv through the pylanguagetool library to calculate the number of grammatical errors in a text. The script runs successfully, but it appends the data to the end of the csv instead of to a new column.

The structure of the csv is:

The working code is:

import pandas as pd
from pylanguagetool import api

df = pd.read_csv("Streamlit\stack.csv")

text_data = df["text"].fillna('')
length1 = len(text_data)

for i, x in enumerate(range(length1)):
    # this is the pylanguagetool operation
    errors = api.check(text_data, api_url='https://languagetool.org/api/v2/', lang='en-US')
    result = str(errors)
    # this pulls the error count "message" from the pylanguagetool json
    error_count = result.count("message")
    output_df = pd.DataFrame({"error_count": [error_count]})
    output_df.to_csv("Streamlit\stack.csv", mode="a", header=(i == 0), index=False)

The output is:

Expected output:

What changes are necessary to append the output like this?

Answer 1

Instead of using a loop, you might consider apply with a lambda, which accomplishes what you want in one line:

df["error_count"] = df["text"].fillna("").apply(lambda x: len(api.check(x, api_url='https://languagetool.org/api/v2/', lang='en-US')["matches"]))

>>> df
   user_id  ... error_count
0       10  ...           2
1       11  ...           0
2       12  ...           0
3       13  ...           0
4       14  ...           0
5       15  ...           2
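
To try the same apply pattern without calling the LanguageTool service, here is a minimal sketch. The count_errors function and the sample rows are hypothetical stand-ins (they are not part of pylanguagetool or the original csv); only the way the new column is assigned mirrors the line above.

```python
import pandas as pd


def count_errors(text):
    # Hypothetical stand-in for the api.check(...) call: it counts the
    # misspelling "teh" so the example runs offline.
    return text.lower().split().count("teh")


# Made-up sample data with the same column names as the question
df = pd.DataFrame(
    {
        "user_id": [10, 11, 12],
        "text": ["teh cat sat on teh mat", None, "all good"],
    }
)

# fillna("") turns missing texts into empty strings before the function runs;
# assigning the result to df["error_count"] creates the new column
df["error_count"] = df["text"].fillna("").apply(count_errors)
print(df)
```

Because apply returns one value per row, the result lines up with the existing index and can be assigned directly as a column.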

Edit:

You can write the above to a .csv file with:

df.to_csv(r"Streamlit\stack.csv", index=False)

You don't want to use mode="a", as that opens the file in append mode, whereas you want the default write mode (mode="w").
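
The difference between the two modes can be seen in a small sketch. It assumes a throwaway file in a temporary directory and made-up rows, not the original stack.csv.

```python
import os
import tempfile

import pandas as pd

# Throwaway csv with the same kind of layout as the question (made-up rows)
path = os.path.join(tempfile.mkdtemp(), "stack.csv")
pd.DataFrame({"user_id": [10, 11], "text": ["a", "b"]}).to_csv(path, index=False)

df = pd.read_csv(path)
df["error_count"] = [2, 0]

# mode="a" writes the new frame as extra rows below the existing content
pd.DataFrame({"error_count": [2, 0]}).to_csv(path, mode="a", header=True, index=False)
with open(path) as f:
    appended = f.read().splitlines()

# The default mode="w" replaces the file with the full DataFrame,
# so error_count ends up as a column next to the others
df.to_csv(path, index=False)
rewritten = pd.read_csv(path)

print(appended)
print(rewritten)
```

Appending can only add rows at the end of a file, which is why a new column requires rewriting the whole csv.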

Answer 2

My strategy would be to keep the error counts in a list, then create a separate column in the original DataFrame, and finally write that DataFrame to csv:

text_data = df["text"].fillna('')
error_count_lst = []
for text in text_data:
    # check one text at a time rather than passing the whole Series
    errors = api.check(text, api_url='https://languagetool.org/api/v2/', lang='en-US')
    # each entry in "matches" is one reported error
    error_count_lst.append(len(errors["matches"]))

# add the counts as a new column on the DataFrame, not on the text Series
df['error_count'] = error_count_lst
df.to_csv('file.csv', index=False)
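
The question counts the substring "message" in the stringified response, while Answer 1 takes the length of the "matches" list. The two can disagree, as the sketch below suggests. The response dictionary is hand-written and hypothetical: it only imitates the "matches" key that Answer 1 indexes, and the other fields are illustrative.

```python
# Hypothetical response, shaped like the dict that Answer 1 indexes with ["matches"]
response = {
    "matches": [
        {"message": "Possible spelling mistake found.", "offset": 0, "length": 3},
        {"message": "This message repeats a word.", "offset": 10, "length": 7},
    ]
}

# Counting list entries gives one count per reported error
count_by_list = len(response["matches"])

# Counting the substring also picks up the word "message" inside the error text
count_by_string = str(response).count("message")

print(count_by_list, count_by_string)
```

Here the list length is 2 but the substring count is 3, so counting entries in "matches" is likely the safer choice.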

Answer 1: score 4, accepted, 2021-07-26 13:43:47
Answer 2: score 1, 2021-07-26 13:41:56
