Python & Pandas: How to address NaN values in a loop?
With Python and Pandas I'm seeking to take values from CSV cells and write them as txt files via a loop. The structure of the CSV file is:
user_id, text, text_number
0, test text A, text_0
1,
2,
3,
4,
5, test text B, text_1
The script below successfully writes a txt file for the first row - it is named text_0.txt and contains test text A.
import pandas as pd
df = pd.read_csv("test.csv", sep=",")
for index in range(len(df)):
    with open(df["text_number"][index] + '.txt', 'w') as output:
        output.write(df["text"][index])
However, I receive an error when it proceeds to the next row:
TypeError: write() argument must be str, not float
I'm guessing the error is generated when it encounters values it reads as NaN. I attempted to add the dropna feature per the pandas documentation like so:
import pandas as pd
df = pd.read_csv("test.csv", sep=",")
df2 = df.dropna(axis=0, how='any')
for index in range(len(df)):
    with open(df2["text_number"][index] + '.txt', 'w') as output:
        output.write(df2["text"][index])
However, the same issue persists - a txt file is created for the first row, but a new error message is returned for the next row: KeyError: 1.
Any suggestions? All assistance greatly appreciated.
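The issue here is that range(len(df)) generates the labels 0, 1, 2, ..., which are not necessarily present in the data frame's index. dropna removes rows but keeps the original index labels, so with the sample data df2 only has the labels 0 and 5, and df2["text"][1] raises KeyError: 1. For your use case, you can just iterate through the rows of the data frame and write to the file.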
for t in df.itertuples():
    # NaN is truthy, so test for a missing text_number explicitly
    if pd.notna(t.text_number):
        with open(t.text_number + '.txt', 'w') as output:
            output.write(str(t.text))
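For anyone who wants to try this without the original file, here is a minimal self-contained sketch. It assumes a CSV laid out like the one in the question; CSV_TEXT, write_text_files and the temporary output directory are hypothetical names made up for the illustration, not part of the original script. It also prints nothing by design: it just shows that dropna keeps the original index labels (which is where KeyError: 1 came from) and that iterating over the remaining rows avoids label lookups entirely.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for test.csv, laid out like the sample in the question.
CSV_TEXT = """user_id,text,text_number
0,test text A,text_0
1,,
2,,
3,,
4,,
5,test text B,text_1
"""


def write_text_files(df, out_dir):
    """Write one .txt file per row that has both a text and a text_number."""
    # Drop rows where either column is NaN; the surviving rows keep their
    # original index labels, so we iterate over rows instead of using labels.
    complete = df.dropna(subset=["text", "text_number"])
    written = []
    for row in complete.itertuples(index=False):
        path = Path(out_dir) / f"{row.text_number}.txt"
        path.write_text(str(row.text))
        written.append(path.name)
    return written


df = pd.read_csv(io.StringIO(CSV_TEXT))

# dropna() does not renumber rows: only labels 0 and 5 remain, so a lookup
# with label 1 (as produced by range(len(df))) fails with KeyError.
remaining_labels = df.dropna().index.tolist()

out_dir = tempfile.mkdtemp()  # assumed scratch location for the output files
written = write_text_files(df, out_dir)
```

If you would rather keep the original range-based loop, calling reset_index(drop=True) on the result of dropna and looping over range(len(df2)) should also work, since the labels are then renumbered from 0.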