How do I save my list to a dataframe keeping empty rows?
I'm trying to extract subject-verb-object triplets and then attach an ID. I am using a loop, so my list of extracted triplets also keeps an (empty) result for the rows where no triplet was found. So it looks like:
[]
[trump,carried,energy]
[]
[clinton,doesn't,trust]
When I print mylist, it looks as expected.
However, when I try to create a dataframe from mylist, I get an error caused by the empty rows: `IndexError: list index out of range`.
I tried to include an if statement to avoid this, but the problem is the same. I also tried using reindex instead, but df2 came out empty.
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import spacy
import textacy
import csv, string, re
import numpy as np
import pandas as pd
#Import csv file with pre-processing already carried out
import pandas as pd
df = pd.read_csv("pre-processed_file_1.csv", sep=",")
#Prepare dataframe to be relevant columns and unicode
df1 = df[['text_1', 'id']].copy()
import StringIO
s = StringIO.StringIO()
tweets = df1.to_csv(encoding='utf-8');
nlp = spacy.load('en')
count = 0;
df2 = pd.DataFrame();
for row in df1.iterrows():
    doc = nlp(unicode(row));
    text_ext = textacy.extract.subject_verb_object_triples(doc);
    tweetID = df['id'].tolist();
    mylist = list(text_ext)
    count = count + 1;
    if (mylist):
        df2 = df2.append(mylist, ignore_index=True)
    else:
        df2 = df2.append('0','0','0')
Any help would be very appreciated. Thank you!
You're supposed to pass a DataFrame-shaped object to append, for example a list of rows (a list of lists); passing the raw values doesn't work. So:
df2 = df2.append([['0', '0', '0']], ignore_index=True)
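A minimal sketch of why the extra brackets matter, using made-up data: a flat list is read as one column, while a list containing one list is read as one row. `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so this sketch uses `pd.concat`, which does the same job on current versions. The column names `subject`, `verb`, `object` are assumptions for illustration.

```python
import pandas as pd

COLUMNS = ["subject", "verb", "object"]  # hypothetical column names

# A flat list is read as a single column: 3 rows x 1 column.
flat = pd.DataFrame(["0", "0", "0"])

# Wrapping it in an outer list makes it one row: 1 row x 3 columns.
row = pd.DataFrame([["0", "0", "0"]], columns=COLUMNS)

# A frame holding one extracted triple (example values from the question).
found = pd.DataFrame([["trump", "carried", "energy"]], columns=COLUMNS)

# pd.concat stands in for the removed DataFrame.append;
# ignore_index=True renumbers the combined rows 0..n-1.
df2 = pd.concat([found, row], ignore_index=True)

print(flat.shape, row.shape)
print(df2)
```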
You can also wrap your processing in a function process_row, then do:
df2 = pd.DataFrame([process_row(row) for row in df1.iterrows()])
Note that while append won't work with empty rows, the DataFrame constructor just fills them in with None.
If you want empty rows to be ['0','0','0'], you have several options:
- Have your processing function return ['0','0','0'] for empty rows
- Change the list comprehension to [process_row(row) if process_row(row) else ['0','0','0'] for row in df1.iterrows()] (note that this calls process_row twice per row)
- Do df2 = df2.fillna('0')
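A sketch of the three options side by side, assuming a hypothetical process_row that takes a text string (rather than an iterrows() pair) and returns `[]` when no triple is found; the sample texts are made up. The second option is written as a variation that stores each result once instead of calling process_row twice. All three produce the same rows.

```python
import pandas as pd

COLUMNS = ["subject", "verb", "object"]  # hypothetical column names
PLACEHOLDER = ["0", "0", "0"]

def process_row(text):
    # Hypothetical extractor: first three words, or [] if there are fewer.
    words = text.split()
    return words[:3] if len(words) >= 3 else []

texts = ["hi", "trump carried energy", "ok"]

# Option 1: the processing function itself returns the placeholder.
def process_row_or_placeholder(text):
    return process_row(text) or list(PLACEHOLDER)

opt1 = pd.DataFrame([process_row_or_placeholder(t) for t in texts],
                    columns=COLUMNS)

# Option 2: substitute in the list comprehension, computing each result once.
results = [process_row(t) for t in texts]
opt2 = pd.DataFrame([r if r else PLACEHOLDER for r in results],
                    columns=COLUMNS)

# Option 3: let the constructor pad empty rows with missing values, then fill.
opt3 = pd.DataFrame([process_row(t) for t in texts],
                    columns=COLUMNS).fillna("0")

print(opt3)
```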