How do I save my list to a dataframe keeping empty rows?

I'm trying to extract subject-verb-object triplets and then attach an ID. I am using a loop, so my list of extracted triplets keeps an empty result for the rows where no triplet was found. So it looks like:

[]
[trump,carried,energy]
[]
[clinton,doesn't,trust]

When I print mylist, it looks as expected.

However, when I try to create a dataframe from mylist, I get an error caused by the empty rows:

`IndexError: list index out of range`.

I tried to include an if statement to avoid this, but the problem is the same. I also tried using reindex instead, but df2 came out empty.

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import spacy
import textacy
import csv, string, re
import numpy as np
import pandas as pd

#Import csv file with pre-processing already carried out
import pandas as pd
df = pd.read_csv("pre-processed_file_1.csv", sep=",")

#Prepare dataframe to be relevant columns and unicode
df1 = df[['text_1', 'id']].copy()
import StringIO
s = StringIO.StringIO()
tweets = df1.to_csv(encoding='utf-8');
nlp = spacy.load('en')

count = 0;
df2 = pd.DataFrame();
for row in df1.iterrows():
  doc = nlp(unicode(row));
  text_ext = textacy.extract.subject_verb_object_triples(doc);
  tweetID = df['id'].tolist();
  mylist = list(text_ext)
  count = count + 1;
  if (mylist):
        df2 = df2.append(mylist, ignore_index=True)
  else:
        df2 = df2.append('0','0','0')

Any help would be very appreciated. Thank you!

You're supposed to pass a DataFrame-shaped object to append. Passing the raw data doesn't work. So use df2=df2.append([['0','0','0']],ignore_index=True)
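For readers on a recent pandas: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the same idea is usually written with pd.concat. The following is only a sketch of that variant, not the original answer's code. The `results` list is a hypothetical stand-in for the per-row output of the extraction loop, and each row is wrapped in a list of lists so that it is DataFrame-shaped (one row, three columns).

```python
import pandas as pd

# Hypothetical stand-in for what the loop collects per tweet:
# either one subject-verb-object triple or an empty list.
results = [
    [],
    ["trump", "carried", "energy"],
    [],
    ["clinton", "doesn't", "trust"],
]

frames = []
for triple in results:
    # Substitute a placeholder row when nothing was extracted.
    row = triple if triple else ["0", "0", "0"]
    # [row] is a list of lists, i.e. a one-row, three-column frame.
    frames.append(pd.DataFrame([row]))

# Concatenate once at the end instead of growing df2 inside the loop.
df2 = pd.concat(frames, ignore_index=True)
print(df2)
```

Collecting the pieces and concatenating once also avoids copying the whole frame on every iteration, which is what repeated appending does.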

You can also wrap your processing in a function process_row, then do df2 = pd.DataFrame([process_row(row) for row in df1.iterrows()]). Note that while append won't work with empty rows, the DataFrame constructor just fills them in with None. If you want empty rows to be ['0','0','0'], you have several options:

- Have your processing function return ['0','0','0'] for empty rows
- Change the list comprehension to [process_row(row) if process_row(row) else ['0','0','0'] for row in df1.iterrows()]
- Do df2=df2.fillna('0')
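The constructor-based options above can be sketched roughly as follows. This is an illustration under assumptions, not the asker's pipeline: `extracted` is a hypothetical list standing in for the results of process_row (the spaCy/textacy step is left out so the snippet runs on its own), and the column names and ID values are made up for the example.

```python
import pandas as pd

# Hypothetical per-row results; an empty list means no triple was found.
extracted = [
    [],
    ["trump", "carried", "energy"],
    [],
    ["clinton", "doesn't", "trust"],
]
columns = ["subject", "verb", "object"]  # assumed names, not from the question

# Constructor only: the empty rows are padded with missing values.
raw = pd.DataFrame(extracted, columns=columns)

# Option: replace the missing values afterwards.
filled = raw.fillna("0")

# Option: substitute the placeholder while building the list. Using `or`
# avoids calling the processing step twice per row.
direct = pd.DataFrame(
    [triple or ["0", "0", "0"] for triple in extracted],
    columns=columns,
)

# Row order is preserved, so IDs can be attached positionally.
filled["id"] = [101, 102, 103, 104]  # made-up IDs for illustration
print(filled)
```

Because every input row produces exactly one output row, the IDs stay aligned with the tweets even when no triple was extracted.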
