
Pandas: Merge a dataframe column to a list

I am doing some text analysis with Python (NLTK, pandas) and need some help with my DataFrame. I am still a programming beginner.

I have a PoS-tagged DataFrame (1000 rows, 5 columns).

Column names: Number (this is the index), Id, Title, Question, Answers

Two example rows for Question:

[('I', 'PRON'), ('am', 'VERB'), ('working', 'VERB'),('website', 'NOUN')]
[('Would', 'VERB'), ('you', 'PRON'), ('recomme...)] 

Two example rows for Answers:

[('This', 'DET'), ('is', 'VERB'), ('not', 'ADV'),('website', 'NOUN')] 
[('There', 'DET'), ('is', 'VERB'), ('a', 'DET'...)] 

Goals:

1.) one list (not str) with all 1000 PoS-tagged Questions

2.) one list (not str) with all 1000 PoS-tagged Answers

3.) one list (not str) with all 1000 PoS-tagged Answers and Questions

What I have tried so far is to merge all rows in the Question column, but the result was a list of lists:

[[('I', 'PRON'), ('am', 'VERB'),..],[('Would', 'VERB'), 
('you', 'PRON'), ('recomme...)],[(.....)]]  

I guess I made a mistake when joining them. How can I do this correctly to get a single flat list that looks like this:

[('I', 'PRON'), ('am', 'VERB'), ('working', 'VERB'),.....]

for the complete column.

Edit after Beneres' answer:

Thanks for your quick answer. .sum() was the approach I tried before, but the result is:

print (df['Merged'])
0      [('Does', 'NOUN'), ('anyone', 'NOUN'), ('know'...
1      [('I', 'PRON'), ('am', 'VERB'), ('building', '...
2      [('I', 'PRON'), ('am', 'VERB'), ('wondering', ...
3      [('I', 'PRON'), ('am', 'VERB'), ('working', 'V...

What I need is:

print (df['Merged'])
0      [('Does', 'NOUN'), ('anyone', 'NOUN'), ('know'...
        ('I', 'PRON'), ('am', 'VERB'), ('building', '...
        ('I', 'PRON'), ('am', 'VERB'), ('wondering', ...
        ('I', 'PRON'), ('am', 'VERB'), ('working', 'V...]

Edit 2: solved

If I understood correctly, you just need to do:

df['Merged'] = df['Question'] + df['Answers']

which merges questions and answers row by row, and then do

df.sum()

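which merges (sums) all lists in each column. Adding two Python lists concatenates them, so summing a column of lists joins them into one list.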

Example:

import pandas as pd

df = pd.DataFrame({'Q':[[('I', 'PRON'), ('am', 'VERB')], [('You', 'PRON'), ('are', 'VERB')]], 
              'A':[[('This', 'DET'), ('is', 'VERB')], [('Sparta', 'NOUN'), ('bitch', 'VERB')]]})
df['Merged'] = df['A'] + df['Q']

then:

df.sum()

looks like this:

A         [(This, DET), (is, VERB), (Sparta, NOUN), (bit...
Q         [(I, PRON), (am, VERB), (You, PRON), (are, VERB)]
Merged    [(This, DET), (is, VERB), (I, PRON), (am, VERB...
dtype: object

I am not quite sure about the format for goal 3; please give more details if this is not what you want.
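As an alternative to summing, the same flattening can be sketched with itertools.chain from the standard library. This is only an illustration, not part of the answer above: the helper name flatten_rows and the sample rows are made up, and plain Python lists stand in for the DataFrame columns. It assumes every cell already holds a real list of (word, tag) tuples.

```python
from itertools import chain


def flatten_rows(rows):
    """Concatenate an iterable of per-row token lists into one flat list."""
    return list(chain.from_iterable(rows))


# Hypothetical stand-ins for df['Question'] and df['Answers'].
# Each inner list plays the role of one row (one cell of the column).
question_rows = [
    [('I', 'PRON'), ('am', 'VERB')],
    [('You', 'PRON'), ('are', 'VERB')],
]
answer_rows = [
    [('This', 'DET'), ('is', 'VERB')],
    [('There', 'DET'), ('is', 'VERB')],
]

all_questions = flatten_rows(question_rows)  # goal 1
all_answers = flatten_rows(answer_rows)      # goal 2
all_posts = all_questions + all_answers      # goal 3

print(all_questions)
```

Iterating over a pandas Series yields its cell values, so flatten_rows(df['Question']) should work the same way on a column of lists. Unlike repeated list addition, chain does not build an intermediate list for every row.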

I solved the problem in a weird way. I don't know if this is a good solution, but it works:

from ast import literal_eval

# concatenate all cells of a column (they are strings here) and
# replace the resulting "][" between rows with " ,"
# then turn the str into a list with literal_eval
allQuestions = literal_eval(dfQuestion.sum().replace("][", " ,"))
allAnswers = literal_eval(dfAnswers.sum().replace("][", " ,"))
allPosts = allQuestions + allAnswers

I hope this can help somebody else.
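The need for literal_eval suggests that the cells held strings such as "[('I', 'PRON')]" rather than real lists, which can happen after saving to and reloading from CSV. That is an assumption; the post does not say so. Under that assumption, a sketch that parses each cell separately avoids the string replacement and also copes with empty cells. The helper name flatten_string_rows and the sample rows are hypothetical, and plain lists of strings stand in for the columns.

```python
from ast import literal_eval
from itertools import chain


def flatten_string_rows(rows):
    """Parse each cell into a list of tuples, then flatten all rows."""
    # Cells that are already lists are passed through unchanged.
    parsed = (literal_eval(cell) if isinstance(cell, str) else cell
              for cell in rows)
    return list(chain.from_iterable(parsed))


# Hypothetical stand-ins for the Question and Answers columns,
# with each cell stored as the string form of a tagged sentence.
question_rows = [
    "[('I', 'PRON'), ('am', 'VERB')]",
    "[('Would', 'VERB'), ('you', 'PRON')]",
]
answer_rows = [
    "[('This', 'DET'), ('is', 'VERB')]",
    "[]",  # an empty cell parses to an empty list
]

all_questions = flatten_string_rows(question_rows)
all_answers = flatten_string_rows(answer_rows)
all_posts = all_questions + all_answers

print(all_posts)
```

With a DataFrame, the same helper could be called on a column, for example flatten_string_rows(dfQuestion), since iterating a Series yields its cell values.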
