
Pandas: Merge a dataframe column to a list

I am doing some text analysis with Python (NLTK, pandas) and need some help with my DataFrame. I am still a programming beginner.

I have a PoS-tagged DataFrame (1000 rows, 5 columns).

Column names: Number (this is the index), Id, Title, Question, Answers

Two example rows for Question:

[('I', 'PRON'), ('am', 'VERB'), ('working', 'VERB'),('website', 'NOUN')]
[('Would', 'VERB'), ('you', 'PRON'), ('recomme...)] 

Two example rows for Answers:

[('This', 'DET'), ('is', 'VERB'), ('not', 'ADV'),('website', 'NOUN')] 
[('There', 'DET'), ('is', 'VERB'), ('a', 'DET'...)] 

Goals:

1.) one list (not str) with all 1000 PoS Tagged Questions

2.) one list (not str) with all 1000 PoS Tagged Answers

3.) one list (not str) with all 1000 PoS Tagged Answers and Questions

So far I have tried to merge all rows in the Question column, but the result was a list of lists:

[[('I', 'PRON'), ('am', 'VERB'),..],[('Would', 'VERB'), 
('you', 'PRON'), ('recomme...)],[(.....)]]  

I guess I made a mistake when joining them. How can I do this correctly to get a flat list that looks like this:

[('I', 'PRON'), ('am', 'VERB'), ('working', 'VERB'),.....]

for the complete column.

Edit after Beneres' answer:

Thanks for your quick answer. .sum() is the approach I had tried before, but the result is:

print (df['Merged'])
0      [('Does', 'NOUN'), ('anyone', 'NOUN'), ('know'...
1      [('I', 'PRON'), ('am', 'VERB'), ('building', '...
2      [('I', 'PRON'), ('am', 'VERB'), ('wondering', ...
3      [('I', 'PRON'), ('am', 'VERB'), ('working', 'V...

What I need is:

print (df['Merged'])
0      [('Does', 'NOUN'), ('anyone', 'NOUN'), ('know'...
        ('I', 'PRON'), ('am', 'VERB'), ('building', '...
        ('I', 'PRON'), ('am', 'VERB'), ('wondering', ...
        ('I', 'PRON'), ('am', 'VERB'), ('working', 'V...]

Edit 2: solved

If I understood correctly, you just need to do:

df['Merged'] = df['Question'] + df['Answers']

which concatenates the question and answer lists row by row, and then call

df.sum()

which concatenates (sums) all the lists in each column.

Example:

import pandas as pd

df = pd.DataFrame({'Q':[[('I', 'PRON'), ('am', 'VERB')], [('You', 'PRON'), ('are', 'VERB')]], 
              'A':[[('This', 'DET'), ('is', 'VERB')], [('Sparta', 'NOUN'), ('bitch', 'VERB')]]})
df['Merged'] = df['A'] + df['Q']

then:

df.sum()

looks like this:

A         [(This, DET), (is, VERB), (Sparta, NOUN), (bit...
Q         [(I, PRON), (am, VERB), (You, PRON), (are, VERB)]
Merged    [(This, DET), (is, VERB), (I, PRON), (am, VERB...
dtype: object

I am not quite sure about the format you want for goal 3; please give more details if this is not what you need.
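As an alternative to summing, the three goals can also be sketched with itertools.chain, which flattens a column in a single pass. This is only a minimal sketch: it assumes every cell already holds a real Python list of (word, tag) tuples, and the two-row frame below is a hypothetical stand-in for the real data.

```python
from itertools import chain

import pandas as pd

# Hypothetical two-row frame standing in for the real 1000-row DataFrame.
df = pd.DataFrame({
    "Question": [[("I", "PRON"), ("am", "VERB")],
                 [("Would", "VERB"), ("you", "PRON")]],
    "Answers": [[("This", "DET"), ("is", "VERB")],
                [("There", "DET"), ("is", "VERB")]],
})

# Goals 1 and 2: flatten each column (a Series of lists) into one flat list.
all_questions = list(chain.from_iterable(df["Question"]))
all_answers = list(chain.from_iterable(df["Answers"]))

# Goal 3: one flat list holding every tagged token from both columns.
all_posts = all_questions + all_answers
```

If the cells are strings rather than lists, chain will iterate over single characters instead, so the column would need to be parsed first.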

I solved the problem in a weird way. I don't know if this is a good solution, but it works:

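from ast import literal_eval

# This relies on the cells being strings: .sum() then concatenates them into
# one long string of back-to-back list literals, "[...][...]".
# Replacing each "][" boundary with ", " leaves a single list literal,
# which literal_eval converts from str to an actual list.
allQuestions = literal_eval(dfQuestion.sum().replace("][", ", "))
allAnswers = literal_eval(dfAnswers.sum().replace("][", ", "))
allPosts = allQuestions + allAnswers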

I hope this can help somebody else.
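A possibly more robust variant of the same idea is to parse each cell on its own and only then flatten, which avoids depending on the exact "][" boundary text. This is a rough sketch under the assumption that each cell is a string containing one list literal (for example after a CSV round trip); the Series below is hypothetical sample data.

```python
from ast import literal_eval
from itertools import chain

import pandas as pd

# Hypothetical Series whose cells are strings, not lists.
questions = pd.Series([
    "[('I', 'PRON'), ('am', 'VERB')]",
    "[('Would', 'VERB'), ('you', 'PRON')]",
])

# Parse every cell from str to a list of tuples.
parsed = questions.apply(literal_eval)

# Flatten the parsed lists into one list of (word, tag) tuples.
all_questions = list(chain.from_iterable(parsed))
```

The same two steps would apply to the answers column, and the two flat lists can then be added together.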
