
Concatenating a series of strings into a single string within a Pandas Dataframe column (for each row)?

This is really throwing me for a loop. In a pandas dataframe (df) I have the following:

date        News
2021-02-03  Some random event occurred today.
2021-02-03  We asked a question on Stack Overflow.
2021-02-02  The weather is nice.
2021-02-02  Hello. World.

The date column is the index and holds dates; the News column holds strings. I want to combine the rows that share a date and concatenate their News values into one string per date, for example:

date        News
2021-02-03  Some random event occurred today. We asked a question on Stack Overflow.
2021-02-02  The weather is nice. Hello. World.

So far, I have:

df = df.groupby(['date']).agg({'News': list})

This does combine the duplicated dates, but it puts the string values in a list (or, judging by the errors I get when I try to join them, a Series) rather than a single string. At this point I am completely lost, and any hint toward the right Pythonic way of doing this would be greatly appreciated!
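
For reference, a minimal sketch of what the list aggregation returns, assuming the sample rows above with date as a DatetimeIndex. The names as_lists and joined are only for illustration. Each cell ends up holding a plain Python list, which Series.str.join can collapse into one string per date:

```python
import pandas as pd

# Sample data from the question; the date index is assumed to be a DatetimeIndex.
df = pd.DataFrame(
    {
        "News": [
            "Some random event occurred today.",
            "We asked a question on Stack Overflow.",
            "The weather is nice.",
            "Hello. World.",
        ]
    },
    index=pd.DatetimeIndex(
        ["2021-02-03", "2021-02-03", "2021-02-02", "2021-02-02"], name="date"
    ),
)

# The attempt from the question: one row per date, each cell a list of strings.
as_lists = df.groupby(["date"]).agg({"News": list})

# Joining the list in each cell with a space yields one string per date.
joined = as_lists["News"].str.join(" ")
print(joined)
```

Note that groupby sorts the group keys by default, so 2021-02-02 comes before 2021-02-03 in this result.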

PS: I would like to avoid a loop if at all possible, since this will run as a function over roughly 200k records multiple times. If it makes any difference, I'll be using TextBlob to perform sentiment analysis on the News column.

Quang Hoang answered the question perfectly! Although I'm not able to mark it as the answer sadly =(

df.groupby('date')['News'].agg(' '.join). – Quang Hoang Feb 8 at 15:08
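
A minimal runnable sketch of that one-liner, assuming the sample rows from the question with date as a DatetimeIndex. The sort=False argument and the names combined and result are additions for illustration, not part of the original comment; sort=False keeps the dates in order of first appearance, which matches the desired output shown in the question:

```python
import pandas as pd

# Sample data from the question; the date index is assumed to be a DatetimeIndex.
df = pd.DataFrame(
    {
        "News": [
            "Some random event occurred today.",
            "We asked a question on Stack Overflow.",
            "The weather is nice.",
            "Hello. World.",
        ]
    },
    index=pd.DatetimeIndex(
        ["2021-02-03", "2021-02-03", "2021-02-02", "2021-02-02"], name="date"
    ),
)

# Group on the index level named "date" and join each group's strings with a space.
# sort=False preserves the order in which each date first appears.
combined = df.groupby("date", sort=False)["News"].agg(" ".join)

# The result is a Series; convert back to a one-column DataFrame if needed.
result = combined.to_frame()
print(result)
```

Without sort=False the groups come back sorted by date in ascending order, so 2021-02-02 would be listed first.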

