Concatenating a series of strings into a single string within a Pandas DataFrame column (for each row)?
This is really throwing me for a loop. In a pandas DataFrame (df) I have the following:
date | News |
---|---|
2021-02-03 | Some random event occurred today. |
2021-02-03 | We asked a question on Stack Overflow. |
2021-02-02 | The weather is nice. |
2021-02-02 | Hello. World. |
The date column is the index and holds dates, and the News column holds strings. What I want to do is combine the duplicate dates and join (concatenate) their News values, for example:
date | News |
---|---|
2021-02-03 | Some random event occurred today. We asked a question on Stack Overflow. |
2021-02-02 | The weather is nice. Hello. World. |
So far, I have:
df = df.groupby(['date']).agg({'News': list})
However, while this does combine the duplicated dates, it puts the string values in a list (or rather, judging by the errors I've been getting while trying to join them, in a Series). At this point I am completely lost, and any hint or tip leading me to the right Pythonic way of doing this would be greatly appreciated!
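For reference, here is a minimal sketch of what the list aggregation produces and one way to join the resulting lists afterwards. The sample frame is reconstructed from the table above and is only an assumption about the real data; `Series.str.join` is used here to concatenate the strings inside each list.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed, not the real dataset).
df = pd.DataFrame(
    {
        "News": [
            "Some random event occurred today.",
            "We asked a question on Stack Overflow.",
            "The weather is nice.",
            "Hello. World.",
        ]
    },
    index=pd.DatetimeIndex(
        ["2021-02-03", "2021-02-03", "2021-02-02", "2021-02-02"], name="date"
    ),
)

# Aggregating with `list` leaves one Python list of strings per date.
as_lists = df.groupby("date").agg({"News": list})

# Series.str.join concatenates the strings inside each list element.
joined = as_lists["News"].str.join(" ")
print(joined)
```

Each cell of `as_lists["News"]` is a list, so a plain `' '.join(...)` applied to the whole column will not work; the join has to be applied per row, which is what `.str.join(" ")` does.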
PS: I would like to avoid using a loop if at all possible, since this will need to run over roughly 200k records multiple times (as a function). If it makes any difference, I'll be using TextBlob on the News column to perform sentiment analysis.
Quang Hoang answered the question perfectly! Sadly, I'm not able to mark it as the answer, though =(
df.groupby('date')['News'].agg(' '.join). – Quang Hoang Feb 8 at 15:08
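The one-liner from the comment can be sketched end to end as follows. The sample frame is reconstructed from the question's table, and the function name `combine_news` is hypothetical, chosen only for illustration. Passing `sort=False` is my addition, not part of the comment: it keeps the dates in order of first appearance, which matches the desired output shown in the question.

```python
import pandas as pd


def combine_news(frame: pd.DataFrame) -> pd.DataFrame:
    """Join all News strings that share a date into one string per date.

    Hypothetical helper; assumes the index is named 'date' and the
    column 'News' contains only strings.
    """
    return (
        frame.groupby("date", sort=False)["News"]  # one group per date
        .agg(" ".join)                             # join each group's strings
        .to_frame()                                # back to a one-column DataFrame
    )


# Sample data reconstructed from the question (assumed, not the real dataset).
df = pd.DataFrame(
    {
        "News": [
            "Some random event occurred today.",
            "We asked a question on Stack Overflow.",
            "The weather is nice.",
            "Hello. World.",
        ]
    },
    index=pd.DatetimeIndex(
        ["2021-02-03", "2021-02-03", "2021-02-02", "2021-02-02"], name="date"
    ),
)

combined = combine_news(df)
print(combined)
```

Without `sort=False`, `groupby` sorts the group keys, so 2021-02-02 would come first. Note that `" ".join` raises a `TypeError` if a group contains a non-string value such as NaN, so missing News entries would need to be dropped or filled beforehand.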