Concatenating a series of strings into a single string within a Pandas Dataframe column (for each row)?

This is really throwing me for a loop. In a pandas DataFrame (df) I have the following:

date        News
2021-02-03  Some random event occurred today.
2021-02-03  We asked a question on Stack Overflow.
2021-02-02  The weather is nice.
2021-02-02  Hello. World.

The date column is the index, which is of the date format, and the News column is a string. What I want to do is combine the duplicate dates and join (concatenate) the News column, for example:

date        News
2021-02-03  Some random event occurred today. We asked a question on Stack Overflow.
2021-02-02  The weather is nice. Hello. World.

So far, I have:

df = df.groupby(['date']).agg({'News': list})

However, while this does combine the duplicated dates, it puts the string values in a list or, judging by the errors I've been getting while trying to join them, in a Series. At this point I am completely lost, and any hint or tip leading me to the right Pythonic way of doing this would be greatly appreciated!
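As a minimal sketch of what the list aggregation above produces, the snippet below uses a two-row sample (an assumption standing in for the real data) and shows that each cell holds a plain Python list. Such a column can be flattened without an explicit loop using `Series.str.join`, which also works on list-valued cells.

```python
import pandas as pd

# Two-row sample with a date index, standing in for the real data (assumption).
df = pd.DataFrame(
    {"News": ["The weather is nice.", "Hello. World."]},
    index=pd.DatetimeIndex(["2021-02-02", "2021-02-02"], name="date"),
)

# Aggregating with `list` leaves one Python list per date.
grouped = df.groupby("date").agg({"News": list})
first_cell = grouped["News"].iloc[0]

# Series.str.join concatenates the elements of each list with the separator.
grouped["News"] = grouped["News"].str.join(" ")
joined_cell = grouped["News"].iloc[0]
```

Here `first_cell` is a list of two strings, while `joined_cell` is a single string.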

PS: I would like to avoid using a loop if at all possible, since this will need to parse through roughly 200k records multiple times (as a function). If it makes any difference, I'll be using TextBlob on the News column to perform sentiment analysis.

Quang Hoang answered the question perfectly! Sadly, I'm not able to mark it as the answer =(

df.groupby('date')['News'].agg(' '.join). – Quang Hoang, Feb 8 at 15:08
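The one-liner above can be tried on the sample data from the question. This is a sketch under stated assumptions: the frame is rebuilt by hand with a `DatetimeIndex` named `date`, and `sort=False` is an optional addition (not part of the original comment) that keeps the dates in order of first appearance, as in the desired output.

```python
import pandas as pd

# Sample rows copied from the question; the index is named "date" (assumption).
df = pd.DataFrame(
    {
        "News": [
            "Some random event occurred today.",
            "We asked a question on Stack Overflow.",
            "The weather is nice.",
            "Hello. World.",
        ]
    },
    index=pd.DatetimeIndex(
        ["2021-02-03", "2021-02-03", "2021-02-02", "2021-02-02"], name="date"
    ),
)

# Group on the index level "date" and join each group's strings with a space.
# sort=False preserves first-appearance order instead of sorting the dates.
combined = df.groupby("date", sort=False)["News"].agg(" ".join)

# Optional: turn the resulting Series back into a one-column DataFrame.
result = combined.to_frame()
```

Because the joining is done by `str.join` inside the aggregation, no hand-written Python loop over rows is needed, and `result` has one row per date with News as a single string.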

