New pandas DataFrame based on specific values (many of them) of an existing df

Question

Good evening. I'm using pandas in Jupyter Notebook. I have a huge dataframe representing the full post history of 26 channels in a messenger. It has a column "dialog_id" that records which dialog each message was sent in, so the column can hold only 26 unique values, but there are more than 700k rows. The df is sorted by time, not by id, so the ids appear in a fairly chaotic order. I have to split this dataframe into 2 different ones: one will contain the full history of 13 channels, and the other will contain the history of the remaining 13 channels. I know the ids by which I have to split. They are arbitrary as well; for example, one is -1001232032465 and another is -1001153765346.

The question is, how do I do this most elegantly and appropriately? I know I can do it somehow with df.loc[], but I don't want to write something like 13 lines of df.loc[]. I've tried to use logical operators for this, like: df1.loc[(df["dialog_id"] == '-1001708255880') & (df["dialog_id"] == '-1001645788710' )], but it doesn't work. I suppose I'm using them wrong. I'm looking for a solution, by any method, that creates a new df using logical operators. Put in words, I think it should sound like "put the row in a new df if the dialog_id is x, or dialog_id is y, or dialog_id is z, etc". Please help me!

Answer 1

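The easiest way seems to be just setting up a query. When a column is compared to a list with == inside query(), pandas treats it as a membership test, so a row matches if its value is any of the listed ids.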

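import pandas as pd

# Small stand-in for the real data: col_id plays the role of dialog_id
df = pd.DataFrame(dict(col_id=[1, 2, 3, 4], other=[5, 6, 7, 8]))

# The ids that belong to each half of the split
channel_groupA = [1, 2]
channel_groupB = [3, 4]

# The f-string renders as 'col_id == [1, 2]', which query() evaluates like isin()
df_groupA = df.query(f'col_id == {channel_groupA}')
df_groupB = df.query(f'col_id == {channel_groupB}')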
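
As for why the attempt in the question returns nothing: & requires both conditions to hold at once, and a single dialog_id can never equal two different ids, so the combined mask is always False. Chaining the comparisons with | would work, but for 13 ids Series.isin is shorter. Below is a minimal sketch of that alternative, not necessarily the only idiomatic way. The sample rows, the "text" column, and the names group_a_ids, in_group_a, df_group_a and df_group_b are made up for illustration, and it assumes the ids are stored as strings, as the quotes in the question suggest. If the column actually holds integers, the list should hold integers too.

```python
import pandas as pd

# Hypothetical sample data; ids are strings here, as in the question's attempt.
df = pd.DataFrame({
    "dialog_id": ["-1001232032465", "-1001153765346", "-1001708255880",
                  "-1001232032465", "-1001645788710"],
    "text": ["a", "b", "c", "d", "e"],
})

# Ids for the first group (the real list would hold 13 ids).
group_a_ids = ["-1001232032465", "-1001153765346"]

# Boolean mask: True where dialog_id is any of the listed ids.
in_group_a = df["dialog_id"].isin(group_a_ids)

# Rows matching the mask go to one frame, all remaining rows to the other.
df_group_a = df[in_group_a]
df_group_b = df[~in_group_a]
```

Negating the mask with ~ means the second list of 13 ids does not have to be typed out, and every row lands in exactly one of the two frames. The original time ordering is preserved in both.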

Solution 1
0 · Accepted · 2022-11-17 15:21:33
