How to separate a data frame based on a column's range of values with pandas?
This is a bit of a weird question, but I have been importing property data from an API as JSON in Python. I then use pandas to convert the JSON into a dataframe.
I am having trouble manipulating the data within the dataframe. My current data is formatted like this table.
Each property is assigned a name, a property id, and an address, and there is one record for every unit within a property. Ideally, I would like to create multiple dataframes separated by property id, such that it would look like this.
My only problem is that, due to some organizational issues, there are about 100 different property ids, and none of the ids are in order. Each one is a seemingly random number between 1 and 1000.
Is there a way to automatically separate the dataframe by property id, using some sort of unique identifier combined with a for loop?
I don't really know how to approach this. Thanks!
Try this:
list_of_dataframes = [x for _, x in df.groupby(df['Property Id'].ne(df['Property Id'].shift(1)).cumsum())]
Now list_of_dataframes is a list of dataframes, where each dataframe contains the rows where the Property Id was consecutively the same. So the Property Id sequence 1 1 1 9 9 9 1 1 1 would return 3 dataframes: one containing the first three 1's, the second containing the next three 9's, and the last containing the last three 1's.
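To see why the one-liner splits on consecutive runs, it can help to break it into steps. The following is a minimal sketch on made-up data: the Unit column and its values are assumptions for illustration, and only the Property Id column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the "Property Id" column name is from the question.
df = pd.DataFrame({
    "Property Id": [1, 1, 1, 9, 9, 9, 1, 1, 1],
    "Unit": list("ABCDEFGHI"),
})

# True wherever the id differs from the previous row. The first row is always
# True because shift(1) leaves NaN there, and NaN is not equal to any id.
starts_new_run = df["Property Id"].ne(df["Property Id"].shift(1))

# The cumulative sum of those flags gives every consecutive run its own label.
run_id = starts_new_run.cumsum()

# Grouping by the run label (not the id itself) keeps separate runs apart.
list_of_dataframes = [group for _, group in df.groupby(run_id)]

print(run_id.tolist())
print(len(list_of_dataframes))
```

Each element of list_of_dataframes keeps its original row index, so the rows can still be traced back to df.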
If you don't want the groups to be based on consecutive order (i.e., you want 1 1 1 9 9 9 1 1 1 to become two dataframes, the first containing all six 1's and the second containing the three 9's), you can do this:
list_of_dataframes = [x for _, x in df.groupby(df['Property Id'])]
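Since the question mentions roughly 100 unordered ids and a for loop, a dictionary keyed by property id may be more convenient than a list, because each sub-dataframe can then be looked up by its id. This is only a sketch on made-up data: the id values, the Unit column, and the name frames_by_id are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with unordered ids; only "Property Id" is from the question.
df = pd.DataFrame({
    "Property Id": [417, 23, 417, 902, 23],
    "Unit": ["1A", "2B", "1B", "3C", "2A"],
})

# Iterating over a groupby yields (key, sub-dataframe) pairs,
# so a dict comprehension maps each id to its own dataframe.
frames_by_id = {prop_id: group for prop_id, group in df.groupby("Property Id")}

# Loop over every property without knowing the ids in advance.
for prop_id, frame in frames_by_id.items():
    print(prop_id, len(frame))
```

The ids do not need to be sorted or contiguous; groupby discovers the distinct values on its own, so no manual list of unique identifiers is required.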