Iterate through a python dataframe with no unique values

I'm having trouble rearranging a dataframe in Python, built from a CSV file, into the shape I need. The data in the dataframe looks like this:

ID      VOLUME      DATETIME

900     2.36        11/01/2015 13:40
900     2.30        11/01/2015 13:40
900     2.18        11/01/2015 13:41
900     2.30        11/01/2015 13:41
901     1.88        07/01/2015 17:01
901     1.80        07/01/2015 17:01
901     1.73        07/01/2015 17:02
901     1.80        07/01/2015 17:02

I have tried all sorts of ways to pivot the above into the shape I need, but because the fields have no truly unique values I cannot do it. I have been thinking I need to use iterrows to get there, but I haven't been able to figure it out. This is how I'm looking to get the data:

    900↓    901↓

    2.36    1.88
    2.30    1.80
    2.18    1.73
    2.30    1.80

I am trying to display one column per item in the ID column, but I'm really starting to bang my head against the wall on this one. Can I create a new dataframe as above, or am I going about this the wrong way?

Solution for the case when your IDs have different numbers of rows:

In [34]: df
Out[34]:
    ID  VOLUME          DATETIME
0  900    2.36  11/01/2015 13:40
1  900    2.30  11/01/2015 13:40
2  900    2.18  11/01/2015 13:41
3  900    2.30  11/01/2015 13:41
4  901    1.88  07/01/2015 17:01
5  901    1.80  07/01/2015 17:01
6  901    1.73  07/01/2015 17:02
7  901    1.80  07/01/2015 17:02
8  901    1.11  07/01/2015 17:03   # NOTE: I've intentionally added this row

In [35]: pd.DataFrame({k : pd.Series(v)
                       for k, v in df.groupby('ID').VOLUME.apply(list).to_dict().items()})
Out[35]:
    900   901
0  2.36  1.88
1  2.30  1.80
2  2.18  1.73
3  2.30  1.80
4   NaN  1.11
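
For anyone who wants to reproduce this outside an IPython session, here is a minimal self-contained sketch of the same approach. The dataframe is rebuilt by hand from the sample above (the DATETIME column is left out because it is not used), and the names `volumes_by_id` and `wide` are only illustrative.

```python
import pandas as pd

# Sample data from the question, plus the extra 901 row added above.
df = pd.DataFrame({
    "ID": [900, 900, 900, 900, 901, 901, 901, 901, 901],
    "VOLUME": [2.36, 2.30, 2.18, 2.30, 1.88, 1.80, 1.73, 1.80, 1.11],
})

# Collect the VOLUME values of each ID into a plain list: {900: [...], 901: [...]}.
volumes_by_id = df.groupby("ID")["VOLUME"].apply(list).to_dict()

# Wrapping each list in a Series lets pandas align the columns on a
# 0..n-1 index and pad the shorter column with NaN.
wide = pd.DataFrame({k: pd.Series(v) for k, v in volumes_by_id.items()})

print(wide)
```

The `pd.Series` wrapper is what makes the unequal lengths acceptable: the columns are aligned by index label rather than required to have the same length.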

OLD answer:

Try this:

In [12]: pd.DataFrame(df.groupby('ID').VOLUME.apply(list).to_dict())
Out[12]:
    900   901
0  2.36  1.88
1  2.30  1.80
2  2.18  1.73
3  2.30  1.80

Or:

In [18]: pd.DataFrame(df.groupby('ID').VOLUME.apply(lambda x: x.values).to_dict())
Out[18]:
    900   901
0  2.36  1.88
1  2.30  1.80
2  2.18  1.73
3  2.30  1.80

NOTE: this works when all of your IDs have the same number of rows.
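
As a possible alternative that does not depend on equal row counts, the rows can be numbered within each ID and then pivoted on that number. This is a different technique from the ones shown above, offered only as a sketch: it uses `groupby(...).cumcount()` and `DataFrame.pivot`, and the helper column name `pos` is made up for illustration.

```python
import pandas as pd

# Sample data from the question, with one extra row for ID 901.
df = pd.DataFrame({
    "ID": [900, 900, 900, 900, 901, 901, 901, 901, 901],
    "VOLUME": [2.36, 2.30, 2.18, 2.30, 1.88, 1.80, 1.73, 1.80, 1.11],
})

# Number the rows inside each ID group (0, 1, 2, ...). Together with ID,
# this position forms the unique key that a pivot needs.
pos = df.groupby("ID").cumcount()

# One column per ID, one row per position; missing cells become NaN.
wide = df.assign(pos=pos).pivot(index="pos", columns="ID", values="VOLUME")

print(wide)
```

The original problem was that no column was unique enough to pivot on; the per-group counter supplies that uniqueness without changing the order of the values.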
