How do I use Python pandas to parse a CSV into the desired format?

Question

I'm new to Python pandas. I have a CSV file like this:

insectName   count   weather  location   time        date      Condition
  aaa         15      sunny   balabala  0900:1200   1990-02-10     25
  bbb         10      sunny   balabala  0900:1200   1990-02-10     25
  ccc         20      sunny   balabala  0900:1200   1990-02-10     25
  ddd         50      sunny   balabala  0900:1200   1990-02-10     25
  ...        ...      ...      ...        ...            ...       ...
  XXX         40      sunny   balabala  1300:1500   1990-02-15     38
  yyy         10      sunny   balabala  1300:1500   1990-02-15     38
  yyy         25      sunny   balabala  1300:1500   1990-02-15     38

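The file contains a lot of data, and an insectName can repeat within a single day. I want to reshape the data by "date" so that each day occupies a single row, like this: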

insectName  count  insectName  count  insectName  count  weather  location  time        date      Condition
  ccc         20      bbb       10       aaa        15    sunny   balabala  0900:1200   1990-02-10     25
  yyy         25      yyy       10       XXX        40    sunny   balabala  1300:1500   1990-02-15     38
  ...        ...      ...      ...       ...        ...    ...      ...        ...            ...        ...

How can I do this?

Answer 1

There is a groupby/cumcount/unstack trick for converting a long-format DataFrame into a wide-format one:

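import pandas as pd

# Read the whitespace-delimited file.
df = pd.read_table('data', sep=r'\s+')

common = ['weather', 'location', 'time', 'date', 'Condition']
grouped = df.groupby(common)
# Number the rows within each group: 0, 1, 2, ...
df['idx'] = grouped.cumcount()
df2 = df.set_index(common+['idx'])
# Pivot idx into the columns, giving (field, idx) column pairs.
df2 = df2.unstack('idx')
df2 = df2.swaplevel(0, 1, axis=1)
# Order the columns by idx only, keeping insectName ahead of count.
df2 = df2.sort_index(axis=1, level=0, sort_remaining=False)
df2.columns = df2.columns.droplevel(0)
df2 = df2.reset_index()
print(df2)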

yields

  weather  location       time        date  Condition insectName  count  \
0   sunny  balabala  0900:1200  1990-02-10         25        aaa     15   
1   sunny  balabala  1300:1500  1990-02-15         38        XXX     40   

  insectName  count insectName  count insectName  count  
0        bbb     10        ccc     20        ddd     50  
1        yyy     10        yyy     25        NaN    NaN
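
For readers who want to try the idea without a file on disk, here is a minimal self-contained sketch of the same groupby/cumcount/unstack steps. The inline sample and the suffixed column names (insectName_0, count_0, ...) are assumptions made for illustration; the answer above keeps the repeated insectName/count names instead.

```python
import io
import pandas as pd

# Hypothetical inline sample standing in for the 'data' file.
raw = """insectName count weather location time date Condition
aaa 15 sunny balabala 0900:1200 1990-02-10 25
bbb 10 sunny balabala 0900:1200 1990-02-10 25
ccc 20 sunny balabala 0900:1200 1990-02-10 25
XXX 40 sunny balabala 1300:1500 1990-02-15 38
yyy 10 sunny balabala 1300:1500 1990-02-15 38
"""
df = pd.read_csv(io.StringIO(raw), sep=r'\s+')

common = ['weather', 'location', 'time', 'date', 'Condition']
# Number the rows inside each group: 0, 1, 2, ...
df['idx'] = df.groupby(common).cumcount()

# Pivot idx into the columns; the result has (field, idx) column pairs.
wide = df.set_index(common + ['idx'])[['insectName', 'count']].unstack('idx')
# Flatten each pair into a unique name such as insectName_0.
wide.columns = [f'{field}_{i}' for field, i in wide.columns]

# Interleave the pairs: insectName_0, count_0, insectName_1, count_1, ...
n = int(df['idx'].max()) + 1
order = [f'{field}_{i}' for i in range(n) for field in ('insectName', 'count')]
wide = wide[order].reset_index()
print(wide)
```

Unique column names make later selection unambiguous (wide['count_1'] returns a single column), at the cost of differing from the layout requested in the question.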

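Although the wide format may be useful for presentation, note that the long format is usually the right format for data processing. See Hadley Wickham's article (PDF) on the benefits of tidy data.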
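
As a rough illustration of why the long format tends to be easier to process, the sketch below aggregates a small, hypothetical long-format sample (a subset of the question's columns) with a single groupby each time. The variable names are assumptions for this example only.

```python
import pandas as pd

# Hypothetical long-format sample mirroring some of the question's columns.
long_df = pd.DataFrame({
    'insectName': ['aaa', 'bbb', 'XXX', 'yyy', 'yyy'],
    'count': [15, 10, 40, 10, 25],
    'date': ['1990-02-10', '1990-02-10',
             '1990-02-15', '1990-02-15', '1990-02-15'],
})

# Total number of insects observed per day.
per_day = long_df.groupby('date')['count'].sum()

# Total per insect per day; the repeated name (yyy) is merged into one row.
per_insect = long_df.groupby(['date', 'insectName'])['count'].sum()

print(per_day)
print(per_insect)
```

In the wide layout the same totals would require adding up several count columns that share one name, which is more awkward to express.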

Solution 1
0 Accepted 2015-02-25 14:43:10
