How can I use python pandas to parse CSV into the format I want?
I am new to Python pandas. I have a CSV file like this:
insectName count weather location time date Condition
aaa 15 sunny balabala 0900:1200 1990-02-10 25
bbb 10 sunny balabala 0900:1200 1990-02-10 25
ccc 20 sunny balabala 0900:1200 1990-02-10 25
ddd 50 sunny balabala 0900:1200 1990-02-10 25
... ... ... ... ... ... ...
XXX 40 sunny balabala 1300:1500 1990-02-15 38
yyy 10 sunny balabala 1300:1500 1990-02-15 38
yyy 25 sunny balabala 1300:1500 1990-02-15 38
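The file contains a lot of data, and the same insectName can appear more than once per day. I want to reshape the data so that each date occupies a single row, like this: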
insectName count insectName count insectName count weather location time date Condition
ccc 20 bbb 10 aaa 15 sunny balabala 0900:1200 1990-02-10 25
yyy 25 yyy 10 XXX 40 sunny balabala 1300:1500 1990-02-15 38
... ... ... ... ... ... ... ... ... ... ...
How can I do this?
There is a groupby/cumcount/unstack trick that converts a long-format DataFrame into a wide-format DataFrame:
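import pandas as pd

# 'data' is the path to the whitespace-delimited input file
df = pd.read_table('data', sep=r'\s+')
common = ['weather', 'location', 'time', 'date', 'Condition']
grouped = df.groupby(common)
# Number the rows within each group: 0, 1, 2, ...
df['idx'] = grouped.cumcount()
df2 = df.set_index(common+['idx'])
# Move idx into the columns, giving (column name, idx) pairs
df2 = df2.unstack('idx')
df2 = df2.swaplevel(0, 1, axis=1)
# Sort on idx only, so each insectName stays next to its count.
# sort_index replaces sortlevel, which recent pandas versions no longer provide.
df2 = df2.sort_index(axis=1, level=0, sort_remaining=False)
df2.columns = df2.columns.droplevel(0)
df2 = df2.reset_index()
print(df2)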
This yields:
weather location time date Condition insectName count \
0 sunny balabala 0900:1200 1990-02-10 25 aaa 15
1 sunny balabala 1300:1500 1990-02-15 38 XXX 40
insectName count insectName count insectName count
0 bbb 10 ccc 20 ddd 50
1 yyy 10 yyy 25 NaN NaN
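The wide result above ends up with several columns that share the names insectName and count, which makes later column selection ambiguous. As a minimal sketch of the same groupby/cumcount/unstack idea, the version below reads a small inline sample instead of a file and flattens the column labels into unique names. The `RAW` sample, the `long_to_wide` helper and the `insectName_0` / `count_0` naming scheme are assumptions made for illustration, not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for the question's file.
RAW = """\
insectName count weather location time date Condition
aaa 15 sunny balabala 0900:1200 1990-02-10 25
bbb 10 sunny balabala 0900:1200 1990-02-10 25
ccc 20 sunny balabala 0900:1200 1990-02-10 25
ddd 50 sunny balabala 0900:1200 1990-02-10 25
XXX 40 sunny balabala 1300:1500 1990-02-15 38
yyy 10 sunny balabala 1300:1500 1990-02-15 38
yyy 25 sunny balabala 1300:1500 1990-02-15 38
"""

COMMON = ["weather", "location", "time", "date", "Condition"]


def long_to_wide(df):
    """Return one row per COMMON group with insectName_N / count_N columns."""
    out = df.copy()
    # Position of each row within its group: 0, 1, 2, ...
    out["idx"] = out.groupby(COMMON).cumcount()
    wide = out.set_index(COMMON + ["idx"]).unstack("idx")
    # Spell out the column order so each name sits next to its count.
    n = int(out["idx"].max()) + 1
    order = [(col, i) for i in range(n) for col in ("insectName", "count")]
    wide = wide.loc[:, order]
    # Flatten ('insectName', 0) into 'insectName_0' so names stay unique.
    wide.columns = [f"{col}_{i}" for col, i in wide.columns]
    return wide.reset_index()


df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")
wide = long_to_wide(df)
print(wide)
```

Groups with fewer insects than the largest group are padded with NaN, so the count columns may come back as floats rather than integers.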
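Although the wide format may be useful for presentation, note that the long format is usually the right format for data processing. See Hadley Wickham's article (PDF) on the virtues of tidy data.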
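To illustrate why the long format tends to be easier to process, here is a small sketch of two typical aggregations done directly on long data. The inline DataFrame is a hypothetical sample that mirrors the question's columns; only the columns needed for the example are included.

```python
import pandas as pd

# Hypothetical long-format sample mirroring the question's columns.
df = pd.DataFrame({
    "insectName": ["aaa", "bbb", "ccc", "ddd", "XXX", "yyy", "yyy"],
    "count": [15, 10, 20, 50, 40, 10, 25],
    "date": ["1990-02-10"] * 4 + ["1990-02-15"] * 3,
})

# Total insects counted per day.
per_day = df.groupby("date")["count"].sum()

# Total per insect per day; repeated names such as 'yyy' are combined.
per_insect = df.groupby(["date", "insectName"])["count"].sum()

print(per_day)
print(per_insect)
```

In the wide layout the same totals would require collecting every count column first, and the number of such columns changes with the data.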