Pandas dataframe create new column after x rows

I am trying to create a new DataFrame from some data in a CSV file.

My data has the following form:

1, 81.99525117808678
2, 78.79210736916842
3, 69.33703048261454
4, 53.12612416937101
5, 48.8442549498639
6, 48.8442549498639
7, 38.96011640562207
8, 33.66251691693962
9, 29.202159649144907
10, 27.77726568480279
1, 81.99525117808678
2, 78.79210736916842
3, 69.33703048261454
4, 53.12612416937101
5, 48.8442549498639
6, 48.8442549498639
7, 38.96011640562207
8, 33.66251691693962
9, 29.202159649144907
10, 27.77726568480279

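The first number on each line is the index and the second is the value. The index restarts at 1 whenever a new run begins, and I want a separate column for each run. For example: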

Index:       Run 1:             Run 2:
1,           81.99525117808678, 81.99525117808678
2,           78.79210736916842, 78.79210736916842
3,           69.33703048261454, 69.33703048261454
4,           53.12612416937101, 53.12612416937101
5,           48.8442549498639, 48.8442549498639
6,           48.8442549498639, 48.8442549498639
7,           38.96011640562207, 38.96011640562207
8,           33.66251691693962, 33.66251691693962
9,           29.202159649144907, 29.202159649144907
10,          27.77726568480279, 27.77726568480279

So far I have the following:

df = pd.read_csv(path, header=None, names=['Generation', 'Fitness'], index_col=0)

This produces the result:

0   
1   81.995251
2   78.792107
3   69.337030
4   53.126124
5   48.844255
6   48.844255
7   38.960116
8   33.662517
9   29.202160
10  27.777266
1   81.995251
2   78.792107
3   69.337030
4   53.126124
5   48.844255
6   48.844255
7   38.960116
8   33.662517
9   29.202160
10  27.777266

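You can create a reader iterable (see the documentation for details) with a chunk size of 10, so that each chunk holds one run, and then concatenate the chunks along the column axis (axis=1):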

reader = pd.read_csv('data.csv', sep=',', chunksize=10,
                       index_col=0, header=None, names=['Generation', 'Fitness'])

my_df = pd.concat((chunk for chunk in reader), axis=1)

>>> my_df
              Fitness    Fitness
Generation                      
1           81.995251  81.995251
2           78.792107  78.792107
3           69.337030  69.337030
4           53.126124  53.126124
5           48.844255  48.844255
6           48.844255  48.844255
7           38.960116  38.960116
8           33.662517  33.662517
9           29.202160  29.202160
10          27.777266  27.777266
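
For anyone who wants to try the chunked read without a file on disk, here is a minimal self-contained sketch. It assumes every run has the same number of rows; `csv_text`, `rows_per_run`, and `wide` are illustrative names that do not appear in the original post, and the data is shortened to three generations per run.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: two runs of three generations each.
csv_text = """1, 81.99525117808678
2, 78.79210736916842
3, 69.33703048261454
1, 81.99525117808678
2, 78.79210736916842
3, 69.33703048261454
"""

rows_per_run = 3  # assumed fixed run length; it would be 10 for the data above

reader = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["Generation", "Fitness"],
    index_col=0,
    chunksize=rows_per_run,
)

# Every chunk is indexed by Generation, so axis=1 lines the runs up side by side.
wide = pd.concat(list(reader), axis=1)
print(wide)
```

If the runs had different lengths, fixed-size chunks would no longer line up with run boundaries, so this approach relies on that assumption.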

If you need column names, you can rename them with a list comprehension:

# Python 3.6 or above
my_df.columns = [f'Run {i}' for i, _ in enumerate(my_df.columns, 1)]
# Or:
my_df.columns = ['Run {}'.format(i) for i, _ in enumerate(my_df.columns, 1)]
# Or: number the columns 1..n, then prepend the prefix
my_df.columns = range(1, len(my_df.columns) + 1)
my_df = my_df.add_prefix('Run ')


>>> my_df
                Run 1      Run 2
Generation                      
1           81.995251  81.995251
2           78.792107  78.792107
3           69.337030  69.337030
4           53.126124  53.126124
5           48.844255  48.844255
6           48.844255  48.844255
7           38.960116  38.960116
8           33.662517  33.662517
9           29.202160  29.202160
10          27.777266  27.777266
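
If the data has already been loaded as one long DataFrame, a different technique from the chunked read is to number each repeat of a generation and pivot on that number. This is only a sketch under the assumption that the n-th occurrence of a generation always belongs to run n; `long_df`, `wide_df`, and the fitness values below are hypothetical and chosen for illustration.

```python
import pandas as pd

# Hypothetical long-format frame: two runs of three generations each.
long_df = pd.DataFrame({
    "Generation": [1, 2, 3, 1, 2, 3],
    "Fitness": [10.0, 20.0, 30.0, 11.0, 21.0, 31.0],
})

# The n-th time a generation appears, it is assumed to belong to run n.
long_df["Run"] = long_df.groupby("Generation").cumcount() + 1

# One row per generation, one column per run.
wide_df = long_df.pivot(index="Generation", columns="Run", values="Fitness")
wide_df.columns = [f"Run {i}" for i in wide_df.columns]
print(wide_df)
```

Unlike fixed-size chunks, this does not need the run length up front, and a run that stops early would leave NaN in its missing rows.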
