Pandas dataframe create new column after x rows
I'm trying to create a new DataFrame from some data in a CSV file.
My data has the following form:
1, 81.99525117808678
2, 78.79210736916842
3, 69.33703048261454
4, 53.12612416937101
5, 48.8442549498639
6, 48.8442549498639
7, 38.96011640562207
8, 33.66251691693962
9, 29.202159649144907
10, 27.77726568480279
1, 81.99525117808678
2, 78.79210736916842
3, 69.33703048261454
4, 53.12612416937101
5, 48.8442549498639
6, 48.8442549498639
7, 38.96011640562207
8, 33.66251691693962
9, 29.202159649144907
10, 27.77726568480279
The first number is the index and the second is the value. I want to create a new column for each separate run. For example:
Index, Run 1, Run 2
1, 81.99525117808678, 81.99525117808678
2, 78.79210736916842, 78.79210736916842
3, 69.33703048261454, 69.33703048261454
4, 53.12612416937101, 53.12612416937101
5, 48.8442549498639, 48.8442549498639
6, 48.8442549498639, 48.8442549498639
7, 38.96011640562207, 38.96011640562207
8, 33.66251691693962, 33.66251691693962
9, 29.202159649144907, 29.202159649144907
10, 27.77726568480279, 27.77726568480279
So far, I have the following:
df = pd.read_csv(path, header=None, names=['Generation', 'Fitness'], index_col=0)
This produces:
0
1 81.995251
2 78.792107
3 69.337030
4 53.126124
5 48.844255
6 48.844255
7 38.960116
8 33.662517
9 29.202160
10 27.777266
1 81.995251
2 78.792107
3 69.337030
4 53.126124
5 48.844255
6 48.844255
7 38.960116
8 33.662517
9 29.202160
10 27.777266
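You can create a `reader` iterable by passing a chunk size of 10 to `read_csv` (see the docs for details), then concatenate the chunks side by side:
import pandas as pd

reader = pd.read_csv('data.csv', sep=',', chunksize=10,
                     index_col=0, header=None, names=['Generation', 'Fitness'])
my_df = pd.concat((chunk for chunk in reader), axis=1)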
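A self-contained sketch of the same chunked approach, assuming a small in-memory sample in place of `data.csv` (the values and the run length of 3 are hypothetical; in the question each run is 10 rows, so `chunksize=10`). `skipinitialspace=True` is added to tolerate the space after each comma. Each chunk is one run indexed by `Generation`, and `axis=1` aligns the chunks on that index:

```python
import io
import pandas as pd

# Hypothetical stand-in for 'data.csv': two runs of 3 generations each
csv_text = """1, 81.99
2, 78.79
3, 69.33
1, 80.50
2, 77.10
3, 70.25
"""

# chunksize must equal the (fixed) number of rows per run
reader = pd.read_csv(io.StringIO(csv_text), sep=',', chunksize=3,
                     skipinitialspace=True, index_col=0, header=None,
                     names=['Generation', 'Fitness'])

# Concatenate the per-run chunks as columns, aligned on Generation
my_df = pd.concat(list(reader), axis=1)
my_df.columns = [f'Run {i}' for i in range(1, my_df.shape[1] + 1)]
print(my_df)
```

This only works when every run has exactly `chunksize` rows; a different run length shifts every later chunk.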
>>> my_df
Fitness Fitness
Generation
1 81.995251 81.995251
2 78.792107 78.792107
3 69.337030 69.337030
4 53.126124 53.126124
5 48.844255 48.844255
6 48.844255 48.844255
7 38.960116 38.960116
8 33.662517 33.662517
9 29.202160 29.202160
10 27.777266 27.777266
If you want named columns, you can rename them with a list comprehension:
# python 3.6 or above
my_df.columns = [f'Run {i}' for i, _ in enumerate(my_df.columns,1)]
# Or:
my_df.columns = ['Run {}'.format(i) for i, _ in enumerate(my_df.columns,1)]
# Or:
my_df.columns = range(1, len(my_df.columns) + 1)
my_df = my_df.add_prefix('Run ')
>>> my_df
Run 1 Run 2
Generation
1 81.995251 81.995251
2 78.792107 78.792107
3 69.337030 69.337030
4 53.126124 53.126124
5 48.844255 48.844255
6 48.844255 48.844255
7 38.960116 38.960116
8 33.662517 33.662517
9 29.202160 29.202160
10 27.777266 27.777266
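If runs might not all have 10 rows, a fixed `chunksize` misaligns the data. A swapped-in alternative (not part of the answer above) is to detect run boundaries and pivot. This sketch assumes each run restarts `Generation` at a value no greater than the previous row's, and uses a hypothetical sample with unequal run lengths:

```python
import io
import pandas as pd

# Hypothetical sample: the second run is shorter than the first
csv_text = """1, 81.99
2, 78.79
3, 69.33
1, 80.50
2, 77.10
"""

df = pd.read_csv(io.StringIO(csv_text), header=None, skipinitialspace=True,
                 names=['Generation', 'Fitness'])

# Assumption: a new run starts whenever Generation does not increase.
# diff() is NaN on the first row, and NaN <= 0 evaluates to False.
new_run = df['Generation'].diff().le(0)
df['Run'] = new_run.cumsum() + 1

# One column per run, indexed by Generation; shorter runs are padded with NaN
wide = df.pivot(index='Generation', columns='Run', values='Fitness')
wide = wide.add_prefix('Run ')
print(wide)
```

The trade-off is that it reads the whole file at once instead of chunk by chunk, but it no longer depends on knowing the run length in advance.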