Create new columns in a data frame based on an existing numeric column, a list of strings as column names and a list of tuples as values
I have a data frame that contains a numeric column, and I have a list of tuples and a list of strings. The list of tuples holds the values that should be added, where each index in that list corresponds to a value of the numeric column in the data frame. The list of strings holds the names of the columns to be added.
Example:
import pandas as pd
df = pd.DataFrame({'number':[0,0,1,1,2,2,3,3]})
# a list of keys and a list of tuples
keys = ['foo','bar']
combinations = [('99%',0.9),('99%',0.8),('1%',0.9),('1%',0.8)]
Expected output:
number foo bar
0 0 99% 0.9
1 0 99% 0.9
2 1 99% 0.8
3 1 99% 0.8
4 2 1% 0.9
5 2 1% 0.9
6 3 1% 0.8
7 3 1% 0.8
If each number appears only once, you can just try
df2 = pd.DataFrame(combinations, columns = keys)
pd.concat([df, df2], axis=1)
which returns
number foo bar
0 0 99% 0.9
1 1 99% 0.8
2 2 1% 0.9
3 3 1% 0.8
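A quick check of why the concat approach only holds when each number appears once: pd.concat with axis=1 lines rows up by their index labels, not by the value in number. With the question's eight-row frame, row 1 (number 0) ends up paired with ('99%', 0.8) and rows 4 to 7 get NaN. This sketch just reproduces that case with the question's data.

```python
import pandas as pd

keys = ['foo', 'bar']
combinations = [('99%', 0.9), ('99%', 0.8), ('1%', 0.9), ('1%', 0.8)]

# The question's frame: every number appears twice (8 rows, index 0..7)
df = pd.DataFrame({'number': [0, 0, 1, 1, 2, 2, 3, 3]})
df2 = pd.DataFrame(combinations, columns=keys)  # 4 rows, index 0..3

# axis=1 aligns on the row index labels, so 'number' is ignored entirely
result = pd.concat([df, df2], axis=1)
print(result)
```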
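Based on your new requirements (numbers that repeat), you can use the following:
# Use the numbers as the index so they line up with df2's 0..n-1 index
df.set_index('number', inplace=True)
df = df.merge(df2, left_index = True, right_index=True)
# Turn the index back into a 'number' column (the rename covers the case where the merge drops the index name)
df = df.reset_index().rename(columns={'index':'number'})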
This also works when the numbers repeat a different number of times, e.g.
df = pd.DataFrame({'number':[0,0,1,1,1,2,2,3,3,3]})
returns
number foo bar
0 0 99% 0.9
1 0 99% 0.9
2 1 99% 0.8
3 1 99% 0.8
4 1 99% 0.8
5 2 1% 0.9
6 2 1% 0.9
7 3 1% 0.8
8 3 1% 0.8
9 3 1% 0.8
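A possibly shorter variant of the same idea, offered as an alternative sketch rather than the answerer's code: DataFrame.join with on='number' looks up each value of the number column in df2's index directly. There is no need to set and reset the index, and because it is a left join the original rows and their order are kept.

```python
import pandas as pd

keys = ['foo', 'bar']
combinations = [('99%', 0.9), ('99%', 0.8), ('1%', 0.9), ('1%', 0.8)]
df = pd.DataFrame({'number': [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]})

# Lookup table: row i holds the values for number == i
df2 = pd.DataFrame(combinations, columns=keys)

# Left join: match df['number'] against df2's index, keeping every row of df
result = df.join(df2, on='number')
print(result)
```

Numbers with no matching tuple would get NaN in the new columns rather than raising an error.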
If each number appears only once (so df has exactly one row per tuple), you can use a list comprehension inside a for loop. I think it's a pretty fast and straightforward approach:
# Column i gets element i of every tuple; assumes df has one row per tuple
for i, key in enumerate(keys):
    df[key] = [x[i] for x in combinations]
Output:
number foo bar
0 0 99% 0.9
1 1 99% 0.8
2 2 1% 0.9
3 3 1% 0.8
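To handle repeated numbers with the same list-comprehension style, one option (a sketch, assuming every value in number is a valid position in combinations) is to pick the tuple by each row's number instead of walking the list of tuples in order:

```python
import pandas as pd

keys = ['foo', 'bar']
combinations = [('99%', 0.9), ('99%', 0.8), ('1%', 0.9), ('1%', 0.8)]
df = pd.DataFrame({'number': [0, 0, 1, 1, 2, 2, 3, 3]})

# For each new column, take element i of the tuple selected by the row's number.
# Assumes every value in 'number' is a valid index into combinations.
for i, key in enumerate(keys):
    df[key] = [combinations[n][i] for n in df['number']]

print(df)
```

This produces one value per row of df, so the list length always matches the frame regardless of how often each number repeats.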
I found one solution using:
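# Collect each group's rows, then concatenate once at the end
# (DataFrame.append was removed in pandas 2.0)
parts = []
for model_number, df_subset in df.groupby('number'):
    df_subset = df_subset.copy()  # avoid SettingWithCopyWarning
    for key_idx, key in enumerate(keys):
        df_subset[key] = combinations[model_number][key_idx]
    parts.append(df_subset)
df_new = pd.concat(parts)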
But this seems pretty 'dirty' to me. Are there better, more efficient solutions?