
Expressing a dataframe column as a linear combination of other columns

I have a data frame with 10 columns of data and 2000 rows. It also has additional columns that need to be ignored:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randint(0,100,size=(2000, 10)), columns=list('ABCDEFGHIJ'))
df1['Company Name']=stringlist1  # stringlist1 is defined elsewhere (one string per row)

The column names can change from run to run, because different files use different column names. The only constant is that the data to be considered starts at the 7th column and covers 10 columns. I have several lists, each containing 10 weights that add up to 1; some weights are zero and others are non-zero. Example:

wt1=[0.0,0.34,0.05,0.0,0.1,0.01,0.0,0.0,0.5,0.0]

I need to define a new df1 column that is the linear combination of the 10 columns, with the weights specified in wt1.

How do I do that? Mind you, the column names (ABCD...) cannot appear in the summation expression, because the code needs to be reusable for data whose column names differ (they are read in from an Excel sheet).

I tried:

icollist1=[icol1 for icol1,val1 in enumerate(wt1) if val1>0.0]
for icol1 in icollist1:
    df1['Weighted Sum']+=np.asarray(wt1[icol1])*df1[colnames1[icol1]]

where colnames1 is a list of column names extracted from the Excel file this dataframe was read from.

I get errors:

TypeError: can't multiply sequence by non-int of type 'float'
...
During handling of the above exception, another exception occurred:
...
TypeError: can't multiply sequence by non-int of type 'float'

Try this for the example you provide:

df1 = pd.DataFrame(np.random.randint(0,100,size=(2000, 10)), columns=list('ABCDEFGHIJ'))
wt1=[0.0,0.34,0.05,0.0,0.1,0.01,0.0,0.0,0.5,0.0]

df1.mul(wt1, axis=1).sum(axis=1)
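
The expression above returns a Series without storing it. To add it as a new column, as the question asks, it can be assigned directly. This is a minimal sketch: the column name 'Weighted Sum' is taken from the question, while the fixed random seed and the names `data_cols` and `weighted_dot` are my own additions for illustration. The matrix product on the last line is an equivalent formulation, not part of the answer above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed is an assumption, used only for reproducibility
df1 = pd.DataFrame(rng.integers(0, 100, size=(2000, 10)), columns=list('ABCDEFGHIJ'))
wt1 = [0.0, 0.34, 0.05, 0.0, 0.1, 0.01, 0.0, 0.0, 0.5, 0.0]

# Pick the data columns by position, before the result column is added
data_cols = df1.columns[:10]

# Scale each column by its weight (matched by position), then sum across each row
df1['Weighted Sum'] = df1[data_cols].mul(wt1, axis=1).sum(axis=1)

# Equivalent formulation: matrix-vector product of the data block and the weights
weighted_dot = df1[data_cols].to_numpy() @ np.asarray(wt1)
```

Selecting `data_cols` first keeps the calculation correct if it is rerun after 'Weighted Sum' already exists in the frame.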

In case you have more than 10 columns and want to multiply only the 10 columns starting from the 7th column:

df1 = pd.DataFrame(np.random.randint(0,100,size=(2000, 20)))
wt1=[0.0,0.34,0.05,0.0,0.1,0.01,0.0,0.0,0.5,0.0]
df1.iloc[:,6:16].mul(wt1, axis=1).sum(axis=1)
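
As for the TypeError in the question: it is the message Python raises when a string is multiplied by a float. That suggests at least one of the selected columns held text, for example numbers read from Excel as strings, or a text column such as 'Company Name' ending up in colnames1. This is a guess, since the question does not show the dtypes. Note also that `+=` requires `df1['Weighted Sum']` to exist before the loop runs. Below is a sketch of a reusable, position-based helper; the name `weighted_sum`, its parameters, the demo frame, and the choice of `pd.to_numeric` with `errors='coerce'` are my assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd

def weighted_sum(df, weights, start=6):
    """Weighted row-wise sum of len(weights) columns, beginning at position `start`."""
    block = df.iloc[:, start:start + len(weights)]
    if block.shape[1] != len(weights):
        raise ValueError("not enough columns for the given weights")
    # Convert text such as '12' to numbers; values that cannot be parsed become NaN
    block = block.apply(pd.to_numeric, errors='coerce')
    # Matrix-vector product; a NaN in a row makes that row's result NaN
    values = block.to_numpy(dtype=float) @ np.asarray(weights, dtype=float)
    return pd.Series(values, index=df.index)

# Hypothetical demo: 6 leading columns to ignore, then 10 data columns stored as text
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(0, 100, size=(5, 16))).astype(str)
wt1 = [0.0, 0.34, 0.05, 0.0, 0.1, 0.01, 0.0, 0.0, 0.5, 0.0]

demo['Weighted Sum'] = weighted_sum(demo, wt1)
```

Because the helper addresses columns only by position, no column name appears in the calculation. Running `df.dtypes` on the real data before calling it would show whether any of the ten columns are of object dtype.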
