Iterate over every row in a pandas dataframe and multiply all row values by one of the row values in the same dataframe

I am trying to normalize row values in a dataframe. The values that need to be normalized are in columns whose headers contain the text 'Count'. Other columns do not have the word 'Count' in their header, and I would like them to remain unchanged. The normalization value for each row is in a column named 'Normalization Value'.

Therefore, working row by row, every value in a column whose header contains the word 'Count' needs to be multiplied by that row's normalization value. This dataframe has a MultiIndex that I would like to preserve, and it has many columns and rows, so I need to do this without specifying exact locations or names.

I have tried variations of multiply, broadcasting, itertuples, and user-defined functions, to no avail.

This is my example dataframe:

Sample  Timepoint  CountA  CountB  PercentA  PercentB  CountC  Normalization Value
1       1          10      20      40        30        50      .1
2       1          20      10      25        35        100     .2
2       2          50      20      20        22        40      .5

This is what I would like the dataframe to look like after normalizing the counts:

Sample  Timepoint  CountA  CountB  PercentA  PercentB  CountC  Normalization Value
1       1          1       2       40        30        5       .1
2       1          4       2       25        35        20      .2
2       2          25      10      20        22        20      .5

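You can use str.contains to build a boolean mask over the column names, then assign the result of mul back with .loc. Because the result has to be written back into the original dataframe, filter on its own will not work here: it returns a new dataframe, so changes made to that result do not reach df.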

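# Boolean mask: True for every column whose name contains 'Count'
s = df.columns.str.contains('Count')
# Multiply the selected columns row-wise (axis=0) by each row's normalization value
df.loc[:, s] = df.loc[:, s].mul(df['Normalization Value'], axis=0)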
df
Out[238]: 
   Sample  Timepoint         ...          CountC Normalization Value
0       1          1         ...             5.0                 0.1
1       2          1         ...            20.0                 0.2
2       2          2         ...            20.0                 0.5
[3 rows x 8 columns]
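For readers who want to try this end to end, here is a minimal self-contained sketch of the same mask-and-multiply idea. It assumes the question's sample data with `Sample` and `Timepoint` moved into a MultiIndex (the question mentions a MultiIndex but does not show it), and it assigns back with `df[cols] = ...` rather than `.loc`, which is a substitution on my part so that the integer count columns are simply replaced by their float results.

```python
import pandas as pd

# Sample data from the question; the MultiIndex layout is an assumption.
df = pd.DataFrame({
    "Sample": [1, 2, 2],
    "Timepoint": [1, 1, 2],
    "CountA": [10, 20, 50],
    "CountB": [20, 10, 20],
    "PercentA": [40, 25, 20],
    "PercentB": [30, 35, 22],
    "CountC": [50, 100, 40],
    "Normalization Value": [0.1, 0.2, 0.5],
}).set_index(["Sample", "Timepoint"])

# Names of all columns whose header contains 'Count' (case-sensitive).
count_cols = df.columns[df.columns.str.contains("Count")].tolist()

# axis=0 aligns the normalization Series with the row index, so each row
# is scaled by its own normalization value.
df[count_cols] = df[count_cols].mul(df["Normalization Value"], axis=0)

print(df)
```

The non-count columns and the row index are never touched, so the MultiIndex comes through unchanged.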

You need to select the columns using .filter() with a regex, modify them, and then put the result back into the main dataframe using .loc.

cols = df.filter(regex='Count', axis=1).columns  # columns whose name contains 'Count'
df.loc[:, cols] = df.loc[:, cols].multiply(df['Normalization'], axis='index')

Output:

      Sample  Timepoint  CountA  CountB  PercentA  PercentB  CountC  Normalization
0       1          1     1.0     2.0        40        30     5.0            0.1
1       2          1     4.0     2.0        25        35    20.0            0.2
2       2          2    25.0    10.0        20        22    20.0            0.5
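If you would rather leave the original dataframe untouched, the same idea can be wrapped in a small function that works on a copy. This is only a sketch: `normalize_counts` and its parameter names are hypothetical, it uses `filter(like=...)` for a plain substring match instead of a regex, and it assumes the column is named 'Normalization Value' as in the question.

```python
import pandas as pd


def normalize_counts(frame, pattern="Count", norm_col="Normalization Value"):
    """Return a copy of `frame` with matching columns scaled row-wise.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = frame.copy()
    # filter() only selects; its result is used here just to get the names.
    cols = out.filter(like=pattern, axis=1).columns.tolist()
    out[cols] = out[cols].mul(out[norm_col], axis=0)
    return out


raw = pd.DataFrame({
    "Sample": [1, 2, 2],
    "Timepoint": [1, 1, 2],
    "CountA": [10, 20, 50],
    "CountB": [20, 10, 20],
    "PercentA": [40, 25, 20],
    "PercentB": [30, 35, 22],
    "CountC": [50, 100, 40],
    "Normalization Value": [0.1, 0.2, 0.5],
})

normalized = normalize_counts(raw)
print(normalized)
```

Working on a copy keeps the raw counts available in `raw` for comparison, at the cost of holding two frames in memory.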
