Python: How can I make this repetitive code shorter by using a loop?

Question

I'm very new to Python. The following is an example of my data:

Category    May  June  July
Product1    32   41    43
Product2    74   65    65
Product3    17   15    18
Product4    14   13    14

I have many sets of data and I'd like to calculate the chi-square test for each set. The code is as follows:

import scipy.stats

Product1 = [32,41,43]
chi2, p = scipy.stats.chisquare(Product1)
print('Product1')
if p > 0.05:
    print('Same')
else:
    print('Different')

Product2 = [74,65,65]
chi2, p = scipy.stats.chisquare(Product2)
print('Product2')
if p > 0.05:
    print('Same')
else:
    print('Different')

Product3 = [17,15,18]
chi2, p = scipy.stats.chisquare(Product3)
print('Product3')
if p > 0.05:
    print('Same')
else:
    print('Different')

Product4 = [14,13,14]
chi2, p = scipy.stats.chisquare(Product4)
print('Product4')
if p > 0.05:
    print('Same')
else:
    print('Different')

I used "df = pd.read_excel" to load the data table. It comes with an index, and I don't know how to access each row to run the calculation.

How can I make this repetitive code shorter by using a loop and pulling the data from the table? Thank you so much for your help.

Answer 1

You could use a loop to repeat the steps above, but you might as well leverage scipy's ability to work with pandas dataframes. You can apply the chisquare test across every row of a dataframe using axis=1. For example:

import numpy as np
from scipy.stats import chisquare

# [1] selects the p-values from the (statistic, pvalue) result
df['p'] = chisquare(df[['May', 'June', 'July']], axis=1)[1]

df['same_diff'] = np.where(df['p'] > 0.05, 'same', 'different')

>>> df
   Category  May  June  July         p same_diff
0  Product1   32    41    43  0.411506      same
1  Product2   74    65    65  0.672294      same
2  Product3   17    15    18  0.869358      same
3  Product4   14    13    14  0.975905      same

Now your dataframe has the p-values as one column, and whether each row is "same" or "different" as another.
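For anyone who wants to try the row-wise approach without an Excel file, here is a minimal self-contained sketch. It builds the sample table inline rather than calling pd.read_excel, and the helper name add_chisquare_columns and its alpha parameter are made up for illustration; they are not part of scipy or pandas.

```python
import numpy as np
import pandas as pd
from scipy.stats import chisquare

# Sample data from the question, built inline instead of read from Excel.
df = pd.DataFrame({
    "Category": ["Product1", "Product2", "Product3", "Product4"],
    "May": [32, 74, 17, 14],
    "June": [41, 65, 15, 13],
    "July": [43, 65, 18, 14],
})


def add_chisquare_columns(frame, value_cols, alpha=0.05):
    """Hypothetical helper: return a copy with 'p' and 'same_diff' columns."""
    out = frame.copy()
    counts = out[value_cols].to_numpy(dtype=float)
    # axis=1 runs one test per row; expected frequencies default to uniform.
    out["p"] = chisquare(counts, axis=1).pvalue
    out["same_diff"] = np.where(out["p"] > alpha, "same", "different")
    return out


result = add_chisquare_columns(df, ["May", "June", "July"])
print(result)
```

Passing the column names explicitly keeps the test from accidentally including the Category column or any other non-numeric column that the spreadsheet might contain.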

Answer 2

I will start from the point where the data is already loaded into a pandas data frame. Then you can do:

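import scipy.stats

for _, row in df.iterrows():
    product = row.iloc[0]  # first column holds the product name
    # remaining columns are the monthly counts
    chi, p = scipy.stats.chisquare(row.iloc[1:].astype(float))
    print(product, ":", "same" if p > 0.05 else "different")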

This will print:

Product1 : same
Product2 : same
Product3 : same
Product4 : same
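If the data is not in a data frame at all, the same loop idea works on plain Python objects. The sketch below is one possible way to do it, assuming the counts are kept in a dictionary keyed by product name; the names products and classify are hypothetical and chosen only for this example.

```python
from scipy.stats import chisquare

# Hypothetical layout: one list of monthly counts per product.
products = {
    "Product1": [32, 41, 43],
    "Product2": [74, 65, 65],
    "Product3": [17, 15, 18],
    "Product4": [14, 13, 14],
}


def classify(counts, alpha=0.05):
    """Return 'Same' when the test does not reject a uniform distribution."""
    _, p = chisquare(counts)
    return "Same" if p > alpha else "Different"


# One pass over the dictionary replaces the four copy-pasted blocks.
labels = {name: classify(counts) for name, counts in products.items()}
for name, label in labels.items():
    print(name, label)
```

Putting the test and the threshold check in one small function means the 0.05 cutoff is written once, so changing it later does not require editing every block.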

Answer 1
3 Accepted 2018-07-27 15:36:17

Answer 2
1 2018-07-27 15:36:20
