Python - How can I make this repetitive code shorter by using a loop?
I'm very new to Python. The following is an example of my data:
Category May June July
Product1 32 41 43
Product2 74 65 65
Product3 17 15 18
Product4 14 13 14
I have many sets of data and I'd like to calculate Chi-square for each set. The code is as follows:
import scipy.stats

Product1 = [32,41,43]
chi2, p = scipy.stats.chisquare(Product1)
print('Product1')
if p > 0.05:
    print('Same')
else:
    print('Different')
Product2 = [74,65,65]
chi2, p = scipy.stats.chisquare(Product2)
print('Product2')
if p > 0.05:
    print('Same')
else:
    print('Different')
Product3 = [17,15,18]
chi2, p = scipy.stats.chisquare(Product3)
print('Product3')
if p > 0.05:
    print('Same')
else:
    print('Different')
Product4 = [14,13,14]
chi2, p = scipy.stats.chisquare(Product4)
print('Product4')
if p > 0.05:
    print('Same')
else:
    print('Different')
I used "df = pd.read_excel" to load the data table. It comes with an index, and I don't know how to access each row for the calculation.
How can I make this repetitive code shorter by using a loop and pulling the data from the table? Thank you so much for your help.
You could use a loop to repeat the steps above, but you might as well leverage scipy's ability to deal with pandas dataframes! You can apply the chisquare test over all rows of a dataframe using axis=1. For example:
import numpy as np
from scipy.stats import chisquare

# axis=1 runs one test per row; element [1] of the result holds the p-values
df['p'] = chisquare(df[['May', 'June', 'July']], axis=1)[1]
df['same_diff'] = np.where(df['p'] > 0.05, 'same', 'different')
>>> df
Category May June July p same_diff
0 Product1 32 41 43 0.411506 same
1 Product2 74 65 65 0.672294 same
2 Product3 17 15 18 0.869358 same
3 Product4 14 13 14 0.975905 same
Now your dataframe has your p values as a column, and whether they are "same" or "different" as another column.
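To try this without the Excel file, here is a minimal self-contained sketch. It assumes the table from the question is typed in by hand as a DataFrame (in practice df would come from pd.read_excel) and that 0.05 is the threshold you want. The months list is just an illustrative name, and .to_numpy() is used only to hand scipy a plain array.

```python
import numpy as np
import pandas as pd
from scipy.stats import chisquare

# Assumption: the question's table, typed in by hand instead of read from Excel
df = pd.DataFrame({
    'Category': ['Product1', 'Product2', 'Product3', 'Product4'],
    'May': [32, 74, 17, 14],
    'June': [41, 65, 15, 13],
    'July': [43, 65, 18, 14],
})

months = ['May', 'June', 'July']  # illustrative name for the count columns

# axis=1 runs one test per row; the result unpacks into
# an array of statistics and an array of p-values
chi2, p = chisquare(df[months].to_numpy(), axis=1)

df['p'] = p
df['same_diff'] = np.where(df['p'] > 0.05, 'same', 'different')
print(df)
```

Keeping the month names in one list means a new month only has to be added in one place.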
I will start after the data is loaded into a pandas data frame (df).
Then, you can do:
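import scipy.stats

# iterrows() yields (index, row) pairs; each row is a Series
for _, row in df.iterrows():
    product = row.iloc[0]  # first column holds the product name
    chi, p = scipy.stats.chisquare(row.iloc[1:].astype(float))  # monthly counts
    print(product, ":", "same" if p > 0.05 else "different")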
This will print: 这将打印:
Product1 : same
Product2 : same
Product3 : same
Product4 : same
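Since the question also asks how to call each row: one option, sketched below, is to make Category the index so that a row can be pulled out by product name with .loc. The table is again typed in by hand as an assumption (with Excel, passing index_col=0 to pd.read_excel should give a similar layout), and results is just an illustrative name.

```python
import pandas as pd
from scipy.stats import chisquare

# Assumption: the question's table, with Category used as the row index
df = pd.DataFrame(
    {'May': [32, 74, 17, 14], 'June': [41, 65, 15, 13], 'July': [43, 65, 18, 14]},
    index=pd.Index(['Product1', 'Product2', 'Product3', 'Product4'], name='Category'),
)

# A single row can now be called by its product name
print(df.loc['Product1'].tolist())

# Loop over (name, row) pairs and collect one verdict per product
results = {}
for name, row in df.iterrows():
    chi2, p = chisquare(row.to_numpy(dtype=float))
    results[name] = 'same' if p > 0.05 else 'different'

print(results)
```

With the names in the index, every remaining column is a count, so there is no need to slice off the first column inside the loop.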