Need to calculate columns from CSV using pandas
incidentcountlevel1 and examcount are two column names in a CSV file. I want to calculate two columns based on these. I have written the script below, but it's failing:
import pandas as pd
import numpy as np
import time, os, fnmatch, shutil
df = pd.read_csv(r"/home/corp_sourcing/Metric_Fact_20180324_1227.csv",header='infer',skiprows=[1])
df1 = pd.read_csv(r"/home/corp_sourcing/Metric_Fact_20180324_1227.csv",header='infer',skiprows=[1])
df3 = pd.read_csv("/home/corp_sourcing/Metric_Fact_20180324_1227.csv",header='infer',converters={"incidentcountlevel1":int})
inc_count_lvl_1 = df3.loc[:, ['incidentcountlevel1']]
exam_count=df3.loc[:, ['examcount']]
for exam_count in exam_count: #need to iterate this col to calculate for each row
    if exam_count < 1:
        print "IPTE Cannot be calculated"
    else:
        if inc_count_lvl_1 > 5:
            ipte1= (inc_count_lvl_1/exam_count)*1000
        else:
            dof = 2*(inc_count_lvl_1+ 1)
            chi_square=chi2.ppf(0.5,dof)
            ipte1=(chi_square/(2*exam_count))*1000
You can apply a lambda function to a pandas column. I have just created an example using numpy; you can change it according to your case.
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 50]})
>>> df['new_column'] = np.multiply(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  50        1500
Or you can create your own function:
>>> def fx(x, y):
...     return x*y
...
>>> df['new_column'] = np.vectorize(fx)(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  50        1500
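The lambda approach mentioned at the start of this answer can be sketched with DataFrame.apply and axis=1, which passes each row to the lambda. This is only a minimal illustration on the same toy A and B columns; on large frames it will usually be slower than the numpy call.

```python
import pandas as pd

# Toy data matching the example above
df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 50]})

# axis=1 hands each row to the lambda as a Series
df["new_column"] = df.apply(lambda row: row["A"] * row["B"], axis=1)
print(df)
```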
In your case, the solution might look like this.
from scipy.stats import chi2

def fx(exam_count, inc_count_lvl_1):
    if exam_count < 1:
        return -1  # whatever you want
    if inc_count_lvl_1 > 5:
        ipte1 = (inc_count_lvl_1 / exam_count) * 1000
    else:
        dof = 2 * (inc_count_lvl_1 + 1)
        chi_square = chi2.ppf(0.5, dof)
        ipte1 = (chi_square / (2 * exam_count)) * 1000
    return ipte1

# define fx before using it; otypes=[float] keeps float results even if the first row returns -1
df['new_column'] = np.vectorize(fx, otypes=[float])(df['examcount'], df['incidentcountlevel1'])
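If the frame is large, the same branching can probably be written without a Python-level loop by using np.where. The sketch below is one way to do it, not the only one. The function name ipte_vectorized, the toy data, and the choice of NaN (instead of -1) for rows where examcount is below 1 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

def ipte_vectorized(exam_count, inc_count):
    """Hypothetical helper: same rules as fx, applied to whole columns."""
    exam = np.asarray(exam_count, dtype=float)
    inc = np.asarray(inc_count, dtype=float)
    valid = exam >= 1
    # Replace invalid denominators with 1 so no division by zero occurs
    safe_exam = np.where(valid, exam, 1.0)
    rate = inc / safe_exam * 1000
    chi_based = chi2.ppf(0.5, 2 * (inc + 1)) / (2 * safe_exam) * 1000
    result = np.where(inc > 5, rate, chi_based)
    # Assumption: mark rows that cannot be calculated as NaN
    return np.where(valid, result, np.nan)

# Toy data standing in for the real CSV
df = pd.DataFrame({"examcount": [0, 100, 200],
                   "incidentcountlevel1": [3, 10, 2]})
df["ipte1"] = ipte_vectorized(df["examcount"], df["incidentcountlevel1"])
print(df)
```

Both branches are computed for every row and np.where then picks one, which is why the denominator is made safe up front.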
If you don't want to use lambda functions, you can use iterrows. iterrows is a generator that yields both the index and the row.
for index, row in df.iterrows():
    print(row['examcount'], row['incidentcountlevel1'])
    # do your stuff.
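As a rough sketch of what "do your stuff" could look like, the loop can collect one value per row in a list and assign it back as a new column. The toy data and the column name rate are assumptions, and only the simple incidents-per-1000-exams branch is shown here.

```python
import pandas as pd

# Toy data standing in for the real CSV
df = pd.DataFrame({"examcount": [0, 100],
                   "incidentcountlevel1": [3, 10]})

results = []
for index, row in df.iterrows():
    if row["examcount"] < 1:
        # Cannot be calculated for this row
        results.append(None)
    else:
        results.append(row["incidentcountlevel1"] / row["examcount"] * 1000)

# One result per row, in the original row order
df["rate"] = results
print(df)
```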
I hope it helps.