How to compare two DataFrame columns?

import pandas as pd
import quandl
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
style.use("fivethirtyeight")
df_2010=pd.read_csv("c:/users/ashub/downloads/documents/MLB 2010.csv",index_col=0)
#print(df_2010)
sliced_data=df_2010[["Home Team","Away Team","Home Score","Away Score"]]
#print(sliced_data)
for win in sliced_data:
    flag1=sliced_data["Home Team"]+str("index")
    flag2=sliced_data["Away Team"]+str("index")
    print(sliced_data["Home Score"],sliced_data["Away Score"])
    if sliced_data["Home Score"]>sliced_data["Away Score"]:
        df_2010=df_2010.join([1,0],index=[flag1,flag2])
    else:
        df_2010=df_2010.join([0,1],index=[flag1,flag2])
df_2010.to_html("c:/users/ashub/desktop/ashu.html")

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The error occurs at the if condition, where I compare the home team's score with the away team's score. What I want to do is add a column to the CSV file that records whether each team won or lost (1 for a win, 0 for a loss), so that I can total a particular team's wins over a season, calculate its probability of winning, and predict its probability of winning next season.

You can do just this:

df_2010['Win'] = df_2010['Home Score'] > df_2010['Away Score']

You won't need that sliced data frame.
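For context, here is a minimal sketch of why the original if statement fails. The scores are hypothetical and not taken from the asker's CSV. Comparing two columns yields one boolean per row, and pandas refuses to collapse that into the single True/False that if expects.

```python
import pandas as pd

# Hypothetical scores, used only to reproduce the error.
df = pd.DataFrame({"Home Score": [3, 5, 2], "Away Score": [1, 5, 4]})

# Comparing two columns returns a boolean Series, one value per row.
home_won = df["Home Score"] > df["Away Score"]
print(home_won.tolist())

# `if` needs a single True/False, so pandas raises ValueError here.
error_message = ""
try:
    if home_won:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Assigning the Series as a column keeps the row-by-row result instead.
df["Win"] = home_won.astype(int)
print(df)
```

The astype(int) call is optional; it simply stores 1/0 rather than True/False, which matches the encoding the question asks for.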

Here's a full example:

import pandas as pd
import numpy as np

df = pd.DataFrame([np.random.randint(0, 5, 5), 
                   np.random.randint(0, 5, 5)], 
                  index=['Home Score', 'Away Score']).T

print(df)

df['Win'] = df['Home Score'] > df['Away Score']

print(df)

This will add to

   Home Score  Away Score
0           3           3
1           4           2
2           4           1
3           4           4
4           4           2

an additional column, Win, like this:

   Home Score  Away Score    Win
0           3           3  False
1           4           2   True
2           4           1   True
3           4           4  False
4           4           2   True
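The question also asks for each team's chance of winning over a season. One possible way to get there from the comparison above is sketched below. The team names and scores are made up for illustration, the column names follow the question's CSV, and the win rate is only a simple empirical estimate, not a prediction model. A game that is not a home win is counted as an away win here, which assumes there are no ties.

```python
import pandas as pd

# Hypothetical games; team names and scores are invented for illustration.
games = pd.DataFrame({
    "Home Team": ["A", "B", "A", "C"],
    "Away Team": ["B", "A", "C", "A"],
    "Home Score": [5, 2, 3, 1],
    "Away Score": [3, 4, 6, 0],
})

# True where the home team scored more than the away team.
home_win = games["Home Score"] > games["Away Score"]

# One row per team per game: home sides and away sides stacked together.
home = pd.DataFrame({"Team": games["Home Team"], "Win": home_win.astype(int)})
away = pd.DataFrame({"Team": games["Away Team"], "Win": (~home_win).astype(int)})
results = pd.concat([home, away], ignore_index=True)

# The mean of a 0/1 column is the fraction of games won.
win_rate = results.groupby("Team")["Win"].mean()
print(win_rate)
```

Replacing mean() with sum() would give the total number of wins per team instead of the fraction.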

I think you can create a boolean mask by comparing the columns and then assign new columns:

import numpy as np
import pandas as pd

np.random.seed(123)
sliced_data = pd.DataFrame([np.random.randint(0, 5, 5), 
                   np.random.randint(0, 5, 5)], 
                  index=['Home Score', 'Away Score']).T

m = sliced_data['Home Score'] > sliced_data['Away Score']


sliced_data['Away Team index'] = (~m).astype(int)
sliced_data['Home Team index'] = m.astype(int)

print(sliced_data)
   Home Score  Away Score  Away Team index  Home Team index
0           2           2                1                0
1           4           3                0                1
2           2           1                0                1
3           1           1                1                0
4           3           0                0                1

It is the same as:

sliced_data['Away Team index'] = np.where(m, 0, 1)
sliced_data['Home Team index'] = np.where(m, 1, 0)

print(sliced_data)
   Home Score  Away Score  Away Team index  Home Team index
0           2           2                1                0
1           4           3                0                1
2           2           1                0                1
3           1           1                1                0
4           3           0                0                1
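Note that in the output above, rows 0 and 3 are tied games, yet the away team receives a 1, because the inverted mask is true whenever the home team did not win. If ties can appear in the data (an assumption, since the real CSV is not shown), a small variation is to compare in both directions so that a tie gives 0 to both sides. The sketch below reuses the scores printed above.

```python
import pandas as pd

# Same scores as in the printed output above, including two ties.
scores = pd.DataFrame({
    "Home Score": [2, 4, 2, 1, 3],
    "Away Score": [2, 3, 1, 1, 0],
})

# Each side gets 1 only when it strictly outscored the other.
scores["Home Team index"] = (scores["Home Score"] > scores["Away Score"]).astype(int)
scores["Away Team index"] = (scores["Away Score"] > scores["Home Score"]).astype(int)

print(scores)
```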
