Python: concatenate and append a pandas DataFrame in a for loop
I am sorry, I didn't really know how to word the title of this question. I do not work with Python very often and I am just starting to work with the pandas and numpy packages.
I am getting unexpected results when trying to concatenate and append a pandas DataFrame in a for loop.
I have a data set that I got from SQL and put into a pandas DataFrame (df):
print(df.head())
date visitor visitor_score home home_score W L
0 20160405 BOS 6 CLE 2 94 67
1 20160406 BOS 6 CLE 7 94 67
2 20160408 BOS 8 TOR 7 89 73
3 20160409 BOS 8 TOR 4 89 73
4 20160410 BOS 0 TOR 3 89 73
I have another data set from SQL that I also put into a pandas DataFrame (dfBostonStats):
print(dfBostonStats.head())
teamID ab h 2b 3b hr so sb ra er era IPouts HA \
0 BOS 5670 1598 343 25 208 1160 83 694 640 4.0 4319 1342
hra soa e fp bpf ppf dp
0 176 1362 75 0.987 108 106 139
I want to concatenate that DataFrame (dfBostonStats) to each row of the first DataFrame (df).
I determined that I could use pandas.concat, and I confirmed this by concatenating the first row of df:
print(pd.concat([df.iloc[[0]], dfBostonStats], axis=1))
date visitor visitor_score home home_score W L teamID ab \
0 20160405 BOS 6 CLE 2 94 67 BOS 5670
h ... era IPouts HA hra soa e fp bpf ppf dp
0 1598 ... 4.0 4319 1342 176 1362 75 0.987 108 106 139
I then tried to concatenate each row by using a for loop, but it gives me an unexpected result. It concatenates one row properly but then prints a row containing only the second DataFrame I listed (dfBostonStats):
for index, element in df.iterrows():
    tempdf = pd.concat([df.iloc[[index]], dfBostonStats], axis=1)
    concatDataFrame = concatDataFrame.append(tempdf, ignore_index=True)
print(concatDataFrame.head())
date visitor visitor_score home home_score W L teamID \
0 20160405 BOS 6.0 CLE 2.0 94.0 67.0 BOS
1 NaN NaN NaN NaN NaN NaN NaN BOS
2 20160406 BOS 6.0 CLE 7.0 94.0 67.0 NaN
3 NaN NaN NaN NaN NaN NaN NaN BOS
4 20160408 BOS 8.0 TOR 7.0 89.0 73.0 NaN
ab h ... era IPouts HA hra soa e fp \
0 5670.0 1598.0 ... 4.0 4319.0 1342.0 176.0 1362.0 75.0 0.987
1 5670.0 1598.0 ... 4.0 4319.0 1342.0 176.0 1362.0 75.0 0.987
2 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN
3 5670.0 1598.0 ... 4.0 4319.0 1342.0 176.0 1362.0 75.0 0.987
4 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN
bpf ppf dp
0 108.0 106.0 139
1 108.0 106.0 139
2 NaN NaN NaN
3 108.0 106.0 139
4 NaN NaN NaN
I cannot figure out why it is printing a row with only dfBostonStats rather than printing only the concatenated rows.
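A likely explanation is index alignment: pd.concat with axis=1 matches rows by index label, not by position. The sketch below reproduces the effect with abbreviated stand-ins for df and dfBostonStats (only a few columns are kept, which is an assumption made for brevity). The single stats row carries label 0, so it only lines up with row 0 of df; for any other row the result contains both labels, each half filled with NaN.

```python
import pandas as pd

# Abbreviated stand-ins for the frames in the question (assumption: fewer columns).
df = pd.DataFrame({
    "date": [20160405, 20160406, 20160408],
    "visitor": ["BOS", "BOS", "BOS"],
    "visitor_score": [6, 6, 8],
})
dfBostonStats = pd.DataFrame({"teamID": ["BOS"], "ab": [5670], "h": [1598]})

# Row 0 of df has index label 0, the same label as the stats row, so they line up.
aligned = pd.concat([df.iloc[[0]], dfBostonStats], axis=1)

# Row 1 of df has index label 1, which dfBostonStats lacks. The result covers
# both labels, so it has two rows and each row is half NaN.
misaligned = pd.concat([df.iloc[[1]], dfBostonStats], axis=1)

# Resetting the label of the one-row slice to 0 makes the two frames line up.
fixed = pd.concat([df.iloc[[1]].reset_index(drop=True), dfBostonStats], axis=1)

print(aligned.shape, misaligned.shape, fixed.shape)
```

The two-row result for every index other than 0 would account for the alternating half-empty rows in the output above.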
On a side note, I know that inside the for loop a copy occurs on every iteration, causing a performance hit, but I figured I would deal with that once I get the data looking how it should.
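On the performance point, one common pattern is to collect the per-row pieces in a plain Python list and call pd.concat once at the end, instead of growing a DataFrame inside the loop. This also avoids DataFrame.append, which was removed in pandas 2.0. The sketch below uses abbreviated stand-in data (an assumption) and resets the index of each one-row slice so that it lines up with the single stats row.

```python
import pandas as pd

# Abbreviated stand-ins for the frames in the question (assumption: fewer columns).
df = pd.DataFrame({
    "date": [20160405, 20160406, 20160408],
    "visitor": ["BOS", "BOS", "BOS"],
    "visitor_score": [6, 6, 8],
})
dfBostonStats = pd.DataFrame({"teamID": ["BOS"], "ab": [5670], "h": [1598]})

pieces = []
for position in range(len(df)):
    # Take one row by position and relabel it 0 so it matches the stats row.
    row = df.iloc[[position]].reset_index(drop=True)
    pieces.append(pd.concat([row, dfBostonStats], axis=1))

# A single concat at the end copies the data once rather than on every iteration.
concatDataFrame = pd.concat(pieces, ignore_index=True)
print(concatDataFrame)
```

This keeps the loop only for illustration; a join is usually simpler and faster still.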
I think that if you need to join the first DataFrame on column visitor and the second on column teamID, you can use merge with a left join. No loop is necessary:
print (df)
date visitor visitor_score home home_score W L
0 20160405 BOS 6 CLE 2 94 67
1 20160406 BOS 6 CLE 7 94 67
2 20160408 AAA 8 TOR 7 89 73
3 20160409 AAA 8 TOR 4 89 73
4 20160410 AAA 0 TOR 3 89 73
print (dfBostonStats)
teamID ab h 2b 3b hr so sb ra er era IPouts HA \
0 BOS 5670 1598 343 25 208 1160 83 694 640 4.0 4319 1342
0 AAA 4 5 6 4 5 1160 83 694 640 4.0 4319 1342
hra soa e fp bpf ppf dp
0 176 1362 75 0.987 10 106 139
0 176 1362 75 0.987 10 106 139
df2 = df.merge(dfBostonStats, left_on='visitor', right_on='teamID', how='left')
print (df2)
date visitor visitor_score home home_score W L teamID ab \
0 20160405 BOS 6 CLE 2 94 67 BOS 5670
1 20160406 BOS 6 CLE 7 94 67 BOS 5670
2 20160408 AAA 8 TOR 7 89 73 AAA 4
3 20160409 AAA 8 TOR 4 89 73 AAA 4
4 20160410 AAA 0 TOR 3 89 73 AAA 4
h ... era IPouts HA hra soa e fp bpf ppf dp
0 1598 ... 4.0 4319 1342 176 1362 75 0.987 10 106 139
1 1598 ... 4.0 4319 1342 176 1362 75 0.987 10 106 139
2 5 ... 4.0 4319 1342 176 1362 75 0.987 10 106 139
3 5 ... 4.0 4319 1342 176 1362 75 0.987 10 106 139
4 5 ... 4.0 4319 1342 176 1362 75 0.987 10 106 139
[5 rows x 27 columns]
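For a self-contained comparison, the sketch below runs the left join on a few abbreviated columns and contrasts it with a cross join. The names games and team_stats and the team code NYA are hypothetical, introduced only for illustration, and how="cross" assumes pandas 1.2 or later. A left join leaves NaN where no teamID matches the visitor, whereas a cross join attaches the stats row to every game regardless of the key.

```python
import pandas as pd

# Hypothetical, abbreviated data; "NYA" is a made-up visitor with no stats row.
games = pd.DataFrame({
    "date": [20160405, 20160406, 20160408],
    "visitor": ["BOS", "BOS", "NYA"],
    "home": ["CLE", "CLE", "TOR"],
})
team_stats = pd.DataFrame({"teamID": ["BOS"], "ab": [5670], "h": [1598]})

# Left join: every game is kept, and stats are filled in only where
# visitor equals teamID. Unmatched games get NaN in the stats columns.
left = games.merge(team_stats, left_on="visitor", right_on="teamID", how="left")

# Cross join: every game is paired with every stats row, ignoring keys.
# With a single stats row this repeats that row next to each game.
cross = games.merge(team_stats, how="cross")

print(left)
print(cross)
```

Which of the two fits depends on whether the stats table holds one team or several: with several teams, the keyed left join is the one that pairs each game with the right team.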