
Pandas creating new column (Issue)

From Kaggle I downloaded some data about LaLiga (and its results) throughout the years. There's a column named 'Score' (for instance: 1-1) and I want to create a new column called 'Total Goals'. Applying my beginner pandas skills, the only thing I manage to produce is a column filled with 'NaN':

df['Total Goals'] = df['Score'].apply(lambda x: x.split('-'))

df['Total Goals'] = pd.to_numeric(df['Total Goals'], errors='coerce')
df['Total Goals'] = df['Total Goals'][0]+ df['Total Goals'][1]
df.head()

Unfortunately I couldn't figure out the exact issue, so I'd like to ask where the problem is.

If I assume your source data looks a bit like this:

df = pd.DataFrame({'Score':['1-1','0-1','2-2']})

then you can simply do:

df['Total Goals'] = df['Score'].apply(lambda x: sum( [int(y) for y in x.split('-')] ))

This uses a list comprehension to convert each item produced by the split to an integer, then sums them. Result:

    Score   Total Goals
0     1-1             2
1     0-1             1
2     2-2             4
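
As a possible explanation for the NaN column in the question, here is a minimal sketch, assuming the same sample data as above. After the split, each cell holds a list of strings rather than a number, so a numeric conversion with errors='coerce' has nothing it can parse. In addition, indexing the column with [0] selects the row labelled 0, not the first element of every list. The names parts, first_row, left, right and total are illustrative only and do not come from the original post.

```python
import pandas as pd

# Sample data assumed to resemble the 'Score' column from the question.
df = pd.DataFrame({'Score': ['1-1', '0-1', '2-2']})

# After the split, every cell is a list of strings such as ['1', '1'].
parts = df['Score'].apply(lambda x: x.split('-'))

# parts[0] picks the row with index label 0 (one whole list),
# not the first element of each list.
first_row = parts[0]

# Element-wise access into the lists goes through the .str accessor.
left = parts.str[0].astype(int)
right = parts.str[1].astype(int)
total = left + right

print(first_row)
print(total.tolist())
```

Seen this way, the fix in the answer works because it converts and sums the pieces inside each row instead of indexing the column as a whole.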

For very large datasets, apply() may be too slow. In that case it may be better to capture both values into separate columns before summing them (starting from the same dataset above):

df[['sc1','sc2']] = df['Score'].str.split('-', n=1, expand=True).astype(int)
df['Total Goals'] = df['sc1'] + df['sc2']

In the best Pandas style, this can be combined into a single expression, going straight to the desired result:

df['Total Goals'] = df['Score'].str.split('-', n=1, expand=True).astype(int).sum(axis=1)
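
To check that the row-by-row and vectorized approaches agree, a small self-contained sketch follows. It assumes the same three-row sample data and that every score is well formed (two integers separated by a single hyphen). The variable names via_apply and via_split are illustrative. Here n is passed by keyword, since recent pandas versions no longer accept it positionally.

```python
import pandas as pd

# Sample data assumed to resemble the 'Score' column from the question.
df = pd.DataFrame({'Score': ['1-1', '0-1', '2-2']})

# Row-by-row version: split each string, convert the pieces, sum them.
via_apply = df['Score'].apply(lambda x: sum(int(y) for y in x.split('-')))

# Vectorized version: split into two columns, cast to int, sum across columns.
via_split = (
    df['Score']
    .str.split('-', n=1, expand=True)
    .astype(int)
    .sum(axis=1)
)

# Both approaches should give the same totals on well-formed input.
print(via_apply.tolist() == via_split.tolist())

df['Total Goals'] = via_split
print(df)
```

If the real data contains missing or malformed scores, the astype(int) step would raise an error, so those rows would need to be cleaned or filtered first.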
