Is there a way to create a new column in one DataFrame by selecting values from different columns of another DataFrame, based on the values in each row of the first DataFrame?
My datasets look like this:
import pandas as pd
df1 = pd.DataFrame(
[['USA', 1992],
['China', 1993],
['Japan', 1994]],
columns = ['Country', 'year'])
scores = pd.DataFrame(
[['USA', 20, 30, 40],
['China', 5, 15, 30],
['Japan', 30, 50, 40],
['Korea', 10, 15, 20],
['France', 10, 12, 15]],
columns = ['Country', 1992, 1993, 1994])
And my desired dataset would be:
df = pd.DataFrame(
[['USA', 1992, 20],
['China', 1993, 15],
['Japan', 1994, 40]],
columns = ['Country', 'year', 'score'])
I have tried using apply with a lambda function, but it raises:
KeyError: ('Country', u'occurred at index Country')
The line I tried is:
df1['score'] = df.apply(lambda x: scores[scores['Country'] == x['Country']][x['year']][1])
Thank you in advance!
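For reference, the KeyError above comes from DataFrame.apply defaulting to axis=0: the lambda receives each column as a Series indexed by row labels (0, 1, 2), so x['Country'] is looked up as a row label and fails, and the message names the column being processed ('Country'). The attempt also calls df.apply where df1 appears to be intended, and the trailing [1] is a label lookup that only matches China's row (index 1). A minimal corrected sketch of the same idea, assuming the question's data, passes axis=1 and takes the match by position with .iloc[0]:

```python
import pandas as pd

df1 = pd.DataFrame(
    [['USA', 1992],
     ['China', 1993],
     ['Japan', 1994]],
    columns=['Country', 'year'])

scores = pd.DataFrame(
    [['USA', 20, 30, 40],
     ['China', 5, 15, 30],
     ['Japan', 30, 50, 40],
     ['Korea', 10, 15, 20],
     ['France', 10, 12, 15]],
    columns=['Country', 1992, 1993, 1994])

# axis=1 passes each ROW of df1 to the lambda, so x['Country'] and x['year']
# are that row's values instead of column lookups.
df1['score'] = df1.apply(
    # The boolean mask selects the matching country, x['year'] selects the year
    # column, and .iloc[0] takes the first match by position, not by index label.
    lambda x: scores.loc[scores['Country'] == x['Country'], x['year']].iloc[0],
    axis=1)

print(df1)
```

This keeps the asker's filtering approach; the answers below avoid the per-row boolean mask entirely.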
You can melt the scores DataFrame and merge it with the original:
scores = pd.melt(scores, id_vars='Country', value_name='score', var_name='year')
df1.merge(scores)
Out:
Country year score
0 USA 1992 20
1 China 1993 15
2 Japan 1994 40
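To see what melt produces before the merge, here is a self-contained sketch using the question's data. It stores the long table under a new name (scores_long, chosen here for illustration) so the wide scores frame is not overwritten, and, as a precaution, casts the melted year column to int64 so it matches df1['year'], since those year values come from column labels rather than from a data column.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['USA', 1992],
     ['China', 1993],
     ['Japan', 1994]],
    columns=['Country', 'year'])

scores = pd.DataFrame(
    [['USA', 20, 30, 40],
     ['China', 5, 15, 30],
     ['Japan', 30, 50, 40],
     ['Korea', 10, 15, 20],
     ['France', 10, 12, 15]],
    columns=['Country', 1992, 1993, 1994])

# Wide -> long: one row per (Country, year) pair; the former column labels
# 1992/1993/1994 become values of the new 'year' column.
scores_long = pd.melt(scores, id_vars='Country', value_name='score', var_name='year')
# Precaution: give 'year' the same integer dtype as df1['year'].
scores_long['year'] = scores_long['year'].astype('int64')

# 5 countries x 3 years = 15 rows in long form.
print(len(scores_long))
print(scores_long.head(6))

# merge joins on the shared columns ('Country' and 'year') by default.
result = df1.merge(scores_long)
print(result)
```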
By default, merge joins on all columns the two DataFrames have in common (here Country and year). To specify the key columns explicitly, use the on parameter, i.e. df1.merge(scores, on=['Country', 'year']).
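If df1 might contain a country or year that has no entry in scores, note that merge performs an inner join by default (how='inner'), so such rows are silently dropped. The sketch below uses an explicit on list and compares that default with how='left', which keeps every row of df1. The extra 'Germany' row is hypothetical, added only to show the unmatched case.

```python
import pandas as pd

# The question's df1 plus one hypothetical country that is absent from scores.
df1 = pd.DataFrame(
    [['USA', 1992],
     ['China', 1993],
     ['Japan', 1994],
     ['Germany', 1992]],
    columns=['Country', 'year'])

scores = pd.DataFrame(
    [['USA', 20, 30, 40],
     ['China', 5, 15, 30],
     ['Japan', 30, 50, 40],
     ['Korea', 10, 15, 20],
     ['France', 10, 12, 15]],
    columns=['Country', 1992, 1993, 1994])

scores_long = pd.melt(scores, id_vars='Country', value_name='score', var_name='year')
scores_long['year'] = scores_long['year'].astype('int64')

# Default inner join: the unmatched 'Germany' row disappears.
inner = df1.merge(scores_long, on=['Country', 'year'])
# Left join: every df1 row is kept; a missing score becomes NaN.
left = df1.merge(scores_long, on=['Country', 'year'], how='left')

print(inner)
print(left)
```

Because NaN is a float, the score column of the left-join result ends up with a float dtype.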
You can use Country as the index of the scores DataFrame:
scores = scores.set_index(['Country'])
Then you can apply the function get_score to each row, creating the score column and filling it with the matching value:
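def get_score(row):
    # Look up this row's country (index label) and year (column label) in scores.
    row['score'] = scores.loc[row['Country'], row['year']]
    return row

# axis=1 passes each row of df1 to get_score.
df = df1.apply(get_score, axis=1)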
Which gives you this output:
Country year score
0 USA 1992 20
1 China 1993 15
2 Japan 1994 40
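Row-wise apply calls the Python function once per row, which can be slow on large frames. As an alternative that is not part of either answer above, here is a sketch of a vectorized lookup: Index.get_indexer converts each country and year to a position in the wide table, and NumPy fancy indexing then picks one cell per row. The names wide, row_pos and col_pos are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['USA', 1992],
     ['China', 1993],
     ['Japan', 1994]],
    columns=['Country', 'year'])

scores = pd.DataFrame(
    [['USA', 20, 30, 40],
     ['China', 5, 15, 30],
     ['Japan', 30, 50, 40],
     ['Korea', 10, 15, 20],
     ['France', 10, 12, 15]],
    columns=['Country', 1992, 1993, 1994])

wide = scores.set_index('Country')
# Precaution: make the year column labels plain int64 to match df1['year'].
wide.columns = wide.columns.astype('int64')

# Position of each df1 row's country and year in the wide table (-1 = not found).
row_pos = wide.index.get_indexer(df1['Country'])
col_pos = wide.columns.get_indexer(df1['year'])
if (row_pos == -1).any() or (col_pos == -1).any():
    raise KeyError('some Country/year pairs in df1 have no entry in scores')

# Pair the i-th row position with the i-th column position.
df1['score'] = wide.to_numpy()[row_pos, col_pos]
print(df1)
```

The explicit -1 check matters because NumPy treats -1 as "last element", so a missing country would otherwise silently pick up France's scores.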