import pandas as pd
dict1 = {'id_game': [112, 113, 114], 'game_name' : ['x','z','y'],'id_category':[1,2,3], 'id_players':[[588,589,590],[589],[588,589]]}
dict2 = {'id_player': [588, 589, 590],'player_name' : ['fff','aaa','ccc'] ,'indication':['mmm x ggg sdg y', 'uuu x fdb y kfnkjq z', 'fffre x']}
game_df = pd.DataFrame(dict1)
player_df = pd.DataFrame(dict2)
Here is a sample of my data. I am looking for a way to add a column containing the category ids (categories_id) to the second dataframe, player_df, based on the relation between game_df['id_players'] and player_df['id_player'], or between game_df['game_name'] and player_df['indication'].
In the following script I used the game_name and indication values:
new_list = []
for i in range(len(game_df)):
    for j in range(len(player_df)):
        if game_df['game_name'][i] in player_df['indication'][j]:
            new_list.append(game_df['id_category'][i])
            print(new_list)
player_df['categories_id'] = new_list
ERROR:
--> 747 raise ValueError(
748 "Length of values "
749 f"({len(data)}) "
ValueError: Length of values (6) does not match length of index (3)
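To see where the six values come from: the inner loop appends once for every (game, player) pair whose indication contains the game name, not once per player. On the sample data "x" occurs in all three indications, "z" in one and "y" in two, so 3 + 1 + 2 = 6 appends for a three-row player_df. A small sketch that counts those pairs (the data is rebuilt so the snippet runs on its own):

```python
import pandas as pd

game_df = pd.DataFrame({
    "id_game": [112, 113, 114],
    "game_name": ["x", "z", "y"],
    "id_category": [1, 2, 3],
    "id_players": [[588, 589, 590], [589], [588, 589]],
})
player_df = pd.DataFrame({
    "id_player": [588, 589, 590],
    "player_name": ["fff", "aaa", "ccc"],
    "indication": ["mmm x ggg sdg y", "uuu x fdb y kfnkjq z", "fffre x"],
})

# Each (game, player) pair that passes the substring test corresponds to
# one new_list.append() call in the nested loop above.
matches = [
    (game, player)
    for game in game_df["game_name"]
    for player, text in zip(player_df["player_name"], player_df["indication"])
    if game in text
]
print(len(matches), len(player_df))  # 6 3
```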
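Your code can be fixed by adding break after print(new_list), with the same indentation. The inner loop then stops at the first player whose indication contains the game name, so each game appends at most one category and new_list ends up with three values (one per game) instead of six: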
...
        if game_df['game_name'][i] in player_df['indication'][j]:
            new_list.append(game_df['id_category'][i])
            print(new_list)
            break
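A runnable version of the loop with the break in place, on the same sample data. One thing worth keeping in mind: new_list is still filled per game, so the value stored in player_df row k is the category of game k (the game at the same position), not necessarily one of that player's own games. The lengths only line up because both frames have three rows and every game finds a match.

```python
import pandas as pd

game_df = pd.DataFrame({
    "id_game": [112, 113, 114],
    "game_name": ["x", "z", "y"],
    "id_category": [1, 2, 3],
    "id_players": [[588, 589, 590], [589], [588, 589]],
})
player_df = pd.DataFrame({
    "id_player": [588, 589, 590],
    "player_name": ["fff", "aaa", "ccc"],
    "indication": ["mmm x ggg sdg y", "uuu x fdb y kfnkjq z", "fffre x"],
})

new_list = []
for i in range(len(game_df)):
    for j in range(len(player_df)):
        # Substring test: does game i's name appear in player j's indication?
        if game_df["game_name"][i] in player_df["indication"][j]:
            new_list.append(game_df["id_category"][i])
            break  # keep only the first matching player for game i

# One value per game, i.e. 3 values for the 3 rows of player_df
player_df["categories_id"] = new_list
print(player_df)
```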
That being said, iterating over dataframes is highly discouraged because it's slow and gets unwieldy very quickly. The canonical way to approach problems like this is to merge the dataframes on the id_player(s), i.e., explode the ids in id_players into individual rows,
>>> game_df = game_df.explode("id_players").rename(columns={"id_players": "id_player"})
>>> game_df
   id_game  game_name  id_category  id_player
0      112          x            1        588
0      112          x            1        589
0      112          x            1        590
1      113          z            2        589
2      114          y            3        588
2      114          y            3        589
so you can .merge it with player_df:
>>> df = game_df.merge(player_df, on="id_player")
>>> df
   id_game  game_name  id_category  id_player  player_name            indication
0      112          x            1        588          fff       mmm x ggg sdg y
1      114          y            3        588          fff       mmm x ggg sdg y
2      112          x            1        589          aaa  uuu x fdb y kfnkjq z
3      113          z            2        589          aaa  uuu x fdb y kfnkjq z
4      114          y            3        589          aaa  uuu x fdb y kfnkjq z
5      112          x            1        590          ccc               fffre x
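Since the question asks for a categories column on player_df, one way to finish from the merged frame is to group by player and map the result back. This is a sketch under my own assumptions, not the only option: storing a sorted list of unique category ids per player is just one possible representation. The ignore_index=True (pandas 1.1+) and the int64 cast are optional tidying; the cast makes the exploded ids, which come out with object dtype, match player_df['id_player'].

```python
import pandas as pd

game_df = pd.DataFrame({
    "id_game": [112, 113, 114],
    "game_name": ["x", "z", "y"],
    "id_category": [1, 2, 3],
    "id_players": [[588, 589, 590], [589], [588, 589]],
})
player_df = pd.DataFrame({
    "id_player": [588, 589, 590],
    "player_name": ["fff", "aaa", "ccc"],
    "indication": ["mmm x ggg sdg y", "uuu x fdb y kfnkjq z", "fffre x"],
})

# One row per (game, player) pair, then attach the player columns.
df = (
    game_df.explode("id_players", ignore_index=True)
    .rename(columns={"id_players": "id_player"})
    # explode leaves object dtype; cast so the key matches player_df's int64 ids
    .astype({"id_player": "int64"})
    .merge(player_df, on="id_player")
)

# Sorted, de-duplicated list of category ids for each player.
categories = df.groupby("id_player")["id_category"].agg(lambda s: sorted(set(s.tolist())))

# map() looks each id_player up in the index of `categories`, so row order
# does not matter; a player that appears in no game would get NaN.
player_df["categories_id"] = player_df["id_player"].map(categories)
print(player_df)
```

On the sample data this gives [1, 3] for player 588, [1, 2, 3] for 589 and [1] for 590.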
That makes analyses rather straightforward. For example, checking whether the game_name is in the indication becomes
df.apply(lambda row: row.game_name in row.indication, axis=1)
though it returns True for every row, so I'm not sure that's actually what you want.
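Note that `in` between two strings is a substring test, so a game name matches any indication that contains it anywhere, even inside a longer word. On the sample data a whole-word check happens to give the same all-True result, but the two diverge as soon as a name appears inside another word. The sketch below uses a small made-up frame with the same two columns (the "ff" row is hypothetical, not from your data) and a list comprehension over zip, which is typically faster than apply(axis=1):

```python
import pandas as pd

# Hypothetical frame with the same two columns as the merged df;
# the "ff" row is invented to show where the two checks disagree.
df = pd.DataFrame({
    "game_name": ["x", "z", "ff"],
    "indication": ["mmm x ggg sdg y", "uuu x fdb y kfnkjq z", "fffre x"],
})

pairs = list(zip(df["game_name"], df["indication"]))

# Same test as the apply() call: substring search, so "ff" matches "fffre".
substring = [name in text for name, text in pairs]

# Whole-word test: compare against the whitespace-separated tokens instead.
whole_word = [name in text.split() for name, text in pairs]

df["substring_match"] = substring
df["word_match"] = whole_word
print(df)
```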