Matching values from one csv file to another and replace entire column using pandas/python

Consider the following example:

I have the MovieLens dataset:

u.item.csv

ID|MOVIE NAME (YEAR)|REL.DATE|NULL|IMDB LINK|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|
1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0

The separator used here is a pipe, which is still manageable.

training_data.csv

,user_id,movie_id,rating,unix_timestamp
0,1,1,5,874965758
1,1,2,3,876893171
2,1,3,4,878542960

Since I need to show movie names in training_data instead of movie IDs, I need to match every ID in u.item.csv against movie_id in training_data.csv and replace it with the corresponding name.

I'm using Python pandas, and the training data was converted from an SFrame to a DataFrame to CSV so that I could make the required change, which has so far been unsuccessful. I could certainly use some looping structure, but the matching and replacing is the real challenge I face.

P.S. I know the training data will be in sequence per user and would produce the exact output if replaced, but I need to learn this so that when I recommend movies, movie names are displayed rather than IDs.

I've already tried:

  1. THIS (pandas-python-replace-multiple-values-in-multiple-columns) - but it can take a lot of time when the dataset has 100K values
  2. THIS (pandas-replace-multiple-values-one-column) - matching values is not explained
  3. THIS (pandas-replacing-column-values) - the entries are made manually

I think you need map with a Series created by set_index:

print (df1.set_index('ID')['MOVIE NAME (YEAR)'])
ID
1     Toy Story (1995)
2     GoldenEye (1995)
3    Four Rooms (1995)
Name: MOVIE NAME (YEAR), dtype: object

df2['movie_id'] = df2['movie_id'].map(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
   user_id           movie_id  rating  unix_timestamp
0        1   Toy Story (1995)       5       874965758
1        1   GoldenEye (1995)       3       876893171
2        1  Four Rooms (1995)       4       878542960
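The snippet above assumes df1 and df2 already exist. A minimal, self-contained sketch of the whole step follows, assuming df1 comes from u.item.csv and df2 from training_data.csv. The sample rows are copied from the question, trimmed to the first three columns, and read from in-memory strings; with the real files you would pass the file paths to read_csv instead.

```python
import io

import pandas as pd

# Sample rows copied from the question (first three columns only, for brevity).
items_csv = """ID|MOVIE NAME (YEAR)|REL.DATE
1|Toy Story (1995)|01-Jan-1995
2|GoldenEye (1995)|01-Jan-1995
3|Four Rooms (1995)|01-Jan-1995
"""
training_csv = """,user_id,movie_id,rating,unix_timestamp
0,1,1,5,874965758
1,1,2,3,876893171
2,1,3,4,878542960
"""

# The item file is pipe-separated; only the two columns needed for the lookup are read.
df1 = pd.read_csv(io.StringIO(items_csv), sep="|", usecols=["ID", "MOVIE NAME (YEAR)"])

# The first, unnamed column of the training file is the saved DataFrame index.
df2 = pd.read_csv(io.StringIO(training_csv), index_col=0)

# Build an ID -> title lookup Series and map it onto movie_id.
titles = df1.set_index("ID")["MOVIE NAME (YEAR)"]
df2["movie_id"] = df2["movie_id"].map(titles)
print(df2)
```

Building the lookup Series once and reusing it avoids any explicit Python loop over the rows.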

Or use replace:

df2['movie_id'] = df2['movie_id'].replace(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
   user_id           movie_id  rating  unix_timestamp
0        1   Toy Story (1995)       5       874965758
1        1   GoldenEye (1995)       3       876893171
2        1  Four Rooms (1995)       4       878542960
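A third option, not part of the answer above, is a left merge. It keeps movie_id intact and adds the title as a separate column, which may be convenient if the IDs are still needed later, for example by a recommender. This is only a sketch; the frames are rebuilt by hand from the question's sample rows.

```python
import pandas as pd

# Frames rebuilt by hand from the question's sample rows.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "MOVIE NAME (YEAR)": ["Toy Story (1995)", "GoldenEye (1995)", "Four Rooms (1995)"],
})
df2 = pd.DataFrame({
    "user_id": [1, 1, 1],
    "movie_id": [1, 2, 3],
    "rating": [5, 3, 4],
    "unix_timestamp": [874965758, 876893171, 878542960],
})

# A left merge keeps every row of df2 in its original order;
# the duplicate key column from df1 is dropped afterwards.
merged = df2.merge(
    df1[["ID", "MOVIE NAME (YEAR)"]],
    left_on="movie_id",
    right_on="ID",
    how="left",
).drop(columns="ID")
print(merged)
```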

The difference is that when a value has no match, map produces NaN, whereas replace keeps the original value:

print (df2)
   user_id  movie_id  rating  unix_timestamp
0        1         1       5       874965758
1        1         2       3       876893171
2        1         5       4       878542960 <- 5 has no match

df2['movie_id'] = df2['movie_id'].map(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
   user_id          movie_id  rating  unix_timestamp
0        1  Toy Story (1995)       5       874965758
1        1  GoldenEye (1995)       3       876893171
2        1               NaN       4       878542960

df2['movie_id'] = df2['movie_id'].replace(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
   user_id          movie_id  rating  unix_timestamp
0        1  Toy Story (1995)       5       874965758
1        1  GoldenEye (1995)       3       876893171
2        1                 5       4       878542960
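If you want to use map but still keep something readable for IDs that have no title, one possibility is to fill the NaN values afterwards. The sketch below is an assumption about what might be useful rather than part of the answer: it lists the unmatched IDs first, then falls back to the ID written as a string so the column holds only text.

```python
import pandas as pd

# Lookup Series equivalent to df1.set_index('ID')['MOVIE NAME (YEAR)'].
titles = pd.Series(
    ["Toy Story (1995)", "GoldenEye (1995)", "Four Rooms (1995)"],
    index=pd.Index([1, 2, 3], name="ID"),
    name="MOVIE NAME (YEAR)",
)
df2 = pd.DataFrame({
    "user_id": [1, 1, 1],
    "movie_id": [1, 2, 5],  # 5 has no entry in the lookup
    "rating": [5, 3, 4],
    "unix_timestamp": [874965758, 876893171, 878542960],
})

mapped = df2["movie_id"].map(titles)

# IDs that did not match anything in the lookup.
unmatched = df2.loc[mapped.isna(), "movie_id"].unique().tolist()
print(unmatched)

# Fall back to the ID (as text) wherever map produced NaN.
df2["movie_id"] = mapped.fillna(df2["movie_id"].astype(str))
print(df2)
```

Checking the unmatched IDs before overwriting the column makes it easier to notice gaps in the item file.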


