Based on the `user_id` column, I want to get the values of the `rating` and `product_id` columns. The same `user_id` can appear in several rows, both within one file and across the other files. The following table shows some sample data from the first file:
| product_id | user_id | user_name | rating |
|-------------|-----------------|----------------------------------------------|--------|
| B0009XRZ92 | A2JFZLAUG3YFQ7 | Entropy Babe "EB" | 5 |
| B0009XRZ92 | A22HGAAO8KZ2N3 | R. Metzelar | 5 |
| B000067A8B | A2NJO6YE954DBH | Lawrance M. Bernabo | 4 |
| B0009XRZ92 | A3HE4MYMWK4AER | Rebecca M. Eddy "Foster Mom and Untbunny" | 5 |
| B003A3R3ZY | A9A2PR663ED1V | Roger D. Goff | 5 |
| B0009XRZ92 | A2MRZDJF90JC1U | Suzanne K. Armstrong "Suzy Q" | 5 |
| B0009XRZ92 | A2YNBDT3170PCR | C. O'Hern | 5 |
| B0009XRZ92 | A10VJ7BDVCPKEZ | Carol S. Bottom | 5 |
| B0009XRZ92 | AAAQO894MG80B | Paul J. Michko | 5 |
| B00067BBQE | A9A2PR663ED1V | Roger D. Goff | 5 |
| B0009XRZ92 | A31S5QUMFR8NH2 | Dana L. Jordan "Mom of Twins" | 5 |
| B0009XRZ92 | A2DS24DHXUH0GM | Gaz Rev(iewer) | 4 |
| B00006AUMZ | A2NJO6YE954DBH | Lawrance M. Bernabo | 4 |
| B0009XRZ92 | A16FRHL2ZC7EUR | M. Claytor | 5 |
| B0009XRZ92 | A3AV8R0A62PP1N | MARCUSHELBLINZ "mmmacman" | 5 |
| B0009XRZ92 | A3QN84C38DE9FU | Gillian M. Kratzer | 5 |
| B0009XRZ92 | A36MLTLVQFEQYL | Yossarian "alienated socialist" | 5 |
| B00006AUMD | A2NJO6YE954DBH | Lawrance M. Bernabo | 4 |
What I want to do is:
Take one `user_id` from the first file and display the `product_id` and `rating` values for that user for every movie across all files. If the user did not rate a movie, the record should still be shown with that `product_id` and a `rating` of NaN. Then repeat this for every user in the first file (and only the first file).
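For reference, a minimal sketch of that long format, assuming each file has already been read into a DataFrame with `product_id`, `user_id`, and `rating` columns. The frames `first` and `other` are hypothetical stand-ins for real files (rows taken from the table above, split across two frames), and averaging duplicate (user, product) ratings is an assumption made so that `reindex` sees unique keys.

```python
import pandas as pd

# Hypothetical stand-ins for the real files; in practice use pd.read_csv(...)
first = pd.DataFrame({
    "product_id": ["B0009XRZ92", "B000067A8B"],
    "user_id": ["A2JFZLAUG3YFQ7", "A2NJO6YE954DBH"],
    "rating": [5, 4],
})
other = pd.DataFrame({
    "product_id": ["B00006AUMZ", "B003A3R3ZY"],
    "user_id": ["A2NJO6YE954DBH", "A9A2PR663ED1V"],
    "rating": [4, 5],
})

# All ratings from every file stacked together
all_ratings = pd.concat([first, other], ignore_index=True)

# Users come from the first file only; products come from all files
users = first["user_id"].unique()
products = all_ratings["product_id"].unique()

# One row per (user, product) pair; pairs with no rating get NaN
grid = pd.MultiIndex.from_product([users, products], names=["user_id", "product_id"])
result = (
    all_ratings.groupby(["user_id", "product_id"])["rating"].mean()  # assumed: average duplicates
    .reindex(grid)
    .reset_index()
)
print(result)
```

Here `A9A2PR663ED1V` is left out because it only appears in `other`, while every product from both frames is listed for each first-file user.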
Using `pivot_table`:
```python
import pandas as pd

df = pd.read_csv('LCM1.csv')
# One row per user, one column per product; missing ratings become NaN
df_new = df.pivot_table(index='user_id', columns='product_id', values='rating').rename_axis(columns=None)
print(df_new)
```
The result will be the following:
| user_id | B000067A8B | B00006AUMD | B00006AUMZ | B00067BBQE | B0009XRZ92 | B003A3R3ZY |
|----------------|------------|------------|------------|------------|------------|------------|
| A10VJ7BDVCPKEZ | NaN | NaN | NaN | NaN | 5.0 | NaN |
| A16FRHL2ZC7EUR | NaN | NaN | NaN | NaN | 5.0 | NaN |
| A2DS24DHXUH0GM | NaN | NaN | NaN | NaN | 4.0 | NaN |
| A2NJO6YE954DBH | 4.0 | 4.0 | 4.0 | NaN | NaN | NaN |
| A2YNBDT3170PCR | NaN | NaN | NaN | NaN | 5.0 | NaN |
| A36MLTLVQFEQYL | NaN | NaN | NaN | NaN | 5.0 | NaN |
| A3HE4MYMWK4AER | NaN | NaN | NaN | NaN | 5.0 | NaN |
| A3QN84C38DE9FU | NaN | NaN | NaN | NaN | 5.0 | NaN |
| AAAQO894MG80B | NaN | NaN | NaN | NaN | 5.0 | NaN |
| A22HGAAO8KZ2N3 | NaN | NaN | NaN | NaN | 5.0 | NaN |
| A2JFZLAUG3YFQ7 | NaN | NaN | NaN | NaN | 5.0 | NaN |
| A2MRZDJF90JC1U | NaN | NaN | NaN | NaN | 5.0 | NaN |
| A31S5QUMFR8NH2 | NaN | NaN | NaN | NaN | 5.0 | NaN |
| A3AV8R0A62PP1N | NaN | NaN | NaN | NaN | 5.0 | NaN |
| A9A2PR663ED1V | NaN | NaN | NaN | 5.0 | NaN | 5.0 |
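Given a pivot like this, one user's ratings can be read back as a `product_id`/`rating` listing with `.loc`, and products that user never rated stay NaN. A small sketch on a few rows from the sample table, assuming `pivot_table` keeps its default behaviour of sorting the product columns:

```python
import pandas as pd

# A few rows from the sample table
df = pd.DataFrame({
    "product_id": ["B000067A8B", "B00006AUMZ", "B00006AUMD", "B0009XRZ92", "B003A3R3ZY"],
    "user_id": ["A2NJO6YE954DBH", "A2NJO6YE954DBH", "A2NJO6YE954DBH",
                "A2JFZLAUG3YFQ7", "A9A2PR663ED1V"],
    "rating": [4, 4, 4, 5, 5],
})

df_new = df.pivot_table(index="user_id", columns="product_id", values="rating").rename_axis(columns=None)

# One user's row: every product column, NaN where this user gave no rating
one_user = (
    df_new.loc["A2NJO6YE954DBH"]
    .rename("rating")            # Series name becomes the value column
    .rename_axis("product_id")   # index name becomes the key column
    .reset_index()
)
print(one_user)
```

To repeat this for every user, one could loop over `df_new.index` and apply the same `.loc` lookup.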
But what I want is to take the `user_id` values from the first file only, and look up the `product_id` and `rating` values for each of those `user_id`s across all files.
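One way to express that, as a sketch rather than a definitive solution: collect the `user_id` values from the first file, stack every file with `pd.concat`, keep the rows whose `user_id` is in that set via `isin`, then pivot and reindex the columns so products from all files appear. The function names `ratings_for_first_file_users` and `load_files`, the `LCM*.csv` glob pattern, and the sample frames (including the rating of 3) are all hypothetical.

```python
import glob

import pandas as pd


def ratings_for_first_file_users(frames):
    """Wide rating table for users found in frames[0], using ratings from all frames."""
    first_users = frames[0]["user_id"].unique()
    all_ratings = pd.concat(frames, ignore_index=True)
    # Keep every rating, from any file, made by a first-file user
    subset = all_ratings[all_ratings["user_id"].isin(first_users)]
    # Columns cover products from all files, so products a user never rated show NaN
    products = sorted(all_ratings["product_id"].unique())
    return (
        subset.pivot_table(index="user_id", columns="product_id", values="rating")
        .reindex(columns=products)
        .rename_axis(columns=None)
    )


def load_files(pattern="LCM*.csv"):
    # Hypothetical pattern; the first path in sorted order is treated as "the first file"
    return [pd.read_csv(path) for path in sorted(glob.glob(pattern))]


# Hypothetical sample data standing in for two files
first = pd.DataFrame({"product_id": ["B0009XRZ92"], "user_id": ["A2JFZLAUG3YFQ7"], "rating": [5]})
second = pd.DataFrame({
    "product_id": ["B003A3R3ZY", "B00067BBQE"],
    "user_id": ["A2JFZLAUG3YFQ7", "A9A2PR663ED1V"],
    "rating": [3, 5],
})
print(ratings_for_first_file_users([first, second]))
```

Calling `ratings_for_first_file_users(load_files())` would apply the same logic to real files, provided the first file also sorts first by name.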
I hope the question is clear; if anything is unclear, please comment below. Thanks!
Check if this meets your requirement.
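```python
import pandas as pd

data1 = pd.read_csv("user.txt", sep="|")  # first file (assumed plain pipe-delimited)
data2 = pd.read_csv("file2.csv")          # another file with the same columns
# Stack all ratings, then keep only users that appear in the first file
all_ratings = pd.concat([data1, data2], ignore_index=True)
masterDf = data1[["user_id"]].drop_duplicates().merge(all_ratings, how="left", on="user_id")
masterDf["rating"] = pd.to_numeric(masterDf["rating"])
# Wide table: one row per user, one column per product from any file (NaN = not rated)
df_new = (masterDf.pivot_table(index="user_id", columns="product_id", values="rating")
          .reindex(columns=sorted(all_ratings["product_id"].unique()))
          .rename_axis(columns=None))
df_new
```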
The output will be: