How to read data from multiple CSV files for the user_ids that appear in the first file and create one pivot table

The question is: based on the user_id column, I want to get the values of the rating and product_id columns. The same user_id can appear several times within one file and in the other files too. The following table shows some of the data from the first file.

| product_id  | user_id         | user_name                                    | rating |
|-------------|-----------------|----------------------------------------------|--------|
|  B0009XRZ92 | A2JFZLAUG3YFQ7  |  Entropy Babe "EB"                           | 5      |
|  B0009XRZ92 | A22HGAAO8KZ2N3  |  R.   Metzelar                               | 5      |
|  B000067A8B |  A2NJO6YE954DBH |  Lawrance M. Bernabo                         | 4      |
|  B0009XRZ92 |  A3HE4MYMWK4AER |  Rebecca M. Eddy "Foster Mom and   Untbunny" | 5      |
|  B003A3R3ZY | A9A2PR663ED1V   |  Roger D. Goff                               | 5      |
|  B0009XRZ92 | A2MRZDJF90JC1U  |  Suzanne K. Armstrong "Suzy Q"               | 5      |
|  B0009XRZ92 |  A2YNBDT3170PCR |  C.   O'Hern                                 | 5      |
|  B0009XRZ92 |  A10VJ7BDVCPKEZ |  Carol S. Bottom                             | 5      |
|  B0009XRZ92 |  AAAQO894MG80B  |  Paul J. Michko                              | 5      |
|  B00067BBQE | A9A2PR663ED1V   |  Roger D. Goff                               | 5      |
|  B0009XRZ92 | A31S5QUMFR8NH2  |  Dana L. Jordan "Mom of Twins"               | 5      |
|  B0009XRZ92 |  A2DS24DHXUH0GM |  Gaz    Rev(iewer)                           | 4      |
|  B00006AUMZ |  A2NJO6YE954DBH |  Lawrance M. Bernabo                         | 4      |
|  B0009XRZ92 |  A16FRHL2ZC7EUR |  M.   Claytor                                | 5      |
|  B0009XRZ92 | A3AV8R0A62PP1N  |  MARCUSHELBLINZ "mmmacman"                   | 5      |
|  B0009XRZ92 |  A3QN84C38DE9FU |  Gillian M. Kratzer                          | 5      |
|  B0009XRZ92 |  A36MLTLVQFEQYL |  Yossarian "alienated socialist"             | 5      |
|  B00006AUMD |  A2NJO6YE954DBH |  Lawrance M. Bernabo                         | 4      |

What I want to do is:

Take one user_id from the first file only and display the rating and product_id values for that user across all the movies in all the files. If the user didn't rate a movie, the record should still be displayed, with the product_id value and a rating of NaN. The whole process should be repeated for every user in the first file, and only for those users.

Using pivot_table on the first file:

import pandas as pd
df = pd.read_csv('LCM1.csv')
df_new = df.pivot_table(index='user_id', columns='product_id', values='rating').rename_axis(None, axis=1)
print(df_new)

The result will be the following:
                     B000067A8B     B00006AUMD     B00006AUMZ     B00067BBQE   \
user_id                                                                         
  A10VJ7BDVCPKEZ             NaN            NaN            NaN            NaN   
  A16FRHL2ZC7EUR             NaN            NaN            NaN            NaN   
  A2DS24DHXUH0GM             NaN            NaN            NaN            NaN   
  A2NJO6YE954DBH             4.0            4.0            4.0            NaN   
  A2YNBDT3170PCR             NaN            NaN            NaN            NaN   
  A36MLTLVQFEQYL             NaN            NaN            NaN            NaN   
  A3HE4MYMWK4AER             NaN            NaN            NaN            NaN   
  A3QN84C38DE9FU             NaN            NaN            NaN            NaN   
  AAAQO894MG80B              NaN            NaN            NaN            NaN   
 A22HGAAO8KZ2N3              NaN            NaN            NaN            NaN   
 A2JFZLAUG3YFQ7              NaN            NaN            NaN            NaN   
 A2MRZDJF90JC1U              NaN            NaN            NaN            NaN   
 A31S5QUMFR8NH2              NaN            NaN            NaN            NaN   
 A3AV8R0A62PP1N              NaN            NaN            NaN            NaN   
 A9A2PR663ED1V               NaN            NaN            NaN            5.0   

                     B0009XRZ92     B003A3R3ZY   
user_id                                          
  A10VJ7BDVCPKEZ             5.0            NaN  
  A16FRHL2ZC7EUR             5.0            NaN  
  A2DS24DHXUH0GM             4.0            NaN  
  A2NJO6YE954DBH             NaN            NaN  
  A2YNBDT3170PCR             5.0            NaN  
  A36MLTLVQFEQYL             5.0            NaN  
  A3HE4MYMWK4AER             5.0            NaN  
  A3QN84C38DE9FU             5.0            NaN  
  AAAQO894MG80B              5.0            NaN  
 A22HGAAO8KZ2N3              5.0            NaN  
 A2JFZLAUG3YFQ7              5.0            NaN  
 A2MRZDJF90JC1U              5.0            NaN  
 A31S5QUMFR8NH2              5.0            NaN  
 A3AV8R0A62PP1N              5.0            NaN  
 A9A2PR663ED1V               NaN            5.0
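The row labels in the output above are indented by different amounts, which suggests that the user_id values in the file carry stray leading or trailing spaces. If that padding is unintentional, the same user could be counted as several distinct users. A minimal sketch of stripping it before pivoting, using made-up in-memory data (the `raw` contents are hypothetical, not taken from LCM1.csv):

```python
import io
import pandas as pd

# Hypothetical sample: the same user_id written with different padding.
raw = io.StringIO(
    "product_id,user_id,rating\n"
    " B0009XRZ92,A2JFZLAUG3YFQ7 ,5\n"
    " B0009XRZ92,  A2JFZLAUG3YFQ7,4\n"
)
df = pd.read_csv(raw)

# Padded ids compare as different strings.
users_before = df["user_id"].nunique()

# Strip surrounding whitespace from the key columns.
for col in ("product_id", "user_id"):
    df[col] = df[col].str.strip()

users_after = df["user_id"].nunique()
print(users_before, users_after)
```

After stripping, both rows belong to one user, so pivot_table groups them together instead of producing two separate rows.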

But what I want is to take the user_id values from the first file only and look up the product_id and rating values for those users in all the files.

I hope the question is clear. If anything is hard to understand, please comment below. Thanks.

Check if this meets your requirement.

import pandas as pd

data1 = pd.read_csv("user.txt", sep="|")
data2 = pd.read_csv("file2.csv")

# Merge on user_id and product_id
masterDf = data1.merge(data2, how='inner', on=["user_id", "product_id"])

# Make sure the ratings are integers before pivoting
masterDf['rating'] = masterDf.rating.astype(str).astype(int)
# Pivot the merged data: one row per user, one column per product
df_new = masterDf.pivot_table(index='user_id', columns='product_id', values='rating').rename_axis(None, axis=1)
df_new

The output will be:

[Image: output of the code above]
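Note that an inner merge on both user_id and product_id keeps only the (user, product) pairs present in both files, so products a user never rated drop out. One possible alternative, sketched here under assumptions, is to stack all files, keep the rows whose user_id occurs in the first file, and then reindex the pivot so every product seen in any file gets a column. The function name `pivot_for_first_file_users` and the in-memory sample files are hypothetical; real code would pass file paths instead.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the CSV files; the first one plays the role of the first file.
first_csv = io.StringIO(
    "product_id,user_id,rating\n"
    "B0009XRZ92,A2JFZLAUG3YFQ7,5\n"
    "B000067A8B,A2NJO6YE954DBH,4\n"
)
second_csv = io.StringIO(
    "product_id,user_id,rating\n"
    "B00006AUMZ,A2NJO6YE954DBH,4\n"
    "B003A3R3ZY,A9A2PR663ED1V,5\n"
)

def pivot_for_first_file_users(sources):
    """Pivot ratings from all sources, restricted to users of the first source."""
    frames = [pd.read_csv(src) for src in sources]
    for frame in frames:
        # Assumption: padding around the ids is not meaningful.
        for col in ("user_id", "product_id"):
            frame[col] = frame[col].str.strip()

    first_users = frames[0]["user_id"].unique()
    all_rows = pd.concat(frames, ignore_index=True)
    all_products = sorted(all_rows["product_id"].unique())

    # Keep only ratings made by users that occur in the first file.
    wanted = all_rows[all_rows["user_id"].isin(first_users)]
    # pivot_table averages duplicate (user, product) ratings by default.
    table = wanted.pivot_table(index="user_id", columns="product_id", values="rating")
    # Add a NaN column for every product these users did not rate.
    table = table.reindex(index=sorted(first_users), columns=all_products)
    return table.rename_axis(None, axis=1)

result = pivot_for_first_file_users([first_csv, second_csv])
print(result)
```

In this sample the user A9A2PR663ED1V appears only in the second file, so that user gets no row, while the product B003A3R3ZY still gets a column filled with NaN.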

