I cannot read the first column of a CSV file. The column names are already set inside the CSV file, but if I pass names=['...', '...', etc.] to pd.read_csv, pandas sets them again and I end up with the names twice. I want pd.read_csv to take the column names from the CSV file itself.
import pandas as pd
import tkFileDialog
import numpy as np
import warnings
warnings.filterwarnings('ignore')
rating=tkFileDialog.askopenfilename()
df = pd.read_csv(rating, sep='\t')
print df.head()
movies=tkFileDialog.askopenfilename()
movie_titles=pd.read_csv(movies)
print movie_titles.head()
df=pd.merge(df,movies,on='movieId')
print df.head()
And the error is:
Traceback (most recent call last):
File "C:/Users/Umer Selmani/Desktop/MP2/test panda.py", line 16, in <module>
df=pd.merge(df,movies,on='movieId')
File "C:\Users\Umer Selmani\Desktop\MP2\venv\lib\site-packages\pandas\core\reshape\merge.py", line 47, in merge
validate=validate)
File "C:\Users\Umer Selmani\Desktop\MP2\venv\lib\site-packages\pandas\core\reshape\merge.py", line 480, in __init__
right = validate_operand(right)
File "C:\Users\Umer Selmani\Desktop\MP2\venv\lib\site-packages\pandas\core\reshape\merge.py", line 1752, in validate_operand
'a {obj} was passed'.format(obj=type(obj)))
TypeError: Can only merge Series or DataFrame objects, a <type 'unicode'> was passed
The following line:
df=pd.merge(df, movies, on='movieId')
Should be:
df=pd.merge(df, movie_titles, on='movieId')
The movies variable contains a string (the file path returned by askopenfilename), not a DataFrame. The DataFrame read from that file is movie_titles.
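To make the difference between the path and the DataFrame concrete, here is a minimal sketch. It assumes Python 3 and a recent pandas, and it replaces the two file dialogs with small in-memory files whose contents (userId, rating, title and the sample rows) are invented for illustration; only movieId comes from the question.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two files picked in the dialogs.
ratings_csv = io.StringIO("userId\tmovieId\trating\n1\t10\t4.0\n2\t20\t3.5\n")
movies_csv = io.StringIO("movieId,title\n10,Heat\n20,Alien\n")

# read_csv takes the column names from the first line by default,
# so names=[...] is not needed when the file already has a header row.
df = pd.read_csv(ratings_csv, sep="\t")
movie_titles = pd.read_csv(movies_csv)

# Passing a string (such as a file path) instead of a DataFrame fails.
try:
    pd.merge(df, "movies.csv", on="movieId")
    merge_error = None
except TypeError as exc:
    merge_error = exc
print(merge_error)

# Passing the DataFrame works; df.merge(movie_titles, on="movieId") is equivalent.
merged = pd.merge(df, movie_titles, on="movieId")
print(merged)
```

The same pattern applies to the original script: keep the path in one variable and the result of pd.read_csv in another, and pass only the latter to merge.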
I am not sure I understood what you want to do, but I can see three possible issues:
1. df is being merged with the wrong object;
2. merge generates duplicated columns (and values);
3. merge is trying to work with unicode.
The first issue is an error. The call passes movies, which holds a file path, instead of the DataFrame movie_titles, so merge has no DataFrame to join df with.
Try this, instead:
df = df.merge(movie_titles, on='movieId')
The second issue is not really a problem: it is the default behavior. When you merge two datasets that share column headers other than the merge key, pandas keeps both copies and names them header_x and header_y.
For instance:
   header1_x  header2_x  header1_y  header2_y
0  a          f          a          f
1  b          g          b          g
2  c          h          c          h
3  d          i          d          i
One way of solving it --one that is not going to take you too much thinking-- is dropping the columns you do not want:
df = df[['header1_x', 'header2_x']]
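As a rough illustration of the suffix behavior and of two ways to tidy it up, the sketch below uses two invented DataFrames that share a title column. The column names and values are assumptions made for the example; suffixes is a standard parameter of merge.

```python
import pandas as pd

# Two hypothetical tables that share the key "movieId" and the column "title".
left = pd.DataFrame({"movieId": [1, 2], "title": ["Heat", "Alien"]})
right = pd.DataFrame({"movieId": [1, 2], "title": ["Heat (1995)", "Alien (1979)"]})

# Default: overlapping non-key columns get _x (left) and _y (right).
default = left.merge(right, on="movieId")
print(list(default.columns))

# Option 1: choose suffixes that say where each column came from.
named = left.merge(right, on="movieId", suffixes=("_ratings", "_movies"))
print(list(named.columns))

# Option 2: keep only the wanted columns, then restore the plain name.
cleaned = default[["movieId", "title_x"]].rename(columns={"title_x": "title"})
print(cleaned)
```

If the duplicated columns hold identical values, dropping one side as in option 2 loses nothing; if they differ, explicit suffixes make it easier to see which one to keep.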
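The third issue concerns the unicode mentioned in the error. In the traceback, <type 'unicode'> is the type of the object passed to merge (the file path stored in movies), so it should disappear once a DataFrame is passed. If you still see encoding problems in the headers afterwards, for example in movieId, try unicodedata: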
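import unicodedata
# normalize() works on a unicode string, so apply it to each header, not to df itself
df.columns = [unicodedata.normalize("NFKD", unicode(c)).encode("ascii", "ignore") for c in df.columns]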
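For reference, unicodedata.normalize operates on a single string, so header cleanup has to go through the column names one by one. The sketch below assumes Python 3; the helper name to_ascii and the sample headers (an accented letter and a leading byte order mark) are invented for illustration.

```python
import unicodedata

def to_ascii(text):
    """Decompose accented characters, then drop anything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

# Hypothetical headers: one clean, one accented, one with a byte order mark.
headers = ["movieId", "t\u00edtle", "\ufeffuserId"]
clean = [to_ascii(h) for h in headers]
print(clean)
```

On a DataFrame this would be applied as df.columns = [to_ascii(c) for c in df.columns]. Note that characters without an ASCII equivalent are silently removed, which may or may not be what you want.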