DataFrame join in pandas (Python)
I am trying to perform an outer join, but it keeps raising the error shown below. I also tried neither_df = chunk.set_index('author').join(both_authors.set_index, on='author', how='outer', lsuffix='_left', rsuffix='_right'). That returns a neither_df with the columns [index,author,body,subreddit,subreddit_id,score], but it does not produce an author_right column. The columns I need in neither_df are [author,author_left,body,subreddit,subreddit_id,score,author_right].
chunk = chunk.astype(object)
chunk.author=chunk.author.astype(object)
chunk.info()
both_authors = both_authors.astype(object)
both_authors.info()
neither_df = chunk.join(both_authors, on='author', how='outer', lsuffix='_left', rsuffix='_right')
Even though all my columns have dtype object, the join still raises an error:
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 author 10000 non-null object
1 body 10000 non-null object
2 subreddit 10000 non-null object
3 subreddit_id 10000 non-null object
4 score 10000 non-null object
dtypes: object(5)
memory usage: 390.8+ KB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10017 entries, 0 to 13410
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 author 10017 non-null object
dtypes: object(1)
memory usage: 156.5+ KB
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-18-108f6e06d14a> in <module>
30 both_authors =both_authors.astype(object)
31 both_authors.info()
---> 32 neither_df = chunk.join(both_authors, on='author', how='outer', lsuffix='_left', rsuffix='_right')
33 neither_df = neither_df[neither_df['author_right'].isnull()]
34 if neither_record_count < 10000 and not neither_df.empty:
c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
7207 """
7208 return self._join_compat(
-> 7209 other, on=on, how=how, lsuffix=lsuffix, rsuffix=rsuffix, sort=sort
7210 )
7211
c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
7230 right_index=True,
7231 suffixes=(lsuffix, rsuffix),
-> 7232 sort=sort,
7233 )
7234 else:
c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
84 copy=copy,
85 indicator=indicator,
---> 86 validate=validate,
87 )
88 return op.get_result()
c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
629 # validate the merge keys dtypes. We may need to coerce
630 # to avoid incompat dtypes
--> 631 self._maybe_coerce_merge_keys()
632
633 # If argument passed to validate,
c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\reshape\merge.py in _maybe_coerce_merge_keys(self)
1144 inferred_right in string_types and inferred_left not in string_types
1145 ):
-> 1146 raise ValueError(msg)
1147
1148 # datetimelikes must match exactly
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
You should use pd.concat([key1, key2], axis=1).
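
For context on the error itself: DataFrame.join(other, on='author') matches the caller's author column against the index of other, not against its author column. Since both_authors has an integer index, the keys are object on one side and int64 on the other, which appears to be what triggers the ValueError. Below is a minimal sketch of one alternative, assuming the goal is to keep the rows of chunk whose author does not appear in both_authors (as the filter on author_right in the traceback suggests). The sample data is hypothetical and stands in for the real frames.

```python
import pandas as pd

# Hypothetical stand-ins for the question's `chunk` and `both_authors`.
chunk = pd.DataFrame({
    "author": ["alice", "bob", "carol"],
    "body": ["a", "b", "c"],
    "score": [1, 2, 3],
})
# Integer index, like the Int64Index shown in the question's info() output.
both_authors = pd.DataFrame({"author": ["bob", "dave"]}, index=[0, 7])

# Merge column-to-column on 'author' so both keys are strings.
# indicator=True adds a '_merge' column saying where each row came from.
merged = chunk.merge(
    both_authors.drop_duplicates(),
    on="author",
    how="left",
    indicator=True,
)

# Keep only rows whose author has no match in both_authors.
neither_df = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(neither_df)
```

Note that a merge on a shared key column yields a single author column, so no author_left or author_right is created. The same applies to the set_index('author') attempt in the question: once author becomes the index of both_authors, that frame has no remaining columns, so there is nothing for rsuffix to rename. The '_merge' indicator takes the place of testing author_right for nulls.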