
Data frame join in pandas (Python)

I am trying to perform an outer join, but it keeps raising the error shown below. I also tried neither_df = chunk.set_index('author').join(both_authors.set_index, on='author', how='outer', lsuffix='_left', rsuffix='_right'). That gives neither_df the columns [index, author, body, subreddit, subreddit_id, score], but it does not produce an author_right column. The columns I need in neither_df are [author, author_left, body, subreddit, subreddit_id, score, author_right].

   chunk = chunk.astype(object)
   chunk.author = chunk.author.astype(object)
   chunk.info()
   both_authors = both_authors.astype(object)
   both_authors.info()
   neither_df = chunk.join(both_authors, on='author', how='outer', lsuffix='_left', rsuffix='_right')

Even though all my columns have dtype object, it still raises the error:

RangeIndex: 10000 entries, 0 to 9999
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   author        10000 non-null  object
 1   body          10000 non-null  object
 2   subreddit     10000 non-null  object
 3   subreddit_id  10000 non-null  object
 4   score         10000 non-null  object
dtypes: object(5)
memory usage: 390.8+ KB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10017 entries, 0 to 13410
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   author  10017 non-null  object
dtypes: object(1)
memory usage: 156.5+ KB





---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-108f6e06d14a> in <module>
     30     both_authors =both_authors.astype(object)
     31     both_authors.info()
---> 32     neither_df = chunk.join(both_authors, on='author', how='outer', lsuffix='_left', rsuffix='_right')
     33     neither_df = neither_df[neither_df['author_right'].isnull()]
     34     if neither_record_count < 10000 and not neither_df.empty:

c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
   7207         """
   7208         return self._join_compat(
-> 7209             other, on=on, how=how, lsuffix=lsuffix, rsuffix=rsuffix, sort=sort
   7210         )
   7211 

c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
   7230                 right_index=True,
   7231                 suffixes=(lsuffix, rsuffix),
-> 7232                 sort=sort,
   7233             )
   7234         else:

c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     84         copy=copy,
     85         indicator=indicator,
---> 86         validate=validate,
     87     )
     88     return op.get_result()

c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    629         # validate the merge keys dtypes. We may need to coerce
    630         # to avoid incompat dtypes
--> 631         self._maybe_coerce_merge_keys()
    632 
    633         # If argument passed to validate,

c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\reshape\merge.py in _maybe_coerce_merge_keys(self)
   1144                     inferred_right in string_types and inferred_left not in string_types
   1145                 ):
-> 1146                     raise ValueError(msg)
   1147 
   1148             # datetimelikes must match exactly

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

You should use pd.concat([key1, key2], axis=1)
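
As a side note on the traceback: DataFrame.join(other, on='author') matches the caller's author column against the index of other, not against its author column. In the question both_authors has an Int64Index, which would explain the "object and int64" message even though every column is object. Also, pd.concat(..., axis=1) aligns rows by index rather than by matching author values, so it may not give the rows the question is after. The sketch below uses small made-up frames (the data and the drop_duplicates() step are assumptions, not taken from the question) to reproduce the error and then select the authors missing from both_authors with a column-based merge.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's frames.
chunk = pd.DataFrame({
    "author": ["alice", "bob", "carol"],
    "body": ["a", "b", "c"],
    "subreddit": ["x", "y", "z"],
    "subreddit_id": ["t5_1", "t5_2", "t5_3"],
    "score": [1, 2, 3],
})
both_authors = pd.DataFrame({"author": ["bob", "dave"]}, index=[0, 7])

# Reproduce the problem: join(on=...) compares chunk["author"] (strings)
# with both_authors.index (integers), so the key dtypes do not match.
try:
    chunk.join(both_authors, on="author", how="outer",
               lsuffix="_left", rsuffix="_right")
    join_error = None
except ValueError as exc:
    join_error = str(exc)

# Merge on the author column from both sides instead.
merged = chunk.merge(
    both_authors.drop_duplicates(),  # assumption: avoid duplicating chunk rows
    on="author",
    how="left",
    indicator=True,  # adds "_merge": "left_only", "right_only" or "both"
)

# Rows of chunk whose author does not appear in both_authors.
neither_df = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

# The same selection without any join.
neither_alt = chunk[~chunk["author"].isin(both_authors["author"])]
```

With indicator=True the _merge column plays the role the question intended for author_right: a row marked left_only is one that found no match on the right. If only the filtered rows are needed, the isin version is the shorter of the two.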
