简体   繁体   English

列追加对于熊猫很麻烦

[英]Columns appending is troublesome with Pandas

Here is what I have tried and what error I received: 这是我尝试过的以及收到的错误:

>>> import pandas as pd
>>> df = pd.DataFrame({"A":[1,2,3,4,5],"B":[5,4,3,2,1],"C":[0,0,0,0,0],"D":[1,1,1,1,1]})
>>> df
   A  B  C  D
0  1  5  0  1
1  2  4  0  1
2  3  3  0  1
3  4  2  0  1
4  5  1  0  1
>>> import pandas as pd
>>> df = pd.DataFrame({"A":[1,2,3,4,5],"B":[5,4,3,2,1],"C":[0,0,0,0,0],"D":[1,1,1,1,1]})
>>> first = [2,2,2,2,2,2,2,2,2,2,2,2]
>>> first = pd.DataFrame(first).T
>>> first.index = [2]
>>> df = df.join(first)
>>> df
   A  B  C  D    0    1    2    3    4    5    6    7    8    9   10   11
0  1  5  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  2  4  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
2  3  3  0  1  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0
3  4  2  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
4  5  1  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
>>> second = [3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3]
>>> second = pd.DataFrame(second).T
>>> second.index = [1]
>>> df = df.join(second)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python35\lib\site-packages\pandas\core\frame.py", line 6815, in join
    rsuffix=rsuffix, sort=sort)
  File "C:\Python35\lib\site-packages\pandas\core\frame.py", line 6830, in _join_compat
    suffixes=(lsuffix, rsuffix), sort=sort)
  File "C:\Python35\lib\site-packages\pandas\core\reshape\merge.py", line 48, in merge
    return op.get_result()
  File "C:\Python35\lib\site-packages\pandas\core\reshape\merge.py", line 552, in get_result
    rdata.items, rsuf)
  File "C:\Python35\lib\site-packages\pandas\core\internals\managers.py", line 1972, in items_overlap_with_suffix
    '{rename}'.format(rename=to_rename))
ValueError: columns overlap but no suffix specified: Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype='object')

I am trying to create new list with the extra columns which I have to add at specific indexes of the main dataframe df . 我正在尝试使用必须在主数据帧df特定索引处添加的额外列创建新列表。
When i tried the first it worked and you can see the output. 当我第first尝试它时,您可以看到输出。 But when I tried the same way with second I received the above mentioned error. 但是当我second尝试相同的方式时,收到了上述错误。

Kindly, let me know what I can do in this situation and achieve the goal I am expecting. 请让我知道在这种情况下可以做些什么并达到我期望的目标。

Use DataFrame.combine_first instead join if need assign to same columns created before, last DataFrame.reindex by list of columns for expected ordering: 使用DataFrame.combine_first而不是join ,如果需要分配到之前创建相同的列,最后DataFrame.reindex由预期排序列的列表:

df = pd.DataFrame({"A":[1,2,3,4,5],"B":[5,4,3,2,1],"C":[0,0,0,0,0],"D":[1,1,1,1,1]})
orig = df.columns.tolist()

first = [2,2,2,2,2,2,2,2,2,2,2,2]
first = pd.DataFrame(first).T
first.index = [2]
df = df.combine_first(first)

second = [3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3]
second = pd.DataFrame(second).T
second.index = [1]
df = df.combine_first(second)

df = df.reindex(orig + first.columns.tolist(), axis=1)
print (df)
   A  B  C  D    0    1    2    3    4    5    6    7    8    9   10   11
0  1  5  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  2  4  0  1  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0
2  3  3  0  1  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0
3  4  2  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
4  5  1  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN

Yes this is expected behaviour because join works much like an SQL join, meaning that it will join on the provided index and concatenate all the columns together. 是的,这是预期的行为,因为联接的工作方式与SQL联接非常相似,这意味着它将在提供的索引上联接并将所有列连接在一起。 The problem arises from the fact that pandas does not accept two columns to have the same name. 问题来自于以下事实:熊猫不接受两列具有相同的名称。 Hence, if you have 2 columns in each dataframe with the same name, it will first look for a suffix to add to those columns to avoid name clashes. 因此,如果每个数据框中有2个具有相同名称的列,它将首先查找要添加到这些列的后缀,以避免名称冲突。 This is controlled with the lsuffix and rsuffix arguments in the join method. 这由join方法中的lsuffixrsuffix参数控制。

Conclusion: 2 ways to solve this: 结论:有两种解决方法:

  • Either provide a suffix so that pandas is able to resolve the name clashes; 请提供一个后缀,以便熊猫能够解决名称冲突。 or 要么
  • Make sure that you don't have overlapping columns 确保没有重叠的列

You have to specify the suffixes since the column names are the same. 由于列名相同,因此必须指定suffixes Assuming you are trying to add the second values as new columns horizontally: 假设您尝试将second值水平添加为新列:

df = df.join(second, lsuffix='first', rsuffix='second')

   A  B  C  D  0first  1first  2first  3first  4first  5first  ...  10second  11second   12   13   14   15   16   17   18   19
0  1  5  0  1     NaN     NaN     NaN     NaN     NaN     NaN  ...       NaN       NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  2  4  0  1     NaN     NaN     NaN     NaN     NaN     NaN  ...       3.0       3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0
2  3  3  0  1     2.0     2.0     2.0     2.0     2.0     2.0  ...       NaN       NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
3  4  2  0  1     NaN     NaN     NaN     NaN     NaN     NaN  ...       NaN       NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
4  5  1  0  1     NaN     NaN     NaN     NaN     NaN     NaN  ...       NaN       NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM