Appending columns with pandas is troublesome

Question

Here is what I tried and the error I received:

>>> import pandas as pd
>>> df = pd.DataFrame({"A":[1,2,3,4,5],"B":[5,4,3,2,1],"C":[0,0,0,0,0],"D":[1,1,1,1,1]})
>>> df
   A  B  C  D
0  1  5  0  1
1  2  4  0  1
2  3  3  0  1
3  4  2  0  1
4  5  1  0  1
>>> first = [2,2,2,2,2,2,2,2,2,2,2,2]
>>> first = pd.DataFrame(first).T
>>> first.index = [2]
>>> df = df.join(first)
>>> df
   A  B  C  D    0    1    2    3    4    5    6    7    8    9   10   11
0  1  5  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  2  4  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
2  3  3  0  1  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0
3  4  2  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
4  5  1  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
>>> second = [3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3]
>>> second = pd.DataFrame(second).T
>>> second.index = [1]
>>> df = df.join(second)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python35\lib\site-packages\pandas\core\frame.py", line 6815, in join
    rsuffix=rsuffix, sort=sort)
  File "C:\Python35\lib\site-packages\pandas\core\frame.py", line 6830, in _join_compat
    suffixes=(lsuffix, rsuffix), sort=sort)
  File "C:\Python35\lib\site-packages\pandas\core\reshape\merge.py", line 48, in merge
    return op.get_result()
  File "C:\Python35\lib\site-packages\pandas\core\reshape\merge.py", line 552, in get_result
    rdata.items, rsuf)
  File "C:\Python35\lib\site-packages\pandas\core\internals\managers.py", line 1972, in items_overlap_with_suffix
    '{rename}'.format(rename=to_rename))
ValueError: columns overlap but no suffix specified: Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype='object')

I am trying to create new lists holding extra columns that I need to add at specific indexes of the main dataframe df.
When I tried this with first, it worked, as you can see in the output. But when I tried the same approach with second, I received the error shown above.

Kindly let me know what I can do in this situation to achieve my goal.

Answer 1

Use DataFrame.combine_first instead of join if you need to assign to the same columns created earlier, then finish with DataFrame.reindex on a list of columns to get the expected ordering:

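import pandas as pd

df = pd.DataFrame({"A":[1,2,3,4,5],"B":[5,4,3,2,1],"C":[0,0,0,0,0],"D":[1,1,1,1,1]})
orig = df.columns.tolist()  # remember the original column order

first = [2,2,2,2,2,2,2,2,2,2,2,2]
first = pd.DataFrame(first).T
first.index = [2]
# fill missing cells of df from first; columns missing from df are added
df = df.combine_first(first)

second = [3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3]
second = pd.DataFrame(second).T
second.index = [1]
# columns 0..11 already exist now, so only their NaN cells are filled
df = df.combine_first(second)

# restore the expected ordering: original columns first, then those of first
df = df.reindex(orig + first.columns.tolist(), axis=1)
print(df)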
   A  B  C  D    0    1    2    3    4    5    6    7    8    9   10   11
0  1  5  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  2  4  0  1  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0
2  3  3  0  1  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0
3  4  2  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
4  5  1  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
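
Note that the final reindex in this answer lists only the columns of first (0 to 11), so columns 12 to 19 contributed by second do not appear in the output. If those are wanted as well, the column list can be built from the combined frame instead. The following is only a minimal sketch on smaller made-up data; the names out and extra are illustrative and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [3, 2, 1]})
orig = df.columns.tolist()

# Hypothetical one-row frames with overlapping integer column labels.
first = pd.DataFrame([[2, 2]], index=[2])        # columns 0, 1
second = pd.DataFrame([[3, 3, 3]], index=[1])    # columns 0, 1, 2

# combine_first keeps existing non-null values and fills the rest from the other frame.
out = df.combine_first(first).combine_first(second)

# Keep every added column, not only those that came from `first`.
extra = [c for c in out.columns if c not in orig]
out = out.reindex(orig + sorted(extra), axis=1)
print(out)
```

Sorting extra works here because all added labels are integers; with mixed label types a different ordering rule would be needed.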

Answer 2

Yes, this is expected behaviour, because join works much like an SQL join: it joins on the provided index and concatenates all the columns together. The problem arises because join does not let the two frames contribute columns with the same name. If both dataframes have a column with the same name, it first looks for a suffix to add to those columns to avoid the name clash, and raises the error you saw if none is given. This is controlled with the lsuffix and rsuffix arguments of the join method.

Conclusion: there are two ways to solve this:

Either provide a suffix so that pandas can resolve the name clashes; or
Make sure that you don't have overlapping columns.
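
Both options can be sketched on a small example. This is only an illustration under assumed data; the suffix "_second" and the prefix "second_" are arbitrary choices, not something the question prescribes:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [3, 2, 1]})

# Hypothetical one-row frames whose integer column labels overlap (0 and 1).
first = pd.DataFrame([[2, 2]], index=[2])
second = pd.DataFrame([[3, 3, 3]], index=[1])

joined = df.join(first)  # works: no column of `first` exists in df yet

# Option 1: give join a suffix for the clashing labels.
# Only labels present on both sides are renamed; column 2 is left alone.
with_suffix = joined.join(second, rsuffix="_second")

# Option 2: rename the incoming columns so that nothing overlaps.
no_overlap = joined.join(second.add_prefix("second_"))

print(with_suffix.columns.tolist())
print(no_overlap.columns.tolist())
```

In both cases the values of second end up in new columns next to those of first rather than in the same columns; to fill the existing columns, a different tool such as combine_first is needed.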

Answer 3

You have to specify the suffixes, since the column names are the same. Assuming you are trying to add the second values horizontally as new columns:

df = df.join(second, lsuffix='first', rsuffix='second')

   A  B  C  D  0first  1first  2first  3first  4first  5first  ...  10second  11second   12   13   14   15   16   17   18   19
0  1  5  0  1     NaN     NaN     NaN     NaN     NaN     NaN  ...       NaN       NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  2  4  0  1     NaN     NaN     NaN     NaN     NaN     NaN  ...       3.0       3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0
2  3  3  0  1     2.0     2.0     2.0     2.0     2.0     2.0  ...       NaN       NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
3  4  2  0  1     NaN     NaN     NaN     NaN     NaN     NaN  ...       NaN       NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
4  5  1  0  1     NaN     NaN     NaN     NaN     NaN     NaN  ...       NaN       NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
