Missing rows when adding a Series as a new column to a pandas DataFrame

Question

I have two series, s1 and s2, defined as follows:

import pandas as pd

s1 = pd.Series({1: 10, 2: 20, 3: 30, 4: 40}, name='s1')
s2 = pd.Series({3: 35, 4: 45, 5: 55, 6: 65}, name='s2')

They look like this:

1    10
2    20
3    30
4    40
Name: s1, dtype: int64

3    35
4    45
5    55
6    65
Name: s2, dtype: int64

I am trying to create a dataframe that has s1 and s2 as two columns, with an index that combines the indexes of the two series. But a simple assignment doesn't work:

df = pd.DataFrame()
df['s1'] = s1
df['s2'] = s2

The resulting dataframe has the index from s1 but is missing the rows of s2 whose index labels are not in s1:

   s1    s2
1  10   NaN
2  20   NaN
3  30  35.0
4  40  45.0

Why is that? It seems somewhat counter-intuitive.

Note: a proposed solution is:

df = pd.concat((s1, s2), axis=1)

which gives the expected result:

     s1    s2
1  10.0   NaN
2  20.0   NaN
3  30.0  35.0
4  40.0  45.0
5   NaN  55.0
6   NaN  65.0

But I am nevertheless curious why a simple column assignment doesn't work.

Answer 1

It's because the assignment matches on index, and the index of s2 starts at 3.


df = pd.DataFrame()
df['s1'] = s1
df['s2'] = s2

This sets the shape (the index) of df from s1, then matches the s2 data to that existing shape.
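As a minimal sketch of that matching step, the snippet below compares the assigned column with s2 reindexed by hand to the dataframe's index. The variable names aligned, same_as_reindex and index_labels are only illustrative, and the comparison reflects observable behavior rather than a statement about pandas internals.

```python
import pandas as pd

s1 = pd.Series({1: 10, 2: 20, 3: 30, 4: 40}, name='s1')
s2 = pd.Series({3: 35, 4: 45, 5: 55, 6: 65}, name='s2')

df = pd.DataFrame()
df['s1'] = s1          # the empty frame takes on the index of s1
df['s2'] = s2          # s2 is aligned to that existing index

# Reindex s2 by hand to the frame's index (illustrative name: aligned).
aligned = s2.reindex(df.index)

# The assigned column holds the same values as the hand-reindexed series.
same_as_reindex = df['s2'].equals(aligned)
index_labels = list(df.index)

print(same_as_reindex)
print(index_labels)
```

Labels 5 and 6 never appear because they are not part of the index that df already has when s2 is assigned.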

Answer 2

This is because after you assign a Series to a column of an empty dataframe, the dataframe's index is created to align with that Series.

When you then assign another Series to a new column of this one-column dataframe, only the labels already in the dataframe's index are considered.

If you try to assign s2 before s1:

df = pd.DataFrame()
df['s2'] = s2
df['s1'] = s1

print(df)

   s2    s1
3  35  30.0
4  45  40.0
5  55   NaN
6  65   NaN

Answer 3

Since the two series have different indices, assigning either one first (s1 before s2, or s2 before s1) to the empty dataframe will still leave you with missing rows. This is because the index of an empty dataframe is automatically set to that of the first series assigned to it. As a result, when the second series is assigned, the dataframe only takes the rows that align with its current index (just set to the index of s1) and ignores the labels of s2 that are not shared with s1.
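To see which labels would be ignored before doing the second assignment, one option is to compare the two indexes directly. This is only a sketch; the names dropped and dropped_labels are illustrative and not part of the original question.

```python
import pandas as pd

s1 = pd.Series({1: 10, 2: 20, 3: 30, 4: 40}, name='s1')
s2 = pd.Series({3: 35, 4: 45, 5: 55, 6: 65}, name='s2')

df = pd.DataFrame()
df['s1'] = s1  # df now carries the index of s1

# Labels present in s2 but absent from df's current index;
# rows with these labels would not make it into df on assignment.
dropped = s2.index.difference(df.index)
dropped_labels = list(dropped)

print(dropped_labels)
```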

There is one remedy that makes the two statements assigning s1 and s2 to df work as you expect:

df = pd.DataFrame(index=s1.index.union(s2.index))

By presetting the dataframe index to the union of s1.index and s2.index, that is s1.index.union(s2.index), you get the desired result:

df['s1'] = s1
df['s2'] = s2


print(df)

     s1    s2
1  10.0   NaN
2  20.0   NaN
3  30.0  35.0
4  40.0  45.0
5   NaN  55.0
6   NaN  65.0

Breaking down the intermediate steps shows an interesting result:

df = pd.DataFrame(index=s1.index.union(s2.index))
df['s1'] = s1


print(df)

     s1
1  10.0
2  20.0
3  30.0
4  40.0
5   NaN
6   NaN

Here, after assigning only s1 and before assigning s2, you can already see index labels 5 and 6 (which belong only to s2). Their values are NaN. This is because the empty dataframe df was already defined with the union of the s1 and s2 indexes, while s2 has not yet been assigned to it.

If you only want to modify the dataframe index on the fly, after s1 has been assigned to an empty dataframe that was created without the index= parameter, you can do it with df.reindex(), as follows:

df = pd.DataFrame()                            # without the index= parameter
df['s1'] = s1
df = df.reindex(s1.index.union(s2.index))      # extend the index with reindex()
df['s2'] = s2                                  # now every label of s2 has a row



print(df)

     s1    s2
1  10.0   NaN
2  20.0   NaN
3  30.0  35.0
4  40.0  45.0
5   NaN  55.0
6   NaN  65.0
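As a related sketch (not taken from the answers above), building the dataframe in one step from a dict of Series should also align the columns on the union of their indexes, so no label is lost regardless of order. The dict keys below simply reuse the series names.

```python
import pandas as pd

s1 = pd.Series({1: 10, 2: 20, 3: 30, 4: 40}, name='s1')
s2 = pd.Series({3: 35, 4: 45, 5: 55, 6: 65}, name='s2')

# Passing both series at construction time lets pandas combine their indexes
# instead of fixing the index to whichever series is assigned first.
df = pd.DataFrame({'s1': s1, 's2': s2})

print(df)
```

Each column holds NaN where its series has no value for a label, which is why both columns come out as floats.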
