How can I modify a Series to match the index of a Pandas DataFrame?

Question

Consider a Series y (dtype float64) that has its own index, for example:

y = pd.Series((6.0, 1621.0, 4.6, 1479.9, 1520.0), index=(3608, 3652, 510, 941, 3007))

which looks like:

3608       6.000
3652    1621.000
510        4.600
941     1479.900
3007    1520.000
          ...   
dtype: float64 (length: 554)

There is also a Pandas DataFrame X with its own index and several columns, for example:

X = pd.DataFrame({'Col1':[1,2,3], 'Col2':[1,2,3]}, index=[510,3007,3652])

which looks like:

         Col1      Col2
510
3007
3652
... (dataframe length/count is 7)

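I want to modify the Series y to obtain a new Series that is ordered according to the DataFrame's index and has the same number of samples as the DataFrame (i.e. the 7 index labels of y that match X). The expected y is: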

510        4.600
3007    1520.000
3652    1621.000
          ...   
dtype: float64 (length: 7)

Any help or suggestions would be greatly appreciated.

Answer 1

You can use the Index.intersection method:

out = y[y.index.intersection(X.index)]

or the Index.isin method:

out = y[y.index.isin(X.index)]

to filter y down to the index labels that are also present in X.index.

If X.index is guaranteed to be a subset of y.index, you can also simply index with X.index:

out = y[X.index]

Output (the isin filter keeps the rows in the original order of y):

3652    1621.0
510        4.6
3007    1520.0
dtype: float64
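
The ordering behaviour of these filters can be checked on the sample data from the question. The sketch below is only an illustration, not part of the original answer: it uses y.loc[X.index] as an explicit label-based spelling of y[X.index], and adds Series.reindex as an assumed fallback for the case where X.index is not a subset of y.index.

```python
import pandas as pd

# Sample data copied from the question.
y = pd.Series((6.0, 1621.0, 4.6, 1479.9, 1520.0),
              index=(3608, 3652, 510, 941, 3007))
X = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [1, 2, 3]},
                 index=[510, 3007, 3652])

# Boolean mask: keeps the matching rows in the order they appear in y.
by_mask = y[y.index.isin(X.index)]

# Label lookup: returns the rows in the order of X.index.
# Raises KeyError if any label in X.index is missing from y.index.
by_label = y.loc[X.index]

# reindex also follows the order of X.index, but fills labels that are
# missing from y with NaN instead of raising.
by_reindex = y.reindex(X.index)

print(by_mask.index.tolist())    # [3652, 510, 3007]
print(by_label.index.tolist())   # [510, 3007, 3652]
print(by_reindex.tolist())       # [4.6, 1520.0, 1621.0]
```

Since the question asks for y ordered like the DataFrame's index, the label lookup (or reindex) is the variant that produces that order directly.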

Answer 2

Per the question, given that the Series y is unnamed and so cannot be matched directly against a DataFrame column name, the following works:

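By converting the Series y to a DataFrame with to_frame() and using X.merge(), as suggested by @Chris in the comments on the question (thanks!), together with the left_index and right_index specifiers so that the match is performed on both indexes, we get the modified y: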

modified_y = X.merge(y.to_frame(), left_index=True, right_index=True)

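The result is a DataFrame whose last column holds the values of y (the columns of X come first), so it can be converted back to a Series with:

modified_y = pd.Series(modified_y.iloc[:, -1].values, index=modified_y.index)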
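
A variation on the merge approach, sketched under a few assumptions: the Series is given a name when it is converted (the name 'target' is hypothetical, chosen only for this example) so that the column can be selected by name rather than by position, and how='left' is added so that the result follows the rows of X.

```python
import pandas as pd

# Sample data copied from the question.
y = pd.Series((6.0, 1621.0, 4.6, 1479.9, 1520.0),
              index=(3608, 3652, 510, 941, 3007))
X = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [1, 2, 3]},
                 index=[510, 3007, 3652])

# 'target' is a hypothetical column name used only in this sketch.
# how='left' keeps every row of X, in X's order; labels of X that are
# missing from y would get NaN in the 'target' column.
merged = X.merge(y.to_frame(name='target'),
                 left_index=True, right_index=True, how='left')

# Selecting a single column by name already returns a Series
# indexed like X, so no manual reconstruction is needed.
modified_y = merged['target']
print(modified_y)
```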

There may be simpler alternatives, but this is what worked for me.