How to replace df.loc with df.reindex without a KeyError

Question

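I have a huge DataFrame that I load from a .csv file. After defining the columns, I want to keep only the ones I need. With Python 3.8.1 this worked fine, although it raised a FutureWarning:

"Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative."

If I try the same thing in Python 3.10.x, I now get a KeyError: "['empty'] not in index"

To slice the DataFrame and get rid of the columns I don't need, I use .loc like this: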

df = df.loc[:, ['laenge','Timestamp', 'Nick']]

How can I get the same result with .reindex (or any other function) without a KeyError?

Thanks

Answer 1

If you only need the columns that actually exist in the DataFrame, use numpy.intersect1d:

df = df[np.intersect1d(['laenge','Timestamp', 'Nick'], df.columns)]

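The same columns are selected if you use DataFrame.reindex and then drop the columns that contain only missing values (the requested order is kept, whereas np.intersect1d returns the labels sorted):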

df = df.reindex(['laenge','Timestamp', 'Nick'], axis=1).dropna(how='all', axis=1)

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'laenge': [0,5], 'col': [1,7], 'Nick': [2,8]})

print (df)
   laenge  col  Nick
0       0    1     2
1       5    7     8

df = df[np.intersect1d(['laenge','Timestamp', 'Nick'], df.columns)]
print (df)
   Nick  laenge
0     2       0
1     8       5
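
The two approaches in this answer can differ in edge cases. The sketch below uses made-up data in which 'Nick' exists but holds only missing values; it is an illustration under that assumption, not part of the original sample.

```python
import numpy as np
import pandas as pd

wanted = ['laenge', 'Timestamp', 'Nick']  # 'Timestamp' is missing from df below

# Hypothetical data: 'Nick' exists but holds only missing values.
df = pd.DataFrame({'laenge': [0, 5], 'col': [1, 7], 'Nick': [np.nan, np.nan]})

# np.intersect1d returns the common labels sorted, so the column order changes.
by_intersect = df[np.intersect1d(wanted, df.columns)]
print(list(by_intersect.columns))  # ['Nick', 'laenge']

# reindex keeps the requested order, but dropna(how='all') also removes
# 'Nick', because an existing all-NaN column looks the same as a missing one.
by_reindex = df.reindex(wanted, axis=1).dropna(how='all', axis=1)
print(list(by_reindex.columns))  # ['laenge']
```

If real columns may be entirely empty, the intersection-based selection is probably the safer choice.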

Answer 2

Use reindex:

df = pd.DataFrame({'A': [0], 'B': [1], 'C': [2]})
#    A  B  C
# 0  0  1  2


df.reindex(['A', 'C', 'D'], axis=1)

Output:

   A  C   D
0  0  2 NaN
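
If a NaN placeholder is not wanted for the missing column, reindex also accepts a fill_value argument. A minimal sketch, assuming 0 is an acceptable default for your data:

```python
import pandas as pd

df = pd.DataFrame({'A': [0], 'B': [1], 'C': [2]})

# 'D' does not exist in df, so reindex creates it and fills it with
# fill_value (0 is an arbitrary choice here) instead of NaN.
filled = df.reindex(['A', 'C', 'D'], axis=1, fill_value=0)
print(filled)
```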

If you only need the columns the two have in common, you can use Index.intersection:

cols = ['A', 'C', 'E']
df[df.columns.intersection(cols)]

Output:

   A  C
0  0  2
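
If the columns should come back in exactly the order you asked for, a plain list comprehension is another option. The helper name select_existing below is hypothetical and only serves as an illustration:

```python
import pandas as pd

def select_existing(df, wanted):
    """Hypothetical helper: keep the wanted columns that exist, in the wanted order."""
    present = [col for col in wanted if col in df.columns]
    return df.loc[:, present]

df = pd.DataFrame({'A': [0], 'B': [1], 'C': [2]})

# 'E' is not a column of df, so it is skipped instead of raising a KeyError.
result = select_existing(df, ['C', 'E', 'A'])
print(list(result.columns))  # ['C', 'A']
```

Because the filtered list only contains labels that are present, .loc no longer has anything missing to complain about.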
