Checking column encoding in a pandas DataFrame

Question

I have a pandas DataFrame that contains two columns, name and name_id. Column name contains strings (names of books) and column name_id contains integers.

I want to check that these two columns correspond one-to-one, i.e. that name_id is in fact an encoding of name.

The catch is that name_id does not contain consecutive numbers.

The only thing I have come up with so far is to make my own (consecutive) encodings for both columns and check the correlation between them. Is there a better/cleaner solution?
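For reference, the "make my own consecutive encodings" idea can be sketched with pd.factorize, comparing the two code arrays directly instead of computing a correlation. This is only a minimal sketch: the sample data and the helper name codes_match are made up for illustration, and it assumes neither column contains missing values.

```python
import pandas as pd


def codes_match(df, left="name", right="name_id"):
    """Hypothetical helper: True if both columns get identical consecutive codes."""
    # factorize assigns codes 0, 1, 2, ... in order of first appearance
    left_codes, _ = pd.factorize(df[left])
    right_codes, _ = pd.factorize(df[right])
    # Identical code arrays mean: same name <=> same id on every row
    return bool((left_codes == right_codes).all())


# Made-up sample data with non-consecutive ids, as described in the question
good = pd.DataFrame({
    "name": ["book a", "book b", "book a", "book c"],
    "name_id": [10, 42, 10, 7],
})
# Same names, but "book a" is paired with two different ids
bad = good.assign(name_id=[10, 42, 11, 7])

good_result = codes_match(good)
bad_result = codes_match(bad)
```

Because both columns are numbered in order of first appearance, the codes agree on every row exactly when each name is always paired with the same id and vice versa, so the actual id values (consecutive or not) do not matter.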

Answer 1

If I understand you correctly, you could replace name_id with your current index like this:

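# Turn the current index into a column, drop the old name_id,
# and use the former index as the new name_id.
# Assumes the index is unnamed, so reset_index() calls the new column "index".
df = (
    df.reset_index(drop=False, inplace=False)
    .drop(columns="name_id")
    .rename(columns={"index": "name_id"})
)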

#  Output:
#  name_id  name
#        0  book a
#        1  book b
#        2  book c
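
Note that the snippet above rebuilds name_id from the index rather than testing the existing column. If the goal is only to verify the existing column, one possible approach is to look at the distinct (name, name_id) pairs and check that no value repeats on either side. The following is a minimal sketch, not part of the original answer; the function name is_one_to_one and the sample frames are hypothetical.

```python
import pandas as pd


def is_one_to_one(df, left="name", right="name_id"):
    """Hypothetical helper: True if left and right map to each other one-to-one."""
    # Keep each distinct (left, right) combination once
    pairs = df[[left, right]].drop_duplicates()
    # In a one-to-one mapping every name and every id occurs in exactly one pair
    return bool(pairs[left].is_unique and pairs[right].is_unique)


# Made-up sample data with non-consecutive ids
consistent = pd.DataFrame({
    "name": ["book a", "book b", "book a", "book c"],
    "name_id": [10, 42, 10, 7],
})
# Here two different names share the id 10
inconsistent = pd.DataFrame({
    "name": ["book a", "book b", "book c"],
    "name_id": [10, 10, 7],
})

consistent_result = is_one_to_one(consistent)
inconsistent_result = is_one_to_one(inconsistent)
```

Working on the de-duplicated pairs means repeated rows for the same book are harmless; only a name with several ids, or an id with several names, makes the check fail.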

Solution 1
0 2021-04-01 12:58:29