Check column encodings in a pandas DataFrame

I have a pandas DataFrame that contains two columns, name and name_id. Column name contains strings (names of books) and column name_id contains integers.

I want to check that these two columns correspond directly to each other, that is, that name_id is in fact an encoding of name.

The catch is that the numbers in name_id are not consecutive.

The only approach I have come up with so far is to build my own (consecutive) encodings for both columns and check the correlation between them. Is there a better, cleaner solution?

If I understand you correctly, you could replace name_id with your current index like this:

df = (
    df.reset_index(drop=False, inplace=False)
    .drop(columns="name_id")
    .rename(columns={"index": "name_id"})
)

#  Output:
#  name_id  name
#        0  book a
#        1  book b
#        2  book c
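
If the goal is to verify the existing correspondence rather than replace name_id, one possible approach is to drop duplicate (name, name_id) pairs and confirm that every value is then unique in both columns. The following is only a minimal sketch: the helper name is_one_to_one and the sample data are assumptions made for illustration, not part of the original question.

```python
import pandas as pd


def is_one_to_one(df: pd.DataFrame, left: str, right: str) -> bool:
    """Return True if each value in `left` pairs with exactly one value
    in `right`, and each value in `right` with exactly one in `left`."""
    # Keep each distinct (left, right) pair once.
    pairs = df[[left, right]].drop_duplicates()
    # In a one-to-one mapping, no value can appear in two different pairs.
    return bool(pairs[left].is_unique and pairs[right].is_unique)


# Hypothetical sample data: non-consecutive ids, one repeated book.
df = pd.DataFrame(
    {
        "name": ["book a", "book b", "book c", "book a"],
        "name_id": [17, 3, 42, 17],
    }
)
print(is_one_to_one(df, "name", "name_id"))  # True

# Give the repeated book a second id to break the mapping.
broken = df.copy()
broken.loc[3, "name_id"] = 99
print(is_one_to_one(broken, "name", "name_id"))  # False
```

This check does not depend on the ids being consecutive, so no re-encoding or correlation is needed.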
