Check column encodings in a pandas DataFrame
I have a pandas DataFrame that contains two columns, name and name_id. Column name contains strings (names of books) and column name_id contains integers.
I want to check that these two columns correspond directly, i.e. that name_id is in fact an encoding of name.
The catch is that the numbers in name_id are not consecutive.
The only approach I have come up with so far is to build my own (consecutive) encodings for both columns and check the correlation between them. Is there a better/cleaner solution?
If I understand you correctly, you could replace name_id with your current index like this:
df = (
    df.reset_index(drop=False, inplace=False)
    .drop(columns="name_id")
    .rename(columns={"index": "name_id"})
)
# Output:
# name_id name
# 0 book a
# 1 book b
# 2 book c
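If the goal is to verify the correspondence rather than overwrite name_id, one possible approach is to drop duplicate (name, name_id) pairs and check that each value then appears only once in its column. The following is a minimal sketch, not the answerer's method; the helper name is_one_to_one and the sample data are assumptions made up for illustration.

```python
import pandas as pd


def is_one_to_one(df: pd.DataFrame, a: str = "name", b: str = "name_id") -> bool:
    """Hypothetical helper: True if columns a and b map to each other one-to-one."""
    # Keep each distinct (a, b) pair once.
    pairs = df[[a, b]].drop_duplicates()
    # In a one-to-one mapping, no value repeats in either column of the distinct pairs.
    return bool(pairs[a].is_unique and pairs[b].is_unique)


# Illustrative data (assumed): non-consecutive ids that encode the names consistently.
df = pd.DataFrame(
    {
        "name": ["book a", "book b", "book a", "book c"],
        "name_id": [17, 42, 17, 5],
    }
)
print(is_one_to_one(df))

# Same names, but "book a" now appears with two different ids.
bad = df.assign(name_id=[17, 42, 99, 5])
print(is_one_to_one(bad))
```

This does not depend on the ids being consecutive. The idea from the question of building consecutive encodings could also be expressed with pd.factorize on each column, comparing the two code arrays for equality instead of computing a correlation.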