
Dataframe of different size but no difference in columns

I am building an XGBoost model. I did my train-test split on a dataframe with 91 columns. I want to use my model on a new dataframe whose columns differ from those of my training set. I have removed the extra columns and added the ones that were present in the training dataset but not in the new one.


However, I cannot use the model because the new set does not have the same number of columns, yet when I compute the list of column differences, the list is empty.


Do you have any idea how I could fix this problem?

Thanks in advance for your time!

You can try it like this. Set subtraction only checks one direction, so compare the columns the other way round as well:

import pandas as pd

# Frame with two columns
X_PAU = pd.DataFrame({'test1': ['A', 'A'], 'test2': [0, 0]})
print(len(X_PAU.columns))
# Frame with one column
X = pd.DataFrame({'test1': ['A', 'A']})
print(len(X.columns))

# Your implementation: columns in X that are not in X_PAU (an empty set here)
print(set(X.columns) - set(X_PAU.columns))

# Opposite direction: columns in X_PAU that are missing from X
print(X_PAU.columns.difference(X.columns).tolist())  # names of the missing columns
print(len(X_PAU.columns.difference(X.columns)))  # number of missing columns

Output:

2
1
set()
['test2']
1
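
If the goal is to give the new dataframe exactly the training columns, the check and the fix might be sketched as follows. This is only a sketch under assumptions: `X_train` and `X_new` are hypothetical stand-ins for the asker's frames, and filling missing columns with 0 is an assumption that may not suit every feature. It also checks for duplicated column names, which would change the column count while leaving a set-based difference empty.

```python
import pandas as pd

# Hypothetical stand-ins for the training features and the new data.
X_train = pd.DataFrame({'test1': ['A', 'A'], 'test2': [0, 0]})
X_new = pd.DataFrame({'test1': ['B', 'B'], 'extra': [1, 1]})

# Columns present in only one of the two frames (both directions at once).
mismatch = X_train.columns.symmetric_difference(X_new.columns).tolist()
print(mismatch)

# Duplicated names change the column count without changing the set of names.
dupes = X_new.columns[X_new.columns.duplicated()].tolist()
print(dupes)

# Drop extra columns, add missing ones filled with 0 (an assumed default),
# and put the columns in the same order as the training frame.
X_aligned = X_new.reindex(columns=X_train.columns, fill_value=0)
print(X_aligned.columns.tolist())
```

Note that `reindex` raises an error if `X_new` has duplicated column names, so any duplicates reported by the second check would need to be resolved first.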

