DataFrame of different size but no difference in columns
I am building an XGBoost model. I did my train-test split on a dataframe with 91 columns. I want to use the model on a new dataframe whose columns differ from those of my training set. I have removed the extra columns and added the ones that were present in the training dataset but not in the new one.
However, I cannot use the model because the new set does not have the same number of columns, yet when I compute the list of differences between the columns, the list is empty.
Do you have an idea of how I could correct this problem?
Thanks in advance for your time!
You can try something like this:
import pandas as pd

# Frame with two columns
X_PAU = pd.DataFrame({'test1': ['A', 'A'], 'test2': [0, 0]})
print(len(X_PAU.columns))
# Frame with only one of those columns
X = pd.DataFrame({'test1': ['A', 'A']})
print(len(X.columns))
# Your implementation: columns of X that are not in X_PAU (empty, because X's columns are a subset)
print(set(X.columns) - set(X_PAU.columns))
# The opposite direction: names of the columns of X_PAU that are missing from X
print(X_PAU.columns.difference(X.columns).tolist())
# Number of missing columns
print(len(X_PAU.columns.difference(X.columns)))
Output:
2
1
set()
['test2']
1
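If the difference is empty in both directions and the column counts still disagree, one possible cause (an assumption, since the question does not show the data) is a repeated column name, which a set comparison silently collapses. The sketch below checks both directions plus duplicates, then aligns the new frame to the training columns with `reindex`. The helpers `column_report` and `align_to_training` are hypothetical names, the training columns are assumed to be unique, and filling missing columns with 0 is an arbitrary choice that may not suit your features.

```python
import pandas as pd


def column_report(train: pd.DataFrame, new: pd.DataFrame) -> dict:
    """Hypothetical helper: summarize column mismatches between two frames."""
    train_cols, new_cols = set(train.columns), set(new.columns)
    return {
        # Columns the model was trained on but the new frame lacks
        "missing_in_new": sorted(train_cols - new_cols),
        # Columns only the new frame has
        "extra_in_new": sorted(new_cols - train_cols),
        # Repeated names are invisible to set differences
        "duplicated_in_train": train.columns[train.columns.duplicated()].tolist(),
        "duplicated_in_new": new.columns[new.columns.duplicated()].tolist(),
    }


def align_to_training(train_columns, new: pd.DataFrame, fill_value=0) -> pd.DataFrame:
    """Hypothetical helper: give `new` exactly the training columns, in order."""
    # Keep only the first occurrence of each repeated column name
    deduplicated = new.loc[:, ~new.columns.duplicated()]
    # Drop extra columns, add missing ones (filled with fill_value), fix the order
    return deduplicated.reindex(columns=train_columns, fill_value=fill_value)


train = pd.DataFrame({"test1": ["A", "A"], "test2": [0, 0]})
# Same column names as train, but "test1" appears twice
new = pd.DataFrame(
    [["A", 0, "B"], ["A", 0, "B"]], columns=["test1", "test2", "test1"]
)

report = column_report(train, new)
print(report)

aligned = align_to_training(train.columns, new)
print(aligned.shape)
```

Here both set differences are empty although `new` has three columns and `train` has two; the duplicate check is what exposes the mismatch. Column order may matter as well when feature names are validated at prediction time, which is why the sketch reindexes to the training order instead of only adding and dropping columns.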