[英]Dropping a group of columns based on a condition fulfilled by a column in Pandas
I have a dataframe whereby the column names are tuples.我有一个 dataframe,列名是元组。 Below is a sample data for my dataframe:下面是我的 dataframe 的示例数据:
data = {"('7086', 'Open')": {5: 0.0700000002980232, 6: 0.0649999976158142, 7: 0.0750000029802322, 8: 0.0649999976158142, 9: 0.0700000002980232},
"('7086', 'High')": {5: 0.0700000002980232, 6: 0.0750000029802322, 7: 0.0750000029802322, 8: 0.0750000029802322, 9: 0.0700000002980232},
"('7086', 'Low')": {5: 0.0700000002980232, 6: 0.0649999976158142, 7: 0.0750000029802322, 8: 0.0649999976158142, 9: 0.0700000002980232},
"('7086', 'Close')": {5: 0.0700000002980232, 6: 0.0750000029802322, 7: 0.0750000029802322, 8: 0.0750000029802322, 9: 0.0700000002980232},
"('7086', 'Adj Close')": {5: 0.0700000002980232, 6: 0.0750000029802322, 7: 0.0750000029802322, 8: 0.0750000029802322, 9: 0.0700000002980232},
"('7086', 'Volume')": {5: 0, 6: 3200, 7: 0, 8: 200800, 9: 260000},
"('03028', 'Open')": {5: 0.3600000143051147, 6: 0.3600000143051147, 7: 0.3600000143051147, 8: 0.3600000143051147, 9: 0.3600000143051147},
"('03028', 'High')": {5: 0.3600000143051147, 6: 0.3600000143051147, 7: 0.3600000143051147, 8: 0.3600000143051147, 9: 0.3600000143051147},
"('03028', 'Low')": {5: 0.3600000143051147, 6: 0.3600000143051147, 7: 0.3600000143051147, 8: 0.3600000143051147, 9: 0.3600000143051147},
"('03028', 'Close')": {5: 0.3600000143051147, 6: 0.3600000143051147, 7: 0.3600000143051147, 8: 0.3600000143051147, 9: 0.3600000143051147},
"('03028', 'Adj Close')": {5: 0.3509772419929504, 6: 0.3509772419929504, 7: 0.3509772419929504, 8: 0.3509772419929504, 9: 0.3509772419929504},
"('03028', 'Volume')": {5: 15500.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0}}
df = pd.DataFrame(data)
('7086', 'Open') ... ('03028', 'Volume')
5 0.070 ... 15500.0
6 0.065 ... 0.0
7 0.075 ... 0.0
8 0.065 ... 0.0
9 0.070 ... 0.0
[5 rows x 12 columns]
Now, what I hope to perform is to remove the group of columns with the same number code when one of the columns in the number code has only 1 unique number:现在,我希望执行的是当数字代码中的其中一列只有 1 个唯一数字时,删除具有相同数字代码的一组列:
df.nunique()
('7086', 'Open') 3
('7086', 'High') 2
('7086', 'Low') 3
('7086', 'Close') 2
('7086', 'Adj Close') 2
('7086', 'Volume') 4
('03028', 'Open') 1
('03028', 'High') 1
('03028', 'Low') 1
('03028', 'Close') 1
('03028', 'Adj Close') 1
('03028', 'Volume') 2
dtype: int64
As above, we can see in the number code 03028
, that 5 out of 6 columns have only 1 unique value.如上,我们可以在数字代码03028
中看到,6 列中的 5 列只有 1 个唯一值。 Therefore, I wish to remove all the 6 columns with the same number code based on this fact.因此,我希望根据这个事实删除所有具有相同数字代码的6列。 Is there any idea I can do that?有什么想法我可以做到吗? I was thinking of grouping it together and dropping it.我正在考虑将它组合在一起并删除它。 But I'm not sure how to deal with the tuples.但我不确定如何处理元组。 Thanks in advance.提前致谢。
My expected output will be:我预期的 output 将是:
('7086', 'Open') ... ('7086', 'Volume')
5 0.070 ... 0
6 0.065 ... 3200
7 0.075 ... 0
8 0.065 ... 200800
9 0.070 ... 260000
[5 rows x 6 columns]
You can do你可以做
out = df.loc[:,~df.columns.str.contains("'03028',")]
If multiple如果多个
out = df.loc[:,~df.columns.str.contains("'03028',|'00000',")]
Try something like this尝试这样的事情
import ast
df.columns = pd.MultiIndex.from_tuples(map(ast.literal_eval, df.columns))
filter_ = (df.nunique()==1).groupby(level=0).any()
df.drop(filter_[filter_].index, axis=1, level=0)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.