Pandas Remove Duplicated Columns
I am working with an Excel spreadsheet that compares the value of two items across different currencies. The spreadsheet headings are as below:
USD USD DIFFERENCE | GBP GBP DIFFERENCE | JPY JPY DIFFERENCE.......
When I import this into pandas and create a DataFrame, it creates headings named Difference.1, Difference.2, Difference.3, ..., Difference.n.
I want to remove all headings named DIFFERENCE.
Note that all the difference headings are uniquely named.
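For context, the numbered suffixes come from how pandas handles repeated header names on import. The sketch below reproduces that behaviour with a small made-up CSV. It assumes the difference columns all share the single header DIFFERENCE, which the question does not state outright, and the values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical data standing in for the spreadsheet; the values are made up.
csv_text = (
    "USD,DIFFERENCE,GBP,DIFFERENCE,JPY,DIFFERENCE\n"
    "1.0,0.1,0.8,0.2,150.0,0.3\n"
)

# pandas renames repeated headers by appending .1, .2, ... to keep them unique.
df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))
# ['USD', 'DIFFERENCE', 'GBP', 'DIFFERENCE.1', 'JPY', 'DIFFERENCE.2']
```

Because every renamed column still contains the original header text, matching on that substring is enough to find all of them.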
I think you can refer to the pandas documentation for the df.drop function. What we will do is shown below.
# get a new df that contains only the columns with DIFFERENCE in their names
# (like= is case-sensitive, so use 'Difference' if that is how your headers are spelled)
df2 = df.filter(like='DIFFERENCE', axis=1)
# generate a list of all these column names
x = list(df2.columns)
# drop the columns named in x; drop returns a new DataFrame, so assign the result
df = df.drop(columns=x)
# OR, instead of the line above, keep the updated data in a new df
df3 = df.drop(columns=x)
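Since the question mentions both DIFFERENCE and Difference.1, the match may need to ignore case. A minimal sketch of the same idea in one step, using a made-up frame (the column names and values here are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frame that mimics the imported spreadsheet.
df = pd.DataFrame(
    {"USD": [1.0], "Difference": [0.1], "GBP": [0.8], "Difference.1": [0.2]}
)

# Collect every column whose name contains "difference", ignoring case.
to_drop = [col for col in df.columns if "difference" in col.lower()]

# drop returns a new DataFrame and leaves df itself unchanged.
result = df.drop(columns=to_drop)
print(list(result.columns))  # ['USD', 'GBP']
```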
I believe all you need is to remove the columns that contain 'DIFFERENCE' in their names. In this case you can simply do:
import pandas as pd
df = pd.read_csv("../path/to/your/file.csv")
df = df[df.columns.drop(list(df.filter(regex='DIFFERENCE')))]
If you have something like this:
df = pd.DataFrame({"a1": [1,2,3], "a2":[1,1,1], "b":[2,4,6]})
print(df)
Out:
   a1  a2  b
0   1   1  2
1   2   1  4
2   3   1  6
The next step is:
df = df[df.columns.drop(list(df.filter(regex='a')))]
print(df)
Out:
   b
0  2
1  4
2  6
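The same result can also be reached with a boolean mask over the column names, which some readers find easier to follow than the filter-and-drop idiom. This is an alternative sketch, not the approach given in the answer above:

```python
import pandas as pd

df = pd.DataFrame({"a1": [1, 2, 3], "a2": [1, 1, 1], "b": [2, 4, 6]})

# True for every column whose name does NOT contain "a".
keep = ~df.columns.str.contains("a")

# Select only the columns marked True in the mask.
result = df.loc[:, keep]
print(list(result.columns))  # ['b']
```

Note that str.contains treats its pattern as a regular expression by default, just as filter(regex=...) does.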
You can read more about pandas.DataFrame.drop in the pandas documentation.