Pandas Remove Duplicated Columns

I am working with an Excel spreadsheet that compares the value of two items across different currencies. The spreadsheet headings are as below:

USD USD DIFFERENCE | GBP GBP DIFFERENCE | JPY JPY DIFFERENCE .......

When I import this into pandas and create a DataFrame, it creates headings named Difference.1, Difference.2, Difference.3, ..., Difference.n

I want to remove all headings named DIFFERENCE.

Note that all the difference headings are uniquely named.

I think you can refer to the documentation for the pandas df.drop function. What we will do is below:

  1. Search for all the column names that contain DIFFERENCE
  2. Generate a list with all these names in it
  3. Drop the columns that have these names

# get a new df that contains only the columns with DIFFERENCE in their names
# (like= is a case-sensitive substring match)

df2 = df.filter(like='DIFFERENCE', axis=1)

# generate a list of all these column names

x = list(df2.columns)

# drop the columns named in x; drop returns a new DataFrame and leaves df unchanged, so assign the result back

df = df.drop(columns=x)

# OR, instead, keep the updated data in a new df

df3 = df.drop(columns=x)
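
The three steps above can also be collapsed into a single boolean mask over the column names. This is a minimal sketch rather than the answerer's code: the CSV text and its values are made up for illustration, `read_csv` stands in for the Excel import, and the match is made case-insensitive on the assumption that the headings may appear as `Difference.1` rather than `DIFFERENCE`.

```python
import io

import pandas as pd

# Hypothetical data standing in for the spreadsheet. Repeated headings are
# renamed by pandas on import (DIFFERENCE, DIFFERENCE.1, DIFFERENCE.2).
csv_text = (
    "USD,DIFFERENCE,GBP,DIFFERENCE,JPY,DIFFERENCE\n"
    "10,1,8,2,1100,3\n"
    "20,4,16,5,2200,6\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# True for every column whose name contains "difference", ignoring case.
is_difference = df.columns.str.contains("difference", case=False)

# Keep only the columns that were not flagged.
result = df.loc[:, ~is_difference]
print(list(result.columns))
```

Because the mask is built from the column names alone, the original `df` is left untouched and `result` holds only the currency columns.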

I believe all you need is to remove the columns that contain 'DIFFERENCE' in their names. In this case you can simply do:

    import pandas as pd

    df = pd.read_csv("../path/to/your/file.csv")
    df = df[df.columns.drop(list(df.filter(regex='DIFFERENCE')))]

Example

If you have something like this:

df = pd.DataFrame({"a1": [1,2,3], "a2":[1,1,1], "b":[2,4,6]})
print(df)

Out:
   a1  a2  b
0   1   1  2
1   2   1  4
2   3   1  6

The next step is:

df = df[df.columns.drop(list(df.filter(regex='a')))]
print(df)

Out:
   b
0  2
1  4
2  6
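
The pattern passed to `filter(regex=...)` is searched for anywhere in the column name, which is why `regex='a'` removes both `a1` and `a2` above. If that is too broad for the real data, the pattern can be anchored. The sketch below assumes the unwanted headings end in `DIFFERENCE`, optionally followed by a numeric suffix such as `.1`; the frame and the `DIFFERENCE NOTES` column are hypothetical and only show that a name merely starting with the word is kept.

```python
import pandas as pd

# Hypothetical frame using the heading pattern from the question.
df = pd.DataFrame(
    {
        "USD": [10, 20],
        "USD DIFFERENCE": [1, 4],
        "GBP": [8, 16],
        "GBP DIFFERENCE": [2, 5],
        "DIFFERENCE NOTES": ["a", "b"],  # does not end in DIFFERENCE
    }
)

# Match only names that END in DIFFERENCE, optionally followed by a
# ".1", ".2", ... suffix.
to_drop = df.filter(regex=r"DIFFERENCE(\.\d+)?$").columns

# drop returns a new DataFrame; df itself is not modified.
trimmed = df.drop(columns=to_drop)
print(list(trimmed.columns))
```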

You can read more in the pandas.DataFrame.drop documentation.
