Cannot rearrange columns with the same name in a pandas DataFrame (Python)

Question

I have a pandas DataFrame whose columns I turned into a list so I could edit and rearrange them. I'm trying to reassign the columns like this:

sapie_columns = sapie_df_working.columns.tolist()
sapie_columns = [sapie_columns[-1]] + sapie_columns[3:-1]

sapie_df_working = sapie_df_working[sapie_columns]

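But this turns my DataFrame, which originally had 32 columns, into one with 164 columns. I think this is because many of the existing columns share the same name (e.g. "90% CI Lower Bound"). I'm curious why this happens, and how I can rearrange and edit the DataFrame's columns the way I need to.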

For reference, here is a snippet of my DataFrame:

# sapie_df_working

2   State FIPS Code County FIPS Code    Postal Code Name    Poverty Estimate, All Ages  90% CI Lower Bound  90% CI Upper Bound  Poverty Percent, All Ages   90% CI Lower Bound  90% CI Upper Bound  ... 90% CI Upper Bound  Median Household Income 90% CI Lower Bound  90% CI Upper Bound  Poverty Estimate, Age 0-4   90% CI Lower Bound  90% CI Upper Bound  Poverty Percent, Age 0-4    90% CI Lower Bound  90% CI Upper Bound
3   00  000 US  United States   38371394    38309115    38433673    11.9    11.9    11.9    ... 14.9    67340   67251   67429   3146325 3133736 3158914 16.8    16.7    16.9
4   01  000 AL  Alabama 714568  695249  733887  14.9    14.5    15.3    ... 20.7    53958   53013   54903   66169   61541   70797   23.3    21.7    24.9
5   01  001 AL  Autauga County  6242    4930    7554    11.2    8.8 13.6    ... 19.3    67565   59132   75998   .   .   .   .   .   .
6   01  003 AL  Baldwin County  20189   15535   24843   8.9 6.8 11  ... 16.1    71135   66540   75730   .   .   .   .   .   .
7   01  005 AL  Barbour County  5548    4210    6886    25.5    19.3    31.7    ... 47.2    38866   33510   44222   .   .   .   .   .   .

Answer 1

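df = df[specific_column_names] does produce this result, because of the duplicate column names. When the labels are not unique, selecting by label returns every column that carries a requested name, so each duplicated name in the list (such as "90% CI Lower Bound") pulls in all of its matching columns and the column count grows. Filtering by name is tricky in this situation because it is ambiguous which column a given label refers to.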
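To see the expansion concretely, here is a minimal sketch on a small, made-up frame (the labels a, b, c are hypothetical, not from the question). The label list is built the same way as sapie_columns, and selecting with it returns more columns than the frame has.

```python
import pandas as pd

# Hypothetical 6-column frame: "a" appears three times, "b" twice.
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=["a", "b", "c", "a", "b", "a"])

# Build the label list like the question does:
# last label first, then the labels at positions 3 .. n-2.
labels = [df.columns[-1]] + df.columns[3:-1].tolist()
print(labels)  # ['a', 'a', 'b']

# Each label matches *every* column with that name:
# "a" -> 3 columns (and "a" is listed twice), "b" -> 2 columns.
selected = df[labels]
print(selected.shape)  # (1, 8): more columns than the original 6
```

The same mechanism, applied to 29 labels where names like "90% CI Lower Bound" repeat many times, is what inflates 32 columns to 164.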

When column names are duplicated, I would filter the DataFrame by column position instead.

Let's look at an example:

>>> import pandas as pd
>>> mock_data = [[11.29, 33.1283, -1.219, -33.11, 930.1, 33.91, 0.1213, 0.134], [9.0, 99.101, 9381.0, -940.11, 55.41, -941.1, -1.3913, 1933.1], [-192.1, 0.123, 0.1243, 0.213, 751.1, 991.1, -1.333, 9481.1]]
>>> mock_columns = ['a', 'b', 'c', 'a', 'd', 'b', 'g', 'a']
>>> df = pd.DataFrame(columns=mock_columns, data=mock_data)
>>> df
        a        b          c        a       d       b       g         a
0   11.29  33.1283    -1.2190  -33.110  930.10   33.91  0.1213     0.134
1    9.00  99.1010  9381.0000 -940.110   55.41 -941.10 -1.3913  1933.100
2 -192.10   0.1230     0.1243    0.213  751.10  991.10 -1.3330  9481.100

>>> columns = df.columns.tolist()
>>> filtered_column_indices = [len(columns) - 1] + list(range(3, len(columns) - 1))
>>> df.iloc[:, filtered_column_indices]
          a        a       d       b       g
0     0.134  -33.110  930.10   33.91  0.1213
1  1933.100 -940.110   55.41 -941.10 -1.3913
2  9481.100    0.213  751.10  991.10 -1.3330

In the example, instead of building a list of column names with [sapie_columns[-1]] + sapie_columns[3:-1], I built the equivalent list of column positions and passed it to iloc.
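The same idea can be wrapped in a small reusable function. This is a sketch, not part of the original answer: the name reorder_by_position and its skip parameter are hypothetical, chosen to mirror the [cols[-1]] + cols[skip:-1] pattern from the question.

```python
import pandas as pd


def reorder_by_position(df: pd.DataFrame, skip: int = 3) -> pd.DataFrame:
    """Hypothetical helper: move the last column to the front and drop
    the first `skip` columns, selecting by position with iloc so that
    duplicate column labels cannot multiply the result."""
    n = df.shape[1]
    positions = [n - 1] + list(range(skip, n - 1))
    return df.iloc[:, positions]


# Small illustrative frame with duplicate labels "a" and "b".
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=["a", "b", "c", "a", "b", "a"])
out = reorder_by_position(df)
print(out.columns.tolist())  # ['a', 'a', 'b']
print(out.iloc[0].tolist())  # [6, 4, 5]
```

For a 32-column frame like the question's, reorder_by_position(sapie_df_working) would select 1 + 28 = 29 columns, the same count as the 29 entries in sapie_columns, because each position picks exactly one column.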

Answer 1 (accepted, 2022-03-11 06:03:00)