Get a list from Pandas DataFrame column headers

I want to get a list of the column headers from a Pandas DataFrame. The DataFrame will come from user input, so I won't know how many columns there will be or what they will be called.

For example, if I'm given a DataFrame like this:

>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7

I would get a list like this:

>>> header_list
['y', 'gdp', 'cap']

You can get the values as a list by doing:

list(my_dataframe.columns.values)

You can also simply use the following (as shown in Ed Chum's answer):

list(my_dataframe)

There is a built-in method which is the most performant:

my_dataframe.columns.values.tolist()

.columns returns an Index, .columns.values returns a NumPy array, and the array has a helper method .tolist() that returns a list.
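To make the chain of types concrete, here is a minimal sketch. The small frame is a hypothetical stand-in for the question's my_dataframe, and the variable names are mine, not from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the column names in the question.
my_dataframe = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

columns_index = my_dataframe.columns         # a pandas Index
columns_array = my_dataframe.columns.values  # a NumPy array of the labels
header_list = columns_array.tolist()         # a plain Python list

print(isinstance(columns_index, pd.Index))    # True
print(isinstance(columns_array, np.ndarray))  # True
print(header_list)                            # ['y', 'gdp', 'cap']
```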

If performance is not as important to you, Index objects define a .tolist() method that you can call directly:

my_dataframe.columns.tolist()

The difference in performance is obvious:

%timeit df.columns.tolist()
16.7 µs ± 317 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit df.columns.values.tolist()
1.24 µs ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
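If you are not in IPython, a rough equivalent of these measurements can be sketched with the standard timeit module. The frame, the candidate labels, and the repeat counts are my own assumptions, and the absolute numbers will vary with your machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical small frame; only the column labels matter here.
df = pd.DataFrame({"A": [1], "B": [2], "C": [3]})

candidates = {
    "df.columns.tolist()": lambda: df.columns.tolist(),
    "df.columns.values.tolist()": lambda: df.columns.values.tolist(),
    "list(df)": lambda: list(df),
}

calls = 10_000
timings = {}
for label, func in candidates.items():
    # Best of 5 repeats, converted to microseconds per call.
    best = min(timeit.repeat(func, number=calls, repeat=5))
    timings[label] = best / calls * 1e6
    print(f"{label:28s} {timings[label]:8.3f} µs per call")
```

Taking the minimum over several repeats reduces noise from other processes, which is also why %timeit reports several runs.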

For those who hate typing, you can just call list on df, like so:

list(df)

I did some quick tests, and perhaps unsurprisingly the built-in version using dataframe.columns.values.tolist() is the fastest:

In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 µs per loop

In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 µs per loop

In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 µs per loop

In [4]: %timeit list(df.columns.values)
10000 loops, best of 3: 38.4 µs per loop

(I still really like list(dataframe) though, so thanks EdChum!)

It gets even simpler (as of pandas 0.16.0):

df.columns.tolist()

will give you the column names in a nice list.

Surprised I haven't seen this posted so far, so I'll just leave this here.

Extended Iterable Unpacking (Python 3.5+): [*df] and Friends

Unpacking generalizations (PEP 448) were introduced in Python 3.5, so the following operations are all possible.

df = pd.DataFrame('x', columns=['A', 'B', 'C'], index=range(5))
df

   A  B  C
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x 

If you want a list...

[*df]
# ['A', 'B', 'C']

Or, if you want a set,

{*df}
# {'A', 'B', 'C'}

Or, if you want a tuple,

*df,  # Please note the trailing comma
# ('A', 'B', 'C')

Or, if you want to store the result somewhere,

*cols, = df  # A wild comma appears, again
cols
# ['A', 'B', 'C']

... if you're the kind of person who converts coffee to typing sounds, well, this is going to consume your coffee more efficiently ;)

P.S.: if performance is important, you will want to ditch the solutions above in favour of

df.columns.to_numpy().tolist() # ['A', 'B', 'C']

This is similar to Ed Chum's answer, but updated for v0.24, where .to_numpy() is preferred to the use of .values. See this answer (by me) for more information.
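As a quick sanity check, the sketch below (assuming pandas 0.24 or later, and reusing the example frame from this answer) shows that both spellings yield the same list. The variable names are mine.

```python
import pandas as pd

df = pd.DataFrame("x", columns=["A", "B", "C"], index=range(5))

# .to_numpy() is the newer spelling; .values is the older attribute.
via_to_numpy = df.columns.to_numpy().tolist()
via_values = df.columns.values.tolist()

print(via_to_numpy)                # ['A', 'B', 'C']
print(via_to_numpy == via_values)  # True
```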

Visual Check

Since I've seen this discussed in other answers, you can utilise iterable unpacking (no need for explicit loops).

print(*df)
A B C

print(*df, sep='\n')
A
B
C

Critique of Other Methods

Don't use an explicit for loop for an operation that can be done in a single line (list comprehensions are okay).

Next, using sorted(df) does not preserve the original order of the columns. For that, you should use list(df) instead.
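The difference is easy to see on the question's column names. The sketch below uses a hypothetical small frame, and also illustrates a caveat that is my own addition: in Python 3, sorted raises a TypeError when the labels mix types that cannot be compared, such as str and int.

```python
import pandas as pd

# Hypothetical frame with the question's column names.
my_dataframe = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

original_order = list(my_dataframe)  # keeps the column order of the frame
alphabetical = sorted(my_dataframe)  # new list, sorted by label

print(original_order)  # ['y', 'gdp', 'cap']
print(alphabetical)    # ['cap', 'gdp', 'y']

# Hypothetical frame whose labels mix a str and an int.
mixed = pd.DataFrame([[1, 2]], columns=["a", 0])
try:
    sorted(mixed)
    sort_failed = False
except TypeError:
    # Python 3 cannot order str against int, so sorting the labels fails.
    sort_failed = True
print(sort_failed)  # True
```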

Next, list(df.columns) and list(df.columns.values) are poor suggestions (as of the current version, v0.24). Both Index (returned from df.columns) and NumPy arrays (returned by df.columns.values) define a .tolist() method, which is faster and more idiomatic.

Lastly, listification, i.e., list(df), should only be used as a concise alternative to the aforementioned methods for Python <= 3.4, where extended unpacking is not available.

>>> list(my_dataframe)
['y', 'gdp', 'cap']

To list the columns of a dataframe while in debugger mode, use a list comprehension:

>>> [c for c in my_dataframe]
['y', 'gdp', 'cap']

By the way, you can get a sorted list simply by using sorted:

>>> sorted(my_dataframe)
['cap', 'gdp', 'y']

This is available as my_dataframe.columns.

A DataFrame follows the dict-like convention of iterating over the "keys" of the object.

my_dataframe.keys()

Create a list of keys/columns, using either the object method to_list() or the Pythonic way:

my_dataframe.keys().to_list()
list(my_dataframe.keys())
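A small sketch of the dict-like view, using a hypothetical frame with the question's column names: keys() hands back the same labels as .columns, so either route produces the same list.

```python
import pandas as pd

# Hypothetical frame with the question's column names.
my_dataframe = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

keys = my_dataframe.keys()
same_labels = keys.equals(my_dataframe.columns)

print(same_labels)     # True
print(keys.to_list())  # ['y', 'gdp', 'cap']
print(list(keys))      # ['y', 'gdp', 'cap']
```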

Basic iteration on a DataFrame returns column labels:

[column for column in my_dataframe]

Do not convert a DataFrame into a list just to get the column labels. Do not stop thinking while looking for convenient code samples.

import numpy as np
import pandas as pd

xlarge = pd.DataFrame(np.arange(100000000).reshape(10000, 10000))
list(xlarge)         # iterates over the column labels; cost grows with the number of columns
list(xlarge.keys())  # builds the same list from the column Index, stating the intent explicitly

Interestingly, df.columns.values.tolist() is almost 3 times faster than df.columns.tolist(), although I thought they were the same:

In [97]: %timeit df.columns.values.tolist()
100000 loops, best of 3: 2.97 µs per loop

In [98]: %timeit df.columns.tolist()
10000 loops, best of 3: 9.67 µs per loop

In the Notebook

For data exploration in the IPython notebook, my preferred way is this:

sorted(df)

Which will produce an easy-to-read, alphabetically ordered list.

In a code repository

In code I find it more explicit to do

df.columns

Because it tells others reading your code what you are doing.

%%timeit
final_df.columns.values.tolist()
948 ns ± 19.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
list(final_df.columns)
14.2 µs ± 79.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
list(final_df.columns.values)
1.88 µs ± 11.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
final_df.columns.tolist()
12.3 µs ± 27.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
list(final_df.head(1).columns)
163 µs ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

As answered by Simeon Visser, you could do

list(my_dataframe.columns.values) 

or

list(my_dataframe) # for less typing.

But I think the sweet spot is:

list(my_dataframe.columns)

It is explicit, and at the same time not unnecessarily long.

For a quick, neat, visual check, try this:

for col in df.columns:
    print(col)  # print() call form, which Python 3 requires

I feel the question deserves additional explanation.

As @fixxxer noted, the answer depends on the pandas version you are using in your project, which you can get with pd.__version__.
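A minimal sketch of such a version check follows. The 0.24 threshold corresponds to the release that other answers on this page associate with .to_numpy(); the variable names are my own.

```python
import pandas as pd

print(pd.__version__)  # a version string; the value depends on your installation

# Compare only the leading "major.minor" part, which is numeric.
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
supports_to_numpy = (major, minor) >= (0, 24)
print(supports_to_numpy)
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexicographically, where "0.9" would sort after "0.24".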

If for some reason you are, like me (on Debian jessie I use 0.14.1), using a version of pandas older than 0.16.0, then you need to use:

df.keys().tolist(), because there is no df.columns method implemented yet.

The advantage of this keys method is that it works even in newer versions of pandas, so it's more universal.

import pandas as pd

# create test dataframe
df = pd.DataFrame('x', columns=['A', 'B', 'C'], index=range(2))

list(df.columns)

Returns

['A', 'B', 'C']

n = []
for i in my_dataframe.columns:
    n.append(i)
print(n)

Even though the solution provided above is nice, I would also expect something like frame.column_names() to be a function in pandas. Since it is not, maybe it would be nice to use the following syntax. It somehow preserves the feeling that you are using pandas in a proper way by calling the "tolist" function:

frame.columns.tolist() 
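If you do want the column_names spelling, a thin wrapper is enough. This is a sketch only: column_names is a hypothetical helper defined here, not something pandas provides, and the frame is a made-up example.

```python
import pandas as pd


def column_names(frame: pd.DataFrame) -> list:
    """Hypothetical helper (not part of pandas): return the column labels as a list."""
    return frame.columns.tolist()


# Hypothetical frame with the question's column names.
frame = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})
print(column_names(frame))  # ['y', 'gdp', 'cap']
```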

If the DataFrame happens to have an Index or MultiIndex and you want those included as column names too:

names = list(filter(None, df.index.names + df.columns.values.tolist()))

It avoids calling reset_index(), which incurs an unnecessary performance hit for such a simple operation.
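Here is a sketch of the same idea on a hypothetical frame whose index is named id. It filters on "is not None" rather than using filter(None, ...); that is my own variation, chosen because filter(None, ...) would also drop falsy labels such as 0 or an empty string.

```python
import pandas as pd

# Hypothetical frame: "id" plays the role of a database primary key.
df = pd.DataFrame({"id": [10, 20], "y": [1, 2], "gdp": [2, 3]}).set_index("id")

# Index names first, then column labels; unnamed index levels (None) are skipped.
names = [
    name
    for name in list(df.index.names) + df.columns.tolist()
    if name is not None
]
print(names)  # ['id', 'y', 'gdp']

# With a default, unnamed index only the column labels remain.
plain = pd.DataFrame({"y": [1, 2]})
plain_names = [
    name
    for name in list(plain.index.names) + plain.columns.tolist()
    if name is not None
]
print(plain_names)  # ['y']
```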

I've run into needing this more often because I'm shuttling data from databases where the dataframe index maps to a primary/unique key, but is really just another "column" to me. It would probably make sense for pandas to have a built-in method for something like this (it's entirely possible I've missed it).

listHeaders = [colName for colName in my_dataframe]

The simplest option would be: list(my_dataframe.columns) or my_dataframe.columns.tolist()

No need for the complex stuff above :)

This is the easiest way to reach your goal.

my_dataframe.columns.values.tolist()

and if you are lazy, try this:

list(my_dataframe)

It's the simple code for you:

for i in my_dataframe:
    print(i)

just do it

It's very simple.

You can do it like this:

list(df.columns)

This solution lists all the columns of the object my_dataframe:

print(list(my_dataframe))
