Preserving column order in Python Pandas DataFrame

Is there a way to preserve the order of the columns in a CSV file when reading it and then writing it back with Python Pandas? For example, in this code

import pandas as pd

data = pd.read_csv(filename)
data.to_csv(filename)

the output file might differ from the input because the column order is not preserved.

There appears to be a bug in the current version of Pandas ('0.11.0'), which means that Matti John's answer will not work. If you specify columns for writing to file, they are written in alphabetical order, but simply relabelled according to the list in cols. For example, this code:

import pandas
dfdict={}
dfdict["a"]=[1,2,3,4]
dfdict["b"]=[5,6,7,8]
dfdict["c"]=[9,10,11,12]
df=pandas.DataFrame(dfdict)
df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"])

results in this (incorrect) output:

    b   a   c
0   1   5   9
1   2   6   10
2   3   7   11
3   4   8   12

You can check which version of pandas you have installed by executing:

pandas.__version__

Documentation for to_csv is here.

Actually, it seems that this is a known bug and will be fixed in an upcoming release (0.11.1):

https://github.com/pydata/pandas/issues/3489

UPDATE: There still hasn't been a new release of pandas, but there is a workaround described here, which doesn't require using a different version of pandas:

github.com/pydata/pandas/issues/3454

So changing the last line in the block of code above to the following will work correctly:

df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"], engine='python')

UPDATE: It seems that the argument "cols" has been renamed to "columns" and that the argument "engine" is deprecated (no longer available) in recent versions of pandas. Also, this bug is fixed in version 0.19.0.
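For readers on a recent pandas release, here is a minimal sketch of the same example using the renamed `columns` argument. It writes to an in-memory `io.StringIO` buffer instead of "dfTest.txt" (an assumption made only to keep the example self-contained) and reads the result back to check that each label still sits over its own data.

```python
import io

import pandas as pd

# Same data as in the example above.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

# Write the columns in the order b, a, c (tab-separated) to an in-memory buffer.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", header=True, columns=["b", "a", "c"])
written = buffer.getvalue()

# Read the text back; the first column holds the index that to_csv wrote.
reloaded = pd.read_csv(io.StringIO(written), sep="\t", index_col=0)
print(reloaded)
```

If the fix is in place, column "b" should hold 5 to 8 and column "a" should hold 1 to 4, rather than the relabelled values shown in the incorrect output.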

The column order should generally be preserved when reading and then writing a CSV file like that, but if for some reason the columns are not in the order you want, you can use the columns keyword argument in to_csv.

For example, if you have a CSV with columns a, b, c, d:

data = pd.read_csv(filename)
data.to_csv(filename, columns=['a', 'b', 'c', 'd'])
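A quick way to check the round trip yourself is sketched below. The sample CSV text and the use of `io.StringIO` in place of a real file are assumptions for illustration. Note that `to_csv` also writes the row index as an extra first column by default, so the sketch passes `index=False`; without it the output would differ from the input even when the column order is intact.

```python
import io

import pandas as pd

# Hypothetical input whose columns are deliberately not in alphabetical order.
source = "d,b,a,c\n1,2,3,4\n5,6,7,8\n"

data = pd.read_csv(io.StringIO(source))

# index=False keeps the row index out of the output file.
out = io.StringIO()
data.to_csv(out, index=False)
result = out.getvalue()

print(list(data.columns))
print(result)
```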

Another workaround is to do this:

import pandas as pd
data = pd.read_csv(filename)
data2 = data[['A','B','C']]  # put 'A', 'B', 'C' in the desired order
data2.to_csv(filename)

When the column names are not known in advance

... you can easily specify them by reading the first line of your CSV file, which contains the headers, then converting the column names to a list, and - as others pointed out - using that list in read_csv():

import pandas as pd

path_to_table = 'path/to/table.csv'

# read the column names in the order they appear in the CSV:
with open(path_to_table) as f:
    first_line = f.readline()
cols = first_line.strip().split(',')

# use them: header=0 replaces the file's header row with cols
df = pd.read_csv(path_to_table, names=cols, header=0)[cols]
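Splitting the first line on commas can break if a header name is quoted and itself contains a comma. As an alternative sketch (not the approach from the answer above), pandas can parse just the header by passing `nrows=0` to `read_csv`. The sample CSV text below is an assumption for illustration and stands in for the file at `path_to_table`.

```python
import io

import pandas as pd

# Hypothetical CSV whose second header name contains a comma inside quotes.
csv_text = 'id,"name, full",score\n1,"Doe, Jane",9.5\n'

# nrows=0 parses only the header row and returns an empty DataFrame
# whose columns are in file order.
cols = pd.read_csv(io.StringIO(csv_text), nrows=0).columns.tolist()

# Read the full data and select the columns in that same order.
df = pd.read_csv(io.StringIO(csv_text))[cols]

print(cols)
print(df)
```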
