
How to join columns in CSV files using Pandas in Python

I have a CSV file that looks something like this:

# data.csv (this line is not there in the file)
Names, Age, Names
John, 5, Jane
Rian, 29, Rath

When I read it with Pandas in Python, I get something like this:

import pandas as pd

data = pd.read_csv("data.csv")
print(data)

And the output of the program is:

  Names   Age  Names
0  John     5   Jane
1  Rian    29   Rath
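As a side note on why both name columns show up: because the sample file has a space after each comma, pandas reads the header as 'Names', ' Age', ' Names', so the two name labels are not actually identical. If the spaces are stripped, pandas does not keep two identical labels; it renames the repeated header to Names.1. The sketch below is an illustration only: it simulates data.csv in memory with io.StringIO (CSV_TEXT is a hypothetical stand-in for the file) and compares the two cases.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for data.csv, mirroring the sample above
CSV_TEXT = "Names, Age, Names\nJohn, 5, Jane\nRian, 29, Rath\n"

# Default parsing keeps the space that follows each comma in the header names
raw = pd.read_csv(io.StringIO(CSV_TEXT))
print(list(raw.columns))

# skipinitialspace=True strips those spaces; the repeated header then gets a ".1" suffix
clean = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)
print(list(clean.columns))
```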

Is there any way to get the following instead:

  Names   Age  
0  John     5   
1  Rian    29   
2  Jane
3  Rath

First, I'd suggest giving each column a unique name. Either change the duplicate column header in the CSV file itself, or rename it in pandas.
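If editing the file is not an option, one way to rename in pandas is to supply your own header with names= and pass header=0 so that read_csv discards the file's header row. This is a sketch assuming the data.csv layout from the question, simulated here with io.StringIO (CSV_TEXT is hypothetical); Names2 is simply the label this answer uses for the second name column.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for data.csv from the question
CSV_TEXT = "Names, Age, Names\nJohn, 5, Jane\nRian, 29, Rath\n"

# header=0 discards the file's own header row; names= supplies unique labels instead.
# skipinitialspace=True drops the space after each comma in the data rows.
df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    header=0,
    names=['Names', 'Age', 'Names2'],
    skipinitialspace=True,
)
print(df)
```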

Using 'Names2' as the header of the column that holds the second occurrence of the same column name, try this:

Starting from

import pandas as pd

datalist = [['John', 5, 'Jane'], ['Rian', 29, 'Rath']]
df = pd.DataFrame(datalist, columns=['Names', 'Age', 'Names2'])

We have

  Names  Age Names2
0  John    5   Jane
1  Rian   29   Rath

So, use:

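# Series.append was removed in pandas 2.0, so stack the two name columns with pd.concat
names = pd.concat([df['Names'], df['Names2']], ignore_index=True)
# Place Age alongside; the last two rows have no age, so fill those NaNs with ''
dff = (pd.concat([names, df['Age']], axis=1, ignore_index=True)
         .fillna('')
         .rename(columns={0: 'Names', 1: 'Age'}))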

to get your desired result.

From the inside out:
Series.append (or pd.concat in pandas 2.0 and later, where append was removed) stacks the two name columns into a single column.
pd.concat(..., axis=1) then places that combined column side by side with the Age column of the dataframe.
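To make the inside-out description concrete, here is a sketch that prints each intermediate result. It rebuilds the example dataframe and uses pd.concat for the stacking step, on the assumption that you are on pandas 2.0 or later, where Series.append no longer exists.

```python
import pandas as pd

df = pd.DataFrame([['John', 5, 'Jane'], ['Rian', 29, 'Rath']],
                  columns=['Names', 'Age', 'Names2'])

# Step 1: stack the two name columns into one Series with a fresh 0..3 index
names = pd.concat([df['Names'], df['Names2']], ignore_index=True)
print(names)

# Step 2: put Age next to it (columns become 0 and 1); rows 2 and 3 have no age, so they are NaN
combined = pd.concat([names, df['Age']], axis=1, ignore_index=True)
print(combined)

# Step 3: blank out the NaNs and restore readable column labels
dff = combined.fillna('').rename(columns={0: 'Names', 1: 'Age'})
print(dff)
```

Because the Age column contains NaN after step 2, pandas stores the ages as floats, so they print as 5.0 and 29.0 in the final result.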

To see what the remaining calls do, such as fillna and rename, I suggest removing them one at a time and inspecting the result.

Please forgive the formatting of dff; I've spread it over several lines to make each step clear from an educational perspective. If you reflow it, keep the chained calls inside parentheses (or on a single line) so that Python can parse it.

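You can use:
usecols, which reads only the selected columns.
low_memory, which controls whether pandas internally processes the file in chunks (the default, True) or parses it in one pass (False). Setting it to False avoids mixed-type inference across chunks, at the cost of higher memory use while parsing.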

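import pandas as pd

# usecols=[0, 1] selects the first two columns by position; skipinitialspace=True strips the space after each comma
data = pd.read_csv("data.csv", usecols=[0, 1], skipinitialspace=True, low_memory=False)
print(data)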

Please use unique column names in your CSV.
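If you would rather leave the file as it is, usecols by position can also produce the stacked layout the question asks for. A sketch, assuming the data.csv layout from the question (simulated with io.StringIO; CSV_TEXT and read_columns are hypothetical names): read the first two columns and the third column separately, label the third one Names, and stack the two frames.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for data.csv from the question
CSV_TEXT = "Names, Age, Names\nJohn, 5, Jane\nRian, 29, Rath\n"

def read_columns(cols):
    # Read only the given column positions, stripping the space after each comma
    return pd.read_csv(io.StringIO(CSV_TEXT), usecols=cols, skipinitialspace=True)

left = read_columns([0, 1])
left.columns = ['Names', 'Age']   # set labels explicitly rather than relying on the header
right = read_columns([2])
right.columns = ['Names']         # the second name column gets the shared label

# Stack vertically; the rows from the second name column have no Age, so they get NaN
result = pd.concat([left, right], ignore_index=True)
print(result)
```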

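Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.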
