How to concatenate columns from a CSV file using Pandas in Python

Question

I have a CSV file that looks something like this:

# data.csv (this line is not there in the file)
Names, Age, Names
John, 5, Jane
Rian, 29, Rath

When I read it with Pandas in Python, I get something like this:

import pandas as pd

data = pd.read_csv("data.csv")
print(data)

And the output of the program is:

  Names   Age  Names
0  John     5   Jane
1  Rian    29   Rath
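A quick way to check what pandas actually did with this header (a sketch; the CSV text is inlined with io.StringIO so it runs without data.csv, and `csv_text` is just an illustrative name): the space after each comma stays in the column names, so the headers come back as 'Names', ' Age' and ' Names'. Because ' Names' is not the same string as 'Names', pandas does not treat them as duplicates, which is why both print as "Names".

```python
import io

import pandas as pd

# Stand-in for data.csv, copied from the question (note the spaces after commas)
csv_text = "Names, Age, Names\nJohn, 5, Jane\nRian, 29, Rath\n"

data = pd.read_csv(io.StringIO(csv_text))
# repr() makes the leading spaces in the header names visible
print([repr(c) for c in data.columns])
```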

Is there any way to get:

  Names   Age  
0  John     5   
1  Rian    29   
2  Jane
3  Rath

Answer 1

First, I'd suggest giving each column a unique name. Either edit the column header in the CSV file or rename it in pandas.
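A minimal sketch of the "rename it in pandas" option, assuming the file layout from the question (inlined via io.StringIO so the snippet runs on its own). With skipinitialspace=True the blank after each comma is dropped, pandas then renames the duplicate header to 'Names.1', and rename turns that into 'Names2'.

```python
import io

import pandas as pd

# Stand-in for data.csv, copied from the question
csv_text = "Names, Age, Names\nJohn, 5, Jane\nRian, 29, Rath\n"

# skipinitialspace=True strips the space after each comma;
# pandas de-duplicates the second "Names" header as "Names.1"
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
df = df.rename(columns={"Names.1": "Names2"})
print(df)
```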

Using 'Names2' as the header of the second "Names" column, try this:

Starting from

import pandas as pd

datalist = [['John', 5, 'Jane'], ['Rian', 29, 'Rath']]
df = pd.DataFrame(datalist, columns=['Names', 'Age', 'Names2'])

We have

  Names  Age Names2
0  John    5   Jane
1  Rian   29   Rath

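So, use:

dff = (pd.concat([pd.concat([df['Names'], df['Names2']], ignore_index=True),
                  df['Age'].astype(object)],
                 axis=1, ignore_index=True)
         .fillna('')
         .rename(columns={0: 'Names', 1: 'Age'}))

to get your desired result.

From the inside out:
The inner pd.concat( ... ) stacks the Names2 column under the Names column, giving one Series with a fresh 0..3 index. (Series.append used to do this, but it was removed in pandas 2.0.)
The outer pd.concat( ..., axis=1) places that Series next to the Age column. Rows 2 and 3 have no age, so they get NaN, which fillna('') turns into empty strings; astype(object) keeps the ages as 5 and 29 instead of 5.0 and 29.0.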

To see what the remaining calls do, try removing them one by one and inspecting the result.

Please forgive the formatting of dff; I've spread it over several lines to make each step clear from an educational perspective. If you reflow it, keep the chained calls inside parentheses so Python accepts the line breaks.
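Taking the expression apart as suggested, this sketch builds dff one step at a time and prints each intermediate result. It re-creates df from the "Starting from" snippet so it runs on its own; the intermediate names `names` and `step2` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([['John', 5, 'Jane'], ['Rian', 29, 'Rath']],
                  columns=['Names', 'Age', 'Names2'])

# Step 1: stack Names2 under Names; ignore_index gives labels 0..3
names = pd.concat([df['Names'], df['Names2']], ignore_index=True)
print(names)

# Step 2: put the stacked names beside Age; rows 2 and 3 have no age -> NaN
# (astype(object) keeps 5 and 29 as integers instead of upcasting to float)
step2 = pd.concat([names, df['Age'].astype(object)], axis=1, ignore_index=True)
print(step2)

# Step 3: blank out the missing ages and restore readable column names
dff = step2.fillna('').rename(columns={0: 'Names', 1: 'Age'})
print(dff)
```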

Answer 2

You can use:
usecols, which reads only the selected columns.
low_memory=False, which turns off pandas' default chunked parsing (with low_memory=True, the default, the file is processed internally in chunks to save memory, which can lead to mixed type inference within a column).

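import pandas as pd

# usecols keeps only the listed columns; skipinitialspace=True strips the
# space after each comma so the header " Age" is read as "Age"
data = pd.read_csv("data.csv", usecols=['Names', 'Age'],
                   skipinitialspace=True, low_memory=False)
print(data)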

Please give each column in your CSV a unique name.
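usecols also accepts column positions, which sidesteps the duplicate header entirely. A small sketch, again inlining the question's data with io.StringIO (the variable `csv_text` is just for illustration):

```python
import io

import pandas as pd

# Stand-in for data.csv, copied from the question
csv_text = "Names, Age, Names\nJohn, 5, Jane\nRian, 29, Rath\n"

# usecols=[0, 1] selects by position, so the header text does not matter
data = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1], skipinitialspace=True)
print(data)
```

Note that, like the name-based version above, this only drops the second "Names" column; it does not stack Jane and Rath under the first one.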


2 answers

Answer 1: score 1, accepted, 2020-09-19 02:55:57

Answer 2: score -1, 2020-09-19 02:17:45
