I have a CSV file that looks something like this:
# data.csv (this line is not there in the file)
Names, Age, Names
John, 5, Jane
Rian, 29, Rath
And when I read it through Pandas in Python I get something like this:
import pandas as pd
data = pd.read_csv("data.csv")
print(data)
And the output of the program is:
Names Age Names
0 John 5 Jane
1 Rian 29 Rath
Is there any way to get:
Names Age
0 John 5
1 Rian 29
2 Jane
3 Rath
First, I'd suggest having unique names for each column. Either go into the csv file and change the name of a column header or do so in pandas.
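As a minimal sketch of the pandas-side rename: when read_csv meets a repeated header, it keeps the first name and by default adds a suffix to later ones (Names.1), so the second column can be renamed after loading. The inline CSV text and skipinitialspace=True (to drop the blanks after the commas) are assumptions made so the snippet is self-contained.

```python
import io

import pandas as pd

# Assumed stand-in for data.csv so the example runs without a file on disk.
csv_text = "Names, Age, Names\nJohn, 5, Jane\nRian, 29, Rath\n"

# skipinitialspace=True strips the space that follows each comma.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# pandas de-duplicates repeated headers on read: the second "Names" arrives as "Names.1".
df = df.rename(columns={"Names.1": "Names2"})
print(list(df.columns))
```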
Using 'Names2' as the header of the column with the second occurrence of the same column name, try this:
Starting from
datalist = [['John', 5, 'Jane'], ['Rian', 29, 'Rath']]
df = pd.DataFrame(datalist, columns=['Names', 'Age', 'Names2'])
We have
Names Age Names2
0 John 5 Jane
1 Rian 29 Rath
So, use:
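dff = (
    pd.concat(
        [pd.concat([df['Names'], df['Names2']], ignore_index=True),  # stack the two name columns
         df.iloc[:, 1]],                                             # the Age column
        ignore_index=True, axis=1)
    .fillna('')
    .rename(columns=dict(enumerate(['Names', 'Age'])))
)
to get the desired layout. (Age shows as 5.0 and 29.0, because the missing rows turn the column into floats before fillna runs.)
From the inside out:
The inner pd.concat([...], ignore_index=True) stacks the two name columns into one Series. (Series.append, often seen in older code for this step, was removed in pandas 2.0.)
The outer pd.concat( ... , axis=1) places that Series next to the rest of the dataframe.
To discover what the other commands do, I suggest removing them one-by-one and looking at the results.
Please forgive the formatting of dff. I'm trying to make everything clear from an educational perspective. The enclosing parentheses let the multi-line expression run as written.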
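The same idea can be sketched step by step, with intermediate variables instead of one chained expression. The use of the nullable Int64 dtype is my own assumption and not part of the answer above; it is there so the ages stay integers once rows without an age are introduced.

```python
import pandas as pd

df = pd.DataFrame(
    [["John", 5, "Jane"], ["Rian", 29, "Rath"]],
    columns=["Names", "Age", "Names2"],
)

# Stack the two name columns into one Series with a fresh 0..n-1 index.
names = pd.concat([df["Names"], df["Names2"]], ignore_index=True).rename("Names")

# Assumption: nullable Int64 lets the column hold missing values without becoming float.
age = df["Age"].astype("Int64")

# Align on the index: rows 2 and 3 have no matching Age, so they are left missing.
result = pd.concat([names, age], axis=1)
print(result)
```

If empty strings are preferred over missing markers, fillna('') can still be applied at the end, at the cost of turning Age into a mixed-type column.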
You can use:
usecols, which reads only the selected columns.
low_memory=False, which turns off pandas' internal chunked parsing so that each column's dtype is inferred from the whole file at once.
import pandas as pd
data = pd.read_csv("data.csv", usecols=['Names', 'Age'], low_memory=False)
print(data)
Please use unique column names in your CSV.
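A small sketch of what usecols does once the headers are unique. The inline CSV text, the Names2 header, and skipinitialspace=True are assumptions for illustration. Note that this only drops the extra column; it does not stack the second set of names under the first.

```python
import io

import pandas as pd

# Assumed CSV content with the duplicate header already renamed to Names2.
csv_text = "Names, Age, Names2\nJohn, 5, Jane\nRian, 29, Rath\n"

data = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Names", "Age"],   # read only these columns
    skipinitialspace=True,      # strip the space after each comma
    low_memory=False,           # infer dtypes from the whole file at once
)
print(data)
```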