
Reading file with delimiter using pandas

I have data in a file, and I don't know whether it is delimited by spaces or tabs.

Data In:

id              Name                                                                                year    Age Score 

123456          ALEX BROWNNIS VND                                                                        0      19     115
123457          MARIA BROWNNIS VND                                                                       0      57     170
123458          jORDAN BROWNNIS VND                                                                      0      27     191

I read the data with read_csv, using tab as the delimiter:

df = pd.read_csv('data.txt', sep='\t')

out:

     id           Name                                                                                year  Age  Score 
0          123456  ALEX BROWNNIS VND                             ...                                     0   19     115
1          123457  MARIA BROWNNIS VND                            ...                                     0   57     170
2          123458  jORDAN BROWNNIS VND                           ...                                     0   27     191

There is a lot of whitespace between the columns. Am I using the delimiter correctly? When I try to access a column by name, I get a KeyError, so I suspect the fault is my use of \t.
What are the possible ways to fix this problem?
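One way to find out which delimiter a file uses is to look at the raw characters of a line. The sketch below is only an illustration: count_delimiters is a hypothetical helper, and the two sample lines are made up to mimic the data above rather than taken from the real file.

```python
def count_delimiters(line):
    """Return how many tab and space characters a raw line contains."""
    return {"tabs": line.count("\t"), "spaces": line.count(" ")}

# Made-up sample lines (assumption); for the real file, use something like:
#     with open('data.txt') as f:
#         first_line = f.readline()
tab_line = "123456\tALEX BROWNNIS VND   \t0\t19\t115"
space_line = "123456   ALEX BROWNNIS VND   0   19   115"

# repr() shows tabs as \t, so they can be told apart from runs of spaces.
print(repr(tab_line))
print(count_delimiters(tab_line))
print(count_delimiters(space_line))
```

If the count of tabs is zero, the columns are padded with spaces only and sep='\t' will not split them.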

Since the Name column contains a variable number of words, you need to read the file as regular text, take the first word as the id and the last three words as year, Age and Score, and join everything in between into the Name.

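import pandas as pd

# One list per column (named ids, not id, to avoid shadowing the built-in)
ids, names, years, ages, scores = [], [], [], [], []
with open('data.txt') as f:
    next(f)  # skip the header line, assuming the file starts with one
    for line in f:
        words = line.split()
        if len(words) < 5:  # skip blank or incomplete lines
            continue
        ids.append(words[0])
        names.append(' '.join(words[1:-3]))  # everything between id and the last three fields
        years.append(words[-3])
        ages.append(words[-2])
        scores.append(words[-1])
df = pd.DataFrame({'id': ids, 'Name': names,
                   'year': years, 'Age': ages, 'Score': scores})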

Edit: you've posted the full data, so I've changed my answer to fit it.
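The same idea (first word is the id, last three words are year, Age and Score, the rest is the Name) can also be written with str.split and str.rsplit. This is a minimal sketch under assumptions: parse_line is a hypothetical helper, the sample text is an in-memory stand-in for data.txt, and the three trailing fields are assumed to always be integers.

```python
import pandas as pd

def parse_line(line):
    """Split one whitespace-padded line into id, Name, year, Age, Score."""
    first, rest = line.split(None, 1)               # id is the first token
    name, year, age, score = rest.rsplit(None, 3)   # last three tokens are the numeric fields
    return first, name.strip(), int(year), int(age), int(score)

# In-memory stand-in for data.txt (assumption), header line included.
sample = """id              Name                          year    Age Score 

123456          ALEX BROWNNIS VND                  0      19     115
123457          MARIA BROWNNIS VND                 0      57     170
123458          jORDAN BROWNNIS VND                0      27     191
"""

lines = sample.splitlines()
# Skip the header (first line) and any blank lines.
rows = [parse_line(line) for line in lines[1:] if line.strip()]
df = pd.DataFrame(rows, columns=["id", "Name", "year", "Age", "Score"])
print(df)
```

Because the split works on any run of whitespace, it behaves the same whether the file is padded with spaces, tabs, or a mix of both.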

You can use the skipinitialspace parameter, as in the following example. Note that delimiter is an alias for sep, so pass only one of them; specifying both raises a ValueError.

df2 = pd.read_csv('data.txt', sep='\t', encoding='utf-8', skipinitialspace=True)

Pandas documentation: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
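To illustrate what skipinitialspace does, here is a small sketch on an in-memory, tab-separated sample. The sample text is made up for the demonstration and is not the asker's file.

```python
import io
import pandas as pd

# Made-up sample (assumption): each field after a tab starts with two spaces.
raw = "id\t  Name\t  Age\n1\t  ALEX BROWN\t  19\n"

plain = pd.read_csv(io.StringIO(raw), sep="\t")
skipped = pd.read_csv(io.StringIO(raw), sep="\t", skipinitialspace=True)

# Without the option, the spaces stay in the column names and string values.
print(list(plain.columns))
# With the option, spaces that follow the delimiter are dropped.
print(list(skipped.columns))
print(repr(skipped.loc[0, "Name"]))
```

The option only removes spaces that come directly after a delimiter. Padding that comes before the next delimiter (trailing spaces) is left in place, so it may not be enough on its own for data padded on the right.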

Problem Solved:

df = pd.read_csv('data.txt', sep='\t',engine="python")

I then added this line to strip the whitespace around the column names, which is what caused the KeyError, and it works:

df.columns = df.columns.str.strip()
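A short sketch of why the KeyError occurs and how stripping fixes it, using a made-up in-memory sample (an assumption, not the real file) in which the header and the Name values carry trailing spaces:

```python
import io
import pandas as pd

# Made-up tab-separated sample (assumption) with padded header and values.
raw = "id\tName   \tScore \n123456\tALEX BROWNNIS VND   \t115\n"
df = pd.read_csv(io.StringIO(raw), sep="\t")

# The column is really called 'Score ' (with a trailing space),
# so looking up 'Score' fails.
had_key_error = False
try:
    df["Score"]
except KeyError:
    had_key_error = True

# Strip whitespace from the column names, then from the text values.
df.columns = df.columns.str.strip()
df["Name"] = df["Name"].str.strip()

print(list(df.columns))
print(repr(df.loc[0, "Name"]))
```

Stripping the column names does not touch the cell values, so text columns such as Name may need their own .str.strip() call.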


 