I have data in a file, and I don't know whether it is delimited by spaces or tabs.
Data In:
id Name year Age Score
123456 ALEX BROWNNIS VND 0 19 115
123457 MARIA BROWNNIS VND 0 57 170
123458 jORDAN BROWNNIS VND 0 27 191
I read the data with read_csv, using a tab delimiter:
df = pd.read_csv('data.txt', sep='\t')
out:
id Name year Age Score
0 123456 ALEX BROWNNIS VND ... 0 19 115
1 123457 MARIA BROWNNIS VND ... 0 57 170
2 123458 jORDAN BROWNNIS VND ... 0 27 191
There is a lot of white space between the columns. Am I using the delimiter correctly? When I try to access a column by name, I get a KeyError, so I think the fault is in my use of \t.
What are the possible ways to fix this problem?
Since the Name column contains a variable number of words, you need to read the file as plain text, take the first word as the id and the last three words as year, Age and Score, and join everything in between into the name.
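import pandas as pd

id = []
Name = []
year = []
Age = []
Score = []
with open('data.txt') as f:
    text = f.read()
lines = text.split('\n')
# lines[0] is the header row, so start from the second line
for line in lines[1:]:
    if len(line) < 3: continue  # skip blank lines
    words = line.split()  # splits on any run of spaces or tabs
    id.append(words[0])
    # everything between the id and the last three fields is the name
    Name.append(' '.join(words[1:-3]))
    year.append(words[-3])
    Age.append(words[-2])
    Score.append(words[-1])
df = pd.DataFrame.from_dict({'id': id, 'Name': Name,
                             'year': year, 'Age': Age, 'Score': Score})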
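The same idea can also be sketched with split and rsplit, so that the name is never broken apart in the first place. This is only an illustration under assumptions: the helper name parse_line and the in-memory SAMPLE string are made up here so the snippet runs without data.txt, and it assumes every data line has one id followed by a name and exactly three trailing fields.

```python
# Sample rows copied from the question; SAMPLE stands in for data.txt.
SAMPLE = """id Name year Age Score
123456 ALEX BROWNNIS VND 0 19 115
123457 MARIA BROWNNIS VND 0 57 170
123458 jORDAN BROWNNIS VND 0 27 191
"""

def parse_line(line):
    """Split one data line into id, Name, year, Age and Score (hypothetical helper)."""
    # The first token is the id; the remainder is kept in one piece.
    id_, rest = line.split(maxsplit=1)
    # Split the remainder from the right: the last three tokens are
    # year, Age and Score, and whatever is left is the name.
    name, year, age, score = rest.rsplit(maxsplit=3)
    return {'id': id_, 'Name': name, 'year': year, 'Age': age, 'Score': score}

# Skip the header line and any blank lines.
rows = [parse_line(line) for line in SAMPLE.splitlines()[1:] if line.strip()]
```

The resulting list of dicts can be passed to pd.DataFrame(rows) if a DataFrame is needed. All values are still strings at this point, so the numeric columns would need converting afterwards.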
Edit: now that you've posted the full data, I'll change my answer to fit it.
You can use the skipinitialspace parameter, as in the following example.
df2 = pd.read_csv('data.txt', sep='\t', encoding="utf-8", skipinitialspace=True)
Pandas documentation: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
Problem Solved:
df = pd.read_csv('data.txt', sep='\t',engine="python")
I then added this line to strip the white space around the column names, and it works:
df.columns = df.columns.str.strip()
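For anyone wondering why the KeyError happened: a minimal sketch, assuming the header cells in the file carry padding spaces around the names. The sample string below is made up for illustration and is not the real file.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample whose header cells are padded with spaces.
raw = "id \t Name \t Score \n123456\tALEX\t115\n"
df = pd.read_csv(io.StringIO(raw), sep='\t')

# The column labels keep their padding, so the plain name is not found.
padded = list(df.columns)
try:
    df['Name']
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Stripping the labels makes the plain names usable as keys.
df.columns = df.columns.str.strip()
cleaned = list(df.columns)
```

Printing padded before the strip is a quick way to see the stray spaces in the column names.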