I have a raw file that looks like this:
RollNo Address1 City State ZipCode Age Branch Subject Marks1 Marks2
10000 6505 N MGM W ROAD MMUMBAI CITY IN 46360 77 0 0 -1 1
10002 1721 HAZAREER DR. DR. UNIT 8 BELAGHIA FL 33756 86 0 0 -1 2
How can I convert this to a comma-separated file in Python, like this:
RollNo,Address1,City,State,ZipCode,Age,Branch,Subject,Marks1,Marks2
10000,6505 N MGM W ROAD,MMUMBAI CITY,IN,46360,77,0,0,-1,1
10002,1721 HAZAREER DR. DR. UNIT 8,BELAGHIA,FL,33756,86,0,0,-1,2
I tried to convert it to a list, so that I could later turn it into a comma-separated string, using \t as the delimiter, but it doesn't give me the desired output.
My code was:
files_list=[[i for i in line.strip().split(' ')] for line in open('C:/Users/Vinny/Desktop/Python/file2cnvrt.txt').readlines()]
The output I got:
[['RollNo', 'Address1', 'City', 'State', 'ZipCode', 'Age', 'Branch', 'Subject', 'Marks1', 'Marks2'],
['10000 6505 N MGM W ROAD MMUMBAI CITY IN 46360 77 0 0 -1 1'],
['10002 1721 HAZAREER DR. DR. UNIT 8 BELAGHIA FL 33756 86 0 0 -1 2']]
Can anyone suggest a solution?
Try this:
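def read_file(filename):
    # Character offsets where each column ends. The trailing 0 makes
    # indices[i - 1] resolve to 0 (start of the line) on the first pass.
    indices = [13, 113, 145, 153, 184, 196, 211, 225, 237, 0]
    columns = []
    data = []
    with open(filename) as f:
        lines = f.readlines()
    # Header names contain no spaces, so a plain whitespace split is enough.
    columns = lines[0].split()
    for line in lines[1:]:
        row = []
        line = line.strip()
        for i in range(len(indices) - 1):
            row.append(line[indices[i - 1]:indices[i]].rstrip())
        # The last column runs from the final offset to the end of the line.
        row.append(line[indices[-2]:].strip())
        data.append(row)
    return [columns] + data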
The indices were gathered from the data you gave us. I assumed that everything was perfectly aligned.
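The function above returns a list of rows but does not write the comma-separated file itself. A minimal sketch of that last step using the standard csv module follows; the rows literal is a stand-in copied from the question, and the in-memory buffer and the 'out.csv' name in the comment are assumptions made for illustration.

```python
import csv
import io

# Stand-in for the list returned by read_file(); values copied from the question.
rows = [
    ['RollNo', 'Address1', 'City', 'State', 'ZipCode', 'Age', 'Branch', 'Subject', 'Marks1', 'Marks2'],
    ['10000', '6505 N MGM W ROAD', 'MMUMBAI CITY', 'IN', '46360', '77', '0', '0', '-1', '1'],
]

# Write to an in-memory buffer here; for a real file, use
# open('out.csv', 'w', newline='') instead (hypothetical filename).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

One reason to prefer csv.writer over joining with ',' by hand is that it quotes any field that happens to contain a comma.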
This may not be the most optimised way, but it produces a comma-separated file of the values. FILE_IN and FILE_OUT are the filenames of the input and output files respectively.
# Read file lines to list as values
file_in = open(FILE_IN, 'r')
lines_of_values = []
for line in file_in:
    # Split line on four spaces (assumed to be the separator, as in the sample)
    # and remove surrounding whitespace from each field
    line_values = [value.strip() for value in line.strip().split('    ')]
    # Remove empty fields
    values = list(filter(None, line_values))
    lines_of_values.append(values)
file_in.close()
# Open file to save comma separated values
file_out = open(FILE_OUT, 'w')
for values in lines_of_values:
    print("{:s}".format(",".join(values)), file=file_out)
file_out.close()
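If the separator is not always the same width, a regular expression can do the splitting. This is only a sketch, assuming fields are separated by a tab or by at least two whitespace characters, and that no field contains two consecutive spaces; split_fields and the sample line are hypothetical names and data made up for illustration.

```python
import re

def split_fields(line):
    # Split on any run of two or more spaces/tabs, or on a single tab.
    parts = re.split(r'[ \t]{2,}|\t', line.strip())
    # Drop empty strings that a leading or trailing separator could leave behind.
    return [part for part in parts if part]

# Sample line mixing four-space and tab separators (made up for this example).
sample = '10002    1721 HAZAREER DR. DR. UNIT 8\tBELAGHIA    FL    33756'
fields = split_fields(sample)
print(','.join(fields))
```

Single spaces inside a field, such as those in the address, are left alone because the pattern only matches a tab or a run of two or more whitespace characters.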
Several things. First of all, don't use open() directly in your list comprehension.
If you want to use open(), always use a context manager, which guarantees that the file will be closed when you are done with it:
with open('filename.txt') as f:
    lines = f.readlines()
Second: you'll find your life a lot easier if you skip open() altogether and start using the amazing pathlib module.
from pathlib import Path
f_path = Path('C:/Users/Vinny/Desktop/Python/file2cnvrt.txt')
# get text as one big string:
file_str = f_path.read_text()
# get text as a list of lines (split() already returns a list, which you can edit in place):
lines = f_path.read_text().split('\n')
Third: instead of copying and pasting the entire path to your desktop, you can automatically find its location using the Windows USERPROFILE environment variable:
from pathlib import Path
import os
# os.environ is a dictionary-like mapping of all the environment variables
# (such as USERPROFILE and APPDATA on Windows)
user_folder_str = os.environ['USERPROFILE']
desktop_path = Path(user_folder_str)/'Desktop'
file_path = desktop_path/'my_file.txt'
lines = file_path.read_text().split('\n')
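As a possible alternative to reading the environment variable, pathlib can locate the home directory itself. A small sketch, assuming the desktop lives in the default 'Desktop' folder under the home directory (it may be redirected on some machines); 'my_file.txt' is the same placeholder name as above.

```python
from pathlib import Path

# Path.home() returns the current user's home directory on any platform.
desktop_path = Path.home() / 'Desktop'
# 'my_file.txt' is a placeholder name, not a real file.
file_path = desktop_path / 'my_file.txt'
print(file_path)
```

This avoids depending on a Windows-only variable name, so the same lines also work on macOS and Linux.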
Fourth: it appears that the sample raw file you pasted does not have any tab characters ('\t') in it. It has 4 spaces ('    ') instead. If this is actually the case, this should work:
[[i for i in line.strip().split('    ') if i] for line in lines]
Note the if i part. That makes sure any consecutive sets of 4 spaces don't put empty strings ('') in your list.
However, your pasted code, which is equivalent to the above, is producing the wrong result. I think it may be because your second and third lines actually do have tab characters ('\t') in them rather than 4 spaces. So you'll need to split() using both 4 spaces and a tab character.
The easiest way to do this is to replace the tabs with 4 spaces. Use the same if i again to avoid empty strings.
[[i for i in line.strip().replace('\t', '    ').split('    ') if i] for line in lines]
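To see the tab replacement at work without touching a file, here is a small sketch on in-memory lines. The three sample lines are made up for illustration (a shortened version of the data in the question) and deliberately mix tabs with four-space separators.

```python
# Made-up sample: the header uses four spaces, the data lines use tabs or a mix.
lines = [
    'RollNo    Address1    City',
    '10000\t6505 N MGM W ROAD\tMMUMBAI CITY',
    '10002    1721 HAZAREER DR. DR. UNIT 8\tBELAGHIA',
]

# Normalise tabs to four spaces, split on four spaces, drop empty strings.
rows = [[i for i in line.strip().replace('\t', '    ').split('    ') if i]
        for line in lines]

# Join each row with commas to get the final text lines.
csv_lines = [','.join(row) for row in rows]
for csv_line in csv_lines:
    print(csv_line)
```

Note that this relies on every gap being a tab or a multiple of four spaces; a gap of, say, six spaces would leave stray spaces at the start of a field.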