I am new here and a Python beginner. I received a text file with 100k lines of 120 characters each. Every line holds data for 14 columns, but because some values are shorter than others, they are padded with blanks. That means there is no separator like ",". If I chose the blank as the separator, the values would not end up in the correct columns.
Lines are like
O2020august Opel .
L2015may BMW .
L2016april Mercedes.
O2021january Opel .
L2023februaryAudi .
I am stuck with
import pandas as pd
df = pd.read_csv('text.txt', index_col=0, header=None)
print(df)
I would be happy with any suggested approach. It doesn't need to be pandas.
Cheers Jeanny
Or you can use a simple helper function that does the job for you.
def split_by_pos(string_to_split, *args):
    """
    Splits a string at the given positions
    :param string_to_split: the string to be split
    :param args: the positions where the function will split the string.
    :return: the split string as a tuple
    """
    return_value = list()
    args = sorted(args)
    previous = 0
    for position in args:
        return_value.append(string_to_split[previous:position])
        previous = position
    return_value.append(string_to_split[previous:])
    return tuple(return_value)

with open("a_random_file.txt", "r", encoding="utf-8") as fp:
    lines = fp.readlines()
for line in lines:
    # Drop the trailing newline so it does not end up in the last field
    print(split_by_pos(line.rstrip("\n"), 1, 5, 12))
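If pandas is preferred, as in the question, the same idea of fixed split positions can probably be handed to `pandas.read_fwf`, which reads fixed-width files directly. The sketch below is only an illustration: the sample lines, the column names and the boundaries in `colspecs` are assumptions, not the real layout of the 14-column file.

```python
import io

import pandas as pd

# Hypothetical sample: 1-char code, 4-char year, 8-char month, 8-char make.
sample = (
    "O2020august  Opel    \n"
    "L2015may     BMW     \n"
    "L2023februaryAudi    \n"
)

# Assumed (start, end) boundaries per column; end is exclusive.
colspecs = [(0, 1), (1, 5), (5, 13), (13, 21)]
names = ["code", "year", "month", "make"]  # made-up column names

# io.StringIO stands in for the real file; a path such as 'text.txt' works too.
df = pd.read_fwf(io.StringIO(sample), colspecs=colspecs, header=None, names=names)
print(df)
```

By default `read_fwf` strips the padding blanks and infers types, so `year` comes back as an integer here. Passing `dtype=str` keeps every field as text if that is safer for the real data.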
I believe something like this can solve your problem.
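# 'text.txt' is the file name used in the question
with open('text.txt') as txt:
    for line in txt:
        # line should look something like this => "O2020august Opel"
        print(line)
        s1 = line[:1]
        s2 = line[1:5]
        s3 = line[5:13]
        # ... continue in the same way for the remaining columns
        print(s1, s2, s3)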
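With 14 columns, writing one slice per variable gets long. One way to keep it manageable is to put the boundaries in a table and slice in a loop. This is a minimal sketch: the names and positions in `COLUMNS` are assumed for illustration and need to be replaced with the real layout.

```python
# Hypothetical column layout: name -> (start, end), end exclusive.
COLUMNS = {
    "code": (0, 1),
    "year": (1, 5),
    "month": (5, 13),
}


def parse_line(line, columns=COLUMNS):
    """Slice one fixed-width line into a dict of stripped field values."""
    return {
        name: line[start:end].strip()
        for name, (start, end) in columns.items()
    }


record = parse_line("O2020august  Opel    \n")
print(record)
```

The `.strip()` call removes the padding blanks, so each value arrives without trailing spaces.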
You can use the readline and readlines methods of Python's file API.
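To show the difference between the two methods, here is a small sketch that writes a throwaway file first so it can run on its own. The file contents are made up. Iterating over the file object is included as a third option, since it avoids loading all 100k lines at once.

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("O2020august  Opel\nL2015may     BMW\n")
    path = tmp.name

with open(path, "r", encoding="utf-8") as fp:
    first = fp.readline()   # one line, including its trailing "\n"
    rest = fp.readlines()   # list of all remaining lines

with open(path, "r", encoding="utf-8") as fp:
    # Iterating the file object yields one line at a time.
    streamed = [line.rstrip("\n") for line in fp]

os.remove(path)
print(repr(first), rest, streamed)
```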