I'm a huge newbie to Python and this is probably really easy, but I can't get my head around it at all.
I have a text file with a number of rows following this format:
nothing doing  nothing[0]  doing[0]
hello world  hello[0]  world[2]
There are only spaces between the strings, no markers.
I'd like to extract these strings into an Excel file in the following format, so that each 'set' of strings is in a separate column.
  | 1             | 2          | 3
------------------------------------------------------
1 | nothing doing | nothing[0] | doing[0]
------------------------------------------------------
2 | hello world   | hello[0]   | world[2]
I've been looking at answers on here but they don't quite fulfill this question.
Alright, here's how you'd write to an actual Excel file. Note that my method of splitting isn't as complicated as the others because this is mostly about writing to Excel. You'll need the xlwt package (part of the python-excel project) to do this.
>>> data = []
>>> with open("data.txt") as f:
...     for line in f:
...         # split on two spaces; drop the empty strings left by longer runs
...         data.append([word for word in line.split("  ") if word])
...
>>> print(data)
[['nothing doing', 'nothing[0]', 'doing[0]\n'], ['hello world', 'hello[0]', 'world[2]']]
>>>
>>> import xlwt
>>> wb = xlwt.Workbook()
>>> sheet = wb.add_sheet("New Sheet")
>>> for row_index in range(len(data)):
...     for col_index in range(len(data[row_index])):
...         sheet.write(row_index, col_index, data[row_index][col_index])
...
>>> wb.save("newSheet.xls")
>>>
This produces a workbook with one sheet called "New Sheet", in which each row of the text file becomes a spreadsheet row and each set of strings gets its own column.
Hopefully this helps.
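One thing to watch in the transcript above: the last field of the first row still carries its line ending ('doing[0]\n'), and that newline would end up in the spreadsheet cell. Here is a minimal sketch of a stricter splitter, assuming columns are separated by runs of two or more spaces. The name split_columns is a hypothetical helper, not part of xlwt, and the sample lines are typed in by hand rather than read from data.txt.

```python
import re

# Assumption: columns are separated by two or more consecutive spaces.
COLUMN_GAP = re.compile(r" {2,}")

def split_columns(line):
    """Strip the line ending, then split on runs of 2+ spaces."""
    return COLUMN_GAP.split(line.strip())

# Hand-written sample lines standing in for the contents of the text file.
sample_lines = [
    "nothing doing  nothing[0]  doing[0]\n",
    "hello world    hello[0]    world[2]\n",
]
rows = [split_columns(line) for line in sample_lines]
print(rows)
```

The resulting rows list could be passed to the sheet.write loop in place of data.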
You could use numpy to read the txt file and csv to write it as a csv file. The csv package, among others, allows you to choose the delimiter of your preference.
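import numpy
import csv

# Note: by default loadtxt splits on any run of whitespace, so a
# multi-word field such as "nothing doing" becomes two columns.
data = numpy.loadtxt('txtfile.txt', dtype=str)

# Python 3: open csv files with newline='' so the writer controls line endings
with open('csvfile.csv', 'w', newline='') as fobj:
    csvwriter = csv.writer(fobj, delimiter=',')
    for row in data:
        csvwriter.writerow(row)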
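To illustrate the delimiter choice mentioned above, here is a small sketch using only the csv module. The sample rows and the tab delimiter are assumptions chosen for illustration, and io.StringIO stands in for a real file so the result can be inspected directly.

```python
import csv
import io

# Sample rows, already split into columns (illustrative data).
rows = [
    ["nothing doing", "nothing[0]", "doing[0]"],
    ["hello world", "hello[0]", "world[2]"],
]

buffer = io.StringIO()  # in-memory stand-in for an open file
# delimiter picks the field separator; lineterminator is set to "\n"
# here only to keep the in-memory result easy to read.
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)

tab_separated = buffer.getvalue()
print(tab_separated)
```

Swapping "\t" for ";" or "|" works the same way, as long as the program that reads the file is told which delimiter to expect.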
The following assumes that each "column" is separated by two or more space characters in a row and that they will never contain a comma in their content.
import csv
import re

splitting_pattern = re.compile(r" {2,}")  # two or more spaces in a row
input_filepath = 'text_file_strings.txt'
output_filepath = 'output.csv'

# Python 3: open the output in text mode with newline='' (Python 2 used 'wb')
with open(input_filepath, 'rt') as inf, open(output_filepath, 'w', newline='') as outf:
    writer = csv.writer(outf, dialect='excel')
    writer.writerow([''] + list(range(1, 4)))  # header row
    for i, line in enumerate(inf, 1):
        line = splitting_pattern.sub(',', line.strip())
        writer.writerow([i] + line.split(','))
Contents of the output.csv file created:
,1,2,3
1,nothing doing,nothing[0],doing[0]
2,hello world,hello[0],world[2]
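If a field might contain a comma after all, one option is to split on the pattern directly instead of substituting commas first, and let csv.writer quote the field. This is a sketch under that assumption, not a change to the approach above. The sample line containing "hello, world" is invented for illustration, and io.StringIO stands in for the output file.

```python
import csv
import io
import re

splitting_pattern = re.compile(r" {2,}")  # two or more spaces in a row

# Invented sample line whose first field contains a comma.
line = "hello, world  hello[0]  world[2]\n"

buffer = io.StringIO()  # in-memory stand-in for the output file
writer = csv.writer(buffer, dialect="excel")

# Split on the pattern itself, so commas inside a field are left alone.
fields = splitting_pattern.split(line.strip())
writer.writerow(fields)

csv_text = buffer.getvalue()
print(repr(csv_text))
```

The writer wraps the comma-containing field in double quotes, which is how Excel and csv.reader know it is a single column.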
Sometimes people who use mostly Excel get confused about the difference between how Excel displays its sheets and the csv representation in a file. Here, even though @martineau gave you exactly what you showed you wanted, I think what you're actually going to want is something more like:
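import re, csv

# Python 3: open the output in text mode with newline='' (Python 2 used "wb")
with open("infile.txt") as fp_in, open("outfile.csv", "w", newline="") as fp_out:
    writer = csv.writer(fp_out)
    for line in fp_in:
        # split on runs of two or more whitespace characters
        row = re.split(r"\s\s+", line.strip())
        writer.writerow(row)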
which will turn
$ cat infile.txt
nothing doing  nothing[0]  doing[0]
hello world  hello[0]  world[2]
into
$ cat outfile.csv
nothing doing,nothing[0],doing[0]
hello world,hello[0],world[2]
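The \s\s+ pattern used here differs slightly from a pattern that matches only spaces, such as " {2,}": \s also matches tabs and other whitespace. Here is a small sketch of the difference, using an invented sample line that mixes tabs and spaces. The variable names are illustrative only.

```python
import re

spaces_only = re.compile(r" {2,}")      # runs of two or more space characters
any_whitespace = re.compile(r"\s\s+")   # runs of two or more whitespace characters

# Invented sample: two tabs after the first field, two spaces after the second.
line = "hello world\t\thello[0]  world[2]"

by_spaces = spaces_only.split(line)
by_whitespace = any_whitespace.split(line)

print(by_spaces)
print(by_whitespace)
```

Which pattern fits depends on the input file: if the columns were padded with tabs by some export tool, the whitespace version is the more forgiving choice.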