
Sorting a numeric column in a text file with python

I am trying to sort a text file by its 4th column, which contains over 1000 numbers. I can isolate the number column fine, but I am unable to sort it in ascending order. Here is what I believed was correct, but I keep getting the following error:

'str' object has no attribute 'sort'

Any advice would be great!

file = open("MyFile.txt")

column = []  

for line in file:
    column = line[1:].split("\t")[3]

    print (column.sort())

If I'm right, you're trying to sort the rows, using the 4th column as the sort key, no?

sorted(open("MyFile.txt").readlines(), key=lambda line: int(line.split('\t')[3]))

Should give you the lines, sorted by the integer value of their 4th tab-split column.
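A minimal sketch of the same idea that runs without the asker's file. The sample rows and the helper name sort_lines_by_column are made up for illustration, and it assumes every line has at least four tab-separated fields with an integer in the fourth.

```python
def sort_lines_by_column(lines, index=3):
    """Return the lines ordered by the integer in the given tab-separated column."""
    return sorted(lines, key=lambda line: int(line.rstrip("\n").split("\t")[index]))

# Hypothetical sample data standing in for MyFile.txt
sample = [
    "a\tb\tc\t30\n",
    "d\te\tf\t4\n",
    "g\th\ti\t120\n",
]

sorted_lines = sort_lines_by_column(sample)
print("".join(sorted_lines), end="")
```

The numeric key is what matters here: compared as text, "120" would sort before "30". An open file object can be passed in place of the list, since iterating over a file also yields its lines.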

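line.split("\t") returns a list of strings, and indexing it with [3] gives you a single string. A string has no sort() method, which is exactly what the error message says. Instead, collect the numbers in a list and sort the list: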

for line in file:
    column.append(float(line[1:].split("\t")[3]))

column.sort()
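To see where the error comes from, it may help to look at the types involved at each step. The line below is a made-up sample, not data from the question.

```python
line = "x\ty\tz\t42\n"  # hypothetical sample line

fields = line.split("\t")      # a list of strings
value = fields[3]              # one string: "42\n"

print(type(fields).__name__)   # list
print(type(value).__name__)    # str
print(hasattr(value, "sort"))  # False, hence the AttributeError
print(float(value))            # 42.0, float() ignores the trailing newline
```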

Since you say the file contains numbers separated by tab characters, you could use the csv module to process it. I use 'statistic' as the column name because csv.DictReader takes its keys from the header line. If there is no header line, use the fieldnames parameter to set the column names. If you would rather work with the column index (3 in your case), use csv.reader instead, which yields each row as a list.

import csv
ifile = open('file.csv', newline='')  # Python 3: text mode, newline='' as the csv docs recommend
infile = csv.DictReader(ifile, delimiter='\t')
# If the first line is not a header, pass fieldnames=[...] to DictReader
try:
  sortedlist = sorted(infile, key=lambda d: float(d['statistic']))
except ValueError:
  # fieldnames was given but the first line was a header after all: go back and skip it
  ifile.seek(0)
  next(ifile)  # Python 3 replacement for ifile.next()
  sortedlist = sorted(infile, key=lambda d: float(d['statistic']))
ifile.close()

# now process sortedlist and build an output file to write using csv.DictWriter()
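One possible way to do that last step, as a sketch only. The input data is invented, io.StringIO stands in for real files so the example is self-contained, and the output name "sorted.csv" in the comment is hypothetical.

```python
import csv
import io

# Hypothetical tab-separated input with a header line
data = "name\tstatistic\nb\t2.5\na\t0.1\nc\t10\n"

reader = csv.DictReader(io.StringIO(data), delimiter="\t")
fieldnames = reader.fieldnames  # column names read from the header line
sortedlist = sorted(reader, key=lambda d: float(d["statistic"]))

out = io.StringIO()  # stands in for open("sorted.csv", "w", newline="")
writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
writer.writeheader()          # write the header line first
writer.writerows(sortedlist)  # then the rows in sorted order

print(out.getvalue(), end="")
```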

Try this code:

file = open("a")
column = []

for line in file:
    column.append(int(line.split("\t")[3]))

column.sort()
print(column)

file.close()

What changed:

  1. line.split("\t") returns a list of strings, so with column.append(int(line.split("\t")[3])) we select the fourth element of this list, convert it to an integer and append that integer to our list (column).
  2. print(column.sort()) would print the return value of the sort method, which is None, so we have to sort the list first and then print it. Another option is the sorted function, print(sorted(column)), which returns a new sorted list instead of sorting in place.
  3. We close the file we opened, so the file handle is not leaked.
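The difference between list.sort() and sorted() from point 2 can be checked with a few made-up values:

```python
column = [30, 4, 120]       # hypothetical values

result = column.sort()      # sorts the list in place
print(result)               # None
print(column)               # [4, 30, 120]

original = [30, 4, 120]
ordered = sorted(original)  # builds and returns a new list
print(ordered)              # [4, 30, 120]
print(original)             # [30, 4, 120], left unchanged
```

For point 3, opening the file in a with block is a common alternative to calling close() by hand, since the file is closed automatically when the block ends.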
