
Python: How to read space-delimited data of different lengths from a text file and parse it

I have space-delimited data in a text file, like this:

0 1 2 3
1 2 3
3 4 5 6
1 3 5
1
2 3 5
3 5

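Each line has a different length.

I need to read it starting from line 2 ('1 2 3'), parse it, and get the following information:

  1. Number of unique data = (1,2,3,4,5,6) = 6

  2. Count of each data:

    count data (1) = 3
    count data (2) = 2
    count data (3) = 5
    count data (4) = 1
    count data (5) = 4
    count data (6) = 1

  3. Number of lines = 6

  4. Sort the data in descending order (by count):

    data (3)
    data (5)
    data (1)
    data (2)
    data (4)
    data (6)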

This is what I did:

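import csv

file = open('data.txt')
csvreader = csv.reader(file)  # default delimiter is ',', so each row comes back as one string
header = next(csvreader)      # first line ('0 1 2 3'), which should be skipped
print(header)
rows = []
for row in csvreader:
    rows.append(row)
print(rows)
file.close()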

After this step, what should I do to get the expected result?

I would do something like this:

from collections import Counter

with open('data.txt', 'r') as file:
    lines = file.readlines()

lines = lines[1:]  # skip first line
data = []
for line in lines:
    # split() with no argument also copes with repeated or trailing spaces
    data += line.split()

counter = Counter(data)
print(f'unique data: {list(counter.keys())}')
print(f'count data: {sorted(counter.items())}')
print(f'number of lines: {len(lines)}')
print(f'sort data: {[x[0] for x in counter.most_common()]}')
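
As a quick check against the numbers in the question, here is a minimal sketch of the same Counter idea wrapped in a function. The name `summarize` and the in-memory `sample` list are made up for illustration (they stand in for the lines of data.txt), and the values are converted to int, which the code above does not do.

```python
from collections import Counter

def summarize(lines):
    """Return (unique values, counts, number of lines, values by descending count).

    `lines` is expected to already exclude the first line of the file.
    """
    counter = Counter()
    for line in lines:
        # split() with no argument ignores repeated and trailing spaces
        counter.update(int(token) for token in line.split())
    unique = sorted(counter)
    by_count = [value for value, _ in counter.most_common()]
    return unique, dict(counter), len(lines), by_count

# Hypothetical stand-in for the contents of data.txt
sample = ["0 1 2 3", "1 2 3   ", "3 4 5 6", "1 3 5   ", "1", "2 3 5   ", "3 5     "]
unique, counts, n_lines, by_count = summarize(sample[1:])  # skip the first line
print(len(unique), unique)
print(counts)
print(n_lines)
print(by_count)
```

Counter.most_common() keeps values with equal counts in the order they were first seen (Python 3.7 and later), so 4 should come out ahead of 6 here, as in the question.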

A simple brute-force approach:

nums = []
counts = {}
with open('data.txt') as f:
    next(f, None)  # skip the first line, whatever it starts with
    for row in f:
        nums.extend(int(k) for k in row.split())
print(nums)
# Tally how often each number occurs
for n in nums:
    counts[n] = counts.get(n, 0) + 1
print(counts)
# (number, count) pairs, highest count first
ordering = sorted(counts.items(), key=lambda k: -k[1])
print(ordering)
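
If you would rather keep csv.reader from the question, it probably needs delimiter=' '. Below is a rough sketch, assuming the same layout as the sample. io.StringIO stands in for the open file, and `text` is invented sample content rather than the real data.txt. Repeated and trailing spaces show up as empty fields, so they are filtered out.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for open('data.txt'); trailing spaces kept on purpose
text = "0 1 2 3\n1 2 3   \n3 4 5 6\n1 3 5   \n1\n2 3 5   \n3 5     \n"

reader = csv.reader(io.StringIO(text), delimiter=' ')
next(reader, None)  # skip the first line

rows = []
for row in reader:
    # Trailing or repeated spaces produce empty strings; drop them
    rows.append([int(field) for field in row if field])

counts = Counter(value for row in rows for value in row)
print(rows)
print(len(rows))
print(sorted(counts.items()))
```

For data this simple, plain str.split() is arguably less fuss than csv, since it discards the extra whitespace on its own.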

Here is another approach:

def getData(infile):
    """Read the file and return every line after the first."""
    with open(infile, 'r') as data:
        lnes = data.readlines()
    return lnes[1:]

def parseData(ld):
    """Parse the lines and print the desired results."""
    unique_symbols = set()
    all_symbols = dict()
    for l in ld:
        symbols = l.strip().split()
        for s in symbols:
            unique_symbols.add(s)
            all_symbols[s] = all_symbols.get(s, 0) + 1
    print(f'Number of Unique Symbols = {len(unique_symbols)}')
    print(f'Number of Lines Processed = {len(ld)}')
    for symb in unique_symbols:
        print(f'Number of {symb} = {all_symbols[symb]}')
    print(f"Descending Sort of Symbols = {', '.join(sorted(list(unique_symbols), reverse=True))}")

When executed:

infile = r'spaced_text.txt'
parseData(getData(infile))

it produces:

Number of Unique Symbols = 6
Number of Lines Processed = 6
Number of 2 = 2
Number of 5 = 4
Number of 3 = 5
Number of 1 = 3
Number of 6 = 1
Number of 4 = 1
Descending Sort of Symbols = 6, 5, 4, 3, 2, 1
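
Note that the last line of that output orders the symbols by their own value, whereas the question asks for them ordered by how often they occur. A small sketch of that variant, assuming a counts dict shaped like all_symbols above (the literal here is copied from the output, and `by_count` is a made-up name):

```python
# Counts copied from the output above, keyed by symbol as in all_symbols
all_symbols = {'1': 3, '2': 2, '3': 5, '4': 1, '5': 4, '6': 1}

# Sort by count (highest first); ties fall back to the symbol itself
by_count = sorted(all_symbols, key=lambda s: (-all_symbols[s], s))
print(f"Symbols by descending count = {', '.join(by_count)}")
```

The tie-break on the symbol is a choice made here so the result is deterministic; the question does not say how equal counts should be ordered.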

