Python: group by and count columns for each line
I have a txt file containing n lines; each line has some number of columns separated by a delimiter.
File:
x|x|x|x
x|x|x|x|x|x
x|x|x|x|x|x|x|x|x|x|x
x|x|x
x|x|x|x
x|x|x
I would like output like the following:
Output:
Group size (lines with the same column count) - column count - line numbers
2 - 4 - line 1, line 5
1 - 6 - line 2
1 - 11 - line 3
2 - 3 - line 4, line 6
Can you help me? I tried pandas, but I couldn't get it to work.
Sure. You definitely don't need Pandas; collections.defaultdict is your friend.
import io
from collections import defaultdict
# Could be an `open(...)` instead, but we're using a
# StringIO to make this a self-contained program.
data = io.StringIO("""
x|x|x|x
x|x|x|x|x|x
x|x|x|x|x|x|x|x|x|x|x
x|x|x
x|x|x|x
x|x|x
""".strip())
linenos_by_count = defaultdict(set)
for lineno, line in enumerate(data, 1):
    count = line.count("|") + 1  # Count delimiters, add 1
    linenos_by_count[count].add(lineno)
for count, linenos in sorted(linenos_by_count.items()):
    lines_desc = ", ".join(f"line {lineno}" for lineno in sorted(linenos))
    print(f"{len(linenos)} - {count} - {lines_desc}")
Output:
2 - 3 - line 4, line 6
2 - 4 - line 1, line 5
1 - 6 - line 2
1 - 11 - line 3
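If the data lives in a real file rather than a StringIO, the same idea can be wrapped in a small helper. The following is only a minimal sketch and not part of the original answer: the names `group_lines_by_column_count` and `format_groups` and the `delimiter` parameter are hypothetical, and it assumes that blank lines should be skipped and that lines carry no trailing delimiter.

```python
from collections import defaultdict


def group_lines_by_column_count(lines, delimiter="|"):
    """Map each column count to the 1-based line numbers that have it.

    `lines` can be any iterable of strings, e.g. an open file object.
    """
    groups = defaultdict(list)
    for lineno, line in enumerate(lines, 1):
        line = line.rstrip("\n")
        if not line:
            continue  # Assumption: blank lines are ignored
        groups[line.count(delimiter) + 1].append(lineno)
    return dict(groups)


def format_groups(groups):
    """Render rows as 'group size - column count - line numbers'."""
    return [
        f"{len(linenos)} - {count} - " + ", ".join(f"line {n}" for n in linenos)
        for count, linenos in sorted(groups.items())
    ]


sample = ["x|x|x|x\n", "x|x|x|x|x|x\n", "x|x|x|x|x|x|x|x|x|x|x\n",
          "x|x|x\n", "x|x|x|x\n", "x|x|x\n"]
for row in format_groups(group_lines_by_column_count(sample)):
    print(row)
```

With a file on disk, the list `sample` would be replaced by the file object from `with open(path) as f:`, since iterating a file yields its lines.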
Here is an alternative using itertools.groupby, based on @AKX's approach:
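from itertools import groupby
# Assumes `data` is the file content as a single string (e.g. from
# open(...).read()), not the StringIO object used in the previous answer.
# Columns are counted as delimiters + 1, so this works for any cell content.
print('\n'.join([f'{len(G)} - {k} - ' + ', '.join([f'line {x[0]+1}' for x in G])
                 for k, g in groupby(sorted(enumerate([s.count('|') + 1
                                                       for s in data.splitlines()
                                                       ]),
                                            key=lambda x: x[1]),
                                     lambda x: x[1]
                                     )
                 for G in [list(g)]
                 ]))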
Output:
2 - 3 - line 4, line 6
2 - 4 - line 1, line 5
1 - 6 - line 2
1 - 11 - line 3
Here it is in a less brutal format:
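from itertools import groupby
# Assumes `data` is the file content as a single string.
# Count delimiters and add 1 to get the number of columns per line.
counts = [s.count('|') + 1 for s in data.splitlines()]
for k, g in groupby(sorted(enumerate(counts),
                           key=lambda x: x[1]),
                    lambda x: x[1]):
    G = list(g)
    print(f'{len(G)} - {k} - ' + ', '.join([f'line {x[0]+1}' for x in G]))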
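The groupby versions sort by column count before grouping because itertools.groupby only merges adjacent items with equal keys. A small sketch of the difference, using the column counts of the sample file (the variable names here are illustrative, not from the answers):

```python
from itertools import groupby

counts = [4, 6, 11, 3, 4, 3]        # column counts of lines 1..6 in the sample
pairs = list(enumerate(counts, 1))  # (line number, column count)

# Without sorting, equal counts that are not adjacent land in separate groups.
unsorted_groups = [
    (k, [n for n, _ in g])
    for k, g in groupby(pairs, key=lambda p: p[1])
]

# Sorting by count first makes equal counts adjacent, so each count appears once.
sorted_groups = [
    (k, [n for n, _ in g])
    for k, g in groupby(sorted(pairs, key=lambda p: p[1]), key=lambda p: p[1])
]

print(unsorted_groups)
print(sorted_groups)
```

Because Python's sort is stable, line numbers inside each group stay in ascending order, which is why neither answer needs a second sort on the line numbers.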