I have a data set with two columns and I need to change it from this format:
10 1
10 5
10 3
11 5
11 4
12 6
12 2
to this:
10 1 5 3
11 5 4
12 6 2
I need every unique value in the first column to be on its own row.
I am a beginner with Python, and beyond reading in my text file, I'm at a loss for how to proceed.
You can use Pandas dataframes.
import pandas as pd
df = pd.DataFrame({'A':[10,10,10,11,11,12,12],'B':[1,5,3,5,4,6,2]})
print(df)
Output:
A B
0 10 1
1 10 5
2 10 3
3 11 5
4 11 4
5 12 6
6 12 2
Let's use groupby and join:
df.groupby('A')['B'].apply(lambda x:' '.join(x.astype(str)))
Output:
A
10 1 5 3
11 5 4
12 6 2
Name: B, dtype: object
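If you need the exact text layout from the question rather than a Series, a minimal sketch is to iterate over the grouped Series with Series.items(), which yields (key, joined string) pairs. The variable names grouped and lines are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 10, 10, 11, 11, 12, 12], 'B': [1, 5, 3, 5, 4, 6, 2]})
grouped = df.groupby('A')['B'].apply(lambda x: ' '.join(x.astype(str)))

# Series.items() yields (index label, value) pairs: here (key, "v1 v2 ...")
lines = [f"{key} {values}" for key, values in grouped.items()]
print('\n'.join(lines))
```

Each element of lines is one output row, so you could also write them to a file with '\n'.join(lines).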
An example using only itertools.groupby; this is all in the Python standard library (although the pandas version is much more concise!).
Assuming the keys you want to group are adjacent, this could all be done lazily (there is no need to have all your data in memory at any time):
from io import StringIO
from itertools import groupby
text = '''10 1
10 5
10 3
11 5
11 4
12 6
12 2'''
# read and group data:
with StringIO(text) as file:
    keys = []
    res = {}
    data = (line.strip().split() for line in file)
    for k, g in groupby(data, key=lambda x: x[0]):
        keys.append(k)
        res[k] = [item[1] for item in g]
print(keys)  # ['10', '11', '12']
print(res)   # {'10': ['1', '5', '3'], '11': ['5', '4'], '12': ['6', '2']}
# write grouped data ('{:3s}' pads every field to 3 characters):
with StringIO() as out_file:
    for key in keys:
        out_file.write('{:3s}'.format(key))
        out_file.write(' '.join(['{:3s}'.format(item) for item in res[key]]))
        out_file.write('\n')
    print(out_file.getvalue())  # must run before the StringIO is closed
# 10 1   5   3
# 11 5   4
# 12 6   2
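You can then replace "with StringIO(text) as file:" with something like "with open('infile.txt', 'r') as file:" so the program reads your actual file (and similarly use "with open('outfile.txt', 'w') as out_file:" for the output file; in that case drop the print(out_file.getvalue()) line, since regular file objects have no getvalue() method).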
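Note that itertools.groupby only merges consecutive rows with the same key. If the file is not already ordered by the first column, a sketch under that assumption is to sort first (which does require loading all rows into memory). The sample lines below are made up to show the non-adjacent case:

```python
from itertools import groupby

# Hypothetical input where rows with the same key are NOT adjacent
lines = ['10 1', '11 5', '10 5', '12 6', '11 4', '10 3', '12 2']
rows = [line.split() for line in lines]

# Without sorting, groupby starts a new group every time the key changes
naive = [(k, [r[1] for r in g]) for k, g in groupby(rows, key=lambda r: r[0])]
print(len(naive))  # 7: one group per row here

# list.sort is stable, so values keep their original order within each key
rows.sort(key=lambda r: int(r[0]))
grouped = [(k, [r[1] for r in g]) for k, g in groupby(rows, key=lambda r: r[0])]
for k, values in grouped:
    print(k, ' '.join(values))
```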
Alternatively, you could write directly to the output file every time a key is found; that way you never need to hold all the data in memory:
with StringIO(text) as file, StringIO() as out_file:
    data = (line.strip().split() for line in file)
    for k, g in groupby(data, key=lambda x: x[0]):
        out_file.write('{:3s}'.format(k))
        out_file.write(' '.join(['{:3s}'.format(item[1]) for item in g]))
        out_file.write('\n')
    print(out_file.getvalue())
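Putting that together with real files, a rough sketch might look like this. regroup_file is a hypothetical helper name, it uses a plain single space instead of the '{:3s}' padding, and the demo writes its sample input into a temporary directory so it runs on its own:

```python
import os
import tempfile
from itertools import groupby

def regroup_file(in_path, out_path):
    """Write one 'key v1 v2 ...' line per key; assumes equal keys are adjacent."""
    with open(in_path, 'r') as infile, open(out_path, 'w') as outfile:
        rows = (line.split() for line in infile if line.strip())  # skip blank lines
        for key, group in groupby(rows, key=lambda row: row[0]):
            # each group is written as soon as it is complete
            outfile.write(key + ' ' + ' '.join(row[1] for row in group) + '\n')

# Demo with a throwaway directory so the example is self-contained
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, 'infile.txt')
    out_path = os.path.join(tmp, 'outfile.txt')
    with open(in_path, 'w') as f:
        f.write('10 1\n10 5\n10 3\n11 5\n11 4\n12 6\n12 2\n')
    regroup_file(in_path, out_path)
    with open(out_path) as f:
        result = f.read()

print(result, end='')
```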
Using collections.defaultdict (a dict subclass):
import collections
with open('yourfile.txt', 'r') as f:
    d = collections.defaultdict(list)
    for k, v in (l.split() for l in f.read().splitlines()):  # processing each line
        d[k].append(v)  # accumulating values for the same 1st column
for k, v in sorted(d.items()):  # outputting grouped sequences
    print('%s %s' % (k, ' '.join(v)))
The output:
10 1 5 3
11 5 4
12 6 2
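One caveat with sorted(d.items()): the keys are strings, so they sort lexicographically ('10' comes before '9'). That doesn't matter for the sample data, but if your first column can have different numbers of digits, passing key=int is one way to get numeric order. The input lines here are hypothetical:

```python
import collections

# Hypothetical input where the keys have different numbers of digits
lines = ['9 7', '10 1', '10 5', '9 8', '11 5']

d = collections.defaultdict(list)
for line in lines:
    k, v = line.split()
    d[k].append(v)

print(sorted(d))           # ['10', '11', '9'] (string order)
print(sorted(d, key=int))  # ['9', '10', '11'] (numeric order)

for k in sorted(d, key=int):
    print(k, ' '.join(d[k]))
```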
Using pandas may be easier. You can use the read_csv function to read a text file where the data is separated by one or more spaces.
import pandas as pd
df = pd.read_csv("input.txt", header=None, delimiter=r"\s+")  # raw string avoids an invalid-escape warning
# setting column names
df.columns = ['col1', 'col2']
df  # in a notebook or REPL this displays the DataFrame
This displays the DataFrame as:
col1 col2
0 10 1
1 10 5
2 10 3
3 11 5
4 11 4
5 12 6
6 12 2
After reading the text file into a DataFrame, you can also use agg (aggregate) with join, similar to the apply used in the other answer:
df_combine = df.groupby('col1')['col2'].agg(lambda col: ' '.join(col.astype('str'))).reset_index()
df_combine
Output:
col1 col2
0 10 1 5 3
1 11 5 4
2 12 6 2
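A sketch of the same steps end to end, assuming the data is in a string rather than input.txt (StringIO stands in for the file). It passes names= to read_csv instead of setting df.columns afterwards, and builds the output lines by hand because to_csv with sep=' ' would quote the space-containing col2 values:

```python
from io import StringIO

import pandas as pd

text = "10 1\n10 5\n10 3\n11 5\n11 4\n12 6\n12 2\n"
# names= assigns the column labels at read time
df = pd.read_csv(StringIO(text), header=None, sep=r"\s+", names=['col1', 'col2'])
df_combine = df.groupby('col1')['col2'].agg(lambda col: ' '.join(col.astype(str))).reset_index()

# itertuples(index=False) yields (col1, col2) rows without the index
out_text = ''.join(f"{a} {b}\n" for a, b in df_combine.itertuples(index=False))
print(out_text, end='')
```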
I found this solution using dictionaries:
with open("data.txt", encoding='utf-8') as data:
    file = data.readlines()
dic = {}
for line in file:
    list1 = line.split()
    try:
        dic[list1[0]] += list1[1] + ' '  # append to the existing string for this key
    except KeyError:
        dic[list1[0]] = list1[1] + ' '  # first time this key is seen
for k, v in dic.items():
    print(k, v)
OUTPUT
10 1 5 3
11 5 4
12 6 2
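A variant of the same idea, swapping the try/except for dict.setdefault and storing lists instead of strings, which avoids the trailing space on every line. The sample lines are hard-coded for illustration:

```python
lines = ['10 1', '10 5', '10 3', '11 5', '11 4', '12 6', '12 2']

dic = {}
for line in lines:
    key, value = line.split()
    # setdefault inserts an empty list the first time a key is seen
    dic.setdefault(key, []).append(value)

# dicts keep insertion order (Python 3.7+), so keys print in first-seen order
for k, v in dic.items():
    print(k, ' '.join(v))
```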
Something more functional:
def getdata(datafile):
    with open(datafile, encoding='utf-8') as data:
        file = data.readlines()
    dic = {}
    for line in file:
        list1 = line.split()
        try:
            dic[list1[0]] += list1[1] + ' '
        except KeyError:
            dic[list1[0]] = list1[1] + ' '
    for k, v in dic.items():
        v = v.split()  # turn the space-separated string back into a list
        print(k, ':', v)
getdata("data.txt")
OUTPUT
10 : ['1', '5', '3']
11 : ['5', '4']
12 : ['6', '2']
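If you want the function to hand the data back instead of printing it, a sketch is to return the dictionary, converting the values to int along the way. group_values is a hypothetical name; it accepts any iterable of lines, such as an open file (a StringIO stands in for one here):

```python
from io import StringIO

def group_values(lines):
    """Map each first-column key to a list of second-column values, as ints."""
    groups = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:  # skip blank or malformed lines
            continue
        key, value = int(parts[0]), int(parts[1])
        groups.setdefault(key, []).append(value)
    return groups

result = group_values(StringIO("10 1\n10 5\n10 3\n11 5\n11 4\n12 6\n12 2\n"))
for key, values in result.items():
    print(key, ':', values)
```

Returning the dictionary keeps reading separate from output, so the caller can print it, write it to a file, or process it further.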