
String Compression in Python

I have the following input:

 my_list = ["x d1","y d1","z d2","t d2"]

And would like to transform it into:

Expected_result = ["d1(x,y)","d2(z,t)"]

I resorted to a brute-force approach with pandas, because I couldn't find a way to do it in plain Python. Is there another way to solve this?

import pandas as pd 

my_list = ["x d1","y d1","z d2","t d2"]

df = pd.DataFrame(my_list,columns=["col1"])

df2 = df["col1"].str.split(" ",expand = True)
df2.columns = ["col1","col2"]
# group by a scalar key: with a one-element list, pandas 2.x yields tuple group names
grp = df2.groupby("col2")

result = []
for grp_name, data in grp:
  res =  grp_name +"(" + ",".join(list(data["col1"])) + ")"
  result.append(res)
print(result)
  1. The code defines an empty dictionary.
  2. It then iterates over each item in your list and uses the split() method to split the item into a key and a value.
  3. It then uses the setdefault() method to group the keys by value: the value (for example d1) becomes the dictionary key, and the key (for example x) is appended to that entry's list. If the value is not yet in the dictionary, setdefault() first creates an entry for it with an empty list.
  4. Finally, the list comprehension iterates over the items in the dictionary and builds one string per entry, using the join() method to concatenate the keys in the list into a single string.
result = {}

for item in my_list:
    key, value = item.split()
    result.setdefault(value, []).append(key)
    
# join with ',' (no space) to match the expected result exactly
output = [f"{k}({','.join(v)})" for k, v in result.items()]
print(output)

['d1(x,y)', 'd2(z,t)']

If your items are already grouped by key (all d1 entries together, then all d2 entries), you can use itertools.groupby, which only merges consecutive items with the same key:

from itertools import groupby

out = [f"{k}({','.join(x[0] for x in g)})"
       for k, g in groupby(map(str.split, my_list), lambda x: x[1])]

Output:

['d1(x,y)', 'd2(z,t)']

Otherwise you should use a dictionary as shown by @Jamiu.
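
For input that is not already grouped, another option is to sort the split pairs by label before calling groupby. This is only a sketch and not part of the answer above. The name `pairs` and the interleaved sample list are made up for illustration. Note that the sort is lexicographic, so a label like d10 would come before d2.

```python
from itertools import groupby
from operator import itemgetter

my_list = ["z d2", "x d1", "t d2", "y d1"]  # labels deliberately interleaved

# Split each "value label" item, then sort by label so equal labels are adjacent.
# sorted() is stable, so values keep their original relative order within a label.
pairs = sorted(map(str.split, my_list), key=itemgetter(1))

out = [f"{label}({','.join(value for value, _ in group)})"
       for label, group in groupby(pairs, key=itemgetter(1))]
print(out)
```

Sorting costs O(n log n), whereas the dictionary approach stays linear and keeps labels in order of first appearance.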

A variant of your pandas solution:

out = (df['col1'].str.split(n=1, expand=True)
       .groupby(1)[0]
       .apply(lambda g: f"{g.name}({','.join(g)})")
       .tolist()
      )
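my_list = ["x d1","y d1","z d2","t d2"]
res = []

for item in my_list:
    # a is the value, b is the group label; any extra tokens are ignored
    a, b, *_ = item.split()

    # extend the last group only if it carries exactly this label
    # (a plain substring test would also match d1 inside d11)
    if res and res[-1].startswith(f'{b}('):
        # drop the closing parenthesis, then append the new value
        res[-1] = f'{res[-1][:-1]},{a})'
    else:
        res.append(f'{b}({a})')

print(res)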
['d1(x,y)', 'd2(z,t)']

Let N be the number that follows d. This code works for any number of elements per dN and for any value of N, as long as the items are ordered by label: all d1 items come first, then all d2 items, then d3, and so on. The value can be any string, provided each item has the form "value dN", with the value first and the label second.

If you need something that works even when the dN labels are not in sequence, just say the word, but it will cost a little more.
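
One way such an order-independent version could look is sketched below. It uses collections.defaultdict from the standard library. The function name `group_by_label` is hypothetical and the sketch is not the answerer's own code. It assumes every item contains exactly two whitespace-separated tokens.

```python
from collections import defaultdict


def group_by_label(items):
    """Group "value label" strings by label, in order of first appearance."""
    groups = defaultdict(list)
    for item in items:
        # Assumes exactly two tokens per item: the value, then the label.
        value, label = item.split()
        groups[label].append(value)
    # Dicts preserve insertion order (Python 3.7+), so labels come out
    # in the order they were first seen.
    return [f"{label}({','.join(values)})" for label, values in groups.items()]


print(group_by_label(["x d1", "z d2", "y d1", "t d2"]))
```

The extra cost is the memory for the dictionary, since all groups have to be held until the input is exhausted.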

Another possible solution, which is based on pandas (it assumes exactly two items per group):

import numpy as np
import pandas as pd

(pd.DataFrame(np.array([str.split(x, ' ') for x in my_list]), columns=['b', 'a'])
 .groupby('a')['b'].apply(lambda x: f'({x.values[0]}, {x.values[1]})')
 .reset_index().sum(axis=1).tolist())

Output:

['d1(x, y)', 'd2(z, t)']

EDIT

The OP, @ShckTchamna, asked for a more general version of the above solution. This edit provides one that works with the example from the OP's comment below, where groups have different sizes.

my_list = ["x d1","y d1","z d2","t d2","kk d2","m d3", "n d3", "s d4"] 

(pd.DataFrame(np.array([str.split(x, ' ') for x in my_list]), columns=['b', 'a'])
 .groupby('a')['b'].apply(lambda x: f'({",".join(x.values)})')
 .reset_index().sum(axis=1).tolist())

Output:

['d1(x,y)', 'd2(z,t,kk)', 'd3(m,n)', 'd4(s)']
import pandas as pd

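df = pd.DataFrame(data=[e.split(' ') for e in ["x d1","y d1","z d2","t d2"]])
# select column 0 before apply, so the lambda does not depend on the grouping
# column being passed in; s.name is the group label, and join handles any group size
r = (df.groupby(1)[0]
       .apply(lambda s: "{0}({1})".format(s.name, ",".join(s)))
       .reset_index()
       .rename({1:"points", 0:"coordinates"}, axis=1)
    )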

print(r.coordinates.tolist())
# ['d1(x,y)', 'd2(z,t)']

print(r)
#   points coordinates
# 0    d1     d1(x,y)
# 1    d2     d2(z,t)

As a replacement for my previous answer (which also works):

import itertools as it

my_list = [e.split(' ') for e in ["x d1","y d1","z d2","t d2"]]

r = []
for key, group in it.groupby(my_list, lambda x: x[1]):
    values = [e[0] for e in group]
    # join handles groups of any size, not just pairs
    r.append("{0}({1})".format(key, ",".join(values)))

print(r)

Output:

['d1(x,y)', 'd2(z,t)']
