
Generate combinations for comma-separated strings in a pandas row

I have a dataframe like this:

ID, Values
1   10, 11, 12, 13
2   14
3   15, 16, 17, 18

I want to create a new dataframe like this:

ID Col1 Col2
1  10   11
1  11   12
1  12   13
2  14
3  15   16
3  16   17
3  17   18

How can I do this? Note: the entries in the Values column of the input df are of type str.
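For anyone who wants to reproduce the input, here is a minimal sketch of the dataframe described above. It assumes the column names are exactly ID and Values and that each value string uses ", " as the separator, as in the sample.

```python
import pandas as pd

# Sample input from the question; Values holds plain strings, not lists
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Values": ["10, 11, 12, 13", "14", "15, 16, 17, 18"],
})

print(df)
print(df["Values"].map(type).unique())  # every entry is a str
```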

Use a list comprehension with flattening, together with one small change to the chunks function from the answer linked in the code: if i > 0: becomes if i == 2: so that values with a single element are handled correctly:

from collections import deque

import pandas as pd

#https://stackoverflow.com/a/36586925
def chunks(iterable, chunk_size=2, overlap=1):
    # we'll use a deque to hold the values because it automatically
    # discards any extraneous elements if it grows too large
    if chunk_size < 1:
        raise Exception("chunk size too small")
    if overlap >= chunk_size:
        raise Exception("overlap too large")
    queue = deque(maxlen=chunk_size)
    it = iter(iterable)
    i = 0
    try:
        # start by filling the queue with the first group
        for i in range(chunk_size):
            queue.append(next(it))
        while True:
            yield tuple(queue)
            # after yielding a chunk, get enough elements for the next chunk
            for i in range(chunk_size - overlap):
                queue.append(next(it))
    except StopIteration:
        # if the iterator is exhausted, yield any remaining elements
        i += overlap
        if i == 2:
            yield tuple(queue)[-i:]

L = [[x] + list(z) for x, y in zip(df['ID'], df['Values']) for z in (chunks(y.split(', ')))]

df = pd.DataFrame(L, columns=['ID','Col1','Col2']).fillna('')
print(df)
   ID Col1 Col2
0   1   10   11
1   1   11   12
2   1   12   13
3   2   14     
4   3   15   16
5   3   16   17
6   3   17   18
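For the specific case used here (windows of two with an overlap of one), the same sliding-pair idea can also be sketched with zip on a single string. This is only an illustration, not the code from the answer above; consecutive_pairs is a hypothetical helper name, and the ", " separator is assumed from the sample data.

```python
def consecutive_pairs(text, sep=", "):
    """Hypothetical helper: overlapping pairs from a separated string."""
    items = text.split(sep)
    if len(items) == 1:
        # a single value has no neighbour, so it is kept on its own
        return [(items[0],)]
    # pair every element with its successor: a, b, c -> (a, b), (b, c)
    return list(zip(items, items[1:]))


print(consecutive_pairs("10, 11, 12, 13"))
# [('10', '11'), ('11', '12'), ('12', '13')]
print(consecutive_pairs("14"))
# [('14',)]
```

Unlike chunks, this sketch does not generalise to other chunk sizes or overlaps.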

I tried a slightly different approach: a function that returns the numbers in pairs from the initial comma-separated string.

def pairup(mystring):
    """Return consecutive pairs of values from a comma-separated string."""
    # strip each item so that "10, 11" yields "11" rather than " 11"
    mylist = [item.strip() for item in mystring.split(',')]
    if len(mylist) == 1:
        return [mylist]
    # pair each element with its successor: [a, b, c] -> [[a, b], [b, c]]
    return [list(pair) for pair in zip(mylist, mylist[1:])]

Now let's create the new data frame.

# https://stackoverflow.com/a/39955283/3679377
new_df = df[['ID']].join(
    df.Values.apply(lambda x: pd.Series(pairup(x)))
      .stack()
      .apply(lambda x: pd.Series(x))
      .fillna("")
      .reset_index(level=1, drop=True), 
    how='left').reset_index(drop=True)
new_df.columns = ['ID', 'Col 1', 'Col 2']

Here's the output of print(new_df).

   ID Col 1 Col 2
0   1    10    11
1   1    11    12
2   1    12    13
3   2    14      
4   3    15    16
5   3    16    17
6   3    17    18
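On pandas 0.25 or later, DataFrame.explode may be an alternative to the apply/stack chain. The sketch below is not part of the answer above: to_pairs is a hypothetical helper, the sample input is rebuilt so the block runs on its own, and a single value is padded with an empty string to match the output shown.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Values": ["10, 11, 12, 13", "14", "15, 16, 17, 18"],
})


def to_pairs(text):
    """Hypothetical helper: overlapping pairs, padding a lone value with ''."""
    items = [s.strip() for s in text.split(",")]
    if len(items) == 1:
        return [(items[0], "")]
    return list(zip(items, items[1:]))


# one list of pairs per row, then one row per pair
exploded = (
    df[["ID"]]
    .assign(pair=df["Values"].apply(to_pairs))
    .explode("pair")
    .reset_index(drop=True)
)

# split each (first, second) tuple into two columns
cols = pd.DataFrame(exploded["pair"].tolist(), columns=["Col 1", "Col 2"])
new_df = pd.concat([exploded[["ID"]], cols], axis=1)
print(new_df)
```

explode keeps the ID of the originating row on every generated row, so no join back to the original frame is needed.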
