How to get comma separated values in new column pandas dataframe?

I have the following dataframe:

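import pandas as pd


def remove_dup(string):
    # Split on commas, trim whitespace, and drop duplicates.
    # Note: set() does not guarantee the original order of the items.
    temp=string.split(',')
    temp=[x.strip() for x in temp]
    return ','.join(set(temp))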

compnaies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google','Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android','Search']

df = pd.DataFrame({'company' : compnaies, 'product':products })

new_df=df.groupby('company').product.agg([('Number', 'count'), ('Product list', ', '.join)]).reset_index()

#create uniquevalues
new_df['uniquevalues']=new_df['Product list'].apply(remove_dup)

#create uniquecount
new_df['uniquecount']=new_df['uniquevalues'].str.split(',').str.len()

How can I get the comma-separated values into new columns?

That is, each unique product should go into its own column, together with its count, as shown in the expected output:

         company  Number             Product list    uniquevalues  uniquecount  uniqueProduct1  uniqueProduct1 Count  uniqueProduct2  uniqueProduct2 Count
    0     Amazon       1                   E-comm          E-comm            1          E-comm                     1
    1   Facebook       1             Social Media    Social Media            1    Social Media                     1
    2     Google       3  Search, Android, Search  Android,Search            2         Android                     1          Search                     2
    3  Microsoft       2                OS, X-box        X-box,OS            2           X-box                     1              OS                     1

Use str.split with expand=True, rename the resulting columns, and compute the new uniquecount column with DataFrame.count to avoid splitting twice:

new_df=df.groupby('company').product.agg([('Number', 'count'), 
                                          ('Product list', ', '.join)]).reset_index()

#create uniquevalues
new_df['uniquevalues']=new_df['Product list'].apply(remove_dup)

df1 = new_df['uniquevalues'].str.split(',', expand=True)
df1.columns = ['uniqueProduct{}'.format(x+1) for x in df1.columns]

new_df['uniquecount'] = df1.count(axis=1)
new_df = new_df.join(df1)
print (new_df)
     company  Number             Product list    uniquevalues  uniquecount  \
0     Amazon       1                   E-comm          E-comm            1   
1   Facebook       1             Social Media    Social Media            1   
2     Google       3  Search, Android, Search  Search,Android            2   
3  Microsoft       2                OS, X-box        OS,X-box            2   

  uniqueProduct1 uniqueProduct2  
0         E-comm           None  
1   Social Media           None  
2         Search        Android  
3             OS          X-box  

If you want to replace None with empty strings, add fillna to the last line of code:

new_df = new_df.join(df1.fillna(''))
print (new_df)
     company  Number             Product list    uniquevalues  uniquecount  \
0     Amazon       1                   E-comm          E-comm            1   
1   Facebook       1             Social Media    Social Media            1   
2     Google       3  Search, Android, Search  Search,Android            2   
3  Microsoft       2                OS, X-box        OS,X-box            2   

  uniqueProduct1 uniqueProduct2  
0         E-comm                 
1   Social Media                 
2         Search        Android  
3             OS          X-box  
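
For reference, the steps above can be put together as one self-contained sketch. It assumes an order-preserving variant of remove_dup (the name remove_dup_ordered is hypothetical and not part of the original code), because set() does not guarantee that "Search" stays ahead of "Android". The aggregation is also written with two separate groupby calls here, purely for readability.

```python
import pandas as pd


def remove_dup_ordered(string):
    # Hypothetical order-preserving variant of remove_dup:
    # dict.fromkeys keeps the first occurrence of each item, in order.
    parts = [x.strip() for x in string.split(',')]
    return ','.join(dict.fromkeys(parts))


companies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google', 'Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android', 'Search']
df = pd.DataFrame({'company': companies, 'product': products})

# One row per company: number of products and the joined product list.
grouped = df.groupby('company')['product']
new_df = pd.DataFrame({
    'Number': grouped.count(),
    'Product list': grouped.agg(', '.join),
}).reset_index()

new_df['uniquevalues'] = new_df['Product list'].apply(remove_dup_ordered)

# Split once; each unique product lands in its own column.
split_cols = new_df['uniquevalues'].str.split(',', expand=True)
split_cols.columns = ['uniqueProduct{}'.format(i + 1) for i in split_cols.columns]

# count(axis=1) counts the non-missing cells per row, so no second split is needed.
new_df['uniquecount'] = split_cols.count(axis=1)
new_df = new_df.join(split_cols.fillna(''))
print(new_df)
```

Computing uniquecount before calling fillna matters: once the missing cells are replaced by empty strings, count would treat them as present.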

EDIT:

df = pd.DataFrame({'company' : compnaies, 'product':products })

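def f(x):
    # Per-group summary: total count, joined list, unique values, unique count
    count = x.count()
    join = ','.join(x)
    uniq = ','.join(x.unique())
    uniqc = x.nunique()
    vals = [count, join, uniq, uniqc]
    names1 = ['Number','list','uniquevalues','uniquecount']

    # Flatten the (product, count) pairs from value_counts into one list
    s = [y for pair in x.value_counts().items() for y in pair]
    L = ['uniqueProduct','count']
    # Column names: uniqueProduct1, count1, uniqueProduct2, count2, ...
    names = ['{}{}'.format(a, i) for i in range(1, len(s)//2+1) for a in L]
    return pd.DataFrame([vals + s], columns=names1 + names)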

new_df = (df.groupby('company')['product'].apply(f)
           .reset_index(level=1, drop=True)
           .reset_index()
           .fillna(''))

print (new_df)
     company  Number                   list    uniquevalues  uniquecount  \
0     Amazon       1                 E-comm          E-comm            1   
1   Facebook       1           Social Media    Social Media            1   
2     Google       3  Search,Android,Search  Search,Android            2   
3  Microsoft       2               OS,X-box        OS,X-box            2   

  uniqueProduct1  count1 uniqueProduct2 count2  
0         E-comm       1                        
1   Social Media       1                        
2         Search       2        Android      1  
3             OS       1          X-box      1  
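
The product/count pairs can also be built without a custom function. The following is only a sketch of a different technique (groupby size plus unstack), not the approach used above; the helper column name rank and the variable names are assumptions made for illustration. Products are numbered in order of first appearance within each company rather than by descending count.

```python
import pandas as pd

companies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google', 'Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android', 'Search']
df = pd.DataFrame({'company': companies, 'product': products})

# Count each (company, product) pair, keeping first-appearance order.
counts = (df.groupby(['company', 'product'], sort=False)
            .size()
            .reset_index(name='count'))

# Number the unique products within each company: 1, 2, ...
counts['rank'] = counts.groupby('company').cumcount() + 1

# Move the rank into the columns: one row per company.
wide = counts.set_index(['company', 'rank'])[['product', 'count']].unstack('rank')

# Interleave the columns as product1, count1, product2, count2, ...
ranks = sorted(int(r) for r in counts['rank'].unique())
ordered = [(field, r) for r in ranks for field in ('product', 'count')]
wide = wide[ordered]

# Flatten the two-level column index into plain names.
prefix = {'product': 'uniqueProduct', 'count': 'count'}
wide.columns = ['{}{}'.format(prefix[field], r) for field, r in ordered]
wide = wide.reset_index()
print(wide)
```

Companies with fewer unique products get missing values in the trailing columns; fillna('') can be applied afterwards if empty strings are preferred.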

Here is an entire solution at once, which also covers this question: How to give column names after count and joins?

df1 = df.groupby('company').product.agg([('count', 'count'), ('product', ', '.join)]).reset_index()

df1 = df1.drop('company',axis=1).join(df.groupby('company')['product'].unique().reset_index(),rsuffix='_unique')

df1['unique_values'] =[len(df1.product_unique[i]) for i in list(df1.product_unique.index)]

df1.product_unique = [(",".join(df1.product_unique[n])) for n in list(df1.product_unique.index)]
# Assign the result back; join returns a new DataFrame
df1 = df1.join(df1.product_unique.str.split(",",expand=True))

You can then rename the columns:

df1.rename(columns={0:'Unique1',1:'Unique2'},inplace=True)
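
If the number of unique products is not known in advance, the column names can be generated instead of hard-coded. This is a minimal sketch under that assumption; the Unique prefix follows the rename above, while the variable names uniq, expanded, and result are made up for illustration.

```python
import pandas as pd

companies = ['Microsoft', 'Google', 'Amazon', 'Microsoft', 'Facebook', 'Google', 'Google']
products = ['OS', 'Search', 'E-comm', 'X-box', 'Social Media', 'Android', 'Search']
df = pd.DataFrame({'company': companies, 'product': products})

# Unique products per company, in order of first appearance.
uniq = df.groupby('company')['product'].unique()

# Expand the per-company lists into columns; shorter rows are padded with missing values.
expanded = pd.DataFrame([list(values) for values in uniq], index=uniq.index)

# Rename 0, 1, ... to Unique1, Unique2, ... with a callable instead of a fixed dict.
expanded = expanded.rename(columns=lambda i: 'Unique{}'.format(i + 1))

result = expanded.reset_index()
print(result)
```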

