
How to apply a function to multiple columns to create multiple columns in Pandas?

I am trying to apply a function to multiple columns and, in turn, create multiple new columns that count the length of each entry.

Basically, I have 5 columns with indexes 5, 7, 9, 13 and 15, and each entry in those columns is a string of the form 'WrappedArray(|2008-11-12, |2008-11-12)'. In my function I try to strip the WrappedArray part, split the values on the comma, and count (length - 1) using the following:

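def updates(row, num_col):
    # row is one DataFrame row; num_col is a column position (5, 7, 9, 13 or 15)
    # strip() removes the characters W, r, a, p, e, d, A, w, y from both ends
    strp = row.iloc[num_col].strip('WrappedAway')
    lis = strp.split(',')  # split() already returns a list
    return len(lis) - 1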

where num_col is the index of the column and can take the values 5, 7, 9, 13 and 15. I have done this, but only for 1 column:

fn = lambda row: updates(row,5)
col = df.apply(fn, axis=1)
df = df.assign(**{'count1':col.values})

Basically, I want to apply this function to ALL of the columns with the indexes mentioned (not just 5 as above) and create a separate count column for each of columns 5, 7, 9, 13 and 15, in short code, instead of doing it separately for each value.
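A minimal sketch of what is being asked, keeping the question's row-wise `updates` logic: `DataFrame.apply` accepts an `args` tuple that is passed to the function after the row, so one call per position is enough and all new columns can be assigned at once. The 16-column frame, its `c0`..`c15` labels and the `count{n}` names are hypothetical stand-ins for the real data.

```python
import pandas as pd

def updates(row, num_col):
    # Question's logic: strip wrapper characters, split on commas, pieces - 1
    strp = row.iloc[num_col].strip('WrappedAway')
    return len(strp.split(',')) - 1

# Hypothetical frame: 16 filler columns, with WrappedArray strings placed
# at the positions named in the question.
cols = [5, 7, 9, 13, 15]
data = {f'c{i}': ['x', 'y'] for i in range(16)}
for i in cols:
    data[f'c{i}'] = ['WrappedArray(|2008-11-12, |2008-11-12)',
                     'WrappedArray(|2008-11-12)']
df = pd.DataFrame(data)

# args=(c,) makes apply call updates(row, c) for every row.
new_cols = {f'count{c}': df.apply(updates, axis=1, args=(c,)) for c in cols}
df = df.assign(**new_cols)
print(df[[f'count{c}' for c in cols]])
```

This still builds a Series for every row, so the column-wise approaches in the answers are likely to scale better.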

I hope that makes sense.

You are confusing row-wise and column-wise operations by trying to do both in one function. Choose one or the other. Column-wise operations are usually more efficient, and they let you use the Pandas str methods.
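To make the distinction concrete, here is a small sketch on a hypothetical one-column frame. The row-wise version calls the function once per row with a whole row Series; the column-wise version makes a single call on the column through the `.str` accessor. Both give the same counts here; the usual performance gain comes from not building a Series object for every row.

```python
import pandas as pd

# Hypothetical data in the question's format
df = pd.DataFrame({'A': ['WrappedArray(|2008-11-12, |2008-11-12)',
                         'WrappedArray(|2008-11-12)']})

# Row-wise: the lambda receives one row (a Series) at a time.
row_wise = df.apply(lambda row: len(row['A'].split(',')) - 1, axis=1)

# Column-wise: one call on the whole column using vectorized str methods.
col_wise = df['A'].str.split(',').str.len() - 1

print(row_wise.tolist(), col_wise.tolist())  # [1, 0] [1, 0]
```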

Setup

import pandas as pd

df = pd.DataFrame({'A': ['WrappedArray(|2008-11-12, |2008-11-12, |2008-10-11)', 'WrappedArray(|2008-11-12, |2008-11-12)'],
                   'B': ['WrappedArray(|2008-11-12,|2008-11-12)', 'WrappedArray(|2008-11-12|2008-11-12)']})

Logic

# perform operations on strings in a series
def calc_length(series):
    return series.str.strip('WrappedAway').str.split(',').str.len() - 1

# apply to each column and join to original dataframe
df = df.join(df.apply(calc_length).add_suffix('_Length'))

Result

print(df)

                                                   A  \
0  WrappedArray(|2008-11-12, |2008-11-12, |2008-1...   
1             WrappedArray(|2008-11-12, |2008-11-12)   

                                       B  A_Length  B_Length  
0  WrappedArray(|2008-11-12,|2008-11-12)         2         1  
1   WrappedArray(|2008-11-12|2008-11-12)         1         0  
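The logic above applies `calc_length` to every column. Since the question only wants certain positions, a possible variant is to select them with `iloc` first. The frame below, with the WrappedArray columns at positions 1 and 3, is a hypothetical stand-in for the question's positions 5, 7, 9, 13 and 15.

```python
import pandas as pd

def calc_length(series):
    return series.str.strip('WrappedAway').str.split(',').str.len() - 1

# Hypothetical frame mixing WrappedArray columns with unrelated ones
df = pd.DataFrame({
    'id': [1, 2],
    'A': ['WrappedArray(|2008-11-12, |2008-11-12)', 'WrappedArray(|2008-11-12)'],
    'note': ['x', 'y'],
    'B': ['WrappedArray(|2008-11-12,|2008-11-12,|2008-11-12)', 'WrappedArray()'],
})

positions = [1, 3]  # e.g. [5, 7, 9, 13, 15] for the question's frame

# Only the selected columns go through calc_length; results are joined back.
df = df.join(df.iloc[:, positions].apply(calc_length).add_suffix('_Length'))
print(df[['A_Length', 'B_Length']])
```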

I think we can use pandas str.count():

import pandas as pd

df = pd.DataFrame({
    "col1":['WrappedArray(|2008-11-12, |2008-11-12)',
            'WrappedArray(|2018-11-12, |2017-11-12, |2018-11-12)'],
    "col2":['WrappedArray(|2008-11-12, |2008-11-12,|2008-11-12,|2008-11-12)',
            'WrappedArray(|2018-11-12, |2017-11-12, |2018-11-12)']})
# number of commas in each entry, i.e. number of elements - 1
df["col1"].str.count(',')
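The same call extends to several columns at once by applying it column by column. This is a sketch on the frame above; the `_count` suffix is an arbitrary choice. Note that `Series.str.count` takes a regular expression, but `,` has no special meaning in regex syntax, so this is a plain comma count.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ['WrappedArray(|2008-11-12, |2008-11-12)',
             'WrappedArray(|2018-11-12, |2017-11-12, |2018-11-12)'],
    "col2": ['WrappedArray(|2008-11-12, |2008-11-12,|2008-11-12,|2008-11-12)',
             'WrappedArray(|2018-11-12, |2017-11-12, |2018-11-12)']})

# Count commas in every selected column, then attach the results.
counts = df[["col1", "col2"]].apply(lambda s: s.str.count(','))
df = df.join(counts.add_suffix('_count'))
print(df[["col1_count", "col2_count"]])
```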

Regarding finding the number of elements in the list, it looks like you could simply use str.count() to count the ',' characters in each string. And to apply a defined function to a set of columns, you could do something like:

cols = [5,7,9,13,15]

for col in cols:
    col_counts = {'{}_count'.format(col): df.iloc[:,col].apply(lambda x: x.count(','))}
    df = df.assign(**col_counts)
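A variant of the loop above (a sketch, not the only way) builds all the new columns in one dict comprehension and calls `assign` once. The three-column frame and the positions `[0, 2]` are hypothetical; `'{}_count'` naming follows the loop above.

```python
import pandas as pd

# Hypothetical frame; column B is not a WrappedArray column
df = pd.DataFrame({
    'A': ['WrappedArray(|2008-11-12, |2008-11-12, |2008-10-11)',
          'WrappedArray(|2008-11-12, |2008-11-12)'],
    'B': ['x', 'y'],
    'C': ['WrappedArray(|2008-11-12|2008-11-12)',
          'WrappedArray(|2008-11-12,|2008-11-12)'],
})

cols = [0, 2]  # positional indexes, like [5, 7, 9, 13, 15] in the question

# One dict holding every new column, assigned in a single call.
df = df.assign(**{'{}_count'.format(col): df.iloc[:, col].apply(lambda x: x.count(','))
                  for col in cols})
print(df[['0_count', '2_count']])
```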

Alternatively, you can also use strip('WrappedAway').split(',') as you were using:

def count_elements(x):
    return len(x.strip('WrappedAway').split(',')) - 1

for col in cols:
    col_counts = {'{}_count'.format(col): 
                   df.iloc[:,col].apply(count_elements)}
    df = df.assign(**col_counts)
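One detail worth checking in `count_elements`: Python's `str.strip(chars)` removes any of the given characters from both ends, not the literal text 'WrappedAway'. Here it happens to eat the `WrappedArray` prefix and leave the parentheses, which does not change the comma count. If a literal prefix and suffix should be removed, `str.removeprefix`/`str.removesuffix` (Python 3.9+) do that.

```python
s = 'WrappedArray(|2008-11-12, |2008-11-12)'

# Character-set strip: leading W, r, a, p, e, d, A, y are removed; '(' stops it.
stripped = s.strip('WrappedAway')
print(stripped)  # (|2008-11-12, |2008-11-12)

# Literal prefix/suffix removal (Python 3.9+)
literal = s.removeprefix('WrappedArray(').removesuffix(')')
print(literal)  # |2008-11-12, |2008-11-12

def count_elements(x):
    return len(x.strip('WrappedAway').split(',')) - 1

# Stripping leaves the commas untouched, so this agrees with x.count(',').
for x in [s, 'WrappedArray(|2008-11-12|2008-11-12)', 'WrappedArray()']:
    print(count_elements(x), x.count(','))
```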

So, for example, with the following dataframe:

df = pd.DataFrame({'A': ['WrappedArray(|2008-11-12, |2008-11-12, |2008-10-11)', 'WrappedArray(|2008-11-12, |2008-11-12)'],
                   'B': ['WrappedArray(|2008-11-12,|2008-11-12)', 'WrappedArray(|2008-11-12, |2008-11-12)'],
                   'C': ['WrappedArray(|2008-11-12|2008-11-12)', 'WrappedArray(|2008-11-12|2008-11-12)']})

Redefining the set of columns in which we want to count the number of elements:

for col in [0,1,2]:
    col_counts = {'{}_count'.format(col): 
                  df.iloc[:,col].apply(count_elements)}
    df = df.assign(**col_counts)

Would yield:

                                                   A  \
0  WrappedArray(|2008-11-12, |2008-11-12, |2008-1...   
1             WrappedArray(|2008-11-12, |2008-11-12)   

                                    B  \
0   WrappedArray(|2008-11-12,|2008-11-12)   
1  WrappedArray(|2008-11-12, |2008-11-12)   

                                  C         0_count  1_count  2_count  
0  WrappedArray(|2008-11-12|2008-11-12)        2        1        0  
1  WrappedArray(|2008-11-12|2008-11-12)        1        1        0 

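Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM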