How to create new column of lists in Pandas dataframe using existing column names and values?

Question

I'm currently using R for data science and I'm learning Python and Pandas to expand my toolkit. I'd like to create a new column of lists within a Pandas dataframe using existing column names and values.

For the following Pandas dataframe:

  test1  test2  test3
1      0      1      1
2      0      1      0
3      1      1      1
4      1      0      0
5      0      0      0

A new column would contain a list for each row that takes the column name wherever there's a '1' value, strips the 'test' prefix, and concatenates the list using a '-' separator.

   test1  test2  test3  combo
0      0      1      1    2-3
1      0      1      0      2
2      1      1      1  1-2-3
3      1      0      0      1
4      0      0      0

I can create the column in R and data.table using the following code:

df [, combo := apply (df == 1, 1, function(x) {
   paste (gsub("test", "", names(which(x))), collapse = "-")
}
)]

This is the closest I've come in Pandas:

def test(x):
    paste(loc[x])

df['combo'] = df.apply(test, df == 1, axis = 1)

TypeError: apply() got multiple values for argument 'axis'

Am I on the right path?

Answer 1

df['combo'] = df.apply(lambda x: '-'.join(list(x[x == 1].index)).replace('test', ''), axis=1)

produces the following output:

In [8]: df
Out[8]:
   test1  test2  test3  combo
0      0      1      1    2-3
1      0      1      0      2
2      1      1      1  1-2-3
3      1      0      0      1
4      0      0      0

The function lambda x: '-'.join(list(x[x == 1].index)).replace('test', '') picks the index of series elements that are equal to 1. The index for rows are the column names test1, test2, test3 , so after joining the list, it is necessary to replace 'test' from the resulting string with '' .

And we need to apply this function along rows, hence we pass axis=1 . The default axis=0 applies the function along columns.

Answer 2

You can first rename column and then use apply to extract the column names and then join them.

df['combo'] = (
   df.rename(columns=lambda x: x.replace('test',''))
   .astype(bool)
   .apply(lambda x: '-'.join(x.loc[x].index), axis=1)
)

df
Out[15]: 
   test1  test2  test3  combo
1      0      1      1    2-3
2      0      1      0      2
3      1      1      1  1-2-3
4      1      0      0      1
5      0      0      0

How to create new column of lists in Pandas dataframe using existing column names and values?

Question

2 answers

solution1
1 ACCPTED 2018-03-05 19:12:52

solution2
0 2018-03-05 22:47:18

How to create new column of lists in Pandas dataframe using existing column names and values?

Question

2 answers

solution1 1 ACCPTED 2018-03-05 19:12:52

solution2 0 2018-03-05 22:47:18

solution1
1 ACCPTED 2018-03-05 19:12:52

solution2
0 2018-03-05 22:47:18