
Apply function to pandas dataframe that returns multiple rows

I would like to apply a function to a pandas DataFrame that splits some of its rows into two. For example, I may have this as input:

import pandas as pd
df = pd.DataFrame([{'one': 3, 'two': 'a'}, {'one': 5, 'two': 'b,c'}], index=['i1', 'i2'])
    one  two
i1    3    a
i2    5  b,c

And I want something like this as output:

      one  two
i1      3    a
i2_0    5    b
i2_1    5    c

My hope was that I could just use apply() on the DataFrame, calling a function that itself returns a DataFrame with one or more rows, and have the results merged back together. However, this does not seem to work at all. Here is a test case where I am just trying to duplicate each row:

dfa = df.apply(lambda s: pd.DataFrame([s.to_dict(), s.to_dict()]), axis=1)
    one  two
i1  one  two
i2  one  two

So if I return a DataFrame, the column names of that DataFrame seem to become the contents of the rows. This is obviously not what I want.

There is another question on here that was solved by using .groupby(), but I don't think that applies to my case, since I don't actually want to group by anything.

What is the correct way to do this?
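The "function returns one or more rows" idea does work if the per-row DataFrames are concatenated explicitly with pd.concat instead of being returned through apply(). A minimal sketch, assuming the column names one/two from the example; split_row is a hypothetical helper name, and df.itertuples() is used in place of apply() to walk the rows:

```python
import pandas as pd

df = pd.DataFrame([{'one': 3, 'two': 'a'}, {'one': 5, 'two': 'b,c'}],
                  index=['i1', 'i2'])


def split_row(row):
    """Return a small DataFrame with one row per comma-separated value of row.two."""
    parts = row.two.split(',')
    # Unsplit rows keep their label; split rows get _0, _1, ... suffixes.
    if len(parts) == 1:
        index = [row.Index]
    else:
        index = [f'{row.Index}_{k}' for k in range(len(parts))]
    # The scalar row.one is broadcast to every new row.
    return pd.DataFrame({'one': row.one, 'two': parts}, index=index)


# Call the function once per row, then stack the pieces vertically.
out = pd.concat([split_row(row) for row in df.itertuples()])
print(out)
```

This is a plain Python loop, so it is fine for small frames but will be slower than vectorized string methods on large ones.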

Your data is not tidy: column two holds comma-separated strings where you should really have separate columns. We fix that first:

df2 = pd.concat([df['one'], pd.DataFrame(df.two.str.split(',').tolist(), index=df.index)], axis=1)

This gives us something neater:

In[126]: df2
Out[126]: 
    one  0     1
i1    3  a  None
i2    5  b     c
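The same df2 can be built a little more directly with the expand=True option of Series.str.split, which already returns a DataFrame aligned on df's index, so the .tolist() round trip is not needed. A sketch, using the same example df:

```python
import pandas as pd

df = pd.DataFrame([{'one': 3, 'two': 'a'}, {'one': 5, 'two': 'b,c'}],
                  index=['i1', 'i2'])

# expand=True gives one column per split position (0, 1, ...);
# rows with fewer pieces are padded with missing values.
parts = df['two'].str.split(',', expand=True)
df2 = pd.concat([df['one'], parts], axis=1)
print(df2)
```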

Now we can stack the split values into a single column:

In[125]: df2.set_index('one').unstack().dropna()
Out[125]: 
   one
0  3      a
   5      b
1  5      c

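The result is a Series indexed by (split position, value of one); the original labels i1/i2 are not carried over. Adjusting the index (if desired) is left to the reader as an exercise.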
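If you want exactly the i1, i2_0, i2_1 layout from the question, one option on pandas 0.25 or later is DataFrame.explode, which is a different technique from the unstack approach above. A sketch, assuming suffixes should only be added to labels that end up repeated:

```python
import pandas as pd

df = pd.DataFrame([{'one': 3, 'two': 'a'}, {'one': 5, 'two': 'b,c'}],
                  index=['i1', 'i2'])

# Turn each string into a list, then give every list element its own row.
# explode() repeats the original index label for each new row.
out = df.assign(two=df['two'].str.split(',')).explode('two')

# Position of each row within its original label: 0, 1, ...
pos = out.groupby(level=0).cumcount()
# Only labels that now occur more than once get a suffix.
repeated = out.index.duplicated(keep=False)
out.index = [f'{label}_{k}' if rep else label
             for label, k, rep in zip(out.index, pos, repeated)]
print(out)
```

Because explode keeps the original labels, the renaming step only has to number the duplicates rather than reconstruct which row each value came from.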
