I would like to apply a function to a pandas DataFrame that splits some of its rows into two. For example, I may have this as input:
import pandas as pd
df = pd.DataFrame([{'one': 3, 'two': 'a'}, {'one': 5, 'two': 'b,c'}], index=['i1', 'i2'])
    one  two
i1    3    a
i2    5  b,c
And I want something like this as output:
      one  two
i1      3    a
i2_0    5    b
i2_1    5    c
My hope was that I could just use apply() on the DataFrame, calling a function that itself returns a DataFrame with one or more rows, and that these would then be merged back together. However, this does not work at all. Here is a test case where I am just trying to duplicate each row:
dfa = df.apply(lambda s: pd.DataFrame([s.to_dict(), s.to_dict()]), axis=1)
    one  two
i1  one  two
i2  one  two
So when I return a DataFrame, its column names seem to become the contents of the rows, which is obviously not what I want.
There is another question on here that was solved using .groupby(), but I don't think that applies to my case, since I don't actually want to group by anything.
What is the correct way to do this?
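The idea in the question, a function that returns one small DataFrame per row which then gets merged back together, can work if the merging is done explicitly with `pd.concat` instead of relying on `apply()` to stitch the frames together. Below is a minimal sketch, assuming the column to split is `two` and the separator is a comma; `split_row` is a hypothetical helper, not a pandas function. Unsplit rows keep their label, and split rows get `_0`, `_1`, ... suffixes as in the desired output.

```python
import pandas as pd


def split_row(label, row, col="two", sep=","):
    """Hypothetical helper: return a DataFrame with one row per piece of row[col]."""
    pieces = row[col].split(sep)
    # Keep the original label for unsplit rows; suffix split rows with _0, _1, ...
    if len(pieces) == 1:
        labels = [label]
    else:
        labels = [f"{label}_{k}" for k in range(len(pieces))]
    # Copy every other column unchanged and replace `col` with a single piece.
    records = [{**row.to_dict(), col: piece} for piece in pieces]
    return pd.DataFrame(records, index=labels)


df = pd.DataFrame([{'one': 3, 'two': 'a'}, {'one': 5, 'two': 'b,c'}], index=['i1', 'i2'])

# Build one small frame per row, then concatenate them in the original order.
result = pd.concat([split_row(label, row) for label, row in df.iterrows()])
print(result)
```

Since this loops over rows in Python with `iterrows()`, it is best suited to modestly sized frames.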
Your data is badly structured: you have a comma-separated string where you should have separate columns. We first fix that:
df2 = pd.concat([df['one'], pd.DataFrame(df.two.str.split(',').tolist(), index=df.index)], axis=1)
which gives us something neater:
In [126]: df2
Out[126]:
    one  0     1
i1    3  a  None
i2    5  b     c
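pandas can also build this intermediate frame a bit more directly: `Series.str.split` with `expand=True` returns the pieces as separate integer-labelled columns, padding shorter rows with a missing value. A sketch of that variant, using the same `df` as in the question (the missing entry may display as `None` or `NaN` depending on the pandas version):

```python
import pandas as pd

df = pd.DataFrame([{'one': 3, 'two': 'a'}, {'one': 5, 'two': 'b,c'}], index=['i1', 'i2'])

# expand=True returns a DataFrame with integer column labels 0, 1, ...
pieces = df['two'].str.split(',', expand=True)

# Align on the original index and put column 'one' in front.
df2 = df[['one']].join(pieces)
print(df2)
```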
Now we can just do:
In [125]: df2.set_index('one').unstack().dropna()
Out[125]:
   one
0  3      a
   5      b
1  5      c
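Note that this result is indexed by the split position and the value of one, not by the original labels i1 and i2. Adjusting the index (if desired) is left to the reader as an exercise.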
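If you need exactly the output asked for in the question, with the original labels kept and `_0`, `_1` suffixes on split rows, one option on newer pandas (0.25 and later, which added `DataFrame.explode`) is to turn the strings into lists and explode them. This is a different technique from the unstack approach above, sketched here under the same assumption that `two` is the column to split on commas:

```python
import pandas as pd

df = pd.DataFrame([{'one': 3, 'two': 'a'}, {'one': 5, 'two': 'b,c'}], index=['i1', 'i2'])

# Replace each string with a list of pieces, then give every piece its own row.
# explode() repeats the original index label for each piece.
out = df.assign(two=df['two'].str.split(',')).explode('two')

# Position of each row within its original label (0, 1, ...)
pos = out.groupby(level=0).cumcount()
# Number of pieces each original label was split into
n_pieces = out.index.map(out.index.value_counts())

# Keep unsplit labels as-is; suffix split ones with _0, _1, ...
out.index = [lbl if n == 1 else f"{lbl}_{k}"
             for lbl, k, n in zip(out.index, pos, n_pieces)]
print(out)
```

Other columns such as `one` are simply repeated for each piece, so their dtype is preserved.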