I have a pandas DataFrame where each cell is a set of numbers. I would like to go through the DataFrame and run each number along with the row index in a function. What's the most pandas-esque and efficient way to do this? Here's an example of one way to do it with for-loops, but I'm hopeful that there's a better approach.
import pandas as pd

def my_func(a, b):
    pass

d = {"a": [{1}, {4}], "b": [{1, 2, 3}, {2}]}
df = pd.DataFrame(d)

for index, item in df.iterrows():  # each row
    for j in item:                 # each cell (a set)
        for a in list(j):          # each number in the set
            my_func(index, a)
Instead of iterating, we can reshape the values into a single column using stack, then explode into separate rows:
s = df.stack().explode()
s:
0  a    1
   b    1
   b    2
   b    3
1  a    4
   b    2
dtype: object
We can further droplevel if we don't want the old column labels:
s = df.stack().explode().droplevel(1)
s:
0    1
0    1
0    2
0    3
1    4
1    2
dtype: object
reset_index can be used to create a DataFrame instead of a Series:
new_df = df.stack().explode().droplevel(1).reset_index()
new_df.columns = ['a', 'b']  # Rename columns to whatever
new_df:
   a  b
0  0  1
1  0  1
2  0  2
3  0  3
4  1  4
5  1  2
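To tie the reshaping back to the original question, here is a minimal sketch of calling a function once per (row index, number) pair. The body of my_func and the results list are placeholders I made up for illustration; the question only gives an empty function.

```python
import pandas as pd


def my_func(index, value):
    # Hypothetical stand-in for the question's function: just echo the pair
    return (index, value)


df = pd.DataFrame({"a": [{1}, {4}], "b": [{1, 2, 3}, {2}]})

# One entry per number, indexed by the original row label
s = df.stack().explode().droplevel(1)

# Series.items() yields (index, value) pairs, so no nested loops are needed
results = [my_func(index, value) for index, value in s.items()]
```

The function is still called once per element in Python, so this mainly removes the nested loops rather than vectorizing my_func itself. Sets are unordered, so the order of numbers within a cell should not be relied on.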
If I understood your problem correctly, this might be one way of doing it:
[list(item) for sublist in df.values.tolist() for item in sublist]
The output will look like this:
[[1], [1, 2, 3], [4], [2]]
Since this is a nested list, you can flatten it if you need a single list.
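As a rough sketch of that flattening step, itertools.chain.from_iterable can collapse the nested list. Note that the flat list no longer carries the row index the question asks for, so a second variant (my own addition, with the illustrative names nested, flat and pairs) keeps the index alongside each number.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({"a": [{1}, {4}], "b": [{1, 2, 3}, {2}]})

# Nested list from the answer: one inner list per cell
nested = [list(item) for sublist in df.values.tolist() for item in sublist]

# Flatten one level into a single list of numbers
flat = list(chain.from_iterable(nested))

# Variant that keeps the row index with each number
pairs = [
    (index, value)
    for index, row in zip(df.index, df.values.tolist())
    for cell in row
    for value in cell
]
```

Each tuple in pairs can then be unpacked into a call such as my_func(index, value).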