Extract all elements from sets in pandas DataFrame

I have a pandas DataFrame where each cell is a set of numbers. I would like to go through the DataFrame and run each number along with the row index in a function. What's the most pandas-esque and efficient way to do this? Here's an example of one way to do it with for-loops, but I'm hopeful that there's a better approach.

import pandas as pd


def my_func(a, b):
    pass


d = {"a": [{1}, {4}], "b": [{1, 2, 3}, {2}]}
df = pd.DataFrame(d)

# Visit every row, then every cell (a set), then every number in that set.
for index, item in df.iterrows():
    for j in item:
        for a in j:
            my_func(index, a)

Instead of iterating, we can reshape the values into a single column with stack, then use explode to put each set element on its own row:

s = df.stack().explode()

s:
0  a    1
   b    1
   b    2
   b    3
1  a    4
   b    2
dtype: object
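Each entry of this Series is keyed by a `(row, column)` tuple from the MultiIndex, so both the row index and the original column label are still available at this point. A minimal sketch using the question's `df`; the name `triples` is only illustrative:

```python
import pandas as pd

d = {"a": [{1}, {4}], "b": [{1, 2, 3}, {2}]}
df = pd.DataFrame(d)

# stack() moves the column labels into an inner index level;
# explode() then gives every element of each set its own row.
s = df.stack().explode()

# Each index entry is a (row, column) tuple, so unpack it alongside the value.
triples = [(row, col, val) for (row, col), val in s.items()]
print(triples)
```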

We can also use droplevel to remove the old column labels if we don't need them:

s = df.stack().explode().droplevel(1)

s:

0    1
0    1
0    2
0    3
1    4
1    2
dtype: object
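For the original goal of calling `my_func(index, number)`, this Series can be walked with `Series.items()`, which yields `(index, value)` pairs. A minimal sketch: the body of `my_func` and the `calls` list are hypothetical stand-ins (the question's `my_func` only does `pass`), added so the calls can be inspected:

```python
import pandas as pd

calls = []


def my_func(a, b):
    # Hypothetical body: just record the (row index, element) pair received.
    calls.append((a, b))


d = {"a": [{1}, {4}], "b": [{1, 2, 3}, {2}]}
df = pd.DataFrame(d)

# One entry per set element, indexed only by the original row label.
s = df.stack().explode().droplevel(1)

# items() yields (index, value) pairs in order, replacing the nested loops.
for index, value in s.items():
    my_func(index, value)

print(calls)
```

This removes the nested loops, but `my_func` is still called once per element in Python; the work only becomes vectorized if `my_func` itself can be rewritten to accept whole arrays or Series.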

reset_index can be used to create a DataFrame instead of a Series:

new_df = df.stack().explode().droplevel(1).reset_index()
new_df.columns = ['a', 'b']  # Rename the columns as needed

new_df:

   a  b
0  0  1
1  0  1
2  0  2
3  0  3
4  1  4
5  1  2
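As an alternative to assigning `new_df.columns` afterwards, the names can be set during the conversion: `rename_axis` names the index (which becomes the first column) and `reset_index(name=...)` names the values column. A sketch; the column names `row` and `value` are only examples:

```python
import pandas as pd

d = {"a": [{1}, {4}], "b": [{1, 2, 3}, {2}]}
df = pd.DataFrame(d)

new_df = (
    df.stack()
    .explode()
    .droplevel(1)
    .rename_axis("row")         # name the index; it becomes the first column
    .reset_index(name="value")  # name the column holding the set elements
)
print(new_df)
```

After `explode`, the `value` column has `object` dtype; `new_df["value"].astype(int)` converts it if a numeric column is needed.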

If I've understood your problem correctly, this might be one way of doing it:

[list(item) for sublist in df.values.tolist() for item in sublist]

The output will look like this:

[[1], [1, 2, 3], [4], [2]]

Since this is a nested list, you can flatten it if you need a single list. Note that, unlike the stack/explode approach, this drops the row index that my_func needs.
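The flattening can be done with `itertools.chain.from_iterable`, and the row index can be kept by pairing each row of `df.values.tolist()` with `df.index` while iterating. A minimal sketch using the question's `df`; the names `nested`, `flat` and `pairs` are only illustrative:

```python
from itertools import chain

import pandas as pd

d = {"a": [{1}, {4}], "b": [{1, 2, 3}, {2}]}
df = pd.DataFrame(d)

nested = [list(item) for sublist in df.values.tolist() for item in sublist]

# Flatten the nested list into one list of numbers (row labels are lost).
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 1, 2, 3, 4, 2]

# Keep the row label next to each element by zipping with df.index.
pairs = [
    (index, value)
    for index, row in zip(df.index, df.values.tolist())
    for cell in row
    for value in cell
]
print(pairs)  # [(0, 1), (0, 1), (0, 2), (0, 3), (1, 4), (1, 2)]
```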
