
Iterate through rows in pandas dataframe and match tuples from a list and create a new df column

I have a dataframe with a column of tuples (df.row_col) that I need to search using a list of tuples. If a tuple from the list is present in the dataframe column, I want to return that row and add a new column to the dataframe. I tried the loop below, but I'm not sure I can loop through a list like this. Much appreciate the help!

    data_tuples = [(7, 45),
                   (13, 34),
                   (17, 51),
                   (17, 52),
                   (17, 53),
                   (17, 54),
                   (17, 55),
                   (18, 50)]

Dataframe to search:

           index  farm  layer  row  column  Qmax  row_col
        0      1     1      3    7      36   0.0  (7, 36)
        1      2     1      3    7      37   0.0  (7, 37)
        2      3     1      3    8      35   0.0  (8, 35)
        3      4     1      3    8      36   0.0  (8, 36)
        4      5     1      3    8      37   0.0  (8, 37)

def filter_rows(df, data_tuples):
    for tup in data_tuples:
        new_df = df[df["row_col"].apply(lambda x: True if tup in x else False)]
        return new_df
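A quick check of why the attempt above finds no rows: on a tuple, `in` tests element membership, so `(7, 45) in (7, 45)` is `False` because the elements are the integers 7 and 45, not the tuple itself. The `return` inside the loop also stops after the first tuple in data_tuples. The snippet below only exercises plain Python tuples, using values from the question's data_tuples.

```python
# A single row_col value from the DataFrame.
row_value = (7, 45)

# `in` on a tuple checks whether the left operand is one of its elements.
print((7, 45) in row_value)      # the elements are 7 and 45, so this is False
print(7 in row_value)            # 7 is an element, so this is True

# What the filter actually needs: is the row's tuple an element of the list?
data_tuples = [(7, 45), (13, 34)]
print(row_value in data_tuples)  # True
```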

You can use Series.map(...) to accomplish what you're trying to do. First, you can create a boolean mask (a column of True/False) based on whether the tuple is present in data_tuples or not:

tuple_present_in_list = df["row_col"].map(lambda x: x in data_tuples)

Then, you can filter your original DataFrame down to just those rows (if that's what you're trying to do):

new_df = df[tuple_present_in_list]

The key thing here is that .map() applies your function to each value of a single column (the "row_col" Series), checking whether that tuple is in data_tuples, so you don't need an explicit loop over data_tuples.
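A self-contained sketch of the map approach. The DataFrame mirrors the question's sample rows, and the lookup list is the one from the isin example further down, because the question's own data_tuples has no matches among these five rows. Converting the list to a set is an optional tweak for faster membership tests on long lists; the filtering logic is the same as above.

```python
import pandas as pd

# Sample rows mirroring the question's table; row_col holds (row, column) tuples.
df = pd.DataFrame({
    "index": [1, 2, 3, 4, 5],
    "farm": [1, 1, 1, 1, 1],
    "layer": [3, 3, 3, 3, 3],
    "row": [7, 7, 8, 8, 8],
    "column": [36, 37, 35, 36, 37],
    "Qmax": [0.0] * 5,
    "row_col": [(7, 36), (7, 37), (8, 35), (8, 36), (8, 37)],
})

data_tuples = [(8, 36), (7, 37)]
lookup = set(data_tuples)  # tuples are hashable, so a set gives fast lookups

# Boolean mask: True where the row's tuple appears in the lookup set.
tuple_present_in_list = df["row_col"].map(lambda x: x in lookup)

# Keep only the matching rows; no loop over data_tuples is needed.
new_df = df[tuple_present_in_list]
print(new_df)
```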

Here's another answer about the difference between apply and map: Difference between map, applymap and apply methods in Pandas

And here's the pandas documentation for .map(): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html

isin lets you check, for each value in a Series, whether it is contained in a list (or other iterable).

For example, if you have the following:

data_tuples = [
         (8, 36),
         (7, 37)
]

df
+----+-----+---------+--------+---------+-------+----------+--------+-----------+
|    |   a |   index |   farm |   layer |   row |   column |   Qmax | row_col   |
|----+-----+---------+--------+---------+-------+----------+--------+-----------|
|  0 |   0 |       1 |      1 |       3 |     7 |       36 |      0 | (7, 36)   |
|  1 |   1 |       2 |      1 |       3 |     7 |       37 |      0 | (7, 37)   |
|  2 |   2 |       3 |      1 |       3 |     8 |       35 |      0 | (8, 35)   |
|  3 |   3 |       4 |      1 |       3 |     8 |       36 |      0 | (8, 36)   |
|  4 |   4 |       5 |      1 |       3 |     8 |       37 |      0 | (8, 37)   |
+----+-----+---------+--------+---------+-------+----------+--------+-----------+

Then we can filter with the isin method:

df[df["row_col"].isin(data_tuples)]

+----+-----+---------+--------+---------+-------+----------+--------+-----------+
|    |   a |   index |   farm |   layer |   row |   column |   Qmax | row_col   |
|----+-----+---------+--------+---------+-------+----------+--------+-----------|
|  1 |   1 |       2 |      1 |       3 |     7 |       37 |      0 | (7, 37)   |
|  3 |   3 |       4 |      1 |       3 |     8 |       36 |      0 | (8, 36)   |
+----+-----+---------+--------+---------+-------+----------+--------+-----------+
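The question also asks for a new column. A minimal sketch, assuming the new column should simply flag whether each row's tuple is in the list (the column name in_list is hypothetical): isin returns a boolean Series aligned with df, so it can be assigned directly and then reused to select the matching rows.

```python
import pandas as pd

# Reduced version of the example df; row_col holds (row, column) tuples.
df = pd.DataFrame({
    "row": [7, 7, 8, 8, 8],
    "column": [36, 37, 35, 36, 37],
    "row_col": [(7, 36), (7, 37), (8, 35), (8, 36), (8, 37)],
})
data_tuples = [(8, 36), (7, 37)]

# isin returns a boolean Series aligned with df's index; store it as a column.
# "in_list" is a hypothetical name for the new column.
df["in_list"] = df["row_col"].isin(data_tuples)

# The stored flag doubles as a boolean filter for the matching rows.
matches = df.loc[df["in_list"]]
print(df)
print(matches)
```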
