
Get row and column in Pandas for a cell with a certain value

I am trying to read an Excel spreadsheet that is unformatted using Pandas. There are multiple tables within a single sheet and I want to convert these tables into dataframes. Since it is not already "indexed" in the traditional way, there are no meaningful column or row indices. Is there a way to search for a specific value and get the row, column where that is? For example, say I want to get a row, column number for all cells that contain the string "Title".

I have already tried things like DataFrame.filter but that only works if there are row and column indices.

Create a df with NaN where your_value is not found.
Drop all rows that don't contain the value.
Drop all columns that don't contain the value.

a = df.where(df=='your_value').dropna(how='all').dropna(axis=1)

To get the row(s)

a.index

To get the column(s)

a.columns  
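
As a rough, runnable sketch of the steps above: the sample data is hypothetical and only imitates what a sheet read without headers might look like. Unlike the one-liner above, this sketch passes how='all' to both dropna calls, so a column is kept even when only some of the surviving rows match.

```python
import pandas as pd

# Hypothetical sheet content with default integer labels on both axes,
# similar to what reading a sheet with header=None would give.
df = pd.DataFrame([
    ["Title", "x", "y"],
    ["a", "b", "c"],
    ["d", "e", "Title"],
])

# Keep matching cells, turn everything else into NaN,
# then drop the rows and columns that are entirely NaN.
a = df.where(df == "Title").dropna(how="all").dropna(how="all", axis=1)

rows = a.index.tolist()     # row labels that contain the value
cols = a.columns.tolist()   # column labels that contain the value
print(rows, cols)
```

Note that this yields the rows and the columns separately. It does not say which row goes with which column, so with several matches you still have to check the individual cells of a.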

You can simply create a mask of the same shape as your df by calling df == 'title'. You can then combine this with the df.where() method, which sets every field that differs from your keyword to NA, and finally use dropna() to reduce it to the valid fields. Then you can use .columns and .index as you're used to. Note that dropna() with its default how='any' keeps only the rows in which every column matches.

import pandas as pd

df = pd.DataFrame({"a": [0,1,2], "b": [0, 9, 7]})
print(df.where(df == 0).dropna().index)
print(df.where(df == 0).dropna().columns)

#Int64Index([0], dtype='int64')
#Index(['a', 'b'], dtype='object')
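
If you need explicit (row, column) pairs rather than two separate indexes, one possible variation on the mask idea is to stack the boolean mask and keep the True entries. This is only a sketch on the same sample frame; the names mask, stacked and pairs are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2], "b": [0, 9, 7]})

mask = df == 0                            # boolean frame, same shape as df
stacked = mask.stack()                    # Series indexed by (row label, column label)
pairs = stacked[stacked].index.tolist()   # keep only the True cells
print(pairs)
```

Stacking the boolean mask (which has no missing values) instead of the result of df.where() avoids relying on how stack() treats NaN.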

You can use a long, hard-to-read list comprehension:

import pandas as pd

# assume this df and that we are looking for 'abc'
df = pd.DataFrame({'col':['abc', 'def','wert','abc'], 'col2':['asdf', 'abc', 'sdfg', 'def']})

[(df[col][df[col].eq('abc')].index[i], df.columns.get_loc(col)) for col in df.columns for i in range(len(df[col][df[col].eq('abc')].index))]

out:

[(0, 0), (3, 0), (1, 1)]

I should note that each pair is (index value, column location).

You can also change .eq() to .str.contains() if you are looking for any string that contains a certain value:

[(df[col][df[col].str.contains('ab')].index[i], df.columns.get_loc(col)) for col in df.columns for i in range(len(df[col][df[col].str.contains('ab')].index))]
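
A shorter alternative for the exact-match case, offered as a sketch rather than a drop-in replacement, is to hand the boolean mask to NumPy. np.argwhere returns positions, not index labels, so the result only equals the (index value, column location) pairs above when the frame has the default 0..n-1 index. The pairs come out in row order instead of column order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['abc', 'def', 'wert', 'abc'],
                   'col2': ['asdf', 'abc', 'sdfg', 'def']})

# One [row position, column position] entry per True cell of the mask.
positions = np.argwhere(df.eq('abc').to_numpy())
pairs = [(int(r), int(c)) for r, c in positions]
print(pairs)  # [(0, 0), (1, 1), (3, 0)]
```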

Here's an example that fetches the row and column indices of all cells containing the word 'title':

import pandas as pd

df = pd.DataFrame({'A':['here goes the title', 'tt', 'we have title here'],
                  'B': ['ty', 'title', 'complex']})
df


+---+---------------------+---------+
|   |          A          |    B    |
+---+---------------------+---------+
| 0 | here goes the title | ty      |
| 1 | tt                  | title   |
| 2 | we have title here  | complex |
+---+---------------------+---------+


idx = df.apply(lambda x: x.str.contains('title'))

col_idx = []
for i in range(df.shape[1]):
    col_idx.append(df.iloc[:,i][idx.iloc[:,i]].index.tolist())


out = []
cnt = 0
for i in col_idx:
    for j in range(len(i)):
        out.append((i[j], cnt))
    cnt += 1
out

# [(0, 0), (2, 0), (1, 1)]   # Expected output
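
An unformatted sheet often mixes text, numbers and empty cells, and .str.contains raises an error on a non-string column. A sketch that works around this by converting everything to text first is shown below. The mixed-type sample data is hypothetical, and the sort at the end only reproduces the column-by-column ordering of the loop above.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type data, as an unformatted sheet might produce.
df = pd.DataFrame({'A': ['here goes the title', 42, 'we have title here'],
                   'B': [np.nan, 'title', 'complex']})

# Convert every cell to text so that .str.contains works on any column.
mask = df.astype(str).apply(lambda s: s.str.contains('title', regex=False))

# Positions of the True cells, ordered by column first and then by row.
rows, cols = np.nonzero(mask.to_numpy())
out = sorted(zip(rows.tolist(), cols.tolist()), key=lambda rc: (rc[1], rc[0]))
print(out)
```

One caveat: astype(str) turns missing cells into the text 'nan', so a search for a substring such as 'na' would also match empty cells.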

Another approach that's in the vein of @It_is_Chris's solution, but may be a little easier to read:

import pandas as pd

# assuming this df and that we are looking for 'abc'
df = pd.DataFrame({'col':['abc', 'def','wert','abc'], 'col2':['asdf', 'abc', 'sdfg', 'def']})
[x[1:] for x in ((v, i, j) for i, row_tup in enumerate(df.itertuples(index=False)) for j, v in enumerate(row_tup)) if x[0] == "abc"]

Output

[(0, 0), (1, 1), (3, 0)]

@firefly's answer also works if the second dropna is given how='all', like this:

a = df.where(df == 'your_value').dropna(how='all').dropna(how='all',axis=1)

Similar to what Chris said, I found this to work for me, although it's not the prettiest or shortest way. This returns all the row,column pairs matching a regular expression in a dataframe:

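import re

# find_matches is an illustrative name; the original snippet showed only the function body.
def find_matches(df, pattern):
    regex = re.compile(pattern)
    tuples = []
    # index=False keeps the index out of each row, so col_count lines up with the columns
    for row_count, row in enumerate(df.itertuples(index=False)):
        for col_count, col in enumerate(row):
            # regex.match only matches at the start of the cell text
            if regex.match(str(col)):
                tuples.append((row_count, col_count))
    return tuples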
