
How do I slice a two column pandas dataframe starting with a row containing a given string?

import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C', 'D'],
                   'B': [4, 5, 6, 7]})

A  B
A  4
B  5
C  6
D  7

I would like a way to return all rows starting at the first row that contains a given string, say 'B', in column A.

A  B
B  5
C  6
D  7

Go Deacs!

If the string always exists, you can call idxmax() on a boolean Series to find the index of the string's first occurrence, then call tail() with a negative argument to drop the rows before that index:

df.tail(-(df.A == "B").idxmax())
# This works only if the string exists in the column, is not in the first row
# (idxmax() would then be 0, and tail(0) returns an empty frame), and the index
# of the data frame is the default sequence given by range(n).

#   A   B
#1  B   5
#2  C   6
#3  D   7
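
The caveats on the idxmax() approach can be checked directly. The following is a small sketch on the question's data; the variable names are illustrative only. It shows that idxmax() on an all-False mask still returns the first label, and that a match in the first row makes tail() return nothing.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C', 'D'], 'B': [4, 5, 6, 7]})

# Pitfall 1: the string is absent. The mask is all False, and idxmax()
# still returns the first index label instead of signalling "not found".
mask_missing = df.A == "Z"
first_missing = mask_missing.idxmax()
has_match = bool(mask_missing.any())   # check this before trusting idxmax()

# Pitfall 2: the string is in the first row. idxmax() is 0, so the call
# becomes df.tail(0), which yields an empty frame rather than all rows.
first_row = (df.A == "A").idxmax()
empty_by_accident = df.tail(-first_row)

print(first_missing, has_match, len(empty_by_accident))
```

Checking mask.any() first is one way to tell a genuine match at label 0 apart from no match at all.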

Another, probably safer, method still works even if the string doesn't exist in the column (it then returns an empty data frame):

df[(df.A == "B").cumsum().astype(bool)]  

#   A   B
#1  B   5
#2  C   6
#3  D   7
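
To see why the cumsum() mask works, it may help to look at the intermediate values. This is a sketch on the question's data, with variable names chosen for illustration: the running count stays 0 until the first match and is positive from then on, so converting it to bool keeps every row from the first match onward.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C', 'D'], 'B': [4, 5, 6, 7]})

hits = df.A == "B"            # True only where column A equals "B"
running = hits.cumsum()       # running count of matches seen so far
keep = running.astype(bool)   # False before the first match, True afterwards
result = df[keep]

# If the string never occurs, the running count stays 0 everywhere,
# so the mask is all False and the result is an empty frame.
missing = df[(df.A == "Z").cumsum().astype(bool)]

print(running.tolist())
print(result)
```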

Presuming the data in column A is sorted in alphabetical order, you could filter with a string comparison, that is, something like

df[df['A'] >= 'B']

would do the trick.
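
The sorting assumption matters. As a sketch, take a hypothetical reordered copy of the question's data (df_unsorted is not from the original post): the comparison selects rows by value, not by position, so it no longer matches "everything from the first 'B' onward".

```python
import pandas as pd

# Hypothetical unsorted variant of the question's data.
df_unsorted = pd.DataFrame({'A': ['D', 'B', 'A', 'C'], 'B': [7, 5, 4, 6]})

# Value-based filter: keeps every row whose string compares >= 'B',
# including the 'D' row that comes before 'B', and drops the 'A' row after it.
by_comparison = df_unsorted[df_unsorted['A'] >= 'B']

# Position-based mask: keeps every row from the first 'B' onward.
by_position = df_unsorted[(df_unsorted['A'] == 'B').cumsum().astype(bool)]

print(by_comparison['A'].tolist())
print(by_position['A'].tolist())
```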

You can use this solution even if column A is not sorted in alphabetical order.

If column A contains the value B more than once, the resulting data frame starts from the row where B first occurs.

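idx = df[df['A'] == 'B'].index[0]   # index label of the first matching row
df = df.loc[idx:]                   # label-based slice, so it also works with a non-default index
print(df)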
   A  B
1  B  5
2  C  6
3  D  7
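
As a sketch of the repeated-value case, consider a hypothetical frame (df2, not from the original post) with 'B' appearing twice and a string index. It assumes label-based slicing with .loc, so the index label returned by .index[0] is used as a label rather than as a position.

```python
import pandas as pd

# Hypothetical data: 'B' occurs twice and the index is made of letters.
df2 = pd.DataFrame({'A': ['A', 'B', 'C', 'B', 'D'],
                    'B': [4, 5, 6, 8, 7]},
                   index=list('vwxyz'))

idx = df2[df2['A'] == 'B'].index[0]   # label of the first 'B' row
result = df2.loc[idx:]                # everything from that label onward

print(idx)
print(result)
```

The second 'B' row is kept as an ordinary row, because only the first occurrence decides where the slice starts.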

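An answer that generalizes well, because it works on row positions rather than index labels, could use numpy.argwhere:

import numpy as np

idx = np.argwhere((df.A == 'B').to_numpy())[0][0]   # position of the first match
df.iloc[idx:]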
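
The positional idea can be wrapped in a small function. This is a minimal sketch, not part of the original answer: slice_from_first is a hypothetical helper name, and it swaps in np.flatnonzero for argwhere because that gives a flat array of positions and makes the "no match" case easy to detect.

```python
import numpy as np
import pandas as pd

def slice_from_first(frame, column, value):
    """Hypothetical helper: rows from the first occurrence of value in column."""
    # Positions (not labels) of the matching rows.
    positions = np.flatnonzero((frame[column] == value).to_numpy())
    if positions.size == 0:
        return frame.iloc[0:0]        # no match: return an empty frame
    return frame.iloc[positions[0]:]  # positional slice from the first match

# Example with a shuffled integer index, where labels and positions differ.
df3 = pd.DataFrame({'A': ['A', 'B', 'C', 'D'], 'B': [4, 5, 6, 7]},
                   index=[10, 3, 7, 0])

from_b = slice_from_first(df3, 'A', 'B')
from_z = slice_from_first(df3, 'A', 'Z')
print(from_b)
```

Because the slice is positional, the result does not depend on whether the index is sorted, unique, or numeric.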

