import pandas as pd
df = pd.DataFrame({'A': ['A', 'B', 'C', 'D'],
                   'B': [4, 5, 6, 7]})
   A  B
0  A  4
1  B  5
2  C  6
3  D  7
I would like a way to return all rows starting from the first row where a given string, say 'B', appears in column A:
   A  B
1  B  5
2  C  6
3  D  7
If the string always exists, you can use idxmax() on a boolean condition Series to find the index of the first appearance of the string, and then use tail() with a negative argument (which returns all rows except the first n) to extract the rows from that index onward:
df.tail(-(df.A == "B").idxmax())  # this method works if the string exists in the column
# and the index of the data frame is a default RangeIndex, as given by range(n);
# if the first match is in row 0, tail(-0) is tail(0) and returns an empty frame
#    A  B
# 1  B  5
# 2  C  6
# 3  D  7
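A minimal sketch of a variant that avoids both caveats, assuming a hypothetical helper name `rows_from_first`: it takes the position (not the label) of the first match from the underlying NumPy array and slices with `iloc`, so it does not rely on a RangeIndex and also handles a match in the first row.

```python
import pandas as pd


def rows_from_first(df: pd.DataFrame, col: str, value) -> pd.DataFrame:
    """Hypothetical helper: rows from the first occurrence of `value` in `col` onward."""
    mask = (df[col] == value).to_numpy()
    if not mask.any():
        # No match: return an empty frame with the same columns
        return df.iloc[0:0]
    # argmax on a boolean array returns the position of the first True
    return df.iloc[int(mask.argmax()):]


df = pd.DataFrame({'A': ['A', 'B', 'C', 'D'], 'B': [4, 5, 6, 7]},
                  index=[10, 20, 30, 40])  # deliberately non-default index
print(rows_from_first(df, 'A', 'B'))
print(rows_from_first(df, 'A', 'A'))  # match in the first row keeps every row
```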
Another, safer method, which still works if the string does not exist in the column (it then returns an empty frame) and does not depend on the index:
df[(df.A == "B").cumsum().astype(bool)]
# A B
#1 B 5
#2 C 6
#3 D 7
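To see why the cumulative sum works: the running count of matches is 0 for every row before the first match and at least 1 from it onward, so casting it to bool gives a mask that switches on at the first match and stays on, even if the value occurs again later. A small sketch, assuming a hypothetical frame `df2` with a repeated 'B':

```python
import pandas as pd

df2 = pd.DataFrame({'A': ['A', 'B', 'C', 'B', 'D'], 'B': [4, 5, 6, 7, 8]})

counts = (df2.A == "B").cumsum()  # running count of matches: 0, 1, 1, 2, 2
mask = counts.astype(bool)        # False before the first match, True from it on
print(df2[mask])
```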
Presuming the data in column A is sorted in alphabetical order, you could use boolean indexing with a string comparison, something like
df[df['A'] >= 'B']
which would do the trick.
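The comparison is lexicographic and applied row by row, so it only matches "rows from B onward" when the column is sorted. A quick check on a hypothetical unsorted frame `df3`, compared against the cumulative-sum mask:

```python
import pandas as pd

df3 = pd.DataFrame({'A': ['C', 'A', 'B', 'D'], 'B': [4, 5, 6, 7]})

# Row-wise string comparison: keeps every row whose value sorts at or after 'B'
by_comparison = df3[df3['A'] >= 'B']
# Position-based: rows from the first 'B' onward
by_position = df3[(df3.A == 'B').cumsum().astype(bool)]
print(by_comparison)  # also keeps the 'C' in row 0
print(by_position)
```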
You can use this solution if column A is not sorted in alphabetical order. If column A contains the value B more than once, the result starts from the row where B occurs for the first time:
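idx = df[df['A'] == 'B'].index[0]  # label of the first matching row (IndexError if none)
df = df.loc[idx:]  # label-based slice from that row to the end
print(df)
   A  B
1  B  5
2  C  6
3  D  7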
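Because `.index[0]` returns a label rather than a position, slicing from it with label-based `.loc` stays correct when the index is not 0..n-1. A sketch on a hypothetical frame `df4` with integer labels 10 to 40, which also shows the IndexError raised when the value is missing:

```python
import pandas as pd

df4 = pd.DataFrame({'A': ['A', 'B', 'C', 'D'], 'B': [4, 5, 6, 7]},
                   index=[10, 20, 30, 40])

idx = df4[df4['A'] == 'B'].index[0]  # label 20, not position 1
print(df4.loc[idx:])                 # rows labelled 20, 30, 40

try:
    df4[df4['A'] == 'Z'].index[0]    # no match: the filtered index is empty
except IndexError:
    print("value not found")
```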
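An answer that generalizes to any index (because it works with positions rather than labels) could use numpy.argwhere:
import numpy as np
idx = np.argwhere(df.A == 'B')[0][0]  # position of the first match (IndexError if none)
df.iloc[idx:]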
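Since `np.argwhere` returns an array of shape (k, 1), `[0][0]` picks the first match's position. A minimal sketch using `np.flatnonzero` as an equivalent 1-D alternative, with a hypothetical helper `first_position` that returns None instead of raising when there is no match, on a hypothetical frame `df5` with string labels:

```python
from typing import Optional

import numpy as np
import pandas as pd

df5 = pd.DataFrame({'A': ['A', 'B', 'C', 'D'], 'B': [4, 5, 6, 7]},
                   index=['w', 'x', 'y', 'z'])


def first_position(mask) -> Optional[int]:
    """Hypothetical helper: position of the first True in a boolean mask, or None."""
    positions = np.flatnonzero(mask)
    return int(positions[0]) if positions.size else None


mask = (df5.A == 'B').to_numpy()
assert np.argwhere(mask)[0][0] == first_position(mask)  # both give position 1

pos = first_position(mask)
print(df5.iloc[pos:])  # rows labelled 'x', 'y', 'z'
```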