I'm not sure how to return the desired output for k, given the condition on the string "START".
The column names don't need to be used or retained.
NOTE: I won't know in advance which row and column the string will be in, hence the for/if statements.
Input:
import pandas as pd

df = pd.DataFrame({'A': ['43', 23, 'Ndfg', 34, 0, 56],
                   'B': ['5', 23, 'START', 89, 0, 4],
                   'C': ['65', 7, 'dsfgA', 65, 47, 3],
                   'D': ['65', 7, 'gfd', 3, 0, 7],
                   'E': ['76', 7, 'Start', 5, 12, 1],
                   'F': ['65', 7, 'sdfA', 5, 0, 4],
                   'G': ['12', 7, 'START', 5, 8, 9],
                   'H': ['89', 7, 'gfA', 5, 0, 8],
                   'I': ['23', 7, 'sdfA', 5, 7, 23]})
k = []
for rw in range(df.shape[0]):
    for colm in range(df.shape[1]):
        if df.iloc[rw, colm] == 'START':
            row = rw + 1
            col = colm
            k.append(df.iloc[row:, col:])
            break
Output : df and k
df
A B C D E F G H I
0 43 5 65 65 76 65 12 89 23
1 23 23 7 7 7 7 7 7 7
2 Ndfg START dsfgA gfd Start sdfA START gfA sdfA
3 34 89 65 3 5 5 5 5 5
4 0 0 47 0 12 0 8 0 7
5 56 4 3 7 1 4 9 8 23
k
[ B C D E F G H I
3 89 65 3 5 5 5 5 5
4 0 47 0 12 0 8 0 7
5 4 3 7 1 4 9 8 23]
Desired Output for k : Only the columns of numbers under "Start"
[ B E G
3 89 5 5
4 0 12 8
5 4 1 9]
(df[df.astype(str)
      .apply(lambda s: s.str.lower())
      .eq("start")
      .cummax()]
   .dropna(how="all", axis=0)
   .dropna(how="all", axis=1)
   .iloc[1:])
First, get a True/False mask DataFrame indicating whether each value is "start" or not; to do this, cast everything to strings, lowercase them and compare against "start".
Starting from:
In [350]: df
Out[350]:
A B C D E F G H I
0 43 5 65 65 76 65 12 89 23
1 23 23 7 7 7 7 7 7 7
2 Ndfg START dsfgA gfd Start sdfA START gfA sdfA
3 34 89 65 3 5 5 5 5 5
4 0 0 47 0 12 0 8 0 7
5 56 4 3 7 1 4 9 8 23
In [351]: df.astype(str).apply(lambda s: s.str.lower()).eq("start")
Out[351]:
A B C D E F G H I
0 False False False False False False False False False
1 False False False False False False False False False
2 False True False False True False True False False
3 False False False False False False False False False
4 False False False False False False False False False
5 False False False False False False False False False
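Then take the cumulative max down each column, so that every cell at or below a "start" becomes True:
In [352]: _.cummax()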
Out[352]:
A B C D E F G H I
0 False False False False False False False False False
1 False False False False False False False False False
2 False True False False True False True False False
3 False True False False True False True False False
4 False True False False True False True False False
5 False True False False True False True False False
(_ is the last output above this.)
Now index df with it:
In [357]: df[_]
Out[357]:
A B C D E F G H I
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN START NaN NaN Start NaN START NaN NaN
3 NaN 89 NaN NaN 5 NaN 5 NaN NaN
4 NaN 0 NaN NaN 12 NaN 8 NaN NaN
5 NaN 4 NaN NaN 1 NaN 9 NaN NaN
Drop the rows and columns that are entirely NaN, using how="all":
In [358]: _.dropna(how="all", axis=0).dropna(how="all", axis=1)
Out[358]:
B E G
2 START Start START
3 89 5 5
4 0 12 8
5 4 1 9
Finally, slice off the first row (the one holding "start") with .iloc[1:]:
In [359]: _.iloc[1:]
Out[359]:
B E G
3 89 5 5
4 0 12 8
5 4 1 9
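The steps above can be collected into one small helper. This is only a sketch: the function name `values_below_marker` and its `marker` parameter are made up for illustration, and it assumes, as the walkthrough does, that all the "start" cells sit in the same row.

```python
import pandas as pd


def values_below_marker(df, marker="start"):
    """Return the cells below each case-insensitive `marker` cell (hypothetical helper)."""
    # True/False mask: which cells equal the marker, ignoring case.
    mask = df.astype(str).apply(lambda s: s.str.lower()).eq(marker.lower())
    # Cumulative max down each column: True from the marker cell downwards.
    below = mask.cummax()
    # Keep only the masked cells, drop all-NaN rows and columns,
    # then drop the first remaining row (the marker row itself).
    return (df[below]
            .dropna(how="all", axis=0)
            .dropna(how="all", axis=1)
            .iloc[1:])


df = pd.DataFrame({'A': ['43', 23, 'Ndfg', 34, 0, 56],
                   'B': ['5', 23, 'START', 89, 0, 4],
                   'C': ['65', 7, 'dsfgA', 65, 47, 3],
                   'D': ['65', 7, 'gfd', 3, 0, 7],
                   'E': ['76', 7, 'Start', 5, 12, 1],
                   'F': ['65', 7, 'sdfA', 5, 0, 4],
                   'G': ['12', 7, 'START', 5, 8, 9],
                   'H': ['89', 7, 'gfA', 5, 0, 8],
                   'I': ['23', 7, 'sdfA', 5, 7, 23]})

result = values_below_marker(df)
print(result)
```

If the markers could appear in different rows, the final .iloc[1:] would remove only the topmost marker row, so the other marker cells would remain in the result.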
# this is 6000 x 9
In [364]: df = pd.concat([df] * 1000, ignore_index=True)

In [366]: %%timeit
     ...: (
     ...:     df
     ...:     .applymap(lambda x: type(x) == str and x.upper() == "START")
     ...:     .stack()
     ...:     .loc[lambda x: x]
     ...:     .reset_index()
     ...:     .apply(lambda x: df.loc[slice(x["level_0"] + 1, None), x['level_1']], axis=1)
     ...:     .T
     ...: )
     ...:
4.11 s ± 43.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [367]: %%timeit
     ...: (df[df.astype(str)
     ...:       .apply(lambda s: s.str.lower())
     ...:       .eq("start").cummax()]
     ...:    .dropna(how="all", axis=0)
     ...:    .dropna(how="all", axis=1)
     ...:    .iloc[1:])
     ...:
27.1 ms ± 344 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
So the mask-based approach is roughly 150 times faster (27.1 ms vs. 4.11 s).
One way to do it:
df2 = df.apply(lambda x: x.str.upper())  # first convert the strings to upper case
# Find the row where START appears (I am assuming there is only one such row).
df2 = df2.query('|'.join(df2.columns.map('{}=="START"'.format).tolist()))
row_index = df2.index[0]  # index of the row where START appears
s = df2.eq('START').iloc[0]  # boolean Series marking the START columns
col_index = s[s].index  # labels of the columns where START appears
df.iloc[row_index+1:][col_index]  # select the rows below START in those columns
output:
B E G
3 89 5 5
4 0 12 8
5 4 1 9
(
    df
    .applymap(lambda x: type(x) == str and x.upper() == "START")
    .stack()
    .loc[lambda x: x]
    .reset_index()
    .apply(lambda x: df.loc[slice(x["level_0"] + 1, None), x['level_1']], axis=1)
    .T
)
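The chain above slices the frame once per matching cell. A possible alternative sketch of the same per-match idea, closer to the question's original loop, locates the matches with NumPy and collects one slice per match in k. The variable names `is_start`, `rows`, `cols` and `out` are illustrative and not from the answers; unlike the chain, this version slices by position rather than by index label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['43', 23, 'Ndfg', 34, 0, 56],
                   'B': ['5', 23, 'START', 89, 0, 4],
                   'C': ['65', 7, 'dsfgA', 65, 47, 3],
                   'D': ['65', 7, 'gfd', 3, 0, 7],
                   'E': ['76', 7, 'Start', 5, 12, 1],
                   'F': ['65', 7, 'sdfA', 5, 0, 4],
                   'G': ['12', 7, 'START', 5, 8, 9],
                   'H': ['89', 7, 'gfA', 5, 0, 8],
                   'I': ['23', 7, 'sdfA', 5, 7, 23]})

# Boolean frame: True where the cell equals "START", ignoring case.
is_start = df.astype(str).apply(lambda s: s.str.upper()).eq("START")

# Positional (row, column) coordinates of every match, in row-major order.
rows, cols = np.nonzero(is_start.to_numpy())

# One Series per match: everything below the matching cell in that column.
k = [df.iloc[r + 1:, c] for r, c in zip(rows, cols)]

# Side-by-side view of all the slices, aligned on the row index.
out = pd.concat(k, axis=1)
print(out)
```

Because each match is sliced on its own, this also copes with markers in different rows; pd.concat then aligns the pieces on the index and fills the gaps with NaN.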