Slice a pandas DataFrame column by start and end values

Question

For example, I have a dataframe that looks like this:

0       -- end
1       QQQQ
2       GEO
3       DEF
4       ABC
5       -- start
6       -- end
7       apple
8       -- start

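Is it possible to slice the column dynamically by '-- end' and '-- start'? In other words, I want to process the data between each '-- end' and '-- start' marker independently. I tried: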

start_end = df[df.col.str.contains('-- end')+1:df.col.str.contains('-- start')]

but to no avail. Maybe this isn't even possible in pandas, but I would love some input.

Thanks, everyone.

Answer 1

You can try the following:

import pandas as pd

data = {'column': {0: '-- end',
  1: 'QQQQ',
  2: 'GEO',
  3: 'DEF',
  4: 'ABC',
  5: '-- start',
  6: '-- end',
  7: 'apple',
  8: '-- start'}}

df = pd.DataFrame(data)

exclude_lst = ['-- start','-- end']

# get False for members of exclude_lst, True for the rest
bools = ~df.column.isin(exclude_lst)

# get sequences: [1, 2, 2, 2, 2, 3, 3, 4, 5]
sequences = (bools != bools.shift()).cumsum()

# keep only sequences where bools == True (so, only 2 and 4)
# (pass the Series itself, not a one-element list, so each group key is a plain int)
groups = df[bools].groupby(sequences)

# now you can loop through each slice, and perform some operation on them
for gr in groups:
    print(gr)
    
# or put them in a list and go from there:
gr_lst = list(groups)

print(gr_lst[0])

(2,   column
1   QQQQ
2    GEO
3    DEF
4    ABC)

# so, each item is a (key, DataFrame) tuple. Here gr_lst[0][0] == 2, the sequence id shared by the rows of the first slice

# use gr_lst[i][1] to access an actual slice, e.g.:
print(gr_lst[1][1])

  column
7  apple
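
If you only need the slices themselves, the same run-grouping idea can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name `split_between_markers` and its parameters are made up for illustration, and it assumes that every row that is not a marker belongs to exactly one slice.

```python
import pandas as pd


def split_between_markers(df, col, markers):
    """Hypothetical helper: one sub-DataFrame per run of consecutive non-marker rows."""
    # True for ordinary rows, False for marker rows
    keep = ~df[col].isin(markers)
    # Run id that increases by one every time `keep` flips value
    run_id = (keep != keep.shift()).cumsum()
    # Group only the kept rows; each group is one contiguous slice
    return [chunk for _, chunk in df[keep].groupby(run_id[keep])]


df = pd.DataFrame({"column": ["-- end", "QQQQ", "GEO", "DEF", "ABC",
                              "-- start", "-- end", "apple", "-- start"]})

slices = split_between_markers(df, "column", ["-- start", "-- end"])

for chunk in slices:
    # process each slice independently, e.g. collect its values
    print(chunk["column"].tolist())
# prints ['QQQQ', 'GEO', 'DEF', 'ABC'] and then ['apple']
```

Each element of `slices` keeps its original index labels, so you can still map the results back to rows of `df` if needed.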

Answer 1
0 Accepted 2022-07-26 21:44:21
