Selecting rows in a dataframe until a condition is met, then stopping the loop - Pandas
I have a txt file with more than 8k rows. I have managed to clean it, and now I want to loop through the whole data set and split it into two Excel sheets: Category A and Category B. I want to loop over the rows and select them until I reach a row that contains a certain string; at the first occurrence of this keyword the loop should stop and the result should be exported to Excel.
I have tried using a for loop to check for the rows I want and drop the ones I don't want, but similar rows from the other category appear in all categories. Sample data:
NUM DATE TIME TARD NUMBER REF NUMBER NUMBER TRMNL/NAME TYPE CODE MOD CODE STP CD AMOUNT CUR AMOUNT (A)
-----------------------------------------------------------------------------------------------------------------------------------
40 10NOV 06:57:36 4634050200885367 657406041760 041760 746842 0200 012000 051 02 00 25,000.00 KES 205.42CR
TA ID: XYZ06101 TA10000XYZ06101 25/SRA NAROK /ED 0000 FPI: 8C1
41 10NOV 07:07:38 4580160118732868 657407041761 041761 458016 0200 010000 051 02 00 14,900.00 KES 122.43CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
41 10NOV 07:10:25 4318711071280219 657407041764 041764 431871 0200 010000 051 02 00 3,000.00 KES 24.65CR
TA ID: XYZ00803 TA10001XYZ00803 25/SRA ONN THE WAY LMR RD /ED 0000 FPI: 8C1
42 10NOV 07:30:21 4863480011789758 657407041773 041773 486348 0200 300000 051 00 00 0.00 KES 0.00
TA ID: XYZ01101 TA10000XYZ01101 25/SRA MALINDI /ED 0000
42 10NOV 07:31:06 HHHH060000359699 657407041774 041774 HHHH06 0200 300000 051 00 06 0.00 KES 0.00
TA ID: XYZ03201 TA10000XYZ03201 25/SRA VOI /ED 0000
42 10NOV 07:57:07 4221740021146317 657407041781 041781 422174 0200 010000 051 02 00 40,000.00 KES 328.68CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
43 10NOV 08:10:50 4036490000012644 657408041784 041784 403649 0200 010000 051 02 51 20,000.00 KES 0.00
TA ID: XYZ04301 TA10000XYZ04301 25/SRA LAMU /ED 0000 FPI: 8C1
VAT XMIT(GMT)/LOCL RETRIEVAL TRACE SENDER ID/ SRAM PROCSS ENT REAS CN/ RSP --ACTION-- SETTLEMENT
NUM DATE TIME TARD NUMBER REF NUMBER NUMBER TRMNL/NAME TYPE CODE MOD CODE STP CD AMOUNT CUR AMOUNT (A)
-----------------------------------------------------------------------------------------------------------------------------------
44 11NOV 06:57:36 4634050200885367 657406041760 041760 746842 0200 012000 051 02 00 25,000.00 KES 205.42CR
TA ID: XYZ06101 TA10000XYZ06101 25/SRA NAROK /ED 0000 FPI: 8C1
44 11NOV 07:07:38 4580160118732868 657407041761 041761 458016 0200 010000 051 02 00 14,900.00 KES 122.43CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
45 11NOV 07:18:35 4930005028593164 657407041769 041769 493000 0200 010000 051 02 00 40,000.00 KES 328.68CR
TA ID: XYZ04201 TA10000XYZ04201 25/SRA MTWAPA /ED 0000 FPI: 8C1
45 11NOV 07:19:29 4930005028593164 657407041770 041770 493000 0200 010000 051 02 00 30,000.00 KES 246.51CR
TA ID: XYZ04201 TA10000XYZ04201 25/SRA MTWAPA /ED 0000 FPI: 8C1
46 11NOV 07:30:21 4863480011789758 657407041773 041773 486348 0200 300000 051 00 00 0.00 KES 0.00
TA ID: XYZ01101 TA10000XYZ01101 25/SRA MALINDI /ED 0000
46 11NOV 07:31:06 HHHH060000359699 657407041774 041774 HHHH06 0200 300000 051 00 06 0.00 KES 0.00
TA ID: XYZ03201 TA10000XYZ03201 25/SRA VOI /ED 0000
47 11NOV 07:38:05 4034910028476291 657407041777 041777 403491 0200 012000 051 02 00 40,000.00 KES 328.68CR
TA ID: XYZ01401 TA10000XYZ01401 25/SRA ELDORET /ED 0000 FPI: 8C1
47 11NOV 07:38:35 HHHH060000359699 657407041778 041778 HHHH06 0200 300000 051 00 06 0.00 KES 0.00
TA ID: XYZ03201 TA10000XYZ03201 25/SRA VOI /ED 0000
48 11NOV 07:57:07 4221740021146317 657407041781 041781 422174 0200 010000 051 02 00 40,000.00 KES 328.68CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
48 11NOV 08:10:50 4036490000012644 657408041784 041784 403649 0200 010000 051 02 51 20,000.00 KES 0.00
TA ID: XYZ04301 TA10000XYZ04301 25/SRA LAMU /ED 0000 FPI: 8C1
VAT XMIT(GMT)/LOCL RETRIEVAL TRACE SENDER ID/ SRAM PROCSS ENT REAS CN/ RSP --ACTION-- SETTLEMENT
NUM DATE TIME TARD NUMBER REF NUMBER NUMBER TRMNL/NAME TYPE CODE MOD CODE STP CD AMOUNT CUR AMOUNT (A)
-----------------------------------------------------------------------------------------------------------------------------------
49 14NOV 06:57:36 4634050200885367 657406041760 041760 746842 0200 012000 051 02 00 25,000.00 KES 205.42CR
TA ID: XYZ06101 TA10000XYZ06101 25/SRA NAROK /ED 0000 FPI: 8C1
49 14NOV 07:07:38 4580160118732868 657407041761 041761 458016 0200 010000 051 02 00 14,900.00 KES 122.43CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
50 14NOV 07:30:21 4863480011789758 657407041773 041773 486348 0200 300000 051 00 00 0.00 KES 0.00
TA ID: XYZ01101 TA10000XYZ01101 25/SRA MALINDI /ED 0000
50 14NOV 07:31:06 HHHH060000359699 657407041774 041774 HHHH06 0200 300000 051 00 06 0.00 KES 0.00
TA ID: XYZ03201 TA10000XYZ03201 25/SRA VOI /ED 0000
50 14NOV 07:55:26 4221740021146317 657407041780 041780 422174 0200 010000 051 02 00 6,000.00 KES 49.30CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
50 14NOV 07:57:07 4221740021146317 657407041781 041781 422174 0200 010000 051 02 00 40,000.00 KES 328.68CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
50 10NOV 08:10:50 4036490000012644 657408041784 041784 403649 0200 010000 051 02 51 20,000.00 KES 0.00
TA ID: XYZ04301 TA10000XYZ04301 25/SRA LAMU /ED 0000 FPI: 8C1
VAT XMIT(GMT)/LOCL RETRIEVAL TRACE SENDER ID/ SRAM PROCSS ENT REAS CN/ RSP --ACTION-- SETTLEMENT
NUM DATE TIME TARD NUMBER REF NUMBER NUMBER TRMNL/NAME TYPE CODE MOD CODE STP CD AMOUNT CUR AMOUNT (A)
-----------------------------------------------------------------------------------------------------------------------------------
51 15NOV 06:57:36 4634050200885367 657406041760 041760 746842 0200 012000 051 02 00 25,000.00 KES 205.42CR
TA ID: XYZ06101 TA10000XYZ06101 25/SRA NAROK /ED 0000 FPI: 8C1
51 15NOV 07:07:38 4580160118732868 657407041761 041761 458016 0200 010000 051 02 00 14,900.00 KES 122.43CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
51 15NOV 07:10:25 4318711071280219 657407041764 041764 431871 0200 010000 051 02 00 3,000.00 KES 24.65CR
TA ID: XYZ00803 TA10001XYZ00803 25/SRA ONN THE WAY LMR RD /ED 0000 FPI: 8C1
52 15NOV 07:18:35 4930005028593164 657407041769 041769 493000 0200 010000 051 02 00 40,000.00 KES 328.68CR
TA ID: XYZ04201 TA10000XYZ04201 25/SRA MTWAPA /ED 0000 FPI: 8C1
52 15NOV 07:21:20 4922950014377066 657407041772 041772 492295 0200 010000 051 02 00 30,000.00 KES 246.51CR
TA ID: XYZ04301 TA10000XYZ04301 25/SRA LAMU /ED 0000 FPI: 8C1
53 15NOV 07:30:21 4863480011789758 657407041773 041773 486348 0200 300000 051 00 00 0.00 KES 0.00
TA ID: XYZ01101 TA10000XYZ01101 25/SRA MALINDI /ED 0000
53 15NOV 07:38:05 4034910028476291 657407041777 041777 403491 0200 012000 051 02 00 40,000.00 KES 328.68CR
TA ID: XYZ01401 TA10000XYZ01401 25/SRA ELDORET /ED 0000 FPI: 8C1
54 15NOV 07:38:35 HHHH060000359699 657407041778 041778 HHHH06 0200 300000 051 00 06 0.00 KES 0.00
TA ID: XYZ03201 TA10000XYZ03201 25/SRA VOI /ED 0000
54 11NOV 08:10:50 4036490000012644 657408041784 041784 403649 0200 010000 051 02 51 20,000.00 KES 0.00
TA ID: XYZ04301 TA10000XYZ04301 25/SRA LAMU /ED 0000 FPI: 8C1
VAT XMIT(GMT)/LOCL RETRIEVAL TRACE SENDER ID/ SRAM PROCSS ENT REAS CN/ RSP --ACTION-- SETTLEMENT
NUM DATE TIME TARD NUMBER REF NUMBER NUMBER TRMNL/NAME TYPE CODE MOD CODE STP CD AMOUNT CUR AMOUNT (B)
-----------------------------------------------------------------------------------------------------------------------------------
40 10OCT 06:57:36 4634050200885367 657406041760 041760 746842 0200 012000 051 02 00 25,000.00 KES 205.42CR
TA ID: XYZ06101 TA10000XYZ06101 25/SRA NAROK /ED 0000 FPI: 8C1
41 10OCT 07:07:38 4580160118732868 657407041761 041761 458016 0200 010000 051 02 00 14,900.00 KES 122.43CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
41 10OCT 07:21:20 4922950014377066 657407041772 041772 492295 0200 010000 051 02 00 30,000.00 KES 246.51CR
TA ID: XYZ04301 TA10000XYZ04301 25/SRA LAMU /ED 0000 FPI: 8C1
42 10OCT 07:30:21 4863480011789758 657407041773 041773 486348 0200 300000 051 00 00 0.00 KES 0.00
TA ID: XYZ01101 TA10000XYZ01101 25/SRA MALINDI /ED 0000
42 10OCT 07:31:06 HHHH060000359699 657407041774 041774 HHHH06 0200 300000 051 00 06 0.00 KES 0.00
TA ID: XYZ03201 TA10000XYZ03201 25/SRA VOI /ED 0000
42 10OCT 07:57:07 4221740021146317 657407041781 041781 422174 0200 010000 051 02 00 40,000.00 KES 328.68CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
43 10OCT 08:10:50 4036490000012644 657408041784 041784 403649 0200 010000 051 02 51 20,000.00 KES 0.00
TA ID: XYZ04301 TA10000XYZ04301 25/SRA LAMU /ED 0000 FPI: 8C1
VAT XMIT(GMT)/LOCL RETRIEVAL TRACE SENDER ID/ SRAM PROCSS ENT REAS CN/ RSP --ACTION-- SETTLEMENT
NUM DATE TIME TARD NUMBER REF NUMBER NUMBER TRMNL/NAME TYPE CODE MOD CODE STP CD AMOUNT CUR AMOUNT (B)
-----------------------------------------------------------------------------------------------------------------------------------
44 18OCT 06:57:36 4634050200885367 657406041760 041760 746842 0200 012000 051 02 00 25,000.00 KES 205.42CR
TA ID: XYZ06101 TA10000XYZ06101 25/SRA NAROK /ED 0000 FPI: 8C1
44 18OCT 07:07:38 4580160118732868 657407041761 041761 458016 0200 010000 051 02 00 14,900.00 KES 122.43CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
45 18OCT 07:20:00 4906385159141011 657407041771 041771 490638 0200 010000 051 02 00 25,000.00 KES 205.42CR
TA ID: XYZ02701 TA10000XYZ02701 25/XYZ KAKAMEGA /ED 0000 FPI: 8C1
45 18OCT 07:21:20 4922950014377066 657407041772 041772 492295 0200 010000 051 02 00 30,000.00 KES 246.51CR
TA ID: XYZ04301 TA10000XYZ04301 25/SRA LAMU /ED 0000 FPI: 8C1
46 18OCT 07:30:21 4863480011789758 657407041773 041773 486348 0200 300000 051 00 00 0.00 KES 0.00
TA ID: XYZ01101 TA10000XYZ01101 25/SRA MALINDI /ED 0000
46 18OCT 07:31:46 HHHH060000359699 657407041775 041775 HHHH06 0200 300000 051 00 06 0.00 KES 0.00
TA ID: XYZ03201 TA10000XYZ03201 25/SRA VOI /ED 0000
47 18OCT 07:38:05 4034910028476291 657407041777 041777 403491 0200 012000 051 02 00 40,000.00 KES 328.68CR
TA ID: XYZ01401 TA10000XYZ01401 25/SRA ELDORET /ED 0000 FPI: 8C1
47 18OCT 07:38:35 HHHH060000359699 657407041778 041778 HHHH06 0200 300000 051 00 06 0.00 KES 0.00
TA ID: XYZ03201 TA10000XYZ03201 25/SRA VOI /ED 0000
48 18OCT 07:57:07 4221740021146317 657407041781 041781 422174 0200 010000 051 02 00 40,000.00 KES 328.68CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
48 18OCT 08:10:50 4036490000012644 657408041784 041784 403649 0200 010000 051 02 51 20,000.00 KES 0.00
TA ID: XYZ04301 TA10000XYZ04301 25/SRA LAMU /ED 0000 FPI: 8C1
VAT XMIT(GMT)/LOCL RETRIEVAL TRACE SENDER ID/ SRAM PROCSS ENT REAS CN/ RSP --ACTION-- SETTLEMENT
NUM DATE TIME TARD NUMBER REF NUMBER NUMBER TRMNL/NAME TYPE CODE MOD CODE STP CD AMOUNT CUR AMOUNT (B)
-----------------------------------------------------------------------------------------------------------------------------------
49 13SEP 06:57:36 4634050200885367 657406041760 041760 746842 0200 012000 051 02 00 25,000.00 KES 205.42CR
TA ID: XYZ06101 TA10000XYZ06101 25/SRA NAROK /ED 0000 FPI: 8C1
49 13SEP 07:07:38 4580160118732868 657407041761 041761 458016 0200 010000 051 02 00 14,900.00 KES 122.43CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
49 13SEP 07:21:20 4922950014377066 657407041772 041772 492295 0200 010000 051 02 00 30,000.00 KES 246.51CR
TA ID: XYZ04301 TA10000XYZ04301 25/SRA LAMU /ED 0000 FPI: 8C1
50 13SEP 07:30:21 4863480011789758 657407041773 041773 486348 0200 300000 051 00 00 0.00 KES 0.00
TA ID: XYZ01101 TA10000XYZ01101 25/SRA MALINDI /ED 0000
50 13SEP 07:31:06 HHHH060000359699 657407041774 041774 HHHH06 0200 300000 051 00 06 0.00 KES 0.00
TA ID: XYZ03201 TA10000XYZ03201 25/SRA VOI /ED 0000
50 13SEP 07:38:35 HHHH060000359699 657407041778 041778 HHHH06 0200 300000 051 00 06 0.00 KES 0.00
TA ID: XYZ03201 TA10000XYZ03201 25/SRA VOI /ED 0000
50 13SEP 07:55:26 4221740021146317 657407041780 041780 422174 0200 010000 051 02 00 6,000.00 KES 49.30CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
VAT XMIT(GMT)/LOCL RETRIEVAL TRACE SENDER ID/ SRAM PROCSS ENT REAS CN/ RSP --ACTION-- SETTLEMENT
NUM DATE TIME TARD NUMBER REF NUMBER NUMBER TRMNL/NAME TYPE CODE MOD CODE STP CD AMOUNT CUR AMOUNT (B)
-----------------------------------------------------------------------------------------------------------------------------------
51 15NOV 06:57:36 4634050200885367 657406041760 041760 746842 0200 012000 051 02 00 25,000.00 KES 205.42CR
TA ID: XYZ06101 TA10000XYZ06101 25/SRA NAROK /ED 0000 FPI: 8C1
51 15NOV 07:07:38 4580160118732868 657407041761 041761 458016 0200 010000 051 02 00 14,900.00 KES 122.43CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
51 15NOV 07:10:25 4318711071280219 657407041764 041764 431871 0200 010000 051 02 00 3,000.00 KES 24.65CR
TA ID: XYZ00803 TA10001XYZ00803 25/SRA ONN THE WAY LMR RD /ED 0000 FPI: 8C1
52 15NOV 07:20:00 4906385159141011 657407041771 041771 490638 0200 010000 051 02 00 25,000.00 KES 205.42CR
TA ID: XYZ02701 TA10000XYZ02701 25/XYZ KAKAMEGA /ED 0000 FPI: 8C1
52 15NOV 07:21:20 4922950014377066 657407041772 041772 492295 0200 010000 051 02 00 30,000.00 KES 246.51CR
TA ID: XYZ04301 TA10000XYZ04301 25/SRA LAMU /ED 0000 FPI: 8C1
54 15NOV 07:38:35 HHHH060000359699 657407041778 041778 HHHH06 0200 300000 051 00 06 0.00 KES 0.00
TA ID: XYZ03201 TA10000XYZ03201 25/SRA VOI /ED 0000
54 15NOV 07:55:26 4221740021146317 657407041780 041780 422174 0200 010000 051 02 00 6,000.00 KES 49.30CR
TA ID: XYZ02001 TA10001XYZ02001 25/XYZ DIANI /ED 0000 FPI: 8C1
[here is code snippet][1]
[1]: https://i.stack.imgur.com/TW1cl.png
Assuming your text file has been converted to a pandas dataframe and the key word is expected to be found in a particular column named a, you can do something like this to split the input dataframe into two outputs:
import pandas as pd
keyWord = 'foo'
df = pd.DataFrame({
    'a': ['bar' for _ in range(4)] + [keyWord] + ['baz' for _ in range(3)],
    'b': range(8)
})
print('input', df, sep='\n')
found = (df.a == keyWord).cumsum().shift(fill_value=0)
print('first sheet', df[found == 0], sep='\n')
print('second sheet', df[found > 0], sep='\n')
Sample output:
input
a b
0 bar 0
1 bar 1
2 bar 2
3 bar 3
4 foo 4
5 baz 5
6 baz 6
7 baz 7
first sheet
a b
0 bar 0
1 bar 1
2 bar 2
3 bar 3
4 foo 4
second sheet
a b
5 baz 5
6 baz 6
7 baz 7
Explanation:
Compare the values in column a with the key word.
Use cumsum() to count these; for rows prior to the first occurrence of the key word, cumsum will give 0, and starting with the first matching row, it will give a number > 0.
Use shift() to move the results of cumsum downward by one row; this will cause the first matching row to be kept with the first sheet rather than the second sheet.
Note:
If you want the first matching row to start the second sheet instead, remove shift from the initialization of found like this:
found = (df.a == keyWord).cumsum()
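As a rough sketch of how the same idea might apply to the report in the question, the block below treats each text line as one row and uses the first header ending in "AMOUNT (B)" as the key word. The column name line, the marker string, and the shortened sample lines are assumptions made for illustration; they are not taken from the asker's actual code. Because shift is left out, the matching header row opens the second part.

```python
import pandas as pd

# Abbreviated stand-ins for the report lines shown in the question.
raw = pd.DataFrame({"line": [
    "NUM DATE TIME TARD NUMBER AMOUNT (A)",
    "40 10NOV 06:57:36 4634050200885367",
    "41 10NOV 07:07:38 4580160118732868",
    "NUM DATE TIME TARD NUMBER AMOUNT (B)",
    "40 10OCT 06:57:36 4634050200885367",
    "NUM DATE TIME TARD NUMBER AMOUNT (B)",
    "44 18OCT 06:57:36 4634050200885367",
]})

marker = "AMOUNT (B)"  # assumed key word marking the start of Category B

# True on every line that contains the marker (plain substring match, no regex).
is_marker = raw["line"].str.contains(marker, regex=False)

# 0 for every row before the first marker, 1 or more from the first marker onward.
found = is_marker.cumsum()

category_a = raw[found == 0]
category_b = raw[found > 0]

print("Category A", category_a, sep="\n")
print("Category B", category_b, sep="\n")
```

Later "(B)" headers only raise the running count further, so only the first occurrence decides where the split falls.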
After thinking about this long and hard, I finally figured out how to work around it. First, find all the occurrences of the keyword. Second, push all the occurrences onto a list (list=[], list.append(x)), then extract the first item from the list and use it as the row index to stop at; varx=list[0] will get you the first item, which is an int. Third, use iloc to truncate. varx will be the end point or the start point of the slice, according to your needs, i.e. iloc[:varx] or iloc[varx:]. Finally, export to Excel as desired.
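The steps in this answer could be sketched roughly as follows. The dataframe, the key word foo, and the names first_part, second_part, and export_to_excel are hypothetical; the sketch also uses NumPy to collect the matching positions rather than appending to a list in a loop, which is a swapped-in shortcut and not necessarily what the answer's author did.

```python
import numpy as np
import pandas as pd

key_word = "foo"  # hypothetical key word
df = pd.DataFrame({"a": ["bar"] * 4 + [key_word] + ["baz"] * 3, "b": range(8)})

# 0-based row positions of every row that matches the key word.
positions = np.flatnonzero((df["a"] == key_word).to_numpy())

# First occurrence, or len(df) when the key word never appears.
varx = int(positions[0]) if positions.size else len(df)

first_part = df.iloc[:varx]   # rows before the first match
second_part = df.iloc[varx:]  # first match and everything after it


def export_to_excel(first, second, path):
    """Write the two parts to separate sheets (needs an Excel engine such as openpyxl)."""
    with pd.ExcelWriter(path) as writer:
        first.to_excel(writer, sheet_name="Category A", index=False)
        second.to_excel(writer, sheet_name="Category B", index=False)
```

Calling export_to_excel(first_part, second_part, "output.xlsx") would then write both sheets into one workbook; the file name here is only a placeholder.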