When multiple messages occur on the same date, how can I tag the first one as induced?
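I have a large dataframe of car maintenance messages. I'm trying to clean this data and remove all induced messages.
Whenever car message 44 appears, my code tags all the messages that occur along with it. I'm trying to reverse my logic so that any time message 44 occurs together with another message, message 44 is the one tagged as induced.
I have filtered the data so that the first message on any date where it occurs is message 44.
My code looks like this: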
df['MsgCat'] = 'New'
for i in range(1,len(df)):
    if df['MsgCat'].iloc[i] == 'New':
        if df['CarSerial'].iloc[i] == df['CarSerial'].iloc[i-1]:
            if df['Date'].iloc[i] == df['Date'].iloc[i-1]:
                df['MsgCount'].iloc[i] = df['MsgCount'].iloc[i-1] + 1
                if df['MsgId'].iloc[i-((df['MsgCount'].iloc[i])-1)] == 1:
                    df['MsgCat'].iloc[i] = 'Induced'
            else:
                df['MsgCount'].iloc[i] = 1
        else:
            df['MsgCount'].iloc[i] = 1
    else:
        df['MsgCount'].iloc[i] = 1
Output:
CarSerial Date MessageNum MsgId MsgCount MsgCat
015 10/14/2015 44 1 1 New
015 10/14/2015 21 2 2 Induced
015 10/14/2015 22 3 3 Induced
015 10/20/2015 30 5 1 New
022 5/1/2015 44 1 1 New
022 7/10/2015 44 1 1 New
022 1/4/2016 44 1 1 New
141 1/10/2016 17 9 1 New
141 1/10/2016 18 10 2 New
008 1/21/2016 44 1 1 New
008 2/4/2016 44 1 1 New
008 2/4/2016 30 5 2 Induced
008 2/4/2016 31 6 3 Induced
Desired output:
CarSerial Date MessageNum MsgId MsgCount MsgCat
015 10/14/2015 44 1 1 Induced
015 10/14/2015 21 2 2 New
015 10/14/2015 22 3 3 New
015 10/20/2015 30 5 1 New
022 5/1/2015 44 1 1 New
022 7/10/2015 44 1 1 New
022 1/4/2016 44 1 1 New
141 1/10/2016 17 9 1 New
141 1/10/2016 18 10 2 New
008 1/21/2016 44 1 1 New
008 2/4/2016 44 1 1 Induced
008 2/4/2016 30 5 2 New
008 2/4/2016 31 6 3 New
Thanks in advance!!
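Nice puzzle!
Group the rows by CarSerial and Date. For each group, record whether it contains a MessageNum of 44 and has more than one row by adding an item to a dictionary called changes. Each item in the dictionary holds an instance of a class derived from dict that maps 44 to 'Induced' and everything else to 'New'. Any group that qualifies is therefore represented by an entry in changes that supplies the required MsgCat label for its records. The change_if_need_be function then checks each row by looking up its group in changes and assigning the result, so rows in qualifying groups and all other rows are both covered.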
>>> import pandas as pd
>>> df = pd.read_csv('cars.csv', sep=r'\s+')
>>> df
CarSerial Date MessageNum MsgId MsgCount
0 15 10/14/2015 44 1 1
1 15 10/14/2015 21 2 2
2 15 10/14/2015 22 3 3
3 15 10/20/2015 30 5 1
4 22 5/1/2015 44 1 1
5 22 7/10/2015 44 1 1
6 22 1/4/2016 44 1 1
7 141 1/10/2016 17 9 1
8 141 1/10/2016 18 10 2
9 8 1/21/2016 44 1 1
10 8 2/4/2016 44 1 1
11 8 2/4/2016 30 5 2
12 8 2/4/2016 31 6 3
>>> grouping = df.groupby(df['CarSerial'].apply(lambda n: str(n)) + ' ' + df['Date'])
>>> class Once(dict):
...     def __missing__(self, key):
...         return 'New'
...
>>> once = Once()
>>> once[44] = 'Induced'
>>> def change_if_need_be(row):
...     key = str(row['CarSerial'])+' '+row['Date']
...     if key in changes:
...         return changes[key][row['MessageNum']]
...     else:
...         return 'New'
...
>>> changes = {}
>>> for g in grouping:
...     if any(g[1].MessageNum == 44) and g[1].MessageNum.count()>1:
...         changes[g[0]] = once
...
>>> df['MsgCat'] = df.apply(change_if_need_be, axis=1)
>>> df
CarSerial Date MessageNum MsgId MsgCount MsgCat
0 15 10/14/2015 44 1 1 Induced
1 15 10/14/2015 21 2 2 New
2 15 10/14/2015 22 3 3 New
3 15 10/20/2015 30 5 1 New
4 22 5/1/2015 44 1 1 New
5 22 7/10/2015 44 1 1 New
6 22 1/4/2016 44 1 1 New
7 141 1/10/2016 17 9 1 New
8 141 1/10/2016 18 10 2 New
9 8 1/21/2016 44 1 1 New
10 8 2/4/2016 44 1 1 Induced
11 8 2/4/2016 30 5 2 New
12 8 2/4/2016 31 6 3 New
Edit: I thought of an improvement that should run faster.
Change the function to this.
>>> def change_if_need_be(row):
...     key = str(row['CarSerial'])+' '+row['Date']
...     if key in changes:
...         return once[row['MessageNum']]
...     else:
...         return 'New'
...
Change changes from a dict to a list, like this.
>>> changes = []
>>> for g in grouping:
...     if any(g[1].MessageNum == 44) and g[1].MessageNum.count()>1:
...         changes.append(g[0])
...
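One possible refinement, not part of the original answer: a membership test such as `key in changes` scans a list element by element, whereas a set offers constant-time lookup on average. The sketch below collects the qualifying group keys into a set instead. The inline DataFrame is an assumption that copies the sample rows from the question (MsgId and MsgCount omitted), so the block runs without cars.csv.

```python
import pandas as pd

# Sample rows copied from the question; built inline so no CSV file is needed.
df = pd.DataFrame({
    "CarSerial": [15, 15, 15, 15, 22, 22, 22, 141, 141, 8, 8, 8, 8],
    "Date": ["10/14/2015", "10/14/2015", "10/14/2015", "10/20/2015",
             "5/1/2015", "7/10/2015", "1/4/2016", "1/10/2016", "1/10/2016",
             "1/21/2016", "2/4/2016", "2/4/2016", "2/4/2016"],
    "MessageNum": [44, 21, 22, 30, 44, 44, 44, 17, 18, 44, 44, 30, 31],
})

# Same "serial date" string key that the answer groups on.
keys = df["CarSerial"].astype(str) + " " + df["Date"]

# Keep the keys of groups that contain message 44 and have more than one row.
changes = {
    key
    for key, group in df.groupby(keys)
    if (group["MessageNum"] == 44).any() and len(group) > 1
}
```

The change_if_need_be function can stay unchanged, since `key in changes` works the same way on a set.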
Edit: Simplified (eliminating the class derived from dict) and consolidated.
>>> import pandas as pd
>>> df = pd.read_csv('cars.csv', sep=r'\s+')
>>> df
CarSerial Date MessageNum MsgId MsgCount MsgCat
0 15 10/14/2015 44 1 1 New
1 15 10/14/2015 21 2 2 Induced
2 15 10/14/2015 22 3 3 Induced
3 15 10/20/2015 30 5 1 New
4 22 5/1/2015 44 1 1 New
5 22 7/10/2015 44 1 1 New
6 22 1/4/2016 44 1 1 New
7 141 1/10/2016 17 9 1 New
8 141 1/10/2016 18 10 2 New
9 8 1/21/2016 44 1 1 New
10 8 2/4/2016 44 1 1 New
11 8 2/4/2016 30 5 2 Induced
12 8 2/4/2016 31 6 3 Induced
>>> grouping = df.groupby(df['CarSerial'].apply(lambda n: str(n)) + ' ' + df['Date'])
>>> changes = []
>>> for g in grouping:
...     if any(g[1].MessageNum == 44) and g[1].MessageNum.count()>1:
...         changes.append(g[0])
...
>>> def change_if_need_be(row):
...     key = str(row['CarSerial'])+' '+row['Date']
...     if key in changes:
...         return {44: 'Induced'}.get(row['MessageNum'], 'New')
...     else:
...         return 'New'
...
>>> df['MsgCat'] = df.apply(change_if_need_be, axis=1)
Same result.
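For comparison, the same rule (message 44 is induced only when its car and date group contains other rows) could probably be expressed without a row-wise apply. This is a minimal sketch, not the method used above: it assumes the column names from the question and rebuilds the sample data inline, and the names group_size and is_induced are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question (MsgId and MsgCount omitted for brevity).
df = pd.DataFrame({
    "CarSerial": [15, 15, 15, 15, 22, 22, 22, 141, 141, 8, 8, 8, 8],
    "Date": ["10/14/2015", "10/14/2015", "10/14/2015", "10/20/2015",
             "5/1/2015", "7/10/2015", "1/4/2016", "1/10/2016", "1/10/2016",
             "1/21/2016", "2/4/2016", "2/4/2016", "2/4/2016"],
    "MessageNum": [44, 21, 22, 30, 44, 44, 44, 17, 18, 44, 44, 30, 31],
})

# Number of rows sharing each (CarSerial, Date) pair, broadcast back to every row.
group_size = df.groupby(["CarSerial", "Date"])["MessageNum"].transform("size")

# A row is induced when it is message 44 and its group holds other rows too.
is_induced = (df["MessageNum"] == 44) & (group_size > 1)

df["MsgCat"] = "New"
df.loc[is_induced, "MsgCat"] = "Induced"
```

Grouping on the two columns directly also avoids building a string key for every row.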
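Simply reversing the logic isn't enough: by the time you discover that message 44 is induced, you have already passed message 44. You have two basic choices: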