Pair each item with the last non-null value in a row
I'm trying to write a function where I feed in a list of URLs that pass through 301 hops, and it flattens the chains for me. I want to save the resulting list as a CSV so I can hand it to a developer who can implement the redirects and get rid of the 301 hops.
For example, my crawler produces the following list of 301 hops:
URL1 | URL2 | URL3 | URL4
example.com/url1 | example.com/url2 | |
example.com/url3 | example.com/url4 | example.com/url5 |
example.com/url6 | example.com/url7 | example.com/url8 | example.com/10
example.com/url9 | example.com/url7 | example.com/url8 |
example.com/url23 | example.com/url10 | |
example.com/url24 | example.com/url45 | example.com/url46 |
example.com/url25 | example.com/url45 | example.com/url46 |
example.com/url26 | example.com/url45 | example.com/url46 |
example.com/url27 | example.com/url45 | example.com/url46 |
example.com/url28 | example.com/url45 | example.com/url46 |
example.com/url29 | example.com/url45 | example.com/url46 |
example.com/url30 | example.com/url45 | example.com/url46 |
The output I'd like to get is:
URL1 | URL2
example.com/url1 | example.com/url2
example.com/url3 | example.com/url5
example.com/url4 | example.com/url5
example.com/url6 | example.com/10
example.com/url7 | example.com/10
example.com/url8 | example.com/10
example.com/url23 | example.com/url10
...
I've used the following code to convert the Pandas dataframe into a list of lists:
import pandas as pd
import numpy as np

csv1 = pd.read_csv('Example_301_sheet.csv', header=None)
outlist = []

def link_flat(csv):
    for row in csv.iterrows():
        index, data = row
        outlist.append(data.tolist())
    return outlist
This returns each row as a list, with all of them nested inside one list, like so:
[['example.com/url1', 'example.com/url2', nan, nan],
['example.com/url3', 'example.com/url4', 'example.com/url5', nan],
['example.com/url6',
'example.com/url7',
'example.com/url8',
'example.com/10'],
['example.com/url9', 'example.com/url7', 'example.com/url8', nan],
['example.com/url23', 'example.com/url10', nan, nan],
['example.com/url24', 'example.com/url45', 'example.com/url46', nan],
['example.com/url25', 'example.com/url45', 'example.com/url46', nan],
['example.com/url26', 'example.com/url45', 'example.com/url46', nan],
['example.com/url27', 'example.com/url45', 'example.com/url46', nan],
['example.com/url28', 'example.com/url45', 'example.com/url46', nan],
['example.com/url29', 'example.com/url45', 'example.com/url46', nan],
['example.com/url30', 'example.com/url45', 'example.com/url46', nan]]
How can I match every URL in each nested list with the last URL in the same list to produce the output above?
You need groupby + last to find the last valid entry in each row, then reshape the DataFrame with melt to build a two-column mapping.
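To see the first step in isolation, here is a small sketch on two toy rows (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Each row is one redirect chain, padded with NaN on the right.
df = pd.DataFrame([['a1', 'a2', np.nan],
                   ['b1', 'b2', 'b3']])

# stack() flattens the frame into a Series indexed by (row, column);
# grouping by the row level and taking last() yields the last
# non-null value in each row, i.e. the final URL of each chain.
last = df.stack().groupby(level=0).last()
print(last.tolist())  # ['a2', 'b3']
```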
df.columns = range(len(df.columns))
df = (
    df.assign(URL2=df.stack().groupby(level=0).last())  # last non-null value per row
      .melt('URL2', value_name='URL1')                  # pair every cell with that value
      .drop(columns='variable')
      .dropna()
      .drop_duplicates()
      .query('URL1 != URL2')                            # remove self-mappings
      .sort_index(axis=1)
      .reset_index(drop=True)
)
df
URL1 URL2
0 example.com/url1 example.com/url2
1 example.com/url3 example.com/url5
2 example.com/url6 example.com/10
3 example.com/url9 example.com/url8
4 example.com/url23 example.com/url10
5 example.com/url24 example.com/url46
6 example.com/url25 example.com/url46
7 example.com/url26 example.com/url46
8 example.com/url27 example.com/url46
9 example.com/url28 example.com/url46
10 example.com/url29 example.com/url46
11 example.com/url30 example.com/url46
12 example.com/url4 example.com/url5
13 example.com/url7 example.com/10
14 example.com/url7 example.com/url8
15 example.com/url45 example.com/url46
16 example.com/url8 example.com/10
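For reference, here is a self-contained sketch of the same pipeline on a couple of toy rows (the data is made up; the keyword form drop(columns=...) is used because the positional axis argument was removed in pandas 2.0):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the crawler output: each row is one redirect chain,
# padded with NaN where the chain is shorter than the widest one.
df = pd.DataFrame([
    ['a1', 'a2', np.nan, np.nan],
    ['b1', 'b2', 'b3',   np.nan],
])

out = (
    df.assign(URL2=df.stack().groupby(level=0).last())  # last non-null per row
      .melt('URL2', value_name='URL1')                  # pair every cell with it
      .drop(columns='variable')
      .dropna()
      .drop_duplicates()
      .query('URL1 != URL2')                            # remove self-mappings
      .sort_index(axis=1)
      .reset_index(drop=True)
)
print(out)  # each intermediate URL now maps to the final URL of its chain
```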