Python Pandas - Rearrange Table Data into 1 Line
We have this code to extract data from an iframe (thanks to Cody):
import requests
from bs4 import BeautifulSoup
s = requests.Session()
r = s.get("https://www.aliexpress.com/store/feedback-score/1665279.html")
soup = BeautifulSoup(r.content, "html.parser")
iframe_src = soup.select_one("#detail-displayer").attrs["src"]
r = s.get(f"https:{iframe_src}")
soup = BeautifulSoup(r.content, "html.parser")
for row in soup.select(".history-tb tr"):
    print("\t".join([e.text for e in row.select("th, td")]))
It returns this:
Feedback 1 Month 3 Months 6 Months
Positive (4-5 Stars) 154 562 1,550
Neutral (3 Stars) 8 19 65
Negative (1-2 Stars) 8 20 57
Positive feedback rate 95.1% 96.6% 96.5%
We need all of this output on a single line. How can we do that?
Just use set_index and unstack:
df:
Feedback 1 Month 3 Months 6 Months store
0 Positive (4-5 Stars) 154 562 1,550 1665279
1 Neutral (3 Stars) 8 19 65 1665279
2 Negative (1-2 Stars) 8 20 57 1665279
3 Positive feedback rate 95.1% 96.6% 96.5% 1665279
Then:
df = df[~df['Feedback'].str.contains('Positive feedback rate')]
new = df.set_index(['store', 'Feedback']).unstack(level=1)
# use f-strings with list comprehension
new.columns = [f'{x} {y[:3]}' for x, y in new.columns]
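To try the set_index and unstack step without scraping anything, here is a minimal sketch that builds df by hand from the table shown above. The literal values and the store number are copied from the post; keeping the numbers as strings is an assumption that mirrors what the scraper would return.

```python
import pandas as pd

# Sample data typed in from the table above, so no network access is needed.
# Values stay as strings, which is an assumption about the scraped output.
df = pd.DataFrame({
    "Feedback": ["Positive (4-5 Stars)", "Neutral (3 Stars)",
                 "Negative (1-2 Stars)", "Positive feedback rate"],
    "1 Month": ["154", "8", "8", "95.1%"],
    "3 Months": ["562", "19", "20", "96.6%"],
    "6 Months": ["1,550", "65", "57", "96.5%"],
    "store": [1665279] * 4,
})

# Drop the rate row, as in the answer.
df = df[~df["Feedback"].str.contains("Positive feedback rate")]

# One row per store; the columns become (period, feedback) pairs.
new = df.set_index(["store", "Feedback"]).unstack(level=1)

# Flatten the two-level columns into plain strings such as "1 Month Pos".
new.columns = [f"{period} {feedback[:3]}" for period, feedback in new.columns]

print(new.shape)
print(list(new.columns))
```

The result has a single row indexed by the store number, with one column per period and feedback type.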
Or you can use pivot:
df = df[~df['Feedback'].str.contains('Positive feedback rate')]
# pass index/columns as keywords; recent pandas versions no longer accept them positionally
new = df.pivot(index='store', columns='Feedback')
new.columns = [f'{x} {y[:3]}' for x, y in new.columns]
Performance is roughly the same for both:
unstack: 3.61 ms ± 186 µs per loop (mean ± std. dev. of 3 runs, 1000 loops each)
pivot: 3.59 ms ± 114 µs per loop (mean ± std. dev. of 3 runs, 1000 loops each)
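Apart from timing, it may be worth confirming that the two approaches produce the same frame. The sketch below does this on hand-typed sample data taken from the table in the post (with the rate row already left out); the variable names via_unstack and via_pivot are only illustrative.

```python
import pandas as pd

# Sample rows typed in from the post; the rate row is already excluded.
df = pd.DataFrame({
    "Feedback": ["Positive (4-5 Stars)", "Neutral (3 Stars)",
                 "Negative (1-2 Stars)"],
    "1 Month": ["154", "8", "8"],
    "3 Months": ["562", "19", "20"],
    "6 Months": ["1,550", "65", "57"],
    "store": [1665279] * 3,
})

# Approach 1: move both keys into the index, then unstack the Feedback level.
via_unstack = df.set_index(["store", "Feedback"]).unstack(level=1)

# Approach 2: pivot with keyword arguments.
via_pivot = df.pivot(index="store", columns="Feedback")

# Compare values and axis labels of the two results.
same = via_unstack.equals(via_pivot)
print(same)
```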
Here is the complete code that does the job.
import pandas as pd
import requests
from bs4 import BeautifulSoup
pd.set_option('display.width', 1000)
pd.set_option('display.max_columns', 50)
url = "https://www.aliexpress.com/store/feedback-score/1665279.html"
s = requests.Session()
r = s.get(url)
soup = BeautifulSoup(r.content, "html.parser")
iframe_src = soup.select_one("#detail-displayer").attrs["src"]
r = s.get(f"https:{iframe_src}")
soup = BeautifulSoup(r.content, "html.parser")
rows = []
for row in soup.select(".history-tb tr"):
    # echo each row, then keep its cells for the DataFrame
    print("\t".join([e.text for e in row.select("th, td")]))
    rows.append([e.text for e in row.select("th, td")])
print()  # blank line after the echoed table
df = pd.DataFrame.from_records(
rows,
columns=['Feedback', '1 Month', '3 Months', '6 Months'],
)
# remove first row with column names
df = df.iloc[1:].copy()  # copy so the column assignment below does not act on a slice
df['Shop'] = url.split('/')[-1].split('.')[0]
pivot = df.pivot(index='Shop', columns='Feedback')
pivot.columns = [' '.join(col).strip() for col in pivot.columns.values]
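# Placeholder mapping: truncate each name to 12 characters. Both "Positive"
# columns collapse to the same short name, so edit the values by hand.
column_mapping = dict(
    zip(pivot.columns.tolist(), [col[:12] for col in pivot.columns.tolist()]))
# column_mapping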
# {'1 Month Negative (1-2 Stars)': '1 Month Nega',
# '1 Month Neutral (3 Stars)': '1 Month Neut',
# '1 Month Positive (4-5 Stars)': '1 Month Posi',
# '1 Month Positive feedback rate': '1 Month Posi',
# '3 Months Negative (1-2 Stars)': '3 Months Neg',
# '3 Months Neutral (3 Stars)': '3 Months Neu',
# '3 Months Positive (4-5 Stars)': '3 Months Pos',
# '3 Months Positive feedback rate': '3 Months Pos',
# '6 Months Negative (1-2 Stars)': '6 Months Neg',
# '6 Months Neutral (3 Stars)': '6 Months Neu',
# '6 Months Positive (4-5 Stars)': '6 Months Pos',
# '6 Months Positive feedback rate': '6 Months Pos'}
pivot.columns = [column_mapping[col] for col in pivot.columns]
pivot.to_excel('Report.xlsx')
You may need to order pivot.columns manually, because they are sorted alphabetically ('1 Month Negative (1-2 Stars)' comes before '1 Month Neutral (3 Stars)'). Once the column mapping is set up, you only have to choose a suitable name for each column and they will be mapped, so you do not have to reorder them every time you decide to switch the Neutral and Negative positions, for instance. This works thanks to dictionary lookup.
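As a rough illustration of ordering and renaming through a dictionary, the sketch below uses a small stand-in for the flattened pivot (only the 1 Month columns, with values from the table in the post). The short names such as "1M Pos" are hypothetical, and using the dictionary's key order as the column order is one possible approach, not necessarily what the answer's author had in mind.

```python
import pandas as pd

# Stand-in for the flattened pivot: one shop, 1 Month columns only,
# in the alphabetical order that pivot produces.
pivot = pd.DataFrame(
    [["8", "8", "154"]],
    index=pd.Index(["1665279"], name="Shop"),
    columns=[
        "1 Month Negative (1-2 Stars)",
        "1 Month Neutral (3 Stars)",
        "1 Month Positive (4-5 Stars)",
    ],
)

# Hypothetical short names. The key order doubles as the desired column
# order, since dicts keep insertion order in Python 3.7+.
column_mapping = {
    "1 Month Positive (4-5 Stars)": "1M Pos",
    "1 Month Neutral (3 Stars)": "1M Neu",
    "1 Month Negative (1-2 Stars)": "1M Neg",
}

# Select the columns in the chosen order, then rename them via the mapping.
ordered = pivot[list(column_mapping)].rename(columns=column_mapping)
print(list(ordered.columns))
```

Swapping two entries in column_mapping is then enough to change the output order; the renaming itself is unaffected because it is a lookup by the original name.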