Read CSV Transpose pandas
I have a dataset that looks like this:
Name : joe
Job : Crazy Consultant
Hired : 4/12/2011 3:38:55 AM
Stats : crazy, bald head
Pay : $5000 Monthly
Name : Matt
Job : Crazy Receptionist
Hired : 4/12/2014 3:38:55 PM
Stats : crazy, Lots of hair
Name : Adam
Job : Crazy Drinker
Hired : 4/12/2017 3:38:55 AM
Stats : crazy, unknown
Term : 4/12/2017 3:38:55 PM
I read it in and get the data like this:
df = pd.read_csv(r"pathtomycsv.csv", encoding="UTF-16", delimiter='\s+:').transpose()
The above outputs (shown only as an example):
Name Job Hired Stats Name Job Hired Stats
Joe Crazy Consultant 4/12/2011 3:38:55 AM crazy, bald head Matt Crazy Consultant 4/12/2011 3:38:55 AM crazy, bald head
In the end, I want to take the dataset above and, by combining all the headers, convert it into a dataset like the following:
Name Job Hired Stats Pay Term
Joe Crazy Consultant 4/12/2011 3:38:55 AM crazy, bald head $5000 Monthly N/A
Matt Crazy Receptionist 4/12/2014 3:38:55 PM crazy, Lots of hair N/A N/A
Adam Crazy Drinker 4/12/2017 3:38:55 AM crazy, unknown N/A 4/12/2017 3:38:55 PM
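The problem arises because your dates contain additional colons. Use "\\s+:\\s+" as the delimiter, so that only a colon surrounded by whitespace is treated as a separator. (Yes, the delimiter can be a regular expression.)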
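To see what that pattern does to a single line, here is a minimal sketch using Python's standard `re` module rather than pandas. The variable names are only illustrative, and the sample line is taken from the data above.

```python
import re

line = "Hired : 4/12/2011 3:38:55 AM"

# Splitting on a bare colon also breaks the time into pieces.
naive = line.split(":")

# Requiring whitespace on both sides of the colon matches only the
# key/value separator, so the colons inside the time are left alone.
fields = re.split(r"\s+:\s+", line)

print(naive)
print(fields)
```

The second split yields exactly two fields, the key and the full value, which is the shape `read_csv` needs for a two-column frame.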
The following code can be used to convert your file into the desired table. I assume 'Name' is always the first line of each record.
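import pandas as pd

# A regex separator requires the Python parser engine.
df = pd.read_csv("yourfile", sep=r"\s+:\s+", header=None, engine="python")
df = df.reset_index()
# Keep the row number only where a record starts ('Name'), then forward-fill it as a record id.
df['index'] = df['index'].where(df[0] == 'Name').ffill().astype(int)
# Pivot the keys into columns, one row per record.
df.set_index(['index', 0])[1].unstack().set_index('Name')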
#0 Hired Job Pay
#Name
#joe 4/12/2011 3:38:55 AM Crazy Consultant $5000 Monthly
#Matt 4/12/2014 3:38:55 PM Crazy Receptionist None
#Adam 4/12/2017 3:38:55 AM Crazy Drinker None
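For anyone who wants to try this without a file on disk, below is a self-contained sketch that feeds the sample data through `io.StringIO`. It is a variant of the approach above, not the same code: it numbers records with `cumsum` on the 'Name' rows and reshapes with `pivot`. The column names `key`, `value` and `record` are made up for illustration, and it still assumes every record begins with a 'Name' line.

```python
import io

import pandas as pd

raw = """Name : joe
Job : Crazy Consultant
Hired : 4/12/2011 3:38:55 AM
Stats : crazy, bald head
Pay : $5000 Monthly
Name : Matt
Job : Crazy Receptionist
Hired : 4/12/2014 3:38:55 PM
Stats : crazy, Lots of hair
Name : Adam
Job : Crazy Drinker
Hired : 4/12/2017 3:38:55 AM
Stats : crazy, unknown
Term : 4/12/2017 3:38:55 PM
"""

# Read each line as a (key, value) pair; a regex separator needs engine="python".
long_df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+:\s+",
    header=None,
    names=["key", "value"],
    engine="python",
)

# Every 'Name' row starts a new record, so a running count of them
# gives each row the number of the record it belongs to.
long_df["record"] = (long_df["key"] == "Name").cumsum()

# One row per record, one column per key; keys a record lacks become NaN.
wide = long_df.pivot(index="record", columns="key", values="value")

# Use the name as the index and put the columns in the requested order.
wide = wide.set_index("Name")[["Job", "Hired", "Stats", "Pay", "Term"]]

# Optional: show missing fields as the literal text "N/A".
table = wide.fillna("N/A")
print(table)
```

Keeping `wide` with real missing values and producing the "N/A" text only for display makes later filtering with `isna()` easier.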