
How to create a list of lists from a DataFrame

I have a DataFrame df, and I want to convert it to a list of lists.

left_side                               right_side                               similarity
0114600043776001 loan payment receipt   0421209017073500 loan payment receipt    0.689008
0114600043776001 loan payment receipt   0421209017073500 loan payment receipt    0.689008
vat onverve*issuance fee*506108         vat onverve*issuance fee*5061087         0.743522
vat onverve*issuance fee*506108         verve*issuance fee*506108*********1112   0.684342
verve*issuance fee*506108               verve*issuance fee*506108*********8296   0.717817
verve*issuance fee*506108               vat onverve*issuance fee*506108**        0.684342

maint fee recovery jun 2018             vat maint fee recovery jun 2018          0.896607
maint fee recovery jun 2018             vat maint fee recovery jun 2018          0.896607
maint fee recovery jun 2018             vat maint fee recovery jun 2018          0.896607

The expected output should look like this:

[[0114600043776001 loan payment receipt, 0421209017073500 loan payment receipt,
  0421209017073500 loan payment receipt],
[vat onverve*issuance fee*506108, vat onverve*issuance fee*5061087, 
  verve*issuance fee*506108*********1112], 
[verve*issuance fee*506108*********8296, verve*issuance fee*506108,
 vat onverve*issuance fee*506108**], ...]

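I tried grouping the above df by the left_side column and converting the resulting df to a list, but the output is not what I expected. I would appreciate your help with this.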

group_df = df.groupby(['left_side']).right_side.sum().to_frame()

group_df.values.tolist()

The output looks like this:

['0421209017073500 loan payment receipt0421209017073500 loan payment receipt0421209017073500 loan payment receipt0421209017073500 loan payment receipt0421209017073500 loan payment receipt0421209017073500 loan payment receipt']
['vat maint fee recovery jun 2018vat maint fee recovery jun 2018vat maint fee recovery jun 2018maint fee recovery jul 2018maint fee recovery oct 2018maint fee recovery jul 2018maint fee recovery jul 2018']
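
import pandas as pd

# Placeholder data with the same two columns as the question
data = {
    'left_side': ['string', 'string', 'string', 'string'],
    'right_side': ['string', 'string', 'string', 'string'],
}

df = pd.DataFrame(data, columns=['left_side', 'right_side'])
print(df)

# Convert the DataFrame to a list of rows (one inner list per row)
df_list = df.values.tolist()
print(df_list)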
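
To make the behaviour of this approach concrete, here is a minimal sketch with made-up sample values (the short strings are placeholders, not the asker's data). It suggests that `df.values.tolist()` produces one inner list per row, so rows sharing a `left_side` value are not merged into a single group.

```python
import pandas as pd

# Hypothetical sample data; 'a'/'b' stand in for the left_side strings
df = pd.DataFrame({
    "left_side": ["a", "a", "b"],
    "right_side": ["x", "y", "z"],
})

# One inner list per DataFrame row, in row order
rows = df.values.tolist()
print(rows)  # [['a', 'x'], ['a', 'y'], ['b', 'z']]
```

If the goal is one inner list per distinct `left_side` value, some form of grouping is still needed on top of this.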

You can use df.groupby:

>>> [[k, *g] for k, g in df.groupby('left_side', sort=False)['right_side']]

[['0114600043776001 loan payment receipt',
  '0421209017073500 loan payment receipt',
  '0421209017073500 loan payment receipt'],
 ['vat onverve*issuance fee*506108',
  'vat onverve*issuance fee*5061087',
  'verve*issuance fee*506108*********1112'],
 ['verve*issuance fee*506108',
  'verve*issuance fee*506108*********8296',
  'vat onverve*issuance fee*506108**'],
 ['maint fee recovery jun 2018',
  'vat maint fee recovery jun 2018',
  'vat maint fee recovery jun 2018',
  'vat maint fee recovery jun 2018']]
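
A self-contained sketch of the same idea, using shortened, made-up strings in place of the asker's data. It also shows one possible variant, `agg(list)`, for the case where only the `right_side` values are wanted. The attempt in the question used `.sum()`, which adds the strings together with `+` and therefore concatenates them into one long string per group.

```python
import pandas as pd

# Hypothetical sample data modelled loosely on the question
df = pd.DataFrame({
    "left_side": ["maint fee", "loan receipt A", "maint fee", "loan receipt A"],
    "right_side": ["vat maint fee", "loan receipt B", "vat maint fee 2", "loan receipt C"],
})

# sort=False keeps groups in order of first appearance
grouped = df.groupby("left_side", sort=False)["right_side"]

# Each inner list: the group key followed by its right_side values
with_key = [[key, *values] for key, values in grouped]

# Variant: right_side values only, collected with agg(list) instead of sum()
without_key = grouped.agg(list).tolist()

print(with_key)
print(without_key)
```

Whether the key belongs in each inner list depends on the expected output; the comprehension makes that choice explicit.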

I believe you are looking for the to_records() method on the DataFrame. Try df.to_records(); its documentation is in the pandas API reference.
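
For reference, a minimal sketch of what `to_records()` returns, again with made-up sample values. It yields a NumPy record array with one record per row; `index=False` is used here on the assumption that the index is not wanted. Note that it does not group rows by `left_side`.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "left_side": ["a", "a", "b"],
    "right_side": ["x", "y", "z"],
})

# NumPy record array, one record per row, without the index
records = df.to_records(index=False)

# tolist() on a record array gives a list of tuples
as_tuples = records.tolist()

# Convert each tuple to a list if a list of lists is required
as_lists = [list(t) for t in as_tuples]

print(as_tuples)  # [('a', 'x'), ('a', 'y'), ('b', 'z')]
print(as_lists)   # [['a', 'x'], ['a', 'y'], ['b', 'z']]
```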

