
pandas/dask CSV multi-line read

I have a CSV like this:

name,sku,description
Bryce Jones,lay-raise-best-end,"Art community floor adult your single type. Per back community former stock thing."
John Robinson,cup-return-guess,Produce successful hot tree past action young song. Himself then tax eye little last state vote. Country down list that speech economy leave.
Theresa Taylor,step-onto,"Choice should lead budget task. Author best mention.
Often stuff professional today allow after door instead. Model seat fear evidence. Now sing opportunity feeling no season show."

The entire multi-line text is the value of the description column in the 3rd row.

But when I read it with this code:

import csv
import dask.dataframe as ddf

df = ddf.read_csv(
    file_path, blocksize=2000, engine="python", encoding='utf-8-sig', quotechar='"', delimiter='[,]', quoting=csv.QUOTE_MINIMAL
)

the rows come out like this:

['Bryce Jones', 'lay-raise-best-end', '"Art community floor adult your single type. Per back community former stock thing."']
['John Robinson', 'cup-return-guess', 'Produce successful hot tree past action young song. Himself then tax eye little last state vote. Country down list that speech economy leave.']
['Theresa Taylor', 'step-onto', '"Choice should lead budget task. Author best mention.']
['Often stuff professional today allow after door instead. Model seat fear evidence. Now sing opportunity feeling no season show."', None, None]
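For comparison, here is a minimal sketch with plain pandas (not dask), assuming the data is held in a string rather than a file. With the default comma separator and the default C engine, the double quotes are honored, so the line break inside the third description stays within one field. A multi-character separator such as '[,]' is interpreted as a regular expression and forces the python engine, and the pandas documentation notes that regex delimiters are prone to ignoring quoted data, which is consistent with the stray " characters in the output above. Dask's read_csv documentation also warns that splitting a file into blocks can fail when quoted strings contain the line terminator, and suggests blocksize=None (one partition per file) as a workaround.

```python
import io

import pandas as pd

# The CSV from the question; the third description spans two lines inside quotes.
data = '''name,sku,description
Bryce Jones,lay-raise-best-end,"Art community floor adult your single type. Per back community former stock thing."
John Robinson,cup-return-guess,Produce successful hot tree past action young song. Himself then tax eye little last state vote. Country down list that speech economy leave.
Theresa Taylor,step-onto,"Choice should lead budget task. Author best mention.
Often stuff professional today allow after door instead. Model seat fear evidence. Now sing opportunity feeling no season show."
'''

# Default settings: sep=',', quotechar='"', C engine, so quoted newlines are kept.
df = pd.read_csv(io.StringIO(data))

print(df.shape)
print(df.loc[2, "description"])
```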

How can I read it correctly?

Answer 1

You can use a double newline between rows and a single newline inside the text, and pandas will understand it. So the CSV would be:

name,sku,description

Bryce Jones,lay-raise-best-end,"Art community floor adult your single type. Per back community former stock thing."

John Robinson,cup-return-guess,Produce successful hot tree past action young song. Himself then tax eye little last state vote. Country down list that speech economy leave.

Theresa Taylor,step-onto,"Choice should lead budget task. Author best mention.
Often stuff professional today allow after door instead. Model seat fear evidence. Now sing opportunity feeling no season show."

This is how you read it:

import pandas as pd

df = pd.read_csv(filepath)  # you can keep other parameters if you want

The output is:

             name                 sku  \
0     Bryce Jones  lay-raise-best-end   
1   John Robinson    cup-return-guess   
2  Theresa Taylor           step-onto   

                                         description  
0  Art community floor adult your single type. Pe...  
1  Produce successful hot tree past action young ...  
2  Choice should lead budget task. Author best me...  
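A self-contained version of this answer, assuming the CSV is held in a string instead of a file at filepath (descriptions shortened here): pandas drops the empty separator lines because skip_blank_lines=True is the default, and the single newline inside the quoted description is kept as part of the value.

```python
import io

import pandas as pd

# Rows separated by blank lines; the last description has one newline inside quotes.
data = (
    "name,sku,description\n"
    "\n"
    'Bryce Jones,lay-raise-best-end,"Art community floor adult your single type."\n'
    "\n"
    "John Robinson,cup-return-guess,Produce successful hot tree past action young song.\n"
    "\n"
    'Theresa Taylor,step-onto,"Choice should lead budget task. Author best mention.\n'
    'Often stuff professional today allow after door instead."\n'
)

# skip_blank_lines=True is already the default; it is spelled out only for clarity.
df = pd.read_csv(io.StringIO(data), skip_blank_lines=True)

print(df.shape)
print(df.loc[2, "description"])
```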

Answer 2

Use \n wherever you need a line break:

name,sku,description
Bryce Jones,lay-raise-best-end,"Art community floor adult your single type. Per back community former stock thing."
John Robinson,cup-return-guess,Produce successful hot tree past action young song. Himself then tax eye little last state vote. Country down list that speech economy leave.
Theresa Taylor,step-onto,"Choice should lead budget task. Author best mention.\nOften stuff professional today allow after door instead. Model seat fear evidence. Now sing opportunity feeling no season show."

When reading, use Python's codecs library:

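import codecs
import pandas as pd

df = pd.read_csv('../../data/stack.csv')
# Turn the literal backslash-n in the description of the third row into a real newline
print(codecs.decode(df.iloc[2, 2], 'unicode_escape'))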

Output:

Choice should lead budget task. Author best mention.
Often stuff professional today allow after door instead. Model seat fear evidence. Now sing opportunity feeling no season show.

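We have to use codecs.decode() because the value pandas reads contains a literal backslash followed by n (shown as \\n in the string's repr), not a line break; decoding with unicode_escape turns that sequence back into a real newline. Without print(), you will only see the escaped form, not the line break.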
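The answer decodes a single cell. A minimal sketch that applies the same codecs.decode(..., 'unicode_escape') call to the whole description column, assuming the data is in a string and the text is plain ASCII (for str input, unicode_escape effectively treats non-ASCII characters as Latin-1 bytes, so accented text can come out garbled):

```python
import codecs
import io

import pandas as pd

# In the CSV text, the second description contains a literal backslash followed by "n".
data = (
    "name,sku,description\n"
    'Bryce Jones,lay-raise-best-end,"Art community floor adult your single type."\n'
    'Theresa Taylor,step-onto,"Choice should lead budget task.\\nOften stuff professional today."\n'
)

df = pd.read_csv(io.StringIO(data))

# Decode escape sequences in every cell of the column (ASCII-only text assumed).
df["description"] = df["description"].map(lambda s: codecs.decode(s, "unicode_escape"))

print(df.loc[1, "description"])
```

If \n is the only escape you need to convert, df["description"].str.replace("\\n", "\n", regex=False) does the same replacement without the Latin-1 caveat.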


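Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.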

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM