Pandas.read_csv with multiple delimiters for rows and columns

Question

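I am trying to read a csv into a pandas dataframe where rows are delimited by brackets and columns by commas: "]" etc. The file text also contains double quotes. For example, this should produce 4 columns and 3 rows.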

slug,site_id,page_id,page_text
"[""act"", 1, 24, ""Hi, thank you so much for RSVP'ing""]","[""act"", 1, 43, ""Thank you for taking the time to tell us why wireless matters to you!""]","[""uoaa"", 2, 238, ""First published at Oregonlive.com on January 28th, 2019.""]"

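The code I am trying just makes a mess of it, creating 1 row and a new column wherever there is a comma. It does not register that everything between a pair of brackets is a single row and that a new set of brackets means a new row.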

df = pd.read_csv(tar.extractfile(csv_path), header=0, sep=r'\[|\]|,', quotechar='"',quoting=1, engine = 'python')

Any help would be greatly appreciated.

Answer 1

Rows are delimited by "," and each row sits between "[...]":

"[""act"", 1, 24, ""Hi, thank you so much for RSVP'ing""]","[""act"", 1, 43, ""Thank you for taking the time to tell us why wireless matters to you!""]"
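
To make that structure concrete, here is a minimal sketch that turns one such quoted field into a Python list. The names `field`, `literal` and `row` are illustrative only, and the approach assumes the text inside the brackets is a valid Python literal once the CSV-style quoting is undone.

```python
import ast
import re

# One quoted field copied from the sample line (CSV-style doubled quotes).
field = '"[""act"", 1, 24, ""Hi, thank you so much for RSVP\'ing""]"'

# Drop one quote from every run of quotes: the outer quotes vanish
# and each "" becomes a single ".
literal = re.sub(r'"("*)', r'\1', field)

# The remaining text is a plain list literal, which literal_eval parses safely.
row = ast.literal_eval(literal)
print(row)
```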


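import ast
import re

import pandas as pd

# Each row is one quoted "[...]" field. The non-greedy .*? stops at the
# first ]" so any number of rows on the line is matched, not only two.
# (Assumes the sequence ]" never occurs inside the page text itself.)
ROWS = re.compile(r'"\[.*?\]"')

with open('data.csv') as f:
    text = f.read()

# Strip the outer quotes and collapse "" to " so that every row becomes
# a valid Python list literal for ast.literal_eval.
records = [ast.literal_eval(re.sub(r'"("*)', r'\1', row))
           for row in ROWS.findall(text)]

df = pd.DataFrame(records)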

>>> df
     0  1   2                                                  3
0  act  1  24                 Hi, thank you so much for RSVP'ing
1  act  1  43  Thank you for taking the time to tell us why w...

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       2 non-null      object
 1   1       2 non-null      int64
 2   2       2 non-null      int64
 3   3       2 non-null      object
dtypes: int64(2), object(2)
memory usage: 192.0+ bytes
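
As an alternative sketch (not part of the original answer), the standard `csv` module can undo the quoting instead of a regular expression: it strips the outer quotes and collapses `""` to `"`, and what is left in each field happens to be a JSON array. The helper name `parse_bracketed_rows` and the inline `sample` string are hypothetical, and the approach assumes every bracketed field is valid JSON.

```python
import csv
import io
import json

# Sample text in the shape shown in the question: a header line, then one
# physical line whose comma-separated quoted fields are the actual rows.
sample = (
    'slug,site_id,page_id,page_text\n'
    '"[""act"", 1, 24, ""Hi, thank you so much for RSVP\'ing""]",'
    '"[""uoaa"", 2, 238, ""First published at Oregonlive.com on January 28th, 2019.""]"\n'
)


def parse_bracketed_rows(text):
    """Hypothetical helper: return (header, records) from the bracketed layout."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # csv.reader has already removed the CSV quoting, so each field is
    # a JSON array such as ["act", 1, 24, "..."].
    records = [json.loads(field) for line in reader for field in line]
    return header, records


header, records = parse_bracketed_rows(sample)
print(header)
print(records[0])
```

The result can then be handed to pandas with `pd.DataFrame(records, columns=header)`, which also gives the columns their names from the header line.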

1 2021-05-10 20:17:02
