pandas.read_csv with multiple delimiters for rows versus columns
I am trying to read a CSV into a pandas DataFrame. The file separates rows with brackets and columns with commas: "[column1, column2, etc.]". There are also double quotes in the file text. For example, this should produce 4 columns and 3 rows:
slug,site_id,page_id,page_text
"[""act"", 1, 24, ""Hi, thank you so much for RSVP'ing""]","[""act"", 1, 43, ""Thank you for taking the time to tell us why wireless matters to you!""]","[""uoaa"", 2, 238, ""First published at Oregonlive.com on January 28th, 2019.""]"
The code I'm trying just makes a mess of it, creating 1 row and many columns, split wherever there is a comma. It does not recognize that everything between a pair of brackets is a single row, and that a new set of brackets means a new row.
df = pd.read_csv(tar.extractfile(csv_path), header=0, sep=r'\[|\]|,', quotechar='"',quoting=1, engine = 'python')
Any help would be greatly appreciated.
Rows are separated by "," and each row is enclosed in "[...]":
"[""act"", 1, 24, ""Hi, thank you so much for RSVP'ing""]","[""act"", 1, 43, ""Thank you for taking the time to tell us why wireless matters to you!""]"
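import ast
import re

import pandas as pd

# Each row is one double-quoted field of the form "[...]"; match them one at a time
# (non-greedy), so the pattern works for any number of rows, not just two.
ROW = re.compile(r'"(\[.*?\])"(?=,|$)', re.MULTILINE)

with open('data.csv') as f:
    text = f.read()

# Shorten every run of quotes by one ("" becomes "), then parse the list literal.
records = [ast.literal_eval(re.sub(r'"("*)', r'\1', row))
           for row in ROW.findall(text)]
df = pd.DataFrame(records)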
>>> df
     0  1   2                                                  3
0  act  1  24                 Hi, thank you so much for RSVP'ing
1  act  1  43  Thank you for taking the time to tell us why w...
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       2 non-null      object
 1   1       2 non-null      int64
 2   2       2 non-null      int64
 3   3       2 non-null      object
dtypes: int64(2), object(2)
memory usage: 192.0+ bytes
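As an alternative to the regular expression, note that each bracketed row is written as an ordinary quoted CSV field with doubled inner quotes, so the standard csv module can split the fields and json can decode each one. The following is only a sketch under assumptions: the helper name read_bracketed_rows is made up for illustration, the sample text mirrors the question's data, and it assumes every field decodes as valid JSON and that the first line is the header.

```python
import csv
import io
import json

import pandas as pd

# Sample mirroring the question: a header line, then one line whose
# quoted fields each hold one bracketed row.
raw = (
    'slug,site_id,page_id,page_text\n'
    '"[""act"", 1, 24, ""Hi, thank you so much for RSVP\'ing""]",'
    '"[""act"", 1, 43, ""Thank you for taking the time to tell us '
    'why wireless matters to you!""]",'
    '"[""uoaa"", 2, 238, ""First published at Oregonlive.com on '
    'January 28th, 2019.""]"\n'
)


def read_bracketed_rows(handle):
    """Hypothetical helper: build a DataFrame from a text file handle."""
    reader = csv.reader(handle)   # handles "" -> " and commas inside quotes
    header = next(reader)         # first line supplies the column names
    # Every remaining CSV field is a JSON array representing one row.
    records = [json.loads(field)
               for line in reader
               for field in line
               if field]
    return pd.DataFrame(records, columns=header)


df = read_bracketed_rows(io.StringIO(raw))
print(df.shape)
```

For a file on disk, open it in text mode with newline='' as the csv module recommends. A binary handle such as the one returned by tar.extractfile would first need wrapping, for example in io.TextIOWrapper with the appropriate encoding.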