Extract list data from CSV file
I have a CSV file with the sample content below, and I only need to store the list part of each row in CSV format.
FILE.CSV:
Row 1: [123, abc, aa-dd daw, 122, 2011-11-11 00:00:00, None, None, None, GA GH, 1.9912109375]
Row 2: [234, bcd, bc-dd acs, 332, 2012-11-11 00:00:00, None, addad, None, GB GG, 1.22]
Row 3: [345, cda, cd-dd adc, 12312, 2013-11-11 00:00:00, None, None, dsa, GV GA, 1.925262]
Code:
import re
file=open('file.csv')
file_contents=file.read()
regx = re.compile(r'\[(.*)\]')
column_fetch=regx.findall(file_contents)
print column_fetch
Expected output (file.csv):
123, abc, aa-dd daw, 122, 2011-11-11 00:00:00, None, None, None, GA GH, 1.9912109375
234, bcd, bc-dd acs, 332, 2012-11-11 00:00:00, None, addad, None, GB GG, 1.22
345, cda, cd-dd adc, 12312, 2013-11-11 00:00:00, None, None, dsa, GV GA, 1.925262
Actual output:
[123, abc, aa-dd daw, 122, 2011-11-11 00:00:00, None, None, None, GA GH, 1.9912109375 234, bcd, bc-dd acs, 332, 2012-11-11 00:00:00, None, addad, None, GB GG, 1.22 345, cda, cd-dd adc, 12312, 2013-11-11 00:00:00, None, None, dsa, GV GA, 1.925262]
Try it this way: read the file as a list of lines, then you can do whatever you want with each one:
import re
file = open('test-001.csv')
file_contents = file.readlines()
regx = re.compile(r'\[(.*)\]')
for line in file_contents:
    line_fetch = regx.findall(line)
    print(line_fetch)
    # print(line_fetch.__class__)  # uncomment to see the type
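The problem is caused by your regular expression r'\[(.*)\]'. Because * is greedy, it looks for the longest possible match, so you end up matching from the first [ to the last ]. To avoid this, use *?, which makes the search non-greedy, for example: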
import re
data = '''Row 1: [123, abc, aa-dd daw, 122, 2011-11-11 00:00:00, None, None, None, GA GH, 1.9912109375]
Row 2: [234, bcd, bc-dd acs, 332, 2012-11-11 00:00:00, None, addad, None, GB GG, 1.22]
Row 3: [345, cda, cd-dd adc, 12312, 2013-11-11 00:00:00, None, None, dsa, GV GA, 1.925262]'''
rows = [i[1] for i in re.findall(r'(\[)(.*?)(\])',data)]
print(rows)
Output:
['123, abc, aa-dd daw, 122, 2011-11-11 00:00:00, None, None, None, GA GH, 1.9912109375', '234, bcd, bc-dd acs, 332, 2012-11-11 00:00:00, None, addad, None, GB GG, 1.22', '345, cda, cd-dd adc, 12312, 2013-11-11 00:00:00, None, None, dsa, GV GA, 1.925262']
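In this example, for clarity, I left out reading and writing the file and assigned the string directly to data. Note that I used regular expression grouping to create three groups:
one for [
one for the actual data
one for ]
and then extracted the middle one.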
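The example above leaves out the file handling. A minimal sketch of the full round trip might look like the following, assuming each line holds at most one bracketed list. The names extract_rows and convert are hypothetical and not part of the original answer. It uses a single non-greedy group instead of three groups, which captures the same text.

```python
import re

# Non-greedy pattern: capture only the text between a "[" and the next "]".
BRACKETED = re.compile(r'\[(.*?)\]')


def extract_rows(lines):
    """Return the bracketed text of every line that contains one."""
    rows = []
    for line in lines:
        match = BRACKETED.search(line)  # first bracketed span on the line
        if match:
            rows.append(match.group(1))
    return rows


def convert(src_path, dst_path):
    """Read src_path, write one extracted row per line to dst_path."""
    # src_path and dst_path are placeholders supplied by the caller.
    with open(src_path) as src:
        rows = extract_rows(src)
    with open(dst_path, 'w') as dst:
        for row in rows:
            dst.write(row + '\n')
    return rows
```

Calling convert('file.csv', 'file_out.csv') would then write the three comma-separated rows from the question, without the "Row N:" prefix or the brackets. Lines without brackets are skipped.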
As someone familiar with pandas, I would do something like this:
import pandas as pd
df = pd.read_csv('file.csv')
df.to_csv('file_out.csv')
But I'm not sure this is exactly what you want. At least once the CSV is loaded as a pd.DataFrame, you have a lot of options.
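One caveat: pd.read_csv on the raw file would keep the "Row N: [" prefix in the first field and the "]" in the last. A possible workaround, sketched here under assumptions, is to extract the bracketed text first and parse only that. The helper load_bracketed is hypothetical, and header=None and skipinitialspace=True are assumptions about the data (no header row, one space after each comma).

```python
import io
import re

import pandas as pd


def load_bracketed(text):
    """Parse the bracketed part of each line of text into a DataFrame."""
    # Non-greedy match keeps each bracket pair separate.
    rows = re.findall(r'\[(.*?)\]', text)
    buffer = io.StringIO('\n'.join(rows))
    # header=None: the data has no header row.
    # skipinitialspace=True: drop the space that follows each comma.
    return pd.read_csv(buffer, header=None, skipinitialspace=True)
```

With the file contents read into a string, df = load_bracketed(contents) gives one column per field, and df.to_csv('file_out.csv', index=False, header=False) would write the rows back out as plain CSV (without the space after each comma).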