
Parse Large .csv File Rows with Python

A large .csv file has typical rows of approximately 3000 data elements separated by commas. Roughly 50% of this data is fluff (non-value-added data) and can be removed. How can I remove this fluff with multiple string removals? I am new to Python.

I can read the data, but I am unable to change it. Variable x in the code below would be the changed string for each row.

import csv
import re

with open('som_w.csv','r+') as file:
    reader = csv.reader(file, delimiter=',')
    for i, row in enumerate(reader):
        print(row)
        print(i+1)

writer = csv.writer(file, delimiter=',')
for row in writer:
    x = re.sub(r'<.*?>',"",writer)
    print(x)

file.close()

The current error is that csv.writer is not iterable. I believe I'm heading down the wrong path.

Take a look at the comments in the code below; I think they should help.

with open('som_w.csv','r+') as file:
    reader = csv.reader(file, delimiter=',')
    for i, row in enumerate(reader):
        print(row)
        print(i+1)

writer = csv.writer(file, delimiter=',') # isn't `file` out of scope?
for row in writer:
    x = re.sub(r'<.*?>',"",writer)
    print(x)

file.close() # while using `with`, it's unnecessary to close file.
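Putting those comments together, a minimal corrected sketch might look like the following. It assumes the fluff is anything matching <.*?> and writes the cleaned rows to a separate output file (the name som_w_clean.csv is just an example), which avoids rewriting the file that is still being read through the same r+ handle.

import csv
import re

# a minimal sketch: read som_w.csv, strip anything matching <.*?> from every
# field, and write the cleaned rows to a separate file
with open('som_w.csv', 'r', newline='') as infile, \
     open('som_w_clean.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile, delimiter=',')
    writer = csv.writer(outfile, delimiter=',')
    for row in reader:
        cleaned = [re.sub(r'<.*?>', '', field) for field in row]
        writer.writerow(cleaned)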

Look at this post; there is an example of a function that replaces all lines with the help of a regular expression.

Then try this:

import fileinput
import re
import sys

def replaceAll(file, searchExp, replaceExp):
    # inplace=True redirects stdout into the file, so every line written
    # below replaces the corresponding original line
    with fileinput.input(file, inplace=True) as f:
        for line in f:
            # use re.sub so searchExp can be a regular expression such as r'<.*?>'
            sys.stdout.write(re.sub(searchExp, replaceExp, line))

replaceAll('som_w.csv', r'<.*?>', "")
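As a quick sanity check of the pattern itself, independent of the file handling, running re.sub on a made-up row fragment shows what the removal does; note that stripping a whole field leaves an empty field behind between the commas.

import re

# hypothetical sample row fragment, for illustration only
sample = 'value1,<fluff>,value2,<more fluff>,value3'
print(re.sub(r'<.*?>', '', sample))  # prints: value1,,value2,,value3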
