Python: skip all lines without a letter/number

I couldn't find this asked anywhere, which surprises me. I'm trying to read a huge file line by line using this:

with open("file.csv") as f:
    for line in f:
        splitline = line.split()

If I print(splitline), I get a great many lines containing nothing but commas, which I don't want:

[',,,,,,,']

The lines I do want look like this:

['XZ02345,AAA,BBB,1.0,11.0,15.0,1.0,1.0']

I have tried all kinds of if 'XZ' in splitline: print(splitline) and if splitline[0] == "": continue type solutions, but anything I try prints either all lines or none.

Desired output: no lines that are just commas, i.e. no [',,,,,,,'].

You can use a regex:

import re

with open("file.csv") as f:
    for line in f:
        if re.match(r'^\,*$', line) is None:
            splitline = line.split()
            print(splitline)

The pattern matches lines that consist only of commas. If a line contains anything else, there is no match and the line is processed.
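As a small sketch of the same idea, the check can be pulled into a helper so it is easy to test on its own. The names `ONLY_COMMAS` and `is_only_commas` are made up for illustration, and stripping whitespace first is an assumption (it means blank lines are skipped too).

```python
import re

# Hypothetical names, not part of the original answer.
ONLY_COMMAS = re.compile(r",*")

def is_only_commas(line):
    """Return True if the line holds nothing but commas (or is blank)."""
    # strip() drops the trailing newline and any surrounding whitespace.
    return ONLY_COMMAS.fullmatch(line.strip()) is not None

lines = [",,,,,,,\n", "XZ02345,AAA,BBB,1.0,11.0,15.0,1.0,1.0\n", "\n"]
kept = [line for line in lines if not is_only_commas(line)]
print(kept)
```

Using `fullmatch` avoids having to write the `^` and `$` anchors by hand.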

Try this:

import re

with open("file.csv") as f:
    for line in f:
        # Keep the line only if something is left after removing every non-alphanumeric character
        if re.sub('[^0-9a-zA-Z]+', '', line):
            splitline = line.split()
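A regex is not strictly required for this check. Here is a minimal sketch, assuming "letter/number" means any character for which `str.isalnum()` is true (this includes non-ASCII letters and digits, unlike the `[0-9a-zA-Z]` pattern above). `has_alnum` is a hypothetical helper name.

```python
def has_alnum(line):
    """Return True if the line contains at least one letter or digit."""
    return any(ch.isalnum() for ch in line)

sample = [",,,,,,,\n", "XZ02345,AAA,BBB,1.0,11.0,15.0,1.0,1.0\n", " , ,\n"]
for line in sample:
    if has_alnum(line):
        splitline = line.split()
        print(splitline)
```

`any()` stops at the first alphanumeric character, so lines with real data are accepted without scanning the whole string.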

You get [',,,,,,,'] because that row of the CSV has no data filled in.

You could print only when there is actual data:

delimiter = ','
# line is the raw string read from the file; splitline is a list and has no replace()
if line.replace(delimiter, '').strip():
    print(line.split())
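Since the file is a CSV, another option is to let the standard `csv` module do the splitting and skip rows whose fields are all empty. This is only a sketch: `non_empty_rows` is a hypothetical helper, and `io.StringIO` stands in for the real file so that the example is self-contained.

```python
import csv
import io

def non_empty_rows(f, delimiter=","):
    """Yield only the rows that have at least one non-blank field."""
    for row in csv.reader(f, delimiter=delimiter):
        if any(field.strip() for field in row):
            yield row

# Stand-in for open("file.csv", newline="") in this sketch.
data = io.StringIO(",,,,,,,\nXZ02345,AAA,BBB,1.0,11.0,15.0,1.0,1.0\n,,,,,,,\n")
rows = list(non_empty_rows(data))
print(rows)
```

Because this is a generator, a huge file is still processed one row at a time, and each kept row arrives already split into fields.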

If you only want to skip lines that contain nothing but commas (which is not quite what your question title asks), you can use the following:

# Inside the for loop, after splitline = line.split()
empty = True
for char in splitline[0]:
    if char != ',':
        empty = False
        break

if empty:
    continue

Less verbose:

if not any(char != ',' for char in splitline[0]):
    continue
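The same test can also be written without a loop, working on the raw line instead of `splitline[0]`. A sketch, with `is_blank_row` as a hypothetical helper name: `str.strip(chars)` removes any of the given characters from both ends, so a line made up only of commas and whitespace collapses to the empty string.

```python
def is_blank_row(line):
    """Return True if the line contains only commas and whitespace."""
    # Commas, spaces, tabs and line endings are stripped from both ends.
    return line.strip(", \t\r\n") == ""

for line in [",,,,,,,\n", "XZ02345,AAA,BBB,1.0,11.0,15.0,1.0,1.0\n", ",,1.0,,\n"]:
    if is_blank_row(line):
        continue
    print(line.split())
```

Unlike `splitline[0]`, this does not raise an `IndexError` on a completely blank line, for which `line.split()` returns an empty list.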

If it's really just about that exact pattern, you should be able to use the suggestion from Aran-Fey's comment:

if splitline == [',,,,,,,']:
    continue

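Add a condition next to the split() call, such as if line.strip().replace(',', '') != '':, so that lines consisting only of commas are not printed. The check has to be applied to the string line, because splitline is a list and has no replace() method.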
