Validate first 3 rows of txt file (tsv) in Python
I have been trying to build a validation rule for txt files that get uploaded to my environment. The files are tab separated, and I need to validate the first 3 rows, which are in a format such as:
## This Text Here
## This Text Here
## This Text Here
I need to build a pass/fail validation. I have tried doing this with Python's built-in csv module, with no luck so far. Would appreciate any advice on the best route to go.
Try this:
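### it depends on how you open the file but...
# open using with..
with open("test.tsv") as inData:
    # split each line on tabs (dropping the trailing newline first)
    allLines = [l.rstrip("\n").split("\t") for l in inData]
# get the first field of each of the first 3 lines:
testLines = [l[0] for l in allLines[:3]]
# then you could use assert
for l in testLines:
    assert l.startswith("##"), "bad line: %r" % l
    # and whatever other validation you need for the string
### or you could add try/except to report a failure instead of crashing
try:
    # a file with fewer than 3 lines should fail too
    assert len(testLines) == 3, "file has fewer than 3 lines"
    for l in testLines:
        assert l.startswith("##"), "bad line: %r" % l
except AssertionError as e:
    print(e, "- please use a validated file!")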
Further reading: https://www.tutorialspoint.com/python/python_exceptions.htm
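A variant of the assert approach that returns a pass/fail boolean instead of raising. It still works when Python runs with -O, which removes assert statements. It also reads only the first n lines rather than the whole file. This is a sketch that assumes a row passes when its first tab-separated field starts with "##". The function name first_rows_valid and that rule are illustrative, not part of the original answer.

```python
import io
from typing import TextIO


def first_rows_valid(stream: TextIO, n: int = 3, prefix: str = "##") -> bool:
    """Return True if the first n lines exist and their first tab-separated
    field starts with prefix; otherwise return False (pass/fail)."""
    for _ in range(n):
        line = stream.readline()
        if not line:  # file ended before n lines were read
            return False
        first_field = line.rstrip("\n").split("\t")[0]
        if not first_field.startswith(prefix):
            return False
    return True


# usage with in-memory files; for a real upload you would use
# `with open("test.tsv") as f: ok = first_rows_valid(f)`
good = io.StringIO("## This Text Here\n## This Text Here\n## This Text Here\na\tb\n")
bad = io.StringIO("## This Text Here\nno header\n## This Text Here\n")
print("PASS" if first_rows_valid(good) else "FAIL")  # PASS
print("PASS" if first_rows_valid(bad) else "FAIL")   # FAIL
```

Returning a bool keeps the pass/fail decision separate from how you report it (log, HTTP response, moving the file aside).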
Maybe you should give pandas a try:
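import pandas as pd
file_name = "your_file.tsv"  # your file name
# header=None keeps the first line as data; nrows=3 reads only the first 3 rows
df = pd.read_csv(file_name, sep='\t', header=None, nrows=3)
# do your stuff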
Documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
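Since the question mentions the built-in csv module, here is a minimal sketch of how it could read just the first 3 rows of a tab-separated file. It uses csv.reader with delimiter="\t" and itertools.islice so the rest of the file is never parsed. The validation rule is an assumption, so adapt it to your real format: each of the first 3 rows must exist, be non-empty, and have a first field starting with "##". The function name validate_header_rows is hypothetical.

```python
import csv
import io
from itertools import islice
from typing import TextIO


def validate_header_rows(stream: TextIO, n: int = 3, prefix: str = "##") -> bool:
    """Pass/fail check on the first n rows of a tab-separated stream.

    Assumed rule: each of the first n rows must exist, be non-empty,
    and have a first field that starts with prefix.
    """
    # QUOTE_NONE treats quote characters literally (assumes the TSV uses no quoting)
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(islice(reader, n))  # reads at most n rows, not the whole file
    if len(rows) < n:
        return False
    # a blank line comes back as [], which is falsy and therefore fails
    return all(row and row[0].startswith(prefix) for row in rows)


sample = "## This Text Here\n## This Text Here\n## This Text Here\ncol1\tcol2\n"
print(validate_header_rows(io.StringIO(sample)))         # True
print(validate_header_rows(io.StringIO("col1\tcol2\n")))  # False
```

For a real file, the csv docs recommend opening it with newline="", e.g. `with open("test.tsv", newline="") as f: ok = validate_header_rows(f)`.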