
Reading a file in Python and manipulating lines

So I have this file:

#Name, IdNb, Age, Direct, Fitness, Immune:
Bruno, cvd443, 37, <cvd221, cvd343, cvd245, cvd556>, 3, Yes 
Manuela, cvd256, 72, <cvd173, cvd132>, 4, No

I want to read it and create a list of lists, so I wrote this:

def readSocialNetwork (filename):

    inFile = open (filename, "r")
    fileContent = inFile.readlines()
    fileContent = fileContent [1:]

    socialNetworkList = []

    for line in fileContent:

        socialDetails = line.strip().split (", ")
        socialNetworkList.append(socialDetails) #socialNetworkObject

    return socialNetworkList

And it returns this:

[['Bruno', 'cvd443', '37', '<cvd221', 'cvd343', 'cvd245', 'cvd556>', '3', 'Yes'], ['Manuela', 'cvd256', '72', '<cvd173', 'cvd132>', '4', 'No']]

The only problem is that I want the info between < > kept together in a single string, but split breaks it apart at every comma. Any ideas on how to solve this?

Thanks for the help!

You can use the csv module to parse the comma-separated values (CSV) file:

import csv


def read_csv(filename):
    with open(filename) as stream:
        next(stream)  # skip the first line
        csv_reader = csv.reader(stream, skipinitialspace=True)
        return list(csv_reader)


for stuff in read_csv("data.csv"):
    print(stuff)

Output:

['Bruno', 'cvd443', '37', '<cvd221', 'cvd343', 'cvd245', 'cvd556>', '3', 'Yes ']
['Manuela', 'cvd256', '72', '<cvd173', 'cvd132>', '4', 'No']

Update

My initial solution was dead wrong: it still splits the < > group at every comma. In my revised solution, I added quotes around the < and >:

import csv


def translate(lines):
    for line in lines:
        yield line.replace('<', '"<').replace('>', '>"')

def read_csv(filename):
    with open(filename) as stream:
        next(stream)  # Skip the first line
        stream = translate(stream)
        csv_reader = csv.reader(stream, skipinitialspace=True)
        return list(csv_reader)


for stuff in read_csv("data.csv"):
    print(stuff)

Output:

['Bruno', 'cvd443', '37', '<cvd221, cvd343, cvd245, cvd556>', '3', 'Yes ']
['Manuela', 'cvd256', '72', '<cvd173, cvd132>', '4', 'No']

The translate function takes an iterable of lines and wraps each < ... > group in double quotes, so that csv.reader treats the whole group as a single field.
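To check the quoting idea without a file on disk, here is a minimal sketch that runs the same translate step on in-memory sample data. The io.StringIO stand-in and the parse_lines name are assumptions for illustration, and the extra strip() on each field (which removes the trailing space after "Yes") is not part of the answer above.

```python
import csv
import io

# Sample data copied from the question (note the trailing space after "Yes").
SAMPLE = (
    "#Name, IdNb, Age, Direct, Fitness, Immune:\n"
    "Bruno, cvd443, 37, <cvd221, cvd343, cvd245, cvd556>, 3, Yes \n"
    "Manuela, cvd256, 72, <cvd173, cvd132>, 4, No\n"
)


def translate(lines):
    # Wrap each <...> group in double quotes so csv sees it as one field.
    for line in lines:
        yield line.replace('<', '"<').replace('>', '>"')


def parse_lines(stream):
    # Hypothetical helper: same logic as read_csv, but takes any line iterator.
    next(stream)  # skip the header line
    reader = csv.reader(translate(stream), skipinitialspace=True)
    # Assumption: stray whitespace around a field is never meaningful.
    return [[field.strip() for field in row] for row in reader]


rows = parse_lines(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

This approach assumes the data never contains a literal double quote or a stray < or > outside a group; if it can, the simple replace calls would need to be more careful.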

The easiest solution is to use a standardized file format such as CSV. If you can't do that, you'll need a regular expression. Here's one that works on your sample input:

(?:\s|^)(?:([^<].*?)(?:,|\n|$)|<(.*?)>(?:,|\n|$))

Here's how it works:以下是它的工作原理:

  • (?:\s|^) matches the beginning of the input or a whitespace character (not captured)
  • (?: opens a non-capturing group that contains the two match options
  • ([^<].*?)(?:,|\n|$) matches and captures content that's not surrounded by <>, followed by a comma, newline, or end of string
  • | separates the two options inside the non-capturing group
  • <(.*?)>(?:,|\n|$) matches and captures the content between < and >, followed by a comma, newline, or end of string
  • ) closes the non-capturing group containing the two options
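As a rough sketch of how the pattern might be applied in Python, the snippet below runs it over one line with re.finditer and picks whichever of the two capture groups matched. The split_fields name is hypothetical, and stripping the line before matching is an added assumption so that a trailing space or newline does not end up in the last field.

```python
import re

# The pattern from the answer, compiled once.
FIELD = re.compile(r"(?:\s|^)(?:([^<].*?)(?:,|\n|$)|<(.*?)>(?:,|\n|$))")


def split_fields(line):
    # Hypothetical helper: split one data line into its fields.
    fields = []
    for match in FIELD.finditer(line.strip()):
        plain, bracketed = match.groups()
        # Exactly one group is set per match: group 1 for a plain field,
        # group 2 for the text between < and >.
        fields.append(plain if plain is not None else bracketed)
    return fields


bruno = split_fields("Bruno, cvd443, 37, <cvd221, cvd343, cvd245, cvd556>, 3, Yes \n")
print(bruno)
```

Because the second group captures only what is between the brackets, the < and > themselves are dropped from the result; add them back if you need them in the string.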

Note that this doesn't handle comment lines; I'll leave that for you to figure out, as I'm a strong believer in being able to work with your own code base.
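For readers who want a starting point, one possible way to deal with comment lines is to filter them out before any matching happens. This is only a sketch: it assumes a comment is any line whose first non-blank character is #, and data_lines is a hypothetical helper name.

```python
def data_lines(lines):
    # Yield only the lines that carry data, with surrounding whitespace removed.
    for line in lines:
        stripped = line.strip()
        # Assumption: blank lines and lines starting with '#' carry no data.
        if stripped and not stripped.startswith("#"):
            yield stripped


sample = [
    "#Name, IdNb, Age, Direct, Fitness, Immune:\n",
    "Bruno, cvd443, 37, <cvd221, cvd343, cvd245, cvd556>, 3, Yes \n",
    "\n",
    "Manuela, cvd256, 72, <cvd173, cvd132>, 4, No\n",
]
kept = list(data_lines(sample))
print(kept)
```

Since an open file is itself an iterable of lines, the same generator could be wrapped around a file object before the lines are handed to the regular expression.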
