Reading a file in Python and manipulating lines

Question

So I have this file:

#Name, IdNb, Age, Direct, Fitness, Immune:
Bruno, cvd443, 37, <cvd221, cvd343, cvd245, cvd556>, 3, Yes 
Manuela, cvd256, 72, <cvd173, cvd132>, 4, No

I want to read it and create a list of lists, so I coded this:

def readSocialNetwork (filename):

    inFile = open (filename, "r")
    fileContent = inFile.readlines()
    fileContent = fileContent [1:]

    socialNetworkList = []

    for line in fileContent:

        socialDetails = line.strip().split (", ")
        socialNetworkList.append(socialDetails) #socialNetworkObject

    return socialNetworkList

And it returns this:

[['Bruno', 'cvd443', '37', '<cvd221', 'cvd343', 'cvd245', 'cvd556>', '3', 'Yes'], ['Manuela', 'cvd256', '72', '<cvd173', 'cvd132>', '4', 'No']]

The only problem is that I want the info between < > to stay together in a single string, but because of the split function that doesn't happen. Any ideas on how to solve this?

Thanks for the help!

Answer 1

You can use the csv library to parse the comma-separated values (CSV) file:

import csv


def read_csv(filename):
    with open(filename) as stream:
        next(stream)  # skip the first line
        csv_reader = csv.reader(stream, skipinitialspace=True)
        return list(csv_reader)


for stuff in read_csv("data.csv"):
    print(stuff)

Output:

['Bruno', 'cvd443', '37', '<cvd221', 'cvd343', 'cvd245', 'cvd556>', '3', 'Yes ']
['Manuela', 'cvd256', '72', '<cvd173', 'cvd132>', '4', 'No']

Update

My initial solution was dead wrong: as the output above shows, it still splits the < > list at every comma. In my revised solution, I add quotes around the < and >:

import csv


def translate(lines):
    for line in lines:
        yield line.replace('<', '"<').replace('>', '>"')

def read_csv(filename):
    with open(filename) as stream:
        next(stream)  # Skip the first line
        stream = translate(stream)
        csv_reader = csv.reader(stream, skipinitialspace=True)
        return list(csv_reader)


for stuff in read_csv("data.csv"):
    print(stuff)

Output:

['Bruno', 'cvd443', '37', '<cvd221, cvd343, cvd245, cvd556>', '3', 'Yes ']
['Manuela', 'cvd256', '72', '<cvd173, cvd132>', '4', 'No']

The translate function takes an iterable of lines and adds quotes around the < and > in each one, so that csv.reader treats everything between them as a single quoted field.
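To see what translate does before csv.reader gets involved, here is a minimal sketch that runs it on the sample data held in memory. The io.StringIO stand-in for the file and the extra strip() on each field (to drop the trailing space in 'Yes ') are my own additions for illustration, not part of the solution above.

```python
import csv
import io


def translate(lines):
    # Wrap every <...> group in double quotes so csv treats it as one field.
    for line in lines:
        yield line.replace('<', '"<').replace('>', '>"')


# In-memory stand-in for the data file (assumption: same content as the question).
sample = io.StringIO(
    "#Name, IdNb, Age, Direct, Fitness, Immune:\n"
    "Bruno, cvd443, 37, <cvd221, cvd343, cvd245, cvd556>, 3, Yes \n"
    "Manuela, cvd256, 72, <cvd173, cvd132>, 4, No\n"
)
next(sample)  # skip the header line

quoted = list(translate(sample))
print(quoted[0])
# Bruno, cvd443, 37, "<cvd221, cvd343, cvd245, cvd556>", 3, Yes

# Optional: strip each field to remove stray whitespace such as 'Yes '.
rows = [[field.strip() for field in row]
        for row in csv.reader(quoted, skipinitialspace=True)]
print(rows[0])
# ['Bruno', 'cvd443', '37', '<cvd221, cvd343, cvd245, cvd556>', '3', 'Yes']
```

Note that skipinitialspace=True matters here: without it, the space after each comma would come before the opening quote and csv.reader would not recognize the field as quoted.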

Answer 2

The easiest solution is to use a standardized file format such as CSV. If you can't do that, you'll need a regular expression. Here's one that works on your sample input:

(?:\s|^)(?:([^<].*?)(?:,|\n|$)|<(.*?)>(?:,|\n|$))

Here's how it works:

(?:\s|^) matches the beginning of the input or a whitespace character (not captured)
(?: opens a non-capturing group that contains our two match options
([^<].*?)(?:,|\n|$) matches and captures content that's not surrounded by <>, followed by a comma, newline, or string end.
| or, since our non-capturing group has two options
<(.*?)>(?:,|\n|$) matches and captures content that's surrounded by <>, followed by a comma, newline, or string end.
) closes the non-capturing group containing the two options.
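As a rough sketch of how the pattern could be applied from Python, the snippet below runs it over a single data line with re.findall. The names FIELD_RE and parse_line are made up for this example. Because the second capture group sits inside the < >, the brackets themselves are not part of the resulting string.

```python
import re

# The pattern from above, compiled once (FIELD_RE is a hypothetical name).
FIELD_RE = re.compile(r"(?:\s|^)(?:([^<].*?)(?:,|\n|$)|<(.*?)>(?:,|\n|$))")


def parse_line(line):
    """Split one data line into fields, keeping a <...> list as one string."""
    # findall returns (plain, bracketed) tuples; for each match only one
    # of the two groups is filled, the other is an empty string.
    return [plain or bracketed
            for plain, bracketed in FIELD_RE.findall(line.strip())]


fields = parse_line("Bruno, cvd443, 37, <cvd221, cvd343, cvd245, cvd556>, 3, Yes \n")
print(fields)
# ['Bruno', 'cvd443', '37', 'cvd221, cvd343, cvd245, cvd556', '3', 'Yes']
```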

Note that this doesn't handle comment lines; I'll leave that to you to figure out, as I'm a strong believer in being able to work with your own code base.
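For anyone who wants a starting point, one possible way to filter comment lines is sketched below. It assumes a comment is a whole line beginning with #, like the header in the sample file; data_lines is a hypothetical helper name, not something from the answer.

```python
def data_lines(lines):
    """Yield stripped, non-empty lines that are not '#' comment lines."""
    for line in lines:
        stripped = line.strip()
        # Assumption: a comment is any line whose first character is '#'.
        if stripped and not stripped.startswith("#"):
            yield stripped


raw = [
    "#Name, IdNb, Age, Direct, Fitness, Immune:\n",
    "Bruno, cvd443, 37, <cvd221, cvd343, cvd245, cvd556>, 3, Yes \n",
    "\n",
    "Manuela, cvd256, 72, <cvd173, cvd132>, 4, No\n",
]
kept = list(data_lines(raw))
print(len(kept))  # 2
```

The same generator works on an open file object, since iterating over a file yields its lines one at a time.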

Answer 1: score 1, accepted, 2020-05-11 19:07:49
Answer 2: score 0, 2020-05-11 19:10:27
