Reading a file in Python and manipulating lines
So I have this file:
#Name, IdNb, Age, Direct, Fitness, Immune:
Bruno, cvd443, 37, <cvd221, cvd343, cvd245, cvd556>, 3, Yes
Manuela, cvd256, 72, <cvd173, cvd132>, 4, No
I want to read it and create a list of lists, so I coded this:
def readSocialNetwork(filename):
    inFile = open(filename, "r")
    fileContent = inFile.readlines()
    fileContent = fileContent[1:]
    socialNetworkList = []
    for line in fileContent:
        socialDetails = line.strip().split(", ")
        socialNetworkList.append(socialDetails)  # socialNetworkObject
    return socialNetworkList
And it returns this:
[['Bruno', 'cvd443', '37', '<cvd221', 'cvd343', 'cvd245', 'cvd556>', '3', 'Yes'], ['Manuela', 'cvd256', '72', '<cvd173', 'cvd132>', '4', 'No']]
The only problem is that I want the info between < > to stay together inside a single string, but because split breaks the line at every ", " that doesn't happen. Any ideas on how to solve this?
Thanks for the help!
You can use the csv library to parse the comma-separated values (CSV) file:
import csv

def read_csv(filename):
    with open(filename) as stream:
        next(stream)  # skip the first line
        csv_reader = csv.reader(stream, skipinitialspace=True)
        return list(csv_reader)

for stuff in read_csv("data.csv"):
    print(stuff)
Output:
['Bruno', 'cvd443', '37', '<cvd221', 'cvd343', 'cvd245', 'cvd556>', '3', 'Yes ']
['Manuela', 'cvd256', '72', '<cvd173', 'cvd132>', '4', 'No']
My initial solution was dead wrong: it still splits the content between < and >. In my revised solution, I add double quotes around the < and > so that the csv reader treats everything between them as one field:
import csv

def translate(lines):
    for line in lines:
        yield line.replace('<', '"<').replace('>', '>"')

def read_csv(filename):
    with open(filename) as stream:
        next(stream)  # skip the first line
        stream = translate(stream)
        csv_reader = csv.reader(stream, skipinitialspace=True)
        return list(csv_reader)

for stuff in read_csv("data.csv"):
    print(stuff)
Output:
['Bruno', 'cvd443', '37', '<cvd221, cvd343, cvd245, cvd556>', '3', 'Yes ']
['Manuela', 'cvd256', '72', '<cvd173, cvd132>', '4', 'No']
The translate function takes an iterable of lines and yields each one with double quotes added around the < and >.
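To try the quoting trick without a file on disk, here is a minimal sketch that runs the same idea on an in-memory copy of the sample data. The names SAMPLE and parse_rows are made up for illustration, and the sketch assumes that < and > only ever appear as the delimiters of the bracketed field.

```python
import csv
import io

# Hypothetical in-memory copy of the sample file from the question.
SAMPLE = """#Name, IdNb, Age, Direct, Fitness, Immune:
Bruno, cvd443, 37, <cvd221, cvd343, cvd245, cvd556>, 3, Yes
Manuela, cvd256, 72, <cvd173, cvd132>, 4, No
"""

def parse_rows(stream):
    """Parse rows, keeping everything between < and > in one field."""
    next(stream)  # skip the header line
    # Wrap each <...> group in double quotes so csv treats it as one field.
    quoted = (line.replace('<', '"<').replace('>', '>"') for line in stream)
    return list(csv.reader(quoted, skipinitialspace=True))

rows = parse_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

skipinitialspace=True matters here: without it, the space after each comma would come before the opening quote, and the csv reader would not recognise the field as quoted.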
The easiest solution is to use a standardized file format such as CSV. If you can't do that, you'll need a regular expression. Here's one that works on your sample input:
(?:\s|^)(?:([^<].*?)(?:,|\n|$)|<(.*?)>(?:,|\n|$))
Here's how it works:
(?:\s|^)
matches the beginning of the input or a whitespace character (not captured).
(?:
opens a non-capturing group that's going to contain our two match options.
([^<].*?)(?:,|\n|$)
matches and captures content that's not surrounded by <>, and is followed by a comma, newline, or string end.
|
or, since our non-capturing group has two options.
<(.*?)>(?:,|\n|$)
matches and captures content that's surrounded by <>, and is followed by a comma, newline, or string end.
)
closes the non-capturing group containing the two options.
Note this doesn't handle comment lines. I'll leave that to you to figure out, as I'm a strong believer in being able to work with your own code base.
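As a rough sketch of how the pattern could be applied in Python, the snippet below runs it over each line with re.findall and keeps whichever of the two capture groups matched. The names FIELD_RE, parse_line and parse_lines are hypothetical, and skipping lines that start with # is only one assumed way of dealing with the header. Note that the second group captures the text inside the angle brackets, so the < and > themselves are dropped.

```python
import re

# The pattern from the answer, compiled once. Group 1 captures a plain
# field, group 2 captures the content between < and >.
FIELD_RE = re.compile(r"(?:\s|^)(?:([^<].*?)(?:,|\n|$)|<(.*?)>(?:,|\n|$))")

def parse_line(line):
    """Split one line into fields, keeping <...> content together."""
    # findall returns (group1, group2) tuples; exactly one is non-empty.
    return [plain or bracketed
            for plain, bracketed in FIELD_RE.findall(line.strip())]

def parse_lines(lines):
    """Parse every non-empty line that is not a # comment (an assumption)."""
    return [parse_line(line) for line in lines
            if line.strip() and not line.startswith("#")]

sample = [
    "#Name, IdNb, Age, Direct, Fitness, Immune:\n",
    "Bruno, cvd443, 37, <cvd221, cvd343, cvd245, cvd556>, 3, Yes\n",
    "Manuela, cvd256, 72, <cvd173, cvd132>, 4, No\n",
]
result = parse_lines(sample)
for row in result:
    print(row)
```

If the brackets need to be kept in the output, one option is to move them inside the second capture group, i.e. (<.*?>) instead of <(.*?)>.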