How to create a list of tuples from two specific columns of a csv file?

My csv file looks something like this:

{http://www.omg.org/XMI}id,begin,end,Character
18074,15,19,Tony
18120,39,46,Tony
18172,129,134,Clint
18217,175,180,Bucky
18202,245,249,Tony
18307,352,357,Bucky
18376,1297,1302,Steve
18421,1499,1504,Bruce
18489,1546,1553,Natasha
18527,1709,1712,Bucky

I would like to create a list of tuples from the begin and end columns, starting from the second row, i.e. ignoring the header row.

So far I can create a list of tuples, but only for all rows and all columns:

import csv
import tkinter
from tkinter import filedialog
root_tk = tkinter.Tk()
root_tk.wm_withdraw()

filename = filedialog.askopenfilename()

with open(filename, 'r') as f:
    data=[tuple(line) for line in csv.reader(f)]

print(data)



root_tk.destroy()
root_tk.mainloop()

Current output:

[('{http://www.omg.org/XMI}id', 'begin', 'end', 'Character'), ('18646', '518', '520', 'Anakin'), ('18699', '982', '985', 'Jedi'), ('18714', '1018', '1020', 'Anakin'), ('18766', '1057', '1059', 'Anakin'),...

Desired output:

[(15, 19), (39, 46), (129, 134), (175, 180), ...]

How do I limit the output to those two columns, skip the first row, and create a list of tuples from the result?

Thanks in advance!

EDIT:

I am now able to print the tuples I want, but I still can't remove the first row from the output.

Also, how do I convert the values in each tuple from strings to integers?

You could use csv.DictReader and build the tuples from the columns you need. It reads the header row as field names, so that row is skipped automatically, and int() converts each value:

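import csv
from tkinter import filedialog

filename = filedialog.askopenfilename()
with open(filename, 'r', newline='') as f:
    # DictReader takes its keys from the header row, so that row is not in the data
    data = [(int(line['begin']), int(line['end'])) for line in csv.DictReader(f)]
    print(data)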

Output:

[(15, 19), (39, 46), (129, 134), (175, 180), (245, 249), (352, 357), (1297, 1302), (1499, 1504), (1546, 1553), (1709, 1712)]
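
To try the same idea without a file dialog, here is a minimal self-contained sketch. It assumes Python 3 and reads the first rows of the sample data from an in-memory string (the name `sample` is introduced only for illustration) instead of a real file.

```python
import csv
import io

# First rows of the sample data from the question, held in memory.
# This is an assumption for illustration; normally f would be an open file.
sample = """{http://www.omg.org/XMI}id,begin,end,Character
18074,15,19,Tony
18120,39,46,Tony
18172,129,134,Clint
"""

with io.StringIO(sample) as f:
    # DictReader consumes the header row and uses it as the dict keys,
    # so the header never appears among the data rows.
    reader = csv.DictReader(f)
    data = [(int(row['begin']), int(row['end'])) for row in reader]

print(data)  # [(15, 19), (39, 46), (129, 134)]
```

Selecting by column name rather than position means the code should keep working if the columns are reordered, as long as the header names stay the same.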

Hope this helped :)

You could use pandas (imported as pd below):

In [615]: df = pd.read_csv('eg.csv')

In [616]: [(begin, end) for _, begin, end, _ in df.values.tolist()]
Out[616]:
[(15, 19),
 (39, 46),
 (129, 134),
 (175, 180),
 (245, 249),
 (352, 357),
 (1297, 1302),
 (1499, 1504),
 (1546, 1553),
 (1709, 1712)]
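
If unpacking all four columns by position feels brittle, one possible variant selects the two columns by name. This is only a sketch: it assumes pandas is installed, and for illustration it reads a few sample rows from an in-memory string (`sample`, a made-up name) rather than from eg.csv.

```python
import io

import pandas as pd

# A few rows of the sample data, held in memory for illustration only.
sample = """{http://www.omg.org/XMI}id,begin,end,Character
18074,15,19,Tony
18120,39,46,Tony
18172,129,134,Clint
"""

df = pd.read_csv(io.StringIO(sample))

# Keep only the two named columns; index=False drops the row index and
# name=None makes itertuples yield plain tuples instead of namedtuples.
data = list(df[['begin', 'end']].itertuples(index=False, name=None))

print(data)
```

read_csv treats the first row as the header by default and infers integer columns here, so no explicit skipping or int() conversion is needed.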

If you want to use the csv module, you can try:

In [627]: with open('eg.csv', 'r') as f:
     ...:     reader = csv.reader(f)
     ...:     next(reader, None)          # Skip the header row
     ...:     data = [(int(line[1]), int(line[2])) for line in reader if line]
     ...:
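
The csv.reader approach above hard-codes the column positions 1 and 2. As a hedged alternative, the positions can be looked up from the header row. The helper name `column_pairs` and the in-memory `sample` string are hypothetical, introduced only for this sketch.

```python
import csv
import io


def column_pairs(f, first, second):
    """Hypothetical helper: return (int, int) tuples for two named columns."""
    reader = csv.reader(f)
    header = next(reader)  # consume the header row (raises StopIteration if f is empty)
    i, j = header.index(first), header.index(second)
    # "if row" skips blank lines, which csv.reader yields as empty lists.
    return [(int(row[i]), int(row[j])) for row in reader if row]


# Sample rows held in memory for illustration, including one blank line.
sample = """{http://www.omg.org/XMI}id,begin,end,Character
18074,15,19,Tony
18120,39,46,Tony

18172,129,134,Clint
"""

data = column_pairs(io.StringIO(sample), 'begin', 'end')
print(data)  # [(15, 19), (39, 46), (129, 134)]
```

With a real file, the call would look like `column_pairs(open_file, 'begin', 'end')` inside a `with open(...)` block. A missing column name raises ValueError from `header.index`, which surfaces the problem early rather than silently reading the wrong column.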

Or without any imported modules at all:

In [639]: with open('eg.csv', 'r') as f:
     ...:     f.readline()               # Skip first row
     ...:     data=[tuple(map(int, line.split(',')[1:3])) for line in f.readlines() if line.strip()]
     ...:
