Reading CSV files: list index out of range

I'm supposed to read a CSV file of Facebook status updates about Donald Trump. I need to create a list of dictionaries like so:

[{'link_name': 'Timeline Photos',
'num_angrys': '7',
'num_comments': '543',
'num_hahas': '17',
'num_likes': '6178',
'num_loves': '572',
'num_reactions': '6813',
'num_sads': '0',
'num_shares': '359',
'num_wows': '39',
'status_id': '153080620724_10157915294545725',
'status_link': 'https://www.facebook.com/DonaldTrump/photos/a.488852220724.393301.153080620724/10157915294545725/?type=3',
'status_message': 'Beautiful evening in Wisconsin- THANK YOU for your incredible support tonight! Everyone get out on November 8th - and VOTE! LETS MAKE AMERICA GREAT AGAIN! -DJT',
'status_published': '10/17/2016 20:56:51',
'status_type': 'photo'},

I have to build this list with code. I need to get the first two status updates, but when I run the code I get an error that says "list index out of range".

Here is the code:

def read_csv(input_file, delimiter=","):
    # your code here
    import csv
    csv_data= []
    with open(filename, "r") as csvfile: 
        for row in csvfile:
            row = row.strip("\n")
            columns = row.split(",")

            dict_row = {"link_name": columns [0],
                        "num_angrys": columns [1],
                        "num_comments":columns[2],
                        "num_hahas": columns [3],
                        "num_loves": columns [4],
                        "num_reactions": columns [5],
                        "num_sads": columns [6],
                        "num_shares": columns[7],
                        "num_wows": columns [8],
                        "status_id": columns[9],
                        "status_link": columns[10],
                        "status_message": columns [11],
                        "status_published": columns[12],
                        "status_type": columns[13]}
            csv_data.append(dict_row)


filename = "../Data/csv_data/trump_facebook.tsv"
status_updates = read_csv(filename, delimiter="\t") 
status_updates[0:2]

And this is the error message:

IndexError                                Traceback (most recent call last)
<ipython-input-16-352e8f130d5d> in <module>

 27 filename = "../Data/csv_data/trump_facebook.tsv"
---> 28 status_updates = read_csv(filename, delimiter="\t")
 29 status_updates[0:2]

<ipython-input-16-352e8f130d5d> in read_csv(input_file, delimiter)
  9 
 10             dict_row = {"link_name": columns [0],
---> 11                        "num_angrys": columns [1],
 12                        "num_comments":columns[2],
 13                        "num_hahas": columns [3],

IndexError: list index out of range

Any help would be greatly appreciated!

Update: I've solved the error with the new code below, but printing status_updates[0:2] now gives me output that includes the header row, like so:

def read_csv(input_file, delimiter=","):
    # your code here
    csv_data = []
    with open(filename, "r") as csvfile:
        for row in csvfile:
            row = row.strip("\n")
            columns = row.split("\t")
            dict_row = {"link_name": columns[2],
                        "num_angrys": columns[14],
                        "num_comments": columns[7],
                        "num_hahas": columns[12],
                        "num_likes": columns[9],
                        "num_loves": columns[10],
                        "num_reactions": columns[6],
                        "num_sads": columns[13],
                        "num_shares": columns[8],
                        "num_wows": columns[11],
                        "status_id": columns[0],
                        "status_link": columns[4],
                        "status_message": columns[1],
                        "status_published": columns[5],
                        "status_type": columns[3]}
            csv_data.append(dict_row)
    return csv_data

filename = "../Data/csv_data/trump_facebook.tsv"
status_updates = read_csv(filename, delimiter="\t") 
status_updates[0:2]

Output:

[{'link_name': 'link_name',
'num_angrys': 'num_angrys',
'num_comments': 'num_comments',
'num_hahas': 'num_hahas',
'num_likes': 'num_likes',
'num_loves': 'num_loves',
'num_reactions': 'num_reactions',
'num_sads': 'num_sads',
'num_shares': 'num_shares',
'num_wows': 'num_wows',
'status_id': 'status_id',
'status_link': 'status_link',
'status_message': 'status_message',
'status_published': 'status_published',
'status_type': 'status_type'},
{'link_name': 'Timeline Photos',
'num_angrys': '7',
'num_comments': '543',
'num_hahas': '17',
'num_likes': '6178',
'num_loves': '572',
'num_reactions': '6813',
'num_sads': '0',
'num_shares': '359',
'num_wows': '39',
'status_id': '153080620724_10157915294545725',
'status_link': 'https://www.facebook.com/DonaldTrump/photos/a.488852220724.393301.153080620724/10157915294545725/?type=3',
'status_message': 'Beautiful evening in Wisconsin- THANK YOU for your incredible support tonight! Everyone get out on November 8th - and VOTE! LETS MAKE AMERICA GREAT AGAIN! -DJT',
'status_published': '10/17/2016 20:56:51',
'status_type': 'photo'}] 

I can easily change status_updates[0:2] to status_updates[1:3], but there has to be a more elegant way to remove the header row so that I don't have to remember to start at index 1 every time I call this function. I appreciate all your help!

What does the CSV file look like? I see that the function call passes the delimiter as \t, but the code itself always splits on a comma.
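The delimiter mismatch is enough to explain the IndexError. Here is a minimal sketch, assuming a made-up tab-separated line (the values are illustrative, not taken from the real file): splitting it on a comma leaves the whole line as a single element, so any index above 0 is out of range.

```python
# A tab-separated line shaped like a row of a .tsv file (hypothetical sample values).
line = "153080620724_1\tsome message\tTimeline Photos\n"

by_comma = line.strip("\n").split(",")   # wrong delimiter: there are no commas to split on
by_tab = line.strip("\n").split("\t")    # matches the delimiter actually used in the file

print(len(by_comma))  # 1
print(len(by_tab))    # 3

try:
    by_comma[1]       # same failure as columns[1] in the question
except IndexError as exc:
    print("IndexError:", exc)
```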

Also, you might want to consider using the Python csv module for this work.
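As a rough sketch of what that could look like: csv.DictReader treats the first row as the header and yields one dictionary per remaining row, so the header never ends up in the result. The UTF-8 encoding, the temporary file, and the sample columns are assumptions made for illustration only.

```python
import csv
import os
import tempfile


def read_csv(input_file, delimiter=","):
    """Return the rows of a delimited file as a list of dicts keyed by the header row."""
    # encoding="utf-8" is an assumption; adjust it to match the real file.
    with open(input_file, "r", newline="", encoding="utf-8") as csvfile:
        reader = csv.DictReader(csvfile, delimiter=delimiter)
        return [dict(row) for row in reader]


# Hypothetical three-column sample standing in for trump_facebook.tsv.
sample = "status_id\tlink_name\tnum_likes\n153080620724_1\tTimeline Photos\t6178\n"
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False,
                                 newline="", encoding="utf-8") as tmp:
    tmp.write(sample)
    path = tmp.name

rows = read_csv(path, delimiter="\t")
os.remove(path)
print(rows[0:2])  # the header row is not part of the result
```

If free-text fields such as status_message contain stray double quotes, passing quoting=csv.QUOTE_NONE to DictReader may be needed so that quotes are treated as ordinary characters.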

You can solve your problem using this approach:

 1. Read the first line of the CSV file as the header.
 2. Build a header-to-value mapping for each row using the `zip()` function.
def row_preprocess(row, delimiter='\t'):
    return row.strip('\n').split(delimiter)

def read_csv(path_to_file, delimiter='\t'):
    csv_data = []
    with open(path_to_file, 'r') as f:
        column_values = row_preprocess(next(f), delimiter)
        for row in f:
            row_values = row_preprocess(row, delimiter)
            mapping = dict(zip(column_values, row_values))
            csv_data.append(mapping)
    return csv_data 
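To see what the `zip()` step does on its own, here is a small sketch with hypothetical header and row values. One thing worth knowing is that `zip()` stops at the shorter input, so a row with a missing field produces a dictionary without that key rather than raising an error.

```python
# Hypothetical header and rows, shaped like the output of row_preprocess().
header = ["status_id", "link_name", "num_likes"]
full_row = ["153080620724_1", "Timeline Photos", "6178"]
short_row = ["153080620724_2", "Timeline Photos"]  # one field missing

full_mapping = dict(zip(header, full_row))
short_mapping = dict(zip(header, short_row))

print(full_mapping)   # all three keys are present
print(short_mapping)  # 'num_likes' is silently absent
```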
