
Write a list of paragraph tuples to a CSV file

The following code is designed to write a list of tuples to a CSV file, one line per entry. Each tuple contains a large paragraph of text followed by two identifiers.

import urllib2
import json
import csv

base_url = "https://www.eventbriteapi.com/v3/events/search/?page={}"
writer = csv.writer(open("./data/events.csv", "a"))
writer.writerow(["description", "category_id", "subcategory_id"])

def format_event(event):
    return event["description"]["text"].encode("utf-8").rstrip("\n\r"), event["category_id"], event["subcategory_id"]

for x in range(1, 2):
    print "fetching page - {}".format(x)
    formatted_url = base_url.format(str(x))
    resp = urllib2.urlopen(formatted_url)
    data = resp.read()
    j_data = json.loads(data)
    events = map(format_event, j_data["events"])
    for event in events:
        #print event
        writer.writerow(event)

    print "wrote out events for page - {}".format(x)

Ideally, each line would contain a single paragraph followed by the other fields listed above, but here is a screenshot of how the data comes out:

[screenshot of the CSV output]

If I instead change the writerow line to the following:

writer.writerow([event])

Here is how the file now looks:

[screenshot of the CSV output]

It certainly looks much closer to what I want, but it has parentheses around each entry, which are undesirable.

EDIT: here is a snippet that contains a sample of the data I'm working with.

Change your csv writer to a DictWriter.

Make a few tweaks:

def format_event(event):
    return {"description": event["description"]["text"].encode("utf-8").rstrip("\n\r"), 
            "category_id": event["category_id"], 
            "subcategory_id": event["subcategory_id"]}

There may be a few other small things you need to do, but using DictWriter and formatting your data appropriately has been the easiest way I've found to work with csv files.
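For illustration, here is a minimal sketch of how the DictWriter suggestion might fit together. It is written for Python 3 (the code above is Python 2), it writes to an in-memory buffer instead of ./data/events.csv, and the names FIELDNAMES, write_events, and sample_events are assumptions made up for this example, as is the sample data.

```python
import csv
import io

FIELDNAMES = ["description", "category_id", "subcategory_id"]

def format_event(event):
    # Same shape as the dict above; rstrip only removes trailing line breaks.
    return {
        "description": event["description"]["text"].rstrip("\n\r"),
        "category_id": event["category_id"],
        "subcategory_id": event["subcategory_id"],
    }

def write_events(events, out_file):
    # DictWriter maps each dict key to the column of the same name.
    writer = csv.DictWriter(out_file, fieldnames=FIELDNAMES)
    writer.writeheader()
    for event in events:
        writer.writerow(format_event(event))

# Hypothetical sample standing in for j_data["events"].
sample_events = [
    {"description": {"text": "A paragraph, with a comma.\r\n"},
     "category_id": "103", "subcategory_id": None},
]

buffer = io.StringIO()
write_events(sample_events, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The description is quoted automatically because it contains a comma, and the None identifier is written as an empty field.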

Can you try writing to the CSV file directly, without using the csv module? You can write or append comma-delimited strings to the CSV file just as you would to a typical text file. Also, the way you remove the \r and \n characters might not be working: rstrip("\n\r") only strips them from the end of the string. You can use a regex to find those characters and replace them with an empty string "":

import urllib2
import json
import re

base_url = "https://www.eventbriteapi.com/v3/events/search/?page={}"

# Compile once; matches every carriage return and newline in the text.
ws_to_strip = re.compile(r"[\r\n]")

def format_event(event):
    description = ws_to_strip.sub("", event["description"]["text"])
    # Keep every field as unicode so the row can be joined and encoded once.
    return [description, unicode(event["category_id"] or ""), unicode(event["subcategory_id"] or "")]

with open("./data/events.csv", "a") as events_file:
    # write() does not add a line ending, so append one to every record.
    events_file.write(",".join(["description", "category_id", "subcategory_id"]) + "\n")

    for x in range(1, 2):
        print "fetching page - {}".format(x)
        formatted_url = base_url.format(str(x))
        resp = urllib2.urlopen(formatted_url)
        data = resp.read()
        j_data = json.loads(data)
        events = map(format_event, j_data["events"])

        for event in events:
            # No quoting is done here, so a comma inside a description still splits the field.
            events_file.write(u",".join(event).encode("utf-8") + "\n")

        print "wrote out events for page - {}".format(x)
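To make the difference between the two clean-up approaches concrete, here is a small sketch in Python 3 (the code above is Python 2). The sample text and the identifiers "103" and "3001" are made up for this example. It also departs from the answer in two ways: line breaks are replaced with a space rather than an empty string so that adjacent words stay separated, and the row is written with csv.writer to show that a flattened description then fits on one physical line even when it contains a comma.

```python
import csv
import io
import re

# One or more consecutive carriage returns or newlines.
LINE_BREAKS = re.compile(r"[\r\n]+")

# Hypothetical description with line breaks in the middle and at the end.
text = "First paragraph.\r\nSecond paragraph, with a comma.\n"

# rstrip only touches the end of the string; the inner break survives.
trailing_only = text.rstrip("\n\r")

# The regex replaces every run of line breaks, wherever it occurs.
flattened = LINE_BREAKS.sub(" ", text).strip()

print(repr(trailing_only))
print(repr(flattened))

# Write the flattened text plus two identifiers as a single CSV record.
buffer = io.StringIO()
csv.writer(buffer).writerow([flattened, "103", "3001"])
row = buffer.getvalue()
print(repr(row))
```

With the inner line breaks gone, the record occupies a single line, and the csv module quotes the description because of its comma.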
