
Cannot write csv file with python

I am trying to convert a json file into a csv file. The json file came from tweepy.

import json
import csv

fo = open('Sclass.json', 'r')
fw = open('Hasil_Tweets.csv', 'a')

for line in fo:
        try:
                tweet = json.loads(line)
                fw.write(tweet['id'],tweet['timestamp_ms'],tweet['user']['name'],tweet['user']['statuses_count'],tweet['user']['friends_count'],tweet['user']['followers_count'],tweet['place']['bounding_box']['coordinates'],tweet['text']+"\n")
        except:
                continue

But when I print it, it works. And when I write just fw.write(tweet['text']), it works.

I am a noob at both python and tweepy. But my instinct says this problem is related to the json file itself.

This is the json file itself:

{
    "created_at": "Wed Oct 11 08:36:21 +0000 2017",
    "id": 918032510927355904,
    "id_str": "918032510927355904",
    "text": "@irfanzayo @puisisi @tasyak Lo tuh kebiasaan overthinking \ud83d\ude24",
    "display_text_range": [
        28,
        59
    ],
    "source": "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>",
    "truncated": false,
    "in_reply_to_status_id": 918032029094047746,
    "in_reply_to_status_id_str": "918032029094047746",
    "in_reply_to_user_id": 60049976,
    "in_reply_to_user_id_str": "60049976",
    "in_reply_to_screen_name": "irfanzayo",
    "user": {
        "id": 59980455,
        "id_str": "59980455",
        "name": "Mutiara Sisyanni D",
        "screen_name": "MutiaraSisyanni",
        "location": "Jakarta, Indonesia",
        "url": "http://mutiarasyn.wixsite.com/mutiarasisyanni",
        "description": null,
        "translator_type": "none",
        "protected": false,
        "verified": false,
        "followers_count": 354,
        "friends_count": 237,
        "listed_count": 1,
        "favourites_count": 326,
        "statuses_count": 6507,
        "created_at": "Sat Jul 25 04:31:47 +0000 2009",
        "utc_offset": 25200,
        "time_zone": "Jakarta",
        "geo_enabled": true,
        "lang": "en",
        "contributors_enabled": false,
        "is_translator": false,
        "profile_background_color": "FA8C9E",
        "profile_background_image_url": "http://abs.twimg.com/images/themes/theme5/bg.gif",
        "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme5/bg.gif",
        "profile_background_tile": false,
        "profile_link_color": "FF8A94",
        "profile_sidebar_border_color": "FFFFFF",
        "profile_sidebar_fill_color": "99CC33",
        "profile_text_color": "3E4415",
        "profile_use_background_image": false,
        "profile_image_url": "http://pbs.twimg.com/profile_images/486497248293826560/FANdzhL9_normal.jpeg",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/486497248293826560/FANdzhL9_normal.jpeg",
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/59980455/1404826066",
        "default_profile": false,
        "default_profile_image": false,
        "following": null,
        "follow_request_sent": null,
        "notifications": null
    },
    "geo": null,
    "coordinates": null,
    "place": {
        "id": "66555622726ab358",
        "url": "https://api.twitter.com/1.1/geo/id/66555622726ab358.json",
        "place_type": "city",
        "name": "Setia Budi",
        "full_name": "Setia Budi, Indonesia",
        "country_code": "ID",
        "country": "Indonesia",
        "bounding_box": {
            "type": "Polygon",
            "coordinates": [
                [
                    [
                        106.817351,
                        -6.24152
                    ],
                    [
                        106.817351,
                        -6.201177
                    ],
                    [
                        106.852353,
                        -6.201177
                    ],
                    [
                        106.852353,
                        -6.24152
                    ]
                ]
            ]
        },
        "attributes": {}
    },
    "contributors": null,
    "is_quote_status": false,
    "quote_count": 0,
    "reply_count": 0,
    "retweet_count": 0,
    "favorite_count": 0,
    "entities": {
        "hashtags": [],
        "urls": [],
        "user_mentions": [
            {
                "screen_name": "irfanzayo",
                "name": "irfan zayanto",
                "id": 60049976,
                "id_str": "60049976",
                "indices": [
                    0,
                    10
                ]
            },
            {
                "screen_name": "puisisi",
                "name": "Puisi Pancara",
                "id": 32809069,
                "id_str": "32809069",
                "indices": [
                    11,
                    19
                ]
            },
            {
                "screen_name": "tasyak",
                "name": "Tasya Kurnia",
                "id": 41986880,
                "id_str": "41986880",
                "indices": [
                    20,
                    27
                ]
            }
        ],
        "symbols": []
    },
    "favorited": false,
    "retweeted": false,
    "filter_level": "low",
    "lang": "in",
    "timestamp_ms": "1507710981481"
}

Another error:

Traceback (most recent call last):
  File "C:\Users\User\Desktop\fase 1-20170930T062552Z-001\transformCSV.py", line 7, in <module>
    tweet = json.loads(line)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

Traceback (most recent call last):
  File "C:\Users\Tanabata\Desktop\Putang ina mo\spli.py", line 8, in <module>
    tweet = json.load(fo)
  File "C:\Users\Tanabata\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "C:\Users\Tanabata\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\Users\Tanabata\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 3 column 1 (char 2893)

Json file itself

You import csv but never use it. You have to create a writer:

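import json
import csv

# newline='' lets the csv module control line endings (avoids blank rows on Windows);
# encoding='utf-8' is set because tweet text may contain emoji
with open('Sclass.json', 'r') as fo, \
        open('Hasil_Tweets.csv', 'a', newline='', encoding='utf-8') as fw:
    writer = csv.writer(fw)
    for line in fo:
        tweet = json.loads(line)  # expects one JSON object per line
        # writerow() takes a single list; csv handles separators and quoting
        writer.writerow([tweet['id'],tweet['timestamp_ms'],tweet['user']['name'],
            tweet['user']['statuses_count'],tweet['user']['friends_count'],
            tweet['user']['followers_count'],
            tweet['place']['bounding_box']['coordinates'],tweet['text']])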
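
As a side note on why the loop in the question appeared to write nothing: a file's write() accepts exactly one string, so calling it with several arguments raises a TypeError, and the bare except: continue hides that error on every line. The sketch below shows this with an in-memory file; io.StringIO and the sample values are stand-ins chosen for illustration, not part of the original script.

```python
import csv
import io

buf = io.StringIO()  # in-memory stand-in for the output file

try:
    # write() takes a single string; passing several values raises TypeError
    buf.write(1, "two", "three\n")
except TypeError as exc:
    error = exc  # in the question, a bare `except: continue` swallowed this

# csv.writer accepts one list of mixed-type values and does the quoting itself
writer = csv.writer(buf)
writer.writerow([1, "text, with a comma", [[1.5, 2.5]]])
result = buf.getvalue()
```

Catching a specific exception (or printing it while debugging) instead of using a bare except would have surfaced the problem immediately.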

For your second question, it seems you don't have a JSON-lines file but a file with a single JSON document. So reading line by line is wrong; you should read the file as a whole:

import json
import csv

# Parse the whole file as one JSON document
with open('Sclass.json', 'r') as fo:
    tweet = json.load(fo)

# newline='' lets the csv module control line endings
with open('Hasil_Tweets.csv', 'a', newline='', encoding='utf-8') as fw:
    writer = csv.writer(fw)
    writer.writerow([tweet['id'],tweet['timestamp_ms'],tweet['user']['name'],
        tweet['user']['statuses_count'],tweet['user']['friends_count'],
        tweet['user']['followers_count'],
        tweet['place']['bounding_box']['coordinates'],tweet['text']])
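
If it is unclear which of the two layouts a file uses, one option is to try the whole-document parse first and fall back to line-by-line parsing. This is only a sketch: load_tweets is a hypothetical helper name, and it assumes that in the line-based layout each non-blank line is a complete JSON object.

```python
import json

def load_tweets(text):
    """Return a list of dicts from a single JSON document or from JSON lines."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not one document: treat every non-blank line as its own JSON object
        return [json.loads(line) for line in text.split("\n") if line.strip()]
    # A single document may be one object or a list of objects
    return data if isinstance(data, list) else [data]
```

Called as load_tweets(open('Sclass.json').read()), it returns a list in both cases, so the same writerow loop can run over the result.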

As soon as you are working with tables (csv being one), think pandas (my opinion).

In this case we can use pandas json_normalize to interpret your json file.

import json
import pandas as pd

with open("Sclass.json") as f:
    # pd.json_normalize (pandas >= 1.0) replaces the older
    # `from pandas.io.json import json_normalize` import
    df = pd.json_normalize(json.load(f))

cols = ["id","timestamp_ms","user.name",
        "user.statuses_count","user.friends_count","user.followers_count",
        "place.bounding_box.coordinates","text"]

df[cols].to_csv("Hasil_Tweets.csv",sep=",",index=False) # outputs to csv

Pandas comes with many output options, one of them being an HTML table. I will use this to show the output:

print(df[cols].to_html(index=False)) # outputs to html to show result

Output

 <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th>id</th> <th>timestamp_ms</th> <th>user.name</th> <th>user.statuses_count</th> <th>user.friends_count</th> <th>user.followers_count</th> <th>place.bounding_box.coordinates</th> <th>text</th> </tr> </thead> <tbody> <tr> <td>918032510927355904</td> <td>1507710981481</td> <td>Mutiara Sisyanni D</td> <td>6507</td> <td>237</td> <td>354</td> <td>[[[106.817351, -6.24152], [106.817351, -6.2011...</td> <td>@irfanzayo @puisisi @tasyak Lo tuh kebiasaan o...</td> </tr> </tbody> </table>
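
The dotted column names in that output (user.name, place.bounding_box.coordinates) come from flattening the nested dictionaries. For anyone who wants the same effect without pandas, here is a rough standard-library approximation. It is not how pandas implements json_normalize; flatten is a hypothetical helper and the sample tweet is made up.

```python
import csv
import io

def flatten(record, prefix=""):
    """Flatten nested dicts into one dict with dotted keys, e.g. 'user.name'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value  # lists and scalars are kept as they are
    return flat

# Made-up sample with the same nesting style as the tweet in the question
tweet = {"id": 1, "user": {"name": "example", "friends_count": 2}, "text": "hi"}
cols = ["id", "user.name", "user.friends_count", "text"]

buf = io.StringIO()  # stand-in for the output file
# extrasaction="ignore" drops flattened keys that are not listed in cols
writer = csv.DictWriter(buf, fieldnames=cols, extrasaction="ignore")
writer.writeheader()
writer.writerow(flatten(tweet))
```

csv.DictWriter also writes the header row, which the plain csv.writer version does not do unless a header is written explicitly.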

I'm adding this as another answer.

The *.json you shared is actually a big file containing multiple json strings, but only on every second row. How you got this file in the first place I don't know, but you can read it in using this:

import json
import pandas as pd

with open("Sclass.json") as f:
    data = [json.loads(row.strip()) for row in f.readlines()[0::2]]
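
If the rows in between are blank, that would also explain the first traceback in the question, since parsing a row that holds only a newline fails with the same message. The sketch below reproduces this and filters on content instead of position; the rows list is made-up sample data, and the assumption that the skipped rows are empty is mine.

```python
import json

# Made-up sample: one JSON object on every other row, blank rows in between
rows = ['{"id": 1}\n', '\n', '{"id": 2}\n', '\n']

try:
    json.loads(rows[1])  # a blank row contains no JSON value
except json.JSONDecodeError as exc:
    message = str(exc)

# Filter on content rather than position, so the layout need not alternate strictly
data = [json.loads(row) for row in rows if row.strip()]
```

Filtering with row.strip() keeps working if a blank row is missing or doubled somewhere in the file, which slicing with [0::2] does not.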

However, when reading this structure into a dataframe you can see that it really doesn't have any clear structure:

pd.DataFrame(data)

Conclusion: Your issue is something else entirely.
