
How to get a dataframe out of JSON-formatted data

I have data in JSON format, and I want to pull out specific values as my column names along with their corresponding values.

Data:

{
"552783667052167168": {
    "552783667052167168": {
        "contributors": null,
        "truncated": false,
        "text": "France: 10 people dead after shooting at HQ of satirical weekly newspaper #CharlieHebdo, according to witnesses ",
        "in_reply_to_status_id": null,
        "id": 552783667052167168,


        }

    "552785374507175936": {
        "contributors": null,
        "truncated": false,
        "text": "MT @euronews France: 10 dead after shooting at HQ of satirical weekly #CharlieHebdo. If Zionists/Jews did this they'd be nuking Israel",
        "in_reply_to_status_id": 552783667052167168,
        "id": 552785374507175936,

        }
    "552786226546495488": {
        "contributors": null,
        "truncated": false,
        "text": "@j0nathandavis They who? Stupid and partial opinions like this one only add noise to any debate.",
        "in_reply_to_status_id": 552785374507175936,
        "id": 552786226546495488
        }
    }
"552791196247269378": {
    "552791196247269378": {
        "contributors": null,
        "truncated": false,
        "text": "BREAKING: At least 10 killed in shooting at French satirical newspaper Charlie Hebdo, Paris prosecutor's office says. ,
        "in_reply_to_status_id": null,
        "id": 552791196247269378
        }
    "552791516360765440": {
        "contributors": null,
        "truncated": false,
        "text": "@cnni 11 Killed now",
        "in_reply_to_status_id": 552791196247269378,
        "id": 552791516360765440
        }
    "552791567401238529": {
        "contributors": null,
        "truncated": false,
        "text": "@cnni 11 died",
        "in_reply_to_status_id": 552791196247269378,
        "id": 552791567401238529
        }
    }

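I want the respective mainID and text as my columns. One thing to note: the first ID, i.e. 552783667052167168, also has a text of its own, as you can see from the format { "552783667052167168": { "552783667052167168": { text. The other two columns are for the children.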

Expected output:

ParentID           parentText              ChildID         childText
552783667052167168 "France: 10 people dead 552785374507175936 "MT @euronews France: 10 dead after
552783667052167168 "France: 10 people dead 552786226546495488 "@j0nathandavis They who? 
552791196247269378  "BREAKING: At least 10 killed  552791516360765440 "@cnni 11 Killed now"
552791196247269378  "BREAKING: At least 10 killed  552791567401238529  "@cnni 11 died"

Here, "in_reply_to_status_id": null marks a tweet that has no parent, i.e. the source tweet. I guess we can use that as the rule.

Edit 1:

I managed to code it up to this point, but the source tweet's text is still pending.

for sourceTweet, tweets in dataTrain.items():
    #print(sourceTweet)
    for tweet, tweetContent in tweets.items():
        #print(tweet)
        for iTweet, iTweetContent in tweets.items():
            #print(iTweet)
            if (sourceTweet==iTweet):
                sourceTweetContent = iTweetContent
                sourceTweetText = iTweetContent["text"]
                break
        for jTweet, jTweetContent in tweets.items():
            #print(jTweet)
            if (tweetContent["in_reply_to_status_id"]==jTweet):
                replyToTweetContent = jTweetContent
                replyToTweetText = jTweetContent["text"]
                print(replyToTweetText)
                break

Try this!

a = """{
"552783667052167168": {
    "552783667052167168": {
        "contributors": null,
        "truncated": false,
        "text": "France: 10 people dead after shooting at HQ of satirical weekly newspaper #CharlieHebdo, according to witnesses",
        "in_reply_to_status_id": null,
        "id": 552783667052167168
        },
    "552785374507175936": {
        "contributors": null,
        "truncated": false,
        "text": "MT @euronews France: 10 dead after shooting at HQ of satirical weekly #CharlieHebdo. If Zionists/Jews did this they'd be nuking Israel",
        "in_reply_to_status_id": 552783667052167168,
        "id": 552785374507175936
        },
    "552786226546495488": {
        "contributors": null,
        "truncated": false,
        "text": "@j0nathandavis They who? Stupid and partial opinions like this one only add noise to any debate.",
        "in_reply_to_status_id": 552785374507175936,
        "id": 552786226546495488
        }
    },
"552791196247269378": {
    "552791196247269378": {
        "contributors": null,
        "truncated": false,
        "text": "BREAKING: At least 10 killed in shooting at French satirical newspaper Charlie Hebdo, Paris prosecutor's office says." ,
        "in_reply_to_status_id": null,
        "id": 552791196247269378
        },
    "552791516360765440": {
        "contributors": null,
        "truncated": false,
        "text": "@cnni 11 Killed now",
        "in_reply_to_status_id": 552791196247269378,
        "id": 552791516360765440
        },
    "552791567401238529": {
        "contributors": null,
        "truncated": false,
        "text": "@cnni 11 died",
        "in_reply_to_status_id": 552791196247269378,
        "id": 552791567401238529
        }
    }
}"""

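import json
import pandas as pd

data = json.loads(a)
df = pd.DataFrame(columns=['ParentId','parentText','ChildId','childText'])

l = []
pos = 0
for parent in data:
    for d in data[parent]:
        if d == parent:
            # source tweet: start the row with the parent ID and its text
            # (this assumes the source tweet comes first in its thread)
            l.append(parent)
            l.append(data[parent][d]['text'])
        else:
            # reply: complete the row, store it, then drop the child part again
            l.append(d)
            l.append(data[parent][d]['text'])
            df.loc[pos] = l
            l = l[:2]
            pos += 1
    l = []

print(df)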

Output

             ParentId                                         parentText  \
0  552783667052167168  France: 10 people dead after shooting at HQ of...   
1  552783667052167168  France: 10 people dead after shooting at HQ of...   
2  552791196247269378  BREAKING: At least 10 killed in shooting at Fr...   
3  552791196247269378  BREAKING: At least 10 killed in shooting at Fr...   

              ChildId                                          childText  
0  552785374507175936  MT @euronews France: 10 dead after shooting at...  
1  552786226546495488  @j0nathandavis They who? Stupid and partial op...  
2  552791516360765440                                @cnni 11 Killed now  
3  552791567401238529                                      @cnni 11 died  
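
The same idea can also be written by looking the source tweet up under its own ID, so the result does not depend on the order of the keys. The following is only a sketch: the sample data and the helper name thread_rows are made up for illustration and are not part of the original question.

```python
import json

# Hypothetical two-thread sample in the same shape as the question's data
raw = """
{
  "100": {
    "100": {"text": "source A", "in_reply_to_status_id": null, "id": 100},
    "101": {"text": "reply A1", "in_reply_to_status_id": 100, "id": 101},
    "102": {"text": "reply A2", "in_reply_to_status_id": 101, "id": 102}
  },
  "200": {
    "200": {"text": "source B", "in_reply_to_status_id": null, "id": 200},
    "201": {"text": "reply B1", "in_reply_to_status_id": 200, "id": 201}
  }
}
"""

def thread_rows(threads):
    """Return one (ParentId, parentText, ChildId, childText) tuple per reply."""
    rows = []
    for parent_id, thread in threads.items():
        # The source tweet is stored under its own ID, so fetch it directly
        parent_text = thread[parent_id]["text"]
        for tweet_id, tweet in thread.items():
            if tweet_id != parent_id:
                rows.append((parent_id, parent_text, tweet_id, tweet["text"]))
    return rows

rows = thread_rows(json.loads(raw))
for row in rows:
    print(row)
```

Passing rows to pd.DataFrame(rows, columns=['ParentId', 'parentText', 'ChildId', 'childText']) then builds the frame in one step instead of growing it row by row with df.loc.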

This may not be the most elegant approach, but it is a solution. Hope it helps:

import numpy as np
import pandas as pd

# NOTE: `json` here is the nested dict of threads (see the edit at the end of
# this answer), not the standard-library json module

# get the parent keys
parentkeys = list(json.keys())

# create lists to fill for columns later
parentids = []
childids = []
contributors = []
truncated = []
text = []
in_reply_to_status_id = []
ids = []  # named `ids` to avoid shadowing the built-in id()

# get the data out of the json
for parentkey in parentkeys:
    for child in json[parentkey]:
        parentids.append(parentkey)
        childids.append(child)
        contributors.append(json[parentkey][child]['contributors'])
        truncated.append(json[parentkey][child]['truncated'])
        text.append(json[parentkey][child]['text'])
        in_reply_to_status_id.append(json[parentkey][child]['in_reply_to_status_id'])
        ids.append(json[parentkey][child]['id'])

# create the dataframe out of the lists
df = pd.DataFrame({'ParentID':parentids,
                   'ChildID':childids,
                   'contributors':contributors,
                   'truncated':truncated,
                   'text':text,
                   'in_reply_to_status_id':in_reply_to_status_id,
                   'id':ids})

So now we have to transform the dataframe into the format you asked for:

# copy the text into parentText for source tweets (the ones that are not a reply);
# this compares against the string 'null', as used in the cleaned-up dict in the edit below
df['parentText'] = np.where(df.in_reply_to_status_id == 'null', df.text, None)

# forward-fill the parent text down the rows until the next source tweet
df['parentText'] = df['parentText'].ffill()

# drop the rows where the parent ID and the child ID are the same (the source tweets)
df = df[df.ParentID != df.ChildID]

# rename the column to the name that was asked for
df = df.rename(columns={'text':'childText'})

# select the 4 columns that are needed
df = df[['ParentID', 'parentText', 'ChildID', 'childText']]

Output

             ParentID                                         parentText             ChildID                                          childText
1  552783667052167168  France: 10 people dead after shooting at HQ of...  552785374507175936  MT @euronews France: 10 dead after shooting at...
2  552783667052167168  France: 10 people dead after shooting at HQ of...  552786226546495488  @j0nathandavis They who? Stupid and partial op...
4  552791196247269378  BREAKING: At least 10 killed in shooting at Fr...  552791516360765440                                @cnni 11 Killed now
5  552791196247269378  BREAKING: At least 10 killed in shooting at Fr...  552791567401238529                                      @cnni 11 died
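
If the data is parsed with json.loads instead, null arrives as Python None rather than the string 'null', and the comparison above would no longer match. A possible variant for that case is sketched below. It uses isna() plus a per-thread forward fill; the small threads dict is a made-up stand-in for the real data, not taken from the question.

```python
import pandas as pd

# Hypothetical sample shaped like parsed JSON: null has become None
threads = {
    "1": {
        "1": {"text": "source A", "in_reply_to_status_id": None, "id": 1},
        "2": {"text": "reply A1", "in_reply_to_status_id": 1, "id": 2},
    },
    "3": {
        "3": {"text": "source B", "in_reply_to_status_id": None, "id": 3},
        "4": {"text": "reply B1", "in_reply_to_status_id": 3, "id": 4},
    },
}

# One row per tweet, tagged with the ID of the thread it belongs to
df = pd.DataFrame(
    [
        {"ParentID": parent_id, "ChildID": child_id, **tweet}
        for parent_id, thread in threads.items()
        for child_id, tweet in thread.items()
    ]
)

# Source tweets have no parent; None is treated as missing, so isna() finds them
is_source = df["in_reply_to_status_id"].isna()

# Keep the text only on source rows, then forward-fill it within each thread
df["parentText"] = df["text"].where(is_source)
df["parentText"] = df.groupby("ParentID")["parentText"].ffill()

# Drop the source rows and keep the four requested columns
result = (
    df[~is_source]
    .rename(columns={"text": "childText"})
    [["ParentID", "parentText", "ChildID", "childText"]]
    .reset_index(drop=True)
)
print(result)
```

Filling within each ParentID group keeps one thread's text from leaking into the next thread if a source row were ever missing. Because the missing values can turn in_reply_to_status_id into a float column, very large tweet IDs may lose precision there, so this sketch only uses that column to detect source tweets.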

Edit
Your JSON raises an error in my console. I have cleaned it up for you; please use this version for testing:

json = {
"552783667052167168": {
    "552783667052167168": {
        "contributors": "null",
        "truncated": "false",
        "text": "France: 10 people dead after shooting at HQ of satirical weekly newspaper #CharlieHebdo, according to witnesses",
        "in_reply_to_status_id": "null",
        "id": 552783667052167168

        },
    "552785374507175936": {
        "contributors": "null",
        "truncated": "false",
        "text": "MT @euronews France: 10 dead after shooting at HQ of satirical weekly #CharlieHebdo. If Zionists/Jews did this they'd be nuking Israel",
        "in_reply_to_status_id": 552783667052167168,
        "id": 552785374507175936

        },
    "552786226546495488": {
        "contributors": "null",
        "truncated": "false",
        "text": "@j0nathandavis They who? Stupid and partial opinions like this one only add noise to any debate.",
        "in_reply_to_status_id": 552785374507175936,
        "id": 552786226546495488
        }
    },
"552791196247269378": {
    "552791196247269378": {
        "contributors": "null",
        "truncated": "false",
        "text": "BREAKING: At least 10 killed in shooting at French satirical newspaper Charlie Hebdo, Paris prosecutor's office says.",
        "in_reply_to_status_id": "null",
        "id": 552791196247269378
        },
    "552791516360765440": {
        "contributors": "null",
        "truncated": "false",
        "text": "@cnni 11 Killed now",
        "in_reply_to_status_id": 552791196247269378,
        "id": 552791516360765440
        },
    "552791567401238529": {
        "contributors": "null",
        "truncated": "false",
        "text": "@cnni 11 died",
        "in_reply_to_status_id": 552791196247269378,
        "id": 552791567401238529
        }
    }
}
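
To check whether a JSON text is well formed before building the dataframe, something along these lines may help. It is a sketch using only the standard library: the helper try_parse and both sample strings are invented for illustration. Once the text is valid, json.loads turns null into None, so the values do not need to be quoted as "null".

```python
import json

# Hypothetical snippets: the first lacks a comma between entries, the second is valid
broken = '{"a": {"id": 1} "b": {"id": 2}}'
fixed = '{"a": {"id": 1}, "b": {"id": 2, "in_reply_to_status_id": null}}'

def try_parse(text):
    """Return (data, None) on success or (None, error description) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        # lineno and colno point at the place where parsing stopped
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"

data, error = try_parse(broken)
print(error)

data, error = try_parse(fixed)
print(data["b"]["in_reply_to_status_id"] is None)
```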

