Extracting data from a pandas DataFrame in Python

Question

This is a sample of the DataFrame that I have, and I want to extract the "totalCount" value for each time frame.

import pandas as pd

# Each time_data value is a JSON string; the last key of every hour entry is "totalCount"
df = [{"username": "last",
 "time_data": "{\"hours\":[{\"hour\":\"00:00\",\"postCount\":\"5\",\"topicCount\":\"3\",\"totalCount\":80},{\"hour\":\"01:00\",\"postCount\":\"11\",\"topicCount\":\"6\",\"totalCount\":31}]}"
},
{"username": "truk",
 "time_data": "{\"hours\":[{\"hour\":\"00:00\",\"postCount\":\"11\",\"topicCount\":\"6\",\"totalCount\":362},{\"hour\":\"01:00\",\"postCount\":\"22\",\"topicCount\":\"8\",\"totalCount\":355}]}"
}]
df = pd.DataFrame(df)
df

I have used this code to get the "postCount" for both '00:00' and '01:00':

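# Hour 00:00: split on the text just before the value, then on the text just after it
df_h0 = df.copy()
df_h0['hour'] = '00:00'
df_h0['totalCount'] = df.time_data.str.split('"00:00","postCount":"').str[1].str.split('","topic').str[0]
df_h0 = df_h0.fillna(0)

# Same extraction for hour 01:00
df_h1 = df.copy()
df_h1['hour'] = '01:00'
df_h1['totalCount'] = df.time_data.str.split('"01:00","postCount":"').str[1].str.split('","topic').str[0]
df_h1 = df_h1.fillna(0)

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way
df_tot = pd.concat([df_h0, df_h1])
df_tot.head()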

But now I want to get the "totalCount", which does not sit right next to the hour. Does anyone know how to do that?

Expected output:

                           time_data                 username   hour    totalCount
0   {"hours":[{"hour":"00:00","postCount":"5","top...   last    00:00   80
1   {"hours":[{"hour":"00:00","postCount":"11","to...   truk    00:00   362
0   {"hours":[{"hour":"00:00","postCount":"5","top...   last    01:00   31
1   {"hours":[{"hour":"00:00","postCount":"11","to...   truk    01:00   355

Answer 1

To solve the problem as it stands, you could write a regular expression that extracts the text at the correct position. But this is not the right way to approach your problem. Ideally, to make the most of the DataFrame structure, you should parse the data into DataFrame columns when it is received, so that you can use something more convenient and efficient:

totals_over_one_hundred = df.loc[df['totalCount'] > 100]
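
A minimal sketch of what that parsing might look like. It assumes every time_data value is complete, valid JSON and that the last key of each hour entry is "totalCount" (as the expected output suggests). The names records, hours, exploded, parsed and result are only illustrative. The regex route is included at the end for comparison.

```python
import json

import pandas as pd

# Sample data mirroring the question. Assumption: the JSON is complete and
# the last key of every hour entry is "totalCount".
records = [
    {"username": "last",
     "time_data": '{"hours":[{"hour":"00:00","postCount":"5","topicCount":"3","totalCount":80},'
                  '{"hour":"01:00","postCount":"11","topicCount":"6","totalCount":31}]}'},
    {"username": "truk",
     "time_data": '{"hours":[{"hour":"00:00","postCount":"11","topicCount":"6","totalCount":362},'
                  '{"hour":"01:00","postCount":"22","topicCount":"8","totalCount":355}]}'},
]
df = pd.DataFrame(records)

# Decode each JSON string and keep only its list of per-hour dicts.
hours = df["time_data"].apply(lambda s: json.loads(s)["hours"])

# One row per hour entry; reset the index so every row has a unique label.
exploded = df.assign(entry=hours).explode("entry").reset_index(drop=True)

# Turn each per-hour dict into real columns (hour, postCount, topicCount, totalCount).
parsed = pd.DataFrame(exploded["entry"].tolist())
result = exploded.drop(columns="entry").join(parsed)

# postCount and topicCount arrive as strings; make them numeric.
result[["postCount", "topicCount"]] = result[["postCount", "topicCount"]].astype(int)

# With real columns in place, the filter from the answer works directly.
totals_over_one_hundred = result.loc[result["totalCount"] > 100]
print(result[["username", "hour", "totalCount"]])

# For comparison, the regex route: capture the first totalCount after a given hour.
pattern = r'"hour":"01:00".*?"totalCount":(\d+)'
regex_totals = df["time_data"].str.extract(pattern)[0].astype(int)
print(regex_totals)
```

Parsing once gives one row per user and hour, so later questions (another hour, another count) become ordinary column selections instead of new string patterns.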

Solution 1
0 2019-07-22 18:57:19
