
Extract data from a pandas DataFrame in Python

This is a sample of the DataFrame that I have. I want to extract the "totalCount" value for each hour listed in "time_data".

import pandas as pd

# Sample rows: time_data is a JSON-formatted string (the sample is cut off before the closing brackets).
records = [{"username": "last",
 "time_data": "{\"hours\":[{\"hour\":\"00:00\",\"postCount\":\"5\",\"topicCount\":\"3\",\"totalCount\":80},{\"hour\":\"01:00\",\"postCount\":\"11\",\"topicCount\":\"6\",\"totalCount\":31}"
},
{"username": "truk",
 "time_data": "{\"hours\":[{\"hour\":\"00:00\",\"postCount\":\"11\",\"topicCount\":\"6\",\"totalCount\":362},{\"hour\":\"01:00\",\"postCount\":\"22\",\"topicCount\":\"8\",\"totalCount\":355}"
}]
df = pd.DataFrame(records)
df

I have used this code to get the "postCount" for both "00:00" and "01:00":

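# Extract the postCount for 00:00 by splitting the raw string
# (note that the result is stored in a column named 'totalCount').
df_h0 = df.copy()
df_h0['hour'] = '00:00'
df_h0['totalCount'] = df.time_data.str.split('"00:00","postCount":"').str[1].str.split('","topic').str[0]
df_h0 = df_h0.fillna(0)

# Same extraction for 01:00.
df_h1 = df.copy()
df_h1['hour'] = '01:00'
df_h1['totalCount'] = df.time_data.str.split('"01:00","postCount":"').str[1].str.split('","topic').str[0]
df_h1 = df_h1.fillna(0)

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames instead.
df_tot = pd.concat([df_h0, df_h1])
df_tot.head()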

But now I want to get the "totalCount", which does not sit directly next to the hour. Does anyone know how to do that?

Expected output:

                           time_data                 username   hour    totalCount
0   {"hours":[{"hour":"00:00","postCount":"5","top...   last    00:00   80
1   {"hours":[{"hour":"00:00","postCount":"11","to...   truk    00:00   362
0   {"hours":[{"hour":"00:00","postCount":"5","top...   last    01:00   31
1   {"hours":[{"hour":"00:00","postCount":"11","to...   truk    01:00   355

To solve the problem as it stands, you could write a regular expression that extracts the value at the right position in the string. But this is not the correct way to approach your problem. Ideally, to make the most of the DataFrame structure, you should parse the data into DataFrame columns when it is received, so that you can use something more convenient and efficient:

totals_over_one_hundred = df.loc[df['totalCount'] > 100]
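
Both routes described above could be sketched roughly as follows. This is only an illustration under stated assumptions: each time_data value is taken to be a complete, valid JSON string with a "totalCount" key in every hour entry (the sample in the question is cut off, so the closing "]}" used here is a guess), and the names records, pattern, by_regex and by_json are made up for the example.

```python
import json

import pandas as pd

# Assumed sample data: the question's rows, completed with a closing "]}"
# so that json.loads can parse them.
records = [
    {"username": "last",
     "time_data": '{"hours":[{"hour":"00:00","postCount":"5","topicCount":"3","totalCount":80},'
                  '{"hour":"01:00","postCount":"11","topicCount":"6","totalCount":31}]}'},
    {"username": "truk",
     "time_data": '{"hours":[{"hour":"00:00","postCount":"11","topicCount":"6","totalCount":362},'
                  '{"hour":"01:00","postCount":"22","topicCount":"8","totalCount":355}]}'},
]
df = pd.DataFrame(records)

# Route 1: a regular expression on the raw string.
# [^{}]*? keeps each match inside a single {...} hour entry.
pattern = r'"hour":"(?P<hour>\d{2}:\d{2})"[^{}]*?"totalCount":(?P<totalCount>\d+)'
by_regex = (
    df.set_index("username")["time_data"]
      .str.extractall(pattern)                # one row per (username, match)
      .reset_index(level="match", drop=True)  # drop the match counter
      .reset_index()                          # username back to a column
)
by_regex["totalCount"] = by_regex["totalCount"].astype("int64")

# Route 2: parse the JSON once and build one row per user and hour.
rows = [
    {"username": user, "hour": entry["hour"], "totalCount": entry["totalCount"]}
    for user, raw in zip(df["username"], df["time_data"])
    for entry in json.loads(raw)["hours"]
]
by_json = pd.DataFrame(rows)

# With real columns, filtering becomes a plain boolean mask.
totals_over_one_hundred = by_json.loc[by_json["totalCount"] > 100]
print(by_json)
print(totals_over_one_hundred)
```

The regular expression also works on strings that are cut off, as long as each hour entry itself is intact, whereas json.loads raises an error on incomplete input. The parsed version is usually easier to maintain because it does not depend on the order of the keys.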
