Extract data from a pandas DataFrame in Python

[英]Extract data from a pandas DataFrame in Python

Here is a sample of the DataFrame I have. I want to extract the corresponding "totalCount" value for each time slot.

import pandas as pd

# Sample data: "time_data" holds a JSON document stored as a plain string.
data = [{"username": "last",
         "time_data": "{\"hours\":[{\"hour\":\"00:00\",\"postCount\":\"5\",\"topicCount\":\"3\",\"totalCount\":80},{\"hour\":\"01:00\",\"postCount\":\"11\",\"topicCount\":\"6\",\"totalCount\":31}]}"
        },
        {"username": "truk",
         "time_data": "{\"hours\":[{\"hour\":\"00:00\",\"postCount\":\"11\",\"topicCount\":\"6\",\"totalCount\":362},{\"hour\":\"01:00\",\"postCount\":\"22\",\"topicCount\":\"8\",\"totalCount\":355}]}"
        }]
df = pd.DataFrame(data)
df

I used this code to get the "postCount" for "00:00" and "01:00":

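# 00:00: keep the text after the hour marker, up to the next field.
# (The values are postCount, stored in a column named 'totalCount'.)
df_h0 = df.copy()
df_h0['hour'] = '00:00'
df_h0['totalCount'] = df.time_data.str.split('"00:00","postCount":"').str[1].str.split('","topic').str[0]
df_h0 = df_h0.fillna(0)

# Same extraction for 01:00.
df_h1 = df.copy()
df_h1['hour'] = '01:00'
df_h1['totalCount'] = df.time_data.str.split('"01:00","postCount":"').str[1].str.split('","topic').str[0]
df_h1 = df_h1.fillna(0)

# DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
df_tot = pd.concat([df_h0, df_h1])
df_tot.head()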

But now I want to get "totalCount", which is not right next to the hour. Does anyone know how to do this?

Expected output:

                           time_data                 username   hour    totalCount
0   {"hours":[{"hour":"00:00","postCount":"5","top...   last    00:00   80
1   {"hours":[{"hour":"00:00","postCount":"11","to...   truk    00:00   362
0   {"hours":[{"hour":"00:00","postCount":"5","top...   last    01:00   31
1   {"hours":[{"hour":"00:00","postCount":"11","to...   truk    01:00   355

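To solve this, you would need to write a regular expression that extracts the text at the right position. But that is not the right way to approach the problem. Ideally, to take full advantage of the DataFrame structure, you should parse the incoming data into DataFrame columns, which lets you use more convenient and efficient operations such as: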

totals_over_one_hundred = df.loc[df['totalCount'] > 100]
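A minimal sketch of that parsing step follows. It assumes that each `time_data` string is valid JSON with an `"hours"` list. It also assumes that every per-hour count uses the key `totalCount`, which is inferred from the expected output. The sketch uses only `json.loads` and the `pd.DataFrame` constructor. The names `rows`, `tidy` and `expected` are illustrative and do not come from the original answer.

```python
import json

import pandas as pd

# The question's rows, written as complete JSON (assumed corrected form).
df = pd.DataFrame([
    {"username": "last",
     "time_data": '{"hours":[{"hour":"00:00","postCount":"5","topicCount":"3","totalCount":80},'
                  '{"hour":"01:00","postCount":"11","topicCount":"6","totalCount":31}]}'},
    {"username": "truk",
     "time_data": '{"hours":[{"hour":"00:00","postCount":"11","topicCount":"6","totalCount":362},'
                  '{"hour":"01:00","postCount":"22","topicCount":"8","totalCount":355}]}'},
])

# Parse each JSON string once and emit one flat row per (username, hour) record.
rows = [
    {"username": user, **record}
    for user, raw in zip(df["username"], df["time_data"])
    for record in json.loads(raw)["hours"]
]
tidy = pd.DataFrame(rows)

# postCount/topicCount arrive as JSON strings ("5"); convert them to integers.
tidy[["postCount", "topicCount"]] = tidy[["postCount", "topicCount"]].astype(int)

# The layout from the question's expected output, ordered by hour, then user.
expected = (tidy[["username", "hour", "totalCount"]]
            .sort_values(["hour", "username"], ignore_index=True))
print(expected)

# With real numeric columns, filtering is a plain boolean mask.
totals_over_one_hundred = tidy.loc[tidy["totalCount"] > 100]
print(totals_over_one_hundred)
```

Once `totalCount` is an integer column, filters, sorts and `groupby` aggregations work on it directly. You no longer need a separate string split for each hour.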

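Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.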

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM