Create Pandas Dataframe from List of Dictionaries with missing values for some keys
Hi everyone.
Below is the code I'm using to parse a text file:
import pandas as pd

tags = ['129','30','32','851','9730','9882']
rows = []
file = open('D:\\python\\redi_fix\\redi_august.txt','r')
content = file.readlines()
for line in content:
    for message in line.split('\t'):
        try:
            row_dict = {}
            tag,val = message.split('=')
            if tag in tags:
                row_dict[tag]=val
                rows.append(row_dict)
        except:
            pass
Creating a pandas dataframe from rows yields the following result:
129      30    32   851  9730  9882
r170557  NaN   NaN  NaN  NaN   NaN
NaN      ARCA  NaN  NaN  NaN   NaN
NaN      NaN   100  NaN  NaN   NaN
r170557  NaN   NaN  NaN  NaN   NaN
NaN      ARCA  NaN  NaN  NaN   NaN
NaN      NaN   300  NaN  NaN   NaN
It looks like every value for a key ends up on a different row. The result I'm trying to achieve is to have all values for a record on the same row, for example:
129      30    32   851  9730  9882
r170557  ARCA  100  NaN  NaN   NaN
r170557  ARCA  300  NaN  NaN   NaN
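One way to get that shape directly at parse time is to build one dict per line rather than one per tab-separated message. The following is only a minimal sketch: it assumes each line of the file holds exactly one record, and the `parse_lines` helper, the sample lines, and the extra tag `55` are made up for illustration.

```python
import pandas as pd

TAGS = ['129', '30', '32', '851', '9730', '9882']

def parse_lines(lines, tags=TAGS):
    """Build one dict per input line, keeping only the wanted tags.

    Assumes (hypothetically) that one line corresponds to one record.
    """
    rows = []
    for line in lines:
        row = {}  # one dict per line, not per message
        for message in line.rstrip('\n').split('\t'):
            # partition splits on the first '=' and never raises
            tag, sep, val = message.partition('=')
            if sep and tag in tags:
                row[tag] = val
        if row:  # skip lines that contain none of the wanted tags
            rows.append(row)
    return rows

# Made-up sample input standing in for the real file contents.
sample = [
    "129=r170557\t30=ARCA\t32=100\t55=XYZ\n",
    "129=r170557\t30=ARCA\t32=300\n",
]

# Passing columns=TAGS keeps the column order and fills missing tags with NaN.
df = pd.DataFrame(parse_lines(sample), columns=TAGS)
print(df)
```

With a real file, the list of lines would come from the open file object instead of `sample`.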
If you want to "collapse" your NaNs, you can perform a groupby + agg on first / last:
df.groupby(df['129'].notnull().cumsum(), as_index=False).agg('first')
   129      30    32     851  9730  9882
0  r170557  ARCA  100.0  NaN  NaN   NaN
1  r170557  ARCA  300.0  NaN  NaN   NaN
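To see why this works, here is a small self-contained sketch of the same idea on made-up rows shaped like the question's output. It assumes every record starts with tag 129, so a running count of non-null 129 values labels each record. The name `record` is arbitrary, and the sketch uses `.first()` followed by `reset_index(drop=True)` instead of `as_index=False`, which should give the same rows.

```python
import pandas as pd

tags = ['129', '30', '32', '851', '9730', '9882']

# Made-up rows: one single-key dict per message, as the question's loop produces.
rows = [
    {'129': 'r170557'}, {'30': 'ARCA'}, {'32': '100'},
    {'129': 'r170557'}, {'30': 'ARCA'}, {'32': '300'},
]
df = pd.DataFrame(rows, columns=tags)

# Assumption: a non-null '129' marks the start of a new record.
# The cumulative sum then gives 1,1,1,2,2,2 as a record label.
record_id = df['129'].notnull().cumsum().rename('record')

# 'first' takes the first non-null value per column within each record.
collapsed = df.groupby(record_id).first().reset_index(drop=True)
print(collapsed)
```

If a record can arrive without tag 129, the running count no longer separates records and a different grouping key is needed.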
Using your result dataframe, we need sorted and dropna:
result.apply(lambda x : sorted(x,key=pd.isnull)).dropna(thresh=1)
Out[1171]:
   129      30    32     851  9730  9882
0  r170557  ARCA  100.0  NaN  NaN   NaN
1  r170557  ARCA  300.0  NaN  NaN   NaN
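The sketch below walks through the same idea on made-up data: each column is sorted independently so that non-null values move to the top (Python's sort is stable, so their relative order is kept), and rows that end up entirely NaN are dropped. Wrapping the sorted list in a `pd.Series` is a choice made here to keep the original row index explicit; it is not part of the answer above.

```python
import pandas as pd

tags = ['129', '30', '32', '851', '9730', '9882']

# Made-up rows shaped like the question's output (one value per row).
rows = [
    {'129': 'r170557'}, {'30': 'ARCA'}, {'32': '100'},
    {'129': 'r170557'}, {'30': 'ARCA'}, {'32': '300'},
]
result = pd.DataFrame(rows, columns=tags)

# Push non-null values to the top of every column, independently per column.
pushed = result.apply(
    lambda col: pd.Series(sorted(col, key=pd.isnull), index=col.index)
)

# Keep only rows that still hold at least one non-null value.
cleaned = pushed.dropna(thresh=1)
print(cleaned)
```

Because every column is shifted on its own, values only line up correctly when each record carries the same set of tags. If one record is missing a tag, later values in that column slide up into the wrong row.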