Merging rows and replacing NaN values with pandas
I am trying to merge rows with each other to get one row containing all the values that are present. Currently the df looks like this: [image: dataframe]
What I want is something like:
| index | scan .. | snel. | kool .. | note .. |
| ----- | ------- | ----- | ------- | ------- |
| 0 | 7,8 | 4,0 | 20.0 | Fiasp, ..|
I can get that output with the code example below, but it seems really messy.
I tried to use groupby with agg, sum, and max, but all those do is remove columns, and the result looks like this:
df2.groupby('Tijdstempel apparaat').max().reset_index()
I tried filling each row with the values of the previous rows and then dropping the rows that don't contain every value. But this seems like a long workaround and really messy.
df2 = df2.loc[df['Tijdstempel apparaat'] == '20-01-2023 13:24']
# Renumber the rows 0..n-1 and discard the old index
df2 = df2.reset_index(drop=True)
# Forward-fill each column so the last row ends up holding every value
df2['Snelwerkende insuline (eenheden)'] = df2['Snelwerkende insuline (eenheden)'].ffill()
df2['Koolhydraten (gram)'] = df2['Koolhydraten (gram)'].ffill()
df2['Notities'] = df2['Notities'].ffill()
df2['Scan Glucose mmol/l'] = df2['Scan Glucose mmol/l'].ffill()
print(df2)
# df2.loc[df2[0,'Snelwerkende insuline (eenheden)']] = df2.loc[df2[1, 'Snelwerkende insuline (eenheden)']]
# drop() returns a new frame, so assign it back to keep only the filled row
df2 = df2.drop([0, 1, 2])
When I have to do this for the entire data.csv (whenever a timestamp like "20-01-2023 13:24" occurs multiple times), I am worried it will be really slow and time-consuming.
Sample data shaped like yours:
import pandas as pd

# Five rows share one timestamp; each row fills a different key column
df = pd.DataFrame(data={
    "times": ["date1", "date1", "date1", "date1", "date1"],
    "type": [1, 2, 3, 4, 5],
    "key1": [1, None, None, None, None],
    "key2": [None, "2", None, None, None],
    "key3": [None, None, 3, None, None],
    "key4": [None, None, None, "val", None],
    "key5": [None, None, None, None, 5],
})
Solution:
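# Reshape to long format: one row per (times, column name, value)
melt = df.melt(id_vars="times", value_vars=df.columns[1:])
# Drop the missing values so only the real entries remain
melt = melt.dropna()
# Pivot back to wide format; "first" keeps the first remaining value per cell
pivot = melt.pivot_table(values="value", index="times", columns="variable", aggfunc="first")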
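To make the reshape easier to follow, here is a minimal sketch of the same melt, dropna, pivot sequence on a tiny frame. The frame `toy` and its column names `a` and `b` are made up for illustration and are not part of the asker's data; `aggfunc="first"` is my choice for picking the single surviving value per cell.

```python
import pandas as pd

# Hypothetical two-row example: each row fills a different column
toy = pd.DataFrame({
    "times": ["t1", "t1"],
    "a": [1.0, None],
    "b": [None, "x"],
})

# Long format, with the missing entries removed
long_form = toy.melt(id_vars="times").dropna()

# Back to wide format: one row per timestamp, one column per variable
wide = long_form.pivot_table(
    values="value", index="times", columns="variable", aggfunc="first"
)
print(wide)
```

After `dropna()` only the two real entries are left in `long_form`, so the pivot has exactly one value to place in each cell.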
Move the type column to the front:
index = list(pivot.columns).index("type")
pivot = pd.concat([pivot.iloc[:,index:], pivot.iloc[:,:index]], axis=1)
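As a possible shorter alternative (not part of the answer above), `groupby(...).first()` returns the first non-null value of each column within each group, which may be enough to collapse the duplicate timestamps in one step. The sketch below rebuilds the same sample frame so it runs on its own; the name `merged` is only illustrative. Note that it keeps just the first value when a column has several non-null entries for one timestamp, as `type` does here.

```python
import pandas as pd

# Same sample data as above, rebuilt so this block is self-contained
df = pd.DataFrame(data={
    "times": ["date1", "date1", "date1", "date1", "date1"],
    "type": [1, 2, 3, 4, 5],
    "key1": [1, None, None, None, None],
    "key2": [None, "2", None, None, None],
    "key3": [None, None, 3, None, None],
    "key4": [None, None, None, "val", None],
    "key5": [None, None, None, None, 5],
})

# One row per timestamp; each column takes its first non-null value
merged = df.groupby("times", as_index=False).first()
print(merged)
```

Because this is a single grouped aggregation over the whole frame, it should also apply to the full data.csv without looping over individual timestamps.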