Merging rows and replacing NaN values with pandas

I am trying to merge rows with each other to get one row containing all the values that are present. Currently the df looks like this: (image: dataframe)

What I want is something like:


| index | scan .. | snel. | kool .. | note ..  |
| ----- | ------- | ----- | ------- | -------  |
| 0     | 7,8     | 4,0   | 20.0    | Fiasp, ..|


I can get that output with the code below, but it seems really messy.

I tried using `groupby` with `agg`, `sum` and `max`, but all of them just remove columns, e.g. `df2.groupby('Tijdstempel apparaat').max().reset_index()`.
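A minimal sketch of a common alternative (not from the original thread): `GroupBy.first()` returns, for every column, the first non-null value within each group, so numeric columns and string columns such as `Notities` both end up in the merged row. The rows below are hypothetical values shaped like the desired output; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: three rows share one timestamp, each holding one value
df2 = pd.DataFrame({
    "Tijdstempel apparaat": ["20-01-2023 13:24"] * 3 + ["20-01-2023 14:00"],
    "Scan Glucose mmol/l": [7.8, np.nan, np.nan, 6.1],
    "Snelwerkende insuline (eenheden)": [np.nan, 4.0, np.nan, np.nan],
    "Koolhydraten (gram)": [np.nan, np.nan, 20.0, np.nan],
    "Notities": [np.nan, "example note", np.nan, np.nan],
})

# first() picks the first non-null value of every column per timestamp;
# as_index=False keeps the timestamp as a regular column
merged = df2.groupby("Tijdstempel apparaat", as_index=False).first()
print(merged)
```

If a timestamp has more than one non-null value in the same column, `first()` keeps the earliest one and `last()` keeps the latest.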

I tried filling each row with the values of the previous rows and then dropping the rows that don't contain every value. But this seems like a long workaround and really messy.

```python
# Select only the rows that share this timestamp
df2 = df2.loc[df2['Tijdstempel apparaat'] == '20-01-2023 13:24']
# Renumber the rows 0..n-1 without keeping the old index as a column
df2 = df2.reset_index(drop=True)

# Forward-fill each value column so every later row inherits earlier values
# (fillna(method='pad') is deprecated since pandas 2.1; ffill() does the same)
cols = ['Snelwerkende insuline (eenheden)', 'Koolhydraten (gram)',
        'Notities', 'Scan Glucose mmol/l']
df2[cols] = df2[cols].ffill()

# Keep only the last row, which now holds every value
df2 = df2.tail(1)
print(df2)
```

Output:

When I have to do this for the entire data.csv (whenever a timestamp like "20-01-2023 13:24" occurs multiple times), I am worried it will be really slow and time-consuming.
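The forward-fill-then-keep-the-last-row idea does not need a loop over timestamps. A sketch, assuming every duplicated timestamp should collapse to one row: `groupby(...).ffill()` followed by `groupby(...).tail(1)` applies the trick to all timestamps at once, and `GroupBy.last()` computes the same result directly (the last non-null value of each column per group) in one vectorized pass. The data is made up; only the column names come from the question.

```python
import numpy as np
import pandas as pd

key = "Tijdstempel apparaat"
# Hypothetical data: two timestamps, each spread over several rows
df = pd.DataFrame({
    key: ["t1", "t1", "t1", "t2", "t2"],
    "Scan Glucose mmol/l": [7.8, np.nan, np.nan, np.nan, 6.1],
    "Koolhydraten (gram)": [np.nan, np.nan, 20.0, 15.0, np.nan],
    "Notities": [np.nan, "note a", np.nan, np.nan, "note b"],
})

# The question's trick for every timestamp at once:
# forward-fill inside each group, then keep each group's last row
filled = df.groupby(key).ffill()      # returns only the non-key columns
filled.insert(0, key, df[key])        # put the timestamp column back in front
via_ffill = filled.groupby(key).tail(1).reset_index(drop=True)

# GroupBy.last gives the same rows directly; sort=False keeps file order
via_last = df.groupby(key, as_index=False, sort=False).last()
print(via_last)
```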

Sample data shaped like your data:

```python
import pandas as pd

df = pd.DataFrame(data={
    "times": ["date1", "date1", "date1", "date1", "date1"],
    "type": [1, 2, 3, 4, 5],
    "key1": [1, None, None, None, None],
    "key2": [None, "2", None, None, None],
    "key3": [None, None, 3, None, None],
    "key4": [None, None, None, "val", None],
    "key5": [None, None, None, None, 5],
})
```

Solution:

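```python
# Unpivot to long format: one row per (times, column name, value)
melt = df.melt(id_vars="times", value_vars=df.columns[1:])

# Drop the empty cells so only real values remain
melt = melt.dropna()

# Pivot back: one row per timestamp, one column per original column
pivot = melt.pivot_table(values="value", index="times", columns="variable", aggfunc=lambda x: x)
```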
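The identity `aggfunc=lambda x: x` only yields a single value per cell when each `(times, variable)` pair has exactly one non-null entry; in the sample, `type` has five values for `date1`. A variant with `aggfunc="first"` (swapped in here; the answer uses the lambda) always reduces each cell to one scalar, keeping the first value in row order:

```python
import pandas as pd

# The sample frame from the answer
df = pd.DataFrame(data={
    "times": ["date1"] * 5,
    "type": [1, 2, 3, 4, 5],
    "key1": [1, None, None, None, None],
    "key2": [None, "2", None, None, None],
    "key3": [None, None, 3, None, None],
    "key4": [None, None, None, "val", None],
    "key5": [None, None, None, None, 5],
})

# Long format without the empty cells
melt = df.melt(id_vars="times", value_vars=df.columns[1:]).dropna()

# "first" collapses every (times, variable) cell to exactly one value
pivot = melt.pivot_table(values="value", index="times",
                         columns="variable", aggfunc="first")
print(pivot)
```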

Move the type column to the front:

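```python
# Position of "type" among the pivoted (alphabetically sorted) columns
index = list(pivot.columns).index("type")
# Put the columns from "type" onward first, then the ones before it
pivot = pd.concat([pivot.iloc[:, index:], pivot.iloc[:, :index]], axis=1)
```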
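A shorter way to do the same reordering, sketched on a small hypothetical stand-in that mimics the pivot's layout: `DataFrame.pop` removes a column and returns it, and `DataFrame.insert` puts it back at a given position. Unlike the `iloc` slicing, this moves only `type`, even if other columns sort after it.

```python
import pandas as pd

# Hypothetical stand-in for the pivoted frame, with "type" sorted last
pivot = pd.DataFrame({"key1": [1], "key2": ["2"], "type": [1]},
                     index=pd.Index(["date1"], name="times"))

# pop() removes "type" and returns it; insert() places it at position 0
pivot.insert(0, "type", pivot.pop("type"))
print(list(pivot.columns))  # ['type', 'key1', 'key2']
```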
