How to combine multiple rows in a pandas dataframe which have only 1 non-null entry per column into one row?
I am using json_normalize to parse the JSON entries of a pandas column. However, the output is a dataframe with multiple rows, each row having only one non-null entry. I want to combine all of these rows into a single row in pandas.
  currency  custom.gt custom.eq  price.gt  price.lt
0      NaN        4.0       NaN       NaN       NaN
1      NaN        NaN       NaN     999.0       NaN
2      NaN        NaN       NaN       NaN  199000.0
3      NaN        NaN     other       NaN       NaN
4      USD        NaN       NaN       NaN       NaN
You can use ffill (forward fill) and bfill (backward fill), which are pandas methods for filling NA values. Applying both in sequence copies each column's single non-null value into every row.
# fill NA values
# option 1:
df = df.ffill().bfill()
# option 2 (the method= argument is deprecated since pandas 2.1, so prefer option 1):
df = df.fillna(method='ffill').fillna(method='bfill')
print(df)
  currency  custom.gt custom.eq  price.gt  price.lt
0      USD        4.0     other     999.0  199000.0
1      USD        4.0     other     999.0  199000.0
2      USD        4.0     other     999.0  199000.0
3      USD        4.0     other     999.0  199000.0
4      USD        4.0     other     999.0  199000.0
You can then drop the duplicated rows using drop_duplicates and keep the first one:
df = df.drop_duplicates(keep='first')
print(df)
  currency  custom.gt custom.eq  price.gt  price.lt
0      USD        4.0     other     999.0  199000.0
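For reference, here is a self-contained sketch of the same idea on the example data. It assumes, as in the question, that each column holds exactly one non-null value; in that case a single bfill already moves every value up into the first row, so taking that row is enough. The name combined is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question: one non-null entry per column.
df = pd.DataFrame({
    "currency": [np.nan, np.nan, np.nan, np.nan, "USD"],
    "custom.gt": [4.0, np.nan, np.nan, np.nan, np.nan],
    "custom.eq": [np.nan, np.nan, np.nan, "other", np.nan],
    "price.gt": [np.nan, 999.0, np.nan, np.nan, np.nan],
    "price.lt": [np.nan, np.nan, 199000.0, np.nan, np.nan],
})

# Backfill pulls each column's value up to row 0; keep only that row.
combined = df.bfill().head(1)
print(combined)
```

If a column could contain several different non-null values, this keeps only the first one, so check that assumption against your data before relying on it.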
Depending on how many times you have to repeat the task, I would also look at how the JSON file is structured to see whether a dictionary comprehension could clean things up so that json_normalize can parse it more easily the first time.
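As a rough sketch of that idea: the structure of the original JSON is not shown, so the records list below is purely hypothetical (a list of small dicts, each carrying one filter). Under that assumption, each record can be flattened on its own and the flat key/value pairs merged with a dictionary comprehension, which yields a single row directly.

```python
import pandas as pd

# Hypothetical input shape; the real JSON may be organised differently.
records = [
    {"custom": {"gt": 4}},
    {"price": {"gt": 999}},
    {"price": {"lt": 199000}},
    {"custom": {"eq": "other"}},
    {"currency": "USD"},
]

# Flatten each record separately, then merge the flattened pairs into one dict.
# Flattening first avoids one nested "custom" or "price" dict overwriting another.
merged = {
    key: value
    for record in records
    for key, value in pd.json_normalize(record).iloc[0].items()
}

one_row = pd.DataFrame([merged])
print(one_row)
```

If two records carried the same flattened key, the later one would silently win, so this only fits data where every key appears once.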
You could do:
import pandas as pd
from functools import reduce

df = pd.DataFrame.from_dict({"a": ["1", None, None], "b": [None, None, 1], "c": [None, 3, None]})

def red_func(x, y):
    # Keep the running value unless it is null, in which case take the next one.
    if pd.isna(x):
        return y
    return x

# Reduce each row to its first non-null value.
result = [*map(lambda x: reduce(red_func, x), [list(row) for i, row in df.iterrows()]),]
Outputs:
In [135]: df
Out[135]:
      a    b    c
0     1  NaN  NaN
1  None  NaN  3.0
2  None  1.0  NaN
In [136]: [*map(lambda x: reduce(red_func, x), [list(row) for i, row in df.iterrows()]),]
Out[136]: ['1', 3.0, 1.0]
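Note that the list above holds one value per row rather than a one-row dataframe. A possible variation, sketched here and not part of the original answer, applies the same first-non-null idea per column so the result keeps the column names. The helper first_non_null is a made-up name for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": ["1", None, None], "b": [None, None, 1], "c": [None, 3, None]})

def first_non_null(column):
    # Return the first non-null entry of a column, or None if it has none.
    non_null = column.dropna()
    return non_null.iloc[0] if not non_null.empty else None

# apply() runs the helper once per column; to_frame().T turns the
# resulting Series into a single-row dataframe.
combined = df.apply(first_non_null).to_frame().T
print(combined)
```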