How to convert dataframe column which contains list of dictionary into separate columns?
Cross dataframe columns "contains List" with dictionary
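I am trying to create a dataframe by crossing an existing dataframe with a dictionary, as shown below.
The dataframe contains many columns (x, y, z, a, b, c, ... and so on, more than 100 columns).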
import pandas as pd

df = pd.DataFrame({
    'ID': ['EF407412', 'KM043272'],
    'x': ['[2788, 3140, 4836]', '[539, 906, 1494, 1932, 2029,7001]'],
    'y': ['[1408, 1572, 2277]', '[]'],
    # the dataframe contains many more columns (x, y, z, a, b, c, ...), more than 100 in total
})
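The dictionary is named scale, and its items (keys and values) are customizable. The rules for converting the input dataframe into the output dataframe are given in the comments below.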
scale = ("500-10000", {
    # key = scale, value = weight (the minimum list item for that scale); both are customizable
    500: 7000,    # key 500:   list items >= 7000
    2500: 3000,   # key 2500:  7000 > list items >= 3000
    5000: 1000,   # key 5000:  3000 > list items >= 1000
    7500: 400,    # key 7500:  1000 > list items >= 400
    10000: 250,   # key 10000: 400 > list items >= 250
    # any list items < 250 are ignored
})
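The comments in `scale` describe half-open bands: each list item belongs to the first threshold it reaches when the thresholds are checked from highest to lowest. A minimal sketch of that rule for a single value, assuming the thresholds shown in the question; the helper name `scale_key_for` is hypothetical and is not part of the original post:

```python
from typing import Optional

# Same thresholds as in the question: scale key -> minimum item value (weight).
weights = {500: 7000, 2500: 3000, 5000: 1000, 7500: 400, 10000: 250}


def scale_key_for(item: int, weights: dict) -> Optional[int]:
    """Return the scale key whose band contains `item`, or None if it is below every threshold."""
    # Check the highest threshold first so each item lands in exactly one band.
    for key, minimum in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if item >= minimum:
            return key
    return None  # hypothetical convention for ignored items (< 250 here)


print(scale_key_for(7001, weights))  # 500
print(scale_key_for(4836, weights))  # 2500
print(scale_key_for(100, weights))   # None
```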
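Important P.S.: if an input list contains repeated values, each occurrence is kept as a separate value in the output. For example, if column x contains the list [4836, 4836, 4836], the output in column x_2500 will be [4836, 4836, 4836].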
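To make the duplicate requirement concrete, here is a small pure-Python sketch that splits one list into per-scale lists while keeping repeated values and their order. The function name `split_by_scale` is hypothetical, and filling empty bands with `[0]` is an assumption taken from the output format shown further down:

```python
# Same thresholds as in the question: scale key -> minimum item value (weight).
weights = {500: 7000, 2500: 3000, 5000: 1000, 7500: 400, 10000: 250}


def split_by_scale(items, weights, column):
    """Distribute `items` into `<column>_<key>` lists, keeping duplicates and order."""
    buckets = {f"{column}_{key}": [] for key in weights}
    # Highest threshold first, so every item is appended to exactly one band.
    thresholds = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    for item in items:
        for key, minimum in thresholds:
            if item >= minimum:
                buckets[f"{column}_{key}"].append(item)
                break
        # items below the lowest threshold fall through and are ignored
    # Assumption: an empty band is reported as [0].
    return {name: values or [0] for name, values in buckets.items()}


result = split_by_scale([4836, 4836, 4836], weights, "x")
print(result["x_2500"])  # [4836, 4836, 4836]
print(result["x_500"])   # [0]
```

Because the items are appended to lists rather than collected in sets, repeated values survive unchanged.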
Using your df and scale objects...
import ast

import pandas as pd


def make_new_columns(series: pd.Series) -> pd.DataFrame:
    """Given column, make new columns using `scale`."""
    # convert str representation of list to literal list
    series = series.apply(ast.literal_eval)
    scale_dict = scale[1]
    frames = []
    for k, v in scale_dict.items():
        # cumulative filter: every item at or above this key's threshold
        k_frame = pd.DataFrame({f"{series.name}_{k}": series.apply(lambda x: [i for i in x if i >= v])})
        frames.append(k_frame)
    frame = pd.concat(frames, axis="columns")
    cols = frame.columns[frame.columns.str.startswith(f"{series.name}_")]
    for col0, col1 in zip(cols, cols[1:]):
        # keep only the items not already covered by the higher band;
        # a list comprehension (not a set difference) preserves duplicates and order
        frame[f"{col1}_"] = frame.apply(lambda r: [i for i in r[col1] if i not in r[col0]], axis=1)
    # the first `x_...` col is `x_500` and will not change -- remove others
    frame = frame.drop(columns=cols[1:])
    frame.columns = frame.columns.str.strip("_")
    # replace empty lists with [0]
    frame[cols] = frame[cols].apply(lambda s: s.map(lambda x: x if x else [0]))
    return frame
# apply `make_new_columns` to x, y, z, a, b, c, ...
cols_to_apply = df.loc[:, "x":].columns
to_join = []
for col in cols_to_apply:
    new = make_new_columns(df[col])
    to_join.append(new)
df = df[["ID"]].join(to_join)
df
Try:
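from ast import literal_eval

import pandas as pd

# parse the string representation of each list into a real list
df["x"] = df["x"].apply(literal_eval)
df["y"] = df["y"].apply(literal_eval)

# long format: one row per (ID, column, item); empty lists become NaN and are dropped
x = df.set_index("ID").stack().to_frame().explode(0).dropna()

# label each item with its scale key; bins are closed on the left, e.g. [250, 400)
x["name"] = pd.cut(
    x[0],
    list(scale[1].values())[::-1] + [float("inf")],
    right=False,
    labels=list(scale[1])[::-1],
)
x["tmp"] = x.index.get_level_values(1)

# back to wide format: one list of items per (ID, column, scale key)
x = x.pivot_table(
    index=pd.Grouper(level=0),
    columns=["tmp", "name"],
    values=0,
    aggfunc=list,
)

# make sure every column/scale-key combination exists, even if it is empty
idx = pd.MultiIndex.from_product(
    [set(x.columns.get_level_values(0)), scale[1].keys()]
)
x = x.reindex(idx, axis=1)
x.columns = [f"{a}_{b}" for a, b in x.columns]

# fill the empty cells with [0]
x = x.apply(lambda s: s.fillna({i: [0] for i in x.index}))

# order the columns by name, then by numeric scale key
print(
    x[
        sorted(x.columns, key=lambda c: (c.split("_")[0], int(c.split("_")[1])))
    ].reset_index()
)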
Prints:
         ID   x_500        x_2500              x_5000      x_7500  x_10000  y_500  y_2500              y_5000  y_7500  y_10000
0  EF407412     [0]  [3140, 4836]              [2788]         [0]      [0]    [0]     [0]  [1408, 1572, 2277]     [0]      [0]
1  KM043272  [7001]           [0]  [1494, 1932, 2029]  [539, 906]      [0]    [0]     [0]                 [0]     [0]      [0]
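The `pd.cut` call is the step that assigns each item to a scale key. A standalone sketch of just that step, assuming the same thresholds as in the question; the sample values are illustrative only:

```python
import pandas as pd

# Same thresholds as in the question: scale key -> minimum item value (weight).
weights = {500: 7000, 2500: 3000, 5000: 1000, 7500: 400, 10000: 250}

# Bin edges must be ascending, so reverse the thresholds and close the top band with infinity.
edges = list(weights.values())[::-1] + [float("inf")]  # [250, 400, 1000, 3000, 7000, inf]
labels = list(weights)[::-1]                           # [10000, 7500, 5000, 2500, 500]

values = pd.Series([100, 539, 2788, 4836, 7001])

# right=False makes every band closed on the left: [250, 400), [400, 1000), ...
bands = pd.cut(values, edges, right=False, labels=labels)
print(bands.tolist())
```

Values below the lowest edge (100 here) fall outside every bin and come back as NaN, which is how items under 250 end up being ignored.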