
How do I clean a list, and a list of lists of elements, in a pandas dataframe?

Edited:

After writing this:

m = df.explode('ID1').groupby('ID1')['ID2'].agg(list)

I have the following dataframe:

Ref         
45263     [['3105-BB', '3106-BB', '3201-BB', '3202-BB'],...
45256     [['3105-BB', '3106-BB', '3201-BB', '3202-BB'],...
48565     [['3159-CC', '3217-CC'], ['3159-CC', '3217-CC']]
49365     [['3159-CC', '3217-CC'], ['3159-CC', '3217-CC']]
47548     [['3107-CC', '3108-CC', '3201-CC', '3202-CC'],...

In the column on the right, how do I remove the nested list brackets and the duplicates within each row? Ideally I'd like just a single flat list for each row.

For example, the desired output:

Ref         
45263     ['3105-BB', '3106-BB', '3201-BB', '3202-BB']
45256     ['3105-BB', '3106-BB', '3201-BB', '3202-BB']
48565     ['3159-CC', '3217-CC']
49365     ['3159-CC', '3217-CC']
47548     ['3107-CC', '3108-CC', '3201-CC', '3202-CC']

Afterwards I will use m in the following:

df['ID4'] = df['Ref'].map(m)

This will return the final dataframe I am looking for.

Flatten the nested lists and pass the values to set to drop the duplicates:

df['ID'] = df['ID'].apply(lambda x: list(set(z for y in x for z in y)))
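To make the one-liner above easier to read, here is a rough sketch of what the lambda does to a single cell, written as explicit loops. The helper name flatten_unique and the sample cell are made up for illustration. Note that a set does not keep the original order of the values.

```python
def flatten_unique(nested):
    """Flatten a list of lists and drop duplicates (order is NOT guaranteed)."""
    unique = set()
    for inner in nested:      # plays the role of `y` in the one-liner
        for item in inner:    # plays the role of `z` in the one-liner
            unique.add(item)
    return list(unique)


# Hypothetical cell value, shaped like one row of the question's column
cell = [["3159-CC", "3217-CC"], ["3159-CC", "3217-CC"]]
result = flatten_unique(cell)

# sorted() is used here only to get a stable display order
print(sorted(result))
```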

If order is important, use the dict.fromkeys trick, since dict keys keep insertion order:

df['ID'] = df['ID'].apply(lambda x: list(dict.fromkeys([z for y in x for z in y]).keys()))
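A small self-contained sketch of the order-preserving variant, assuming a column named ID that holds lists of lists as in the question. The sample values and the helper name flatten_keep_order are invented for the demo. Calling list() on the dict already yields its keys, so the trailing .keys() is optional.

```python
import pandas as pd

# Hypothetical frame: each cell of "ID" is a list of lists with repeated values
df = pd.DataFrame({
    "Ref": [48565, 47548],
    "ID": [
        [["3159-CC", "3217-CC"], ["3159-CC", "3217-CC"]],
        [["3107-CC", "3108-CC", "3201-CC"], ["3108-CC", "3202-CC"]],
    ],
})


def flatten_keep_order(nested):
    """Flatten a list of lists, keeping the first occurrence of each value."""
    # dict keys are unique and remember insertion order (Python 3.7+)
    return list(dict.fromkeys(item for inner in nested for item in inner))


df["ID"] = df["ID"].apply(flatten_keep_order)
print(df)
```

Unlike the set version, the values come out in the order they first appear in the nested lists.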

If the next step is map, you need to explode the lists first:

df = df.explode('ID').reset_index(drop=True)
print (df)
      Ref       ID
0   45263  3105-BB
1   45263  3106-BB
2   45263  3202-BB
3   45263  3201-BB
4   45256  3105-BB
5   45256  3106-BB
6   45256  3202-BB
7   45256  3201-BB
8   48565  3217-CC
9   48565  3159-CC
10  49365  3217-CC
11  49365  3159-CC
12  47548  3202-CC
13  47548  3108-CC
14  47548  3201-CC
15  47548  3107-CC

Sample:

df['ID1'] = df['ID'].apply(lambda x: list(set(z for y in x for z in y)))
df['ID2'] = df['ID'].apply(lambda x: list(dict.fromkeys([z for y in x for z in y]).keys()))
print (df)
     Ref                                        ID  \
0  45263    [[3105-BB, 3106-BB, 3201-BB, 3202-BB]]   
1  45256    [[3105-BB, 3106-BB, 3201-BB, 3202-BB]]   
2  48565  [[3159-CC, 3217-CC], [3159-CC, 3217-CC]]   
3  49365  [[3159-CC, 3217-CC], [3159-CC, 3217-CC]]   
4  47548    [[3107-CC, 3108-CC, 3201-CC, 3202-CC]]   

                                    ID1                                   ID2  
0  [3105-BB, 3106-BB, 3202-BB, 3201-BB]  [3105-BB, 3106-BB, 3201-BB, 3202-BB]  
1  [3105-BB, 3106-BB, 3202-BB, 3201-BB]  [3105-BB, 3106-BB, 3201-BB, 3202-BB]  
2                    [3217-CC, 3159-CC]                    [3159-CC, 3217-CC]  
3                    [3217-CC, 3159-CC]                    [3159-CC, 3217-CC]  
4  [3202-CC, 3108-CC, 3201-CC, 3107-CC]  [3107-CC, 3108-CC, 3201-CC, 3202-CC]  

EDIT:

f = lambda x: list(set(z for y in x for z in y))
df.explode('ID1').groupby('ID1')['ID2'].agg(f)
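For reference, a sketch of how the whole chain from the question could fit together, passing a flattening function to agg instead of list. The data below is hypothetical (I am guessing that ID1 holds lists of Ref values and ID2 holds lists of codes), and it uses the order-preserving dict.fromkeys variant rather than set so the result is deterministic.

```python
import pandas as pd

# Hypothetical data: ID1 lists the Ref values each row relates to,
# ID2 holds the codes attached to that row.
df = pd.DataFrame({
    "Ref": [45263, 45256, 48565],
    "ID1": [[45263, 45256], [45263, 45256], [48565]],
    "ID2": [
        ["3105-BB", "3106-BB"],
        ["3105-BB", "3106-BB"],
        ["3159-CC", "3217-CC"],
    ],
})


def flatten_keep_order(lists):
    """Flatten the ID2 lists of one group and drop duplicates, keeping order."""
    return list(dict.fromkeys(item for inner in lists for item in inner))


# One row per ID1 value, then one flat, de-duplicated list per ID1 value
m = df.explode("ID1").groupby("ID1")["ID2"].agg(flatten_keep_order)

# Look up each Ref in m, as in the question
df["ID4"] = df["Ref"].map(m)
print(df[["Ref", "ID4"]])
```

Because the aggregation already returns a flat list per group, no extra cleaning step is needed before the map.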
