
How to create a list of unique IDs from a pandas column where the ID lists are stored as strings in Python

I have a pandas DataFrame df:

import pandas as pd

lst = [23682, 21963, 9711, 21175, 13022,1662,7399, 13679, 17654,4567,23608,2828, 1234]

lst_match = ['[21963]','[21175]', '[1662 7399 13679 ]','[17654 23608]','[2828]','0','0','0','0','0','0', '0','0' ]

df = pd.DataFrame(list(zip(lst, lst_match)),columns=['ID','ID_match'])

df

       ID            ID_match
0   23682             [21963]
1   21963             [21175]
2    9711   [1662 7399 13679]
3   21175       [17654 23608]
4   13022              [2828]
5    1662                   0
6    7399                   0
7   13679                   0
8   17654                   0
9    4567                   0
10  23608                   0
11   2828                   0
12   1234                   0

The values in the ID_match column are also IDs, but they are stored as lists in string format.
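To make the string format concrete, here is a minimal sketch that parses a single ID_match entry into integers. The helper name parse_id_list is hypothetical; it assumes the IDs are whitespace-separated inside square brackets and treats the placeholder '0' as "no matches".

```python
from typing import List


def parse_id_list(text: str) -> List[int]:
    """Parse a string such as '[1662 7399 13679 ]' into [1662, 7399, 13679].

    Hypothetical helper: assumes whitespace-separated integers inside
    square brackets, and maps the placeholder '0' to an empty list.
    """
    if text.strip() == '0':
        return []
    # strip('[ ]') removes any '[', ' ' or ']' characters from both ends
    return [int(tok) for tok in text.strip('[ ]').split()]


print(parse_id_list('[1662 7399 13679 ]'))  # [1662, 7399, 13679]
print(parse_id_list('0'))                   # []
```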

I want to create a DataFrame of unique IDs that contains every ID whose ID_match value is something other than 0, plus every ID mentioned in the ID_match column.

So my output DataFrame of unique IDs should look like this:

       ID           
0   23682            
1   21963             
2    9711  
3   21175       
4   13022              
5    1662                   
6    7399                  
7   13679                   
8   17654                   
9   23608                    
10   2828                  

How can I do this with Python pandas?

Use:

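s = (df[df['ID_match'] != '0']
       .set_index('ID')['ID_match']
       .str.strip('[ ]')
       .str.split(r'\s+', expand=True)   # raw string avoids the invalid '\s' escape warning
       .stack()
       .dropna())   # newer pandas stack keeps the None padding created by expand=True
print(s)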
23682  0    21963
21963  0    21175
9711   0     1662
       1     7399
       2    13679
21175  0    17654
       1    23608
13022  0     2828
dtype: object


# Series.append was removed in pandas 2.0, so combine the two Series with pd.concat
vals = pd.concat([s.index.get_level_values(0).to_series(), s.astype(int)]).unique()
df = pd.DataFrame({'ID': vals})
print(df)
       ID
0   23682
1   21963
2    9711
3   21175
4   13022
5    1662
6    7399
7   13679
8   17654
9   23608
10   2828

Explanation:

  1. First keep only the rows whose ID_match is not '0', using boolean indexing
  2. Move the ID column into the index with set_index
  3. Remove the surrounding [ ] (and stray spaces) with str.strip
  4. Split the values on whitespace with str.split and reshape to one ID per row with stack
  5. Then get the first level of the MultiIndex with get_level_values and convert it with to_series
  6. Combine it with s converted to integers
  7. Get the unique values and finally call the DataFrame constructor
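The same steps can also be sketched with Series.str.split followed by Series.explode (available since pandas 0.25) instead of expand=True plus stack. This is a swapped-in alternative, not the answer's original code; it rebuilds the question's sample data and uses pd.concat, since Series.append no longer exists in pandas 2.x.

```python
import pandas as pd

# Sample data from the question
lst = [23682, 21963, 9711, 21175, 13022, 1662, 7399, 13679, 17654, 4567, 23608, 2828, 1234]
lst_match = ['[21963]', '[21175]', '[1662 7399 13679 ]', '[17654 23608]', '[2828]',
             '0', '0', '0', '0', '0', '0', '0', '0']
df = pd.DataFrame({'ID': lst, 'ID_match': lst_match})

# Keep only rows that have matches
has_match = df[df['ID_match'] != '0']

# Turn each string into a list of tokens, then one row per matched ID
matched = (has_match['ID_match']
           .str.strip('[ ]')
           .str.split()      # no pattern: split on any run of whitespace
           .explode()
           .astype(int))

# Source IDs first, then matched IDs; unique() keeps first-seen order
vals = pd.concat([has_match['ID'], matched]).unique()
out = pd.DataFrame({'ID': vals})
print(out)
```

explode avoids the None padding that expand=True creates, so no extra dropna step is needed.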

These look like string representations of lists, so you can use ast.literal_eval and itertools.chain:

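from ast import literal_eval
from itertools import chain

import pandas as pd

# '[1662 7399 13679 ]' -> '[1662,7399,13679,]' so literal_eval can parse it;
# the placeholder '0' parses to the integer 0
s = df['ID_match'].astype(str).str.replace(' ', ',', regex=False).apply(literal_eval)

# Rows with real matches: the parsed value is a list, not 0
# (the raw column holds the string '0', so comparing it with 0 would match every row)
mask = s != 0
L = list(chain.from_iterable(s[mask]))

res = (pd.DataFrame({'ID': df.loc[mask, 'ID'].tolist() + L})
         .drop_duplicates()
         .reset_index(drop=True))

print(res)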

       ID
0   23682
1   21963
2    9711
3   21175
4   13022
5    1662
6    7399
7   13679
8   17654
9   23608
10   2828
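One thing to watch with the literal_eval approach: in the question's data the placeholder is the string '0', not the integer 0, so a mask built as df['ID_match'] != 0 is True for every row. A minimal sketch of the difference, using a few values shaped like the question's sample:

```python
import pandas as pd

id_match = pd.Series(['[21963]', '0', '0'])

# Comparing strings with the integer 0 never finds an equal value
print((id_match != 0).tolist())    # [True, True, True]

# Comparing with the string '0' separates real matches from placeholders
print((id_match != '0').tolist())  # [True, False, False]
```

Masking on the parsed values, or on the string '0', keeps unmatched IDs such as 4567 and 1234 out of the result.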
