Creating a new column of lists in a pandas DataFrame based on another column

I have a pandas DataFrame where some of the rows hold multiple comma-separated entries. I would like to match a list I have against the third column (passcode), so that each row gets the usernames that correspond to its passcodes. I have tried different things, but none of them work.

Current df

username_list= ["charles23", "ems12", "", "sam34", "jon134", "", "jy19"]


ID1     ID2   passcode                    
01      01    Charlie233, Emily13         
01      02    
01      03    Sam310, John12               
01      04    
01      05    Jake42                      

Desired df

ID1     ID2   passcode                     username
01      01    Charlie233, Emily13          charles23, ems12
01      02                                
01      03    Sam310, John12               sam34, jon134
01      04                                
01      05    Jake42                       jy19

What I tried

df = df.assign(passcode = df["passcode"].str.split(",")).explode(column="passcode").assign(username=username_list).groupby(["ID1", "ID2"])["passcode", "username"].agg(list)

df.assign(
    passcode=df["passcode"].apply(lambda x: ", ".join(x) if x else ""),
    username=df["username"].apply(lambda x: ", ".join(x))
).reset_index()

ValueError: Length of values (1000) does not match length of index (1008)

I don't know why this error keeps happening, given that I checked that len(username_list) == len(df["passcode"]).

You can do:

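# Number of comma-separated entries in each row (an empty string counts as one)
df['pl'] = df['passcode'].str.split(',').str.len()
# Offset into username_list where each row's entries start
df['pi'] = df['pl'].cumsum() - df['pl']
# Slice username_list per row and join the pieces back into one string
df['username'] = df.apply(lambda x: username_list[x['pi']:x['pi'] + x['pl']],
                          axis=1).str.join(',')
# Remove the helper columns
df.drop(['pi', 'pl'], axis=1, inplace=True)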

Output (print(df)):

   ID1  ID2             passcode         username
0    1    1  Charlie233, Emily13  charles23,ems12
1    1    2                                      
2    1    3       Sam310, John12     sam34,jon134
3    1    4                                      
4    1    5               Jake42             jy19

Explanation:

Let's look at username_list first. It is aligned with the values in passcode, entry by entry, but we need to know where to start and where to stop in the list for each row. The following trick gives us that:

df['pl'] = df['passcode'].str.split(',').str.len()
df['pi'] = df['pl'].cumsum()-df['pl']

where pl is the number of passcode entries in each row and pi is the position in username_list where that row's entries start.
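
To make the two helper values concrete, here is a small sketch that computes them for the sample data from the question. The DataFrame is rebuilt by hand, and it assumes the empty passcode cells are empty strings rather than NaN; the name offsets is only used for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumption: empty cells are "")
df = pd.DataFrame({
    "ID1": [1, 1, 1, 1, 1],
    "ID2": [1, 2, 3, 4, 5],
    "passcode": ["Charlie233, Emily13", "", "Sam310, John12", "", "Jake42"],
})

# pl: how many comma-separated entries each row holds.
# Splitting "" yields [""], so an empty row still counts as one entry,
# which lines up with the "" placeholders in username_list.
pl = df["passcode"].str.split(",").str.len()

# pi: running total of the entries before each row, i.e. the start offset
pi = pl.cumsum() - pl

offsets = pd.DataFrame({"passcode": df["passcode"], "pl": pl, "pi": pi})
print(offsets)
```

For this data pl is 2, 1, 2, 1, 1 and pi is 0, 2, 3, 5, 6, so the third row, for example, takes username_list[3:5].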

Then use these to slice username_list for each row, and join each slice with ',' using the pandas str.join method:

df['username'] = df.apply(
        lambda x:username_list[x['pi']:x['pi'] + x['pl']], axis=1).str.join(',')

Then drop the helper columns pi and pl:

df.drop(['pi', 'pl'], axis=1, inplace=True)
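
Putting the steps together, the following is one possible self-contained variant, not the only way to do it. The function name attach_usernames and its sep parameter are hypothetical. It assumes missing passcodes should be treated as empty strings (so each empty row consumes one placeholder from the list), and it joins with ", " to match the desired output in the question rather than the "," used above. The up-front length check may help explain the ValueError in the question: that message usually means the list and the exploded rows differ in length.

```python
import pandas as pd

def attach_usernames(df, usernames, sep=", "):
    """Return a copy of df with a username column sliced from usernames.

    Hypothetical helper: assumes usernames holds one entry per
    comma-separated passcode, with "" as placeholder for empty rows.
    """
    # Entries per row; NaN is treated as "" (assumption), which counts as one
    counts = df["passcode"].fillna("").str.split(",").str.len()
    if counts.sum() != len(usernames):
        raise ValueError(
            f"passcode entries ({counts.sum()}) and usernames "
            f"({len(usernames)}) differ in length"
        )
    # Start offset of each row inside usernames
    starts = counts.cumsum() - counts
    out = df.copy()
    # Slice per row, drop "" placeholders, and join what is left
    out["username"] = [
        sep.join(u for u in usernames[s:s + n] if u)
        for s, n in zip(starts, counts)
    ]
    return out

username_list = ["charles23", "ems12", "", "sam34", "jon134", "", "jy19"]
df = pd.DataFrame({
    "ID1": [1, 1, 1, 1, 1],
    "ID2": [1, 2, 3, 4, 5],
    "passcode": ["Charlie233, Emily13", "", "Sam310, John12", "", "Jake42"],
})

result = attach_usernames(df, username_list)
print(result)
```

Because the function works on a copy and never adds pi or pl to the frame, there is nothing to drop afterwards.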

