Creating a new column with lists in a pandas df based on another column
I have a pandas dataframe where some of the rows have multiple entries. I would like to match a list I have against the third column (passcode). I have tried different things, but it isn't working for some reason.
Current df
username_list = ["charles23", "ems12", "", "sam34", "jon134", "", "jy19"]
ID1 ID2 passcode
01 01 Charlie233, Emily13
01 02
01 03 Sam310, John12
01 04
01 05 Jake42
Desired df
ID1 ID2 passcode username
01 01 Charlie233, Emily13 charles23, ems12
01 02
01 03 Sam310, John12 sam34, jon134
01 04
01 05 Jake42 jy19
What I tried
df = (df.assign(passcode=df["passcode"].str.split(","))
        .explode(column="passcode")
        .assign(username=username_list)
        .groupby(["ID1", "ID2"])[["passcode", "username"]]
        .agg(list))
df = df.assign(
    passcode=df["passcode"].apply(lambda x: ", ".join(x) if x else ""),
    username=df["username"].apply(lambda x: ", ".join(x))
).reset_index()
ValueError: Length of values (1000) does not match length of index (1008)
I don't know why this error keeps happening, given that I checked that len(username_list) == len(df["passcode"]).
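A likely explanation, sketched on the example data above (building the DataFrame this way, with empty passcode cells stored as empty strings rather than NaN, is an assumption): `explode` creates one row per split passcode, so `assign` checks `username_list` against the row count after exploding, not against `len(df["passcode"])` before it.

```python
import pandas as pd

# Example data from the question; empty passcode cells assumed to be "" (not NaN)
df = pd.DataFrame({
    "ID1": ["01"] * 5,
    "ID2": ["01", "02", "03", "04", "05"],
    "passcode": ["Charlie233, Emily13", "", "Sam310, John12", "", "Jake42"],
})
username_list = ["charles23", "ems12", "", "sam34", "jon134", "", "jy19"]

# One row per passcode; an empty cell splits to [""] and still yields one row
exploded = df.assign(passcode=df["passcode"].str.split(",")).explode(column="passcode")

print(len(df["passcode"]))  # 5: rows before explode
print(len(exploded))        # 7: rows after explode, which is what assign() checks against

# A list sized to the pre-explode row count reproduces the ValueError
try:
    exploded.assign(username=username_list[: len(df)])
except ValueError as err:
    print(err)
```

So the length check that matters is `len(username_list) == len(exploded)`.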
You can do:
# pl: number of passcodes per row; pi: where that row's usernames start
df['pl'] = df['passcode'].str.split(',').str.len()
df['pi'] = df['pl'].cumsum() - df['pl']
df['username'] = df.apply(lambda x: username_list[x['pi']:x['pi'] + x['pl']],
                          axis=1).str.join(',')
df.drop(['pi', 'pl'], axis=1, inplace=True)
Output (print(df)):
ID1 ID2 passcode username
0 1 1 Charlie233, Emily13 charles23,ems12
1 1 2
2 1 3 Sam310, John12 sam34,jon134
3 1 4
4 1 5 Jake42 jy19
Explanation:
Let's look at username_list first. Its entries are aligned with the individual values in passcode (one username per passcode, with an empty string for each row that has no passcode), but we need to know where each row's group of usernames starts and stops. The following trick gives us that:
df['pl'] = df['passcode'].str.split(',').str.len()
df['pi'] = df['pl'].cumsum() - df['pl']
where pl is the number of passcodes in each row (an empty cell still splits into one empty string, so it counts as 1) and pi is the position in username_list where that row's usernames start.
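For the example data, the two helper columns come out as below. This is a minimal check that again assumes the empty passcode cells are empty strings; a NaN cell would make `str.len()` return NaN and break the slicing, so `fillna("")` would be needed first.

```python
import pandas as pd

passcode = pd.Series(["Charlie233, Emily13", "", "Sam310, John12", "", "Jake42"])

pl = passcode.str.split(",").str.len()  # passcodes per row; "" counts as 1
pi = pl.cumsum() - pl                   # start offset of each row in username_list

print(pl.tolist())  # [2, 1, 2, 1, 1]
print(pi.tolist())  # [0, 2, 3, 5, 6]
```

So the third row (Sam310, John12) takes username_list[3:5], i.e. sam34 and jon134.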
Then use these to slice username_list for each row, and join each slice with ',' using the pandas str.join method:
df['username'] = df.apply(
    lambda x: username_list[x['pi']:x['pi'] + x['pl']], axis=1).str.join(',')
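The same slice-and-join step can also be written without `DataFrame.apply`, by zipping the two helper columns. This is a sketch of an alternative, not the answer's own code, and it keeps the same ',' separator:

```python
import pandas as pd

username_list = ["charles23", "ems12", "", "sam34", "jon134", "", "jy19"]
df = pd.DataFrame({
    "passcode": ["Charlie233, Emily13", "", "Sam310, John12", "", "Jake42"],
})

df["pl"] = df["passcode"].str.split(",").str.len()
df["pi"] = df["pl"].cumsum() - df["pl"]

# Slice username_list from each row's start offset and join the slice with ","
df["username"] = [
    ",".join(username_list[start:start + n])
    for start, n in zip(df["pi"], df["pl"])
]
print(df["username"].tolist())
# ['charles23,ems12', '', 'sam34,jon134', '', 'jy19']
```

A plain list comprehension avoids building a Series for every row, which `apply(axis=1)` does.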
Then drop the helper columns pi and pl:
df.drop(['pi', 'pl'], axis=1, inplace=True)
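Unlike `assign`, the slicing never raises when username_list and the passcodes disagree in length; later rows are silently misaligned or truncated. A minimal sketch of a guard to run first (the helper name `check_username_count` is hypothetical, and treating NaN cells like empty strings is an assumption):

```python
import pandas as pd


def check_username_count(passcode: pd.Series, usernames: list) -> None:
    """Hypothetical helper: raise if usernames cannot line up with the passcodes."""
    # One slot per passcode; an empty cell ("" or NaN filled to "") counts as one slot
    slots = int(passcode.fillna("").str.split(",").str.len().sum())
    if slots != len(usernames):
        raise ValueError(f"{slots} passcode slots but {len(usernames)} usernames")


passcode = pd.Series(["Charlie233, Emily13", "", "Sam310, John12", "", "Jake42"])
username_list = ["charles23", "ems12", "", "sam34", "jon134", "", "jy19"]
check_username_count(passcode, username_list)  # 7 slots, 7 usernames: no error
```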
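Statement: the technical posts on this site are licensed under CC BY-SA 4.0; if you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.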