Creating another column in pandas based on a pre-existing column
I have a third column in my data frame, and I want to create a fourth column that looks almost the same, except that it has no double quotes and each ID in the list is prefixed with 'user/'. Also, sometimes the value is just a single ID rather than a list of IDs (as shown in the example DF).
original
col1 col2 col3
01 01 "ID278, ID289"
02 02 "ID275"
desired
col1 col2 col3 col4
01 01 "ID278, ID289" user/ID278, user/ID289
02 02 "ID275" user/ID275
Given:
col1 col2 col3
0 1.0 1.0 "ID278, ID289"
1 2.0 2.0 "ID275"
2 2.0 1.0 NaN
Doing:
df['col4'] = (df.col3.str.strip('"')   # Remove " from both ends.
                .str.split(', ')       # Split into lists on ', '.
                .apply(lambda x: ['user/' + i for i in x if i]  # Apply this list comprehension,
                                 if isinstance(x, list)         # if it's a list;
                                 else x)                        # otherwise (NaN) pass it through.
                .str.join(', '))       # Join them back together.
print(df)
Output:
col1 col2 col3 col4
0 1.0 1.0 "ID278, ID289" user/ID278, user/ID289
1 2.0 2.0 "ID275" user/ID275
2 2.0 1.0 NaN NaN
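To reproduce this answer end to end, the "Given" frame has to be built first. The snippet below is a minimal sketch that assumes col3 stores literal double-quote characters inside each string and that the missing value is NaN; the constructor call is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the "Given" frame: quotes are part of the strings.
df = pd.DataFrame({
    "col1": [1.0, 2.0, 2.0],
    "col2": [1.0, 2.0, 1.0],
    "col3": ['"ID278, ID289"', '"ID275"', np.nan],
})

df["col4"] = (
    df["col3"].str.strip('"')   # drop the surrounding double quotes
    .str.split(", ")            # one list of IDs per row (NaN stays NaN)
    .apply(lambda x: ["user/" + i for i in x if i] if isinstance(x, list) else x)
    .str.join(", ")             # back to a single comma-separated string
)

print(df)
```

The isinstance check is what lets the NaN row pass through untouched instead of raising inside the list comprehension.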
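# Use df['col4'] = ...; attribute assignment (df.col4 = ...) does not create a new column.
df['col4'] = df['col3'].str.strip('"')
# Prefix the first ID, and every later ID that follows a ', ' separator.
df['col4'] = 'user/' + df['col4'].str.replace(', ', ', user/', regex=False)
should do the trick.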
In general, vectorized string manipulations are performed by pd.Series.str... operations. Most of their names closely match either a Python string method or an re method. Pandas usually supports standard Python operators (+, *, etc.) with strings and will broadcast scalars as vectors with the dimensions of the column you are working with.
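As a small illustration of the points above, the sketch below uses a made-up Series named ids (not from the question) to show a scalar string being combined with every element, one .str method that mirrors a Python string method, and one that behaves like re.sub.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including one missing value.
ids = pd.Series(["ID278", "ID275", np.nan])

# The scalar 'user/' is combined with each element; NaN stays missing.
prefixed = "user/" + ids

# Mirrors str.lower() on each element.
lowered = ids.str.lower()

# Behaves like re.sub(r"\D", "", value): keep only the digits.
numbers = ids.str.replace(r"\D", "", regex=True)

print(prefixed, lowered, numbers, sep="\n")
```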
A slow option is always just to use Series.apply(func), which iterates over the values in the series and passes each value to a function, func.
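You can use the .apply() function:
def function(x):
    # Treat missing values (NaN/None) and empty strings as "no IDs".
    if not isinstance(x, str) or not x:
        return ""
    # Drop surrounding double quotes, if any, then split into IDs.
    elements = x.strip('"').split(", ")
    out = list()
    for i in elements:
        out.append(f"user/{i}")
    return ", ".join(out)

df["col4"] = df.col3.apply(function)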
That returns:
col1 col2 col3 col4
1 1 ID278, ID289 user/ID278, user/ID289
2 2 ID275 user/ID275
3 3
Here's a solution that takes both the double quotes and ID lists into account:
# remove the double quotes
df['col4'] = df['col3'].str.strip('"')
# split the string, add prefix user/, and then join
# (non-string values such as NaN are passed through unchanged)
df['col4'] = df['col4'].apply(lambda x: ', '.join(f"user/{userId}" for userId in x.split(', ')) if isinstance(x, str) else x)
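If every ID follows a predictable pattern, the split-and-join step can be avoided with a regular-expression replacement. This is a different technique from the answers above, sketched under the assumption that each ID is the literal text "ID" followed by digits; the sample frame is rebuilt here only to keep the snippet self-contained.

```python
import numpy as np
import pandas as pd

# Assumed sample data matching the question (quotes are part of the strings).
df = pd.DataFrame({"col3": ['"ID278, ID289"', '"ID275"', np.nan]})

df["col4"] = (
    df["col3"]
    .str.strip('"')                                    # remove the double quotes
    .str.replace(r"(ID\d+)", r"user/\1", regex=True)   # prefix every matched ID
)

print(df)
```

Because both steps are .str methods, missing values stay missing without an explicit check. The pattern would need adjusting if real IDs use another format.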