Creating another column in pandas based on a pre-existing column

I have a third column in my data frame, and I want to create a fourth column that looks almost the same, except that it has no double quotes and each ID in the list is preceded by a 'user/' prefix. Also, sometimes the cell holds just a single ID rather than a list of IDs (as shown in the example DataFrame).

Original:

col1   col2     col3 
01      01     "ID278, ID289"

02      02     "ID275"

Desired:

col1   col2     col3                col4
01      01     "ID278, ID289"     user/ID278, user/ID289

02      02     "ID275"            user/ID275

Given:

   col1  col2            col3
0   1.0   1.0  "ID278, ID289"
1   2.0   2.0         "ID275"
2   2.0   1.0             NaN
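
For anyone who wants to run the answers in this thread, the frame above could be rebuilt roughly as follows. This is a sketch under the assumption that the double quotes are literal characters inside the col3 strings and that the missing value is a float NaN; the original post does not show how the data was loaded.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example frame; the double quotes are
# stored as part of the string values in col3.
df = pd.DataFrame({
    "col1": [1.0, 2.0, 2.0],
    "col2": [1.0, 2.0, 1.0],
    "col3": ['"ID278, ID289"', '"ID275"', np.nan],
})
print(df)
```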

Doing:

df['col4'] = (df.col3.str.strip('"')  # Remove " from both ends.
                     .str.split(', ') # Split into lists on ', '.
                     .apply(lambda x: ['user/' + i for i in x if i] # Apply this list comprehension,
                                       if isinstance(x, list)  # If it's a list.
                                       else x)
                     .str.join(', ')) # Join them back together.
print(df)

Output:

   col1  col2            col3                    col4
0   1.0   1.0  "ID278, ID289"  user/ID278, user/ID289
1   2.0   2.0         "ID275"              user/ID275
2   2.0   1.0             NaN                     NaN

df['col4'] = df.col3.str.strip('"')  # Bracket assignment is needed to create a new column.
df['col4'] = 'user/' + df['col4']

should do the trick when each cell holds a single ID. For a comma-separated list, only the first ID gets the prefix.
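
One way to keep this approach fully vectorized while also covering lists of IDs is to rewrite the separator as well. The following is a minimal sketch, assuming the IDs are always separated by exactly ", " and using a small made-up frame that contains only col3:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring col3 from the question.
df = pd.DataFrame({"col3": ['"ID278, ID289"', '"ID275"', np.nan]})

# Remove the surrounding double quotes; NaN stays NaN.
stripped = df["col3"].str.strip('"')

# Prefix the first ID, then turn every ", " separator into ", user/"
# so that each following ID gets the prefix too.
df["col4"] = "user/" + stripped.str.replace(", ", ", user/", regex=False)
print(df)
```

Missing values pass through untouched because both the `.str` methods and the `+` operator propagate NaN.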

In general, vectorized string manipulations are performed through the pd.Series.str accessor. Most of its method names closely match either a Python string method or a re function. Pandas also supports standard Python operators (such as +) on string columns and broadcasts a scalar across the whole column you are working with.
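
A short illustration of those points, using a made-up series of IDs (the values and variable names here are only for demonstration):

```python
import pandas as pd

ids = pd.Series(["ID278", "ID275"])  # hypothetical sample values

lowered = ids.str.lower()              # mirrors the built-in str.lower
has_27x = ids.str.contains(r"27\d")    # mirrors re.search with a pattern
prefixed = "user/" + ids               # the scalar is broadcast to every row

print(lowered.tolist())
print(has_27x.tolist())
print(prefixed.tolist())
```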

A slower option is always to use Series.apply(func), which iterates over the values in the series and passes each value to a function, func.

You can use the .apply() function:

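def function(x):
    # Treat missing values (NaN/None) and empty strings as blank.
    if not isinstance(x, str) or not x:
        return ""

    # Drop any surrounding double quotes, then split into individual IDs.
    elements = x.strip('"').split(", ")
    out = list()

    for i in elements:
        out.append(f"user/{i}")

    return ", ".join(out)

df["col4"] = df.col3.apply(function)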

That returns:

col1  col2  col3          col4
1     1     ID278, ID289  user/ID278, user/ID289
2     2     ID275         user/ID275
3     3 

Here's a solution that takes both the double quotes and ID lists into account:

# remove the double quotes
df['col4'] = df['col3'].str.strip('"')
# split the string, add prefix user/, and then join (non-string values such as NaN are left as-is)
df['col4'] = df['col4'].apply(lambda x: ', '.join(f"user/{userId}" for userId in x.split(', ')) if isinstance(x, str) else x)


