Creating another column in pandas based on a pre-existing column

Question

My data frame has a third column, and I want to create a fourth column that looks almost the same, except that it has no double quotes and each ID in the list has a 'user/' prefix. Sometimes the value is a single ID rather than a list of IDs (as shown in the example DF).

original

col1   col2     col3
01      01     "ID278, ID289"
02      02     "ID275"

desired

col1   col2     col3                col4
01      01     "ID278, ID289"     user/ID278, user/ID289
02      02     "ID275"            user/ID275

Answer 1

Given:

   col1  col2            col3
0   1.0   1.0  "ID278, ID289"
1   2.0   2.0         "ID275"
2   2.0   1.0             NaN
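To reproduce this, here is a minimal sketch of how the frame above could be built. The construction itself is an assumption; only the printed frame comes from the answer.

```python
import numpy as np
import pandas as pd

# Assumed construction of the sample frame; the quotes are part of the
# string values in col3, and the third row is missing.
df = pd.DataFrame({
    "col1": [1.0, 2.0, 2.0],
    "col2": [1.0, 2.0, 1.0],
    "col3": ['"ID278, ID289"', '"ID275"', np.nan],
})
print(df)
```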

Doing:

df['col4'] = (df.col3.str.strip('"')  # Remove " from both ends.
                     .str.split(', ') # Split into lists on ', '.
                     .apply(lambda x: ['user/' + i for i in x if i] # Apply this list comprehension,
                                       if isinstance(x, list)  # If it's a list.
                                       else x)
                     .str.join(', ')) # Join them back together.
print(df)

Output:

   col1  col2            col3                    col4
0   1.0   1.0  "ID278, ID289"  user/ID278, user/ID289
1   2.0   2.0         "ID275"              user/ID275
2   2.0   1.0             NaN                     NaN
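The isinstance check is there because the missing row does not become a list after the split. A small sketch of that intermediate step, using a standalone Series (the names col3, parts and is_list are only for illustration):

```python
import numpy as np
import pandas as pd

# Same values as col3 in the sample frame.
col3 = pd.Series(['"ID278, ID289"', '"ID275"', np.nan])

# Strip the quotes, then split each string into a list of IDs.
parts = col3.str.strip('"').str.split(', ')

# String rows turn into lists; the missing row stays missing,
# so a list comprehension over it would fail without the guard.
is_list = [isinstance(v, list) for v in parts]
print(is_list)
```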

Answer 2

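df['col4'] = df.col3.str.strip('"')  # Use [] here: attribute assignment does not create a new column.
df['col4'] = 'user/' + df['col4'].str.replace(', ', ', user/', regex=False)  # Prefix every ID, not just the first.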

should do the trick.

In general, vectorized string manipulation is done through the pd.Series.str accessor. Most of its method names closely match either a Python string method or an re function. Pandas also supports the standard Python operators that apply to strings (such as + and *) and broadcasts a scalar across the column you are working with.
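As a rough illustration of those three points, here is a sketch on a standalone Series. The pattern ID\d+ is an assumption about what the IDs look like, and the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

ids = pd.Series(['"ID278, ID289"', '"ID275"', np.nan])

# .str.strip mirrors Python's str.strip.
stripped = ids.str.strip('"')

# .str.replace with regex=True behaves like re.sub, including backreferences.
# Assumes every ID matches ID followed by digits.
prefixed = stripped.str.replace(r'(ID\d+)', r'user/\1', regex=True)

# A scalar string is broadcast across the whole Series; note that this
# only prefixes the start of each value, not every ID in it.
tagged = 'user/' + stripped

print(prefixed)
print(tagged)
```

Missing values stay missing in both results.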

A slower fallback is Series.apply(func), which iterates over the values in the series and passes each one to the function func.

Answer 3

You can use the .apply() method:

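def function(x):
    # Missing values (NaN) and empty strings become "".
    if not isinstance(x, str) or not x:
        return ""

    # Drop any surrounding double quotes, then split into single IDs.
    elements = x.strip('"').split(", ")
    out = list()

    for i in elements:
        out.append(f"user/{i}")

    return ", ".join(out)

df["col4"] = df.col3.apply(function)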

That returns:

col1  col2  col3          col4
1     1     ID278, ID289  user/ID278, user/ID289
2     2     ID275         user/ID275
3     3

Answer 4

Here's a solution that takes both the double quotes and ID lists into account:

# remove the double quotes
df['col4'] = df['col3'].str.strip('"')
# split the string, add prefix user/, and then join (non-string values such as NaN are left as-is)
df['col4'] = df['col4'].apply(lambda x: ', '.join(f"user/{userId}" for userId in x.split(', ')) if isinstance(x, str) else x)

solution1
1 ACCEPTED 2022-07-19 04:52:22

solution2
0 2022-07-19 02:28:49

solution3
0 2022-07-19 02:42:38

solution4
0 2022-07-19 02:59:55
