
How to split a column element and replace part of it with another column's value in a pandas DataFrame?

I want to replace part of a col1 element with a value from col2. For example, if col1 contains 'a b', I want to replace the 'b' with the first part of col2, giving 'a z'.

import pandas as pd
d = {'col1': ['a b', 'a c'], 'col2': ['z 26', 'y 25']}
df = pd.DataFrame(data=d)
print(df)
    col1     col2
    a b      z 26
    a c      y 25

Required output, where the replacement applies only to rows with df['col1'] == 'a b':

    col1    col2     col3
0   a b     z 26     a z
1   a c     y 25     a c

I tried

df['col3'] = np.where(df[df['col1']=='a b'],(df['col1'].replace(str(df['col1'].str.split(' ')[1])),(str(df['col2'].str.split(' ')[0]))), 0)

Error: operands could not be broadcast together with shapes (1,3) (2,) ()

and

for x in df['col1']:
  x.replace(df['col1'].str.split(' ')[1],df['col2'].str.split(' ')[1])
# TypeError: replace() argument 1 must be str, not list

Please suggest an easy solution.

I'm not entirely certain I've understood what you're trying to do here, but this does what you ask for:

for i, row in df.iterrows():
    if row["col1"] == "a b":
        row["col1"] = "a " + row["col2"].split(" ")[0]

To iterate over a DataFrame row by row, use iterrows, which yields (index, row) tuples.

EDIT: Note that modifying row in place is not guaranteed to change the DataFrame, because iterrows may hand you a copy of each row. To update the original df, write the value back inside the loop, after modifying the row:

        df.loc[i, "col1"] = row["col1"]

This is not idiomatic pandas, and there is doubtless a way of doing this in one step, but this pattern will work with anything. Row-by-row iteration with iterrows and loc is generally slower than a vectorised solution, which matters mostly for large DataFrames.

Note that the condition, 'a b', is hard-coded here, which seems to be what you want.
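For reference, here is a minimal runnable sketch of the loop-and-write-back pattern described above. It assumes the goal is the col3 column from the question rather than overwriting col1, and it uses the question's df rather than df2; the names first_col1 and first_col2 are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a b', 'a c'], 'col2': ['z 26', 'y 25']})

# Start col3 as a copy of col1, then overwrite only the matching rows.
df['col3'] = df['col1']

for i, row in df.iterrows():
    if row['col1'] == 'a b':  # hard-coded condition, as in the question
        first_col1 = row['col1'].split(' ')[0]  # 'a'
        first_col2 = row['col2'].split(' ')[0]  # 'z'
        # Write through df.loc so the DataFrame itself is updated,
        # not just the row object returned by iterrows.
        df.loc[i, 'col3'] = first_col1 + ' ' + first_col2

print(df)
```

Writing through df.loc[i, ...] inside the loop avoids relying on whether row is a view or a copy.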

import pandas as pd
d = {'col1': ['a b', 'a c'], 'col2': ['z 26', 'y 25']}
df = pd.DataFrame(data=d)

Solution 1:

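# Note: .str[0] takes the first character, which matches the first token only when tokens are one character long
df.loc[df['col1'] == 'a b', 'col3'] = df['col1'].str[0] + ' ' + df['col2'].str[0]
# Fill the rows that did not match; assign the result back instead of calling fillna(inplace=True) on a column
df['col3'] = df['col3'].fillna(df['col1'])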

Solution 2:

import numpy as np

condition = (df['col1'] == 'a b')
df['col3'] = np.where(condition, df['col1'].str[0] + ' ' + df['col2'].str[0], df['col1'])
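One caveat with both solutions: .str[0] picks the first character, not the first space-separated token. That is the same thing for the sample data, but not for longer values. The sketch below is a variation on Solution 2 that splits on the space first; it assumes the values are always separated by a single space, and the names first_col1, first_col2 and sample are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['a b', 'a c'], 'col2': ['z 26', 'y 25']})

condition = df['col1'] == 'a b'

# .str.split(' ') gives a Series of lists; the second .str[0] takes the first list item.
first_col1 = df['col1'].str.split(' ').str[0]
first_col2 = df['col2'].str.split(' ').str[0]

# Where the condition holds use the joined tokens, otherwise keep col1 unchanged.
df['col3'] = np.where(condition, first_col1 + ' ' + first_col2, df['col1'])

# Difference between first character and first token on a longer (made-up) value:
sample = pd.Series(['ab cd'])
first_char = sample.str[0].iloc[0]               # 'a'
first_token = sample.str.split(' ').str[0].iloc[0]  # 'ab'

print(df)
```

This is also where the loop in the question went wrong: df['col1'].str.split(' ')[1] is the list of tokens for the row labelled 1, not the second token of every row.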

While this is quite messy, I managed to do it in one line. I am honestly not sure if it does exactly what you want, but I can give you a hand modifying it if necessary.

df['col3'] = [f'{row.col1.split()[0]} {row.col2.split()[0]}' if row.col1 == 'a b' else row.col1 for _, row in df.iterrows()]

I found a way that works: split both columns and join their first parts. However, I was looking for a way to replace and insert instead.

df['col3'] = df['col1'].apply(lambda x: x.split(' ')[0]) + ' ' + df['col2'].apply(lambda x: x.split(' ')[0])
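Note that the line above builds col3 for every row, so 'a c' would become 'a y' rather than staying 'a c'. If a "replace and insert" that only touches the matching rows is what is wanted, something along these lines might do. It is only a sketch: swap_second_token is a made-up helper (not a pandas function), and it assumes the matching col1 value has at least two space-separated parts.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a b', 'a c'], 'col2': ['z 26', 'y 25']})


def swap_second_token(row, target='a b'):
    """Hypothetical helper: replace the second token of col1 with the
    first token of col2, but only when col1 equals `target`."""
    if row['col1'] != target:
        return row['col1']  # leave non-matching rows untouched
    tokens = row['col1'].split(' ')
    tokens[1] = row['col2'].split(' ')[0]  # the "replace" step
    return ' '.join(tokens)


# axis=1 passes each row to the helper; the result is a new Series.
df['col3'] = df.apply(swap_second_token, axis=1)

print(df)
```

Like the iterrows approach, apply with axis=1 works row by row, so it trades speed for readability on large DataFrames.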
