How to merge strings in a pandas df

Question

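I'm trying to merge specific strings in a pandas df. The df below is just an example. The values in my df will differ, but the same basic rule applies: in each row I want to merge the values, column by column, until a 4-letter string is reached.

Although the 4-letter string in this df is always Excl, my real df will contain many different 4-letter strings.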

import pandas as pd

d = ({
    'A' : ['Include','Inclu','Incl','Inc'],
    'B' : ['Excl','de','ude','l'],           
    'C' : ['X','Excl','Excl','ude'],
    'D' : ['','Y','ABC','Excl'],
    })

df = pd.DataFrame(data=d)

Input:

         A     B     C     D
0  Include  Excl     X      
1    Inclu    de  Excl     Y
2     Incl   ude  Excl   ABC
3      Inc     l   ude  Excl

Expected output:

         A     B     C     D
0  Include  Excl     X      
1  Include        Excl     Y 
2  Include        Excl   ABC
3  Include              Excl

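So Row 0 stays the same because Col B has 4 letters. Row 1 merges Col A,B because Col C has 4 letters. Row 2 is the same as Row 1. Row 3 merges Col A,B,C because Col D has 4 letters.

I tried doing this manually by merging all the columns and then going back and removing the unwanted values.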

df["Com"] = df["A"].map(str) + df["B"]  + df["C"]

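But then I'd have to go through each row manually and remove letters of varying lengths.

The df above is just an example. The common thread is that I need to merge everything that comes before the 4-letter string.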

Answer 1

Try this.

Sorry for the clumsy solution; I'll try to improve its performance.

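import numpy as np

# True in the column just before 'Excl'; the last column has no successor, so fill with False
temp = df.eq('Excl').shift(-1, axis=1, fill_value=False).astype(bool)
# label of the last column to merge in each row
end = temp.idxmax(axis=1)
# join the strings from column A through that column
res = df.apply(lambda x: ''.join(x.loc[:end[x.name]]), axis=1)
# blank every cell that sits before the first 'Excl' in its row
mask = ~df.eq('Excl').cummax(axis=1)
df[:] = np.where(mask, '', df)
df['A'] = res
print(df)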

Output:

         A     B     C     D
0  Include  Excl     X      
1  Include        Excl     Y
2  Include        Excl   ABC
3  Include              Excl

Improved solution:

import numpy as np

# join each row from column A through the column just before 'Excl'
res = df.apply(lambda x: ''.join(x.loc[:x.eq('Excl').shift(-1, fill_value=False).idxmax()]), axis=1)
# blank every cell that sits before the first 'Excl' in its row
mask = ~df.eq('Excl').cummax(axis=1)
df[:] = np.where(mask, '', df)
df['A'] = res

A simpler solution:

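import numpy as np

t = df.eq('Excl')
# join each row from column A through the column just before 'Excl'
res = df.apply(lambda x: ''.join(x.loc[:x.eq('Excl').shift(-1, fill_value=False).idxmax()]), axis=1)
# a row-wise running count of 0 means 'Excl' has not been reached yet, so blank that cell
df[:] = np.where(t.astype(int).cumsum(axis=1) == 0, '', df)
df['A'] = res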
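
To make the masking step easier to follow, here is a minimal sketch that only builds the boolean mask for the sample frame from the question. The names `seen` and `to_merge` are illustrative and not part of the answer; the sketch assumes every row contains 'Excl' somewhere.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['Include', 'Inclu', 'Incl', 'Inc'],
    'B': ['Excl', 'de', 'ude', 'l'],
    'C': ['X', 'Excl', 'Excl', 'ude'],
    'D': ['', 'Y', 'ABC', 'Excl'],
})

# True from the first 'Excl' onward, evaluated within each row (axis=1)
seen = df.eq('Excl').cummax(axis=1)
# cells before the first 'Excl' are the ones that get merged into column A
to_merge = ~seen

# number of cells merged per row
print(to_merge.sum(axis=1).tolist())  # [1, 2, 2, 3]
```

Because the cumulative operation runs along `axis=1`, each row is handled on its own, so the result should not depend on the order of the rows.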

Answer 2

You could do something like this:

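# True for each cell in B..D that comes before the first 4-letter string in its row
mask = (df.iloc[:, 1:].apply(lambda s: s.str.len()) == 4).cumsum(axis=1) == 0
# append those cells to column A (masked-out cells are NaN, which str.cat skips)
df['A'] = df['A'] + df.iloc[:, 1:][mask].apply(lambda x: x.str.cat(), axis=1)
# blank the cells that were merged into A
df.iloc[:, 1:] = df.iloc[:, 1:][~mask].fillna('')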
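
Since the question mentions that the real data holds many different 4-letter strings, the length-based idea can be wrapped in a reusable function. The following is only a sketch under assumptions: `merge_until_four` is a hypothetical name, every cell is assumed to be a string, and the first column is never treated as the stopping point (note that 'Incl' in column A already has 4 letters).

```python
import pandas as pd

def merge_until_four(df):
    """Hypothetical helper: append cells to the first column until a
    4-character string is reached in one of the remaining columns."""
    out = df.copy()
    rest = out.iloc[:, 1:]
    # True from the first 4-character cell onward, row by row
    reached = rest.apply(lambda s: s.str.len()).eq(4).cummax(axis=1)
    # text of the cells that sit before that point, joined per row
    prefix = rest.where(~reached, '').apply(''.join, axis=1)
    # the same cells blanked out, everything else kept
    blanked = rest.where(reached, '')
    out.iloc[:, 0] = out.iloc[:, 0] + prefix
    out.iloc[:, 1:] = blanked
    return out

df = pd.DataFrame({
    'A': ['Include', 'Inclu', 'Incl', 'Inc'],
    'B': ['Excl', 'de', 'ude', 'l'],
    'C': ['X', 'Excl', 'Excl', 'ude'],
    'D': ['', 'Y', 'ABC', 'Excl'],
})
result = merge_until_four(df)
print(result)
```

The function works on a copy, so the original df is left untouched and the call can be repeated safely.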

Answer 3

Here is a rough approach: we find the position of 'Excl' in each row and join the column values up to it to get the desired output.

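ls = []
for i in range(len(df)):
    row = df.loc[i, :]
    # label of the first column in this row that holds 'Excl'
    end = row.index[row == 'Excl'][0]
    # join everything up to that column, dropping 'Excl' itself
    ls.append(''.join(df.loc[i, :end].replace({'Excl': ''}).values))
# only column A is rebuilt here; the other columns are left unchanged
df['A'] = ls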
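
The same row-by-row idea can be sketched in plain Python so that it stops at any 4-letter string and also blanks the merged cells. This is an illustration under assumptions rather than part of the answer: the names `rows`, `stop` and `result` are made up here, every cell is assumed to be a string, and the first column is never treated as the stopping point.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['Include', 'Inclu', 'Incl', 'Inc'],
    'B': ['Excl', 'de', 'ude', 'l'],
    'C': ['X', 'Excl', 'Excl', 'ude'],
    'D': ['', 'Y', 'ABC', 'Excl'],
})

rows = []
for values in df.values.tolist():
    # position of the first 4-character string after the first column;
    # falls back to the row length when there is none
    stop = next((i for i, v in enumerate(values) if i > 0 and len(v) == 4),
                len(values))
    merged = ''.join(values[:stop])
    # merged text first, blanks for the other merged cells, then the untouched tail
    rows.append([merged] + [''] * (stop - 1) + values[stop:])

result = pd.DataFrame(rows, columns=df.columns, index=df.index)
print(result)
```

A loop like this is usually slower than the vectorised versions on large frames, but each step is easy to inspect.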

Answer 1: 1 vote, 2018-07-25 06:20:33
Answer 2: 1 vote, accepted, 2018-07-25 07:15:32
Answer 3: 0 votes, 2018-07-25 06:41:11
