
Extracting multiple words from pandas dataframe column into same column

Suppose a dataframe has two columns, A = {1, 2, 3} and B = {'a b c d', 'e f g h', 'i j k l'} (named colA and colB in the code below). For A = 2, I would like to change the corresponding entry in column B to 'e f h'. That is, I want to extract the first, second and last word, which is not the same as dropping the third word.

Extracting a single word is easy with df.loc[df['colA']==2,'colB'].str.split().str[x], where x = 0, 1 or -1, but I'm having difficulty joining the three words back into one string efficiently. The most efficient way I can think of is shown below. Is there a better way of achieving what I'm trying to do? Thanks.

y = lambda x : df.loc[df['colA']==2,'colB'].str.split().str[x]
df.loc[df['colA']==2,'colB'] = y(0) + ' ' + y(1) + ' ' + y(-1)

Expected and actual result:

A     B
1  a b c d
2  e f h
3  i j k l

How about this:

import pandas as pd

df = pd.DataFrame(data={'A': [1, 2, 3],
                        'B': ['a b c d', 'e f g h', 'i j k l']})

# Character-based slice: assumes single-character words separated by one space
y = lambda x: df.loc[df['A']==2,'B'].str[0:2*x+2] + df.loc[df['A']==2,'B'].str[-1]
df.loc[df['A']==2,'B'] = y(1)

df then contains the desired result:

   A        B
0  1  a b c d
1  2    e f h
2  3  i j k l
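
The slice above counts characters, so it relies on every word being a single character. A minimal sketch of a word-based variant is shown here, assuming the same columns A and B; the multi-character sample values are hypothetical and not from the original post:

```python
import pandas as pd

# Hypothetical sample data with multi-character words (not from the original post).
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': ['ab cd ef gh', 'ij kl mn op', 'qr st uv wx']})

mask = df['A'] == 2

# Split each selected string into a list of words.
words = df.loc[mask, 'B'].str.split()

# Keep the first two words and the last word, then join them with spaces.
df.loc[mask, 'B'] = (words.str[:2] + words.str[-1:]).str.join(' ')

print(df)
```

Working on lists of words rather than character offsets means the result should not depend on word length, at the cost of building an intermediate list per row.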

You were pretty close to the solution. Note that str[x] returns a value wrapped in a Series object. You can extract the plain string from the Series with .values[0], as shown below. Because this takes only the first matching value, it assumes that exactly one row has colA == 2:

y = lambda x : df.loc[df['colA']==2,'colB'].str.split().str[x].values[0]
df.loc[df['colA']==2,'colB'] = y(0) + ' ' + y(1) + ' ' + y(-1)
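
To illustrate the single-row assumption behind .values[0], here is a small sketch on a hypothetical frame (not from the original post) in which two rows match the condition. The first match's words end up in both rows:

```python
import pandas as pd

# Hypothetical frame in which two rows have colA == 2 (not from the original post).
df = pd.DataFrame({'colA': [1, 2, 2],
                   'colB': ['a b c d', 'e f g h', 'i j k l']})

mask = df['colA'] == 2

# .values[0] picks the x-th word of the FIRST matching row only.
y = lambda x: df.loc[mask, 'colB'].str.split().str[x].values[0]

# The right-hand side is a single string, so it is written to every matching row.
df.loc[mask, 'colB'] = y(0) + ' ' + y(1) + ' ' + y(-1)

print(df['colB'].tolist())  # ['a b c d', 'e f h', 'e f h']
```

If several rows can match, a per-row approach is probably the safer choice.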

You can also achieve the same result with the apply function:

df.loc[df['colA']==2, 'colB'] = df.loc[df['colA']==2,'colB'].apply(lambda x: ' '.join(x.split()[0:2] + [x.split()[-1]]))
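
The one-liner above calls x.split() twice per row. As a rough sketch, the same idea can be written with a named helper that splits once; first_two_and_last is a hypothetical name, and returning short strings unchanged is an assumption rather than something the original post specifies:

```python
import pandas as pd

def first_two_and_last(text):
    """Hypothetical helper: keep the first two words and the last word of text."""
    words = text.split()  # split only once
    if len(words) <= 3:
        # Assumption: strings with three or fewer words keep all their words.
        return ' '.join(words)
    return ' '.join(words[:2] + words[-1:])

df = pd.DataFrame({'colA': [1, 2, 3],
                   'colB': ['a b c d', 'e f g h', 'i j k l']})

mask = df['colA'] == 2
df.loc[mask, 'colB'] = df.loc[mask, 'colB'].apply(first_two_and_last)

print(df)
```

Because apply runs the helper on each selected row separately, this version should also behave sensibly when more than one row has colA == 2.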
