How to concatenate string columns with a space separator in a Pandas DataFrame?
I have a Pandas DataFrame as follows:
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'txt1': ['Hello there1', 'Hello there2', 'Hello there3'],
    'txt2': ['Hello there4', 'Hello there5', 'Hello there6'],
    'txt3': ['Hello there7', 'Hello there8', 'Hello there9']
})
df
   id          txt1          txt2          txt3
0   1  Hello there1  Hello there4  Hello there7
1   2  Hello there2  Hello there5  Hello there8
2   3  Hello there3  Hello there6  Hello there9
I want to concatenate columns txt1, txt2, and txt3. So far I am able to achieve it as follows:
df['alltext'] = df['txt1'] + df['txt2'] + df['txt3']
df
   id          txt1          txt2          txt3                               alltext
0   1  Hello there1  Hello there4  Hello there7  Hello there1Hello there4Hello there7
1   2  Hello there2  Hello there5  Hello there8  Hello there2Hello there5Hello there8
2   3  Hello there3  Hello there6  Hello there9  Hello there3Hello there6Hello there9
But how do I introduce a space character between the column strings while concatenating in Pandas?
I have just started learning Pandas.
You can add a separator between the columns:
df['alltext'] = df['txt1'] + ' ' + df['txt2'] + ' ' + df['txt3']
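As an alternative to chaining `+`, the same result can be sketched with `Series.str.cat`, which takes the separator as a `sep` argument. This is not part of the original answer; it reuses the question's DataFrame and is only one possible way to write it.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'txt1': ['Hello there1', 'Hello there2', 'Hello there3'],
    'txt2': ['Hello there4', 'Hello there5', 'Hello there6'],
    'txt3': ['Hello there7', 'Hello there8', 'Hello there9'],
})

# str.cat appends the other columns to txt1, placing sep between each pair.
df['alltext'] = df['txt1'].str.cat([df['txt2'], df['txt3']], sep=' ')

print(df['alltext'].tolist())
```

With many columns this avoids repeating `+ ' ' +` for every pair.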
Or use DataFrame.filter to keep only the columns with txt in their name, then join the values of each row with apply:
df['alltext'] = df.filter(like='txt').apply(' '.join, 1)
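To make the two steps visible, here is a small sketch on the question's DataFrame that splits the one-liner into the column selection and the row-wise join. The intermediate name `text_cols` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'txt1': ['Hello there1', 'Hello there2', 'Hello there3'],
    'txt2': ['Hello there4', 'Hello there5', 'Hello there6'],
    'txt3': ['Hello there7', 'Hello there8', 'Hello there9'],
})

# Step 1: keep only the columns whose name contains 'txt' (id is dropped).
text_cols = df.filter(like='txt')
print(list(text_cols.columns))  # ['txt1', 'txt2', 'txt3']

# Step 2: for each row (axis=1), join its values with a single space.
df['alltext'] = text_cols.apply(' '.join, axis=1)
print(df.loc[0, 'alltext'])  # Hello there1 Hello there4 Hello there7
```

Note that `' '.join` only accepts strings, so every selected column has to contain string values for this to work.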
Or select only the object columns with DataFrame.select_dtypes. Most of the time a Series with a dtype of object holds strings, but it can hold any Python object:
df['alltext'] = df.select_dtypes('object').apply(' '.join, 1)
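Because an object column may hold non-string values, `' '.join` can raise a `TypeError` on such rows. The sketch below shows one way to guard against that by converting with `astype(str)` first. The `mixed` column and its values are made up for illustration, and the columns are created with an explicit `dtype=object` so the example does not depend on how pandas infers string dtypes.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2],
    'txt1': pd.Series(['Hello', 'Hi'], dtype=object),
    # Hypothetical object column that mixes a string and an integer.
    'mixed': pd.Series(['there', 42], dtype=object),
})

obj_cols = df.select_dtypes('object')

# Convert every value to a string so that joining the row with 42 does not fail.
df['alltext'] = obj_cols.astype(str).apply(' '.join, axis=1)

print(df['alltext'].tolist())  # ['Hello there', 'Hi 42']
```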
Or select columns by position with DataFrame.iloc, here all columns except the first:
df['alltext'] = df.iloc[:, 1:].apply(' '.join, 1)
Thank you, @Jon Clements, for a solution that matches column names more strictly, as txt followed by digits:
df['alltext'] = df.filter(regex=r'^txt\d+$').apply(' '.join, 1)
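The difference between `like='txt'` and the anchored regex shows up when another column name happens to contain `txt`. The following sketch uses a made-up numeric column, `txt_len`, to illustrate it. That column is an assumption for the example and does not appear in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2],
    'txt1': ['a', 'b'],
    'txt2': ['c', 'd'],
    'txt_len': [1, 1],  # hypothetical numeric column whose name also contains 'txt'
})

# like='txt' matches any column name containing the substring.
like_cols = list(df.filter(like='txt').columns)
print(like_cols)   # ['txt1', 'txt2', 'txt_len']

# The anchored regex matches only 'txt' followed by one or more digits.
regex_cols = list(df.filter(regex=r'^txt\d+$').columns)
print(regex_cols)  # ['txt1', 'txt2']

df['alltext'] = df.filter(regex=r'^txt\d+$').apply(' '.join, axis=1)
print(df['alltext'].tolist())  # ['a c', 'b d']
```

With `like='txt'` the integer column would be included and `' '.join` would fail on it, so the stricter pattern is the safer choice here.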
Just add a space between them:
df['alltext'] = df['txt1'] + ' ' + df['txt2'] + ' ' + df['txt3']