Concatenate multiple columns of a dataframe with a separating character for non-null values

I have a data frame like this:

df:
C1   C2  C3
1    4    6
2   NaN   9
3    5   NaN
NaN  7    3

I want to concatenate the 3 columns into a single column with a comma as the separator. But I want the comma (",") only between non-null values.

I tried this, but it does not skip the null values:

df['New_Col'] = df[['C1','C2','C3']].agg(','.join, axis=1)

This gives me the output:

New_Col
1,4,6
2,,9
3,5,
,7,3

This is my ideal output:

New_Col
1,4,6
2,9
3,5
7,3

Can anyone help me with this?

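In your case, use stack. Its long-standing default drops the NaN entries, and the explicit dropna() makes that step independent of stack's settings. The remaining values are cast to int and then to str, and joined per original row:

df['new'] = df.stack().dropna().astype(int).astype(str).groupby(level=0).agg(','.join)
df['new']
Out[254]: 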
0    1,4,6
1      2,9
2      3,5
3      7,3
dtype: object
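For anyone who wants to run the stack approach end to end, here is a minimal sketch. It assumes the columns are numeric with real NaN values (the sample frame is rebuilt by hand from the question), and it restricts the operation to C1..C3 so that any extra column is left out of the join.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; numeric columns with real NaN (assumption).
df = pd.DataFrame({
    "C1": [1, 2, 3, np.nan],
    "C2": [4, np.nan, 5, 7],
    "C3": [6, 9, np.nan, 3],
})

# stack() turns the frame into one long Series indexed by (row, column).
# dropna() removes the missing entries explicitly.
stacked = df[["C1", "C2", "C3"]].stack().dropna()

# Cast 1.0 -> 1 -> "1", then join the strings that share the same row label.
df["New_Col"] = stacked.astype(int).astype(str).groupby(level=0).agg(",".join)

print(df["New_Col"].tolist())
```

One caveat: a row in which every value is NaN has no entries left after dropna(), so it ends up as NaN in New_Col rather than as an empty string.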

Judging by your (wrong) output, your dataframe holds strings, and the missing values are actually empty strings rather than NaN. Otherwise ','.join would raise TypeError: expected str instance, float found, because NaN is a float.

pandas is not optimized for string data, so a vanilla Python list comprehension is probably the most efficient choice here:

df['NewCol'] = [','.join([e for e in x if e]) for x in df.values]
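A small sketch of that situation, assuming (as this answer guesses) that every cell is a string and missing cells are empty strings. It first reproduces the output from the question and then applies the list comprehension to the three source columns only, so a previously added result column is not joined in by accident.

```python
import pandas as pd

# Assumed shape of the asker's data: strings, with "" standing in for missing values.
df = pd.DataFrame({
    "C1": ["1", "2", "3", ""],
    "C2": ["4", "", "5", "7"],
    "C3": ["6", "9", "", "3"],
})
cols = ["C1", "C2", "C3"]

# The attempt from the question: empty strings are joined like any other value.
naive = df[cols].agg(",".join, axis=1)

# Keep only truthy (non-empty) strings in each row before joining.
df["NewCol"] = [",".join([e for e in row if e]) for row in df[cols].to_numpy()]

print(naive.tolist())
print(df["NewCol"].tolist())
```

If the columns hold real NaN values instead, the `if e` test does not help, because NaN is truthy; in that case filter with pd.notna.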


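You can use filter to get rid of the NaNs. Test with pd.notna rather than `x is not np.nan`: an identity check only matches the np.nan object itself and misses NaN values read back from a float column. map(str, ...) is needed because str.join only accepts strings. Values are written as stored, so a float column yields 1.0 rather than 1:

df['New_Col'] = df[['C1','C2','C3']].apply(lambda row: ','.join(map(str, filter(pd.notna, row))), axis=1)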
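As a rough illustration of the filter idea on numeric data, the sketch below uses a hypothetical helper, join_non_null (not part of pandas), that drops missing values with pd.notna and formats the rest as integers. It assumes every non-null value is a whole number, as in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "C1": [1, 2, 3, np.nan],
    "C2": [4, np.nan, 5, 7],
    "C3": [6, 9, np.nan, 3],
})

# A NaN produced elsewhere is a different object from np.nan,
# which is why an `is not np.nan` test is unreliable.
fresh_nan = float("nan")
print(fresh_nan is np.nan)  # False

def join_non_null(row, sep=","):
    """Hypothetical helper: join the non-null values of one row as integers."""
    present = filter(pd.notna, row)
    # int() assumes whole numbers (assumption taken from the sample data).
    return sep.join(str(int(v)) for v in present)

df["New_Col"] = df[["C1", "C2", "C3"]].apply(join_non_null, axis=1)
print(df["New_Col"].tolist())
```

For columns that already contain strings, replace `str(int(v))` with `str(v)`.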
