Combine multiple columns as a string in Python

I am trying to combine multiple columns from a DataFrame into a new column in the same DataFrame. Each of those columns holds either a string value or Na/NaN. Whenever a column is Na/NaN, I would like it to be left out of the final string.

For example:

       a        b        c        d       RESULT
0      AA       BB       CC       DD      AA;BB;CC;DD
1      ab       Na       cd       da      ab;cd;da
2      Na       xx       Na       Na      xx
3      Na       Na       Na       Na      Na

I have already tried several approaches:

  1. df['RESULT'] = df['a'] + ";" + df['b'] + ... does not work, as the Na values still end up in the result.
  2. df['RESULT'] = ";".join(df['a'], df['b'], ...) does not work, as join takes only one argument (and I have 4).
  3. df['RESULTS'] = [f"{a};{b};{c}" for a, b, c in zip(df['a'], df['b'], df['c'])] does not work, as it adds the Na values to the output as strings.
  4. Pandas str.cat(): df['fill_name'] = df['RESULTS'].str.cat(df['a'], sep=";").str.cat(df['b'], sep=";")... is the closest to what I am looking for, but as soon as one column contains an Na, the whole output is Na.
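A small sketch of why attempts 1 and 3 fail, assuming the blank cells are real NaN values rather than the literal text "Na". With object columns, pandas masks NaN in `+`, so any missing cell turns the whole row into NaN; an f-string simply formats NaN as the text "nan".

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; assumes blanks are real NaN, not the string "Na".
df = pd.DataFrame({
    "a": ["AA", "ab", np.nan, np.nan],
    "b": ["BB", np.nan, "xx", np.nan],
    "c": ["CC", "cd", np.nan, np.nan],
    "d": ["DD", "da", np.nan, np.nan],
})

# Attempt 1: '+' on object columns; a single NaN in a row makes that whole row NaN.
plus = df["a"] + ";" + df["b"] + ";" + df["c"] + ";" + df["d"]

# Attempt 3: f-strings render NaN as the text "nan" instead of skipping it.
fstr = [f"{a};{b};{c};{d}" for a, b, c, d in zip(df["a"], df["b"], df["c"], df["d"])]

print(plus.tolist())
print(fstr)
```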

In the end, I am looking for something like Excel's TEXTJOIN function.
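A minimal sketch of a TEXTJOIN-style helper, assuming the blanks are real missing values (NaN, None or pd.NA). The name `textjoin` and its `ignore_empty` flag are hypothetical, chosen only to echo Excel's TEXTJOIN(delimiter, ignore_empty, text...); they are not part of pandas.

```python
import numpy as np
import pandas as pd

def textjoin(frame, cols, sep=";", ignore_empty=True):
    """Hypothetical helper loosely mimicking Excel's TEXTJOIN row by row.

    Missing values are always skipped; with ignore_empty=True, empty strings
    are skipped too. Rows with nothing left to join become NaN.
    """
    def join_row(row):
        parts = [str(v) for v in row
                 if pd.notna(v) and not (ignore_empty and v == "")]
        return sep.join(parts) if parts else np.nan

    return frame[cols].apply(join_row, axis=1)

df = pd.DataFrame({
    "a": ["AA", "ab", np.nan, np.nan],
    "b": ["BB", np.nan, "xx", np.nan],
    "c": ["CC", "cd", np.nan, np.nan],
    "d": ["DD", "da", np.nan, np.nan],
})
df["RESULT"] = textjoin(df, ["a", "b", "c", "d"])
print(df)
```

A row-wise `apply` is easy to read but slower than the vectorized answers on large frames.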

A combination of pandas.DataFrame.stack and GroupBy.agg:

cols = ["a", "b", "c", "d"]

df["RESULT"] = df[cols].stack().groupby(level=0).agg(";".join)

Output:

print(df)
     a    b    c    d       RESULT
0   AA   BB   CC   DD  AA;BB;CC;DD
1   ab  NaN   cd   da     ab;cd;da
2  NaN   xx  NaN  NaN           xx
3  NaN  NaN  NaN  NaN          NaN
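To see what the stack/groupby chain does, here is the same idea split into steps. The explicit `dropna()` is an addition not in the answer: the classic `stack()` already drops NaN cells, but the newer implementation (`future_stack=True`, introduced in pandas 2.1) keeps them, and `";".join` would then fail on a float NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["AA", "ab", np.nan, np.nan],
    "b": ["BB", np.nan, "xx", np.nan],
    "c": ["CC", "cd", np.nan, np.nan],
    "d": ["DD", "da", np.nan, np.nan],
})
cols = ["a", "b", "c", "d"]

# stack() reshapes the wide frame into one long Series indexed by (row label, column name);
# dropna() removes the missing cells regardless of which stack implementation is active.
long = df[cols].stack().dropna()
print(long)

# Group on the first index level (the original row label) and join each group's strings.
result = long.groupby(level=0).agg(";".join)
print(result)

# Row 3 had no values, so it is absent from `result`; assignment aligns on the index
# and leaves that row as NaN.
df["RESULT"] = result
```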

Use DataFrame.stack to remove missing values, then aggregate with join:

columns = ['a','b','c','d']
df['RESULT'] = df[columns].stack().groupby(level=0).agg(';'.join)
print (df)
     a    b    c    d       RESULT
0   AA   BB   CC   DD  AA;BB;CC;DD
1   ab  NaN   cd   da     ab;cd;da
2  NaN   xx  NaN  NaN           xx
3  NaN  NaN  NaN  NaN          NaN
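The join step assumes every remaining value is a string; `";".join` raises a TypeError on numbers. A sketch for a hypothetical frame where column "d" holds integers: converting with `astype(str)` after the missing cells are gone avoids that without ever producing the text "nan".

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "d" holds integers instead of strings.
df = pd.DataFrame({
    "a": ["AA", "ab", np.nan],
    "b": ["BB", np.nan, "xx"],
    "d": [1, 2, 3],
})
columns = ["a", "b", "d"]

# Drop missing cells first, then convert the survivors to str so join accepts them.
df["RESULT"] = (
    df[columns].stack().dropna().astype(str).groupby(level=0).agg(";".join)
)
print(df)
```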

Or drop the missing values inside a custom function, then replace the resulting empty strings with NaN:

import numpy as np

df['RESULT'] = df[columns].agg(lambda x: ";".join(x.dropna()), axis=1).replace('', np.nan)
print (df)
     a    b    c    d       RESULT
0   AA   BB   CC   DD  AA;BB;CC;DD
1   ab  NaN   cd   da     ab;cd;da
2  NaN   xx  NaN  NaN           xx
3  NaN  NaN  NaN  NaN          NaN
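Why the trailing `.replace('', np.nan)` is needed: a row whose cells are all missing leaves nothing to join, and joining an empty sequence returns an empty string rather than a missing value. A short sketch on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["AA", "ab", np.nan, np.nan],
    "b": ["BB", np.nan, "xx", np.nan],
    "c": ["CC", "cd", np.nan, np.nan],
    "d": ["DD", "da", np.nan, np.nan],
})
columns = ["a", "b", "c", "d"]

# Row-wise join of the non-missing values only.
joined = df[columns].agg(lambda x: ";".join(x.dropna()), axis=1)

# Row 3 comes back as '' because there was nothing to join.
print(repr(joined.iloc[3]))

# Map '' back to NaN so fully empty rows match the other missing values.
df["RESULT"] = joined.replace("", np.nan)
```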

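Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.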

 
© 2020-2024 STACKOOM.COM