Combine multiple columns as a string in python
I am trying to combine multiple columns from a dataframe into a new column in the same dataframe. Each of those columns holds either a string value or Na/NaN. Whenever a column is Na/NaN, I would like it to be left out of the final string.
E.g.:
a b c d RESULT
0 AA BB CC DD AA;BB;CC;DD
1 ab Na cd da ab;cd;da
2 Na xx Na Na xx
3 Na Na Na Na Na
I have tested multiple functions already.
df['RESULT'] = df['a'] + ";" + df['b'] +...
does not work as it will still nest the Na's.
df['RESULT'] = ";".join(df['a'],df['b'],...)
does not work as join just takes one argument (and I have 4).
df['RESULTS'] = [f"{a};{b};{c}" for a,b,c in zip(df['a'],df['b'], df['b'])]
does not work as it adds the Na's as strings to the output.
df['fill_name']= df['RESULTS'].str.cat(df['a'],sep=";").str.cat(df['b'],sep=";")...
is the closest to what I am looking for, but as soon as there is one Na in one column, the whole output is Na.
In the end I am looking for something like the "TEXTJOIN" function in Excel.
A combo with pandas.DataFrame.stack and GroupBy.agg:
cols = ["a", "b", "c", "d"]
df["RESULT"] = df[cols].stack().groupby(level=0).agg(";".join)
Output:
print(df)
a b c d RESULT
0 AA BB CC DD AA;BB;CC;DD
1 ab NaN cd da ab;cd;da
2 NaN xx NaN NaN xx
3 NaN NaN NaN NaN NaN
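To make the one-liner above easier to follow, here is a minimal step-by-step sketch. The sample frame is rebuilt from the question's table, and the intermediate names stacked and joined are only illustrative. The explicit dropna() call is an added safeguard, not part of the original answer: it ensures the missing cells are removed whether or not stack() drops them itself in the installed pandas version.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the table in the question (NaN marks a missing value).
df = pd.DataFrame({
    "a": ["AA", "ab", np.nan, np.nan],
    "b": ["BB", np.nan, "xx", np.nan],
    "c": ["CC", "cd", np.nan, np.nan],
    "d": ["DD", "da", np.nan, np.nan],
})
cols = ["a", "b", "c", "d"]

# Step 1: reshape to one value per (row label, column name) pair.
# dropna() is an assumption-free safeguard: afterwards no missing cells remain.
stacked = df[cols].stack().dropna()

# Step 2: group by the original row label (index level 0) and join the values.
joined = stacked.groupby(level=0).agg(";".join)

# Step 3: assignment aligns on the index; row 3 has no entry in `joined`,
# so it ends up as NaN in the new column.
df["RESULT"] = joined
print(df)
```

Because the join only ever sees the cells that survived step 1, rows with no values at all simply never appear in joined, which is what leaves them as NaN after the assignment.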
Use DataFrame.stack to remove the missing values, then aggregate with join:
columns = ['a','b','c','d']
df['RESULT'] = df[columns].stack().groupby(level=0).agg(';'.join)
print (df)
a b c d RESULT
0 AA BB CC DD AA;BB;CC;DD
1 ab NaN cd da ab;cd;da
2 NaN xx NaN NaN xx
3 NaN NaN NaN NaN NaN
Or remove the missing values inside a custom function, then replace the resulting empty strings with NaN:
import numpy as np

df['RESULT'] = df[columns].agg(lambda x: ";".join(x.dropna()), axis=1).replace('',np.nan)
print (df)
a b c d RESULT
0 AA BB CC DD AA;BB;CC;DD
1 ab NaN cd da ab;cd;da
2 NaN xx NaN NaN xx
3 NaN NaN NaN NaN NaN
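Since the question asks for something like Excel's TEXTJOIN, the row-wise idea can also be wrapped in a small reusable function. This is only a sketch: textjoin is a hypothetical helper name, not a pandas function, and it makes no attempt to reproduce every TEXTJOIN option (for example, empty strings in a cell are kept, only missing values are skipped).

```python
import numpy as np
import pandas as pd


def textjoin(frame, columns, sep=";"):
    """Hypothetical helper: join the non-missing values of `columns` row by row.

    Rows in which every selected column is missing yield NaN.
    """
    results = []
    for row in frame[columns].itertuples(index=False, name=None):
        # Keep only the cells that are not NaN/None/NA.
        parts = [str(value) for value in row if pd.notna(value)]
        results.append(sep.join(parts) if parts else np.nan)
    # Reuse the frame's index so the result aligns on assignment.
    return pd.Series(results, index=frame.index, dtype=object)


# Sample frame mirroring the table in the question.
df = pd.DataFrame({
    "a": ["AA", "ab", np.nan, np.nan],
    "b": ["BB", np.nan, "xx", np.nan],
    "c": ["CC", "cd", np.nan, np.nan],
    "d": ["DD", "da", np.nan, np.nan],
})
df["RESULT"] = textjoin(df, ["a", "b", "c", "d"])
print(df)
```

A plain Python loop over itertuples avoids building a Series per row, which the axis=1 lambda does; whether that matters depends on the size of the frame.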