How do I merge columns that have similar names in a pandas dataframe?
I have a dataframe in which the words Due Date are written differently, but they all mean the same thing. The problem is in my master data (an xls file): one due date column has an extra space and another doesn't, and I can't change that. All I can change is my final output.
Sr no Due Date Due Date DueDate
1 1/2/22
2 1/5/22
3
4
5 ASAP
I just want columns 2 and 3 combined under column 1, with each value staying in the row it was in.
Sr No. Due Date
1 1/2/22
2 1/5/22
3
4
5 ASAP
Try with bfill:
out = df.bfill(axis=1)[['Sr no', 'Due Date']]
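To make the one-liner concrete, here is a minimal sketch on a made-up frame. The sample values and the exact header spellings are assumptions based on the question, and it assumes blank cells are read in as NaN (bfill does not treat empty strings as missing).

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: three differently spelled due-date columns
df = pd.DataFrame({
    "Sr no": [1, 2, 3, 4, 5],
    "Due Date": ["1/2/22", np.nan, np.nan, np.nan, np.nan],
    "Due Date ": [np.nan, "1/5/22", np.nan, np.nan, np.nan],
    "DueDate": [np.nan, np.nan, np.nan, np.nan, "ASAP"],
})

# Back-fill along each row: a missing cell takes the next non-missing
# value found to its right, so the leftmost due-date column collects them
filled = df.bfill(axis=1)

# Keep only the serial number and the leftmost due-date column
out = filled[["Sr no", "Due Date"]]
print(out)
```

Rows 3 and 4 stay missing because none of the due-date columns has a value for them.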
You can use filter with a regex to get the similarly named columns, then bfill and take the first column. Finally, join the result to the original frame with the found columns dropped:
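# select every column whose name looks like "due date" (case-insensitive, optional whitespace)
d = df.filter(regex=r'(?i)due\s*date')
# drop those columns, then join back the first one after back-filling across each row
df2 = (df
       .drop(columns=list(d.columns))
       .join(d.bfill(axis=1).iloc[:, 0])
      )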
Output:
Sr no Due Date
0 1 1/2/22
1 2 1/5/22
2 3 None
3 4 None
4 5 ASAP
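To see which headers the pattern picks up, here is a small sketch with made-up column names (all of them are assumptions for illustration). Note that filter applies the regex with a search rather than a full match, so a name that merely contains "due date" would also be selected.

```python
import pandas as pd

# Hypothetical header variants, including stray spaces and a missing space
cols = ["Sr no", "Due Date", "Due Date ", " due  date", "DueDate", "Overdue"]
df = pd.DataFrame([[None] * len(cols)], columns=cols)

# (?i) makes the match case-insensitive; \s* allows zero or more
# whitespace characters between "due" and "date"
matched = list(df.filter(regex=r"(?i)due\s*date").columns)
print(matched)
```

"Overdue" is left out because it contains no "date", and "Sr no" does not match at all.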
A possible solution is the following:
import pandas as pd
# set test data
data = {"Sr no": [1, 2, 3, 4, 5],
        "Due Date": ["1/2/22", "", "", "", ""],
        "Due Date ": ["", "1/2/22", "", "", ""],
        " Due Date": ["", "", "", "", "ASAP"]
        }
# create pandas dataframe
df = pd.DataFrame(data)
# clean up column names
df.columns = [col.strip() for col in df.columns]
# group data
df = df.groupby(df.columns, axis=1).agg(
    lambda x: x.apply(lambda y: ''.join([str(l) for l in y if str(l) != "nan"]), axis=1)
)
# reorder column
df = df[['Sr no', 'Due Date']]
df
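The groupby(..., axis=1) call is deprecated as of pandas 2.1, so here is a hedged sketch of the same idea that transposes the frame instead. It assumes blanks arrive as empty strings, as in the test data above; `merged` is an illustrative name, and `first()` is swapped in for the string join, so it keeps the first non-missing value per row rather than concatenating.

```python
import numpy as np
import pandas as pd

# Same shape as the test data above (values are illustrative)
df = pd.DataFrame({
    "Sr no": [1, 2, 3, 4, 5],
    "Due Date": ["1/2/22", "", "", "", ""],
    "Due Date ": ["", "1/5/22", "", "", ""],
    " Due Date": ["", "", "", "", "ASAP"],
})

# Normalize header whitespace so the variants share one label
df.columns = [col.strip() for col in df.columns]

# Treat empty strings as missing, transpose so the duplicate labels sit
# on the index, group by label, and keep the first non-missing value
merged = (
    df.replace("", np.nan)
      .T.groupby(level=0, sort=False)
      .first()
      .T
)
print(merged)
```

Stripping only removes leading and trailing whitespace, so a header such as DueDate with no inner space would need a further normalization step before grouping.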