Replace duplicates with NaN in Pandas Series
I have a Series in which I want to replace the duplicated values with NaN, or with an empty string. The following is my code:
import pandas as pd

data_dict = [{"Geo": "Canada"}, {"Geo": "Sri Lanka"}, {"Geo": "Lahore"}, {"Geo": "Karachi"}, {"Geo": "Islamabad"},
             {"Geo": "Other"}, {"Pipelines": "Sri Lanka"}, {"Pipelines": "Canada Exec"}, {"Pipelines": "USA SuperSA"},
             {"Pipelines": "Others"}]
df = pd.DataFrame(data_dict)
stacked_df = df.stack()
print(stacked_df)
The Series output is as follows:
0  Geo               Canada
1  Geo            Sri Lanka
2  Geo               Lahore
3  Geo              Karachi
4  Geo            Islamabad
5  Geo                Other
6  Pipelines      Sri Lanka
7  Pipelines    Canada Exec
8  Pipelines    USA SuperSA
9  Pipelines         Others
dtype: object
The desired output is the following, without the index:
Geo          Canada
             Sri Lanka
             Lahore
             Karachi
             Islamabad
             Other
Pipelines    Sri Lanka
             Canada Exec
             USA SuperSA
             Others
dtype: object
First, stacked_df is not a DataFrame, it is a Series. Second, Geo and Pipelines are in the index, not in a normal column. That said, to obtain the desired output, I would do:
(stacked_df.reset_index(level=1)
.assign(level_1=lambda x: x.level_1.mask(x.level_1.duplicated(),""))
)
Output:
     level_1            0
0        Geo       Canada
1               Sri Lanka
2                  Lahore
3                 Karachi
4               Islamabad
5                   Other
6  Pipelines    Sri Lanka
7             Canada Exec
8             USA SuperSA
9                  Others
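
For reference, here is a self-contained sketch of the same idea that also covers the NaN variant mentioned in the question. The column names label and value and the variables flat, dup, blanked and nulled are illustrative choices, not part of the original code, and the extra dropna() is an assumption added so that the stacked result has ten rows regardless of whether the installed pandas version keeps missing values in stack().

```python
import pandas as pd

data_dict = [{"Geo": "Canada"}, {"Geo": "Sri Lanka"}, {"Geo": "Lahore"},
             {"Geo": "Karachi"}, {"Geo": "Islamabad"}, {"Geo": "Other"},
             {"Pipelines": "Sri Lanka"}, {"Pipelines": "Canada Exec"},
             {"Pipelines": "USA SuperSA"}, {"Pipelines": "Others"}]

# stack() returns a Series with a two-level index (row number, column name).
# dropna() is a safeguard (assumption): it is a no-op if stack() already
# dropped the missing values.
stacked_df = pd.DataFrame(data_dict).stack().dropna()

# Move the inner index level (the former column names) into a regular column.
flat = stacked_df.reset_index(level=1)
flat.columns = ["label", "value"]  # illustrative names

# True for every repeat of a label after its first occurrence.
dup = flat["label"].duplicated()

blanked = flat.assign(label=flat["label"].mask(dup, ""))  # empty string
nulled = flat.assign(label=flat["label"].mask(dup))       # NaN (mask's default)

print(blanked)
print(nulled)
```

mask() replaces the entries where the condition is True, so only the first occurrence of each label survives. Leaving out the second argument falls back to the default replacement, which is NaN.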