How can I replace the values in one DataFrame column with the values from another column, as shown in the image?
fillna is the function you need. First, replace the 'unknown' values with nan, then fill every nan with the value from the other column.
import pandas as pd
from numpy import nan

df = (
    pd.DataFrame(
        {
            'CITY_MULTIPLE_CHOICE': ['new york', 'chicago', 'unknown', 'Los Angeles'],
            'CITY_OPEN': ['unknown', 'unknown', 'Chicago', 'Chicago'],
        }
    )
    # Turn the 'unknown' placeholder into a real missing value
    .replace('unknown', nan)
    # Fill each column's missing values from the other column
    .assign(
        CITY_MULTIPLE_CHOICE=lambda x: x.CITY_MULTIPLE_CHOICE.fillna(value=x.CITY_OPEN),
        CITY_OPEN=lambda x: x.CITY_OPEN.fillna(value=x.CITY_MULTIPLE_CHOICE),
    )
)
print(df)
Given this input:
   CITY_MULTIPLE_CHOICE  CITY_OPEN
0              new york    unknown
1               chicago    unknown
2               unknown    Chicago
3           Los Angeles    Chicago
the code above prints:
   CITY_MULTIPLE_CHOICE  CITY_OPEN
0              new york   new york
1               chicago    chicago
2               Chicago    Chicago
3           Los Angeles    Chicago
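
If 'unknown' should only be treated as missing in these two columns and not across the whole DataFrame, one possible variant is sketched below. The helper name fill_unknown and its placeholder parameter are made up for illustration and are not part of pandas. It relies on Series.mask to blank out the placeholder, and it assumes that a row where both columns are 'unknown' should keep the placeholder.

```python
import pandas as pd


def fill_unknown(df, left, right, placeholder="unknown"):
    """Hypothetical helper: fill `placeholder` in either column from the other one."""
    out = df.copy()
    # mask() replaces cells where the condition is True with NaN
    a = out[left].mask(out[left] == placeholder)
    b = out[right].mask(out[right] == placeholder)
    # Both fills use the masked originals, so the result does not depend on order
    out[left] = a.fillna(b)
    out[right] = b.fillna(a)
    # Assumption: rows where both columns were the placeholder keep it
    out[[left, right]] = out[[left, right]].fillna(placeholder)
    return out


df = pd.DataFrame(
    {
        "CITY_MULTIPLE_CHOICE": ["new york", "chicago", "unknown", "Los Angeles"],
        "CITY_OPEN": ["unknown", "unknown", "Chicago", "Chicago"],
    }
)
result = fill_unknown(df, "CITY_MULTIPLE_CHOICE", "CITY_OPEN")
print(result)
```

For the sample data this gives the same values as the answer above, and the original df is left unchanged because the helper works on a copy.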