Split column in a pandas dataframe
I have a file in which one of the columns is a multi-value field, for example:
Col1|Col2
rec1|xyz#tew
rec2|
rec3|jkl#qwer
I need to split Col2 on the delimiter, and this is the code I am using:
x = ['Col1','Col2']
df[x] = df[x].apply(lambda c: c.str.split('#', expand=True))
With this code I am getting the following error: "AttributeError: 'Series' object has no attribute 'series'"
I tried using replace and fillna, but no luck. Can someone please help correct the above code?
First, we'll need to replace the NaN values with the delimiter itself, so that empty rows still split into two (empty) items:
>> df["Col2"] = df["Col2"].fillna("#")
Now, split the strings in the "Col2" column:
>> df["Col2"] = df["Col2"].str.split("#", n=1) # n=1 splits at most once, so no list has more than 2 values
>> df
Col1 Col2
0 rec1 [xyz, tew]
1 rec2 [, ]
2 rec3 [jkl, qwer]
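For anyone who wants to reproduce the two steps above, here is a minimal self-contained sketch. It assumes the file is pipe-delimited as in the question and reads the sample from an in-memory string instead of a real path; the name `raw` is made up for illustration.

```python
import io
import pandas as pd

# Sample data from the question, held in a string instead of a file (assumption).
raw = "Col1|Col2\nrec1|xyz#tew\nrec2|\nrec3|jkl#qwer\n"
df = pd.read_csv(io.StringIO(raw), sep="|")

# The empty field in rec2 is read as NaN; replacing it with the delimiter
# means the split below yields two empty strings for that row.
df["Col2"] = df["Col2"].fillna("#")

# n=1 limits the split to a single cut, so each list has at most two items.
df["Col2"] = df["Col2"].str.split("#", n=1)
print(df)
```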
Now, join your original dataframe with a new dataframe created from the lists of the previous step:
>> df = df.join(pd.DataFrame(df["Col2"].values.tolist())) # .add_prefix('col_'))
You can add a prefix if you want to name your columns (add .add_prefix('Col_') at the end, for example).
Drop your old "Col2":
>> df = df.drop("Col2", axis=1)
>> df
Col1 0 1
0 rec1 xyz tew
1 rec2
2 rec3 jkl qwer
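As a possible alternative, not part of the steps above: `Series.str.split` accepts `expand=True`, which returns the pieces as a DataFrame directly when it is called on the single "Col2" column rather than through `apply` over both columns. The sketch below assumes the same pipe-delimited sample, read from an in-memory string; the names `raw`, `parts`, `result` and the `Col2_` prefix are made up for illustration.

```python
import io
import pandas as pd

# Sample data from the question, held in a string instead of a file (assumption).
raw = "Col1|Col2\nrec1|xyz#tew\nrec2|\nrec3|jkl#qwer\n"
df = pd.read_csv(io.StringIO(raw), sep="|")

# expand=True returns a DataFrame with one column per piece;
# the row with a missing Col2 comes back as missing values in every piece.
parts = df["Col2"].str.split("#", n=1, expand=True)

# Name the new columns and turn missing pieces into empty strings.
parts = parts.add_prefix("Col2_").fillna("")

# Replace the original column with the split columns.
result = df.drop(columns="Col2").join(parts)
print(result)
```

This skips the fillna("#") step, since the missing row is handled after the split instead of before it.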