
Split column in a pandas dataframe

I have a file in which one of the columns is a multi-value field, for example:

Col1|Col2
rec1|xyz#tew
rec2|
rec3|jkl#qwer

I need to split Col2 on the delimiter, and the following is the code I am using:

x = ['Col1','Col2']
df[x] = df[x].apply(lambda c: c.str.split('#', expand=True))

With this code I get the following error: "AttributeError: 'Series' object has no attribute 'series'"

I tried using replace and fillna, but had no luck. Can someone please help me correct the above code?

First, replace the NaN values with the delimiter itself, so that empty rows still split into two (empty) parts:

>> df["Col2"] = df["Col2"].fillna("#")

Now, split the strings in the "Col2" column:

>> df["Col2"] = df["Col2"].str.split("#", n=1)  # n=1: split only once, so no list has more than 2 values
>> df
   Col1         Col2
0  rec1   [xyz, tew]
1  rec2         [, ]
2  rec3  [jkl, qwer]

Now, join your original dataframe with a new dataframe created from the lists of the previous step:

>> df = df.join(pd.DataFrame(df["Col2"].values.tolist())) # .add_prefix('col_'))

You can add a prefix if you want to name the new columns (for example, call .add_prefix('Col_') on the new dataframe before joining it).

Finally, drop the old "Col2" column:

>> df = df.drop("Col2", axis=1)
>> df

   Col1    0     1
0  rec1  xyz   tew
1  rec2           
2  rec3  jkl  qwer
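
The steps above can be put together into one script, sketched below. A few things here are assumptions rather than part of the original answer: the sample data is read from an in-memory string with io.StringIO instead of a real file, the new columns get a "Col2_" prefix, and the last lines show str.split with expand=True as an alternative way to get the same columns.

```python
import io

import pandas as pd

# Sample data from the question, read from an in-memory string
# (assumption: the real file is pipe-delimited with a header row).
raw = "Col1|Col2\nrec1|xyz#tew\nrec2|\nrec3|jkl#qwer\n"
df = pd.read_csv(io.StringIO(raw), sep="|")

# Steps from the answer: fill NaN with the delimiter, split once,
# turn the lists into new columns, then drop the original column.
parts = df["Col2"].fillna("#").str.split("#", n=1)
new_cols = pd.DataFrame(parts.tolist(), index=df.index).add_prefix("Col2_")
result = df.drop(columns="Col2").join(new_cols)

# Alternative (not from the answer): let pandas expand the split directly.
alt = df["Col2"].fillna("#").str.split("#", n=1, expand=True).add_prefix("Col2_")
result_alt = df.drop(columns="Col2").join(alt)

print(result)
```

Passing index=df.index when building the new dataframe keeps the join aligned even if df does not have the default 0..n-1 index.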
