
How to split a DataFrame column into two parts and replace the column with the split value

How can I split a DataFrame column into two parts so that each value in the column is replaced by the split value? For example, I have a DataFrame like:

col1       col2
"abc"      "A, BC"
"def"      "AX, Z"
"pqr"      "P, R"
"xyz"      "X, YZ"

I want to extract the value before the "," and replace the cell with that extracted value. So the output should look like:

col1   col2
abc    A
def    AX
pqr    P
xyz    X

I am trying to do it like this:

df['col2'].apply(lambda x: x.split(',')[0])

But it gives me an error. How can I get the desired output?
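The question does not show the error message, so the cause below is an assumption. One frequent cause is a missing value (NaN) in col2: NaN is a float, and a float has no split method. Note also that apply returns a new Series and does not modify df in place, so the result must be assigned back. A minimal sketch under those assumptions (the NaN row is hypothetical data, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row holds a missing value (NaN), which is
# assumed here to be what triggers the error in the original attempt.
df = pd.DataFrame({"col1": ["abc", "def", "pqr"],
                   "col2": ["A, BC", "AX, Z", np.nan]})

try:
    df["col2"].apply(lambda x: x.split(",")[0])
except AttributeError as exc:
    # NaN is a float, so x.split does not exist for that row
    print("apply failed:", exc)

# Only split real strings, pass anything else through unchanged,
# and assign the result back to the column
df["col2"] = df["col2"].apply(lambda x: x.split(",")[0] if isinstance(x, str) else x)
print(df)
```

If the column contains only strings, the original lambda works as written; the missing piece is then just the assignment back to df["col2"].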

In this case you can use the str methods of pandas, which are vectorized string functions. They are also typically faster than apply.

df.col2 = df.col2.str.split(', ').str[0]

>>> df
Out[]:
  col1 col2
0  abc    A
1  def   AX
2  pqr    P
3  xyz    X
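For a self-contained run, the snippet below rebuilds the example frame from the question (assuming the quotes shown there are only display, not part of the strings) and applies the same .str.split(', ').str[0] chain:

```python
import pandas as pd

# Rebuild the example DataFrame from the question
df = pd.DataFrame({
    "col1": ["abc", "def", "pqr", "xyz"],
    "col2": ["A, BC", "AX, Z", "P, R", "X, YZ"],
})

# Split each string on ", ", keep the first piece, and replace the column
df["col2"] = df["col2"].str.split(", ").str[0]
print(df)
```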

To use these methods on a Series containing strings, you need to access the str attribute before calling the function. See the pandas documentation for more details.
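To illustrate why the accessor matters: a Series has no split method of its own, so calling it directly raises an AttributeError, while the same call through .str is applied to each element. A minimal sketch:

```python
import pandas as pd

s = pd.Series(["A, BC", "AX, Z"])

try:
    s.split(", ")  # the Series object itself has no split method
except AttributeError as exc:
    print("without .str:", exc)

# Through the .str accessor, split runs on every element of the Series
parts = s.str.split(", ")
print(parts.tolist())  # [['A', 'BC'], ['AX', 'Z']]
```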

In the above solution, note that .str.split(', ') replaces the plain split call, and that .str[0] takes the first element of each split result, whereas .str.split(', ')[0] would instead return the element at index 0 of the Series (the whole split list for the first row).
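The difference between .str[0] and [0] can be seen side by side. The last part uses the expand=True option of str.split, added here as an illustration for the case where both halves are wanted as separate columns rather than only the first one:

```python
import pandas as pd

df = pd.DataFrame({"col2": ["A, BC", "AX, Z", "P, R"]})
split = df["col2"].str.split(", ")

# .str[0] takes element 0 of *each* list, giving a Series of first parts
first_parts = split.str[0]
print(first_parts.tolist())  # ['A', 'AX', 'P']

# [0] indexes the Series itself, giving the whole list stored under label 0
row_zero = split[0]
print(row_zero)  # ['A', 'BC']

# expand=True returns both parts as separate columns (labelled 0 and 1)
both = df["col2"].str.split(", ", n=1, expand=True)
print(both[1].tolist())  # ['BC', 'Z', 'R']
```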
