How to split a DataFrame column into two parts and replace the column with the split value
How can I split a DataFrame column into two parts so that the value in the column is then replaced by the split value? For example, I have a DataFrame like:
col1 col2
"abc" "A, BC"
"def" "AX, Z"
"pqr" "P, R"
"xyz" "X, YZ"
I want to extract the value before the comma and replace that cell with the extracted value. So, the output should look like:
col1 col2
abc A
def AX
pqr P
xyz X
I am trying to do it as:
df['col2'].apply(lambda x: x.split(',')[0])
But it gives me an error. Please suggest how I can get the desired output.
In this case you can use the str methods of pandas, which use vectorized functions. It will also be faster than apply.
df.col2 = df.col2.str.split(', ').str[0]
>>> df
Out[]:
col1 col2
0 abc A
1 def AX
2 pqr P
3 xyz X
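For anyone who wants to run this end to end, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, so the construction step is an assumption about how the data is stored, not something the original post shows.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be plain strings).
df = pd.DataFrame({
    "col1": ["abc", "def", "pqr", "xyz"],
    "col2": ["A, BC", "AX, Z", "P, R", "X, YZ"],
})

# Split each string on ", " and keep only the first piece.
# Assigning back to df["col2"] is what actually replaces the column.
df["col2"] = df["col2"].str.split(", ").str[0]

print(df)
```

Note that the result has to be assigned back to the column. The apply call in the question returns a new Series and leaves df unchanged unless it is assigned as well.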
To use this on a Series containing strings, you should call the str attribute before any string function. See the doc for more details.
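As a small illustration of calling str before each string function, the sketch below chains several steps. The sample Series and the extra lower-casing step are made up for the example. The missing value is also an assumption, added only to show that the str methods pass it through. The question does not say which error was raised.

```python
import pandas as pd

# Hypothetical sample with one missing value.
s = pd.Series(["A, BC", "AX, Z", None])

# A Series has no string methods of its own; they live under .str.
print(hasattr(s, "split"))  # False

# Each string operation in the chain goes through .str again.
lower_first = s.str.split(", ").str[0].str.lower()

print(lower_first.tolist())
```

The missing entry stays missing in the result instead of raising, which is one practical difference from calling x.split inside apply.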
In the above solution, note the .str.split(', ') that replaces split. Also note the .str[0], which indexes into the result of each split, whereas just using .str.split(', ')[0] would get index 0 of the Series, that is, the whole list for the first row.
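The difference between the two indexing forms can be checked with a short sketch. The Series below reuses the col2 values from the question and assumes the default integer index. The variable names are only illustrative.

```python
import pandas as pd

s = pd.Series(["A, BC", "AX, Z", "P, R", "X, YZ"])

parts = s.str.split(", ")    # a Series whose values are lists

first_row = parts[0]         # row with label 0: the whole list for that row
first_items = parts.str[0]   # first element of the list in every row

print(first_row)
print(first_items.tolist())
```

Here first_row is a single Python list, while first_items is a Series with one value per row, which is what the question asks for.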