Splitting several columns using pandas

Question

I want to split the strings in several columns. For example, I'd like to extract some information from col2, col3 and col5 in the dataframe below (in practice I have more than a hundred such columns).

import pandas as pd

d = pd.DataFrame({
                  'col1' : ['USA', 'AGN'],
                  'col2' : ['0|0:0.014:0.986,0.013,0', '1|0:0.02:1.936,0.023,1'],
                  'col3' : ['1|0:0.024:0.9,0.01345,2', '0|2:0.213:0.92,0.1,2'],
                  'col4' : ['done', 'done'],
                  'col5' : ['2|0:0.02:1.936,0.023,1', '1|0:0.024:0.9,0.01345,2']
                  })

  col1                     col2                     col3  col4 .....
0  USA  0|0:0.014:0.986,0.013,0  1|0:0.024:0.9,0.01345,2  done .....  
1  AGN   1|0:0.02:1.936,0.023,1     0|2:0.213:0.92,0.1,2  done .....

I only need the first three characters of each long string. I expect a result like the one below.

col1 col2  col3  col4  col5  ....
USA   0|0   1|0  done   2|0  ....
AGN   1|0   0|2  done   1|0  ....

Any hints, please?

Answer 1

If I understood your question correctly, you can do it this way:

In [254]: d.replace(r':.*', '', regex=True)
Out[254]:
  col1 col2 col3  col4 col5
0  USA  0|0  1|0  done  2|0
1  AGN  1|0  0|2  done  1|0
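The pattern `:.*` matches from the first colon to the end of the string, so replacing it with an empty string keeps only the leading field. Cells without a colon (such as 'USA' or 'done') are left unchanged. If other columns might legitimately contain colons, a minimal sketch that restricts the replacement to chosen columns could look like this (the `cols` list and the `result` name are assumptions for illustration):

```python
import pandas as pd

d = pd.DataFrame({
    'col1': ['USA', 'AGN'],
    'col2': ['0|0:0.014:0.986,0.013,0', '1|0:0.02:1.936,0.023,1'],
    'col3': ['1|0:0.024:0.9,0.01345,2', '0|2:0.213:0.92,0.1,2'],
    'col4': ['done', 'done'],
    'col5': ['2|0:0.02:1.936,0.023,1', '1|0:0.024:0.9,0.01345,2'],
})

# Assumed list of columns that hold the encoded strings.
cols = ['col2', 'col3', 'col5']

# Work on a copy so the original frame stays untouched.
result = d.copy()

# Drop everything from the first colon onward, only in the chosen columns.
result[cols] = result[cols].replace(r':.*', '', regex=True)

print(result)
```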

Answer 2

To get the first three characters of each string:

>>> d.col2.str[:3]
0    0|0
1    1|0
Name: col2, dtype: object

To split on ":" and take the first item:

>>> d.col2.str.split(':', expand=True)[0]
0    0|0
1    1|0
Name: 0, dtype: object
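Slicing and splitting give the same result here only because every leading field happens to be three characters long. A small sketch with made-up values (the series `s` below is hypothetical, not from the question) shows where they diverge:

```python
import pandas as pd

# Hypothetical values: the second string has a four-character first field.
s = pd.Series(['0|0:0.014:0.986', '10|2:0.02:1.936'])

# Fixed-width slice: always the first three characters.
fixed = s.str[:3]

# Delimiter-based: everything before the first colon (n=1 splits only once).
by_delim = s.str.split(':', n=1).str[0]

print(fixed.tolist())
print(by_delim.tolist())
```

If the field width can vary, splitting on the delimiter is likely the safer choice.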

To apply it to a group of columns:

cols = ['col2', 'col3', 'col5']
d.loc[:, cols] = d.loc[:, cols].apply(lambda s: s.str[:3])

>>> d
  col1 col2 col3  col4 col5
0  USA  0|0  1|0  done  2|0
1  AGN  1|0  0|2  done  1|0
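With more than a hundred columns, typing the column list by hand may be impractical. One possible sketch, assuming the columns of interest are exactly those in which every value contains a colon (an assumption about the data, not something stated in the question), picks them automatically:

```python
import pandas as pd

d = pd.DataFrame({
    'col1': ['USA', 'AGN'],
    'col2': ['0|0:0.014:0.986,0.013,0', '1|0:0.02:1.936,0.023,1'],
    'col3': ['1|0:0.024:0.9,0.01345,2', '0|2:0.213:0.92,0.1,2'],
    'col4': ['done', 'done'],
    'col5': ['2|0:0.02:1.936,0.023,1', '1|0:0.024:0.9,0.01345,2'],
})

# Assumed selection rule: keep columns where every value contains a literal ':'.
cols = [
    c for c in d.columns
    if d[c].astype(str).str.contains(':', regex=False).all()
]

# Keep only the text before the first colon in each selected column.
d[cols] = d[cols].apply(lambda s: s.str.split(':', n=1).str[0])

print(cols)
print(d)
```

A different rule (for example, matching column names against a pattern) may fit better if some target columns contain values without a colon.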

Answer 1
2 accepted 2016-05-04 16:37:07

Answer 2
1 2016-05-04 17:03:59
