

Pandas: split a column into 2 columns based on condition

I have a column in my CSV file that looks like this:

4048.187796
4254.6215672333340-0-0-
4229.9155995666671-0-0-
4427.0494321833340-0-0-
4303.428593050-0-0-
4256.6235064166670-0-0-
4132.5399525833330-0-0-
4263.5820142833341-0-0-
4320.6955591833340-0-0-
4342.1270119333330-0-0-
4447.8283416833340-0-0-
4409.2305202500010-0-0-
4280.650570850-1-0-
4283.5942898166680-0-0-
4341.1896358666670-0-0-
4263.1282187000010-0-0-
4222.3119095333330-0-0-
4314.9844073333331-0-0-

The format of each value is a fractional number followed by an optional suffix matching ((1|0)-.*)?. Some lines contain only the fractional value, e.g. the first line. I want to split the column into two columns as follows:

4048.187796, 
4254.621567233334, 0-0-0-
4229.915599566667, 1-0-0-
4427.049432183334, 0-0-0-
4303.42859305, 0-0-0-
4256.623506416667, 0-0-0-
4132.539952583333, 0-0-0-
4263.582014283334, 1-0-0-
4320.695559183334, 0-0-0-
4342.127011933333, 0-0-0-
4447.828341683334, 0-0-0-
4409.230520250001, 0-0-0-
4280.65057085, 0-1-0-
4283.594289816668, 0-0-0-
4341.189635866667, 0-0-0-
4263.128218700001, 0-0-0-
4222.311909533333, 0-0-0-
4314.984407333333, 1-0-0-
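To make the described format concrete, here is a minimal sketch using Python's re module. VALUE_RE and split_value are hypothetical names, and the pattern simply mirrors the ((1|0)-.*)? description above: with fullmatch, regex backtracking moves the greedy number back by one digit so that the suffix can start at the 0 or 1 just before the first '-'.

```python
import re

# Hypothetical pattern for the format described in the question:
# a decimal number, optionally followed by a suffix starting with "0-" or "1-".
VALUE_RE = re.compile(r"(\d+\.\d+)((?:[01]-.*)?)")


def split_value(raw: str) -> tuple[str, str]:
    """Split one raw cell into (number, suffix); suffix is '' when absent."""
    m = VALUE_RE.fullmatch(raw)
    if m is None:
        raise ValueError(f"unexpected format: {raw!r}")
    return m.group(1), m.group(2)


print(split_value("4048.187796"))              # ('4048.187796', '')
print(split_value("4254.6215672333340-0-0-"))  # ('4254.621567233334', '0-0-0-')
```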

I could do this by reading the file line by line and, for each value, finding the index of the first '-' and taking substrings at a fixed offset before that index. But since I have several files, each with more than 1000 lines, I don't want to do that. Is there a way to do this directly with pandas and its slice functions?

I tried df['new_col'] = df['last'].str.slice, but I can't give a fixed value for the slice start index because it changes from row to row.
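Since str.slice only takes scalar start/stop values, one way to keep the index-based idea inside pandas is to compute the split position for every row with the vectorized str.find and then slice each row in a comprehension. This is a minimal sketch on a few sample values; the one-character offset (the suffix starts at the digit just before the first '-') is inferred from the sample data, and the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series(["4048.187796", "4254.6215672333340-0-0-", "4280.650570850-1-0-"])

# Vectorized: index of the first '-' in every row (-1 when there is none)
dash = s.str.find("-")

# The suffix starts at the digit just before the first '-';
# rows without a '-' keep the whole string as the value.
cut = [len(v) if d == -1 else d - 1 for v, d in zip(s, dash)]

value = pd.Series([v[:c] for v, c in zip(s, cut)], index=s.index)
suffix = pd.Series([v[c:] for v, c in zip(s, cut)], index=s.index)
print(pd.DataFrame({"value": value, "suffix": suffix}))
```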

Try the regular expression ^([0-9.]+)((?:[01]-)*)$ with str.extract:

df['last'].str.extract(r'^([0-9.]+)((?:[01]-)*)$')

#                    0       1
#0         4048.187796        
#1   4254.621567233334  0-0-0-
#2   4229.915599566667  1-0-0-
#3   4427.049432183334  0-0-0-
#4       4303.42859305  0-0-0-
#5   4256.623506416667  0-0-0-
#6   4132.539952583333  0-0-0-
#7   4263.582014283334  1-0-0-
#8   4320.695559183334  0-0-0-
#9   4342.127011933333  0-0-0-
#10  4447.828341683334  0-0-0-
#11  4409.230520250001  0-0-0-
#12      4280.65057085  0-1-0-
#13  4283.594289816668  0-0-0-
#14  4341.189635866667  0-0-0-
#15  4263.128218700001  0-0-0-
#16  4222.311909533333  0-0-0-
#17  4314.984407333333  1-0-0-
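A self-contained version of the str.extract answer, sketched under the assumption that the CSV has a single column named last (a few sample rows are inlined via io.StringIO). Reading with dtype=str keeps every cell as raw text for the regex, and bracket access df["last"] is used because last is also the name of a DataFrame method in many pandas versions. The greedy [0-9.]+ backtracks until the rest of the string consists only of 0-/1- pairs, so the split lands just before the suffix.

```python
import io

import pandas as pd

csv_text = """last
4048.187796
4254.6215672333340-0-0-
4229.9155995666671-0-0-
4280.650570850-1-0-
"""

# dtype=str keeps every cell as text so the regex sees the raw characters
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# group 1: the decimal value; group 2: zero or more "0-"/"1-" pairs (may be empty)
parts = df["last"].str.extract(r"^([0-9.]+)((?:[01]-)*)$")
parts.columns = ["value", "suffix"]
parts["suffix"] = parts["suffix"].fillna("")  # rows that do not match at all give NaN
parts["value"] = pd.to_numeric(parts["value"])

print(parts)
```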


Alternatively, we can do it in two steps with contains and replace:

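import numpy as np
has_suffix = df['Check'].str.contains('-', regex=False)  # rows carrying a suffix such as 0-0-0-
df['New'] = np.where(has_suffix, df['Check'].str[-6:], '')
df['Check'] = df['Check'].str.replace(r'[01]-.*$', '', regex=True)  # drop the suffix
df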
                Check     New
0         4048.187796        
1   4254.621567233334  0-0-0-
2   4229.915599566667  1-0-0-
3   4427.049432183334  0-0-0-
4       4303.42859305  0-0-0-
5   4256.623506416667  0-0-0-
6   4132.539952583333  0-0-0-
7   4263.582014283334  1-0-0-
8   4320.695559183334  0-0-0-
9   4342.127011933333  0-0-0-
10  4447.828341683334  0-0-0-
11  4409.230520250001  0-0-0-
12      4280.65057085  0-1-0-
13  4283.594289816668  0-0-0-
14  4341.189635866667  0-0-0-
15  4263.128218700001  0-0-0-
16  4222.311909533333  0-0-0-
17  4314.984407333333  1-0-0-
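The two-step approach hard-codes a suffix width of six characters (str[-6:]), which holds for every row in the sample. Here is a self-contained sketch on a few sample rows that slices the same width off the value and then checks the fixed-width assumption, so a longer or shorter suffix would fail loudly instead of being cut silently. The sanity-check pattern is an assumption based on the sample data (three 0-/1- pairs).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Check": ["4048.187796", "4254.6215672333340-0-0-", "4280.650570850-1-0-"]})

# Rows containing '-' are assumed to end in a 6-character suffix such as "0-0-0-"
has_suffix = df["Check"].str.contains("-", regex=False)
df["New"] = np.where(has_suffix, df["Check"].str[-6:], "")
df["Check"] = np.where(has_suffix, df["Check"].str[:-6], df["Check"])

# Sanity check: every suffix is either empty or exactly three "0-"/"1-" pairs
ok = df["New"].str.fullmatch(r"(?:[01]-){3}|")
assert ok.all(), df.loc[~ok, "New"].tolist()
print(df)
```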

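Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.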

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM