Split and take part of a string from column values and make a new column from it in pandas (Python)
I have strings like these as the values of one column in my df:
ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232
ttt-OPP/MKKL-7/LNFFF-89/OOOP-71/AAD-1/RRR-232
How can I create a new column from part of this column? The part I need is:
74
89
Python's str.split() lets you split a string into a list of parts according to a separator (here / and -).
s = 'ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232'
print(s.split('/')[2].split('-')[1])
# 74
Use Series.apply() to apply it to your column:
df['b'] = df['a'].apply(lambda s:s.split('/')[2].split('-')[1])
print (df)
Output:
a b
0 ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232 74
1 ttt-OPP/MKKL-7/LNFFF-89/OOOP-71/AAD-1/RRR-232 89
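The lambda above assumes every value is a string with at least three /-separated fields; a missing value or a shorter string would raise an exception. A minimal sketch of a more defensive version follows. The helper name third_field_number and the two extra rows are made up for illustration, they are not part of the question's data.

```python
import pandas as pd

def third_field_number(s):
    """Return the text after '-' in the third '/'-separated field, or None."""
    # Hypothetical helper: skip anything that is not a string (e.g. NaN/None).
    if not isinstance(s, str):
        return None
    fields = s.split('/')
    if len(fields) < 3:
        return None
    parts = fields[2].split('-')
    return parts[1] if len(parts) > 1 else None

df = pd.DataFrame({'a': ['ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232',
                         'ttt-OPP/MKKL-7/LNFFF-89/OOOP-71/AAD-1/RRR-232',
                         'malformed',   # illustrative bad row
                         None]})        # illustrative missing value

df['b'] = df['a'].apply(third_field_number)
result = df['b'].tolist()
print(df)
```

The first two rows give the same values as the one-liner; the malformed and missing rows end up as missing values instead of stopping the whole apply.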
NB: Use @A-Za-z's solution; it's faster than mine.
If this is the df:
val
0 ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232
1 ttt-OPP/MKKL-7/LNFFF-89/OOOP-71/AAD-1/RRR-232
You can use str.extract:
df['num_val'] = df.val.str.extract(r'LNFFF-(\d+)/', expand=False)
You get:
val num_val
0 ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232 74
1 ttt-OPP/MKKL-7/LNFFF-89/OOOP-71/AAD-1/RRR-232 89
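One thing worth knowing about this approach: rows where the pattern does not match come back as missing values rather than raising. Here is a small sketch of that behaviour; the second row (with XXXX instead of LNFFF) is invented for the example.

```python
import pandas as pd

df = pd.DataFrame({'val': ['ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232',
                           # illustrative row without the LNFFF- prefix
                           'ttt-OPP/MKKL-7/XXXX-89/OOOP-71/AAD-1/RRR-232']})

# The capture group (\d+) is what ends up in the new column.
df['num_val'] = df['val'].str.extract(r'LNFFF-(\d+)/', expand=False)

# True where the pattern matched, False where it did not.
matched = df['num_val'].notna().tolist()
print(df)
print(matched)
```

Note also that the trailing / in the pattern requires the LNFFF field to be followed by another field; if it could be the last one in the string, dropping the / from the pattern is probably safer.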
Assuming your dataframe is named df and the column is col:
df['sub_col'] = pd.Series([s[21:23] for s in df['col'].values], index=df.index)
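Positional slicing like this only works while the wanted digits always sit at positions 21 and 22. As a rough sketch of the trade-off, the same slice can be written with the .str accessor and compared against a pattern-based extract; the second row with a three-digit number is made up to show where slicing falls short.

```python
import pandas as pd

df = pd.DataFrame({'col': ['ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232',
                           # illustrative row: three digits instead of two
                           'ttt-OPP/MKKL-7/LNFFF-745/OOOP-71/AAD-1/RRR-232']})

# Same slice as the list comprehension, via the vectorized .str accessor.
sliced = df['col'].str[21:23]

# Pattern-based extraction adapts to the number's length.
extracted = df['col'].str.extract(r'LNFFF-(\d+)', expand=False)

print(sliced.tolist())
print(extracted.tolist())
```

With the question's fixed-width data both give the same result, so slicing is fine there; it just silently truncates if the layout ever changes.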
It seems you need str.extract:
df = pd.DataFrame({'a': ['ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232',
'ttt-OPP/MKKL-7/LNFFF-89/OOOP-71/AAD-1/RRR-232']})
print (df)
a
0 ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232
1 ttt-OPP/MKKL-7/LNFFF-89/OOOP-71/AAD-1/RRR-232
df['new'] = df['a'].str.extract(r'LNFFF-(\d+)', expand=False)
# if necessary, convert to ints
df['new'] = df['new'].astype(int)
print (df)
a new
0 ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232 74
1 ttt-OPP/MKKL-7/LNFFF-89/OOOP-71/AAD-1/RRR-232 89
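If some rows might not contain the pattern, astype(int) will fail because the unmatched rows are missing values. One possible way around that, sketched here with an invented non-matching second row, is pd.to_numeric followed by the nullable Int64 dtype:

```python
import pandas as pd

df = pd.DataFrame({'a': ['ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232',
                         # illustrative row that does not contain LNFFF-
                         'ttt-OPP/MKKL-7/OTHER-89/OOOP-71/AAD-1/RRR-232']})

extracted = df['a'].str.extract(r'LNFFF-(\d+)', expand=False)

# errors='coerce' turns anything unparseable into NaN;
# 'Int64' (capital I) is pandas' nullable integer dtype, so the
# missing row stays missing instead of forcing a float column.
df['new'] = pd.to_numeric(extracted, errors='coerce').astype('Int64')
print(df)
print(df['new'].dtype)
```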
Solutions that split with str.split and select by indexing with str:
df['new'] = df['a'].str.split('/').str[2].str.extract(r'(\d+)', expand=False)
print (df)
a new
0 ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232 74
1 ttt-OPP/MKKL-7/LNFFF-89/OOOP-71/AAD-1/RRR-232 89
df['new'] = df['a'].str.split('/').str[2].str.split('-').str[1]
print (df)
a new
0 ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232 74
1 ttt-OPP/MKKL-7/LNFFF-89/OOOP-71/AAD-1/RRR-232 89
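As a quick sanity check, the approaches from this thread can be run side by side on the sample data to confirm they produce the same values. This is only a sketch; the variable names are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'a': ['ttt-OPP/MKKL-7/LNFFF-74/OOOP-71/AAD-1/RRR-232',
                         'ttt-OPP/MKKL-7/LNFFF-89/OOOP-71/AAD-1/RRR-232']})

# Regex extraction anchored on the LNFFF- prefix.
by_regex = df['a'].str.extract(r'LNFFF-(\d+)', expand=False)

# Vectorized split: third '/'-field, then the part after '-'.
by_split = df['a'].str.split('/').str[2].str.split('-').str[1]

# Plain Python split applied row by row.
by_apply = df['a'].apply(lambda s: s.split('/')[2].split('-')[1])

# Compare plain lists so dtype differences do not matter.
agree = by_regex.tolist() == by_split.tolist() == by_apply.tolist()
print(agree)
```

All of them return strings, so the conversion to int is still needed if you want numbers.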