
Divide column values into sections and store section name in new column pandas

I have a column with multiple product names like

      Contract
0      O.U20
1      O.Z20
2      O.H21
3      O.M21
4      O.U21
5      O.Z21
6      O.H22
7      O.M22
8     S3.U20
9     S3.Z20
10    S6.M26
11    S6.U26
12    S6.Z26
13    S6.H27
14    S9.U26
15    S9.Z26
16    F3.U26
17    F3.Z26
18    F3.H27
19    F6.H26
20    F6.M26
21    F6.U26
22    F9.U20

What I want to do is assign Section name based on Contract name like

   Contract Sections
0     O.U20      O1
1     O.Z20      O1
2     O.H21      O1
3     O.M21      O1
4     O.U21      O2
5     O.Z21      O2
6     O.H22      O2
7     O.M22      O2
8    S3.U20       S3
9    S3.Z20       S3
10   S6.M26       S6
11   S6.U26       S6
12   S6.Z26       S6
13   S6.H27       S6
14   S9.U26       S9
15   S9.Z26       S9
16   F3.U26       F3
17   F3.Z26       F3
18   F3.H27       F3
19   F6.H26       F6
20   F6.M26       F6
21   F6.U26       F6
22   F9.U20       F9

For the S and F series I can get the desired result with the code below (please let me know if there is a better way):

df.loc[df['Contract'].str.contains('S3'),'Sections'] = 'S3'
df.loc[df['Contract'].str.contains('S6'),'Sections'] = 'S6'
df.loc[df['Contract'].str.contains('S9'),'Sections'] = 'S9'
df.loc[df['Contract'].str.contains('F3'),'Sections'] = 'F3'
df.loc[df['Contract'].str.contains('F6'),'Sections'] = 'F6'
df.loc[df['Contract'].str.contains('F9'),'Sections'] = 'F9'

This works because it only matches the string and assigns the section name. Unfortunately, the O series has no number attached to it, so I have to divide it into blocks of 4:

   Contract Sections
0     O.U20      O1
1     O.Z20      O1
2     O.H21      O1
3     O.M21      O1
4     O.U21      O2
5     O.Z21      O2
6     O.H22      O2
7     O.M22      O2

I tried the following code

df.loc[df['Contract'].str.contains('O'),'Sections'] = df.index // 4+1

but it's throwing the error

ValueError: could not broadcast input array from shape (23) into shape (8)

How can I achieve this in a better, more efficient way? Please note that this is just sample data; the original dataset has many more values like these.

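The right-hand side, df.index // 4+1, has one value for every row of the frame (23), while the mask selects only the 8 O rows, hence the broadcast error. Build the block number from a running count of the O rows instead and assign it as a Series, which pandas aligns on the index. Change your code to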

df.loc[df['Contract'].str.contains('O'),'Sections'] = 'O' +((df['Contract'].str.contains('O').cumsum().sub(1)//4) + 1).astype(str)
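The line above can be checked on a small frame. The following is a minimal sketch, not part of the original answer: the shortened sample data and the names `is_o` and `block` are assumptions made for illustration, and `str.startswith("O.")` is swapped in for `str.contains('O')` so that only names beginning with `O.` match.

```python
import pandas as pd

# Shortened sample: eight O contracts followed by two S3 contracts.
df = pd.DataFrame({"Contract": [
    "O.U20", "O.Z20", "O.H21", "O.M21",
    "O.U21", "O.Z21", "O.H22", "O.M22",
    "S3.U20", "S3.Z20",
]})

# Boolean mask of the O rows (assumption: O contracts start with "O.").
is_o = df["Contract"].str.startswith("O.")

# cumsum() over the mask is a running count of O rows (1, 2, 3, ...).
# Subtracting 1 and floor-dividing by 4 groups them into blocks of 4.
block = is_o.cumsum().sub(1) // 4 + 1

# The right-hand side is a Series, so it is aligned on the index
# and only the masked rows are written.
df.loc[is_o, "Sections"] = "O" + block[is_o].astype(str)

print(df)
```

Because the count comes from the mask rather than from `df.index`, the O rows do not have to sit at the top of the frame or be contiguous.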

To simplify the S and F assignments

df.loc[df['Contract'].str.contains('S3'),'Sections'] = 'S3'
df.loc[df['Contract'].str.contains('S6'),'Sections'] = 'S6'
df.loc[df['Contract'].str.contains('S9'),'Sections'] = 'S9'
df.loc[df['Contract'].str.contains('F3'),'Sections'] = 'F3'
df.loc[df['Contract'].str.contains('F6'),'Sections'] = 'F6'
df.loc[df['Contract'].str.contains('F9'),'Sections'] = 'F9'

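just replace them with the single line below. It takes the part of each contract name before the dot, so O rows get a plain 'O'; run it first and then apply the O-series line above to overwrite those rows.

df['Sections'] = df['Contract'].str.split('.').str[0]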
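Both steps can also be combined in one pass. This is a sketch under stated assumptions rather than the answerer's method: it assumes that any prefix without a digit (such as `O`) should be numbered in blocks of 4, and the names `prefix`, `needs_block`, and `block` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Contract": [
    "O.U20", "O.Z20", "O.H21", "O.M21", "O.U21",
    "S3.U20", "S6.M26", "F9.U20",
]})

# Part of each contract name before the dot: "O", "S3", "S6", "F9".
prefix = df["Contract"].str.split(".").str[0]

# Assumption: prefixes without a digit need a block number appended.
needs_block = ~prefix.str.contains(r"\d")

# Position of each row within its prefix group (0-based), in blocks of 4.
block = prefix.groupby(prefix).cumcount() // 4 + 1

# Keep the prefix where it already has a number, else append the block.
df["Sections"] = prefix.where(~needs_block, prefix + block.astype(str))

print(df)
```

This avoids one `df.loc` line per series, which matters when the real dataset has many more prefixes than the sample.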

