
How to slice strings in dataframe based on string length of column in Python?

The problem I want to solve: use len() on one column, then apply each row's character count as the slice length on another column.

I have a dataframe with general ledger codes of varying length, and I need to find the lowest level of detail to prevent double counting. I can find it by comparing the digits of the current row with those of the next row, using the number of characters in the current row. For example, 11.0 and 111.0 are group accounts for 1111-1123. I only want 1111-1123 and want to exclude the group accounts.

I can use len() to get the number of characters for a single row, but I am not able to apply this to the entire column.

My dataframe looks like this:

df3['Next_Account'] = df3['Account'].shift(-1)
df3['Len_account'] = df3['Account'].str.len()-2

    Account    Amount Next_Account  Len_account  
0      11.0   1000.82        111.0            2   
1     111.0   1000.42       1111.0            3      
2    1111.0    791.51       1115.0            4     
3    1115.0   1802.19       1116.0            4      
4    1116.0    202.36       1117.0            4      
5    1117.0   1507.33       1118.0            4      
6    1118.0      0.03       1119.0            4       
7    1119.0      0.00       1120.0            4        
8    1120.0      0.00       1121.0            4        
9    1121.0     24.28       1122.0            4        
10   1122.0    376.87       1123.0            4       
11   1123.0      0.25         12.0            4          
14     12.0  80179.92        121.0            2        
15    121.0  80179.92      12101.0            3        
16  12101.0      0.00      12102.0            5        
      

I tried to calculate this by adding one new column holding the next row's account and another holding the character length of the current row's account.

df3['current_digits_next'] = df3['Next_Account'].str[:df3['Len_account']]
df3

    current_digits_next  
0                   NaN  
1                   NaN  
2                   NaN  
3                   NaN  
4                   NaN  
5                   NaN  
6                   NaN  
7                   NaN  
8                   NaN  
9                   NaN  
10                  NaN  
11                  NaN  
14                  NaN  
15                  NaN  
16                  NaN  
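
For what it's worth, the NaN column seems to come from how Python slicing works: a slice bound has to be a single integer, and a whole Series is not one. A minimal sketch of that behaviour (the `lengths` Series and the sample string are made up for illustration):

```python
import pandas as pd

# Hypothetical per-row lengths, standing in for the Len_account column
lengths = pd.Series([2, 3, 4])

# A plain integer works as a slice bound
first_two = "111.0"[:2]
print(first_two)

# A Series does not: Python raises TypeError for a non-integer slice bound
try:
    "111.0"[:lengths]
    raised = False
except TypeError:
    raised = True
print(raised)
```

So the slice length has to be supplied one row at a time rather than as a whole column.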

I tried to take the first Len_account characters of Next_Account with the .str accessor, but for some reason this does not work.

The preferred output is:

    current_digits_next
0                    11
1                   111
2                  1115
3                  1116
4                  1117
5                  1118
6                  1119
7                  1120
8                  1121
9                  1122
10                 1123
11                 12.0
14                   12
15                  121
16                12102

With the preferred output I can match the data and exclude the group accounts. What am I doing wrong?

The .str accessor accepts an int, not a Series, as a slice index. You can use apply on the rows instead:

df3['current_digits_next'] = df3.apply(lambda row: str(row['Next_Account'])[:row['Len_account']], axis=1)
    Account    Amount Next_Account  Len_account current_digits_next
0      11.0   1000.82        111.0            2                  11
1     111.0   1000.42       1111.0            3                 111
2    1111.0    791.51       1115.0            4                1115
3    1115.0   1802.19       1116.0            4                1116
4    1116.0    202.36       1117.0            4                1117
5    1117.0   1507.33       1118.0            4                1118
6    1118.0      0.03       1119.0            4                1119
7    1119.0      0.00       1120.0            4                1120
8    1120.0      0.00       1121.0            4                1121
9    1121.0     24.28       1122.0            4                1122
10   1122.0    376.87       1123.0            4                1123
11   1123.0      0.25         12.0            4                12.0
12     12.0  80179.92        121.0            2                  12
13    121.0  80179.92      12101.0            3                 121
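
If apply with axis=1 turns out to be slow on a large ledger, one possible alternative is a plain list comprehension over the two columns. This is only a sketch, assuming Account is stored as a string such as '11.0'. The shortened sample frame is illustrative, and the isinstance guard for the last row (which has no next account) is an addition that is not part of the answer above:

```python
import pandas as pd

# Shortened sample taken from the question; Account is assumed to be a string column
df3 = pd.DataFrame(
    {"Account": ["11.0", "111.0", "1111.0", "1115.0", "12.0", "121.0", "12101.0"]}
)
df3["Next_Account"] = df3["Account"].shift(-1)
df3["Len_account"] = df3["Account"].str.len() - 2

# Slice each next account with the length taken from the same row.
# The last row has no next account (missing value), so it is left as None.
df3["current_digits_next"] = [
    nxt[:n] if isinstance(nxt, str) else None
    for nxt, n in zip(df3["Next_Account"], df3["Len_account"])
]
print(df3)
```

Row 3 still yields '12.0', as in the preferred output, because the next account there is shorter than the current one.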

You can convert your Account field to a string and then use apply to check whether each account code is contained in the next one:

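import pandas as pd
# Account codes as digit strings, e.g. 11.0 -> '11', and the code of the following row
s1 = df['Account'].astype(int).astype(str)
s2 = s1.shift(-1)
# True where the current code occurs inside the next code; the last row has no successor, so skip it
s3 = pd.concat([s1, s2], axis=1, ignore_index=True).iloc[:-1].apply(lambda x: x[0] in x[1], axis=1)
# Rows without a result (the last one) become False
df = pd.concat([df, s3], axis=1).fillna(False)
print(df)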
    Account    Amount      0
0      11.0   1000.82   True
1     111.0   1000.42   True
2    1111.0    791.51  False
3    1115.0   1802.19  False
4    1116.0    202.36  False
5    1117.0   1507.33  False
6    1118.0      0.03  False
7    1119.0      0.00  False
8    1120.0      0.00  False
9    1121.0     24.28  False
10   1122.0    376.87  False
11   1123.0      0.25  False
14     12.0  80179.92   True
15    121.0  80179.92   True
16  12101.0      0.00  False
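
Building on the boolean column, the group accounts can then be dropped with a mask. The sketch below is a variant under stated assumptions rather than the answer's exact code: it uses str.startswith instead of `in` so that only a leading match counts, it assumes Account is numeric, and the names `codes`, `is_group` and `leaves` are hypothetical. The rows are a subset of the question's data:

```python
import pandas as pd

# Subset of the question's rows; Account is assumed to be a float column here
df = pd.DataFrame(
    {
        "Account": [11.0, 111.0, 1111.0, 1115.0, 12.0, 121.0, 12101.0],
        "Amount": [1000.82, 1000.42, 791.51, 1802.19, 80179.92, 80179.92, 0.00],
    }
)

# Digit strings for the current row and for the following row
codes = df["Account"].astype(int).astype(str)
next_codes = codes.shift(-1)

# A row is treated as a group account when the next code starts with its digits.
# The last row has no successor (missing value), so it is never flagged.
is_group = pd.Series(
    [isinstance(n, str) and n.startswith(c) for c, n in zip(codes, next_codes)],
    index=df.index,
)

# Keep only the lowest-level accounts
leaves = df[~is_group]
print(leaves)
```

This relies on the ledger being sorted so that each group account is immediately followed by one of its children, which is how the data in the question is laid out.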
