How to slice strings in dataframe based on string length of column in Python?

The problem I want to solve is: use len() on a column, and apply the number of characters for each row to another column.

I have a dataframe with general ledger codes that don't all have the same length, and I need to find the lowest level of detail to prevent double counting. I can find it by comparing the digits of the current row with those of the next row, using the number of characters of the current row. For example, 11.0 and 111.0 are grouped accounts of 1111-1123. I only want 1111-1123 and want to exclude the grouped accounts.

I can use the len function to get the number of characters of the current row, but I am not able to apply this to the entire column.

My dataframe looks like this:

df3['Next_Account'] = df3['Account'].shift(-1)
df3['Len_account'] = df3['Account'].str.len()-2

    Account    Amount Next_Account  Len_account
0      11.0   1000.82        111.0            2   
1     111.0   1000.42       1111.0            3      
2    1111.0    791.51       1115.0            4     
3    1115.0   1802.19       1116.0            4      
4    1116.0    202.36       1117.0            4      
5    1117.0   1507.33       1118.0            4      
6    1118.0      0.03       1119.0            4       
7    1119.0      0.00       1120.0            4        
8    1120.0      0.00       1121.0            4        
9    1121.0     24.28       1122.0            4        
10   1122.0    376.87       1123.0            4       
11   1123.0      0.25         12.0            4          
14     12.0  80179.92        121.0            2        
15    121.0  80179.92      12101.0            3        
16  12101.0      0.00      12102.0            5        
      

I tried calculating this by adding one new column for the next row's account and another for the character length of the current row:

df3['current_digits_next'] = df3['Next_Account'].str[:df3['Len_account']]
df3

    current_digits_next  
0                   NaN  
1                   NaN  
2                   NaN  
3                   NaN  
4                   NaN  
5                   NaN  
6                   NaN  
7                   NaN  
8                   NaN  
9                   NaN  
10                  NaN  
11                  NaN  
14                  NaN  
15                  NaN  
16                  NaN  

I tried getting the number of characters of the next account by using the string function, but this does not work for some reason.

The preferred output is:

    current_digits_next
0                    11
1                   111
2                  1115
3                  1116
4                  1117
5                  1118
6                  1119
7                  1120
8                  1121
9                  1122
10                 1123
11                 12.0
14                   12
15                  121
16                12102

With the preferred output I can match the data and exclude the grouped accounts. What am I doing wrong?

The str accessor accepts an int, not a Series, as the slice index. You can try apply on rows instead:

df3['current_digits_next'] = df3.apply(lambda row: str(row['Next_Account'])[:row['Len_account']], axis=1)
    Account    Amount Next_Account  Len_account current_digits_next
0      11.0   1000.82        111.0            2                  11
1     111.0   1000.42       1111.0            3                 111
2    1111.0    791.51       1115.0            4                1115
3    1115.0   1802.19       1116.0            4                1116
4    1116.0    202.36       1117.0            4                1117
5    1117.0   1507.33       1118.0            4                1118
6    1118.0      0.03       1119.0            4                1119
7    1119.0      0.00       1120.0            4                1120
8    1120.0      0.00       1121.0            4                1121
9    1121.0     24.28       1122.0            4                1122
10   1122.0    376.87       1123.0            4                1123
11   1123.0      0.25         12.0            4                12.0
12     12.0  80179.92        121.0            2                  12
13    121.0  80179.92      12101.0            3                 121
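
The row-wise slice above can also be written without apply by zipping the two columns. The following is a minimal sketch, assuming Account is stored as strings such as "11.0"; the sample values are a hypothetical subset of the question's data, and the isinstance guard is an assumption added so that the missing last row becomes a missing value instead of the text "nan".

```python
import pandas as pd

# Hypothetical sample: a few account codes from the question, stored as strings.
df3 = pd.DataFrame({"Account": ["11.0", "111.0", "1111.0", "1115.0", "1123.0", "12.0"]})
df3["Next_Account"] = df3["Account"].shift(-1)
df3["Len_account"] = df3["Account"].str.len() - 2  # ignore the trailing ".0"

# Slice each next-account string with that row's own length.
# The last row has no next account, so it gets None.
df3["current_digits_next"] = [
    nxt[:n] if isinstance(nxt, str) else None
    for nxt, n in zip(df3["Next_Account"], df3["Len_account"])
]
print(df3)
```

On this sample the new column holds 11, 111, 1115, 1123 and 12.0, followed by a missing value for the last row.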

You can convert your Account field to a string and then use apply to check for the required condition:

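import pandas as pd

# Account codes as digit strings, e.g. 11.0 -> "11"
s1 = df['Account'].astype(int).astype(str)
# The same codes shifted up by one row, i.e. the next row's code
s2 = df['Account'].astype(int).astype(str).shift(-1)
# True where the current code appears inside the next row's code (a grouped account)
s3 = pd.concat([s1, s2], axis=1, ignore_index=True).loc[:len(s1), :].apply(lambda x: x[0] in x[1], axis=1)
# Attach the flag to the dataframe; rows without a result become False
df = pd.concat([df, s3], axis=1).fillna(False)
print(df)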
    Account    Amount      0
0      11.0   1000.82   True
1     111.0   1000.42   True
2    1111.0    791.51  False
3    1115.0   1802.19  False
4    1116.0    202.36  False
5    1117.0   1507.33  False
6    1118.0      0.03  False
7    1119.0      0.00  False
8    1120.0      0.00  False
9    1121.0     24.28  False
10   1122.0    376.87  False
11   1123.0      0.25  False
14     12.0  80179.92   True
15    121.0  80179.92   True
16  12101.0      0.00  False
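
Note that the in test above matches the current code anywhere inside the next code, not only at the start. A prefix test is a possible stricter variant. The sketch below is one way to do that under stated assumptions: Account is stored as strings ending in ".0", the rows are sorted so that a grouped account sits directly above its first child, and the names is_group and leaf_accounts as well as the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample with a subset of the question's accounts and amounts.
df = pd.DataFrame({
    "Account": ["11.0", "111.0", "1111.0", "1115.0", "1123.0", "12.0", "121.0", "12101.0"],
    "Amount": [1000.82, 1000.42, 791.51, 1802.19, 0.25, 80179.92, 80179.92, 0.00],
})

# Strip the trailing ".0" so only the digits of each code remain.
code = df["Account"].str.replace(r"\.0$", "", regex=True)
next_code = code.shift(-1)

# A row is a grouped account when the next code starts with the current code.
# The last row has no next code, so it is never flagged.
df["is_group"] = [
    isinstance(nxt, str) and nxt.startswith(cur)
    for cur, nxt in zip(code, next_code)
]

# Keep only the lowest level of detail.
leaf_accounts = df.loc[~df["is_group"]]
print(leaf_accounts)
```

On this sample the rows 11.0, 111.0, 12.0 and 121.0 are flagged as grouped accounts, and leaf_accounts keeps 1111.0, 1115.0, 1123.0 and 12101.0.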
