
Python: check whether DataFrame columns contain strings longer than a specific length

I need to create a function that checks the length of the strings in DataFrame columns.

I have this code:

from datetime import datetime

df['XXX'] = df['XXX'].map(lambda x: x if isinstance(x, datetime) else None)
# maximum length in bytes (UTF-8) of the values in column 'XXX'
df_col_len = int(df['XXX'].str.encode(encoding='utf-8').str.len().max())
if df_col_len > 4:
    print("In this step it will send an email")

The problem is that I have about 20 columns, and each column has a different maximum length.

I need to check whether the 1st column has a max length < 4, the 3rd column a max length < 50, the 7th column a max length < 47, etc. Then, if a column does not meet its condition, I need to report which column it is.

Is there a way to check all the necessary columns at once?

Thanks

You can use .lt (less than) on DataFrames:

Sample data:

import pandas as pd
import numpy as np

d1 = {'A': {0: 'a', 1: 'ab', 2: 'abc'}, 'B': {0: 'abcd', 1: 'abcde', 2: 'abcdef'}, 'C': {0: 'abcdefg', 1: 'abcdefge', 2: 'abcdefgeh'}}
df = pd.DataFrame(d1)

Code:

max_len = {'A': 2, 'B': 5, 'C': 10}

# length of each element in your DataFrame
# (in pandas >= 2.1, applymap is deprecated in favor of DataFrame.map)
df_check = df.applymap(len)
# create an auxiliary DataFrame holding each column's maximum, repeated for every row
# (max_len must list the columns in the same order as df.columns)
df_max = pd.DataFrame(np.repeat(pd.DataFrame(max_len, index=[1]).values, len(df), axis=0), columns=df.columns)

# check whether the length of each value is *less than* its column's maximum
df_check.lt(df_max)

The input df looks like:

     A       B          C
0    a    abcd    abcdefg
1   ab   abcde   abcdefge
2  abc  abcdef  abcdefgeh


The output of df_check.lt(df_max) looks like:

       A      B     C
0   True   True  True
1  False  False  True
2  False  False  True
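A possibly simpler variant, sketched here as an alternative rather than as part of the original answer: DataFrame.lt aligns a Series argument with the DataFrame's column labels, so the auxiliary df_max can be replaced by pd.Series(max_len). This sketch assumes max_len has a key for every column of df, and it computes the lengths with Series.str.len(), which also sidesteps the applymap deprecation.

```python
import pandas as pd

d1 = {'A': {0: 'a', 1: 'ab', 2: 'abc'},
      'B': {0: 'abcd', 1: 'abcde', 2: 'abcdef'},
      'C': {0: 'abcdefg', 1: 'abcdefge', 2: 'abcdefgeh'}}
df = pd.DataFrame(d1)
max_len = {'A': 2, 'B': 5, 'C': 10}

# Character length of every cell, computed column by column
df_check = df.apply(lambda col: col.str.len())

# A Series passed to DataFrame.lt is matched against the columns by label,
# so each column is compared with its own limit from max_len
result = df_check.lt(pd.Series(max_len))
print(result)
```

Because the comparison aligns by label instead of by position, the key order in max_len no longer matters. A column that is missing from max_len should come out as all False, so it is worth keeping the keys in sync with df.columns.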

Additional notes:

To then find the names of the columns that fail the check, you can look into this question.
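To list the failing columns directly, here is a minimal sketch of the kind of function the question asks for. find_too_long_columns is a hypothetical name. The sketch assumes every key of max_len is a column of df that holds strings, treats each limit as exclusive (as in "max length < 4"), counts characters rather than UTF-8 bytes as the question's snippet does, and ignores missing values.

```python
import pandas as pd


def find_too_long_columns(df, max_len):
    """Hypothetical helper: return the names of the columns whose longest
    string is NOT shorter than the exclusive limit given in max_len."""
    cols = list(max_len)
    # Longest string (in characters) per column; max() skips missing values
    longest = df[cols].apply(lambda col: col.str.len()).max()
    # True where the longest value reaches or exceeds the column's limit
    too_long = longest.ge(pd.Series(max_len))
    return too_long[too_long].index.tolist()


d1 = {'A': {0: 'a', 1: 'ab', 2: 'abc'},
      'B': {0: 'abcd', 1: 'abcde', 2: 'abcdef'},
      'C': {0: 'abcdefg', 1: 'abcdefge', 2: 'abcdefgeh'}}
df = pd.DataFrame(d1)
max_len = {'A': 2, 'B': 5, 'C': 10}

failing = find_too_long_columns(df, max_len)  # -> ['A', 'B']
if failing:
    # this is where the question's "send an email" step would go
    print("Columns over their limit:", failing)
```

Since the question identifies the columns by position, the mapping could also be built from df.columns, e.g. {df.columns[0]: 4, df.columns[2]: 50, df.columns[6]: 47}.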

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
