df:
   company_name   product      Id  rating
0        matrix    mobile   Id456     2.5
1       ins-faq    alpha1   Id956     3.5
2       metric5  sounds-B  Id-356     2.5
3        ingsaf   digital   Id856   4star
4        matrix    win11p   Idklm     2.0
5          4567    mobile     596     3.5
df2:
       Col_name       Datatype
0  company_name         String  # pure string
1       Product         String  # pure string
2            Id  Alpha-Numeric  # must contain at least 1 number and 1 letter
3        rating   Float or int
df is the main dataframe, and df2 holds the expected datatype of each of its columns.
How can I check the values in every column and extract those that have the wrong datatype?
Expected output:
   row_num      col_name  current_value  expected_dtype
0        2  company_name        metric5          string
1        5  company_name           4567          string
2        1       Product         alpha1          string
3        4       Product         win11p          string
4        4            Id          Idklm   Alpha-Numeric
5        5            Id            596   Alpha-Numeric
6        3        rating          4star    Float or int
For columns that cannot contain numbers, you can find the exceptions with:
In [5]: df['product'].str.contains(r'[0-9]')
Out[5]:
0 False
1 True
2 False
3 False
4 True
5 False
Name: product, dtype: bool
For Alpha-Numeric columns, identify the compliant values with:
In [7]: df['Id'].str.contains(r'(?:\d\D)|(?:\D\d)')
Out[7]:
0 True
1 True
2 True
3 True
4 False
5 False
Name: Id, dtype: bool
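Note that the pattern above only requires a digit next to a non-digit, so a value with digits and punctuation but no letters would also pass. A stricter variant could use two lookaheads, one per required character class. This is only a sketch: it assumes "alphabet" means an ASCII letter, and the value "12-34" is a hypothetical Id added here to show the difference.

```python
import pandas as pd

# Ids from the question, plus one hypothetical value ("12-34") that has
# digits and punctuation but no letters.
ids = pd.Series(
    ["Id456", "Id956", "Id-356", "Id856", "Idklm", "596", "12-34"], name="Id"
)

# Pattern from the answer: a digit directly next to a non-digit.
loose_ok = ids.str.contains(r"(?:\d\D)|(?:\D\d)")

# Assumption: "alphabet" means an ASCII letter. Each lookahead demands
# one kind of character somewhere in the string.
strict_ok = ids.str.contains(r"(?=.*[A-Za-z])(?=.*\d)")

print(pd.DataFrame({"Id": ids, "loose": loose_ok, "strict": strict_ok}))
```

Both patterns agree on the six Ids from the question; they differ only on the added "12-34", which the adjacency pattern accepts and the lookahead pattern rejects.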
For int or float columns (stored as strings here), find the exceptions with:
In [8]: df['rating'].str.contains(r'[^0-9.+-]')
Out[8]:
0 False
1 False
2 False
3 True
4 False
5 False
That may be problematic: it won't catch values with multiple or misplaced plus, minus, or dot characters, like 9.4.1 or 6+3.-12. Instead, you could use:
In [11]: def check(thing):
    ...:     try:
    ...:         float(thing)  # raises if thing is not a valid number
    ...:         return True   # also True for 0 and "0"
    ...:     except (TypeError, ValueError):
    ...:         return False
    ...:
In [12]: df['rating'].apply(check)
Out[12]:
0 True
1 True
2 True
3 False
4 True
5 True
Name: rating, dtype: bool
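To get from these boolean masks to the table asked for in the question, the checks can be collected per column and the failing rows gathered into a new dataframe. The following is a minimal sketch, assuming every column is stored as strings; the `checks` mapping and the `is_number` helper are illustrative names, not part of pandas.

```python
import pandas as pd

# Sample data from the question; all values are kept as strings so the
# regex checks work on every column.
df = pd.DataFrame({
    "company_name": ["matrix", "ins-faq", "metric5", "ingsaf", "matrix", "4567"],
    "product": ["mobile", "alpha1", "sounds-B", "digital", "win11p", "mobile"],
    "Id": ["Id456", "Id956", "Id-356", "Id856", "Idklm", "596"],
    "rating": ["2.5", "3.5", "2.5", "4star", "2.0", "3.5"],
})

def is_number(value):
    """Return True if float() can parse the value."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

# Hypothetical mapping: column name -> (expected dtype label, validity mask).
checks = {
    "company_name": ("String", ~df["company_name"].str.contains(r"[0-9]")),
    "product": ("String", ~df["product"].str.contains(r"[0-9]")),
    "Id": ("Alpha-Numeric", df["Id"].str.contains(r"(?:\d\D)|(?:\D\d)")),
    "rating": ("Float or int", df["rating"].apply(is_number).astype(bool)),
}

# Collect one record per value that fails its column's check.
rows = []
for col, (expected, valid) in checks.items():
    for row_num, value in df.loc[~valid, col].items():
        rows.append({
            "row_num": row_num,
            "col_name": col,
            "current_value": value,
            "expected_dtype": expected,
        })

report = pd.DataFrame(rows)
print(report)
```

In practice the expected dtype labels would be read from df2 rather than hard-coded, with each label mapped to its check function.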