Verifying the data format in columns for a pandas dataframe
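I have a dataframe of strings that represent numbers (integers and floats).
I want to implement validation to ensure that the strings in certain columns represent only integers.
Here is a dataframe with two columns, with the headers 'str as ints' and 'str as double', that hold integers and floats in string format.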
# Import pandas library
import pandas as pd
# initialize list elements
data = ['10','20','30','40','50','60']
# Create the pandas DataFrame with the column name provided explicitly
df = pd.DataFrame(data, columns=['str as ints'])
df['str as double'] = ['10.0', '20.0', '30.0', '40.0', '50.0', '60.0']
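Here is the function I wrote. It checks for a radix point (a '.') in the string to determine whether the string represents an integer or a float.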
def includes_dot(s):
    return '.' in s
I want to know whether I can use the apply function on this dataframe, or whether I need to write another function that takes the dataframe and a list of column headers and then calls includes_dot, like this:
def check_df(df, lst):
    for val in lst:
        apply(df[val]...?)
    # then print out the results if certain columns fail the check
Or perhaps there is a better way to solve this problem altogether.
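The expected output is a list of the column headers that fail the check: if I have the list ['str as ints', 'str as double'], then 'str as double' should be printed, because that column does not contain only integers.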
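One possible way to fill in the check_df outline above is to call Series.apply with includes_dot on each listed column and collect the names of the columns where any value matches. This is only a sketch: returning a list instead of printing inside the function, and the variable name failed_columns, are assumptions that are not part of the original question.

```python
import pandas as pd


def includes_dot(s):
    """Return True if the string contains a '.' character."""
    return '.' in s


def check_df(df, lst):
    """Return the column names from lst that have at least one value containing '.'."""
    failed = []
    for col in lst:
        # Series.apply calls includes_dot once per element and returns a boolean Series
        if df[col].apply(includes_dot).any():
            failed.append(col)
    return failed


df = pd.DataFrame({
    'str as ints': ['10', '20', '30', '40', '50', '60'],
    'str as double': ['10.0', '20.0', '30.0', '40.0', '50.0', '60.0'],
})

failed_columns = check_df(df, ['str as ints', 'str as double'])
for col in failed_columns:
    print(col)
```

With the sample data from the question, this should print only str as double.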
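for col in df:
    # str.contains treats the pattern as a regex by default, so the dot is escaped;
    # the raw string avoids Python's invalid escape sequence warning for '\.'
    if df[col].str.contains(r'\.').any():
        print(col, "contains a '.'")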
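The loop above flags only strings that contain a '.', so values such as '2e3' or 'abc' would still pass. If the goal is to confirm that every string is an integer literal, a stricter variant could use Series.str.fullmatch with a regular expression. The sketch below is an assumption about what "integer" should mean here (an optional sign followed by digits only). The helper name non_integer_columns and the extra 'mixed' column are hypothetical and are not from the original post.

```python
import pandas as pd

# Assumed definition of an integer string: optional sign, then one or more digits
INTEGER_PATTERN = r'[+-]?\d+'


def non_integer_columns(df, columns):
    """Return the columns in which at least one value is not an integer string."""
    return [
        col for col in columns
        # na=False makes missing values count as failures instead of propagating NaN
        if not df[col].str.fullmatch(INTEGER_PATTERN, na=False).all()
    ]


df = pd.DataFrame({
    'str as ints': ['10', '20', '30', '40', '50', '60'],
    'str as double': ['10.0', '20.0', '30.0', '40.0', '50.0', '60.0'],
    'mixed': ['1', '2e3', 'abc', '4', '5', '6'],  # hypothetical column for illustration
})

result = non_integer_columns(df, ['str as ints', 'str as double', 'mixed'])
print(result)
```

Series.str.fullmatch requires pandas 1.1 or later. On older versions, str.match with a pattern anchored by $ would be one substitute.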