Validating the data format in columns of a pandas dataframe

Question

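I have a dataframe of strings that represent numbers (integers and floats).

I want to implement validation to make sure the strings in certain columns represent only integers.

Here is a dataframe with two columns, headed 'str as ints' and 'str as double', which hold integers and floats as strings.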

# Import pandas library
import pandas as pd

# initialize list elements
data = ['10','20','30','40','50','60']

# Create the pandas DataFrame with the column name provided explicitly
df = pd.DataFrame(data, columns=['str as ints'])
df['str as double'] = ['10.0', '20.0', '30.0', '40.0', '50.0', '60.0']

Here is the function I wrote. It checks for a decimal point in the string to decide whether it represents an integer or a float.

def includes_dot(s):
    return '.' in s

I want to see if I can use the apply function on this dataframe, or do I need to write another function where I pass in the name of the dataframe and the list of column headers and then call includes_dot like this:

def check_df(df, lst):
    for val in lst:
        apply(df[val]...?)
    # then print out the results if certain columns fail the check

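Or is there a better way to solve this problem altogether?

The expected output is a list of the column headers that fail the check: if I pass the list ['str as ints', 'str as double'], it should print str as double, because not every value in that column is an integer.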

Answer 1

for col in df:
    # Raw string so that \. reaches the regex engine as a literal dot
    if df[col].str.contains(r'\.').any():
        print(col, "contains a '.'")
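
If you want the list of failing headers that the question asks for, rather than a printed message per column, the same idea can be wrapped in a function. This is only a sketch: the name non_integer_columns and the pattern [+-]?\d+ are my own assumptions about what should count as an integer string, and it uses Series.str.fullmatch, which needs pandas 1.1 or later.

```python
import pandas as pd

def non_integer_columns(df, columns):
    """Return the names of the given columns that hold any non-integer string.

    Hypothetical helper: a value counts as an integer only if the whole
    string is an optional sign followed by digits.
    """
    failed = []
    for col in columns:
        # fullmatch must match the entire string; missing values count as failures
        is_int = df[col].str.fullmatch(r"[+-]?\d+", na=False)
        if not is_int.all():
            failed.append(col)
    return failed

df = pd.DataFrame({
    "str as ints": ["10", "20", "30", "40", "50", "60"],
    "str as double": ["10.0", "20.0", "30.0", "40.0", "50.0", "60.0"],
})
print(non_integer_columns(df, ["str as ints", "str as double"]))
# ['str as double']
```

Matching the whole string is stricter than looking for a dot: it also rejects values such as '1e3' or 'abc', which contain no '.' but are not integers either.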

Solution 1
1 accepted 2022-08-04 15:31:38
