How to count the number of columns in a row using Pandas (Python)

Question

I am writing a program in which I want to count the number of columns in each row, because each file has a different number of columns. In other words, I want to check whether any row is missing a cell and, if it is, highlight the cell number. I am using pandas to read the files. I have multiple gzip files, each of which contains a CSV file. My code for reading the files:

import pandas as pd

# running this inside a loop over the gzip files
data = pd.read_csv(files,
    compression='gzip',
    on_bad_lines='warn',
    low_memory=False,
    sep=r'|',
    header=None,
    na_values=['NULL', ' ', 'NaN'],
    keep_default_na=False
    )

I checked Stack Overflow but found no answer for this situation. I would be really glad if someone could help me out here.
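One caveat worth knowing: in `pd.read_csv`, `on_bad_lines` only applies to lines with too many fields; lines with too few fields are padded with NaN, so once the data is in a DataFrame you can no longer tell a genuinely short row from an empty cell. A minimal sketch of counting the fields on each raw line before pandas parses it, using only the standard library (`gzip`, `csv`). The helper names `field_counts` and `short_rows` and the sample data are hypothetical, and it assumes the widest line has the expected number of columns.

```python
import csv
import gzip
import io


def field_counts(source, sep="|"):
    """Return the number of fields on each line of a gzip-compressed,
    delimiter-separated file (a path or a binary file object)."""
    # gzip.open accepts a filename or an existing binary file object;
    # "rt" decompresses and decodes to text so csv.reader can split it
    with gzip.open(source, "rt", newline="") as fh:
        return [len(row) for row in csv.reader(fh, delimiter=sep)]


def short_rows(counts):
    """Return (line_number, field_count) for every line with fewer fields
    than the widest line (assumed here to be the expected width)."""
    expected = max(counts, default=0)
    return [(n, c) for n, c in enumerate(counts, start=1) if c < expected]


# Hypothetical sample: the third line is missing a field
raw = b"1|Adam|x\n2|George|y\n3|NULL\n"
counts = field_counts(io.BytesIO(gzip.compress(raw)))
print(counts)              # [3, 3, 2]
print(short_rows(counts))  # [(3, 2)]
```

Inside the loop, `field_counts(path)` could be called on each gzip path before (or instead of) `pd.read_csv`.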

Answer 1

I'm not sure I'm interpreting this right, but if you want to count the number of columns in each pandas DataFrame within a loop, there are several options:

1) data.shape[1]
2) len(data.columns)
3) len(list(data))
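A quick check that the three options return the same value, using a small made-up DataFrame in place of the real data:

```python
import pandas as pd

# Hypothetical small DataFrame just to compare the three options
data = pd.DataFrame({"Name": ["Anne", "Bob"], "Age": [22, 20], "Marks": [90, 84]})

n_shape = data.shape[1]        # shape is (rows, columns)
n_columns = len(data.columns)  # length of the column Index
n_list = len(list(data))       # iterating a DataFrame yields its column labels

print(n_shape, n_columns, n_list)  # 3 3 3
```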

Here is a minimal reproducible example. To use it with your files, replace the "data = pd.DataFrame(...)" line with "data = pd.read_csv(...)".

# Import Required Libraries
import pandas as pd
import numpy as np

# Create dictionaries for the dataframe
dict1 = {'Name': ['Anne', 'Bob', 'Carl'], 
         'Age': [22, 20, 22], 
         'Marks': [90, 84, 82]}

dict2 = {'Name': ['Dan', 'Ely', 'Fan'], 
         'Age': [52, 30, 12], 
         'Marks': [40, 54, 42]}

for n, d in enumerate([dict1, dict2], start=1):
    # Read data (use the loop variable, not dict1 on every iteration)
    data = pd.DataFrame(d)

    # Get columns
    shape = data.shape  # (3, 3)
    col = shape[1]      # 3

    # Print number of columns
    print(f'Number of columns for file {n}: {col}')
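To go from the dictionaries to actual files, here is a sketch of the same loop over gzip-compressed, pipe-separated files, reusing the `read_csv` options from the question. It writes two hypothetical sample files to a temporary directory so that it runs on its own; in practice `paths` would be your own list of `.gz` files (for example from `glob.glob`). `columns_per_file` is a hypothetical helper name.

```python
import gzip
import os
import tempfile

import pandas as pd


def columns_per_file(paths):
    """Map each file name to the number of columns pandas parsed from it."""
    result = {}
    for path in paths:
        data = pd.read_csv(path,
                           compression="gzip",
                           sep="|",
                           header=None,
                           na_values=["NULL", " ", "NaN"],
                           keep_default_na=False)
        result[os.path.basename(path)] = data.shape[1]
    return result


# Hypothetical sample files, written to a temporary directory
samples = {"a.csv.gz": "1|Adam\n2|George\n",
           "b.csv.gz": "1|Anne|22\n2|Bob|20\n"}

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in samples.items():
        path = os.path.join(tmp, name)
        with gzip.open(path, "wt") as fh:  # write gzip-compressed text
            fh.write(text)
        paths.append(path)
    counts = columns_per_file(paths)

print(counts)  # {'a.csv.gz': 2, 'b.csv.gz': 3}
```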

“This works fine, but after trying your suggestion I get the total number of columns in the data frame. I want to print the number of columns each row contains. For example:
S.no Name
1 Adam
2 George
3 NULL
The first row should print 2, the second 2, and the third 1.”
– Ramoxx

Below is an updated answer for your requirement.

Get the count of non-null values in each row:

data.apply(lambda x: x.count(), axis=1)

data:

  A   B   C
0:  1   2   3
1:  2   nan nan
2:  nan nan nan

output:

0:  3
1:  1
2:  0
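Applied to the example from the comment (S.no / Name, with NULL in the third row), here is a sketch that assumes the same `na_values` and `keep_default_na=False` settings as the question, so that `NULL` is read as NaN. It also shows `DataFrame.count(axis=1)`, pandas' built-in per-row non-null count, which gives the same result as the `apply` version without calling a Python lambda for every row. The in-memory text and the column names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical file content mirroring the comment's example
text = "1|Adam\n2|George\n3|NULL\n"

data = pd.read_csv(io.StringIO(text),
                   sep="|",
                   header=None,
                   names=["S.no", "Name"],
                   na_values=["NULL", " ", "NaN"],
                   keep_default_na=False)

via_apply = data.apply(lambda x: x.count(), axis=1)  # the approach in this answer
via_count = data.count(axis=1)                       # built-in equivalent

print(via_count.tolist())  # [2, 2, 1]
```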

Add the per-row count of non-null values to the dataframe as a new column:

data['count'] = data.apply(lambda x: x.count(), axis=1)

result:

    A   B   C   count
0:  1   2   3   3
1:  2   nan nan 1
2:  nan nan nan 0
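Since the original goal was to point at the missing cell, here is a sketch that lists the (row, column) position of every NaN and selects the rows that have at least one gap. It rebuilds the A/B/C example with `np.nan` and works on the data columns only, before any `count` column is added; `missing_cells` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def missing_cells(df):
    """Return (row_label, column_label) for every missing (NaN) cell."""
    positions = np.argwhere(df.isna().to_numpy())  # (row, col) pairs, row-major order
    rows = df.index[positions[:, 0]].tolist()
    cols = df.columns[positions[:, 1]].tolist()
    return list(zip(rows, cols))


# The A/B/C example from above
data = pd.DataFrame({"A": [1, 2, np.nan],
                     "B": [2, np.nan, np.nan],
                     "C": [3, np.nan, np.nan]})

cells = missing_cells(data)
incomplete = data[data.count(axis=1) < data.shape[1]]  # rows with at least one gap

print(cells)                      # [(1, 'B'), (1, 'C'), (2, 'A'), (2, 'B'), (2, 'C')]
print(incomplete.index.tolist())  # [1, 2]
```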


Solution 1
1 · Accepted · 2022-05-30 19:06:29
