How to count the number of columns in a row using Pandas Python

I am writing a program in which I want to count the number of columns in each row, because each file has a different number of columns. In other words, I want to check whether any row is missing a cell and, if so, highlight the cell number. I am using pandas to read the files. I have multiple gzip files, each of which contains a CSV file. My code for reading the files:

# Running this inside a loop
data = pd.read_csv(files,
    compression='gzip',
    on_bad_lines='warn',
    low_memory=False,
    sep=r'|',
    header=None,
    na_values=['NULL', ' ', 'NaN'],
    keep_default_na=False
    )

I checked Stack Overflow but found no answer that covers this situation. I would be really glad if someone could help me out here.

I'm not sure I'm interpreting this correctly, but if you want to count the number of columns in each pandas DataFrame inside a loop, there are plenty of options.

1) data.shape[1]
2) len(data.columns)
3) len(list(data))
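
The three options above should all return the same number. A minimal sketch, assuming a small hand-built frame (the sample values are illustrative and stand in for one loaded file):

```python
import pandas as pd

# Hypothetical sample frame standing in for one loaded file
data = pd.DataFrame({'Name': ['Anne', 'Bob', 'Carl'],
                     'Age': [22, 20, 22],
                     'Marks': [90, 84, 82]})

n_shape = data.shape[1]        # second element of the (rows, columns) tuple
n_columns = len(data.columns)  # length of the column Index
n_list = len(list(data))       # iterating a DataFrame yields its column labels

print(n_shape, n_columns, n_list)
```

All three describe the frame as a whole, so they do not vary from row to row.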

Here is a minimal reproducible example. Replace the pd.DataFrame(...) call with pd.read_csv(...) to read your own files.

# Import Required Libraries
import pandas as pd
import numpy as np

# Create dictionaries for the dataframe
dict1 = {'Name': ['Anne', 'Bob', 'Carl'], 
         'Age': [22, 20, 22], 
         'Marks': [90, 84, 82]}

dict2 = {'Name': ['Dan', 'Ely', 'Fan'], 
         'Age': [52, 30, 12], 
         'Marks': [40, 54, 42]}

for i in [dict1, dict2]:
    # Read data
    data = pd.DataFrame(i)

    # Get columns
    shape = data.shape # (3,3)
    col = shape[1] # 3
  
    # Printing Number of columns
    print(f'Number of columns for file <>: {col}')

"This works fine, but after trying your suggestion I am getting the total number of columns in the data frame. I want to print the number of columns each row contains. For example, given the columns S.no and Name with the rows (1, Adam), (2, George) and (3, NULL), the first row should print 2, the second 2, and the third 1."
– Ramoxx

Below is the updated answer for your specification.

Get counts of non-nulls for each row

data.apply(lambda x: x.count(), axis=1)

data:

    A   B   C
0:  1   2   3
1:  2   nan nan
2:  nan nan nan

output:

0:  3
1:  1
2:  0
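
As a possible alternative, DataFrame.count accepts an axis argument, so the same per-row count can be obtained without apply. A minimal sketch, rebuilding the sample data above by hand (the variable names row_counts and same_as_apply are illustrative):

```python
import numpy as np
import pandas as pd

# Sample data from the answer, rebuilt by hand
data = pd.DataFrame({'A': [1, 2, np.nan],
                     'B': [2, np.nan, np.nan],
                     'C': [3, np.nan, np.nan]})

# Count the non-null cells across the columns of each row
row_counts = data.count(axis=1)

# Compare against the apply-based version from the answer
apply_counts = data.apply(lambda x: x.count(), axis=1)
same_as_apply = row_counts.tolist() == apply_counts.tolist()

print(row_counts.tolist(), same_as_apply)
```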

Add counts of non-nulls for each row into dataframe

data['count'] = data.apply(lambda x: x.count(), axis=1)

result:

    A   B   C   count
0:  1   2   3   3
1:  2   nan nan 1
2:  nan nan nan 0
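
The question also asks which cell is missing, not only how many cells are present. One way this might be done is with isna, which marks null cells. This is a sketch on the same sample data, not the original answerer's method, and the names rows_with_gaps and missing_by_row are illustrative:

```python
import numpy as np
import pandas as pd

# Sample data from the answer, rebuilt by hand
data = pd.DataFrame({'A': [1, 2, np.nan],
                     'B': [2, np.nan, np.nan],
                     'C': [3, np.nan, np.nan]})

# Row labels of every row that has at least one null cell
rows_with_gaps = data.index[data.isna().any(axis=1)].tolist()

# For each row, the column labels whose cell is null
missing_by_row = {
    idx: [col for col in data.columns if pd.isna(data.at[idx, col])]
    for idx in data.index
}

print(rows_with_gaps)
print(missing_by_row)
```

With header=None, as in the question's read_csv call, the column labels are integer positions, so the lists would hold cell numbers rather than names.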
