
pandas count number of filled cells within row

I have a large dataset with columns labelled 1 - 65 (among other titled columns), and want to find how many of those columns, per row, have a string (of any value) in them. For example, if all of columns 1 - 65 are filled in a particular row, the count for that row should be 65; if only 10 are filled, the count should be 10.

Is there any easy way to do this? I'm currently using the following code, which takes a very long time because there are a large number of rows:

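import pandas as pd

array = pd.read_csv(csvlocation, encoding="ISO-8859-1")
array["count"] = 0

for i in range(len(array)):
    for k in range(1, 66):
        value = array.at[i, str(k)]  # read_csv labels header columns with strings
        if pd.notna(value) and value != "":  # empty CSV fields are read as NaN
            array.at[i, "count"] += 1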

From my understanding of the post and the subsequent comments, you want to know the number of strings in each row across the columns labelled 1 through 65. There are two steps: first subset your data down to columns 1 through 65, then count the number of strings in each row. To do this:

import pandas as pd
import numpy as np

# create sample data
df = pd.DataFrame({'col1': list('abdecde'),
                   'col2': np.random.rand(7)})

# change one value of column two to a string for illustration purposes;
# cast the float column to object first so it can hold a string
df['col2'] = df['col2'].astype(object)
df.loc[3, 'col2'] = 'b'

# to create the subset of columns, you could use
# subset = [str(num) for num in range(1, 66)]
# and then just use df[subset]

# for each row, count the number of columns that have a string value.
# DataFrame.map (named applymap before pandas 2.1) operates elementwise, so we
# build a new frame where 1 means the cell held a string and 0 means it did not,
# then sum along each row to get the final counts
col_str_counts = df.map(lambda x: 1 if isinstance(x, str) else 0).sum(axis=1)

# we changed the column two value above, so the count for row 3 should be 2:
print(col_str_counts[3])  # 2

# and for the subset, it would simply become:
# col_str_counts = df[subset].map(lambda x: 1 if isinstance(x, str) else 0).sum(axis=1)

You should be able to adapt your problem to this example.
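A minimal sketch of the subset step from the comments above, applied to a frame shaped like the question's data. It assumes the numbered columns are labelled with the strings "1" through "65" (what read_csv produces from a header row) and that titled columns, like the hypothetical name column here, must not be counted. The rows are made-up illustrative data. Series.map is used per column so the sketch runs on pandas versions with or without DataFrame.map.

```python
import pandas as pd

# Hypothetical frame: numbered columns "1".."65" plus one titled column.
numbered = [str(num) for num in range(1, 66)]
rows = [
    ["x"] * 65,                # all 65 numbered cells filled
    ["y"] * 10 + [None] * 55,  # only the first 10 filled
    [None] * 65,               # nothing filled
]
df = pd.DataFrame(rows, columns=numbered)
df["name"] = ["row a", "row b", "row c"]  # a string column that must be excluded

# Step 1: subset down to the numbered columns only.
subset = df[numbered]

# Step 2: mark string cells (True/False) column by column, then sum each row.
col_str_counts = subset.apply(lambda col: col.map(lambda x: isinstance(x, str))).sum(axis=1)
print(col_str_counts.tolist())  # [65, 10, 0]
```

Counting over the whole frame instead of the subset would also pick up the name column and give 66, 11 and 1.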

Say we have this dataframe:

import pandas as pd

df = pd.DataFrame([["","foo","bar"],["","","bar"],["","",""],["foo","bar","bar"]])

     0    1    2
0       foo  bar
1            bar
2               
3  foo  bar  bar

Then we create a boolean mask that is True wherever a cell != "", and sum those values along each row (sum(1) sums over axis 1, i.e. across the columns):

df['count'] = (df != "").sum(1)
print(df)

     0    1    2  count
0       foo  bar      2
1            bar      1
2                     0
3  foo  bar  bar      3
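One caveat when applying this to the question's data: pd.read_csv reads empty fields as NaN by default rather than "", and NaN != "" is True, so the mask above would count those cells as filled. A minimal sketch that treats both NaN and empty strings as unfilled, using a small made-up CSV (read from memory via io.StringIO) in place of the real file:

```python
import io

import pandas as pd

# Made-up CSV standing in for the question's file: a titled column plus columns 1-3.
csv_text = "name,1,2,3\na,,foo,bar\nb,,,bar\nc,,,\nd,foo,bar,bar\n"
df = pd.read_csv(io.StringIO(csv_text))

numbered = ["1", "2", "3"]  # read_csv keeps header labels as strings
cells = df[numbered]

# A cell counts as filled only if it is not NaN and not an empty string.
filled = cells.notna() & (cells != "")
df["count"] = filled.sum(axis=1)
print(df)
```

For the real data, numbered would be the 65 labels "1" through "65".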

import pandas as pd

df = pd.DataFrame([["","foo","bar"],["","","bar"],["","",""],["foo","bar","bar"]])
total_cells = df.size  # 12 cells, taken before the count column is added
df['filled_cell_count'] = (df != "").sum(1)
print(df)
     0    1    2  filled_cell_count
0       foo  bar                  2
1            bar                  1
2                                 0
3  foo  bar  bar                  3

# filled cells divided by all cells is a fraction, not a count
fraction_filled_cells = df['filled_cell_count'].sum() / total_cells
print(f"Fraction of filled cells in dataframe: {fraction_filled_cells}")
Fraction of filled cells in dataframe: 0.5
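Because the mask is boolean, the same fraction can also be read off as the mean of the mask over all cells, without adding a count column first. A small sketch on the same example frame (6 of the 12 cells are non-empty):

```python
import pandas as pd

df = pd.DataFrame([["", "foo", "bar"], ["", "", "bar"], ["", "", ""], ["foo", "bar", "bar"]])

# True where a cell holds a non-empty string; the mean of a boolean array is
# the share of True values, i.e. the fraction of filled cells.
mask = (df != "").to_numpy()
fraction_filled = mask.mean()
print(f"Fraction of filled cells: {fraction_filled}")  # Fraction of filled cells: 0.5
```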
