Pandas: counting the number of filled cells in a row

Question

I have a large dataset with columns labelled 1 - 65 (among other titled columns), and I want to find how many of those columns, per row, contain a string (of any value). For example, if columns 1 - 65 are all filled in a particular row, the count for that row should be 65; if only 10 are filled, the count should be 10.

Is there an easy way to do this? I'm currently using the following code, which takes a very long time because there are a large number of rows.

array = pd.read_csv(csvlocation, encoding = "ISO-8859-1")

for i in range(0, lengthofarray):
    for k in range(1,66):
        if array[k][i]!="":
            array["count"][i]=array["count"][i]+1

Answer 1

From my understanding of the post and the subsequent comments, you want to know the number of strings in each row across the columns labelled 1 through 65. There are two steps: first, subset your data down to columns 1 through 65; second, count the number of strings in each row. To do this:

import pandas as pd
import numpy as np

# create sample data
df = pd.DataFrame({'col1': list('abdecde'),
                   'col2': np.random.rand(7)})

# change one val of column two to string for illustration purposes    
df.loc[3, 'col2'] = 'b'

# to create the subset of columns, you could use 
# subset = [str(num) for num in list(range(1, 66))]
# and then just use df[subset]

# for each row, count the number of columns that have a string value
# applymap operates elementwise and returns a new DataFrame, so we are
# essentially creating a new representation of your data, where a 1 means
# the cell held a string value and a 0 means it did not.
# we then sum along the rows to get the final counts
col_str_counts = np.sum(df.applymap(lambda x: 1 if isinstance(x, str) else 0), axis=1)

# we changed the column two value above, so to check that the count is 2 for that row idx:
print(col_str_counts[3])  # prints 2

# and for the subset, it would simply become:
# col_str_counts = np.sum(df[subset].applymap(lambda x: 1 if isinstance(x, str) else 0), axis=1)
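
On recent pandas releases `DataFrame.applymap` is deprecated (pandas 2.1 renamed it to `DataFrame.map`). As a minimal sketch, the same per-row string count can be written without either name by mapping each column with `Series.map`. The sample frame and the name `is_str` below are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data: one text column and one mixed column.
df = pd.DataFrame({'col1': list('abdecde'),
                   'col2': [0.1, 0.2, 0.3, 'b', 0.5, 0.6, 0.7]})

# Series.map runs the check on each element of a column;
# DataFrame.apply repeats that for every column.
is_str = df.apply(lambda col: col.map(lambda x: isinstance(x, str)))

# True counts as 1 when summing, so this is the number of strings per row.
col_str_counts = is_str.sum(axis=1)

print(col_str_counts.tolist())
```

Only row 3 has a string in both columns, so it is the only row with a count of 2. This still calls a Python function once per cell, so it is likely to be slower than a purely vectorised mask on a large frame.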

Answer 2

You should be able to adapt your problem to this example.

Say we have this dataframe:

import pandas as pd

df = pd.DataFrame([["","foo","bar"],["","","bar"],["","",""],["foo","bar","bar"]])

     0    1    2
0       foo  bar
1            bar
2               
3  foo  bar  bar

Then we create a boolean mask that is True wherever a cell != "" and sum it along each row:

df['count'] = (df != "").sum(1)
print(df)

     0    1    2  count
0       foo  bar      2
1            bar      1
2                     0
3  foo  bar  bar      3
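
One caveat when adapting this to the question: `pd.read_csv` loads empty cells as `NaN` by default rather than as `""`, and comparing `NaN` with `""` using `!=` gives `True`, so the mask above would count those cells as filled. Here is a minimal sketch that treats both `NaN` and empty strings as unfilled, assuming the 65 columns are labelled with the strings '1' to '65'. The in-memory CSV, its three-column width, and the names `cols` and `filled` are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: a name column plus columns "1".."3".
csv_text = "name,1,2,3\na,foo,,bar\nb,,,\nc,x,y,z\n"
df = pd.read_csv(io.StringIO(csv_text))

# With the real data this would be range(1, 66).
cols = [str(n) for n in range(1, 4)]

# A cell is filled when it is neither NaN nor an empty string.
filled = df[cols].notna() & (df[cols] != "")

df["count"] = filled.sum(axis=1)
print(df["count"].tolist())
```

Restricting the mask to `cols` keeps the other titled columns out of the count, and the whole operation stays vectorised, which avoids the nested loop in the question.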

Answer 3

import pandas

df = pandas.DataFrame([["","foo","bar"],["","","bar"],["","",""],["foo","bar","bar"]])
total_cells = df.size  # taken before the count column is added
df['filled_cell_count'] = (df != "").sum(1)
print(df)
     0    1    2  filled_cell_count
0       foo  bar                  2
1            bar                  1
2                                 0
3  foo  bar  bar                  3

# this is the fraction of cells that are filled, not a count
filled_fraction = df['filled_cell_count'].sum() / total_cells
print()
print(f"Fraction of filled cells in dataframe: {filled_fraction}")
Fraction of filled cells in dataframe: 0.5

3 answers

Answer 1: score 2, accepted, 2018-02-18 19:11:38
Answer 2: score 2, 2018-02-18 20:02:43
Answer 3: score 0, 2020-10-02 07:45:32
