Iterating over rows and columns to add counts in Pandas

Question

I am trying to iterate over columns AND rows in Pandas to cross-reference a list I have and count the co-occurrences.

My dataframe looks like:

+-------+-----+-----+----+----+-------+-------+------+
| Lemma | Dog | Cat | Sg | Pl |  Good |  Okay |  Bad |
+-------+-----+-----+----+----+-------+-------+------+
| Dog   |   0 |   0 |  0 |  0 |   0   |   0   |  0   |
| Cat   |   0 |   0 |  0 |  0 |   0   |   0   |  0   |
+-------+-----+-----+----+----+-------+-------+------+

I have a list like:

c = [['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad']]

I want to go through every item in Lemma, find it in c, and then, for that list item, look for any of the column names. If those column names are seen, I want to add 1 to the matching cell. I also want to add a count if the Lemma items occur within a 3-word window of each other.

I've tried something like the following (ignoring the word window issue):

for idx, row in df.iterrows():
    for columns in df:
        for i in c:
            if i[0]==row:
                if columns in c[1]:
                    df.ix['columns','row'] +=1

But I get the error: "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

My ideal results look like:

+-------+-----+-----+----+----+-------+-------+------+
| Lemma | Dog | Cat | Sg | Pl |  Good |  Okay |  Bad |
+-------+-----+-----+----+----+-------+-------+------+
| Dog   |   1 |   1 |  1 |  1 |   1   |   0   |  1   |
| Cat   |   2 |   0 |  0 |  1 |   0   |   1   |  0   |
+-------+-----+-----+----+----+-------+-------+------+

Thanks!

Answer 1

You have several things that need to be changed.

1) Your list probably needs to have Dog instead of dog, and Cat instead of cat, so that the entries match the values in Lemma and the column names.

2) You probably want: for column in df.columns instead of for columns in df

3) You probably want: if i[0] == row['Lemma'] instead of if i[0]==row: (this is where it was breaking: comparing a string with the whole row, which is a Series, gives a Series, and its truth value is ambiguous)

4) You probably want: if column in i instead of if columns in c[1]

5) You probably want: df.ix[idx, column] += 1 instead of df.ix['columns','row'] +=1 (in pandas 1.0 and later .ix has been removed, so use df.loc[idx, column] += 1 there)
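Putting the five changes together, a minimal sketch of the corrected loop might look like the following. The frame construction is an assumption based on the table in the question, the list entries are capitalised as suggested in point 1, and `df.loc` stands in for `df.ix`, which is not available in pandas 1.0 and later.

```python
import pandas as pd

# Assumed reconstruction of the question's frame: a 'Lemma' column plus zeroed counters.
count_columns = ['Dog', 'Cat', 'Sg', 'Pl', 'Good', 'Okay', 'Bad']
df = pd.DataFrame({'Lemma': ['Dog', 'Cat']})
for name in count_columns:
    df[name] = 0

# Capitalised so that the first item matches the values in 'Lemma'.
c = [['Dog', 'Sg', 'Good'], ['Cat', 'Pl', 'Okay'], ['Dog', 'Pl', 'Bad']]

for idx, row in df.iterrows():
    for column in df.columns:
        if column == 'Lemma':
            continue  # the label column is not a counter
        for i in c:
            # Compare against the scalar in row['Lemma'], not the whole row.
            if i[0] == row['Lemma'] and column in i:
                df.loc[idx, column] += 1

print(df)
```

With this three-item list, the Dog row ends up with Dog=2, Sg=1, Pl=1, Good=1, Bad=1, and the Cat row with Cat=1, Pl=1, Okay=1. The triple loop is fine for small data; the dictionary approach in the next answer avoids iterating over the frame at all.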

Answer 2

The ideal result shown in the question is not accurate. There should never be a cat in the dog column and vice versa.
I wouldn't iterate through the DataFrame; I'd unpack the list of lists into a dict, then load the dict into a DataFrame, as shown below.

Code:

import pandas as pd

c=[['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad'],
   ['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Okay'],
   ['dog', 'Sg', 'Good'], ['cat', 'Sg', 'Good'], ['dog', 'Pl', 'Bad'],
   ['dog', 'Sg', 'Good'],['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad']]

Lemma = {'dog': {'dog': 0, 'Sg': 0, 'Pl': 0, 'Good': 0, 'Okay': 0, 'Bad': 0},
         'cat': {'cat': 0, 'Sg': 0, 'Pl': 0, 'Good': 0, 'Okay': 0, 'Bad': 0}}

Note: In each list from c, the first value is a key of Lemma and every value is a key of the inner dict for that lemma (see the Python documentation on dictionaries). E.g., with x = ['dog', 'Sg', 'Good'], Lemma[x[0]][x[2]] is the same as Lemma['dog']['Good']. The initial value of Lemma['dog']['Good'] is 0, so the first increment gives 0 + 1, the next gives 1 + 1, and so on.

for x in c:
    Lemma[x[0]][x[0]] = Lemma[x[0]][x[0]] + 1
    Lemma[x[0]][x[1]] = Lemma[x[0]][x[1]] + 1
    Lemma[x[0]][x[2]] = Lemma[x[0]][x[2]] + 1

df = pd.DataFrame.from_dict(Lemma, orient='index')

Output:
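For reference, the steps above can be collected into one self-contained sketch that prints the resulting frame. The counts in the comments were worked out by hand from the twelve-item list; the inner loop over `x` is just a compact form of the three increment lines.

```python
import pandas as pd

c = [['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad'],
     ['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Okay'],
     ['dog', 'Sg', 'Good'], ['cat', 'Sg', 'Good'], ['dog', 'Pl', 'Bad'],
     ['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad']]

Lemma = {'dog': {'dog': 0, 'Sg': 0, 'Pl': 0, 'Good': 0, 'Okay': 0, 'Bad': 0},
         'cat': {'cat': 0, 'Sg': 0, 'Pl': 0, 'Good': 0, 'Okay': 0, 'Bad': 0}}

for x in c:
    for key in x:              # x[0], x[1], x[2] in turn
        Lemma[x[0]][key] += 1  # x[0] selects the inner dict for this lemma

df = pd.DataFrame.from_dict(Lemma, orient='index')
print(df)
# dog row: dog=8, Sg=4, Pl=4, Good=4, Okay=1, Bad=3
# cat row: cat=4, Sg=1, Pl=3, Good=1, Okay=3, Bad=0
# df.loc['dog', 'cat'] and df.loc['cat', 'dog'] are NaN because those keys
# do not exist in the corresponding inner dicts.
```

If zeros are preferred over NaN in the two missing cells, `df.fillna(0).astype(int)` is one way to get them.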

Plot

df.plot(kind='bar', figsize=(6, 6))

Create the `dict` programmatically:

Create `sets` of words for the `dict` keys from the `list` of `lists`:

outer_keys = set()
inner_keys = set()
for x in c:
    outer_keys.add(x[0])  # first word is outer key
    inner_keys |= set(x[1:])  # all other words

Create the `dict` of `dicts`:

Lemma = {j: dict.fromkeys(inner_keys | {j}, 0) for j in outer_keys}

Final `dict`:

{'dog': {'Okay': 0, 'Pl': 0, 'Good': 0, 'Bad': 0, 'Sg': 0, 'dog': 0},
 'cat': {'Okay': 0, 'Pl': 0, 'Good': 0, 'Bad': 0, 'Sg': 0, 'cat': 0}}
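As a rough sketch of how the programmatic construction and the counting loop might fit together, the helper below wraps both steps. The function name `count_cooccurrences` is hypothetical and not part of the original answer; it assumes every inner list starts with the lemma.

```python
import pandas as pd

def count_cooccurrences(rows):
    """Hypothetical helper: build the zeroed dict of dicts, count, return a DataFrame."""
    outer_keys = {x[0] for x in rows}                  # first word of each list
    inner_keys = {w for x in rows for w in x[1:]}      # all other words
    counts = {j: dict.fromkeys(inner_keys | {j}, 0) for j in outer_keys}
    for x in rows:
        for word in x:
            counts[x[0]][word] += 1
    return pd.DataFrame.from_dict(counts, orient='index')

c = [['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad']]
df = count_cooccurrences(c)
print(df)
```

Because the keys come from sets, the column order of the printed frame is not guaranteed; select columns explicitly if a fixed order matters.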

Answer 1: 0, 2019-08-06 17:13:03
Answer 2: 0, accepted, 2019-08-06 17:27:19