Iterating over Rows and Columns to add counts in Pandas
I am trying to iterate over columns AND rows in Pandas to cross-reference a list I have and count the co-occurrences.
My dataframe looks like:
+-------+-----+-----+----+----+-------+-------+------+
| Lemma | Dog | Cat | Sg | Pl | Good | Okay | Bad |
+-------+-----+-----+----+----+-------+-------+------+
| Dog | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Cat | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-------+-----+-----+----+----+-------+-------+------+
I have a list like:
c = [['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad']]
I want to go through every item in Lemma, find it in c, and then for that list item look for any of the column names. If those column names are seen, I want to add +1. I also want to add a count if the Lemma items occur in a 3 word window of each other.
I've tried something like the following (ignoring the word window issue):
for idx, row in df.iterrows():
    for columns in df:
        for i in c:
            if i[0]==row:
                if columns in c[1]:
                    df.ix['columns','row'] +=1
But I get the error: "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
My ideal results look like:
+-------+-----+-----+----+----+-------+-------+------+
| Lemma | Dog | Cat | Sg | Pl | Good | Okay | Bad |
+-------+-----+-----+----+----+-------+-------+------+
| Dog | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| Cat | 2 | 0 | 0 | 1 | 0 | 1 | 0 |
+-------+-----+-----+----+----+-------+-------+------+
Thanks!
You have several things that need to be changed.
1) Your list probably needs to have Dog instead of dog, and Cat instead of cat.
2) You probably want: for column in df.columns instead of for columns in df
3) You probably want: if i[0] == row['Lemma'] instead of if i[0]==row: (this is where it was breaking)
4) You probably want if column in i instead of if columns in c[1]
5) You probably want df.ix[idx, column] += 1 instead of df.ix['columns','row'] +=1
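Applied together, the five changes above might look like the sketch below. It is not the answerer's exact code: the DataFrame is rebuilt here from the question's table, `df.loc` stands in for `df.ix` (which was removed in pandas 1.0), and the explicit skip of the `Lemma` column is an added safeguard.

```python
import pandas as pd

# Assumption: rebuild the question's all-zero table by hand.
count_columns = ['Dog', 'Cat', 'Sg', 'Pl', 'Good', 'Okay', 'Bad']
df = pd.DataFrame(0, index=range(2), columns=count_columns)
df.insert(0, 'Lemma', ['Dog', 'Cat'])

# Change 1: capitalize the lemmas so they match the values in df['Lemma'].
c = [['Dog', 'Sg', 'Good'], ['Cat', 'Pl', 'Okay'], ['Dog', 'Pl', 'Bad']]

for idx, row in df.iterrows():
    for column in df.columns:                 # change 2
        if column == 'Lemma':
            continue                          # added safeguard: never count into the label column
        for i in c:
            # changes 3 and 4: compare against the row's Lemma value,
            # then look for the column name in the current sublist
            if i[0] == row['Lemma'] and column in i:
                df.loc[idx, column] += 1      # change 5, with .loc instead of .ix

print(df)
```

With this logic a lemma is only ever counted against the lists that start with it, so the Dog row cannot pick up counts in the Cat column.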
There can never be a cat in the dog column, and vice versa.
I'd unpack the list of lists into a dict, then load the dict into a DataFrame, as shown below.
import pandas as pd
c = [['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad'],
     ['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Okay'],
     ['dog', 'Sg', 'Good'], ['cat', 'Sg', 'Good'], ['dog', 'Pl', 'Bad'],
     ['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad']]
Lemma = {'dog': {'dog': 0, 'Sg': 0, 'Pl': 0, 'Good': 0, 'Okay': 0, 'Bad': 0},
         'cat': {'cat': 0, 'Sg': 0, 'Pl': 0, 'Good': 0, 'Okay': 0, 'Bad': 0}}
Note: Each value in a list from c is a key in Lemma. Reference python dictionaries. E.g. with x = ['dog', 'Sg', 'Good'], Lemma[x[0]][x[2]] is the same as Lemma['dog']['Good']. The initial value of Lemma['dog']['Good'] is 0, so after the first match Lemma['dog']['Good'] = 0 + 1, the next time it would be 1 + 1, etc.
for x in c:
    Lemma[x[0]][x[0]] = Lemma[x[0]][x[0]] + 1
    Lemma[x[0]][x[1]] = Lemma[x[0]][x[1]] + 1
    Lemma[x[0]][x[2]] = Lemma[x[0]][x[2]] + 1
df = pd.DataFrame.from_dict(Lemma, orient='index')
df.plot(kind='bar', figsize=(6, 6))
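As an alternative sketch, and not the method used in this answer, the same tally can be kept in `collections.Counter` objects, which treat missing keys as zero, so no zero-filled dict has to be written out first. The name `counts` is made up for this illustration.

```python
from collections import Counter, defaultdict

import pandas as pd

c = [['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad'],
     ['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Okay'],
     ['dog', 'Sg', 'Good'], ['cat', 'Sg', 'Good'], ['dog', 'Pl', 'Bad'],
     ['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad']]

# One Counter per lemma; a Counter returns 0 for keys it has not seen yet.
counts = defaultdict(Counter)
for x in c:
    counts[x[0]].update(x)  # counts the lemma itself plus its two tags

# Convert to plain dicts before loading; cells no lemma ever touched become NaN,
# so fill them with 0 and restore integer dtype.
plain = {lemma: dict(tally) for lemma, tally in counts.items()}
df = pd.DataFrame.from_dict(plain, orient='index').fillna(0).astype(int)
print(df)
```

The trade-off is that the column order follows first appearance in `c` rather than a layout chosen up front.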
Building the dict programmatically:
Create sets of words for the dict keys from the list of lists:
outer_keys = set()
inner_keys = set()
for x in c:
    outer_keys.add(x[0])  # first word is outer key
    inner_keys |= set(x[1:])  # all other words
Create the dict of dicts:
Lemma = {j: dict.fromkeys(inner_keys | {j}, 0) for j in outer_keys}
Resulting dict:
{'dog': {'Okay': 0, 'Pl': 0, 'Good': 0, 'Bad': 0, 'Sg': 0, 'dog': 0},
 'cat': {'Okay': 0, 'Pl': 0, 'Good': 0, 'Bad': 0, 'Sg': 0, 'cat': 0}}
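For reference, the programmatic dict and the counting loop can be chained into one runnable sketch. It uses the three-item list from the question rather than the longer list above, and the final `fillna(0).astype(int)` is an added step to tidy cells that exist for only one lemma.

```python
import pandas as pd

# Assumption: the short list from the question, with quoted lowercase strings.
c = [['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad']]

# Collect the dict keys from the data itself.
outer_keys = set()
inner_keys = set()
for x in c:
    outer_keys.add(x[0])      # first word is the outer key
    inner_keys |= set(x[1:])  # all other words

# Zero-initialized dict of dicts; each lemma also gets a key for itself.
Lemma = {j: dict.fromkeys(inner_keys | {j}, 0) for j in outer_keys}

# Count every word of each sublist under that sublist's lemma.
for x in c:
    for word in x:
        Lemma[x[0]][word] += 1

# 'dog' has no 'cat' key (and vice versa), so those cells load as NaN.
df = pd.DataFrame.from_dict(Lemma, orient='index').fillna(0).astype(int)
print(df)
```

Because the keys come from sets, the column order of the resulting DataFrame is not guaranteed; reorder with `df[[...]]` if a fixed layout matters.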