How do I count the number of unique strings in two columns?

Question

I have a DataFrame with two columns containing strings, like:

col1 --- col2
Ernst --- Jim
Peter --- Ernst
Bill --- NaN
NaN --- Doug
Jim --- Jake

Now I want to create a new DataFrame whose first column lists the unique strings and whose second column gives the number of occurrences of each string across the two original columns, like:

str --- occurrences
Ernst --- 2
Peter --- 1
Bill --- 1
Jim --- 2
Jake --- 1
Doug --- 1

How do I do that in the most efficient way? Thanks!

Answer 1

First combine your original two columns into one:

In [127]: s = pd.concat([df.col1, df.col2], ignore_index=True)

In [128]: s
Out[128]: 
0    Ernst
1    Peter
2     Bill
3      NaN
4      Jim
5      Jim
6    Ernst
7      NaN
8     Doug
9     Jake
dtype: object

and then use value_counts, which skips NaN by default:

In [129]: s.value_counts()
Out[129]: 
Ernst    2
Jim      2
Bill     1
Doug     1
Jake     1
Peter    1
dtype: int64
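
To get from that Series to the two-column DataFrame described in the question, one possible sketch follows. The column names str and occurrences are taken from the question; the sample frame is rebuilt here only so the snippet runs on its own, and the rename_axis / reset_index step is one way among several to turn the index into a column.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    "col1": ["Ernst", "Peter", "Bill", np.nan, "Jim"],
    "col2": ["Jim", "Ernst", np.nan, "Doug", "Jake"],
})

# Put both columns end to end, then count; value_counts skips NaN by default.
s = pd.concat([df["col1"], df["col2"]], ignore_index=True)
counts = s.value_counts()

# Move the names out of the index into a regular column called "str".
result = counts.rename_axis("str").reset_index(name="occurrences")
print(result)
```

The rows come back sorted by count, highest first; the order among names with equal counts is not something to rely on.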

Answer 2

I'd do it this way (assuming you are reading the data from a file your_file.txt and want to print out the result):

from collections import Counter

separator = ' --- '
with open('your_file.txt') as f:
    # one entry per non-empty line, with the trailing newline stripped
    content = [line.strip() for line in f if line.strip()]

# join the lines and split again to get one flat list of all elements
# (a header line or NaN entries in the file are counted like any other name)
people = separator.join(content).split(separator)
# Counter is a dict-like object with key=name, value=count
people_count = Counter(people)
for name, val in people_count.items():
    # print the column the way you want
    print('{name}{separator}{value}'.format(name=name, separator=separator, value=val))

The example uses the Counter object, which lets you efficiently count elements from an iterable. The rest of the code is only string manipulation.
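
As a small self-contained illustration of Counter on the question's data, here is a sketch that needs no file. The in-memory list of lines stands in for your_file.txt, and skipping the NaN placeholders is an assumption about what the asker wants rather than part of the answer above.

```python
from collections import Counter

separator = " --- "

# Stand-in for the lines of your_file.txt (assumed layout: "name --- name").
lines = [
    "Ernst --- Jim",
    "Peter --- Ernst",
    "Bill --- NaN",
    "NaN --- Doug",
    "Jim --- Jake",
]

# Split each line into its two fields and drop the NaN placeholders.
names = [
    field.strip()
    for line in lines
    for field in line.split(separator)
    if field.strip() != "NaN"
]

people_count = Counter(names)

# most_common() yields (name, count) pairs, highest count first.
for name, count in people_count.most_common():
    print(f"{name}{separator}{count}")
```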

Answer 3

Try this:

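import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["Ernst", "Peter", "Bill", np.nan, "Jim"],
                   "col2": ["Jim", "Ernst", np.nan, "Doug", "Jake"]})
print(df)
# count occurrences within each column (groupby drops NaN keys by default)
df1 = df.groupby("col1")["col1"].count()
df2 = df.groupby("col2")["col2"].count()
# add the two counts; fill_value=0 covers names that appear in only one column
print(df1.add(df2, fill_value=0))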
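
One detail worth knowing about this approach: because the two per-column counts have to be aligned on the union of their names, the summed result typically comes back as floating point (2.0 rather than 2). A minimal sketch of casting it back to integers, with total as a hypothetical variable name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["Ernst", "Peter", "Bill", np.nan, "Jim"],
    "col2": ["Jim", "Ernst", np.nan, "Doug", "Jake"],
})

# Per-column counts; NaN keys are dropped by groupby's default settings.
df1 = df.groupby("col1")["col1"].count()
df2 = df.groupby("col2")["col2"].count()

# Sum the counts over the union of names, then cast back to integers.
total = df1.add(df2, fill_value=0).astype(int)
print(total)
```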

Answer 1
7 accepted 2014-01-20 17:25:26

Answer 2
0 2014-01-20 17:21:10

Answer 3
0 2014-01-20 17:27:47
