Indexing column values in pandas

Question

How would you return all the different strings contained in one particular column using pandas? I have a CSV containing a lot of repeated data, but I know there are only about 10 different variations of the string (all in the same column). I would like to return an index of all the different strings and then filter my CSV based on those strings.

For example:

2013,string A,13
2013,string A,14
2013,string B,13
2013,string C,12
2013,string A,11
2013,string B,11

How do I first return this:

String A
String B
String C

and then print out only the rows containing "String A"?

Answer 1

Given a frame like

>>> df
      0         1   2
0  2013  string A  13
1  2013  string A  14
2  2013  string B  13
3  2013  string C  12
4  2013  string A  11
5  2013  string B  11

[6 rows x 3 columns]
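A frame with these integer column labels is what you would get by reading the CSV from the question without a header row. The following is a minimal sketch under that assumption; `io.StringIO` and the `csv_text` variable stand in for the real file, which the question does not name.

```python
import io

import pandas as pd

# Sample data copied from the question; in practice this would be a file path.
csv_text = """2013,string A,13
2013,string A,14
2013,string B,13
2013,string C,12
2013,string A,11
2013,string B,11
"""

# header=None tells pandas the first line is data, so columns are labelled 0, 1, 2.
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df)
```

If your file does have a header row, drop `header=None` and use the column name instead of `1` in the snippets that follow.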

You can get the unique elements of a column using .unique():

>>> df[1].unique()
array(['string A', 'string B', 'string C'], dtype=object)

and select the matching rows using .loc and a boolean array:

>>> df.loc[df[1] == "string A"]
      0         1   2
0  2013  string A  13
1  2013  string A  14
4  2013  string A  11

[3 rows x 3 columns]
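Since the question mentions filtering on several of the strings, the same boolean-mask idea can be extended with `Series.isin`. This is a sketch rather than part of the original answer; the `wanted` list and the hand-built frame are illustrative assumptions.

```python
import pandas as pd

# Rebuild the example frame with integer column labels 0, 1, 2.
df = pd.DataFrame({
    0: [2013] * 6,
    1: ["string A", "string A", "string B", "string C", "string A", "string B"],
    2: [13, 14, 13, 12, 11, 11],
})

# Hypothetical selection of values to keep.
wanted = ["string A", "string C"]

# isin() yields a boolean Series that is True where column 1 is in `wanted`.
subset = df.loc[df[1].isin(wanted)]
print(subset)
```

Note that the comparison is case-sensitive, so "String A" would not match "string A".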

Alternatively, if you want all of the groups, you can use groupby on the column (here 1, although it might be different in your frame):

>>> grouped = df.groupby(1)
>>> for k,g in grouped:
...     print(k)
...     print(g)
...
string A
      0         1   2
0  2013  string A  13
1  2013  string A  14
4  2013  string A  11

[3 rows x 3 columns]
string B
      0         1   2
2  2013  string B  13
5  2013  string B  11

[2 rows x 3 columns]
string C
      0         1   2
3  2013  string C  12

[1 rows x 3 columns]

and it's straightforward to turn that into lots of other structures (e.g. a dictionary).
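One possible way to build such a dictionary is a comprehension over the groupby object, mapping each distinct string to its sub-frame. This is a sketch; the name `frames` and the hand-built frame are assumptions for illustration.

```python
import pandas as pd

# Rebuild the example frame with integer column labels 0, 1, 2.
df = pd.DataFrame({
    0: [2013] * 6,
    1: ["string A", "string A", "string B", "string C", "string A", "string B"],
    2: [13, 14, 13, 12, 11, 11],
})

# Iterating a groupby yields (key, sub-frame) pairs, one per distinct value.
frames = {key: group for key, group in df.groupby(1)}

print(sorted(frames))
print(frames["string B"])
```

Each value in the dictionary keeps the original row index, so rows can still be traced back to the full frame.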

Answer 1: score 3, accepted, 2014-01-02 04:28:08
