Indexing column values in pandas
How would you return all the distinct strings contained in one particular column using pandas? I have a CSV containing a lot of repeated data, but I know there are only about 10 different variations of the string (all in the same column). I would like to return an index of all the different strings and then filter my CSV based on those strings.
For example:
2013,string A,13
2013,string A,14
2013,string B,13
2013,string C,12
2013,string A,11
2013,string B,11
How do I first return this:
string A
string B
string C
and then print out only the rows containing "string A"?
Given a frame like this:
>>> df
0 1 2
0 2013 string A 13
1 2013 string A 14
2 2013 string B 13
3 2013 string C 12
4 2013 string A 11
5 2013 string B 11
[6 rows x 3 columns]
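A frame like this can be built from the headerless CSV in the question. The following is a minimal sketch, assuming the file has no header row; `csv_text` and the use of `io.StringIO` are stand-ins for a real file path.

```python
import io

import pandas as pd

# Sample rows from the question; in practice, pass a file path to read_csv
# instead of a StringIO object.
csv_text = """2013,string A,13
2013,string A,14
2013,string B,13
2013,string C,12
2013,string A,11
2013,string B,11
"""

# header=None tells pandas there is no header row, so the columns are
# labelled with the integers 0, 1, 2.
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df)
```

If your file does have a header row, drop `header=None` and refer to the column by its name instead of `1`.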
You can get the unique elements of a column using .unique():
>>> df[1].unique()
array(['string A', 'string B', 'string C'], dtype=object)
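As a self-contained sketch of the same idea, the snippet below applies `.unique()` to a standalone Series. The variable names (`col`, `distinct`, `counts`) are illustrative only, and `value_counts()` is an optional extra in case you also want to know how often each string occurs.

```python
import pandas as pd

# The string column from the example, as a standalone Series.
col = pd.Series(
    ["string A", "string A", "string B", "string C", "string A", "string B"]
)

# Distinct values, in order of first appearance.
distinct = col.unique()

# Number of occurrences of each distinct value.
counts = col.value_counts()

print(distinct)
print(counts)
```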
and select matching rows using .loc and a boolean array:
>>> df.loc[df[1] == "string A"]
0 1 2
0 2013 string A 13
1 2013 string A 14
4 2013 string A 11
[3 rows x 3 columns]
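To make the boolean-array step explicit, here is a small sketch that rebuilds the example frame by hand. The names `mask`, `only_a` and `a_or_c` are illustrative, and the `isin` variant is an addition for the case where you want to keep several of the strings at once.

```python
import pandas as pd

# Rebuild the example frame; columns are labelled 0, 1, 2 by default.
df = pd.DataFrame(
    [
        [2013, "string A", 13],
        [2013, "string A", 14],
        [2013, "string B", 13],
        [2013, "string C", 12],
        [2013, "string A", 11],
        [2013, "string B", 11],
    ]
)

# Comparing a column to a value yields a boolean Series, one entry per row.
mask = df[1] == "string A"

# .loc keeps only the rows where the mask is True.
only_a = df.loc[mask]
print(only_a)

# isin builds the same kind of mask for several allowed values.
a_or_c = df.loc[df[1].isin(["string A", "string C"])]
print(a_or_c)
```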
Alternatively, if you want all of them, you can use groupby on the column (here 1, although it might be different in your frame):
>>> grouped = df.groupby(1)
>>> for k, g in grouped:
...     print(k)
...     print(g)
...
string A
0 1 2
0 2013 string A 13
1 2013 string A 14
4 2013 string A 11
[3 rows x 3 columns]
string B
0 1 2
2 2013 string B 13
5 2013 string B 11
[2 rows x 3 columns]
string C
0 1 2
3 2013 string C 12
[1 rows x 3 columns]
and it's straightforward to turn that into lots of other structures (e.g. a dictionary).
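One possible way to build such a dictionary is a comprehension over the groupby object. This is a sketch rather than the only approach, and `groups` is an illustrative name.

```python
import pandas as pd

# Rebuild the example frame; columns are labelled 0, 1, 2 by default.
df = pd.DataFrame(
    [
        [2013, "string A", 13],
        [2013, "string A", 14],
        [2013, "string B", 13],
        [2013, "string C", 12],
        [2013, "string A", 11],
        [2013, "string B", 11],
    ]
)

# Iterating over a groupby yields (key, sub-frame) pairs, so a dict
# comprehension maps each distinct string to the rows that contain it.
groups = {key: group for key, group in df.groupby(1)}

print(groups["string A"])
```

Each value is an ordinary DataFrame, so `groups["string B"]` can be filtered, written out with `to_csv`, or processed like any other frame.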