
Indexing column values in pandas

How would you return all the distinct strings contained in one particular column using pandas? I have a CSV containing a lot of repeated data, but I know there are only about 10 different variations of the string (all in the same column). I would like to get a list of all the distinct strings and then filter my CSV based on those strings.

For example:

2013,string A,13
2013,string A,14
2013,string B,13
2013,string C,12
2013,string A,11
2013,string B,11

How do I return this in the first place:

string A
string B
string C

and then print out only the rows containing "string A"?

Given a frame like

>>> df
      0         1   2
0  2013  string A  13
1  2013  string A  14
2  2013  string B  13
3  2013  string C  12
4  2013  string A  11
5  2013  string B  11

[6 rows x 3 columns]
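
The integer column labels (0, 1, 2) in that frame are what pandas assigns when a CSV without a header row is read with header=None. As a minimal sketch, assuming the sample rows from the question are the whole file (the csv_text variable is only a stand-in for a real file path):

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the example runs without a file.
# With a real file, pass its path to read_csv instead of the StringIO object.
csv_text = """2013,string A,13
2013,string A,14
2013,string B,13
2013,string C,12
2013,string A,11
2013,string B,11
"""

# header=None tells pandas the first line is data, so columns get labels 0, 1, 2.
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df)
```

If your file does have a header row, drop header=None and use the column name in place of 1 in the examples that follow.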

You can get the unique elements of a column using .unique():

>>> df[1].unique()
array(['string A', 'string B', 'string C'], dtype=object)
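
If you also want to know how often each string occurs, .value_counts() may be useful alongside .unique(). A small sketch, rebuilding the same frame by hand (the variable names labels and counts are just for illustration):

```python
import pandas as pd

# Rebuild the example frame with integer column labels 0, 1, 2.
df = pd.DataFrame(
    [
        [2013, "string A", 13],
        [2013, "string A", 14],
        [2013, "string B", 13],
        [2013, "string C", 12],
        [2013, "string A", 11],
        [2013, "string B", 11],
    ]
)

# Distinct values, in order of first appearance.
labels = df[1].unique()
print(labels)

# Number of rows per distinct value.
counts = df[1].value_counts()
print(counts)
```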

and select matching rows using .loc and a boolean array:

>>> df.loc[df[1] == "string A"]
      0         1   2
0  2013  string A  13
1  2013  string A  14
4  2013  string A  11

[3 rows x 3 columns]
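
The same boolean-mask idea extends to several strings at once via .isin(). A sketch under the same assumptions as the frame above (the names mask, only_a and a_or_c are illustrative):

```python
import pandas as pd

# Same example frame as above, with integer column labels 0, 1, 2.
df = pd.DataFrame(
    [
        [2013, "string A", 13],
        [2013, "string A", 14],
        [2013, "string B", 13],
        [2013, "string C", 12],
        [2013, "string A", 11],
        [2013, "string B", 11],
    ]
)

# A boolean Series: True where column 1 equals "string A".
mask = df[1] == "string A"
only_a = df.loc[mask]
print(only_a)

# Keep rows whose value in column 1 is any of several strings.
a_or_c = df.loc[df[1].isin(["string A", "string C"])]
print(a_or_c)
```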

Alternatively, if you want them all, you can use groupby on the column (here 1, although the label might be different in your frame):

>>> grouped = df.groupby(1)
>>> for k, g in grouped:
...     print(k)
...     print(g)
...
string A
      0         1   2
0  2013  string A  13
1  2013  string A  14
4  2013  string A  11

[3 rows x 3 columns]
string B
      0         1   2
2  2013  string B  13
5  2013  string B  11

[2 rows x 3 columns]
string C
      0         1   2
3  2013  string C  12

[1 rows x 3 columns]

and it's straightforward to turn that into other structures (e.g. a dictionary mapping each string to its sub-frame).
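
One way to build such a dictionary is a comprehension over the groupby object. This is only a sketch; the name frames is an assumption, not something from the answer above:

```python
import pandas as pd

# Same example frame as above, with integer column labels 0, 1, 2.
df = pd.DataFrame(
    [
        [2013, "string A", 13],
        [2013, "string A", 14],
        [2013, "string B", 13],
        [2013, "string C", 12],
        [2013, "string A", 11],
        [2013, "string B", 11],
    ]
)

# Iterating a groupby yields (key, sub-frame) pairs, one per distinct value.
frames = {key: group for key, group in df.groupby(1)}

# Look up all rows for one string directly by key.
print(frames["string B"])
```

Each value in frames is an ordinary DataFrame that keeps the original row index, so frames["string B"] holds rows 2 and 5.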


 