How would you return all the different strings contained in one particular column using pandas? I have a CSV containing a lot of repeated data, but I know there are only about 10 different variations of the string (all in the same column). I would like to return an index of all the different strings and then filter my CSV based on those strings.
For example:
2013,string A,13
2013,string A,14
2013,string B,13
2013,string C,12
2013,string A,11
2013,string B,11
How do I first return this:
string A
string B
string C
and then print out only the rows containing "string A"?
Given a frame like
>>> df
      0         1   2
0  2013  string A  13
1  2013  string A  14
2  2013  string B  13
3  2013  string C  12
4  2013  string A  11
5  2013  string B  11
[6 rows x 3 columns]
You can get the unique elements of a column using .unique():
>>> df[1].unique()
array(['string A', 'string B', 'string C'], dtype=object)
and select matching rows using .loc and a boolean array:
>>> df.loc[df[1] == "string A"]
      0         1   2
0  2013  string A  13
1  2013  string A  14
4  2013  string A  11
[3 rows x 3 columns]
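As a minimal sketch of the two steps above starting from the CSV itself: this assumes the file has no header row, so pandas labels the columns 0, 1 and 2. The sample rows are inlined through io.StringIO only so the snippet runs on its own; with a real file you would pass its path to read_csv instead. The variable names and the .isin() variant for matching several strings at once are additions for illustration, not part of the original answer.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch runs without a file.
csv_text = """2013,string A,13
2013,string A,14
2013,string B,13
2013,string C,12
2013,string A,11
2013,string B,11
"""

# header=None: the sample has no header row, so columns are labelled 0, 1, 2.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Step 1: the distinct strings in column 1, in order of first appearance.
unique_strings = df[1].unique().tolist()

# Step 2: keep only the rows whose column 1 equals one particular string.
rows_a = df.loc[df[1] == "string A"]

# Variant (illustrative): keep rows matching any of several strings.
rows_a_or_c = df.loc[df[1].isin(["string A", "string C"])]

print(unique_strings)
print(rows_a)
```

Note that the comparison is case-sensitive, so "String A" would not match rows containing "string A".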
Alternatively, if you want them all, you can use groupby on the column (here 1, although it might be different in your frame):
>>> grouped = df.groupby(1)
>>> for k, g in grouped:
...     print(k)
...     print(g)
...
string A
      0         1   2
0  2013  string A  13
1  2013  string A  14
4  2013  string A  11
[3 rows x 3 columns]
string B
      0         1   2
2  2013  string B  13
5  2013  string B  11
[2 rows x 3 columns]
string C
      0         1   2
3  2013  string C  12
[1 rows x 3 columns]
and it's straightforward to turn that into lots of other structures (e.g. a dictionary).
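One possible way to build such a dictionary is a comprehension over the groupby object, sketched below. The frame is rebuilt from the sample rows so the snippet is self-contained, and the name frames_by_string is hypothetical.

```python
import pandas as pd

# Rebuild the example frame; columns are labelled 0, 1, 2 by default.
df = pd.DataFrame(
    [
        [2013, "string A", 13],
        [2013, "string A", 14],
        [2013, "string B", 13],
        [2013, "string C", 12],
        [2013, "string A", 11],
        [2013, "string B", 11],
    ]
)

# Map each distinct string in column 1 to the sub-frame of its rows.
frames_by_string = {key: group for key, group in df.groupby(1)}

print(frames_by_string["string B"])
```

Each value is an ordinary DataFrame that keeps the original row labels, so frames_by_string["string B"] holds rows 2 and 5.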