I use package "pandas" for python. And i have a question. I have DataFrame like this:
| first | last | datr |city|
|Zahir |Petersen|22.11.15|9 |
|Zahir |Petersen|22.11.15|2 |
|Mason |Sellers |10.04.16|4 |
|Gannon |Cline |29.10.15|2 |
|Craig |Sampson |20.04.16|2 |
|Craig |Sampson |20.04.16|4 |
|Cameron |Mathis |09.05.15|6 |
|Adam |Hurley |16.04.16|2 |
|Brock |Vaughan |14.04.16|10 |
|Xanthus |Murray |30.03.15|6 |
|Xanthus |Murray |30.03.15|7 |
|Xanthus |Murray |30.03.15|4 |
|Palmer |Caldwell|31.10.15|2 |
I want create pivot_table by fields ['first', 'last', 'datr'], but display ['first', 'last', 'datr','city'] where count of record by ['first', 'last', 'datr'] more than one, like this:
| first | last | datr |city|
|Zahir |Petersen|22.11.15|9 | 2
| | | |2 | 2
|Craig |Sampson |20.04.16|2 | 2
| | | |4 | 2
|Xanthus |Murray |30.03.15|6 | 3
| | | |7 | 3
| | | |4 | 3
UPD. If i groupby three fields from four, than
df['count'] = df.groupby(['first','last','datr']).transform('count')
is work, but if count of all columns - columns for "groupby" > 1 than this code throw error. For example(all columns - 4('first','last', 'datr', 'city'), columns for groupby - 2('first','last'), 4-2 = 2:
In [181]: df['count'] = df.groupby(['first','last']).transform('count')
...
ValueError: Wrong number of items passed 2, placement implies 1
You can do this with groupby
. Group by the three columns (first, last and datr), and then count the number of elements in each group:
In [63]: df['count'] = df.groupby(['first', 'last', 'datr']).transform('count')
In [64]: df
Out[64]:
first last datr city count
0 Zahir Petersen 22.11.15 9 2
1 Zahir Petersen 22.11.15 2 2
2 Mason Sellers 10.04.16 4 1
3 Gannon Cline 29.10.15 2 1
4 Craig Sampson 20.04.16 2 2
5 Craig Sampson 20.04.16 4 2
6 Cameron Mathis 09.05.15 6 1
7 Adam Hurley 16.04.16 2 1
8 Brock Vaughan 14.04.16 10 1
9 Xanthus Murray 30.03.15 6 3
10 Xanthus Murray 30.03.15 7 3
11 Xanthus Murray 30.03.15 4 3
12 Palmer Caldwell 31.10.15 2 1
From there, you can filter the frame:
In [65]: df[df['count'] > 1]
Out[65]:
first last datr city count
0 Zahir Petersen 22.11.15 9 2
1 Zahir Petersen 22.11.15 2 2
4 Craig Sampson 20.04.16 2 2
5 Craig Sampson 20.04.16 4 2
9 Xanthus Murray 30.03.15 6 3
10 Xanthus Murray 30.03.15 7 3
11 Xanthus Murray 30.03.15 4 3
And if you want these columns as the index (as in the example output in your question): df.set_index(['first', 'last', 'datr'])
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.