Remove cells that do not meet condition in pandas

I want to remove races that make up less than 1% of the population in a county. I am using pandas. As the sample below shows, some races have values below 1%. I want to ignore those races and display only the races with larger shares.

CensusTract State   County  TotalPop    Men   Women Hispanic    White   Black   Native  Asian   Pacific
1001020100  Alabama Autauga  1948       940   1008     0.9      87.4    7.7      0.3    0.6      0
1001020400  Alabama Autauga  4423       2172  2251     10.5     82.8    3.7      1.6    0        0

I tried this

dataset = tract_data.query("Income >= 50000 & Poverty > 50")

dataset.loc[:, 'Races'] = dataset.apply(
    lambda row: list(zip(list(row.index)[6:12], list(row)[6:12])), axis=1)
dataset.loc[:, 'Races'] = dataset.Races.apply(
    lambda x: '; '.join(['{}: {}'.format(t[0], t[1]) for t in list(filter(lambda x: x[1] > 1, x))]))
income = dataset[['CensusTract', 'State', 'County','Races']]

print(dataset['Races'])

But I still get an error.

This is what I expect to get:

CensusTract State       County  races
1001020100 Alabama Autauga White: 87.4 Black: 7.7
1001020400 Alabama Autauga Hispanic: 10.5 White: 82.8 Black: 3.7 Native: 1.6

Here is one way to achieve your goal:

df['Races'] = df.apply(lambda row: list(zip(list(row.index)[6:], list(row)[6:])), axis=1)
df['Races'] = df.Races.apply(lambda x: '; '.join(['{}: {}'.format(t[0], t[1]) for t in list(filter(lambda t: t[1] > 1, x))]))

Finally, if we print df, here is what we get:

   CensusTract  State    County   TotalPop  Men   Women  Hispanic  White  Black  Native  Asian  Pacific  Races
0  1001020100   Alabama  Autauga  1948      940   1008   0.9       87.4   7.7    0.3     0.6    0.0      White: 87.4; Black: 7.7
1  1001020400   Alabama  Autauga  4423      2172  2251   10.5      82.8   3.7    1.6     0.0    0.0      Hispanic: 10.5; White: 82.8; Black: 3.7; Nativ...

Here is the idea. The values we want to compare are in the columns from position 6 (counting from zero, so starting at Hispanic) through the last column. For each row, we want to show the column name together with its value whenever that value is greater than 1. list(row.index) gives the column names for that row and list(row) gives the values in that row as a list. Zipping the two lists yields a list of tuples [(column_name, value)].
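To see the zipping step on a single row, here is a minimal sketch. The Series below is a hand-built stand-in for one row of df (an assumption for illustration; only the values come from the question's first sample row).

```python
import pandas as pd

# Hypothetical stand-in for one row of df, built from the first sample row.
row = pd.Series(
    [1001020100, "Alabama", "Autauga", 1948, 940, 1008,
     0.9, 87.4, 7.7, 0.3, 0.6, 0.0],
    index=["CensusTract", "State", "County", "TotalPop", "Men", "Women",
           "Hispanic", "White", "Black", "Native", "Asian", "Pacific"],
)

# Position 6 (zero-based) is the first race column, Hispanic.
names = list(row.index)[6:]
values = list(row)[6:]

# Pair each column name with its value.
pairs = list(zip(names, values))
print(pairs)
```

Each tuple in pairs holds a race column name and that row's percentage, which is the shape the filtering step expects.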

Then we filter the list of tuples on the value (the second element of each tuple), keeping only the tuples whose value is greater than 1. After filtering, we still have a list of tuples, and the rest of the work is just formatting that list into whatever display we like. To understand how the filtering works, just try:

x = [('col1', 8), ('col2', 10), ('col3', 0.9), ('col4', 30)]
'; '.join(['{}: {}'.format(t[0], t[1]) for t in list(filter(lambda t: t[1] > 1, x))])

The result should be:

>>> 'col1: 8; col2: 10; col4: 30'
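Putting the pieces together, the sketch below rebuilds the two sample rows from the question and selects the race columns by name rather than by position, so the result does not depend on column order. The names race_cols, format_races and threshold are illustrative choices, not part of the original answer, and the cutoff keeps values strictly greater than 1 to match the filter above.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "CensusTract": [1001020100, 1001020400],
    "State": ["Alabama", "Alabama"],
    "County": ["Autauga", "Autauga"],
    "TotalPop": [1948, 4423],
    "Men": [940, 2172],
    "Women": [1008, 2251],
    "Hispanic": [0.9, 10.5],
    "White": [87.4, 82.8],
    "Black": [7.7, 3.7],
    "Native": [0.3, 1.6],
    "Asian": [0.6, 0.0],
    "Pacific": [0.0, 0.0],
})

# Assumed list of race columns; adjust it to match the real dataset.
race_cols = ["Hispanic", "White", "Black", "Native", "Asian", "Pacific"]


def format_races(row, threshold=1):
    """Join 'name: value' for every race whose share exceeds threshold."""
    return "; ".join(
        "{}: {}".format(name, value)
        for name, value in row.items()
        if value > threshold
    )


# Apply the helper row by row, looking at the race columns only.
df["Races"] = df[race_cols].apply(format_races, axis=1)
print(df[["CensusTract", "State", "County", "Races"]].to_string())
```

If races at exactly 1% should also be kept, change the comparison in format_races from > to >=.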
