
Python: how to count a dummy variable inside an if statement

I have a dataframe "total" including columns like:

  • latitude(lat)
  • longitude(lon)
  • f2 (e.g. 1950-12-06, 1959-08-01, ...), a date
  • F_chicken (dummy variable; 1: chicken, 0: not chicken)

For each row, I want to count the rows with F_chicken equal to 1 that have the same 'lat' and 'lon' and a smaller 'f2'.

[Image: my dataframe.head()]

I tried to build this c_chicken column with a for loop, but it fails:

n = len(total['f2'])
def col_counts(col):
    count = []
    for i,j in range(n):
        if (i != j) and (total['f2'][i] <= total['f2'][j]) and (total['lat'][i]==total['lat'][j]) and (total['lon'][i]==total['lon'][j]) and(col[j] == 1): count[i] += 1
    return count
total['c_chicken'] = col_counts(total.F_chicken)

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)
<ipython-input-114-879544b4f09a> in <module>
----> 1 total['c_chicken'] = col_counts(total.F_chicken)

<ipython-input-113-ece8cb8d9ef5> in col_counts(col)
      2 def col_counts(col):
      3     count = []
----> 4     for i,j in range(n):
      5         if (i != j) and (total['f2'][i] <= total['f2'][j]) and 
(total['lat'][i]==total['lat'][j]) and (total['lon'][i]==total['lon'][j]) 
and(col[j] == 1): count[i] += 1
      6     return count

TypeError: cannot unpack non-iterable int object

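Setting your logic aside for a moment:

The error is in your loop statement. Iterating over range(n) yields one integer at a time, and `for i,j in range(n)` tries to unpack each of those integers into two names, i and j, which raises the TypeError.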
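The failure can be reproduced without any dataframe. Here is a minimal sketch (the names n, error_message and pairs are only illustrative) showing that each item coming out of range(n) is a single int, and that itertools.product is one way to get (i, j) pairs if you prefer a single loop:

```python
import itertools

n = 3

# Each item produced by range(n) is a plain int, so there is
# nothing to unpack into the two names i and j.
try:
    for i, j in range(n):
        pass
except TypeError as exc:
    error_message = str(exc)

# itertools.product yields 2-tuples, which do unpack into two names.
pairs = [(i, j) for i, j in itertools.product(range(n), repeat=2)]
```

Note that product(range(n), repeat=2) still visits all n * n pairs, so it is no faster than a nested loop; it only changes how the loop is written.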

Any reason you can't use a nested for loop?

for i in range(n):
    for j in range(n):
        # code that uses i and j
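Applied to the function from the question, that could look like the sketch below. The sample data is made up, and I changed the signature to take the dataframe and a column name, so treat it as an illustration rather than a drop-in fix. There is also a second problem in the original: count starts as an empty list, so count[i] += 1 would raise an IndexError even once the loop is fixed. Pre-filling it with zeros avoids that.

```python
import pandas as pd

# Hypothetical sample data; the real `total` has more rows.
total = pd.DataFrame({
    "lat": [1.0, 1.0, 1.0, 2.0],
    "lon": [5.0, 5.0, 5.0, 6.0],
    "f2": ["1950-12-06", "1955-01-01", "1959-08-01", "1951-03-03"],
    "F_chicken": [1, 0, 1, 1],
})

def col_counts(df, col):
    n = len(df)
    count = [0] * n  # pre-filled so count[i] += 1 has a slot to update
    # Plain lists give positional access regardless of the dataframe index.
    lat, lon, f2, flag = (df[c].tolist() for c in ("lat", "lon", "f2", col))
    for i in range(n):
        for j in range(n):
            # Same comparison as in the question: f2 of row i <= f2 of row j.
            if (i != j and f2[i] <= f2[j]
                    and lat[i] == lat[j] and lon[i] == lon[j]
                    and flag[j] == 1):
                count[i] += 1
    return count

total["c_chicken"] = col_counts(total, "F_chicken")
```

This is quadratic in the number of rows, so it is only practical for small dataframes. If you want rows with a smaller f2 instead, flip the comparison to f2[j] < f2[i].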

Now for your logic

I would recommend using pandas methods instead of explicit for loops.
If f2 is already in datetime format, good. Otherwise, convert it with:

total['f2'] = pd.to_datetime(total['f2'], format='%Y-%m-%d')
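A small sketch of what the conversion gives you, using a made-up Series rather than your real column: the result has a datetime dtype, so comparisons and min/max are chronological.

```python
import pandas as pd

# Hypothetical date strings in the same YYYY-MM-DD layout as f2.
f2 = pd.Series(["1950-12-06", "1959-08-01", "1950-01-15"])

converted = pd.to_datetime(f2, format="%Y-%m-%d")

# True once the column holds datetimes instead of strings.
is_datetime = pd.api.types.is_datetime64_any_dtype(converted)

# Chronologically earliest date in the column.
earliest = converted.min()
```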

As you want the rows with the smallest f2 values, sort the dataframe on the f2 column. sort_values returns a new dataframe, so assign the result back:

total = total.sort_values(by='f2')

Now you can drop duplicates based on lat and lon with keep='first', which keeps the earliest row for each location, and count the rows where F_chicken == 1:

tmp = total.drop_duplicates(['lat', 'lon'], keep='first')
total['c_chicken'] = tmp[tmp['F_chicken'] == 1].shape[0] # assuming it is int not str, otherwise use '1'
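Putting the three steps together on made-up data, as a sketch (the sample values and the name n_first_chicken are mine, not from the question):

```python
import pandas as pd

# Hypothetical sample data with three distinct (lat, lon) locations.
total = pd.DataFrame({
    "lat": [1.0, 1.0, 2.0, 2.0, 3.0],
    "lon": [5.0, 5.0, 6.0, 6.0, 7.0],
    "f2": ["1955-01-01", "1950-12-06", "1951-03-03", "1952-04-04", "1960-01-01"],
    "F_chicken": [1, 0, 1, 1, 0],
})

total["f2"] = pd.to_datetime(total["f2"], format="%Y-%m-%d")
total = total.sort_values(by="f2")  # earliest dates first

# After sorting, keep="first" retains the earliest row per location.
tmp = total.drop_duplicates(["lat", "lon"], keep="first")

# Number of locations whose earliest row is a chicken row.
n_first_chicken = int((tmp["F_chicken"] == 1).sum())

# Assigning a scalar broadcasts it to every row.
total["c_chicken"] = n_first_chicken
```

Here only the location (2.0, 6.0) has a chicken row as its earliest entry, so the count is 1.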

Note that the assignment to total['c_chicken'] stores one single number, so every row of the c_chicken column gets the same value.
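If what you actually need is a separate count for each row, one possible approach is to join the dataframe with itself on lat and lon and filter the resulting pairs. This is only a sketch on made-up data; it assumes "smaller f2" means strictly earlier, and the column names row and *_other are introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample data.
total = pd.DataFrame({
    "lat": [1.0, 1.0, 1.0, 2.0, 2.0],
    "lon": [5.0, 5.0, 5.0, 6.0, 6.0],
    "f2": pd.to_datetime(["1950-12-06", "1955-01-01", "1959-08-01",
                          "1951-03-03", "1952-04-04"]),
    "F_chicken": [1, 0, 1, 1, 1],
})

# Carry the original index along as a column so pairs can be traced back.
left = total.reset_index().rename(columns={"index": "row"})

# Pair every row with every row that shares the same lat and lon.
pairs = left.merge(left, on=["lat", "lon"], suffixes=("", "_other"))

# Keep pairs where the other row is a different, earlier, chicken row.
mask = (
    (pairs["row"] != pairs["row_other"])
    & (pairs["f2_other"] < pairs["f2"])
    & (pairs["F_chicken_other"] == 1)
)

# Count surviving pairs per row; rows with no match get 0.
counts = pairs[mask].groupby("row").size()
total["c_chicken"] = counts.reindex(total.index, fill_value=0)
```

The self-join grows with the square of the group size, so it can get large when many rows share one location. Change the `<` in the mask if you want a different date comparison.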


 