I have the following DataFrame:
PredictedFeature  Document_IDs                                   did  avg
2000.0            [160, 384, 3, 217, 324, 11, 232, 41, 377, 48]  11   0.6
2664.0            [160, 384, 3, 217, 324, 294, 13, 11]           13   0.9
The real DataFrame has many more rows like these. The did column holds a single document ID, and the Document_IDs column holds an array of IDs. For each row, I want to check whether the ID in did (for example 11) is present in the Document_IDs column, and how many times it occurs.
The final output would look like this:
did avg present
11 0.6 2
13 0.9 1
Here 2 means that document ID 11 is present 2 times in the Document_IDs column.
I am totally new to this. So any small help will be great.
You can extract the Document_IDs column with DataFrame.pop, flatten its values with chain.from_iterable, and then sum the matching values in a generator with Series.map:
import ast
from itertools import chain
df['Document_IDs'] = df['Document_IDs'].fillna('[]').apply(ast.literal_eval)
s = list(chain.from_iterable(df.pop('Document_IDs')))
df['pres'] = df['did'].map(lambda x: sum(y == x for y in s))
print (df)
PredictedFeature did avg pres
0 2000.0 11 0.6 2
1 2664.0 13 0.9 1
Or:
import ast
from itertools import chain
from collections import Counter
df['Document_IDs'] = df['Document_IDs'].fillna('[]').apply(ast.literal_eval)
df['pres'] = df['did'].map(Counter(chain.from_iterable(df.pop('Document_IDs'))))
print (df)
PredictedFeature did avg pres
0 2000.0 11 0.6 2
1 2664.0 13 0.9 1
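For anyone who wants to run the Counter approach end to end, here is a minimal sketch on the sample data from the question. It assumes the Document_IDs values are stored as strings (which is why ast.literal_eval is needed); if they are already lists, the parsing step can be skipped. The variable names parsed and counts are only for illustration.

```python
import ast
from collections import Counter
from itertools import chain

import pandas as pd

# Sample data from the question; storing the lists as strings is an assumption.
df = pd.DataFrame({
    "PredictedFeature": [2000.0, 2664.0],
    "Document_IDs": [
        "[160, 384, 3, 217, 324, 11, 232, 41, 377, 48]",
        "[160, 384, 3, 217, 324, 294, 13, 11]",
    ],
    "did": [11, 13],
    "avg": [0.6, 0.9],
})

# Parse each string into a real Python list.
parsed = df["Document_IDs"].apply(ast.literal_eval)

# Count every ID across all rows once, then look up each did in that tally.
counts = Counter(chain.from_iterable(parsed))
df["pres"] = df["did"].map(counts)

print(df[["did", "avg", "pres"]])
```

Building the Counter once and mapping against it avoids rescanning the flattened list for every row, which is what the lambda version does.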
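EDIT: If some values cannot be parsed by literal_eval, wrap it in a custom function that returns an empty list instead of raising:
from ast import literal_eval
def literal_eval_cust(x):
    try:
        return literal_eval(x)
    except Exception:
        # Missing or malformed values become an empty list
        return []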
df['Document_IDs'] = df['Document_IDs'].apply(literal_eval_cust)
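To see what the fallback does, here is a small sketch with made-up inputs (the Series raw below is hypothetical, not from the question): one valid list string, one missing value, and one string that is not a Python literal.

```python
from ast import literal_eval

import numpy as np
import pandas as pd


def literal_eval_cust(x):
    """Parse a list literal, falling back to an empty list on any failure."""
    try:
        return literal_eval(x)
    except Exception:
        return []


# Hypothetical inputs: a valid list string, a missing value, a malformed string.
raw = pd.Series(["[11, 13]", np.nan, "not a list"])
parsed = raw.apply(literal_eval_cust)

print(parsed.tolist())
```

Only the first entry parses; the other two end up as empty lists, so they simply contribute nothing to the counts later on.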
Solution using Counter and map:
import collections
c = collections.Counter(df.Document_IDs.sum())
df['Present'] = df.did.map(c)
df[['did', 'avg', 'Present']]
Out[584]:
did avg Present
0 11 0.6 2
1 13 0.9 1
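Calling .sum() on a column of lists concatenates the lists one after another, which may get slow when there are many rows. As a possible alternative (not part of the answer above), the same tally can be sketched with Series.explode and value_counts. The shortened sample data and the names below are assumptions for illustration.

```python
import pandas as pd

# Shortened sample data; Document_IDs is assumed to already hold Python lists.
df = pd.DataFrame({
    "Document_IDs": [[160, 384, 11], [160, 13, 11]],
    "did": [11, 13],
})

# explode yields one row per list element; empty lists become NaN and are dropped.
flat = df["Document_IDs"].explode().dropna().astype("int64")

# Tally how often each ID appears across the whole column.
counts = flat.value_counts()

# IDs that never appear would map to NaN, so fill those with 0.
df["Present"] = df["did"].map(counts).fillna(0).astype(int)

print(df[["did", "Present"]])
```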
If you want to use a pandas native solution, try this:
df['pres'] = df.apply(lambda x: list(x['Document_IDs']).count(x['did']), axis=1)
I have not tested for calculation speed.
You can also count the instances of an item in a list with list.count, for example mylist.count(item).
So I would create a function to apply to each row:
def get_id(row):
    # Count how often this row's did occurs in this row's Document_IDs list
    res = row['Document_IDs'].count(row['did'])
    return res
Then apply it, creating a new result column:
df['result'] = df.apply(get_id,axis=1)
Although I'm sure somebody will come along with a faster version:)
Given the following input:
import pandas as pd
df = pd.DataFrame([[[3,4,5,6,3,3,5,4], 3, 0.6], [[1,4,7,8,4,5,1], 4, 0.8]], columns=['Document_IDs','did','avg'])
In one line:
df['Present'] = df.apply(lambda row: row.Document_IDs.count(row.did), axis=1)
If you want to print the results that interest you:
print(df[['did', 'avg', 'Present']])
did avg Present
0 3 0.6 3
1 4 0.8 2
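One thing worth checking before picking an answer: the apply-based answers count did only within the list on the same row, while the Counter-based answers count it across the whole Document_IDs column. On the question's sample data these give different numbers, and the expected output in the question (2 for ID 11) matches the column-wide count. A minimal sketch comparing the two, with hypothetical column names in_row and in_column:

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Sample data from the question, with Document_IDs already parsed into lists.
df = pd.DataFrame({
    "Document_IDs": [
        [160, 384, 3, 217, 324, 11, 232, 41, 377, 48],
        [160, 384, 3, 217, 324, 294, 13, 11],
    ],
    "did": [11, 13],
})

# Row-wise: how often does did occur in the list on the same row?
df["in_row"] = df.apply(lambda row: row["Document_IDs"].count(row["did"]), axis=1)

# Column-wide: how often does did occur in any row's list?
all_ids = Counter(chain.from_iterable(df["Document_IDs"]))
df["in_column"] = df["did"].map(all_ids)

print(df[["did", "in_row", "in_column"]])
```

ID 11 appears once in its own row but twice in the column overall, so which version is right depends on what "present" should mean for your data.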