How do I add data to a column only if a certain value exists in previous column using Python and Faker?
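I'm very new to Python and don't even know what to Google for this. What I'm trying to do is create a Pandas DataFrame filled with fake data using Faker. The problem I'm having is that each column generates its fake data in a silo. I want to be able to create fake data based on what exists in a previous column.
So in my example below, I have pc_type ["PC", "Apple"].
From there I have the operating system, and the options are Windows 10, Windows 11, and MacOS. I want only the rows where pc_type = "Apple" to have the MacOS column populated. Everything of type PC should then be 50% Windows 10 and 50% Windows 11.
How would I write this code so that the function body makes this distinction explicit and the results reflect it?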
from faker import Faker
from faker.providers import BaseProvider, DynamicProvider
import numpy as np
import pandas as pd
from datetime import datetime
import random

pc_type = ['PC', 'Apple']
fake = Faker()

def create_data(x):
    project_data = {}
    for i in range(0, x):
        project_data[i] = {}
        project_data[i]['Name'] = fake.name()
        project_data[i]['PC Type'] = fake.random_element(pc_type)
        # Each OS flag is drawn independently of 'PC Type', which is the problem
        project_data[i]['With Windows 10'] = fake.boolean(chance_of_getting_true=25)
        project_data[i]['With Windows 11'] = fake.boolean(chance_of_getting_true=25)
        project_data[i]['With MacOS'] = fake.boolean(chance_of_getting_true=50)
    return project_data

df = pd.DataFrame(create_data(10)).transpose()
df
To get consistent values, you can use the following:
from faker import Faker
import pandas as pd
import numpy as np

def create_data(x):
    pc_type = ['PC', 'Apple']
    fake = Faker()
    data = {'Name': [fake.name() for _ in range(x)],
            'PC Type': np.random.choice(pc_type, x)}
    df = pd.DataFrame(data)
    # Apple rows always get MacOS
    df['With MacOS'] = df['PC Type'] == 'Apple'
    # PC rows get exactly one of Windows 10 / Windows 11, 50/50
    pc = df['PC Type'] == 'PC'
    w10 = np.random.choice([True, False], len(df), p=(0.5, 0.5))
    df['With Windows 10'] = pc & w10
    df['With Windows 11'] = pc & ~w10
    return df

df = create_data(10)
Output:
>>> df
                Name  PC Type  With MacOS  With Windows 10  With Windows 11
0     Charles Dawson       PC       False             True            False
1  Patricia Bautista       PC       False            False             True
2         Ruth Clark       PC       False             True            False
3       Justin Lopez       PC       False             True            False
4      Grace Russell       PC       False             True            False
5         Grant Moss       PC       False             True            False
6           Tracy Ho    Apple        True            False            False
7    Connie Mitchell    Apple        True            False            False
8  Catherine Nichols    Apple        True            False            False
9   Nathaniel Bryant       PC       False            False             True
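The same masking idea can also fill one categorical column instead of three boolean ones. The following is only a minimal sketch: the helper name add_os_column and its seed parameter are hypothetical and not part of the answer above, and the sample frame is hand-written rather than generated with Faker. It draws a Windows version for every row and keeps that draw only where the row is not an Apple.

```python
import numpy as np
import pandas as pd

def add_os_column(df, seed=None):
    """Return a copy of df with an 'OS' column that depends on 'PC Type'.

    Hypothetical helper for illustration; 'seed' only makes the draw repeatable.
    """
    rng = np.random.default_rng(seed)
    # Draw a Windows version for every row with a 50/50 split
    windows = rng.choice(['Windows 10', 'Windows 11'], size=len(df), p=[0.5, 0.5])
    out = df.copy()
    # Apple rows get MacOS; all other rows keep their Windows draw
    out['OS'] = np.where(out['PC Type'] == 'Apple', 'MacOS', windows)
    return out

# Hand-written sample instead of Faker output, so the sketch runs on its own
sample = pd.DataFrame({'PC Type': ['PC', 'Apple', 'PC', 'Apple', 'PC']})
result = add_os_column(sample, seed=0)
print(result)
```

Because the Windows draw is made for all rows and then masked, the 50/50 split holds in expectation among the PC rows, not exactly for any given sample.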
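I would change the approach slightly and generate a single OS column. If needed, you can convert this column into With MacOS etc. afterwards.
With this approach, it is also easier to get the 0.5 / 0.5 split among the Windows versions: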
from faker import Faker
from faker.providers import BaseProvider, DynamicProvider
import numpy as np
import pandas as pd
from datetime import datetime
import random
from collections import OrderedDict

pc_type = ['PC', 'Apple']
# Weighted choices: each Windows version is picked with probability 0.5
wos_type = OrderedDict([('With Windows 10', 0.5), ('With Windows 11', 0.5)])
fake = Faker()

def create_data(x):
    project_data = {}
    for i in range(x):
        project_data[i] = {}
        project_data[i]['Name'] = fake.name()
        project_data[i]['PC Type'] = fake.random_element(pc_type)
        # The OS depends on the PC type chosen just above
        if project_data[i]['PC Type'] == 'PC':
            project_data[i]['OS'] = fake.random_element(elements = wos_type)
        else:
            project_data[i]['OS'] = 'MacOS'
    return project_data

df = pd.DataFrame(create_data(10)).transpose()
df
Output:
                     Name  PC Type               OS
0         Nicholas Walker    Apple            MacOS
1               Eric Hull       PC  With Windows 10
2       Veronica Gonzales       PC  With Windows 11
3  Mrs. Krista Richardson    Apple            MacOS
4              Anne Craig       PC  With Windows 10
5            Joseph Hayes       PC  With Windows 10
6             Mary Nelson    Apple            MacOS
7               Jill Hunt    Apple            MacOS
8             Mark Taylor       PC  With Windows 11
9           Kyle Thompson       PC  With Windows 10
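The conversion of the OS column back into With MacOS etc. that this answer mentions could look roughly like the sketch below. It is one possible way, not necessarily what the answer's author had in mind: it uses pd.get_dummies for the one-hot step, and the three rows are copied by hand from the output above so that the snippet runs without Faker. The variable names flags and wide are made up for illustration.

```python
import pandas as pd

# First three rows of the output above, typed in by hand
df = pd.DataFrame({
    'Name': ['Nicholas Walker', 'Eric Hull', 'Veronica Gonzales'],
    'PC Type': ['Apple', 'PC', 'PC'],
    'OS': ['MacOS', 'With Windows 10', 'With Windows 11'],
})

# One boolean column per distinct OS value; astype(bool) keeps the dtype
# the same across pandas versions
flags = pd.get_dummies(df['OS']).astype(bool)
# The Windows values already carry the 'With ' prefix, MacOS does not
flags = flags.rename(columns={'MacOS': 'With MacOS'})

# Replace the single OS column with the boolean columns
wide = pd.concat([df.drop(columns='OS'), flags], axis=1)
print(wide)
```

Since every row has exactly one OS value, each row of the result has exactly one True flag, which is the consistency the question asks for.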