
Personal Data Generator

The "Personal Data" shared generator produces personal data whose realism can be controlled along several attributes, such as geolocation, gender and age. We use this generator intensively to create customer data that is injected into systems under test.

Most of the time we complement this data with other generators that produce business data such as banking information (bank account, credit card number, etc.). We then format the completed data with a final generator that produces XML requests for a web service API, which creates the customer data structure in the target system.

The generator provides the following fields:

Country Code       Salutation    Birthdate        Street Number   Zip Code   eMail
Language Code      First Name    Age              Street Name     Phone      Job Title
UUID               Last Name     Majority         City Name       Country
Family Headcount   Maiden Name   Marital Status   State Name      Geoloc

For the impatient (yes, there are some), below are sample CSV files generated with GEDIS Studio Online using different configurations, to illustrate the control you have over the produced data.

Personal Data France

French Personal Data (sample of 300 records - UTF8)

  • All are single (family headcount is always 1),
  • All are located in Île-de-France (Paris and its surrounding cities),
  • 70% are males,
  • Ages are distributed according to Census 2012.
Personal Data France 

French Personal Data (sample of 300 records - UTF8)

  • 40% are single,
  • Locations are Paris, Marseille and Lyon,
  • 50% are males,
  • Ages are restricted to the [18, 25] range of the Census 2012 distribution.

The extensive control over generated values provided by our personal data generator allows us to produce millions of records segmented by age, location, gender, etc. This lets us create separate subsets of personal data for different uses or users, all injected into the same target system, so that testing projects sharing the same platform neither overlap with nor reuse each other's personal data.

We hope you'll enjoy this generator and find it as useful as we do :)

Names

Names are selected from lists that match each person's language code. For example, if you select the FRA language code you will get names like "Paul", "Jean" or "Marie", but if you select ENG you will get names like "Paul", "John" or "Mary".

Names are also selected according to gender. There is no gender field in the produced file, but the field does exist inside the generator. The salutation field is computed from the gender and, for women, from the marital status as well.
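To make the dependencies concrete, here is a minimal sketch of name and salutation selection. The name lists, the marital status values and the exact salutation mapping ("Miss" for single women, "Mrs" otherwise) are assumptions for illustration, not the generator's actual data or rules.

```python
import random

# Hypothetical name lists keyed by (language code, gender); the generator's
# real lists are much larger and are not reproduced here.
FIRST_NAMES = {
    ("FRA", "M"): ["Paul", "Jean"],
    ("FRA", "F"): ["Marie"],
    ("ENG", "M"): ["Paul", "John"],
    ("ENG", "F"): ["Mary"],
}


def pick_first_name(language_code: str, gender: str, rng=random) -> str:
    """Pick a first name matching both the record's language code and gender."""
    return rng.choice(FIRST_NAMES[(language_code, gender)])


def salutation(gender: str, marital_status: str) -> str:
    """Derive the salutation from gender, and from marital status for women.

    Assumed mapping: men are always "Mr"; women are "Miss" when single
    and "Mrs" otherwise (married, divorced, widowed...).
    """
    if gender == "M":
        return "Mr"
    return "Miss" if marital_status == "single" else "Mrs"


rng = random.Random(42)
print(pick_first_name("ENG", "F", rng), salutation("F", "married"))  # prints: Mary Mrs
```

Because the gender is drawn before the name and salutation, it stays consistent across those fields even though it is not exported.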

Addresses

Addresses are randomly selected from places that match i) the country code, ii) a city name filter and iii) a state name filter. These locations are real, which means that the city, state and zip code fields hold real values and are correlated with each other. Generated street names are fake, but it is possible to generate real ones as well (we did so for a customer's project based on the OpenStreetMap database).

  • COUNTRY_CODE is a generating expression that produces the country code of each record, which means that successive addresses may be located in different countries,
  • CITY_FILTER is a regular expression parameter that, when defined, restricts the selected city names to those matching the regular expression. For example, "Chicago" or "(Chicago)|(Los Angeles)" are valid parameter values,
  • STATE_FILTER is a similar parameter for the state name.
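The selection logic can be sketched as a filter over a table of real places followed by a random pick. The tiny gazetteer, the country code format and the `pick_place` function are hypothetical; the sketch also assumes the filters must match the whole name (`re.fullmatch`), whereas the real generator might accept partial matches.

```python
import random
import re
from typing import NamedTuple, Optional


class Place(NamedTuple):
    country_code: str
    city: str
    state: str
    zip_code: str


# Hypothetical gazetteer excerpt: in the real generator, city, state and
# zip code come from a database of real, mutually consistent places.
PLACES = [
    Place("USA", "Chicago", "Illinois", "60601"),
    Place("USA", "Los Angeles", "California", "90012"),
    Place("USA", "San Diego", "California", "92101"),
    Place("FRA", "Paris", "Ile-de-France", "75001"),
]


def pick_place(country_code: str,
               city_filter: Optional[str] = None,
               state_filter: Optional[str] = None,
               rng=random) -> Place:
    """Randomly pick a place matching the country code and optional regex filters."""
    candidates = [
        p for p in PLACES
        if p.country_code == country_code
        and (city_filter is None or re.fullmatch(city_filter, p.city))
        and (state_filter is None or re.fullmatch(state_filter, p.state))
    ]
    if not candidates:
        raise ValueError("no place matches the given filters")
    return rng.choice(candidates)


print(pick_place("USA", city_filter="(Chicago)|(Los Angeles)", rng=random.Random(1)))
```

Picking a whole `Place` row, rather than drawing city, state and zip code independently, is what keeps those three fields correlated.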

Geolocation

Each address is also geolocated (with a latitude and longitude in degrees), so you can easily locate each record with Google Maps or any other map API. Since street names and numbers are fake, the record's geolocation is generated around the location of the city center, within a distance range controlled by a parameter whose default value is 1,500 meters. Be aware that, because the geolocation is randomly generated, it may sometimes fall on odd spots (in a lake, a field, etc.).
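One way to generate such a point is sketched below: draw a random bearing and a distance up to the configured range, then convert the offset from meters to degrees. The function names and the `max_distance_m` parameter are hypothetical, and the generator may use a different sampling scheme; this sketch takes the square root of a uniform draw so that points are spread evenly over the disc's area instead of clustering at the city center.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def jitter_around(lat_deg: float, lon_deg: float,
                  max_distance_m: float = 1500.0, rng=random) -> tuple:
    """Return a random (lat, lon) within max_distance_m of the given center.

    Uses a local flat-Earth approximation, which is accurate enough for
    offsets of a few kilometers away from the poles.
    """
    distance = max_distance_m * math.sqrt(rng.random())  # uniform over the disc area
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    north_m = distance * math.cos(bearing)
    east_m = distance * math.sin(bearing)
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg))))
    return lat_deg + dlat, lon_deg + dlon


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance in meters, used here to check results."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Approximate coordinates of central Paris.
lat, lon = jitter_around(48.8566, 2.3522, rng=random.Random(7))
print(round(distance_m(48.8566, 2.3522, lat, lon)), "m from the city center")
```

Nothing in this scheme knows about water or buildings, which is why generated points can land in unexpected places.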

Phone numbers

Generated phone numbers depend on the location of the address. We use information from international telecommunication numbering plans to find the phone prefix of each generated location and to generate a realistic phone number.
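A minimal sketch of the idea: look up the calling code and area prefix for the record's location, then append random subscriber digits. The `NUMBERING_PLAN` table is a hypothetical three-entry excerpt (France's 01 zone for Île-de-France and 04 zone for the south-east, Chicago's 312 area code); real numbering plans also forbid some digit combinations, which this sketch ignores.

```python
import random

# Hypothetical numbering-plan excerpt:
# (country, city) -> (international calling code, area prefix, subscriber digits)
NUMBERING_PLAN = {
    ("FRA", "Paris"): ("33", "1", 8),      # Ile-de-France geographic zone (01)
    ("FRA", "Marseille"): ("33", "4", 8),  # South-east geographic zone (04)
    ("USA", "Chicago"): ("1", "312", 7),   # Chicago area code
}


def phone_number(country_code: str, city: str, rng=random) -> str:
    """Build an international-format number whose prefix matches the location."""
    calling_code, area_prefix, subscriber_digits = NUMBERING_PLAN[(country_code, city)]
    subscriber = "".join(rng.choice("0123456789") for _ in range(subscriber_digits))
    return f"+{calling_code} {area_prefix}{subscriber}"


print(phone_number("FRA", "Paris", random.Random(3)))
```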

Family

Records are grouped into families. A family is a sequence of records that share the same family name and address. Each record has a headcount field giving the number of people in its family. You control the number of records in each new family.

This control is achieved with the RATIO_FAMILY parameter, a number between 0 and 100 used to decide whether each new sequence is a single person or a family. For a family sequence, the headcount is randomly drawn from a weighted list of six values, from 2 to 7.
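The sketch below shows one plausible reading of this mechanism, assuming RATIO_FAMILY is the percentage chance that a new sequence is a family. The headcount weights are invented placeholders, and `family_plan` is a hypothetical helper that returns one headcount per record so that every member of a family carries the family's size.

```python
import random

RATIO_FAMILY = 40  # assumed: percentage chance (0 to 100) that a new sequence is a family

# Headcounts 2..7 with hypothetical weights (the real weights are not published here).
HEADCOUNTS = [2, 3, 4, 5, 6, 7]
HEADCOUNT_WEIGHTS = [35, 30, 20, 9, 4, 2]


def next_headcount(ratio_family: float = RATIO_FAMILY, rng=random) -> int:
    """Return 1 for a single person, otherwise a weighted family size."""
    if rng.random() * 100 < ratio_family:
        return rng.choices(HEADCOUNTS, weights=HEADCOUNT_WEIGHTS, k=1)[0]
    return 1


def family_plan(total_records: int, ratio_family: float = RATIO_FAMILY, rng=random) -> list:
    """Split total_records into consecutive family sequences.

    Returns the headcount of each record; records of the same family are
    adjacent and would share last name and address. The last family is
    truncated if it would exceed total_records.
    """
    headcounts = []
    while len(headcounts) < total_records:
        size = min(next_headcount(ratio_family, rng), total_records - len(headcounts))
        headcounts.extend([size] * size)
    return headcounts


print(family_plan(12, rng=random.Random(0)))
```

With `ratio_family=0` every record is a single person, as in the first French sample file above.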

Ages and Birthdate

Ages are drawn from statistics available from the US Census 2012. You can then apply a filter to keep only ages within a range controlled by two parameters, MIN_AGE and MAX_AGE. By default the range is [1, 100].

The generator then computes a birthdate consistent with that age, based on the current date.
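Both steps can be sketched as follows. `AGE_WEIGHTS` is a placeholder, not actual Census 2012 figures; restricting a weighted draw to [MIN_AGE, MAX_AGE] is equivalent to conditioning the distribution on that range, since `random.choices` normalizes the remaining weights. The birthdate is then picked uniformly among the dates that give exactly the drawn age on the reference date; the helper names are hypothetical.

```python
import datetime as dt
import random

# Placeholder weights standing in for the Census 2012 age distribution
# (one weight per age from 1 to 100); these are NOT census figures.
AGE_WEIGHTS = {age: 101 - age for age in range(1, 101)}


def draw_age(min_age: int = 1, max_age: int = 100, rng=random) -> int:
    """Draw an age from the distribution restricted to [min_age, max_age]."""
    ages = [a for a in AGE_WEIGHTS if min_age <= a <= max_age]
    if not ages:
        raise ValueError("empty age range")
    return rng.choices(ages, weights=[AGE_WEIGHTS[a] for a in ages], k=1)[0]


def shift_years(day: dt.date, years: int) -> dt.date:
    """Move a date back by whole years, mapping 29 February to 28 February."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def age_on(birthdate: dt.date, today: dt.date) -> int:
    """Age in completed years on the given date."""
    before_birthday = (today.month, today.day) < (birthdate.month, birthdate.day)
    return today.year - birthdate.year - before_birthday


def birthdate_for_age(age: int, today: dt.date, rng=random) -> dt.date:
    """Pick a random birthdate such that the person is exactly `age` today."""
    latest = shift_years(today, age)                                 # turns `age` today
    earliest = shift_years(today, age + 1) + dt.timedelta(days=1)    # turns age+1 tomorrow
    return earliest + dt.timedelta(days=rng.randint(0, (latest - earliest).days))


today = dt.date.today()
age = draw_age(18, 25)
print(age, birthdate_for_age(age, today))
```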